
Evaluated by C. Poirier, Statistics Canada, 1999

SYSTEM INFORMATION

Full name:   NIM - New Imputation Methodology
Version:     1.0
Year:        1999
Developer:   Statistics Canada

DESCRIPTION

The New Imputation Methodology (NIM) was developed at Statistics Canada.  NIM targets social surveys because it deals mostly with qualitative variables.  The system was initially developed and put in place for the 1996 Canadian Census.  Donor imputation is its only imputation method.  As detailed in Bankier et al. (1997), its goal is to minimize the number of changes while making sure the imputation actions are plausible.  It always imputes a record from a single donor.  Since the Census data are collected at the household level with information for each person within the household, the system is designed to identify donors for the entire household, not for individual persons.

NIM is an imputation system more than an edit system.  It is used after collection and capture editing have been completed.  It uses edit rules to identify records that need imputation and records that can be used as donors.  In practice only conflict rules are implemented, although in theory validity rules could be used as well.  A record is classified as a failed-edit record if at least one of the conflict rules is true for it.  The rules are defined through decision logic tables (DLTs).
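
To illustrate how conflict rules split records into failed-edit and passed-edit sets, here is a minimal Python sketch.  It is not NIM code: the rules, the field names (age, marital_status, relationship) and the function names are hypothetical.  Each rule stands in for one column of a decision logic table, and a record fails when all conditions of at least one rule are true.

```python
# Hypothetical conflict rules (not NIM's). Each rule is a list of conditions
# that must ALL hold for the rule to fire, like one column of a DLT.
CONFLICT_RULES = [
    [lambda r: r["age"] < 15, lambda r: r["marital_status"] == "married"],
    [lambda r: r["age"] < 15, lambda r: r["relationship"] == "spouse"],
]


def fails_edits(record, rules=CONFLICT_RULES):
    """A record fails if at least one conflict rule is true for it."""
    return any(all(cond(record) for cond in rule) for rule in rules)


def split_records(records, rules=CONFLICT_RULES):
    """Return (failed, passed); the passed records form the donor pool."""
    failed = [r for r in records if fails_edits(r, rules)]
    passed = [r for r in records if not fails_edits(r, rules)]
    return failed, passed


records = [
    {"age": 12, "marital_status": "married", "relationship": "child"},
    {"age": 40, "marital_status": "married", "relationship": "spouse"},
]
failed, passed = split_records(records)
```

In this toy data the first record triggers the first rule and needs imputation, while the second record passes and can serve as a donor.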

Once failed-edit and passed-edit records have been identified, the system tries to find, for each record to be imputed, a record that can be used as a donor.  The search looks for a donor among the passed-edit records that is close to the failed-edit record.  In choosing among the records in the donor pool, the system takes into account all feasible actions for each potential donor.  A feasible action is the transfer of donor data into a set of the recipient's fields such that the newly imputed record, say a, passes the edit rules.  NIM randomly selects a donor p and an imputation action a from among the feasible actions that minimize the following composite distance, Dfpa, for the failed record f:

                 Dfpa = β D(f,a) + (1-β) D(a,p),              0 < β < 1

where β is a user-defined constant and D is a user-defined distance, so that D(f,a) measures how far the imputed record a is from the failed record f and D(a,p) how far it is from the donor p.  In this equation, a β close to one gives more weight to minimizing the number of changes than to the similarity between the imputed record and the passed record.  The selection can also be widened to accept not only the minimum composite distance but also some near-minimum actions as possible imputation actions.  This is done by random selection with unequal probabilities among the minimum and near-minimum change scenarios.
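
The selection rule can be sketched by brute force as follows.  This is only an illustration under stated assumptions, not NIM's algorithm: the fields, the single conflict rule, the mismatch-count distance, the tolerance parameter and the weighting used for near-minimum actions are all hypothetical choices.

```python
import random
from itertools import combinations

FIELDS = ("age_group", "marital_status", "relationship")  # hypothetical fields


def distance(x, y):
    """Assumed distance: number of fields on which two records differ."""
    return sum(x[k] != y[k] for k in FIELDS)


def passes_edits(r):
    """Hypothetical conflict rule: a child cannot be married."""
    return not (r["age_group"] == "child" and r["marital_status"] == "married")


def feasible_actions(f, donors, beta):
    """Yield (Dfpa, p, a) for each donor p and distinct imputed record a that passes."""
    for i, p in enumerate(donors):
        seen = set()
        for n in range(1, len(FIELDS) + 1):
            for subset in combinations(FIELDS, n):
                a = dict(f, **{k: p[k] for k in subset})  # copy donor fields into f
                key = (i, tuple(a[k] for k in FIELDS))
                if key in seen or not passes_edits(a):
                    continue
                seen.add(key)
                yield beta * distance(f, a) + (1 - beta) * distance(a, p), p, a


def impute(f, donors, beta=0.9, tolerance=0.0, rng=random):
    """Randomly pick an action whose distance is within `tolerance` of the minimum."""
    actions = list(feasible_actions(f, donors, beta))
    d_min = min(d for d, _, _ in actions)
    near = [t for t in actions if t[0] <= d_min + tolerance + 1e-9]
    # Assumed weighting: the smaller the composite distance, the larger the weight.
    weights = [1.0 / (1.0 + t[0] - d_min) for t in near]
    return rng.choices(near, weights=weights, k=1)[0]


f = {"age_group": "child", "marital_status": "married", "relationship": "child"}
donors = [
    {"age_group": "child", "marital_status": "single", "relationship": "child"},
    {"age_group": "adult", "marital_status": "married", "relationship": "spouse"},
]
d, p, a = impute(f, donors)
```

With tolerance set to zero only minimum-distance actions are eligible; a positive tolerance admits near-minimum actions, which are then drawn with unequal probabilities.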

Other modules also exist to deal specifically with the sequence of persons within the household in order to better identify couples and to optimize the process.  In practice, the distance described above becomes costly to minimize as the number of passed-edit records and potential actions grows.  Highly efficient algorithms were introduced to alleviate this potential shortcoming.

The system was developed in the C language and runs in a mainframe environment.  Current limitations force the user to use a pre-processor to replicate DLTs for edits between persons.  The system was used successfully for the 1996 Canadian Census of Population but a generalization would be required for other applications.

STRENGTHS

NIM finds the donor before it identifies the minimum number of changes needed.  Because minimum changes do not necessarily guarantee a plausible imputation, NIM was developed to meet two objectives at once: to minimize changes and to ensure plausible imputations.  NIM includes a generic distance function for the donor imputation.  This means the user can define the distance function for each matching field.  Its first use, for the 1996 Canadian Census, was a success: 11 million households were processed within a month.
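
A per-field distance of this kind might look like the following Python sketch.  The field names, the two distance functions and the 10-year scaling are hypothetical and serve only to show how a separate distance can be attached to each matching field and then summed over the record.

```python
# Hypothetical per-field distance functions; each returns a value in [0, 1].
def exact_match(x, y):
    """Distance 0 when the two values agree, 1 otherwise."""
    return 0.0 if x == y else 1.0


def scaled_age_gap(x, y, max_gap=10):
    """Ages within `max_gap` years are partially close; beyond that, distance 1."""
    return min(abs(x - y) / max_gap, 1.0)


# Assumed configuration: one distance function per matching field.
FIELD_DISTANCES = {
    "age": scaled_age_gap,
    "sex": exact_match,
    "marital_status": exact_match,
}


def record_distance(r1, r2, field_distances=FIELD_DISTANCES):
    """Sum the user-defined distances over all matching fields."""
    return sum(fn(r1[k], r2[k]) for k, fn in field_distances.items())


f = {"age": 30, "sex": "F", "marital_status": "single"}
p = {"age": 35, "sex": "F", "marital_status": "married"}
d = record_distance(f, p)
```

Here the age field contributes a partial distance while the qualitative fields contribute either 0 or 1.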

WEAKNESSES

NIM was developed essentially for the Canadian Census, which surveys persons within households.  In its current form, it may be difficult to reuse NIM for a wide variety of surveys.  Although its generalization is being considered, its feasibility has not yet been demonstrated.  NIM can process quantitative variables along with qualitative variables, but the performance of the system with more than a few quantitative variables has yet to be demonstrated.  Some recent theoretical results, however, suggest this may be feasible.

FUNCTIONAL EVALUATION

 LEGEND

***    The implementation offers sub-functions or options required by a wide range of survey applications.
**     The implementation has a less complete set of options.
*      The implementation offers partial functionality.  Options are too restrictive or not generalized enough.
-      No stars are assigned when the functionality is not offered at all.


TYPE OF DATA

Quantitative data             *
Qualitative data              ***

EDITING FUNCTIONS

Data verification             *
On-line correction            -
Error localization            -
Minimum changes               ***
User-defined changes          -
Outlier detection             -

IMPUTATION FUNCTIONS

Deterministic imputation      -
Donor imputation              ***
Imputation by estimators      -
Multiple imputation           *

GENERAL FEATURES

Graphical user interface      -
User-friendliness             **
On-line help                  -
On-line tutorial              -
Documentation                 ***
Diagnostic reports            **
Integration                   -
Reusable code                 **
Portability                   **
Flexibility                   -
User support                  -
Acquisition cost              N.A.

REFERENCES

Bankier, M., Houle, A.-M., Luc, M., C., and Newcombe, P. (1997).  "1996 Canadian Census Demographic Variables Imputation".  Proceedings of the Section on Survey Research Methods, American Statistical Association.
