
Evaluated by C. Poirier, Statistics Canada, 2003

Last updated in January 2010

SYSTEM INFORMATION


Full name:   BANFF - Generalized Edit and Imputation System
Version:     2.3
Year:        2008
Developer:   Statistics Canada

DESCRIPTION

The Banff functions are close to the GEIS ones (see the evaluation of GEIS), but the system works in a SAS environment rather than Oracle.  Banff runs on any platform where SAS is available, as opposed to GEIS, which worked only on Unix or mainframe.  An interesting improvement in Banff is its modular design, in which all the functions are independent of one another.

Banff is not an editing system as such; it is aimed mainly at the imputation process.  It is usually used after the preliminary editing associated with the collection and capture phases and the respondent follow-up have been completed.  Linear programming techniques are used to localize the fields to be imputed, and search algorithms are used to perform automatic imputations.  The processing is entirely driven by linear edit/imputation rules defined on numeric variables.  More details are given in Statistics Canada (2009).  The Banff steps are:

Edit specification and analysis:  This step serves to identify the relationships which characterize acceptable records.  The relationships are expressed as a set of n linear edit rules in the form:

a11 x1 + a12 x2 + ... + a1m xm <= b1
. . .
an1 x1 + an2 x2 + ... + anm xm <= bn

where the aij's and bi's are user-defined constants, and the xj's represent the m survey variables.  The rules are connected with logical 'and's, which means each rule must be satisfied for a record to pass the edits.  The system checks for edit consistency, redundancy and hidden equalities.  This step permits an iterative approach to the design of the best possible set of edits.
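
To make the structure of these linear edits concrete, the following Python sketch (not part of Banff) checks whether a record satisfies a small set of rules expressed as a coefficient matrix A and a bound vector b.  The three variables and the rules themselves are assumptions chosen only for the example.

```python
import numpy as np

# Illustrative linear edits of the form a.x <= b, connected by logical ANDs.
# Assumed rules for three variables (x1, x2, x3):
#   x1 + x2 - x3 <= 0      (components do not exceed the total)
#   -x1 <= 0               (x1 is non-negative)
#   x2 - 0.5*x3 <= 0       (x2 is at most half of the total)
A = np.array([
    [1.0, 1.0, -1.0],
    [-1.0, 0.0, 0.0],
    [0.0, 1.0, -0.5],
])
b = np.array([0.0, 0.0, 0.0])

def passes_edits(record, A, b, tol=1e-9):
    """A record passes only if every linear edit a.x <= b is satisfied."""
    return bool(np.all(A @ record <= b + tol))

print(passes_edits(np.array([100.0, 40.0, 150.0]), A, b))  # True
print(passes_edits(np.array([60.0, 90.0, 160.0]), A, b))   # False (violates rule 3)
```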

Outlier detection:  This step aims at the detection of univariate outliers.  It compares selected variables across records and identifies outlying observations based on the median M and the first and third quartiles Q1 and Q3 of the population.  An observed value x is identified as an outlier if it falls outside the acceptance interval (M - k(M - Q1), M + k(Q3 - M)), where the multiplier k is set by the user.  This method can be used to identify variables to be imputed or to be excluded from subsequent calculations.
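
A minimal sketch of this quartile-based detection, assuming the acceptance interval form given above; the sample data and the choice k = 3 are made up for illustration and do not reflect Banff's exact implementation.

```python
import numpy as np

def quartile_outliers(values, k=3.0):
    """Flag values outside (M - k*(M - Q1), M + k*(Q3 - M)),
    a quartile-based acceptance interval around the median M."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    lower = median - k * (median - q1)
    upper = median + k * (q3 - median)
    return (values < lower) | (values > upper), (lower, upper)

revenue = np.array([13.0, 14.0, 15.0, 15.5, 16.0, 18.0, 19.0, 95.0])
flags, interval = quartile_outliers(revenue, k=3.0)
print(interval)          # acceptance interval
print(revenue[flags])    # [95.] flagged as an outlier
```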

Error localization:  This step uses a linear programming approach to minimize the number of fields requiring imputation, an application of the rule of minimum change.  It identifies the fields that need to be imputed in order for the record to pass all the edit rules.  The problem is expressed as a constrained linear program and solved using Chernikova's algorithm.  The system also allows a weight to be assigned to each variable when the user wishes to influence the identification of the fields to be imputed.  Although the algorithm is costly to run, it constitutes one of the main features of Banff.
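
Banff solves this with Chernikova's algorithm; as a hedged illustration of what error localization computes, the sketch below brute-forces the cheapest set of fields which, once freed, makes the linear edits feasible, checking feasibility with a generic LP solver.  This exhaustive search is only viable for a handful of variables and is not how Banff itself works.

```python
from itertools import combinations

import numpy as np
from scipy.optimize import linprog

def minimal_change_fields(record, A, b, weights=None, tol=1e-9):
    """Brute-force sketch of minimum-change error localization: find the
    cheapest set of fields which, once allowed to vary freely, makes the
    linear edits A @ x <= b feasible."""
    record = np.asarray(record, dtype=float)
    m = record.size
    w = np.ones(m) if weights is None else np.asarray(weights, dtype=float)
    if np.all(A @ record <= b + tol):
        return set()                      # record already passes all edits
    # Examine candidate field sets in order of increasing total weight.
    subsets = [s for r in range(1, m + 1) for s in combinations(range(m), r)]
    for fields in sorted(subsets, key=lambda s: sum(w[list(s)])):
        # Fix the untouched fields at their reported values, free the rest.
        bounds = [(None, None) if j in fields else (record[j], record[j])
                  for j in range(m)]
        res = linprog(c=np.zeros(m), A_ub=A, b_ub=b, bounds=bounds,
                      method="highs")
        if res.status == 0:               # feasible: imputing these fields suffices
            return set(fields)
    return set(range(m))                  # fallback: all fields flagged
```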

Automatic imputation:  Five imputation functions are offered: Deterministic, Donor, Estimators, Mass Imputation and Prorating.  Based on the edit rules, deterministic imputation identifies cases in which there is only one possible value that would allow the record to satisfy the rules.  Donor imputation replaces the values to be imputed using data from the closest valid record, also referred to as the nearest neighbour.  For a given record, a subset of the fields which do not need imputation is automatically used as matching fields, and the maximum standardized difference among these individual fields is used as the distance function.  The user can specify post-imputation edits to make sure the nearest neighbour is close enough to be used as a donor.  Imputation by estimators provides a wide set of techniques using historical or current information; built-in estimators are previous values, previous/current means, trends and multiple regressions, and a user-defined estimator can also be specified if a non-standard estimator is required.  Mass imputation is a special case of donor imputation where the variables to impute are always the same for each record; the required conditions are such that records will pass the edits no matter what values are imputed.  The prorating function adjusts the components of a sum so that they match the total in a linear edit with an equality sign.
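
Only the Donor function is sketched below; the standardization by a column-wise standard deviation and the sample data are assumptions made for the example, not Banff's exact distance implementation.

```python
import numpy as np

def nearest_donor(recipient, donors, match_fields, scale):
    """Pick the donor minimizing the maximum standardized absolute
    difference over the matching fields (the recipient's fields that
    did not require imputation).  'scale' holds an assumed spread
    measure per variable (here a standard deviation)."""
    diffs = np.abs(donors[:, match_fields] - recipient[match_fields])
    distances = (diffs / scale[match_fields]).max(axis=1)
    return int(np.argmin(distances))

# Recipient with field 2 flagged for imputation (np.nan); fields 0 and 1 match.
recipient = np.array([120.0, 35.0, np.nan])
donors = np.array([[118.0, 30.0, 55.0],
                   [200.0, 80.0, 140.0],
                   [125.0, 36.0, 60.0]])
scale = donors.std(axis=0)
best = nearest_donor(recipient, donors, match_fields=[0, 1], scale=scale)
recipient[2] = donors[best, 2]     # impute from the closest valid record
print(best, recipient)             # 2 [120.  35.  60.]
```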

Banff allows the use of different imputation techniques across questionnaire sections and sub-populations.  A sequence of techniques is also possible where, at each step, the user can include or exclude previously imputed data from the process.  The system works with SAS, as each of the nine Banff functions is a customized SAS procedure; Banff is also available in SAS Enterprise Guide as Banff tasks, which help the user specify the parameters and edit rules through the SAS Enterprise Guide interface.  Banff itself was developed in the C language.  The functionality described above is quite adequate for economic surveys, as Banff works only with numeric data.  Newly initiated developments will allow the use of metadata: users will define the parameters in a spreadsheet or XML, and the system will automatically generate all the SAS code needed to perform edit and imputation, so programming knowledge will no longer be a requirement for the users.

STRENGTHS

The strengths of Banff are its capacity to find minimum changes for any set of rules expressed as a series of linear equations, its automated donor imputation function driven by the edit rules, and its flexibility within a SAS environment.  The donor imputation function runs with almost no intervention from the user, since it derives the matching fields by itself; it simply uses the response pattern, whatever it is, to look for a donor.  The minimum change rule increases the chance of preserving relatively good data integrity given the data in error.  Banff can deal with both positive and negative values.  The flexible estimator module, the diagnostic reports and the on-line tutorial, coupled with continuous user support, are further desirable aspects of the system.

WEAKNESSES

Banff only deals with numeric variables.  Although the imputation module can now process negative values, the editing process still cannot, which may cause problems for financial surveys.  The system now requires SAS to be installed on the platform of interest.

FUNCTIONAL EVALUATION

LEGEND

***   The implementation offers the sub-functions or options required by a wide range of survey applications.

**    The implementation has a less complete set of options.

*     The implementation offers partial functionality; options are too restrictive or not generalized enough.

-     No stars are assigned when the functionality is not offered at all.


TYPE OF DATA

Quantitative data           ***
Qualitative data            *

EDITING FUNCTIONS

Data verification           *
On-line correction          -
Error localization          ***
Minimum changes             ***
User-defined changes        -
Outlier detection           **

IMPUTATION FUNCTIONS

Deterministic imputation    ***
Donor imputation            ***
Imputation by estimators    ***
Multiple imputation         -

GENERAL FEATURES

Graphical user interface    -
User-friendliness           **
On-line help                -
On-line tutorial            -
Documentation               ***
Diagnostic reports          ***
Integration                 -
Reusable code               **
Portability                 **
Flexibility                 **
User support                ***
Acquisition cost            30,000 USD


REFERENCES

Statistics Canada (2009).  "Functional Description of Banff - the Generalized Edit and Imputation System".  Statistics Canada Technical Report.