Evaluated by C. Poirier, Statistics Canada, 1999
StEPS development was initiated in 1996 by the U.S. Census Bureau to provide integrated tools for the processing of survey steps, and to replace 15 existing systems used for U.S. economic surveys. As detailed in the system concepts and overview document (U.S. Census Bureau, 1996), StEPS is more than just an editing and imputation system. It includes a module to control the collection of information, a data review and on-line correction module, an estimation and variance calculation module, and a tabulation and disclosure module. It can provide general diagnostic tables, including response rates, imputation rates, etc. For the purpose of the present knowledge base, the focus is constrained to the editing and imputation modules.
The data editing module of StEPS supports simple verifications such as checking that required items are present, range verifications, and verifications of valid categories. It also provides more complex tests, such as balance tests, which verify the additivity of items against selected totals, and survey rules, which verify field relationships within observations. Skip pattern validations and field positivity verifications will eventually be implemented. The edit options offer the basic functionality; should a complex rule be required, users can provide their own program statements. Such coding is made easy by special windows integrated in the menus. The statements must be written in SAS because SAS is the sole foundation software of StEPS. All verifications can be performed on mixtures of current and historical values. When edits fail, concurrent users can each modify reported data interactively. The manual modification and verification modules can be used iteratively until the data are ready for the automated imputation step.
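The range and balance tests described above can be illustrated with a minimal sketch. It is written in Python for readability rather than in SAS, and it is not StEPS code: the function names, the field names and the tolerance parameter are all assumptions made for the example.

```python
# Illustrative sketch only: not StEPS code. Function and field names are hypothetical.

def range_edit(value, low, high):
    """Pass when a value is present and lies within [low, high]."""
    return value is not None and low <= value <= high

def balance_edit(components, total, tolerance=0.0):
    """Pass when the components add up to the reported total."""
    return abs(sum(components) - total) <= tolerance

# One reported record with a total and its two components (hypothetical fields).
record = {"sales": 120.0, "domestic": 80.0, "exports": 30.0}

failures = []
if not range_edit(record["sales"], 0.0, 1_000_000.0):
    failures.append("sales out of range")
if not balance_edit([record["domestic"], record["exports"]], record["sales"]):
    failures.append("domestic + exports != sales")
```

Here the range test passes but the balance test fails, since the components sum to 110 against a reported total of 120. In a system without automated error localization, the user would then have to decide which of the three fields to correct.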
StEPS has two modules for imputation, referred to as "simple imputation" and "general imputation". The simple-imputation module performs deterministic imputations and flags the resulting imputed values as if they had been reported. The imputation formulas used by the simple-imputation module are defined by the user through SAS windows. Any group of SAS statements, regardless of complexity, can be used to define an imputation formula. The following single statement is one example of such a formula:
y_i = max(k, x_i)
These user-defined functions are built from data entries and constants, not from any macro information such as population means or trends. The general-imputation module aims to replace any invalid values identified in the above editing process with valid values. The strategy is user-defined, as opposed to relying on automated localization of minimum changes. The imputation techniques available in StEPS are mostly estimator-type techniques. These include imputation by auxiliary data items, sums of data items, historical values, means, trends, ratios, and multiple regressions. All estimator functions can be evaluated from weighted or unweighted data. The system can exclude several types of records from the calculation of estimators. Furthermore, for the ratio and mean estimators, StEPS allows the exclusion of records based on upper and lower bounds U and L.
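As an illustration of an estimator-type imputation, the sketch below computes a weighted ratio estimator and uses it to fill missing values. It is written in Python rather than SAS and does not reproduce the StEPS implementation. In particular, applying the bounds L and U to each record's own ratio y/x is an assumption, as are the function name and the record layout.

```python
# Illustrative sketch only: not the StEPS implementation.
# Assumption: bounds L and U are applied to each record's own ratio y/x.

def ratio_impute(records, lower, upper):
    """Impute missing y as R * x, where R is a weighted ratio of y to x."""
    # Records contributing to the estimator: y reported, x positive,
    # and the unit-level ratio inside [lower, upper].
    contributors = [
        r for r in records
        if r["y"] is not None and r["x"] > 0
        and lower <= r["y"] / r["x"] <= upper
    ]
    numerator = sum(r["w"] * r["y"] for r in contributors)
    denominator = sum(r["w"] * r["x"] for r in contributors)
    if denominator == 0:
        raise ValueError("no usable records to estimate the ratio")
    ratio = numerator / denominator
    # Reported values are kept; only missing values are replaced.
    return [r["y"] if r["y"] is not None else ratio * r["x"] for r in records]

records = [
    {"x": 10.0, "y": 20.0, "w": 1.0},
    {"x": 20.0, "y": 40.0, "w": 2.0},
    {"x": 5.0, "y": 100.0, "w": 1.0},   # ratio 20 falls outside [0.5, 5]
    {"x": 8.0, "y": None, "w": 1.0},    # to be imputed
]
imputed = ratio_impute(records, lower=0.5, upper=5.0)
```

With these values the third record is excluded from the estimator, the weighted ratio is 100 / 50 = 2, and the missing value is imputed as 16. Setting every weight to 1 gives the unweighted version.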
The prorating transformation represents another imputation action offered in StEPS. The function consists of adjusting every component of a sum in order to obtain a known total. Currently, StEPS can prorate multiple one-dimensional sums that have a common total. Future versions of StEPS will be able to prorate nested one-dimensional sums (A+B=C and C+D=E) and two-dimensional sums.
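A one-dimensional prorating step can be sketched as follows. The example is in Python rather than SAS, the function name is hypothetical, and it shows only the basic proportional adjustment, not how StEPS handles rounding or special cases.

```python
# Illustrative sketch only: basic proportional prorating, not StEPS code.

def prorate(components, total):
    """Scale every component by the same factor so they sum to the known total."""
    current_sum = sum(components)
    if current_sum == 0:
        raise ValueError("cannot prorate components that sum to zero")
    factor = total / current_sum
    return [c * factor for c in components]

# Two one-dimensional sums that share a common total of 200.
by_product = prorate([30.0, 50.0, 20.0], 200.0)
by_region = prorate([150.0, 250.0], 200.0)
```

Each component keeps its share of the sum: the first set becomes 60, 100 and 40, and the second becomes 75 and 125, so both now add up to the common total.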
The system is developed entirely in SAS and works in a UNIX environment. A complete graphical user interface is available. The file and variable naming convention eases the processing of historical edit and imputation. Indeed, the field names include the numeric field codes, the field status (Reported, Adjusted, Edited or Weighted) and the survey period. The database architecture is based on a data point model. A record corresponds to a data item of a respondent unit. It includes three basic components: the unit identifier, the field name (code/status/period) and the value itself.
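The data point model can be pictured with the short sketch below. It is in Python rather than SAS, and the record layout, the one-letter status codes and the period format are hypothetical; the source states only that a field name combines a numeric code, a status and a survey period.

```python
# Illustrative sketch only: layout and codes are hypothetical, not the StEPS format.
from collections import namedtuple

# One record per data item of a respondent unit.
DataPoint = namedtuple("DataPoint", ["unit_id", "code", "status", "period", "value"])

points = [
    DataPoint("U001", 101, "R", "1998", 250.0),  # reported, current period
    DataPoint("U001", 101, "E", "1998", 245.0),  # edited, current period
    DataPoint("U001", 101, "R", "1997", 230.0),  # reported, previous period
    DataPoint("U002", 101, "R", "1998", 90.0),
]

def lookup(points, unit_id, code, status, period):
    """Return the value stored for one unit, field code, status and period."""
    for p in points:
        if (p.unit_id, p.code, p.status, p.period) == (unit_id, code, status, period):
            return p.value
    return None

current = lookup(points, "U001", 101, "R", "1998")
previous = lookup(points, "U001", 101, "R", "1997")
```

Because the status and the period are part of the key, the current and previous values of the same item are retrieved in the same way, which is what makes edits and imputations on mixtures of current and historical values straightforward.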
The major strength of StEPS is its integration of several survey processing modules: information management, data review and on-line correction, editing, imputation, estimation, variance calculation, tabulation and disclosure. The system uses SAS files to input and output data, so survey statisticians can take advantage of the analytical strengths of SAS. For the edit and imputation modules, both the survey specifications and the implementations are integrated into the graphical interface. That is, a survey manager can provide his or her specifications directly through the system, and the application developer only has to translate these into system rules. The product standardization resulted in a good file and variable naming convention, which simplifies all the processes. Its set of estimator imputations is complete and efficient.
StEPS provides neither a minimum-change functionality nor any other automated error localization module. Thus, for every combination of errors, the user has to specify which fields need to be imputed. StEPS also does not offer a donor imputation function. This means that the imputation strategy for a brand new survey, with neither historical data nor administrative information, is limited to estimator imputations based on current values. Although the SAS windows in which users specify special rules are practical, a certain level of SAS knowledge is a prerequisite. In practice, this knowledge is not always available, so some SAS training has to be provided in addition to the system-specific training.
U.S. Census Bureau (1996). "StEPS: Concepts and Overview". Technical report from the U.S. Census Bureau.