Evaluated by C. Poirier and R. Kozak, Statistics Canada, 1999
Solas is produced by Statistical Solutions Ltd., a statistical software company based in Ireland with offices in the United States. Solas v1.1 was designed for the imputation of missing data, primarily in biostatistical research. Although the documentation claims support for both numeric and ordered categorical variables, the imputation functions are more widely applicable to numeric variables. Solas also includes data analysis tools, but these support the imputation process rather than being the main feature of the system. The system does not include any edit function; it simply imputes fields that have missing data. Its main feature is multiple imputation, a technique developed by Rubin (1978). It also includes the standard hot deck method and two estimator-type imputations, namely the current mean and the historical imputation.
The hot deck imputation attempts to find donor records that are similar to the record to be imputed with respect to auxiliary matching variables and precedence rules defined by the user. Exact matches are targeted. If none is found, Solas automatically drops the matching variables one by one, following the precedence rules, until an exact match on the remaining variables is found. Whenever several exact matches are found, one can be selected randomly to impute all the missing fields of the record to be imputed. If absolutely no matches are found, a random selection from the entire pool is possible.
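The matching logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Solas code: the function names, the record layout (dictionaries with None for missing fields), and the choice to drop the lowest-precedence variable first are all hypothetical, since the evaluation does not specify how the precedence rules order the drops. Donor records are assumed to be complete, and the matching variables are assumed to be reported on the record to be imputed.

```python
import random

def find_donor(recipient, donors, match_vars, rng=None):
    """Pick a donor that matches `recipient` exactly on as many variables as possible.

    `match_vars` is ordered from highest to lowest precedence (an assumption);
    the last variable in the list is dropped first when no exact match exists.
    Returns the donor and the tuple of variables the match was based on.
    """
    rng = rng or random.Random()
    active = list(match_vars)
    while active:
        matches = [d for d in donors
                   if all(d[v] == recipient[v] for v in active)]
        if matches:
            # Several exact matches: select one at random.
            return rng.choice(matches), tuple(active)
        active.pop()  # drop the lowest-precedence matching variable
    # No match on any variable: random selection from the entire pool.
    return rng.choice(donors), ()

def hot_deck_impute(recipient, donors, match_vars, rng=None):
    """Fill every missing (None) field of `recipient` from a single donor."""
    donor, used = find_donor(recipient, donors, match_vars, rng)
    filled = {k: (donor[k] if v is None else v) for k, v in recipient.items()}
    return filled, used

if __name__ == "__main__":
    donors = [
        {"region": "A", "sex": "F", "income": 100},
        {"region": "A", "sex": "M", "income": 200},
        {"region": "B", "sex": "M", "income": 300},
    ]
    recipient = {"region": "B", "sex": "F", "income": None}
    print(hot_deck_impute(recipient, donors, ["region", "sex"]))
```

In the usage example, no donor matches on both region and sex, so the sketch drops sex and takes the only donor from region B. Returning the variables actually used for the match is a design choice of this sketch; it makes it easy to see how far the matching criteria had to be relaxed.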
Multiple imputation is a repeated execution of an imputation strategy. It can be applied to both cross-sectional and longitudinal data, and the user can control many imputation parameters. Solas imputes several, say M, values for each missing field. The results can be combined to produce overall estimates with variances for the variables of interest. The user reference manual (Statistical Solutions, 1997) describes the theory well. It states that:
"Solas applies an implicit model approach (using a logistic regression model) based on propensity scores and an approximate Bayesian bootstrap to generate the imputations. The multiple imputations are independent repetitions from a posterior predictive distribution for the missing data given the observed data."
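To make the idea concrete, the following Python sketch shows an approximate Bayesian bootstrap draw within a single group of records, followed by the standard combining rules for M completed data sets (the mean of the M estimates, and a total variance equal to the average within-imputation variance plus (1 + 1/M) times the between-imputation variance). It is a simplified illustration, not the Solas implementation: the propensity-score step that groups records through a logistic regression is omitted, the function names are hypothetical, and the evaluation does not state which combining formulas Solas itself reports.

```python
import random
from statistics import mean, variance

def abb_draw(observed, n_missing, rng):
    """Approximate Bayesian bootstrap for one group of records.

    Step 1: resample the observed values with replacement to form a donor pool.
    Step 2: draw the imputed values with replacement from that pool.
    """
    pool = rng.choices(observed, k=len(observed))
    return rng.choices(pool, k=n_missing)

def combine_estimates(estimates, variances):
    """Combine M point estimates and their variances into one estimate and variance."""
    m = len(estimates)
    q_bar = mean(estimates)          # overall point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor M - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance of the combined estimate
    return q_bar, total

def multiply_impute_mean(observed, n_missing, m=5, seed=0):
    """Estimate a mean from M independently completed data sets."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = list(observed) + abb_draw(observed, n_missing, rng)
        estimates.append(mean(completed))
        # Variance of the sample mean, treating the completed data as a simple sample.
        variances.append(variance(completed) / len(completed))
    return combine_estimates(estimates, variances)

if __name__ == "__main__":
    reported = [4.1, 5.0, 3.8, 6.2, 5.5, 4.9]
    print(multiply_impute_mean(reported, n_missing=3, m=5))
```

The extra resampling step in `abb_draw` is what distinguishes the approximate Bayesian bootstrap from a simple random hot deck: it lets the donor pool itself vary from one imputation to the next, so the between-imputation variance reflects uncertainty about the distribution of the observed values.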
The current mean is one of the two estimator functions. It imputes each missing value with the simple mean of the other records in the imputation class. For ordered categorical variables, the mode is used. The historical imputation is the second estimator-type imputation. It simply copies the value from the previous data period as is, with no transformation. Solas also provides the capability to define a weighting variable, referred to as a "case frequency variable". As its name suggests, the weights are defined for each record. A weighted observation is processed by Solas as repetitions of the original observation.
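The two estimator-type methods and the effect of a case frequency variable can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function names, the record layout (dictionaries with None for missing values), the record identifier used to link the two periods, and the tie-breaking of the mode are hypothetical and are not taken from Solas. A frequency of f is treated as f repetitions of the record, as described above.

```python
from collections import Counter, defaultdict

def class_estimates(records, var, class_var, freq_var=None, categorical=False):
    """Mean (or mode, for categorical data) of `var` per imputation class.

    Only records with a reported value contribute. A case frequency f
    counts the record f times, which equals a frequency-weighted mean.
    """
    groups = defaultdict(list)
    for r in records:
        if r[var] is not None:
            f = r[freq_var] if freq_var else 1
            groups[r[class_var]].append((r[var], f))
    estimates = {}
    for cls, pairs in groups.items():
        if categorical:
            counts = Counter()
            for value, f in pairs:
                counts[value] += f
            estimates[cls] = counts.most_common(1)[0][0]
        else:
            estimates[cls] = sum(v * f for v, f in pairs) / sum(f for _, f in pairs)
    return estimates

def impute_current_mean(records, var, class_var, freq_var=None, categorical=False):
    """Replace missing values of `var` with the estimate of the record's class."""
    estimates = class_estimates(records, var, class_var, freq_var, categorical)
    completed = []
    for r in records:
        r = dict(r)
        if r[var] is None and r[class_var] in estimates:
            r[var] = estimates[r[class_var]]
        completed.append(r)
    return completed

def impute_historical(current, previous, var, key):
    """Copy the previous-period value of `var`, unchanged, into missing fields."""
    history = {r[key]: r[var] for r in previous}
    completed = []
    for r in current:
        r = dict(r)
        if r[var] is None and history.get(r[key]) is not None:
            r[var] = history[r[key]]
        completed.append(r)
    return completed

if __name__ == "__main__":
    data = [
        {"id": 1, "cls": "A", "y": 10.0, "freq": 1},
        {"id": 2, "cls": "A", "y": 20.0, "freq": 3},
        {"id": 3, "cls": "A", "y": None, "freq": 1},
    ]
    print(impute_current_mean(data, "y", "cls", freq_var="freq")[2]["y"])
    print(impute_historical(data, [{"id": 3, "y": 12.0}], "y", "id")[2]["y"])
```

In this sketch a record stays missing when its class has no reported values or when no previous-period value exists, which is one reason a second imputation pass with another method may be needed.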
The Solas system works on IBM-compatible personal computers with 80-486 or higher processors. It can read and write data in different formats including ASCII, SAS, DBase, FoxPro, Paradox, Excel, Lotus, BMDP, and others. The system can be installed by almost any user, with no support required. Its interface is user-friendly and the help function is quite complete.
Solas offers a good multiple imputation function with many control options for that method. The graphical interface of the system is another strength. Solas is easy to install and user-friendly. Its on-line help function is adequate for the functionality Solas provides. Once imputation is completed, a copy of the resulting data sheet appears on the screen. The imputed values are shown in blue, in contrast to the reported values, which appear in black. Finally, the small size and the portability of the system make it very practical. Empirical evaluations have shown that the system is relatively quick.
The functionality aside from multiple imputation is very basic. There is no control over the number of records required to supply the information for the group mean or the donor imputation. The historical method includes no control over the imputation status of the historical information before it is used in the process. The imputation techniques cannot be used in sequence. If missing values are still present after imputation is applied, the user must manually submit another run of Solas with a different method to complete the data set. Finally, no imputation summary report is produced after the imputation has been completed. Solas is mostly recommended for biostatisticians. It is less appropriate for complex surveys, which involve large numbers of variables linked by complex relationships.
Rubin, D.B. (1978). "Multiple imputations in Sample Surveys - A Phenomenological Bayesian Approach to Nonresponse". Proceedings of the Section on Survey Research Methods, American Statistical Association.
Statistical Solutions (1997). "Solas For Missing Data Analysis 1.1: User Reference". Cork, Ireland, Statistical Solutions Inc.