K-Base: Data Editing Knowledge Base

Data Capture

The process by which collected data are put in a machine-readable form. Elementary edit checks are often performed in sub-modules of the software that does data capture.

Source: UNECE Data Editing Group

Data Checking

Activity through which the correctness conditions of the data are verified. It also includes the specification of the type of the error or condition not met, and the qualification of the data and its division into "error-free" and "erroneous" data. Data checking may be aimed at detecting error-free data or at detecting erroneous data.

Source: UNECE Data Editing Group

See also: Data Review

Data Collection

The process of gathering data. Data may be observed, measured, or collected by means of questioning, as in a survey or census response.

Source: UNECE Data Editing Group

Data Correction (correction of errors in data)

Activity of checking data that have been declared erroneous or are possibly erroneous.

Source: UNECE Data Editing Group

Data Editing

The activity aimed at detecting and correcting errors (logical inconsistencies) in data.

Source: UNECE Data Editing Group

Data Field

See: Data Item

Data Imputation

Substitution of estimated values for missing or inconsistent data items (fields). The substituted values are intended to create a data record that does not fail edits.

Source: UNECE Data Editing Group

Data Item (data field)

The specific sub-components of a data record. For instance, in a population census, specific data items might be last name, first name, sex, and age.

Source: UNECE Data Editing Group

*Data Quality

A measure (or measures) indicating the quality of the data in a database. For example, if most records pass edits and the set of edit-failing records does not seriously affect certain aggregate and other measures, then the data may be said to be of (relatively) high quality.

*Note: This definition may not be fully satisfactory.

Source: UNECE Data Editing Group

Data Redundancy

The condition in which the values of data items (fields) can be partially or completely deduced from the values of other data items (fields).

Source: UNECE Data Editing Group

Data Review (data checking)

Activity through which the correctness conditions of the data are verified. It also includes the specification of the type of the error or condition not met, and the qualification of the data and its division into the "error-free" and "erroneous" data. Data checking may be aimed at detecting error-free data or at detecting erroneous data. Data review consists of both error detection and data analysis, and can be carried out in manual or automated mode.

Data review/error detection may occur at many levels:

a) within a questionnaire

  • Item level / editing of individual data - the lowest logical level of checking and correction during which the relationships among data items are not considered. Validations at this level are generally named "range checking".
    Example: age must be between 0 and 120. In more complex range checks, the range may vary by strata or some other identifier.
    Example: if strata = "large farm operation", then the number of acres must be greater than 500.
  • Questionnaire level / editing of individual records - a logical level of checking and correction during which the relationships among data items in one record/questionnaire are considered.
    Examples:
    1) If married = 'Yes' then age must be greater than 14.
    2) Sum of field acres must equal total acres in farm.
  • Hierarchical - This level involves checking items in sub-questionnaires. Data relationships of this type are known as "hierarchical data" and include situations such as questions about an individual within a household. In this example, the common household information is on one questionnaire and each individual's information is on a separate questionnaire. Checks are made to ensure that the sum of the individuals' data for an item does not exceed the total reported for the household.

b) across questionnaires / editing of logical units

  • A logical level of checking and correction during which the relationships among data in two or more records are considered, namely in a group of records that are logically coupled together. Across-questionnaire edits involve calculating valid ranges for each item from the survey data distributions or from historic data for use in outlier detection. Data analysis routines that are usually run at summary time may easily be incorporated into data review at this level. In this way, summary-level errors are detected early enough to be corrected during the usual error correction procedures. The "across questionnaire" checks should identify the specific questionnaire that contains the questionable data. "Across questionnaire" level edits are generally grouped into two types: statistical edits and macro edits.
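
The levels of review described above can be illustrated with a minimal Python sketch. It is not part of the UNECE definition: the record layout and every field name (id, age, married, strata, field_acres, total_acres) are hypothetical, and the across-questionnaire range is derived with a 1.5 x interquartile-range fence, which is only one common way of calculating a valid range from the survey data distribution.

```python
from statistics import quantiles


def item_level_errors(rec):
    """Range checks on single items; relationships among items are ignored."""
    errors = []
    if not 0 <= rec["age"] <= 120:
        errors.append("age outside 0-120")
    # A range that varies by stratum, as in the large-farm example.
    if rec["strata"] == "large farm operation" and rec["total_acres"] <= 500:
        errors.append("large farm operation with 500 acres or fewer")
    return errors


def record_level_errors(rec):
    """Checks on relationships among items within one questionnaire."""
    errors = []
    if rec["married"] == "Yes" and rec["age"] <= 14:
        errors.append("married but age not greater than 14")
    # Exact comparison assumes whole-number acreages.
    if sum(rec["field_acres"]) != rec["total_acres"]:
        errors.append("field acres do not sum to total acres")
    return errors


def hierarchical_errors(household_total, individual_values):
    """The individuals' values must not sum to more than the household total."""
    if sum(individual_values) > household_total:
        return ["individual values exceed household total"]
    return []


def across_questionnaire_outliers(records, item):
    """Return ids of questionnaires whose value falls outside a data-derived range.

    Assumption: the valid range is [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    Needs at least two records, because quantiles() does.
    """
    q1, _, q3 = quantiles([r[item] for r in records], n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [r["id"] for r in records if not low <= r[item] <= high]
```

Each function returns the failed conditions (or the identifiers of the questionable questionnaires) rather than correcting anything, in keeping with the separation between data review and data correction.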

Source: UNECE Data Editing Group

See also: Data Checking

Data Validation

An activity aimed at verifying whether the value of a data item comes from the given (finite or infinite) set of acceptable values. For instance, a geographic code (field), say for a Canadian Province, may be checked against a table of acceptable values for the field.
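
As an illustration only, a check against a table of acceptable values might be sketched in Python as follows. The constant PROVINCE_CODES and the function is_valid_code are hypothetical names, and the table assumes that the field holds the two-letter postal abbreviations of the ten Canadian provinces; a real survey would use whatever code list its own specification defines.

```python
# Illustrative table of acceptable values: two-letter postal abbreviations
# of the ten Canadian provinces (the territories are left out on purpose).
PROVINCE_CODES = frozenset(
    {"AB", "BC", "MB", "NB", "NL", "NS", "ON", "PE", "QC", "SK"}
)


def is_valid_code(value, acceptable=PROVINCE_CODES):
    """Return True if the value belongs to the set of acceptable values."""
    return value in acceptable


def invalid_values(records, field, acceptable=PROVINCE_CODES):
    """Return (record index, value) pairs for values that fail validation."""
    return [
        (i, rec.get(field))
        for i, rec in enumerate(records)
        if not is_valid_code(rec.get(field), acceptable)
    ]
```

A missing field is reported with the value None, so that missing and unacceptable values can be told apart afterwards.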

Source: UNECE Data Editing Group

Data Validation According to a List

Verifying whether the data value is in the list of acceptable values of this data item.

Source: UNECE Data Editing Group

Deck Imputation

Imputation method where a donor questionnaire is used to supply the missing value.

  • Hot-deck imputation - a donor questionnaire is found from the same survey as the questionnaire with the missing item. The "nearest neighbour" search technique is often used to expedite the search for a donor record. In this search technique, the deck of donor questionnaires comes from the same survey and shows similarities to the receiving record, where similarity is based on other data on the questionnaire that correlate with the data being donated. For example, similar size and location of farm might be used for donation of fuel prices.
  • Cold-deck imputation - same as hot-deck imputation except that the donor data are found in a previously conducted, similar survey.
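
A nearest-neighbour donor search of the kind described above could be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed method: the function names, the field names used in the usage note (acres, km_north, fuel_price) and the unscaled squared Euclidean distance are all hypothetical choices. In practice the matching fields would normally be standardised first so that no single field dominates the distance.

```python
def nearest_donor(recipient, donors, match_fields):
    """Return the donor record closest to the recipient on the matching fields."""
    def distance(donor):
        # Assumption: unscaled squared Euclidean distance over numeric fields.
        return sum((donor[f] - recipient[f]) ** 2 for f in match_fields)
    # min() raises ValueError if the donor deck is empty.
    return min(donors, key=distance)


def impute_from_deck(records, donor_deck, target, match_fields):
    """Fill missing (None) target values from the nearest record in donor_deck.

    Hot deck: donor_deck holds records from the same survey.
    Cold deck: donor_deck holds records from an earlier, similar survey.
    """
    donors = [d for d in donor_deck if d[target] is not None]
    result = []
    for rec in records:
        if rec[target] is None:
            donor = nearest_donor(rec, donors, match_fields)
            rec = {**rec, target: donor[target]}  # copy; input stays unchanged
        result.append(rec)
    return result
```

For a hot-deck run, the same list is passed as both records and donor_deck, for example impute_from_deck(farms, farms, "fuel_price", ["acres", "km_north"]). For a cold-deck run, donor_deck would be the records of the earlier survey.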

Source: UNECE Data Editing Group

See also: Automated Imputations, Hot-Deck, Cold-Deck

Deductive Imputation

An imputation rule defined by logical reasoning, as opposed to a statistical rule.

Source: UNECE Data Editing Group

Detection of Errors in Data (error detection)

An activity aimed at detecting erroneous data. Usually predefined correctness criteria are used.

Source: UNECE Data Editing Group

See also: Error Detection

Deterministic Checking Rule

A checking rule which determines whether data items are incorrect with a probability of 1.

Source: UNECE Data Editing Group

Deterministic Edit

An edit which, if violated, points to an error in the data with a probability of one. Contrast with stochastic edit.

Example: Age = 5 and Status = mother.

Source: UNECE Data Editing Group

See also: Stochastic Edit

Deterministic Imputation

The situation in which, given specific values of other fields, only one value of a field will cause the record to satisfy all of the edits. For instance, it might occur when the items that are supposed to add to a total do not add to the total. If only one item in the sum is imputed, then its value is uniquely determined by the values of the other items. This may be the first situation that is considered in the automated editing and imputation of survey data.

Example: The missing sum at the bottom of a column of numbers.
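
The additive case can be sketched in Python as follows. This is an illustration under assumptions rather than a reference implementation: the function name impute_from_total is hypothetical, and missing values are assumed to be represented by None.

```python
def impute_from_total(items, total):
    """Fill the single value that an additive edit determines uniquely.

    items: component values, with None marking a missing item.
    total: the reported total, or None if the total itself is missing.
    Returns (items, total) with the missing value filled in.
    """
    missing = [i for i, v in enumerate(items) if v is None]
    known_sum = sum(v for v in items if v is not None)

    if not missing:
        # All components present: a missing total is simply their sum.
        return list(items), (known_sum if total is None else total)

    if len(missing) == 1 and total is not None:
        # One component missing: it must equal the total minus the rest.
        filled = list(items)
        filled[missing[0]] = total - known_sum
        return filled, total

    # Two or more unknowns: no single value is determined by the edit.
    raise ValueError("value is not uniquely determined by the additive edit")
```

When more than one value is missing, the additive edit no longer determines a unique value, so the sketch raises an error instead of guessing; such records would have to be handled by another imputation method.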

Source: UNECE Data Editing Group

See also: Automated Imputations

Donor (imputation)

In hot-deck edit/imputation, a donor is chosen from the set of edit-passing records based on its similarity to the fields in the record being donated to (being imputed within). Values of fields (variables) in the donor are used to replace the corresponding contradictory or missing values in the edit-failing record that is receiving information. This type of replacement may or may not assure that the imputed record satisfies edits.

Source: UNECE Data Editing Group

See also: Hot-Deck, Hot-Deck Imputation