• Issue 17: Add some data integrity validation checking in the Collect phase

 

In the GSBPM it isn't clear whether some validation of administrative datasets takes place in the Collect phase (in 4.3), or whether it all takes place in the Process phase (in 5.3). ABS staff would like to put these editing-related activities in the Collect phase, as they need to do some integrity checks and fix any issues, both before and after loading these datasets to shared stores. These checks deal only with the basic integrity of the acquired datasets, to the point where they can be loaded to stores. For administrative datasets, these checks might include comparing the size, content (e.g. variables) and format of files received with the expected targets, and checking against the standard metadata required for successful loading (e.g. no negative values, numeric-only fields). These administrative data checks are not very different from in-field checks for survey data. Any further editing or validation of input data would still be done in the Process phase.
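
As an illustration only, the kind of pre-load integrity check described above could be sketched roughly as follows. This is not an ABS or GSBPM implementation: the `EXPECTED` structure, the column names and the `check_integrity` function are hypothetical, and the received file is assumed to be CSV text. The sketch checks the variables received, a minimum size, numeric-only fields and non-negative values, and it deliberately does no editing of the data.

```python
import csv
import io

# Hypothetical expectations for one administrative file.
# All names and thresholds here are illustrative assumptions.
EXPECTED = {
    "columns": ["unit_id", "turnover", "employees"],
    "min_rows": 2,
    "numeric_fields": ["turnover", "employees"],
    "non_negative_fields": ["employees"],
}


def check_integrity(csv_text, expected):
    """Return a list of integrity problems found; an empty list means loadable."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))

    # Content check: the variables received must match the expected ones.
    if reader.fieldnames != expected["columns"]:
        problems.append(f"unexpected columns: {reader.fieldnames}")
        return problems  # field-level checks make no sense without the right columns

    rows = list(reader)

    # Size check: the file should hold at least the expected number of records.
    if len(rows) < expected["min_rows"]:
        problems.append(f"too few rows: {len(rows)} < {expected['min_rows']}")

    # Field checks: numeric-only fields and non-negative values.
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for field in expected["numeric_fields"]:
            value = row[field]
            try:
                number = float(value)
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: {field} is not numeric: {value!r}")
                continue
            if field in expected["non_negative_fields"] and number < 0:
                problems.append(f"line {line_no}: {field} is negative: {value}")
    return problems


if __name__ == "__main__":
    sample = "unit_id,turnover,employees\nA1,abc,3\nA2,10,-1\n"
    for problem in check_integrity(sample, EXPECTED):
        print(problem)
```

A check like this only reports problems and leaves the records unchanged, which keeps it distinct from the editing and validation activities in 5.3.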

Please indicate your support for this change using the stars and legend below

  • 5* (We should do this)
  • 4* (Good idea, but need to discuss)
  • 3* (I am not sure, we need to discuss)
  • 2* (Should not make the change, but need to discuss)
  • 1* (Should not make this change)
Results: 8 ratings

5 Comments

  1. Additional ABS comment

    Please note that the new title for this issue in the diagram is not quite correct. Our suggestion is not restricted to administrative datasets, and it was only supposed to relate to integrity-type checks, to avoid overlap with other validation activity in 5.3. Our suggested issue title was the more generic 'Add data integrity validation checking in Collect phase'.

     

  2. In 4.3, Istat suggests deleting the text in parentheses “(usually based on response rates)” because it is not generally true: data collection can be concluded on the basis of the planned interviewing period, or when a certain number of sampling units have been observed.

  3. In my view GSBPM should not prescribe the order of process steps. Validation is one of the clearest examples of this. In some design strategies you do the validation as early as possible in the process, because there you are closer to the supplier of the information. In other strategies you do it as late as possible, to profit from the availability of other sources and from the possibility of assessing the impact (selective correction). Usually checking and correcting is done at several points in the statistical process.

    (I agree with Nadia on the text of 4.3, but it does not fit well in this issue)

  4. In SSB we use:

    "4.1. Establish frame and registers, select sample.

    Based on the development and design in phase 2 Develop and design, the frame is established. This is specified as a list or a register of units that can be counted or drawn (typically from a population register). Necessary linking, checking and harmonisation of data sources are carried out. The use of external data sources is coordinated and relevant registers are controlled, maintained and updated. The sample is selected, controlled and documented. It is important to coordinate and distribute the workload over samples that share a common frame."

  5. 12/11: Validation can be done at many steps in the process. GSBPM steps do not have to be followed in order.

    Checking that the admin data is sound, that the content of the data file is good. File consistency checks.

    Sentences 3 and 4 from Norway could be added.

    Checking the integrity of the data is OK, but make sure that there is no mention of editing.

    Add in-field edits to 4.3?