95. The examples offered in this annex are not taken from concrete implementations of GSIM (as these don't currently exist). They have been devised with a teaching goal in mind. They illustrate how some of the main activities of a statistical organization can be described and managed by using GSIM.
96. Each of the scenarios in this section is common in statistical organizations. Each is described first in simple everyday language that will be familiar to a large number of staff in statistical organizations, and then a second time in terms of the information objects in GSIM and the sub-processes in GSBPM. The following scenarios are included: managing definitions of variables; acquiring data; sample selection and estimation; dissemination of statistical information; and quality measures and reports.
Managing definitions of variables
97. Each organization will have its own specific set of processes to manage definitions of variables, but the simplest and most common process would be to update a variable for an established statistic and a more complicated, but still common, process would be to create a new variable for a new statistic. The process of updating or creating variables would be carried out within the specify needs and design phases of the established or new statistic.
98. Other processes are conceivable, and supported by GSIM, such as creating a new variable for an established statistic or reusing existing variables from one statistic in another new or established statistic. These processes are not detailed further in this scenario.
99. Part 1: Updating a variable for an established statistic
a) Update variable – Change the validity period of the outdated variable to reflect the fact that it is no longer valid. Copy the outdated variable and change the definition and validity period for the updated variable. Remember to also update documentation of variables that will be derived from this one. Connect the updated variable to relevant concepts (populations, categories {e.g. male, female and other}, other variables) for the established statistic, if these connections were not copied from the outdated variable.
b) Update/create or re-use value domain – If necessary update/create the value domain (allowed values of codes [e.g. m, f, o] for categories) for the updated variable and connect the updated variable to the updated/created value domain. Otherwise connect the updated variable to the existing value domain.
c) Change or re-use data source – check whether the value of the updated variable can still be obtained from the same data source. If not, then update the collection method.
d) Update/create or re-use question – if the value of the updated variable is to be obtained from a question in a questionnaire, then check whether an existing question can be re-used (possibly after updating) or whether one or more new questions need to be designed. Connect the appropriate question(s) to the updated variable.
e) Update the design of statistical outputs – Connect the appropriate unit/table/cube designs to the updated variables.
100. Part 2: Creating a new variable for a new statistic
a) Create variable - Document definition of new variable. Remember to also document variables that will be derived from this one.
b) Identify concepts – check whether appropriate concepts (populations and categories) are already documented in a concept management system. Document new concepts for the new statistic, if necessary. Connect the variables to the relevant concepts.
c) Identify value domain - check whether an appropriate value domain is already documented in a value domain management system. Re-use or update an existing value domain, or document a new one, for the new variable. Connect the variable to the value domain.
d) Identify data source – check whether the value of the new variable can be obtained from an existing data source or whether it needs to be collected e.g. using a questionnaire.
e) Identify question – if the value of the new variable is to be obtained from a questionnaire, then check whether a relevant question/question group is already documented in a question bank and/or questionnaire. Document a new question/question group, if necessary. Connect the question to the new variable.
f) Designing statistical products – Connect the unit/table/cube designs to the new variables.
Mapping of example description to GSIM information objects
101. GSIM can be used by those people involved in metadata management to identify the pieces of information that they require to undertake their roles and in the design of systems for metadata management.
102. Part 1: Updating a variable for an established statistic
a) Update Variable – Change the validity period of the outdated Variable to reflect that it is no longer valid. Copy the outdated Variable and change the definition and validity period for the updated Variable. Remember to also update documentation of Variables that will be derived from this one. Connect the updated Variable to the relevant Concepts (Population, Categories and other Variables) for the established statistic, if these connections were not already copied from the outdated Variable.
b) Update/create or re-use value domain – If necessary update/create the Value Domain. The Represented Variable is then the association of the updated Variable with the updated/created or re-used Value Domain.
c) Change or re-use Data Set – check whether the value of the updated Variable can still be obtained from the same Data Set. If not, then update the Acquisition Design.
d) Update/create or re-use Question – if the value of the updated Variable is to be obtained from a Question in a Survey Instrument (questionnaire), then check whether an existing Question can be re-used or one or more new Questions need to be designed. Connect the appropriate Question(s) to the updated Variable.
e) Update the design of Representations – Connect the appropriate Unit/Dimensional Data Structures (unit/table/cube designs) to the updated Variables.
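Step a) above can be illustrated with a minimal Python sketch. It is not taken from any GSIM implementation: the class `Variable`, its fields, the function `supersede` and the dates are assumptions made for illustration only. It shows how closing the validity period of the outdated Variable and copying it into an updated Variable might be recorded.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Variable:
    """Hypothetical, simplified stand-in for a GSIM Variable."""
    name: str
    definition: str
    valid_from: date
    valid_to: Optional[date] = None   # None means "still valid"; end date is exclusive
    concepts: Tuple[str, ...] = ()    # names of linked Population/Category concepts


def supersede(old: Variable, new_definition: str, change_date: date) -> Tuple[Variable, Variable]:
    """Close the outdated Variable and return (closed_old, updated_copy)."""
    if old.valid_to is not None:
        raise ValueError("variable is already closed")
    if change_date <= old.valid_from:
        raise ValueError("change date must fall after the start of validity")
    closed = replace(old, valid_to=change_date)
    # The copy keeps the name and concept links; only definition and period change.
    updated = replace(old, definition=new_definition, valid_from=change_date, valid_to=None)
    return closed, updated


# Illustrative data only (dates are made up).
sex_v1 = Variable("Sex", "Sex of person: male or female", date(2000, 1, 1),
                  concepts=("Persons", "Male", "Female"))
closed_v1, sex_v2 = supersede(sex_v1, "Sex of person: male, female or other", date(2024, 1, 1))
```

Because the objects are immutable, the outdated Variable is never overwritten; both versions remain available, each with its own validity period. Adding the new Category (other) to the Concept links of the updated Variable would be a separate step.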
Table 2. Part 1: Updating a variable for an established statistic
Activity Steps | Applicable GSBPM sub-process | Applicable GSIM Objects |
a) Update Variable (Re-use Population, Category) | 2.2 Design variable descriptions |
|
b) Update/create or re-use Value Domain | 2.2 Design variable descriptions |
|
c) Change or re-use data source | 1.5 Check data availability |
|
d) Update/create or re-use Question | 2.3 Design data collection methodology |
|
e) Update the design of Representations | 2.1 Design outputs |
|
103. Part 2: Creating a new variable for a new statistic
a) Create Variable - Document definition of new Variable. Remember to also document Variables that will be derived from this one.
b) Identify Concepts – check whether these are already documented in a Concept (Population and Categories) management system. Document new Concepts for the new statistic, if necessary. Connect the Variables to the relevant Concepts.
c) Identify Value Domain - check whether this is already documented in a Value Domain management system. Re-use or update an existing Value Domain, or document a new one. The Represented Variable is then the association of the Variable with the Value Domain.
d) Identify Data Set – check whether the value of the Variable can be obtained from an existing Data Set or whether it needs to be collected e.g. using a Survey Instrument (questionnaire).
e) Identify Question – if the value of the Variable is to be obtained from a Survey Instrument, then check whether a relevant Question/Multiple Question Item is already documented in a question bank and/or Survey Instrument. Document a new Question/Multiple Question Item, if necessary. Connect the Question to the new Variable.
f) Design Representations – Connect the new Unit/Dimensional Data Structure (unit/table/cube designs) to the new Variables.
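The "check, then re-use, update or document new" pattern of step c) can be sketched as follows. This is a hedged illustration in Python: the in-memory `registry` dictionary stands in for a Value Domain management system, and `ValueDomain`, `RepresentedVariable` and `find_or_create` are hypothetical names, not GSIM-defined interfaces.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ValueDomain:
    name: str
    codes: Tuple[str, ...]


@dataclass(frozen=True)
class RepresentedVariable:
    """Association of a Variable (by name here) with a Value Domain."""
    variable: str
    value_domain: ValueDomain


def find_or_create(registry: Dict[str, ValueDomain], name: str,
                   codes: Tuple[str, ...]) -> ValueDomain:
    """Re-use a matching Value Domain; otherwise document a new or updated one."""
    existing = registry.get(name)
    if existing is not None and existing.codes == codes:
        return existing                 # re-use
    domain = ValueDomain(name, codes)   # new, or update of an existing entry
    registry[name] = domain
    return domain


# Hypothetical registry content: the codes m and f are already documented.
registry = {"SexCodes": ValueDomain("SexCodes", ("m", "f"))}
domain = find_or_create(registry, "SexCodes", ("m", "f", "o"))
sex = RepresentedVariable("Sex", domain)
```

A second call with the same codes returns the already documented Value Domain rather than creating a duplicate.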
Table 3. Part 2: Creating a new variable for a new statistic
Activity Steps | Applicable GSBPM sub-process | Applicable GSIM Objects |
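a) Create Variable | 2.2 Design variable descriptions |
|
b) Identify Concepts | 1.4 Identify concepts |
|
c) Identify Value Domain | 2.2 Design variable descriptions |
|
d) Identify Data Set | 1.5 Check data availability |
|
e) Identify Question | 2.3 Design data collection methodology |
|
f) Design Representations | 2.1 Design outputs |
|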
Acquiring data
104. The majority of statistical organizations collect data in one form or another. The collection or acquisition of data begins with identifying the need for data and results in the statistical organization having a resource of data to process, analyse and disseminate.
105. Each organization will have its own specific set of processes to collect or acquire data. Generally, this process will consist of the following steps:
106. Part 1: A need for data is identified
a) Statistical organization determines need for new data
b) Decide on the concepts that are to be measured
c) Check what data and sources are already available.
d) Decide whether the data will be acquired and how
107. Part 2a: An administrative data source is available
a) An agreement is made with a register owner.
b) Administrative data are delivered from the register owner
c) The data are forwarded to an environment for pre-processing
108. Part 2b: A survey is needed to collect the data
a) Decide the variables that measure concepts and the applicable classifications
b) Decide on questions to ask and the question-sequence
c) Build the physical instrument
d) Collect the data
e) Finalize the collection
Mapping of example description to GSIM information objects
109. GSIM can be used by those people involved in the collection process to identify the pieces of information that they require to undertake their roles and in the design of systems for collection purposes.
110. Part 1: A need for data
a) Statistical organization determines need for new data - A statistical organization will determine that there is a new Statistical Need. An example of this need might be an unemployment figure. This Statistical Need will usually be expressed in terms of a Subject Field, like Labour, and a Population, like Australian citizens.
b) Decide on the concept that is to be measured - The statistical organization will need to do conceptual work to establish exactly what it is trying to measure, that is, to determine exactly what the required Concepts are. In the context of an unemployment figure, one of the Concepts would be unemployment.
c) Check what data and sources are already available - The statistical organization will make an Assessment of what data is already available to them. This may involve searching the organization’s existing Data Resources to check whether relevant Data Sets are already held, which could be reused. It could also involve reviewing the organization’s Provision Agreements with Data Providers to see what administrative data could be accessed.
d) Decide whether the data will be acquired and how - The Process Outputs from the above two processes will result in a Change Definition - a formalized statement of how the organization should react to the Statistical Need. This Change Definition will feed into a Business Case. Based on Assessments made by the statistical organization, a particular Acquisition Activity will be proposed. This may include collecting data using a survey or an administrative source. This activity will be described by a Collection Description. If the Business Case for the Statistical Need is accepted, a Statistical Program (for example, a Labour Force Survey) will be initiated.
Table 4. Part 1: A need for data
Activity Steps | Applicable GSBPM sub-process | Applicable GSIM Objects |
a) Statistical organization determines need for new data | 1.1 Determine Needs |
|
b) Decide on the concept that is to be measured | 1.4 Identify Concepts |
|
c) Check what data and sources are already available | 1.5 Check Data Availability |
|
d) Decide whether the data will be acquired and how | 1.6 Prepare Business Case |
|
111. Part 2a: An administrative data source is available
a) An agreement is made with a register owner - As part of the Acquisition Design, the statistical organization makes a Provision Agreement with a Data Provider (the owner of the register). The Provision Agreement will outline the Data Location (where the Data Set can be retrieved from) and a Data Flow. The Data Flow could be a link to a specific Data Set file or to a Business Service which will consume a query and return a Data Set.
b) Administrative data are delivered from the register owner - In the Acquisition Activity, the Data Provider will make the Data Set available at a specific Data Location via a Data Flow. The Data Set will be structured according to the agreed Data Structure.
c) The data are forwarded to an environment for pre-processing - The Data Set is fed into the Data Resource for pre-processing. Pre-processing means that Instance Variables are created by extraction and derivation from the Instance Variables that have been received.
Table 5. Part 2a: An administrative data source is available
Activity Steps | Applicable GSBPM sub-process | Applicable GSIM Objects |
a) An agreement is made with a register owner | 2.3 Design Data Collection Methodology |
|
b) Administrative data are delivered from the register owner | 4.3 Run Collection |
|
c) The data are forwarded to an environment for pre-processing | 4.4 Finalize Collection |
|
112. Part 2b: A survey is needed to collect the data
a) Decide the variables that measure concepts and the applicable classifications - The statistical organization will need to decide on the Acquisition Design. One of the first inputs for this is to define the Variables (for example, Unemployment Status) and Classifications (for example, industry classification) which will be collected via the Survey Instrument.
b) Decide on questions to ask and the question-sequence - The statistical organization will then design the Survey Instrument (e.g. questionnaire for the Labour Force Survey). The design of the Survey Instrument will depend on the Mode(s) (CATI interview) and the Data Channel(s) (phone) that will be used. The Questions ('Last week, did you do any work at all in a job, business or farm?'), the Value Domains (Yes, No) and the Units of Measure (dollars) used in the response options are designed. The Questions will be grouped into Question Blocks and the Control Transition (flow logic or question sequence) between the Questions will be determined.
c) Build the physical instrument - Once the Survey Instrument has described what is to be collected, an Instrument Implementation (for example, a Blaise Program) is created.
d) Collect the data - An Acquisition Activity takes place. The Acquisition Activity executes a number of processes required to collect the data via the specified Data Channel (phone).
e) Finalize the collection - The collected data is loaded into a Data Resource for further processing.
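The relationship between Questions, their Value Domains and the Control Transition described in step b) can be sketched in Python. This is an illustrative assumption, not a GSIM-defined structure: the second question, the transition table and the names `Question`, `TRANSITIONS` and `run_interview` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Question:
    qid: str
    text: str
    value_domain: Tuple[str, ...]   # allowed responses


QUESTIONS: Dict[str, Question] = {
    "Q1": Question("Q1", "Last week, did you do any work at all in a job, business or farm?",
                   ("Yes", "No")),
    # Hypothetical follow-up question, invented for this sketch.
    "Q2": Question("Q2", "Did you look for work last week?", ("Yes", "No")),
}

# Control Transition: (question, response) -> next question; None ends the interview.
TRANSITIONS: Dict[Tuple[str, str], Optional[str]] = {
    ("Q1", "Yes"): None,
    ("Q1", "No"): "Q2",
    ("Q2", "Yes"): None,
    ("Q2", "No"): None,
}


def run_interview(answers: Dict[str, str], start: str = "Q1") -> List[Tuple[str, str]]:
    """Follow the flow logic and return the (question, response) pairs actually asked."""
    path: List[Tuple[str, str]] = []
    current: Optional[str] = start
    while current is not None:
        response = answers[current]
        if response not in QUESTIONS[current].value_domain:
            raise ValueError(f"{response!r} is not an allowed response to {current}")
        path.append((current, response))
        current = TRANSITIONS[(current, response)]
    return path


worked = run_interview({"Q1": "Yes"})
did_not_work = run_interview({"Q1": "No", "Q2": "Yes"})
```

Keeping the flow logic in a table separate from the question texts mirrors the distinction between the designed Survey Instrument and any particular Instrument Implementation.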
Table 6. Part 2b: A survey is needed to collect the data
Activity Steps | Applicable GSBPM sub-process | Applicable GSIM Objects |
a) Decide the variables that measure concepts and the applicable classifications | 2.2 Design Variable Descriptions |
|
b) Decide on questions to ask and the question-sequence | 2.3 Design Data Collection Methodology |
|
c) Build the physical instrument | 3.1 Build Data Collection Instrument |
|
d) Collect the data | 4.3 Run Collection |
|
e) Finalize the collection | 4.4 Finalize Collection |
|
Sample selection and estimation
113. The majority of statistical organizations select samples and compute estimates. This example illustrates the application of a few statistical methods, so it is well suited to show how methodology is modelled in GSIM.
114. This example was used in the CORE project (COmmon Reference Environment). Each organization will have its own specific set of processes to do sample selection and estimation. However, generally, this process will consist of the following steps:
115. Part 1: Sampling
a) Establish the population.
b) Determine sampling method.
c) Compute strata statistics
d) Allocate the sample
e) Select the sample
116. Part 2: Collection
a) Collect survey data
b) Check which methodology to use
c) Check and correct survey data
117. Part 3: Estimation
a) Check which methodology to use
b) Calibrate survey data
c) Compute estimates
Mapping of example description to GSIM information objects
118. GSIM can be used by those people involved in the methodological processes to identify the pieces of information that they require to undertake their roles and in the design of systems.
119. A central role is played by the Design Context, an information object representing a repository of principles, best practices and proven solutions supporting the production of coherent statistics in a transparent and reproducible way.
120. Part 1: Sampling
a) Establish the population - the Target Population is Banks in the Netherlands
b) Determine sampling method - access the Design Context of sampling, to retrieve a Process Method suited for sampling banks. The Process Method to be applied is a stratified random sample because the Population of banks is skewed on the Represented Variable Turnover. This Process Method comprises the following three steps (c - e):
c) Compute strata statistics – establish the Frame Population (a subset of the Target Population that is available for surveying) and apply a stratification Rule to classify it according to the Instance Variable Turnover into a number of strata, and compute, for each stratum, the mean and standard deviation of a set of auxiliary Represented Variables. Store this information in a Data Set whose Data Structure specifies a record for each stratum and a set of two Represented Variables (Mean and SDev) for every auxiliary Represented Variable.
d) Allocate the sample – apply a Rule supplied by the Process Method to find the optimal sample allocation across strata. The output of this Rule is a Data Set whose Data Structure specifies a record for each stratum and one Represented Variable for the allocation value.
e) Select the sample – draw a stratified random sample of Units from the Frame Population, according to the previously computed optimal allocation.
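Steps c) to e) above can be sketched with the Python standard library. Several assumptions are made purely for illustration: the stratum boundaries, the frame data and the unit names are invented; turnover itself serves as the single auxiliary variable; and Neyman allocation (sample size proportional to stratum size times stratum standard deviation) is used as one common choice of "optimal" allocation, since the text does not name a specific Rule.

```python
import random
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def stratum_of(turnover: float) -> str:
    """Hypothetical stratification Rule with made-up boundaries."""
    if turnover < 100:
        return "small"
    if turnover < 1000:
        return "medium"
    return "large"


def strata_statistics(frame: Dict[str, float]) -> Dict[str, Tuple[int, float, float]]:
    """Step c): per stratum, return (size, mean, standard deviation) of turnover."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for turnover in frame.values():
        groups[stratum_of(turnover)].append(turnover)
    return {s: (len(v), statistics.mean(v), statistics.pstdev(v)) for s, v in groups.items()}


def neyman_allocation(stats: Dict[str, Tuple[int, float, float]], n: int) -> Dict[str, int]:
    """Step d): n_h proportional to N_h * S_h, bounded to between 1 and N_h units."""
    weights = {s: size * sd for s, (size, _, sd) in stats.items()}
    total = sum(weights.values())
    allocation = {}
    for s, (size, _, _) in stats.items():
        share = n * weights[s] / total if total > 0 else n / len(stats)
        allocation[s] = max(1, min(size, round(share)))
    return allocation


def select_sample(frame: Dict[str, float], allocation: Dict[str, int],
                  seed: int) -> Dict[str, List[str]]:
    """Step e): simple random sample without replacement inside each stratum."""
    rng = random.Random(seed)
    groups: Dict[str, List[str]] = defaultdict(list)
    for unit in sorted(frame):
        groups[stratum_of(frame[unit])].append(unit)
    return {s: sorted(rng.sample(groups[s], k)) for s, k in allocation.items()}


# Invented Frame Population: unit identifier -> turnover.
frame = {"b01": 20.0, "b02": 35.0, "b03": 50.0, "b04": 80.0,
         "b05": 150.0, "b06": 400.0, "b07": 900.0,
         "b08": 2000.0, "b09": 15000.0, "b10": 60000.0}
stats = strata_statistics(frame)
allocation = neyman_allocation(stats, n=5)
sample = select_sample(frame, allocation, seed=1)
```

With skewed data such as this, the allocation concentrates the sample in the stratum with the largest turnover, which is the reason the text gives for choosing stratification. Because of rounding and the bounds, the allocated sizes need not add up exactly to the requested n in general.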
Table 7. Part 1: Sampling
Activity Steps | Applicable GSBPM sub-process | Applicable GSIM Objects |
a) Establish the population | 1.1 Determine needs for information |
|
b) Determine sampling method | 2.4 Design frame & sample methodology |
|
c) Compute strata statistics | 2.4 Design frame & sample methodology |
|
d) Allocate the sample | 2.4 Design frame & sample methodology |
|
e) Select the sample | 4.1 Select sample |
|
121. Part 2: Collection
a) Collect survey data – approach one Unit at a time and apply to it the Survey Instrument. The Process Output of this step is a Unit Data Set, described by a Unit Data Structure.
b) Check which methodology to use - Access the Design Context of data editing to retrieve the Process Methods applicable to the Represented Variables of the Unit Data Set.
c) Check and correct survey data – the validation Rules specified by the Process Methods for a subset of the Represented Variables of the Unit Data Structure are applied to the Instance Variables of the Unit Data Set, and the correction Rules are applied to the Instance Variables that fail validation.
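Step c) can be sketched as a pair of rule sets applied to a Unit Data Set. The validation Rules below and the choice of median imputation as the correction Rule are assumptions for illustration; the text does not prescribe particular rules, and the records are invented.

```python
import statistics
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]

# Hypothetical validation Rules, one per variable.
VALIDATION_RULES: Dict[str, Callable[[Optional[float]], bool]] = {
    "turnover": lambda v: v is not None and v >= 0,
    "employees": lambda v: v is not None and v >= 1,
}


def check_and_correct(records: List[Record]) -> Tuple[List[Record], List[Tuple[int, str]]]:
    """Return corrected copies of the records plus (row, variable) flags for each change.

    Correction Rule (an assumption): replace a failing value by the median of the
    valid values of the same variable. Raises statistics.StatisticsError if a
    variable has no valid values at all.
    """
    corrected = [dict(r) for r in records]
    flags: List[Tuple[int, str]] = []
    for variable, is_valid in VALIDATION_RULES.items():
        valid_values = [r[variable] for r in records if is_valid(r[variable])]
        fill = statistics.median(valid_values)
        for i, record in enumerate(corrected):
            if not is_valid(record[variable]):
                record[variable] = fill
                flags.append((i, variable))
    return corrected, flags


# Invented Unit Data Set with one invalid and one missing value.
records: List[Record] = [
    {"turnover": 120.0, "employees": 10.0},
    {"turnover": -5.0, "employees": 4.0},
    {"turnover": 300.0, "employees": None},
]
corrected, flags = check_and_correct(records)
```

Returning the flags alongside the corrected data keeps a trace of which Instance Variables failed validation, and the original records are left untouched.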
Table 8. Part 2: Collection
Activity Steps | Applicable GSBPM sub-process | Applicable GSIM Objects |
a) Collect survey data | 4.3 Run collection |
|
b) Check which methodology to use | 2.4 Design frame & sample methodology |
|
c) Check and correct survey data | 5.3 Review, validate & edit |
|
122. Part 3: Estimation
a) Check which methodology to use - Access the Design Context of estimation of stratified data and retrieve the Process Methods suited for the Population of banks. Two Process Methods will be applied in sequence (steps b-c).
b) Calibrate survey data – the weights computed in Part 1 are no longer valid, due to survey errors such as non-response. The calibration Process Method specifies a Rule to compute new weights. The weights will be stored in the Instance Variables described by the Represented Variables designed for this purpose.
c) Compute estimates – select the aggregation Process Method, which specifies the following steps:
i) Select an aggregation Rule (sum, average, median, etc.)
ii) Design a Dimensional Data Structure to describe the aggregates to be produced
iii) Design an aggregation frame, specifying how Unit Measure Components of the Unit Data Structure of the Process Input contribute either to Dimensional Identifier Components, or in combination with the calibrated weights, to Dimensional Measure Components of the Dimensional Data Structure of the Process Output.
iv) Apply the selected aggregation Rule and the aggregation frame of the previous step to the Unit Data Set (used as Process Input) to produce the Dimensional Data Set specified in the Dimensional Data Structure (the Process Output) designed in step ii.
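Steps b) and c) can be sketched as follows. The sketch assumes the simplest form of calibration, post-stratification, in which each responding unit in a stratum receives the weight N_h / r_h (population count over number of respondents); the text does not specify which calibration Rule is used. The region dimension, the data and the function names are likewise invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def calibrate(respondents: List[dict], population_counts: Dict[str, int]) -> List[dict]:
    """Attach a post-stratification weight N_h / r_h to every responding unit."""
    responding: Dict[str, int] = defaultdict(int)
    for unit in respondents:
        responding[unit["stratum"]] += 1
    return [
        dict(unit, weight=population_counts[unit["stratum"]] / responding[unit["stratum"]])
        for unit in respondents
    ]


def aggregate(units: List[dict], dimension: str, measure: str) -> Dict[str, float]:
    """Weighted sum of `measure` for each value of `dimension` (aggregation Rule: sum)."""
    totals: Dict[str, float] = defaultdict(float)
    for unit in units:
        totals[unit[dimension]] += unit["weight"] * unit[measure]
    return dict(totals)


# Invented Unit Data Set of respondents and known stratum sizes.
respondents = [
    {"stratum": "small", "region": "North", "turnover": 40.0},
    {"stratum": "small", "region": "South", "turnover": 60.0},
    {"stratum": "large", "region": "North", "turnover": 5000.0},
]
population_counts = {"small": 4, "large": 3}

weighted = calibrate(respondents, population_counts)
estimates = aggregate(weighted, dimension="region", measure="turnover")
```

Here `region` plays the role of a Dimensional Identifier Component and the weighted turnover total that of a Dimensional Measure Component in the resulting Dimensional Data Set.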
Table 9. Part 3: Estimation
Activity Steps | Applicable GSBPM sub-process | Applicable GSIM Objects |
a) Check which methodology to use | 2.5 Design statistical processing methodology |
|
b) Calibrate survey data | 5.6 Calculate weights |
|
c) Compute estimates | 5.7 Calculate aggregates |
|
i) Select aggregation rule | 2.5 Design statistical processing methodology |
|
ii) Design dimensional data structure | 2.1 Design outputs |
|
iii) Design aggregation scheme | 2.5 Design statistical processing methodology |
|
iv) Apply aggregation rule | 5.7 Calculate aggregates |
|
Dissemination of statistical information
123. The majority of statistical organizations disseminate information in one form or another. Generally, dissemination of information begins with the process of designing the outputs (static products or interactive services) to meet a set of users' information needs and results in information being made publicly available on a website or other dissemination channel.
124. Each organization will have its own specific set of processes to disseminate information, but generally this will consist of:
a) Selecting the data and information to be disseminated
b) Setting up output systems to receive data and information
c) Loading data and information into the output system
d) Producing products or implementing services to present information to users
e) Reviewing, editing and approving information for release
125. The above scenario is applicable across a range of types of dissemination including:
Mapping of example description to GSIM information objects
126. GSIM is able to support the dissemination process across multiple types of dissemination including those identified above. A key need of statistical organizations is to disseminate products that include data from multiple outputs.
127. GSIM can be used by those people involved in dissemination processes to identify the pieces of information they require to undertake their roles and in the design of systems for dissemination purposes.
128. Steps involved in the dissemination of statistical information:
a) Selecting the data and information to be disseminated - The dissemination of information takes place within the context of a Statistical Program (the overarching activity or ongoing series, e.g. Employment Survey) as part of a Statistical Program Cycle (an iteration of the ongoing activity, e.g. March 2012 Employment Survey) and specifically a Dissemination Activity. The first step in this activity is to identify the data and/or information to be disseminated. The Variables required by the intended user audience are identified and the particular Represented Variables selected depending on the requirements (as identified by users in an Information Request) for data about a particular Population or according to a particular Classification.
b) Setting up output systems to receive data and information - Once the data and information for dissemination have been selected, the systems used for the process will be configured. This is done according to the Dissemination Design, which identifies the attributes of the Dissemination Activity, such as the Data Structure. In many dissemination processes a Dimensional Data Structure will be defined. A Dimensional Data Structure describes the structure of an aggregate, multi-dimensional table (macro data) by means of Dimensional Identifier Components, Dimensional Attribute Components and Dimensional Measure Components. These components are Represented Variables with specific roles in such a table. Dimensions typically refer to Variables with coded Value Domains, measures to Variables with uncoded Value Domains. An example of a type of Data Set defined by a Dimensional Data Structure is a Time Series. It has specific attributes such as frequency and type of temporal aggregation and specific methods, e.g. seasonal adjustment, and must contain a temporal variable.
c) Loading data and information into the output system - When dissemination systems have been set up and structures created according to the appropriate Data Structure, data and information are loaded. Data Points, defined by Instance Variables, are selected from source Data Sets. A check is undertaken to ensure that the required selection of data and metadata from the source Information Resource (a collection of Data Sets including data and/or metadata) has been correctly loaded.
d) Producing products or implementing services to present information to users - After data have been loaded to the output system, the type of Dissemination Activity determines how the data are presented to users. A Dissemination Activity includes either a Publication Activity or a Dissemination Service as a method to create and disseminate Representations to consumers. Representations may contain any type of information, for instance statistical data (as a Data Set or visualization) or structural or conceptual metadata like a Data Structure, a Code List or a description of a Concept.
A Publication Activity results in the creation of Products which may be made up of one or more Representations and stored to be delivered to users. Examples of Products are publications, press releases, etc.
A Dissemination Service is the mechanism to create and disseminate Representations to users. These Representations are created dynamically on the specific request and according to the specific needs of the consumer. The service exposes Data Sets that may be included in Products and Representations, either as Data Sets (e.g. when providing access to public-use micro data) or as a visualization (e.g. a table in a report or an interactive chart on a website).
e) Reviewing, editing and approving information for release - A completed Product or Dissemination Service will be reviewed against the requirements of the original Information Request to ensure user needs are being met and against the organization's policies and procedures to ensure the outputs meet the required level of quality.
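The role of a Dimensional Data Structure in steps b) and c) can be sketched in Python: the structure lists components with their roles, and data are checked against it when loaded. The class names, the `problems` method, the component names and the figure 1250 are all assumptions made for this sketch and do not come from GSIM.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    role: str                                 # "identifier", "attribute" or "measure"
    codes: Optional[Tuple[str, ...]] = None   # coded Value Domain; None if uncoded


@dataclass(frozen=True)
class DimensionalDataStructure:
    components: Tuple[Component, ...]

    def problems(self, data_point: Dict[str, object]) -> List[str]:
        """Return human-readable problems; an empty list means the point fits."""
        expected = {c.name for c in self.components}
        if set(data_point) != expected:
            return [f"expected components {sorted(expected)}, got {sorted(data_point)}"]
        found: List[str] = []
        for c in self.components:
            value = data_point[c.name]
            if c.codes is not None and value not in c.codes:
                found.append(f"{c.name}: {value!r} is not in the coded Value Domain")
            if c.role == "measure" and not isinstance(value, (int, float)):
                found.append(f"{c.name}: measure must be numeric")
        return found


employment_by_sex = DimensionalDataStructure((
    Component("period", "identifier"),                      # temporal variable
    Component("sex", "identifier", codes=("m", "f", "o")),  # coded Value Domain
    Component("employed", "measure"),                       # uncoded Value Domain
))

ok = employment_by_sex.problems({"period": "2012-03", "sex": "f", "employed": 1250})
bad = employment_by_sex.problems({"period": "2012-03", "sex": "x", "employed": 1250})
```

A check of this kind is one way the verification mentioned in step c), that data have been correctly loaded, might be automated.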
Table 10. Dissemination of statistical information
Activity Steps | Applicable GSBPM sub-process | Applicable GSIM Objects |
a) Selecting the data and information to be disseminated | 6.5 Finalize outputs |
|
b) Setting up output systems to receive data and information | 7.1 Update output systems |
|
c) Loading data and information into the output system | 7.1 Update output systems |
|
d) Producing products or implementing services to present information to users | 7.2 Produce dissemination products |
|
e) Reviewing, editing and approving information for release | 7.3 Manage the release of dissemination products |
|
Quality measures and reports
129. The majority of statistical organizations will calculate quality measures, and include quality information in a report.
130. The decision was made that quality would not be defined as a separate object (or group of objects) in GSIM, as it could not be adequately defined in this manner. Quality itself can have many forms depending on the purpose that it is there for. It could, for example, be representing the quality of the organization, the quality of the process used, or the quality of the statistics produced. Quality information and reports could be relevant to a number of different levels of a particular information object, or tied to the production process (or parts of the process) as a whole. It is present in the inputs and outputs of all process steps within the GSBPM, and can act as the rules and control of processes.
131. Quality means different things in different settings, and so depending on the scope, it will refer to the different information objects in relation to relevant processes. Quality is therefore not seen as an explicit object in GSIM.
132. This scenario has two parts and looks a little different to the previous four scenarios. The first part describes a process that could be used to calculate response rates (i.e. a quality measure), and the second part how these rates could be included in a quality report in the form of a qualitative assessment. Refer to Figure 1 for a representation of the scenario.
133. Part 1: Calculate the response rates needed for a set of data
a) A process is set up to calculate the response rates for a Data Set.
b) A quality check is included as a part of this process
134. Part 2: A qualitative assessment statement included in a quality report.
a) A quality report is prepared
Mapping of example description to GSIM information objects
135. GSIM can be used by those people involved in these processes to identify the pieces of information that they require to undertake their roles and in the design of systems.
136. Part 1: Calculate the response rates needed for a set of data.
a) A process is set up to calculate the response rates for a Data Set - First, a Process Step Design is created to outline the specification of the Process Step, in this case the calculation of the response rates for a particular Data Set. The design includes the Process Input Specification and Process Output Specification – the required inputs, and expected outputs of the Process Step.
For the response rate calculation example, the inputs include the Instance Variables that describe the original Data Set for which the response rates are to be calculated, and the Parameter Input (the calculation formulas used to obtain the Unit and Item response rates). The expected outputs of the described Process Step will include a new Data Set of the calculated response rate data, and new Instance Variables to represent the data.
The execution of the Process Step is recorded by the Process Step Execution Record, which identifies the (actual) inputs and outputs at the time of the process execution. In this example the response rate calculation process is itself made up of two lower level Process Steps, to calculate the unit response rate and the item response rate respectively. Each of these Process Steps was defined through the Process Step Design, and then when used, captured by the Process Step Execution Record.
b) A quality check is included as a part of this process - As part of the overall Process Step Design, a Process Control was included in the Process Step as a quality check. This quality check stage allows for a check point in the procedure to ensure the calculated response rates are within an acceptable range.
If the calculated rates are within an acceptable range, the output of the process (both the calculated Instance Variables and the Process Metric) can be used in a subsequent Process Step (scenario part 2). If the calculated rates are not accepted, then the process ends, and other Process Steps may need to be employed (such as a review of the collection Process Steps to allow for more acceptable response rates).
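Part 1 can be sketched in Python as two lower level calculations followed by a Process Control. The definitions used here are simplified assumptions: the unit response rate is taken as responding units over sampled units, and the item response rate as responding units that answered the item over all responding units. The records and the acceptance threshold of 0.6 are invented.

```python
from typing import Dict, List, Optional

# None means the unit did not respond; a None value means the item is missing.
Record = Optional[Dict[str, Optional[float]]]


def unit_response_rate(records: List[Record]) -> float:
    """Share of sampled units that responded (assumes at least one sampled unit)."""
    responding = [r for r in records if r is not None]
    return len(responding) / len(records)


def item_response_rate(records: List[Record], item: str) -> float:
    """Share of responding units that answered `item` (assumes at least one respondent)."""
    responding = [r for r in records if r is not None]
    answered = [r for r in responding if r.get(item) is not None]
    return len(answered) / len(responding)


def process_control(rates: Dict[str, float], minimum: float) -> bool:
    """Quality check: accept only if every rate reaches the minimum."""
    return all(rate >= minimum for rate in rates.values())


records: List[Record] = [
    {"turnover": 10.0, "employees": 3.0},
    {"turnover": None, "employees": 5.0},
    None,
    {"turnover": 7.0, "employees": None},
]
rates = {
    "unit": unit_response_rate(records),
    "item:turnover": item_response_rate(records, "turnover"),
    "item:employees": item_response_rate(records, "employees"),
}
accepted = process_control(rates, minimum=0.6)
```

If `accepted` is false, the process would end at this point, as described above, rather than passing the rates on to the subsequent Process Step.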
137. Part 2: A qualitative assessment statement included in the quality report.
a) A quality report is prepared - First, Process Step Designs are created for the Process Steps, in this case the preparation of the quality report which includes a qualitative assessment (based on the calculated response rates).
There are two Process Steps involved. The first is the Process Step of preparing the qualitative assessment of the calculated response rates. The Process Inputs are the Data Set and Process Metric created from the Process Step in Part 1 and the Process Output is a Representation (the qualitative assessment statement).
This Representation in turn becomes the Process Input of the second Process Step which is the creation of the quality report. The Process Output is a Product.
The execution of the overall Process is recorded in two Process Step Execution Records.
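The chaining of the two Process Steps in Part 2, with one execution record per step, can be sketched as follows. The names `ExecutionRecord`, `run_step`, `assess` and `build_report`, the wording of the assessment and the input values are hypothetical; the sketch only illustrates how the output of the first step becomes the input of the second while both executions are logged.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class ExecutionRecord:
    """Hypothetical stand-in for a Process Step Execution Record."""
    step: str
    inputs: Dict[str, Any]
    output: Any


def run_step(name: str, func: Callable[..., Any], log: List[ExecutionRecord],
             **inputs: Any) -> Any:
    """Run one Process Step and record its actual inputs and output."""
    output = func(**inputs)
    log.append(ExecutionRecord(name, dict(inputs), output))
    return output


def assess(unit_rate: float, threshold: float) -> str:
    """First step: turn the calculated rate into a qualitative statement."""
    verdict = "meets" if unit_rate >= threshold else "falls below"
    return f"The unit response rate of {unit_rate:.0%} {verdict} the target of {threshold:.0%}."


def build_report(title: str, statement: str) -> str:
    """Second step: place the statement in a (very small) quality report."""
    return f"{title}\n{statement}"


log: List[ExecutionRecord] = []
statement = run_step("prepare assessment", assess, log, unit_rate=0.75, threshold=0.6)
report = run_step("create quality report", build_report, log,
                  title="Quality report", statement=statement)
```

After both calls, `log` holds two records, and the recorded input of the second step is the recorded output of the first.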

Figure 3. Response Rate Calculation for quality report