Generic Statistical Information Model (GSIM):
Communication Paper for a General Statistical Audience
(Version 1.1, December 2013)
About this document
This document provides an overview of the information represented in GSIM, and summarizes how the model will benefit statistical organizations and how it relates to other models and standards.
This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/. If you reuse all or part of this work, please attribute it to the United Nations Economic Commission for Europe (UNECE), on behalf of the international statistical community.
Table of Contents
Introduction
Scope
What is GSIM?
Benefits of GSIM for the organization as a whole
GSIM and GSBPM
What does it mean for me?
The Business view
The Information Technology view
SDMX, DDI and other standards
Summary
1. Across the world, statistical organizations undertake similar activities, albeit with variations in the processes each uses. Each of these activities uses and produces similar information (for example, all organizations use classifications, create data sets and disseminate information). Although the information used by statistical organizations is at its core the same, all organizations tend to describe this information slightly differently (and often in different ways within each organization). In the past, there was no common way to describe the information that is used. This made it difficult to communicate clearly within and between statistical organizations, and without this there was no foundation for in-depth collaboration, standardization, or the sharing of tools and methods.
...
4. GSIM is one of the cornerstones for modernizing official statistics and moving away from subject matter silos. It is a key element of the strategic vision prepared by the High-Level Group for the Modernization of Statistical Production and Services (HLG), and endorsed by the Conference of European Statisticians (see: www1.unece.org/stat/platform/display/hlgbas).
5. The modernization of statistical production is needed in order for statistical organizations to remain relevant and flexible in a dynamic and competitive information environment. It is hoped that statistical organizations will adopt and implement GSIM and the common language it provides. However, a model alone cannot transform an organization or its processes. In order to meet the future needs of statistical organizations, GSIM is designed to allow for innovative approaches to statistical production to the greatest extent possible; for example, in the area of dissemination, where demands for agility and innovation are increasing. It is one of the main foundations of the Common Statistical Production Architecture.
6. This paper provides an introduction to GSIM, summarizing the key points for a relatively general statistical audience. For more technical detail, please see the specification document and related material, available on the UNECE web site (see: http://www1.unece.org/stat/platform/display/metis/Generic+Statistical+Information+Model+(GSIM)).
7. GSIM provides the information object framework supporting all statistical production processes such as those described in the Generic Statistical Business Process Model (GSBPM; see: www.unece.org/stats/gsbpm)
...
9. GSIM contains objects which specify information about the real world – 'information objects'. Examples include data and metadata (such as classifications) as well as the rules and parameters needed for production processes to run (for example, data editing rules). GSIM identifies around 110 information objects, which are grouped into four top-level groups and are explained in more detail in the specification documentation.
...
10. The four top-level groups are described below:
The Business group is used to capture the designs and plans of statistical programs, and the processes undertaken to deliver those programs. This includes the identification of a Statistical Need, the Business Processes that comprise the Statistical Program and the evaluations of them.
The Exchange group is used to catalogue the information that comes into and goes out of a statistical organization via Exchange Channels. It includes objects that describe the collection and dissemination of information.
The Concepts group is used to define the meaning of data, providing an understanding of what the data are measuring.
The Structures group is used to describe and define the terms used in relation to information and its structure.
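As a rough illustration of the grouping described above, the four top-level groups and a few of the information objects named in this paper could be written down as a small lookup structure. This is only a sketch: the `Group` enum, the `EXAMPLE_OBJECTS` mapping and the `group_of` helper are hypothetical names introduced here, and the placement of each example object under a group is an assumption for illustration; the GSIM specification remains the authoritative source.

```python
from enum import Enum


class Group(Enum):
    """The four top-level GSIM groups named in this paper."""
    BUSINESS = "Business"
    EXCHANGE = "Exchange"
    CONCEPTS = "Concepts"
    STRUCTURES = "Structures"


# Example information objects mentioned in this paper, keyed by group.
# Assumption: the placement of each object is illustrative only.
EXAMPLE_OBJECTS = {
    Group.BUSINESS: ["Statistical Need", "Statistical Program", "Business Process"],
    Group.EXCHANGE: ["Exchange Channel", "Provision Agreement", "Information Provider"],
    Group.CONCEPTS: ["Statistical Classification", "Variable", "Population"],
    Group.STRUCTURES: ["Data Structure", "Data Set"],
}


def group_of(object_name: str) -> Group:
    """Return the group under which an example object is listed above."""
    for group, names in EXAMPLE_OBJECTS.items():
        if object_name in names:
            return group
    raise KeyError(f"{object_name!r} is not one of the listed examples")
```

A call such as `group_of("Variable")` returns `Group.CONCEPTS`, which is the kind of shared vocabulary lookup the groups are meant to support.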
11. Figure 2 shows a simplified view of the information objects identified in GSIM. It gives users examples of the objects that are in each of the four top-level groups.
12. Figure 3 shows another view of one part of GSIM. This is a slightly more technical view, but it is still intended to be accessible to a relatively wide audience. Both Figures 2 and 3 can be used as a means of communicating with users who are interested in examples of the objects and relationships in GSIM.
...
...
13. Figure 3 gives an example of GSIM information objects that tell a story about some of the information that is important in a statistical organization. Information objects in the GSIM model are given in italics.
"A statistical organization initiates a Statistical Program. The Statistical Program corresponds to an ongoing activity such as a survey or an output series and has a Statistical Program Cycle (for example, it repeats quarterly or annually).
The Statistical Program Cycle will include a set of Business Processes. The Business Processes consist of a number of Process Steps which are specified by a Process Design. These Process Designs have Process Input Specifications and Process Output Specifications. The specifications will often be pieces of information that refer to Concepts and Structures (for example, Statistical Classification, Variable, Population, Data Structure, and Data Set).
...
If, for example, the Business Process is related to the collection of data, there will be an Information Provider who agrees to provide the statistical organisation with data (via a Provision Agreement). This Provision Agreement specifies an agreed Data Structure and governs the Exchange Channel used for the incoming information. The Exchange Channel could be a Questionnaire or an Administrative Register. It will receive the information via a particular mechanism (Protocol) such as an interview or a data file exchange.
The Data Set produced by the Exchange Channel will be stored in a Data Resource and structured by a Data Structure.
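The relationships in this story can be sketched as plain data classes. The sketch below is illustrative only: the class and attribute names are hypothetical simplifications chosen to mirror the wording of the story, and the example values (the survey name, the provider, the record name) are invented. It is not the GSIM specification, which defines many more attributes and relationships.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataStructure:
    name: str


@dataclass
class ProcessDesign:
    name: str
    input_specifications: List[str] = field(default_factory=list)
    output_specifications: List[str] = field(default_factory=list)


@dataclass
class ProcessStep:
    name: str
    design: ProcessDesign  # each Process Step is specified by a Process Design


@dataclass
class BusinessProcess:
    name: str
    steps: List[ProcessStep] = field(default_factory=list)


@dataclass
class StatisticalProgramCycle:
    period: str  # for example "quarterly" or "annual"
    business_processes: List[BusinessProcess] = field(default_factory=list)


@dataclass
class StatisticalProgram:
    name: str
    cycles: List[StatisticalProgramCycle] = field(default_factory=list)


@dataclass
class ProvisionAgreement:
    information_provider: str
    data_structure: DataStructure  # the agreed structure of the incoming data
    exchange_channel: str          # for example "Questionnaire"
    protocol: str                  # for example "interview"


# Invented example values, used only to show how the pieces connect.
structure = DataStructure("Household income record")
design = ProcessDesign(
    "Interview design",
    input_specifications=["Population", "Variable"],
    output_specifications=["Data Set"],
)
collect = BusinessProcess("Collect data", [ProcessStep("Run interviews", design)])
program = StatisticalProgram(
    "Household income survey",
    [StatisticalProgramCycle("annual", [collect])],
)
agreement = ProvisionAgreement(
    "Sampled households", structure, "Questionnaire", "interview"
)
```

Walking from `program` down to `design.output_specifications` follows the same path as the story: program, cycle, business process, process step, process design, and finally the specifications that refer to Concepts and Structures.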
...
• Between the different roles in statistical production (business and information technology experts);
• Between the different statistical subject matter domains;
• Between statistical organizations at national and international levels.
19. Improving communication will result in a more efficient exchange of data and metadata within and between statistical organizations, and also with external users and suppliers.
...
• Build capability among staff by using GSIM as a teaching aid that provides a simple, easy-to-understand view of complex information and clear definitions;
• Validate existing information systems, compare them with emerging international best practice and, where appropriate, leverage international expertise;
• Guide the development or updating of international or local standards to ensure they meet the broadest needs of the international statistical community.
21. GSIM and GSBPM are complementary models for the production and management of statistical information. GSBPM models the statistical production process and identifies the activities undertaken by producers of official statistics that result in information outputs. These activities are broken down into sub-processes, such as "Impute" and "Calculate aggregates". As shown in Figure 4, GSIM helps describe GSBPM sub-processes by defining the information objects that flow between them, that are created in them, and that are used by them to produce official statistics.
...
Figure 4. GSIM and GSBPM
22. Greater value will be obtained from GSIM if it is applied in conjunction with GSBPM. Likewise, greater value will be obtained from GSBPM if it is applied in conjunction with GSIM. Nevertheless, it is possible (although not ideal) to apply one without the other. In the same way that individual statistical business processes do not use all of the sub-processes described within GSBPM, it is very unlikely that all information objects in GSIM will be needed in any specific statistical business process.
...
• Facilitate the building of efficient metadata-driven collection, processing, and dissemination systems;
• Help harmonize statistical computing infrastructures.
25. GSIM supports a consistent approach to metadata, facilitating the primary role for metadata envisaged in Part A of the Common Metadata Framework, "Statistical Metadata in a Corporate Context" (see: http://www1.unece.org/stat/platform/display/metis/The+Common+Metadata+Framework).
...
• Subject matter specialists, methodologists and information technologists;
• Statisticians in different domains of a statistical organization;
• Statisticians in different organizations.
28. GSIM will help you design and understand your processes (and their inputs and outputs) better.
29. For a production cycle, a statistician can design the input and the output, and the process in between. In GSIM terms, the output and the input can be designed in terms of structures and concepts information objects, and the process in between can be designed using the business information objects. The structures and concepts objects are provided by subject matter specialists.
30. As seen in Figure 4, if the GSBPM is considered as a frame of reference for statistical production processes, the first level can be considered as equivalent to the statistical production process as a whole. The next level corresponds to a phase of the statistical production process (for example, the "Process" phase, phase 5 of the GSBPM). The third level corresponds to a sub-process (for example, sub-process 5.3 of the GSBPM – Review and validate). The fourth level consists of the individual building blocks within the sub-process, such as detecting financial values that might be expressed in thousands rather than units.
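To make the fourth level more concrete, the building block mentioned above (detecting financial values that might be expressed in thousands rather than units) could be sketched as a small function. This is a hypothetical heuristic, not something prescribed by GSIM or GSBPM: it assumes a reference value is available for comparison (for example, the same respondent's value from a previous cycle), and the function name, return labels and default tolerance are all assumptions made for illustration.

```python
from typing import Optional


def looks_like_thousands_error(
    reported: float,
    reference: float,
    tolerance: float = 0.5,
) -> Optional[str]:
    """Flag a value that differs from a reference by a factor near 1000.

    Returns "reported_in_thousands" when the reported value is roughly
    1000 times smaller than the reference, "reference_in_thousands" for
    the opposite case, and None when no unit mismatch is suspected.
    """
    if reported <= 0 or reference <= 0:
        return None  # the ratio test is not meaningful for these values
    low = 1000 * (1 - tolerance)
    high = 1000 * (1 + tolerance)
    ratio = reference / reported
    if low <= ratio <= high:
        return "reported_in_thousands"
    if low <= 1 / ratio <= high:
        return "reference_in_thousands"
    return None
```

In GSIM terms, the reported and reference values would be inputs to the building block, the tolerance would be a parameter, and the returned flag would be its output, so the same input and output pattern applies at this level as at the levels above it.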
...
31. An important issue for statisticians is the problem of single-use design components, which are often recreated or at least modified for each production cycle. GSIM facilitates the description of inputs and outputs at each level of the GSBPM, following the same pattern, thus providing a consistent structure for designing statistical processes. It supports the design, specification and implementation of harmonized methods and standard technology to create a generalized statistical production system.
...
40. At the national level, statisticians will become more self-supporting in the design (see Figure 6) and production of their statistics by reusing and repurposing harmonized components. GSIM will enable more flexible and modular production systems. Production will be based upon more standardized applications that are more robust to change and less vulnerable to changes in IT personnel. An increase in the use of standardized applications, which can easily be shared across domains, will enable IT specialists to work more easily in different domains.
41. The use of GSIM will reduce the workload as many components can be repurposed and reused. This means less repetitive work and more time for innovation.
42. This will free the IT staff to make more robust applications and explore new ways to better meet the changing needs of the statistical organization and their clients at large. This will include more time for creation of robust, modular, harmonized, well-documented processes that comply with the requirements of the Common Statistical Production Architecture.
43. At the international level, there will be increased possibilities for co-design and co-development of common components based upon more robust user requirements from a wider user community. IT developers will also have access to a larger development community whose members all speak the same language to describe their statistical information.
...
45. The information objects within GSIM are conceptual; no specific physical representation of the information is prescribed. As a simplified illustration, the name of an organization can be defined as the same concept regardless of whether the information is recorded in a database, in a spreadsheet, in a CSV file, in an XML file or handwritten on a piece of paper.
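The point can be illustrated with a short sketch in which the same concept, the name of an organization, is read from three of the physical representations listed above. The example name, column name, element names and table name are all invented for illustration; only standard-library modules are used.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Invented example content: one concept, three physical representations.
NAME = "Example Statistics Office"
CSV_TEXT = "organization_name\nExample Statistics Office\n"
XML_TEXT = "<organization><name>Example Statistics Office</name></organization>"


def name_from_csv(text: str) -> str:
    """Read the name from the first data row of a CSV document."""
    row = next(csv.DictReader(io.StringIO(text)))
    return row["organization_name"]


def name_from_xml(text: str) -> str:
    """Read the name from the <name> child of the root element."""
    name = ET.fromstring(text).findtext("name")
    if name is None:
        raise ValueError("no <name> element found")
    return name


def name_from_database() -> str:
    """Store the name in an in-memory database and read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE organization (name TEXT)")
        conn.execute("INSERT INTO organization VALUES (?)", (NAME,))
        return conn.execute("SELECT name FROM organization").fetchone()[0]
    finally:
        conn.close()


# All three representations carry the same conceptual value.
names = {name_from_csv(CSV_TEXT), name_from_xml(XML_TEXT), name_from_database()}
```

After running this, `names` contains a single value, which reflects the idea that the concept is independent of how it happens to be stored.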
46. GSIM allows organizations to start with a common language related to the data and metadata used throughout the statistical production process. In this context, GSIM information objects have been mapped to relevant representations in SDMX and DDI.
...
48. While GSIM information objects can be mapped to SDMX and DDI (and substantial business benefit can be obtained from harnessing these standards), GSIM does not require these standards to be used. Some producers and some users of statistics may decide to use alternative standards for particular purposes. In other cases, producers of statistics may be open to using SDMX and/or DDI but have legacy information systems which are not economical to update for use with these standards.
49. Describing statistical information using GSIM as the common point of reference helps users identify the relationship between two sets of statistical information which are represented differently from a technical perspective.
50. For example, a statistician may receive some data described in DDI and some described in a locally created format. The statistician can relate both of these to GSIM. The statistician will then be able to identify which differences are purely technical and which reflect underlying conceptual differences.
51. Once the nature and extent of the differences can be understood, it often proves straightforward to transform the information into a common technical representation (for example, SDMX or DDI) which allows the content to be integrated and explored. This approach ensures that the results of the technical conversion to a common standard are accurately understood, and are sound, from a conceptual perspective.
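The comparison described in this example can be sketched as follows. This is a simplified, hypothetical illustration: `FieldDescription` and `compare` are names invented here, the field values are made up, and nothing in the sketch uses or represents the actual DDI or SDMX formats. It only shows the idea of relating two descriptions to common GSIM terms and then separating technical differences from conceptual ones.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldDescription:
    local_name: str     # name used in the source format (technical)
    gsim_variable: str  # GSIM Variable the field is related to (conceptual)
    population: str     # GSIM Population the values describe (conceptual)


def compare(a: FieldDescription, b: FieldDescription) -> Dict[str, List[str]]:
    """List which attributes differ, split into technical and conceptual."""
    differences: Dict[str, List[str]] = {"technical": [], "conceptual": []}
    if a.local_name != b.local_name:
        differences["technical"].append("local_name")
    for attribute in ("gsim_variable", "population"):
        if getattr(a, attribute) != getattr(b, attribute):
            differences["conceptual"].append(attribute)
    return differences


# Invented example: one field described in each of two sources.
from_first_source = FieldDescription("INC_TOT", "Total income", "Households")
from_local_format = FieldDescription(
    "income", "Total income", "Persons aged 15 and over"
)
result = compare(from_first_source, from_local_format)
```

Here `result` reports `local_name` as a technical difference and `population` as a conceptual one, so the statistician would know that renaming the field is harmless but that the two sets of values describe different populations.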
...
53. For example, when determining the set of definitions to be used for information objects within GSIM, existing standards and models were harnessed as key reference sources. While none of these existing sources had the same purpose and scope as GSIM – that is, a reference framework of information objects spanning the full statistical production process – the development of each entailed analysing and supporting particular needs and scenarios related to particular types of statistical data and metadata.
54. In this way, GSIM benefited from the investment of time in analysis, modelling, testing and refinement when developing these standards and models to their current level of maturity. It also means GSIM does not vary "for no reason" from terms and definitions which are used in existing standards and models. Where it does vary, it is for reasons such as existing relevant standards and models being inconsistent internally or with one another, and/or statisticians reporting that alternative terms or definitions are more relevant to their business needs. A direct consequence of this was the revision of the Neuchâtel Model for Classifications, to fully align and integrate it with GSIM.
55. This paper introduces GSIM to people working in statistical organisations. It outlines the benefits of the model as well as how the adoption of the model might impact staff in statistical organisations. The paper also discusses the interaction of GSIM and other frameworks and standards such as GSBPM, DDI and SDMX.
56. For more information on how a statistical agency might implement GSIM, the GSIM User Guide introduces the steps that need to be undertaken.
57. For more detailed information on the information objects in GSIM, their definitions, attributes and relations, the GSIM Specification document provides a fine level of detail and also discusses the relationship between GSIM and other standards and models. The GSIM wiki page (see: http://www1.unece.org/stat/platform/display/metis/Generic+Statistical+Information+Model)
...