Notes on the Generic Statistical Information Model (GSIM)

Summary

The recently released strategic vision document from HLG-BAS (High Level Group for Strategic Directions in Business Architecture in Statistics) discusses the role of GSIM and GSBPM (Generic Statistical Business Process Model) in supporting the “industrialisation” of statistics.

 

In simple terms GSIM can be envisaged as providing a basis for statistical organisations to agree on common terminology and definitions to aid their discussion on developing metadata systems and information management frameworks.

 

More specifically, it is anticipated GSIM will fulfil an essential role as a reference model that can be operationalized on a consistent basis when defining the information required to drive statistical production processes as well as when defining the outputs (eg statistical data) and outcomes (eg process metrics) from those processes.

 

In this way GSIM will support “designed in” (and, therefore, built in) interoperability between new methodological and new technical developments undertaken within the community/industry of producers of official statistics.  This represents a much more efficient and effective approach than seeking to achieve interoperability only after discrete developments have been delivered.

 

Establishment of GSIM will therefore facilitate partnerships within the community to collaboratively develop new and improved capabilities, and then facilitate broader sharing of these new and improved capabilities across the community as a whole.

 

The next section of this document summarises the background to the GSIM collaboration, including

       the NSIs which are currently collaborating to produce initial drafts
 

       the expected relationship between GSIM and standards such as SDMX and DDI-L
     

       the context of GSIM as an initial major output which enables operationalization of a common statistical information management framework
 

       other key collaborative forums and initiatives which are recognised as having a critical role in reviewing and shaping definition of GSIM

The document then explores current broad thinking about the structure of GSIM, including

       the Common Reference Model (CRM) Layer (which is analogous in nature to GSBPM), and
 

       the Semantic Reference Model (SRM) Layer (an additional level of formalisation and detail that is required in order to support consistent operationalization)   

It concludes with more information in regard to the proposed timeline for this work, including the planned points for engagement with the broader community of producers of official statistics to seek their input in co-ordinating and shaping the approach to GSIM.

 

 

Background

The idea of a Generic Statistical Business Information Model was discussed at the meeting of MSIS (Management of Statistical Information Systems) in April 2010.

Two months later the inaugural meeting of the Informal CSTAT workgroup on stronger collaboration on Statistical Information Management Systems (CSTAT is the OECD Committee on Statistics) identified an essential role for GSIM in providing a consistent reference model when defining information required to drive statistical production processes, and output from these processes.

Six NSIs participated in this inaugural meeting

       Statistics Sweden (SCB)

       Statistics Norway (SSB)

       Statistics New Zealand (SNZ)

       Statistics Canada (StatCan)

       Office for National Statistics (ONS)

       Australian Bureau of Statistics (ABS)

Recommended ways forward identified by this workgroup were reviewed and endorsed by the Directors General of the participating agencies.

Operationalizing GSIM was agreed to be the highest priority strategic enabler of efficient and effective collaboration in the development and sharing of statistical information management systems.  This operationalization was seen as progressing through associating GSIM with, for example, a commonly agreed representation in XML.  Rather than redundantly investing huge amounts of time and money developing such a representation “from a clean sheet of paper”, the approach would harness existing standards-based representations wherever fit for purpose.

The workgroup identified SDMX and DDI-L (DDI-Lifecycle) as the key starting points in this regard.  By relating GSIM to the information models associated with SDMX and DDI-L (neither of which covers the anticipated full scope of GSIM), a path to operationalizing GSIM which draws extensively on SDMX-ML and DDI-L would be available.

The establishment and operationalization process as a whole was termed OCMIMF (Operationalize a Common Metadata/Information Management Framework) by the workgroup.  GSIM was defined as the initial major output which then enables operationalization steps to be undertaken.

The OCMIMF “Opportunity Statement” as agreed by the inaugural workgroup meeting is attached to this document as Annex 1.  More information about the collaborations, including OCMIMF, which were agreed at the inaugural workgroup meeting, is available via the MSIS wiki.

While participants in the workgroup committed to progressing work on OCMIMF “with pace and passion”, they recognised that this effort needs to be co-ordinated with, and to benefit from input from, other collaborative forums and initiatives.  Prominent examples include

       The METIS (Statistical Metadata) Group, jointly convened by UNECE, Eurostat and OECD and reporting ultimately to the Conference of European Statisticians

       Relevant ESSnets (collaborative networks within the ESS, the European Statistical System), such as
 

o         CORA (Common Reference Architecture) and its successor CORE (Common Reference Environment)
 

o         SDMX ESSnet, especially Work Package (WP) 2 related to MCV (Metadata Common Vocabulary) Ontology

The recently released strategic vision document from HLG-BAS (High Level Group for Strategic Directions in Business Architecture in Statistics) discusses the role of GSIM and GSBPM (Generic Statistical Business Process Model) in supporting the “industrialisation” of statistics.  This vision is to be presented for discussion at the 59th Plenary Session of the Conference of European Statisticians (14-16 June 2011, Geneva).

 

 

Proposed broad structure of GSIM

 

There are often references to parallels between GSBPM and GSIM.  The two are intended to work together, with the former providing a reference model for statistical business processes and the latter providing a reference model for information input to, used by and produced by those processes.

 

The documentation of GSBPM V4.0 (eg para 11) highlights that (at least at this time) GSBPM does not formally model attributes which are required in order to “operationalize” business processes in practice.  This approach is entirely consistent with the (currently) agreed scope and intent of the GSBPM.

 

A key aim, however, is to consistently operationalize GSIM and to harness it to support “designed in” (rather than “attempted after the fact”) interoperability between new methodological and new technical developments undertaken within the community/industry of producers of official statistics.

 

This is seen as requiring GSIM to include an additional layer of detail compared with GSBPM.  This would provide more formal “reference semantics”.

 

The figure on the following page illustrates this idea, where GSIM spans both the Reference Model Layer and the Information Model Layer.  The “operationalization” of GSIM (eg mapping to representations in SDMX and DDI-L) then supplies consistent connection between the Information Model Layer and the Physical Implementation Layer.

 

GSIM itself should remain distinct from, but connected to, recommended “operationalization” of GSIM (eg representation, physical implementation).  The recommended means of “operationalizing” GSIM may change as the technical standards and business practices commonly employed by producers of official statistics in their production processes evolve.
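
The separation between the model and its operationalization can be illustrated with a short sketch.  This is an illustration only, under stated assumptions: the `Variable` class, the two renderer functions and all XML element names are invented for the example and are not taken from GSIM, SDMX-ML or DDI-L.  The point is simply that the information object is defined once, while representations are registered separately and can be added or replaced without touching it.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Variable:
    """Hypothetical information object (not an actual GSIM definition)."""
    name: str
    description: str


def to_format_a(var: Variable) -> str:
    """Render the object in an invented XML layout standing in for one standard."""
    elem = ET.Element("Concept", attrib={"id": var.name})
    ET.SubElement(elem, "Description").text = var.description
    return ET.tostring(elem, encoding="unicode")


def to_format_b(var: Variable) -> str:
    """Render the same object in a second invented XML layout."""
    elem = ET.Element("VariableItem")
    ET.SubElement(elem, "Name").text = var.name
    ET.SubElement(elem, "Label").text = var.description
    return ET.tostring(elem, encoding="unicode")


# Representations live outside the model; this registry can change over time
# while the Variable definition stays stable.
REPRESENTATIONS = {"format_a": to_format_a, "format_b": to_format_b}


def operationalize(var: Variable, representation: str) -> str:
    """Produce one physical representation of a conceptual object."""
    return REPRESENTATIONS[representation](var)
```

In this sketch, adopting a new technical standard would mean adding an entry to `REPRESENTATIONS`; the conceptual object would only change if the information itself changed.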

 

While GSIM itself may also need to evolve over time (probably following the same pattern as GSBPM in evolving more rapidly at first then achieving greater stability) it is important to its utility and uptake that it remains as stable as possible as a reference model.  It should only change when statistical production processes of the future require, or produce, conceptually different/additional information – not when a next generation of processes require fundamentally the same information but in a different format (eg using a different technical standard).     

 

 

Common Reference Model (CRM) Layer

 

The Business Communication Diagram is intended as a simple (not technical) diagrammatic representation of GSIM.  It is analogous to the much referred to diagram of GSBPM provided in Section IV of the documentation of GSBPM V4.0.  For many business staff the diagram is their “day to day reference” in regard to GSBPM.  Such staff members seldom (if ever) refer to the more detailed documentation.  (It is debatable whether this is a good or bad thing – but it is a reality in any case.)

 

As with GSBPM, it is expected that GSIM will consist of multiple levels.  The diagram for GSBPM, for example, shows

       Level 0 (statistical business process)

       Level 1 (nine phases of the statistical business process)

       Level 2 (sub-processes within each phase)

 

The documentation of GSBPM is seen as incorporating Level 3 in the form of descriptions of each sub-process.

 

The OCMIMF collaboration team is currently exploring different ways to describe and represent Level 1 and Level 2 of GSIM.  Several possible representations have been identified by participants already.  For example,
 

       one approach has been based on briefly reviewing information models associated with SDMX, DDI-L, the MetaNet initiative, efforts by NSIs to define enterprise level information models etc and then identifying a consolidated superset of high level concepts/constructs from these
    

       another approach has been based on reviewing the text of the description of the GSBPM and identifying the information objects that it mentions as being associated with statistical business processes 

 

Considering the top layer of GSIM from several perspectives (including checking it against the information objects that NSIs refer to in their actual systems/applications used to support the statistical production process) is seen as vital.

 

The OCMIMF team anticipates making early thoughts on the CRM Layer available during May 2011, and will encourage review and input from all agencies and initiatives which have an interest in GSIM.

 

A key issue at the CRM Layer is terminology used to refer to, and describe, the information objects within the model.  This requires a two level approach.

 

For example, if Agency A refers to “data elements”, Agency B refers to “variables” and Agency C refers to “data items” then
 

1.       are the definitions used by the agencies consistent enough in meaning/concept to be represented by a single “box” in the common reference model layer?
 

2.       if so, which label should be placed on the box?
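
The two questions above can be sketched as a two-level lookup.  This is a minimal illustration, assuming the three agencies' definitions have been judged consistent enough to share one box; the concept identifier `concept-1` and the reference label `variable` are arbitrary choices made for the example, not agreed GSIM terms.

```python
# Level 1: which local terms denote the same concept (the same "box")?
# The grouping below is an assumption made for illustration.
CONCEPT_OF = {
    ("Agency A", "data elements"): "concept-1",
    ("Agency B", "variables"): "concept-1",
    ("Agency C", "data items"): "concept-1",
}

# Level 2: which label is placed on each box for reference purposes?
# The label is an arbitrary example, not an agreed term.
REFERENCE_LABEL = {"concept-1": "variable"}


def reference_label(agency: str, local_term: str) -> str:
    """Translate an agency's local term into the common reference label."""
    concept_id = CONCEPT_OF[(agency, local_term.strip().lower())]
    return REFERENCE_LABEL[concept_id]


def local_terms(concept_id: str) -> dict:
    """List the term each agency uses internally for a given concept."""
    return {
        agency: term
        for (agency, term), cid in CONCEPT_OF.items()
        if cid == concept_id
    }
```

Each agency keeps its own term internally; only the shared label needs to be agreed, and it can be changed in one place without altering the grouping of concepts.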
 

As was the case for the GSBPM, and as was important to its success, it needs to be recognised that this is a Common Reference Model.  It does not require that each agency commits to “enforcing” each selected term as the preferred terminology for that agency’s internal purposes, nor that all existing internal documentation, repositories and user interfaces be updated accordingly.

 

The aim is to focus on agreeing the information objects (concepts) within the model and on a set of terminology that is acceptable for reference purposes – rather than ideal from every perspective.  Such an approach is necessary and appropriate in order to achieve practical progress within a reasonable timeframe.

 

 

Semantic Reference Model (SRM) Layer   

 

Defining and agreeing the SRM can progress most efficiently and effectively once there is an agreed working draft of the CRM Layer.  The CRM sets out the high level information objects (including “common reference terms” to be used when referring to them) which will be defined in more formal, technical detail in the SRM.  Attempting to progress development of the SRM without a foundation of common thinking in regard to high level objects and terminology would lead to a lot of talking at cross purposes.

 

That said, refinement of the GSBPM took several years.  It is not necessary or appropriate to delay commencement of work on the SRM until the CRM has evolved to the same level of maturity as the GSBPM.

 

The intent is to commence work based on the initial agreed working draft of the CRM Layer.  In effect, this first draft provides a basic (but essential) “common vocabulary” for the team in progressing work on the SRM.   If/when there are subsequent changes to the CRM layer (eg changes to high level objects and/or to the terms used to refer to them) these should be readily factored into the SRM work.
 

It is recognised that there is a range of existing well-developed reference points for work on the SRM.  These include

       information models associated with SDMX and DDI-L,

       MCV Ontology work undertaken within the SDMX ESSnet,

       outputs from the MetaNet initiative, and

       efforts by NSIs to define enterprise level information models 

             

The agreed working draft of the CRM Layer will provide a framework for

       relating these existing reference points to the context of GSIM,

       harmonising across existing reference points where appropriate for the purpose of GSIM, and

       identifying gaps that need to be addressed in the context of GSIM
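
The three uses listed above can be sketched as a simple coverage analysis.  This is an illustration only: the object names and the reference points are placeholders, and the coverage shown is invented for the example and says nothing about what SDMX, DDI-L or any other real source actually covers.

```python
# Placeholder CRM objects and placeholder reference points (all hypothetical).
CRM_OBJECTS = {"object_a", "object_b", "object_c", "object_d"}
COVERAGE = {
    "reference_point_1": {"object_a", "object_b"},
    "reference_point_2": {"object_b", "object_c"},
}


def analyse(crm_objects, coverage):
    """Relate reference points to CRM objects, then find overlaps and gaps."""
    # Relating: for each CRM object, which reference points describe it?
    covered_by = {
        obj: sorted(src for src, objs in coverage.items() if obj in objs)
        for obj in sorted(crm_objects)
    }
    # Harmonising: objects described by more than one reference point.
    to_harmonise = [obj for obj, srcs in covered_by.items() if len(srcs) > 1]
    # Gaps: objects described by no existing reference point.
    gaps = [obj for obj, srcs in covered_by.items() if not srcs]
    return covered_by, to_harmonise, gaps
```

With the placeholder data above, `object_b` would be a candidate for harmonisation and `object_d` would be a gap to be addressed in the context of GSIM.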

It is recognised that many of these reference points took several years to research, design, evaluate and refine.  Given that this wealth of existing intellectual investment and experience is available as a starting point, however, it is expected that an initial (potentially partial) draft version of the SRM will be available before the end of 2011.

 

 

Proposed Stakeholder Engagement

 

The draft stakeholder engagement timeline recognises the importance of sustained engagement with other collaborative forums and initiatives such as METIS and the CORE ESSnet.  For example, starting from May 2011 it is currently planned there will be two rounds of engagement with these groups in order to “road test” ideas (at different levels of detail) related to the Common Reference Model before the team formally submits a draft of the CRM to the Directors General who commissioned the work.

 

Similarly active engagement is expected to then follow in regard to the Semantic Reference Model.

 

A more general approach to awareness raising and engagement will target forums such as the SDMX Global Conference, MSIS, IASSIST and ISI.

ANNEX 1 : OCMIMF “Opportunity Statement”

The following opportunity statement was agreed at the inaugural meeting of the Informal CSTAT workgroup on stronger collaboration on Statistical Information Management Systems

 

#     OPPORTUNITY

3.    OPERATIONALISE A COMMON METADATA/INFORMATION MANAGEMENT FRAMEWORK

             

MAJOR INITIATIVES

Major Initiative One

Develop a generic statistical information model (GSIM)

       Develop requirements (what the model is supposed to achieve)

       Investigate existing solutions and evaluate them against predefined criteria (incl. SDMX & DDI)

       Build a draft model and test against real information/processes

       Present the model to relevant formal and informal communities

       Develop a plan for adoption and implementation

 

Major Initiative Two

Map SDMX and DDI to the GSIM framework

 

Major Initiative Three

Update and evolve the standards

 

Major Initiative Four

Implement SDMX and DDI in a test situation

 

Major Initiative Five

Operationalise the use of metadata as a driver for business processes (GSBPM)

- experimentation

 

 

STATISTICAL NETWORK BENEFITS

* Makes production process more effective

* A small step for Statisticians, a giant step for Statistics

* Enables sharing

* Key enabler for new information solutions

* Foundation for long term savings and operating costs

* Increase user value