Generic Statistical Information Model (GSIM):

Specification

 

(Version 1.1, December 2013)

About this document

This document is aimed at metadata specialists, information architects and solutions architects. It describes the information used in a statistical organization. A number of annexes provide information about Exchange Channels, a glossary, UML class diagrams and the GSIM extension methodology.

 

 


 

I. Introduction

II. Information in the Statistical Business Process

A. Identifying and Evaluating Statistical Needs

B: Designing and Managing Statistical Programs

C: Designing Processes

D: Running Processes

E: Exchanging Information

F: Collecting Information

G: Processing and Analyzing Information

H: Disseminating Information

III. Foundational Information

A: Concepts

B. Population

C. Node and Node Set

D. Statistical Classification

E. Variable

F. Represented Variable

G. Instance Variable

H. Information Resources

I. Data Sets

J. Dimensional and Unit Data Structures

K. Referential Metadata Sets

IV. Technical information

A: Identity and Administrative Details

B. Information Providers, Information Consumers, Organizations, and Individuals

Annex A: Exchange Channels

A: Administrative Registers

B. Web Scraping for Data Collection

C: Survey Data Collection

Annex B: Glossary

Annex C: UML Diagrams

Annex D: Extending the model

A. GSIM Extension Methodology

B. Administrative Attributes

Figure 1. Identify and Evaluate Statistical Needs

Figure 2. Design and Manage Statistical Program

Figure 3. Process Steps can be as large or small as needed

Figure 4. Design Processes

Figure 5. Use of re-usable Business Services

Figure 6. Run Process

Figure 7. Exchange Channels

Figure 8. Exchange Channels for collecting information

Figure 9. Exchange Channel for disseminating information

Figure 10. Concepts

Figure 11. Populations and Units

Figure 12. Node and Node Set inheritance

Figure 13. Statistical Classifications

Figure 14. Variable

Figure 15. Represented Variable

Figure 16. Instance Variable

Figure 17. Information Resources

Figure 18. Data Sets

Figure 19. Data Structures

Figure 20. Referential Metadata Sets

Figure 21. Identifiable Artefacts

Figure 22. Agents

Figure 23. Administrative Register

Figure 24. Web Scraping Channel

Figure 24. Questionnaire

 

 

 

 

I. Introduction

1.               The GSIM Specification is the most detailed level of the Generic Statistical Information Model (GSIM). It provides a set of standardized, consistently described information objects, which are the inputs and outputs in the design and production of statistics. Each information object has been defined, and its attributes and relationships have been specified. GSIM is the result of a collaboration involving statistical organizations across the world to develop and maintain a generic reference model that is suitable for all organizations and meets the strategic goals (in particular the modernization effort) of the official statistics community. For contextual information, an introduction to GSIM and information on using GSIM, please refer to the GSIM Brochures, Communication and Implementing GSIM documents.

 

2.               There is widespread interest across statistical organizations in being able to trace how statistical information (for example, data and metadata) "flows" through statistical business processes (into and out of processes). Interested parties include broad statistical systems (like the European Statistical System), National Statistical Systems (both centralized and decentralized) and smaller task teams working inside National Statistical Offices.

 

3.               GSIM covers the whole statistical process and is designed to support both current and new ways of producing statistics. Section II describes tasks (for example identifying statistical needs, managing statistical programs, dissemination) which statistical organizations undertake and how the model describes the information flows in those tasks. This section also contains descriptions of designing and running processes to show how GSIM models the explicit separation between the design and execution of statistical processes.

 

4.               There is an increasing business need to record reliable, structured information about the processes used to produce specific statistical outputs. In order to maximize transparency and reproducibility of results, it is important for a statistical organization to understand the processes it undertakes and their inputs and outputs. Section III describes the foundational information objects (that is, the conceptual and structural metadata objects) that are used as inputs and outputs in a statistical business process.

 

5.               There are a number of technical information objects in GSIM. These objects are the fundamental building blocks that support many of the other objects and relationships in the model. They provide features which are reusable by other objects to support functionality such as identity and versioning. These objects are described in Section IV of this document.

 

6.               This document provides a description of GSIM in the context of a statistical organization. It has a number of annexes which provide further details for the reader. These annexes are:

  • Annex A: Exchange Channels - This annex provides further information about the three subtypes of Exchange Channel focused on data collection.
  • Annex B: Glossary - This annex gives readers definitions and explanatory descriptions for the GSIM information objects.
  • Annex C: UML diagrams - This annex includes all detailed UML models of GSIM.
  • Annex D: Extending the model - This annex provides information for implementers on how to extend GSIM for organization specific purposes. It also contains the set of recommended attributes for the administration of GSIM objects.

 

7.               Note: GSIM information objects have been given in italics in the descriptions that follow. The diagrams included in this section are stylized representations of the model. The colours of the boxes in diagrams represent which group the information object belongs to (Blue for Business Group, Red for Exchange Group, Green for Concepts Group, Yellow for Structures Group and Orange for the Base Group). In many cases there is more detail to be found in the UML. Detailed information on each information object in the model, including a glossary and UML class diagrams can be found in Annexes B and C of this document.

 

 

II. Information in the Statistical Business Process

 

8.               This section looks at different ways that information objects are used within the statistical business process. It considers eight different scenarios, identifying the information objects used and the relationships between those objects.

 

A. Identifying and Evaluating Statistical Needs

 

Figure 1. Identify and Evaluate Statistical Needs

 

9.               An organization will react and change due to a variety of needs. A Statistical Need presents itself to the statistical organization in the form of an Environment Change or an Information Request.

 

10.               Environment Change indicates that there needs to be an externally motivated change. This may be specific to the organization in the form of reduced budget or new demands from stakeholders, or may be a broader change such as the availability of new methodology or technology.

 

11.               When an organization receives an Information Request, this will identify the information that a person or organization in the user community requires for a particular purpose. This community may include users within the organization as well as external to it. For example, the team responsible for compiling National Accounts may need a new Business Process to be initiated to produce new inputs to their compilation process. This request will commonly be defined in terms of a Subject Field that defines what the user wants to measure. When an Information Request is received it will be discussed and clarified with the user. Once clarified, a search will be done to check if the data already exist. Discovering these Data Sets may be enabled by searching for Concepts and Classifications. Each of these activities is described by a Process Step.

 

12.               The Statistical Need - whether an Information Request or Environment Change - will be formalized into a Change Definition, typically created by a Statistical Support Program (a "statistical change program"). The Change Definition identifies the specific nature of the change in terms of its impacts on the organization or specific Statistical Programs or Statistical Support Programs. This Change Definition is used as an input into a Business Case. A successful outcome will either initiate a new Statistical Program or a new Statistical Support Program that will create a new Statistical Program Design that redefines the way an existing Statistical Program is carried out.

 

13.               A Statistical Need can also be internally driven. At any point in the statistical business process, an organization may undertake an evaluation to determine utility or effectiveness of the business process or its inputs and outputs. An Assessment will be undertaken to evaluate any resources, processes or outputs and may refer to any object described in the model.  Assessments include gap analyses undertaken in the context of Business Cases and evaluations undertaken to determine whether a statistical output meets the need for which it was first created.

 

B: Designing and Managing Statistical Programs

Figure 2. Design and Manage Statistical Program

 

14.               A statistical organization will respond to a perceived Statistical Need by creating a Business Case. Responding to the Business Case will involve one of three things: the creation of a new Statistical Support Program, the creation of a new Statistical Program, or the evolution of an existing Statistical Program Design to be implemented by an existing Statistical Program.

 

15.               Statistical Support Programs undertake the activities of the statistical organization such as statistical change programs, data management programs, metadata management programs, methodological research programs, etc. A good example is a program which manages classifications.

 

16.               Statistical Programs are those programs that an organization undertakes to produce statistics (for example, a retail trade survey). Statistical Programs are cyclical - they perform cycles of collection, production and dissemination of products. Each such cycle is represented by a Statistical Program Cycle object.  The Statistical Program Cycle is a repeating activity to produce statistics at a particular point in time (for example, the retail trade survey for March 2012).

 

17.               Statistical Programs require Statistical Program Designs to achieve their objectives. These designs cover the design of all activities to be undertaken, notably at the level of Business Processes. Within a Statistical Program Cycle, several Business Processes would typically be performed. These can be understood to correspond to the processes and sub-processes found in the Generic Statistical Business Process Model (GSBPM). These Business Processes may be repeated within a cycle. Each iteration can be made up of multiple activities of the same or different types. As an example of this, within a single cycle, the Statistical Program might perform three iterations of data collection and processing, then analyze the data and disseminate the resulting statistical Products. Each of these activities could be understood to be a separate Business Process.
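As a rough illustration of the example above, the relationship between a Statistical Program, a Statistical Program Cycle and the Business Processes performed within that cycle could be sketched as follows. The class and attribute names (reference_period, business_processes and so on) are assumptions made for this sketch; they are not the attributes defined in the GSIM UML.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BusinessProcess:
    """One activity performed within a cycle (hypothetical minimal shape)."""
    name: str


@dataclass
class StatisticalProgramCycle:
    """One iteration of a Statistical Program for a given reference period."""
    reference_period: str
    business_processes: List[BusinessProcess] = field(default_factory=list)


@dataclass
class StatisticalProgram:
    """A program that produces statistics through repeated cycles."""
    name: str
    cycles: List[StatisticalProgramCycle] = field(default_factory=list)


def build_example_cycle() -> StatisticalProgramCycle:
    cycle = StatisticalProgramCycle(reference_period="2012-03")
    # Three iterations of data collection and processing...
    for i in range(1, 4):
        cycle.business_processes.append(BusinessProcess(f"collect (iteration {i})"))
        cycle.business_processes.append(BusinessProcess(f"process (iteration {i})"))
    # ...followed by analysis and dissemination of the resulting Products.
    cycle.business_processes.append(BusinessProcess("analyze"))
    cycle.business_processes.append(BusinessProcess("disseminate"))
    return cycle


program = StatisticalProgram(name="Retail trade survey")
program.cycles.append(build_example_cycle())
```

In this sketch each activity is recorded as its own Business Process, so the single cycle holds eight of them.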

 

18.               The Statistical Program Design specifies the way in which Business Processes will be conducted. This includes the use of re-usable Business Services (possibly sourced from outside the statistical organization), or through the design and use of more traditional processes. In the latter case, Process Design objects would be used to specify Process Steps. (Although re-usable Business Services are also specified by Process Designs and Process Steps, these will already exist, and not need new design work as part of the Statistical Program Design.)

 

19.               It should be noted that Statistical Program Designs specify what Process Steps will need Process Designs, and also which Business Services would be used, but do not do the low-level specification of how such Process Steps and Business Services are executed. These specifications are found in the Process Design object.

 

C: Designing Process Steps

 

20.               Before explaining the objects which GSIM uses to represent the design of Process Steps, it is important to discuss the nature of processes more generally. The types of objects provided by GSIM perform specific functions. In GSIM, Business Processes have Process Steps. Each Process Step can be as "large scale" or "small scale" as the designer of a particular Business Process chooses (see Figure 3).

 


 

Figure 3. Process Steps can be as large or small as needed

 

21.               Process Steps can contain "sub-steps", those "sub-steps" can contain further "sub-steps" within them and so on indefinitely. Typically, the outputs of one Process Step become inputs to the next Process Step. There can also be conditional flow logic applied to the sequence of Process Steps, based on parameters which have been passed in, or conditions met by the outputs of a previous Process Step.
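The nesting of Process Steps and the chaining of outputs to inputs can be pictured with a small sketch. The ProcessStep class below, its action and sub_steps attributes, and the three example functions are hypothetical and serve only to show the recursive structure; conditional flow is left out here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ProcessStep:
    """A step that either does work itself or delegates to ordered sub-steps."""
    name: str
    action: Optional[Callable[[list], list]] = None
    sub_steps: List["ProcessStep"] = field(default_factory=list)

    def run(self, inputs: list) -> list:
        if self.sub_steps:
            data = inputs
            for step in self.sub_steps:
                # The output of one sub-step becomes the input of the next.
                data = step.run(data)
            return data
        # A leaf step applies its own action (or passes data through unchanged).
        return self.action(inputs) if self.action else inputs


def drop_missing(values: list) -> list:
    return [v for v in values if v is not None]


def clip_negative(values: list) -> list:
    return [max(v, 0) for v in values]


def round_values(values: list) -> list:
    return [round(v) for v in values]


# A "large" step made of a leaf step and a further nested step.
edit = ProcessStep("edit", sub_steps=[
    ProcessStep("clip negative values", clip_negative),
    ProcessStep("round values", round_values),
])
pipeline = ProcessStep("process data", sub_steps=[
    ProcessStep("drop missing values", drop_missing),
    edit,
])

result = pipeline.run([1.4, None, -2.0, 3.6])
```

Whether "edit" is treated as one step or as two is the designer's choice; the caller of pipeline.run sees the same result either way.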

 

22.               The design of a Process Step thus can be understood to use other Process Steps and even other Business Services which have already been designed and made available for re-use. In a more traditional scenario, the Process Step is designed and then executed. In future, it is foreseen that re-usable Business Services will be increasingly common, having been designed and implemented by another external organization. The next sections describe these two scenarios.


i. Designing Process Steps

 

 

Figure 4. Design Process Steps

 

23.               A Statistical Program Design is associated with a top level Process Step whose Process Design contains all the sub-steps and process flows required to put that Statistical Program into effect. Each Process Step in a statistical Business Process has been included to serve some purpose. This is captured as the Business Function associated with the Process Step. An example of a Business Function could be "impute missing values in the data". In order to support this Business Function, an imputation process is needed, which will require a Process Design.

 

24.               In line with the GSIM design principle of separating design and production, GSIM assumes that Process Steps will be designed during a design phase. Having divided a planned statistical Business Process into Process Steps, the next requirement is to specify a Process Design for each step. The Process Design identifies how each Process Step will be performed. A Process Design may use a Process Pattern, which is a nominated set of Process Designs and associated flows (Process Control Designs) which have been highlighted for reuse.

 

25.               Process Designs specify several things: they identify the different types of inputs and outputs represented by the Process Input Specification and Process Output Specification. Examples of Process Inputs include data, metadata such as Statistical Classifications, imputation and editing Rules, parameters, etc. Process Outputs can be reports of various types (processing metrics, reports about data validation and quality, etc.), edited Data Sets, new Data Sets, new or revised instances of metadata, etc.

 

26.               To continue the example, the process designer would specify the inputs in the Process Input Specification as imputation Rules and the Data Set for which imputation is desired. The Process Output Specification would include an edited Data Set containing the imputed values, plus a report detailing which values had been imputed.

 

27.               The Process Design specifies the control logic, that is the sequencing and conditional flow logic among different sub-processes (Process Steps). This flow is described in the Process Control Design. When creating a Process Design, a Process Control Design that provides information on "what should happen next" is specified. Sometimes one Process Step will be followed by the same step under all circumstances. In such cases the Process Control Design simply records what Process Step comes next. However, sometimes there will be a choice of which Process Step will be executed next. In this case, the Process Control Design will detail the set of possible "next steps" and the criteria to be applied in order to identify which Process Step(s) should be performed next.
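The two cases described above (a fixed successor, and a choice among possible next steps) might be sketched as follows. The ProcessControlDesign class shown here, its branches and default_next attributes, and the missing_rate criterion are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A criterion inspects the outputs of the step that has just finished.
Criterion = Callable[[Dict], bool]


@dataclass
class ProcessControlDesign:
    """Records which Process Step(s) may follow a given step."""
    # Each branch pairs a criterion with the name of a candidate next step.
    branches: List[Tuple[Criterion, str]] = field(default_factory=list)
    # Step to perform when no criterion is met (or when there are no branches).
    default_next: Optional[str] = None

    def next_steps(self, outputs: Dict) -> List[str]:
        matched = [name for criterion, name in self.branches if criterion(outputs)]
        if matched:
            return matched
        return [self.default_next] if self.default_next else []


# Case 1: the same step always follows.
always = ProcessControlDesign(default_next="aggregate")

# Case 2: the next step depends on the outputs of the previous step.
after_validation = ProcessControlDesign(
    branches=[(lambda out: out["missing_rate"] > 0.05, "impute")],
    default_next="aggregate",
)

high = after_validation.next_steps({"missing_rate": 0.20})
low = after_validation.next_steps({"missing_rate": 0.01})
```

Returning a list rather than a single name allows for the case where more than one Process Step should be performed next.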

 

28.               The Process Design associated with a Process Step will identify the Process Method that will be used to perform the Business Function associated with the Process Step. For example, if the Business Function is 'impute missing values in the data', the Process Method might be 'nearest neighbour imputation'.

 

29.               A Process Method specifies the method to be used, and is associated with a set of Rules to be applied. For example, any use of the Process Method 'nearest neighbour imputation' will be associated with a (parameterized) Rule for determining the 'nearest neighbour'. In that example the Rule will be mathematical (for example, based on a formula). Rules can also be logical (for example, if Condition 1 is 'false' and Condition 2 is 'false' then set the 'requires imputation' flag to 'true', else set the 'requires imputation' flag to 'false').
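The two kinds of Rule mentioned above can be illustrated with a short sketch. The logical Rule follows the wording of the example directly. The mathematical Rule is only one possible choice: absolute difference on a single auxiliary variable is assumed here as the distance measure, and the record layout and field names (employees, turnover, imputed) are hypothetical.

```python
from typing import Dict, List


def requires_imputation(condition_1: bool, condition_2: bool) -> bool:
    """Logical Rule: the flag is true only when both conditions are false."""
    return (not condition_1) and (not condition_2)


def impute_nearest_neighbour(records: List[Dict], aux: str, target: str) -> List[Dict]:
    """Fill missing `target` values from the donor closest on `aux`.

    The distance (absolute difference on one auxiliary variable) is the
    parameterized part of the Rule and is an assumption of this sketch.
    """
    donors = [r for r in records if r[target] is not None]
    result = []
    for record in records:
        if record[target] is None and donors:
            donor = min(donors, key=lambda d: abs(d[aux] - record[aux]))
            # Copy the donor value and mark the record for the imputation report.
            result.append({**record, target: donor[target], "imputed": True})
        else:
            result.append({**record, "imputed": False})
    return result


data_set = [
    {"employees": 10, "turnover": 100},
    {"employees": 50, "turnover": 500},
    {"employees": 12, "turnover": None},
]
edited = impute_nearest_neighbour(data_set, aux="employees", target="turnover")
```

The "imputed" marker added to each record is one simple way of supporting a report on which values were imputed.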

 

30.               The resulting Process Design and Process Control Design objects (along with related Process Input Specifications and Process Output Specifications) would be used in the implementation of the Process Step.

 


ii. Using Re-Usable Business Services

 

 

Figure 5. Use of re-usable Business Services

 

31.               It is not always necessary for the Statistical Program to design its own Process Steps from the beginning. The Common Statistical Production Architecture (CSPA) describes how statistical organizations can create statistical services that are easily reused in other statistical organizations. In GSIM terms, a statistical service is a Business Service. A Business Service is a means of performing a Business Function (an ability that an organization possesses, typically expressed in general and high level terms and requiring a combination of organization, people, processes and technology to achieve).

 

32.               The increased sharing and reuse of Business Services means that the resources needed to meet new demands for statistical production could be considerably reduced, and the time needed to produce new statistical products could be lessened. To facilitate this, CSPA introduced the concept of a statistical services catalogue, where different statistical organizations could list the statistical services they have developed, with the intent of sharing them with other statistical organizations.

 

33.               Business Services have already been designed, with all of the normal input types, output types, process control design, and other properties already specified. Thus, a Business Service can act in a fashion similar to a Process Step designed within the organization, but without the effort required in the traditional scenario.

D: Running Processes

 

 

Figure 6. Run Process

 

34.               A Statistical Program needs to execute processes to realize some Business Functions. This can be done in two ways: a Process Step can be directly executed by a Business Process, or a re-usable Business Service can be used by the Business Process, as an intermediate trigger for the execution of the Process Step.

 

35.               In order to understand how this works, we characterize the nature of Process Steps in more detail. Process Steps are the resources which have been specified in a Process Design, and which can be executed multiple times. Process Steps can exist at many levels of granularity, and can involve the use of other Process Steps as sub-processes. The navigation among the sub-processes is performed during execution as indicated by a Process Control, which is itself an implementation of a Process Control Design.

 

36.               Individual executions of a Process Step are represented by the Process Step Instance. It is at this level that specific instances of the inputs and outputs are used. In the Process Design, the types of inputs and outputs are specified (Process Input Specification and Process Output Specification) - the actual instances of inputs and outputs are associated with the Process Step Instance, and are represented by the Process Input and Process Output objects. Inputs can be of any type of information - rules, parameters, data sets, metadata of many kinds, etc. Outputs are similarly of many different types, and often include process metrics and various types of reports, as well as data and metadata.
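The distinction between what is specified at design time and what is used at execution time might be sketched as follows. The classes below, the representation of a specification as a mapping from names to Python types, and the helper check_against_specification are all assumptions of this sketch rather than part of GSIM.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


def check_against_specification(values: Dict[str, Any],
                                specification: Dict[str, type],
                                kind: str) -> None:
    """Raise if `values` do not satisfy the named types in `specification`."""
    missing = set(specification) - set(values)
    if missing:
        raise ValueError(f"missing {kind}(s): {sorted(missing)}")
    for name, expected_type in specification.items():
        if not isinstance(values[name], expected_type):
            raise TypeError(f"{kind} '{name}' should be {expected_type.__name__}")


@dataclass
class ProcessDesign:
    """Design time: only the types of inputs and outputs are stated."""
    name: str
    input_specification: Dict[str, type]
    output_specification: Dict[str, type]


@dataclass
class ProcessStepInstance:
    """Execution time: one run, holding the actual inputs and outputs."""
    design: ProcessDesign
    inputs: Dict[str, Any]
    outputs: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        check_against_specification(self.inputs, self.design.input_specification, "input")

    def record_outputs(self, outputs: Dict[str, Any]) -> None:
        check_against_specification(outputs, self.design.output_specification, "output")
        self.outputs = outputs


design = ProcessDesign(
    name="impute missing values",
    input_specification={"data_set": list, "imputation_rules": dict},
    output_specification={"edited_data_set": list, "imputation_report": str},
)

# The same design is executed twice, each time with its own inputs.
run_march = ProcessStepInstance(design, {"data_set": [1, None], "imputation_rules": {}})
run_april = ProcessStepInstance(design, {"data_set": [2, 3], "imputation_rules": {}})
run_march.record_outputs({"edited_data_set": [1, 1], "imputation_report": "1 value imputed"})
```

Keeping the specification on the design and the values on the instance means that a design can be executed many times without being changed.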

 

37.               At the time the Process Design is executed someone or something needs to apply the designated method and rules. The Process Design can designate the Business Service that will implement the Process Method at the time of execution. A Business Service represents a service delivered by a piece of software (as described in the section above) or a person. Putting a publication on the statistical institute's website or putting collected response forms in a shared data source for further processing are both examples of Business Services.

 

38.               It should be noted that this model supports both automated and manual processes, and processes which might involve sub-processes of either type.

 

E: Exchanging Information

 

Figure 7. Exchange Channels

 

39.               Statistical organizations collect data and referential metadata from Information Providers, such as survey respondents and providers of Administrative Registers, and disseminate data to Information Consumers, such as government agencies, businesses and members of the public. Each of these exchanges of data and referential metadata uses an Exchange Channel, which describes the means to receive (data collection) or send (dissemination) information. Information Providers and Information Consumers can be Organizations or Individuals who are either within or external to the statistical organization.

 

40.               Different Exchange Channels are used for collection and dissemination. Examples of collection Exchange Channels include Questionnaire, Web Scraper Channel and Administrative Register. The only example of a dissemination Exchange Channel currently contained in GSIM is Product. Additional Exchange Channels can be added by organizations depending on their needs.

 

41.               The use of an Exchange Channel is governed by a Provision Agreement between the statistical organization and the Information Provider (collection) or the Information Consumer (dissemination). The Provision Agreement, which may be explicitly or implicitly agreed, provides the legal or other basis by which the two parties agree to exchange data. The parties also use the Provision Agreement to agree the Data Structure and Referential Metadata Structure of the information to be exchanged.

 

42.               The mechanism for exchanging information through an Exchange Channel is specified by a Protocol (e.g. SDMX web service, data file exchange, face to face interview).

 

43.               To collect data, a statistical organization receives data and referential metadata from the Information Provider in a manner consistent with the Protocol and the Provision Agreement, and the Exchange Channel produces an Information Set. To disseminate data, the Exchange Channel consumes an Information Set, which is then provided to the Information Consumer in a manner consistent with the Protocol and the Provision Agreement. More information about collection and dissemination can be found in the following sections.
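The two directions of exchange can be pictured with a minimal sketch in which an Information Set is simplified to a list of records. The classes, the counterparty and data_structure attributes, and the idea of reducing a Data Structure to a list of field names are assumptions of this sketch; the Protocol is carried only as a descriptive label.

```python
from dataclasses import dataclass
from typing import Dict, List

# An Information Set is simplified here to a list of records.
InformationSet = List[Dict[str, object]]


@dataclass
class ProvisionAgreement:
    """Agreement with a provider or consumer, including the agreed structure."""
    counterparty: str
    data_structure: List[str]  # agreed field names (a simplified Data Structure)


@dataclass
class ExchangeChannel:
    name: str
    protocol: str  # e.g. "face to face interview" or "data file exchange"
    agreement: ProvisionAgreement

    def collect(self, received: InformationSet) -> InformationSet:
        """Collection: produce an Information Set from what was received."""
        expected = set(self.agreement.data_structure)
        for record in received:
            if set(record) != expected:
                raise ValueError(f"record does not match agreed structure: {sorted(record)}")
        return [dict(record) for record in received]

    def disseminate(self, information_set: InformationSet) -> InformationSet:
        """Dissemination: consume an Information Set, keeping only agreed fields."""
        fields = self.agreement.data_structure
        return [{name: row[name] for name in fields} for row in information_set]


questionnaire = ExchangeChannel(
    "Questionnaire", "face to face interview",
    ProvisionAgreement("survey respondent", ["region", "turnover"]),
)
product = ExchangeChannel(
    "Product", "data file exchange",
    ProvisionAgreement("member of the public", ["region"]),
)

collected = questionnaire.collect([{"region": "North", "turnover": 120}])
published = product.disseminate(collected)
```

In both directions the Provision Agreement determines the structure of what is exchanged, while the channel determines whether an Information Set is produced or consumed.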

 


F: Collecting Information

 

 

Figure 8. Exchange Channels for collecting information

 

44.               GSIM models three collection Exchange Channel examples: Questionnaire, Web Scraper Channel and Administrative Register. Each of these is detailed in Annex A. Statistical organizations may collect data and referential metadata from Information Providers using additional Exchange Channels, such as file transfer, web services and data scanning. Statistical organizations can extend GSIM to add channels relevant to their context.

 

45.               The use of an Exchange Channel for collection is governed by a Provision Agreement between the statistical organization and the Information Provider. The two parties use the Provision Agreement to agree the Data Structure and Referential Metadata Structure of the data to be exchanged. The mechanism for collecting information through the Exchange Channel is specified by a Protocol (e.g. face to face interview, data file exchange, web robot). The collecting organization uses the collected information to produce an Information Set, which may contain data or referential metadata.

 

G: Processing and Analyzing Information

 

46.               GSIM is very flexible in describing the processing and analysis of information.

 

47.               One can understand the statistical production process from a data-centric perspective [1]. Statistical organizations strive to produce high-quality, accurate data that is supported by the metadata needed to make the data optimally useful. For this reason, it is appropriate to think of the evolution of data as it passes through the production process. The focus of many activities is driven by the metadata, but at the end of the production process the metadata is a supporting resource from the perspective of the data and ultimately a statistical product. The relationship of the data and metadata is one which is important to understand.

 

48.               Collected data comes into a statistical organization through an Exchange Channel. Regardless of how the data is collected and where it comes from, it is a resource which will begin a process of evolution through many different stages. The initial data is described as a Data Set with relevant Data Structures. Data Sets are stored in an organised way in a Data Resource. The Data Sets are the primary inputs and outputs of a set of Process Steps, as conducted by a Statistical Program.

 

49.               As the statistical organization moves from raw input data to an increasingly refined set of data, it can be understood that each phase of this processing adds additional Data Sets to the Data Resource. There are many different Process Methods which may inform these activities. These are implemented through the different Process Steps that the statistical organization undertakes.

 

50.               At a certain point (and this can take place at different places within the production process, depending on the type of edits being performed) the data will be analysed for the production of statistical Products. The analysis of the data can be understood as using Data Sets from the Data Resource as inputs to processes such as confidentiality routines or to produce explanations of the data. The operations performed during analysis will vary based on what the ultimate Products are - confidentialised unit-record data may be a Product, or we may be publishing aggregated indicators and tables to address specific policy issues, and these involve different types of analysis - but the process is still one of further evolving the information held in the Data Resource.

 

51.               In the past, there was an assumption that a data collection would be followed by processing, analysis, and dissemination of the statistical Products. This is a time-consuming and resource-intensive process. One way to make the functions of a statistical organization more efficient is to re-use data to produce new Products as they are asked for, lowering the cost and shortening the time needed for production. In this sense, the Data Resource can be understood as an organizational asset, to be managed and exploited to the greatest extent possible.

 


H: Disseminating Information

 


 

Figure 9. Exchange Channel for disseminating information

 

52.               A statistical organization disseminates statistical information to an Information Consumer.

 

53.               The Information Consumer accesses a set of information via a Product (or potentially via another Exchange Channel), which contains one or more Presentations. Each Presentation will typically provide a view of data and associated metadata to define and describe the structure of the presented data, and perhaps referential metadata in the form of textual media, such as quality reports.

 

54.               A Presentation can take different forms - for example, it could be a screen visualization of a table of data in graphical form displayed in an HTML page, a downloadable PDF, or an SDMX file in XML format.

 

55.               An Output Specification defines what is contained in the Presentation. A Product, which packages Presentations, may be a statistical organization’s standard specific output as one might see in:

  • a regular statistical bulletin (e.g. a monthly publication of the Retail Prices Index),
  • a dynamically generated package of statistical content which is generated following the receipt of a query from an Information Consumer who wishes to access the organization’s data via a published API (Application Programming Interface) or
  • some data exploration facility which might be built into the statistical organization’s website.

 

56.               The Output Specification also defines the information required from the Information Set for the Presentation. The specifications are frequently determined by a process internal to the organization, which has specific standard, static outputs to produce (such as the aforementioned statistical bulletins). For dynamically delivered products, aspects of the specification could be determined by the Information Consumer at run time, through machine-to-machine interaction, as exemplified in the API scenario above. In either case, the requests would result in the Output Specification specifying Information Set data and/or referential metadata that will be included in each Presentation.
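The static and dynamic cases can be contrasted in a small sketch. The OutputSpecification class below, its columns and filters attributes, and the comma-separated rendering are assumptions made for illustration; the figures in the example Information Set are invented sample values.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, List


@dataclass
class OutputSpecification:
    """States which parts of an Information Set go into a Presentation."""
    columns: List[str]
    filters: Dict[str, Any] = field(default_factory=dict)


def build_presentation(information_set: List[Dict[str, Any]],
                       spec: OutputSpecification) -> str:
    """Render the selected rows and columns as comma-separated text."""
    rows = [
        row for row in information_set
        if all(row.get(key) == value for key, value in spec.filters.items())
    ]
    lines = [",".join(spec.columns)]
    lines += [",".join(str(row[col]) for col in spec.columns) for row in rows]
    return "\n".join(lines)


# Invented sample values, used only to exercise the sketch.
information_set = [
    {"period": "2013-01", "region": "North", "index": 101.2},
    {"period": "2013-01", "region": "South", "index": 99.8},
]

# Static case: the organization fixes the specification in advance.
bulletin_spec = OutputSpecification(columns=["period", "region", "index"])
bulletin = build_presentation(information_set, bulletin_spec)

# Dynamic case: a consumer's query adjusts the specification at run time.
query_spec = replace(bulletin_spec, columns=["period", "index"],
                     filters={"region": "North"})
query_result = build_presentation(information_set, query_spec)
```

In both cases the Presentation is generated from the same Information Set; only the origin of the Output Specification differs.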

 

57.               The mechanism for providing a Product is specified by a Protocol (e.g. SDMX-ML, DDI XML, PDF etc.). This formatting information forms part of the Output Specification to generate the Product and its Presentations in the appropriate format.
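The relationship between static and dynamically determined Output Specifications described in paragraphs 56 and 57 can be illustrated with a short Python sketch. It is not part of GSIM: the class name mirrors the GSIM object, but the attribute names (`data_items`, `include_referential_metadata`, `protocol`) and the helper `resolve_specification` are assumptions made for this example.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class OutputSpecification:
    """What a Presentation should contain and how it is formatted."""
    data_items: Tuple[str, ...]          # parts of the Information Set to present
    include_referential_metadata: bool   # e.g. attach a quality report
    protocol: str                        # e.g. "PDF" or "SDMX-ML"


def resolve_specification(standard: OutputSpecification,
                          request: Optional[dict] = None) -> OutputSpecification:
    """Return the standard specification, overridden by any run-time request.

    With no request this models a static output such as a bulletin; with a
    request it models an Information Consumer querying through an API.
    """
    if not request:
        return standard
    return replace(standard, **request)


# Hypothetical standard specification for a monthly bulletin.
bulletin = OutputSpecification(("retail_prices_index",), True, "PDF")

# Hypothetical run-time request from an API client asking for another format.
api_spec = resolve_specification(bulletin, {"protocol": "SDMX-ML"})
```

In this sketch the Information Consumer changes only the format; the data items and the referential metadata flag are carried over from the standard specification.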

 

58.               The Information Consumer can be one of many forms depending upon the scenario of the request. The Information Consumer could be a person accessing the statistical organization’s website and visually inspecting the contents of a web page, or it could be a computer program requesting the information via an API using an SDMX query. The Information Consumer's access to the information would be subject to a Provision Agreement , which would set out the conditions of access and use. This might be in the form of passive acceptance of the terms and conditions of use of the data from a website the Information Consumer is accessing, or in the case of access to a greater level of detail via an API, it might be a more involved registration process.

 

 

III. Foundational Information

 

59.               The GSIM Concepts and Structures groups include information objects which are foundational to the statistical Business Process . That is, these objects are the conceptual and structural objects which are used as the Process Inputs and Process Outputs to the process. The Concepts area of GSIM includes the sets of information objects that describe and define the terms used when talking about the real-world phenomena that the statistics measure in their practical implementation. The Structures area includes the set of information objects used in relation to data and referential metadata and their structures. The objects described in this section of the document are used to provide information that helps users of data and metadata understand the results of Business Processes and Statistical Programs .

 

A: Concepts

 

 

Figure 10. Concepts

 

60.               At an abstract level, a Concept is defined in GSIM as a 'unit of thought differentiated by characteristics'. Concepts are used in different ways throughout the statistical lifecycle, and each different role of a Concept is described using a different information object (which are subtypes of Concept ). A Concept can be used in these situations:

 

(a) As a characteristic. The Concept is used by a Variable to describe the particular characteristic that is to be measured about a Population . For example, to measure the Concept of gender in a population of adults in the Netherlands, the Variable combines this Concept with the Unit Type person.

(b) As a Unit Type or a Population. The Concept describes the set of objects about which information is to be obtained in a statistical survey. For example, the Population of adults in the Netherlands, based on the Unit Type of persons.

(c) As a Category to further define details about a Concept . For example, Male and Female for the Concept of Gender. Codes can be linked to a Category via a Node (i.e., a Code Item or Classification Item ), for use within a Code List or Statistical Classification .

 

61.               Concept Systems are sets of Concepts which are structured by the relations between those Concepts . A Subject Field groups Concept Systems on the basis of their field of special knowledge (for example, labour market, tourism).

 

B. Population

 

 

Figure 11. Populations and Units

 

62.               There are several kinds of Population, depending on the Process Step in which the Population is used. For example, a statistical organization may refer to a target, survey, frame, or analysis population. The objects of interest in a statistical process are Units (for example, a particular person or a business). Data are collected about Units. There are two ways in which a unit is specified in the model. A Unit is an individual entity associated with a Population about which information may be obtained. A Unit Type (for example persons or businesses) is a way of identifying an abstract type of Unit that a Variable is measuring.

 

 

 

 

 

 


C. Node and Node Set

 

 

Figure 12. Node and Node Set inheritance

 

63.               A Category is a particular type of Concept whose role is to define a characteristic. There are three ways in which a Category can be used. In GSIM, these are described as the three subtypes of Node: Category Item, Code Item and Classification Item. Categories are grouped into Node Sets based on the way in which they can be used. There are three subtypes of these groups (Node Sets): Category Sets, Code Lists and Statistical Classifications.

 

64.               A Category Set is a set of Category Items , which contain the meaning of a Category without any associated representations. An example of a Category Set is: Male, Female.

 

65.               In a Code List , the Code Items contain the meaning of the Categories combined with a Code representation. An example of a Code List is: 1. Male, 2. Female.

 

66.               A Statistical Classification is similar to a Code List. It combines the meaning of the Category with a Code representation. However, the content of a Statistical Classification must fulfil certain criteria and have a certain status. The Classification Items must be mutually exclusive and jointly exhaustive for the Level at which they exist in the Statistical Classification. An example of a Statistical Classification is: 1. Male, 2. Female, 3. Intersex.

 

67.               A Code List does not have to satisfy the same criteria as the Statistical Classification . The Code List can also contain additional Code Items to support a particular use of the Code List , such as the inclusion of missing values.

 

68.               The similarities between Statistical Classifications , Code Lists and Category Sets are inherent through their link (as subtypes) to Node Set . Similarly, the three types of item which make up each group ( Classification Item , Code Item and Category Item respectively) are subtypes of Node .
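The inheritance described in paragraphs 63 to 68 can be sketched in Python as follows. This is an illustration only: the class names follow the GSIM object names, but the attribute names (`category`, `code`, `explanatory_text`, `nodes`) and the "Not stated" missing-value item are assumptions chosen for the example, not normative GSIM content.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Base type: every item carries the meaning of a Category."""
    category: str


@dataclass
class CategoryItem(Node):
    """Meaning only, without any associated representation."""


@dataclass
class CodeItem(Node):
    """Meaning combined with a Code representation."""
    code: str


@dataclass
class ClassificationItem(Node):
    """Meaning, Code and additional information such as inclusions."""
    code: str
    explanatory_text: str = ""


@dataclass
class NodeSet:
    """Base type for the three kinds of grouping."""
    nodes: List[Node] = field(default_factory=list)


class CategorySet(NodeSet):
    """A set of Category Items."""


class CodeList(NodeSet):
    """A set of Code Items; may include items for special uses."""


class StatisticalClassification(NodeSet):
    """A set of Classification Items that must meet stricter criteria."""


category_set = CategorySet([CategoryItem("Male"), CategoryItem("Female")])

# A Code List may carry an extra item, here a hypothetical missing-value code.
code_list = CodeList([
    CodeItem("Male", "1"),
    CodeItem("Female", "2"),
    CodeItem("Not stated", "9"),
])
```

Because all three groupings share the `NodeSet` base, code that only needs the list of items can treat them uniformly, which mirrors the similarity noted in paragraph 68.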

 


D. Statistical Classification

 

69.               This section describes a Statistical Classification and its related management objects, as a particular view of the Node Set portion of GSIM. Further detail about Statistical Classifications in particular can be found in the GSIM Statistical Classification Model.

 

 

Figure 13. Statistical Classifications

 

70.               The figure above provides an overview of the objects relating to Statistical Classifications .

 

71.               A Classification Family is a group of Classification Series related on the basis of a common Concept (e.g. economic activity). A Classification Series is an ensemble of one or more Statistical Classifications that are based on the same Concept. The Statistical Classifications in a Classification Series are related to each other as versions or updates. Typically, these Statistical Classifications have the same name, for example International Standard Industrial Classification of All Economic Activities (ISIC), or International Standard Classification of Occupations (ISCO).

 

72.               A Statistical Classification is a set of Categories which may be assigned to one or more Represented Variables used in the production and dissemination of statistics. The Categories at each Level of the classification structure must be mutually exclusive and jointly exhaustive of all objects/units in the population of interest. One example of a Statistical Classification is ISIC rev 4.

 

73.               The Categories are defined to reference one or more characteristics of a particular population of interest. A Statistical Classification may have a flat, linear structure or may be hierarchically structured, such that all Categories at lower Levels are sub-categories of a Category at the next Level up.

 

74.               A Statistical Classification has Categories that are represented by Classification Items . These Classification Items are organised into Levels determined by the hierarchy. A Level is a set of Concepts that are mutually exclusive and jointly exhaustive; for example: section, division, group and class in ISIC rev 4.
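The organisation of Classification Items into Levels can be sketched in Python. The sketch below uses one branch of ISIC Rev 4 as sample content; the attribute names (`level`, `parent`) and the helper functions are assumptions made for the illustration, and the item titles should be checked against the official ISIC publication before reuse.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ClassificationItem:
    code: str
    name: str
    level: str                    # name of the Level the item belongs to
    parent: Optional[str] = None  # code of the item at the next Level up


# One branch of ISIC Rev 4, from section down to class.
ITEMS: List[ClassificationItem] = [
    ClassificationItem("A", "Agriculture, forestry and fishing", "section"),
    ClassificationItem("01", "Crop and animal production, hunting and "
                       "related service activities", "division", "A"),
    ClassificationItem("011", "Growing of non-perennial crops", "group", "01"),
    ClassificationItem("0112", "Growing of rice", "class", "011"),
]

BY_CODE: Dict[str, ClassificationItem] = {item.code: item for item in ITEMS}


def ancestors(code: str) -> List[str]:
    """Codes of the items above `code`, nearest Level first."""
    chain: List[str] = []
    current = BY_CODE[code].parent
    while current is not None:
        chain.append(current)
        current = BY_CODE[current].parent
    return chain


def codes_at_level(level: str) -> List[str]:
    """Codes of all items that belong to the named Level."""
    return [item.code for item in ITEMS if item.level == level]
```

Walking the `parent` links recovers the hierarchy, so each item at a lower Level is reachable from exactly one item at the next Level up.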

 

75.               A Classification Item combines the meaning from a Category , its representation (i.e., Code ) and additional information in order to meet the Statistical Classification criteria, for example "A- agriculture, forestry and fishing" and accompanying explanatory text such as information about what is included and excluded.

 

76.               Statistical Classifications can be versions or variants. A variant type of Statistical Classification is based on a version type of Statistical Classification. In a variant the Categories of the version may be split, aggregated or regrouped to provide additions or alternatives to the standard order and structure of the original Statistical Classification .

 

77.               A Correspondence Table is a set of Maps . These Maps link a Classification Item in a Statistical Classification with a corresponding Classification Item in another Statistical Classification via the Concept which is common to both Classification Items . For example, in a Correspondence Table displaying the relationship between ISIC rev 4 and the North American Industry Classification System (NAICS 2007 (US)), "0112 - Growing of Rice" in ISIC Rev 4 is related to "111160 - Rice Farming" in NAICS through the common concept of "growing rice".
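A Correspondence Table can be pictured as a simple collection of Maps. The Python sketch below encodes the rice example from the paragraph above; the attribute names (`source_code`, `target_code`, `concept`) and the lookup method are assumptions made for the illustration rather than GSIM attribute definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Map:
    """Links one Classification Item to another through a shared Concept."""
    source_code: str
    target_code: str
    concept: str


@dataclass
class CorrespondenceTable:
    source_classification: str
    target_classification: str
    maps: List[Map]

    def targets_for(self, source_code: str) -> List[str]:
        """Target codes that correspond to the given source code."""
        return [m.target_code for m in self.maps
                if m.source_code == source_code]


table = CorrespondenceTable(
    "ISIC Rev 4",
    "NAICS 2007 (US)",
    [Map("0112", "111160", "growing rice")],
)
```

Returning a list rather than a single code allows for one-to-many correspondences, where one item in the source classification is split across several items in the target.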

 

78.               A Classification Index shows the relationship between text found in statistical data sources (responses to survey questionnaires, administrative records) and one or more Statistical Classifications . A Classification Index may be used to assign the Codes for Classification Items to observations in Statistical Programs .

 

79.               A Classification Index Entry is a word or short text (e.g. the name of a locality, an economic activity or an occupational title) describing a type of Concept to which a Classification Item applies, together with the Code of the corresponding Classification Item . Each Classification Index Entry typically refers to one item of the Statistical Classification . Although a Classification Index Entry may be associated with a Classification Item at any Level of a Statistical Classification , they are normally associated with Classification Items at the lowest Level .
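The use of a Classification Index to assign Codes can be sketched as a text lookup. In the Python example below the index entries are hypothetical, invented for the illustration; only the code 0112 (Growing of rice in ISIC Rev 4) comes from this document. The function name `assign_code` is likewise an assumption.

```python
from typing import Dict, Optional

# Hypothetical Classification Index Entries: short texts as they might appear
# in survey responses, each paired with the Code of a Classification Item.
CLASSIFICATION_INDEX: Dict[str, str] = {
    "growing of rice": "0112",
    "rice farm": "0112",
    "paddy cultivation": "0112",
}


def assign_code(response_text: str) -> Optional[str]:
    """Return the Code for a response, or None when no entry matches.

    The text is lower-cased and runs of whitespace are collapsed before the
    lookup, so minor formatting differences do not prevent a match.
    """
    key = " ".join(response_text.lower().split())
    return CLASSIFICATION_INDEX.get(key)
```

A production coding system would normally add fuzzy matching and manual review of unmatched responses; this sketch shows only the exact-match core.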

 

 

 

E. Variable

 

 

Figure 14. Variable

 

80.               When used as part of a Business Process, a Unit Type defining a Population is associated with a characteristic. The association of Unit Type and a Concept playing the role of a characteristic is called a Variable (see Figure 14). For example, if the Population is adults in the Netherlands, then a relevant Variable might be the Concept educational attainment combined with the Unit Type person.

 

81.               The Variable (person’s educational attainment) does not include any information on how the resulting value may be represented. This information (the Value Domain ) is associated with the Represented Variable . This distinction promotes the reuse of a Variable definition when what is being measured is conceptually the same but it is represented in a different manner.

 

82.               A derived variable is created by a Process Step that applies a Process Method to one or more Process Inputs ( Variables ). The Process Output of the Process Step is the derived variable.

 

83.               A Conceptual Domain is associated with a Variable . It has two subtypes: Described Conceptual Domain and Enumerated Conceptual Domain . An Enumerated Conceptual Domain , in combination with a Category Set, contains information on the semantics of the Categories used by the Variable .

 

F. Represented Variable

 

84.               GSIM assists users in understanding both the meaning and the concrete data-representation of the object. Accordingly, GSIM distinguishes between conceptual and representation levels in the model, to differentiate between the objects used to conceptually describe information, and those that are representational.

 

 

Figure 15. Represented Variable

 

85.               The Represented Variable (see Figure 15) adds information that describes how the resulting values may be represented through association with a Value Domain . While Conceptual Domains are associated with a Variable, Value Domains are associated with a Represented Variable . These two domains are distinguished because GSIM separates the semantic aspect ( Conceptual Domain ) and the representational aspect ( Value Domain ).

 

86.               Both the Enumerated Value Domain and the Described Value Domain (the two subtypes of Value Domain ) give information on how the Represented Variable is represented. The Enumerated Value Domain does this in combination with a Code List, while the Described Value Domain provides a definition of how to form the values, rather than explicitly listing them.

 

87.               The Value Domain includes data type and unit of measure information. The data type contains information on the allowed computations one may perform on the Datum (nominal-, ordinal-, interval-data, etc.), while the unit of measure (Tonnes, Count of __, Dollars, etc.) refines the measure of the Value Domain . For example gender codes lead to nominal statistical data, whereas age values in years lead to interval data.
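The separation between a Variable and its representations can be sketched in Python. The example below reuses one Variable with two different Enumerated Value Domains and adds a Described Value Domain for age. The attribute names, the `allows` method and the specific codes are assumptions made for the illustration, not GSIM definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class Variable:
    """A Concept (characteristic) combined with a Unit Type."""
    concept: str
    unit_type: str


@dataclass
class EnumeratedValueDomain:
    """Values are listed explicitly, here as a code-to-category mapping."""
    code_list: Dict[str, str]
    data_type: str = "nominal"

    def allows(self, value: object) -> bool:
        return value in self.code_list


@dataclass
class DescribedValueDomain:
    """Values are defined by a rule instead of being listed."""
    description: str
    data_type: str
    unit_of_measure: str
    rule: Callable[[object], bool]

    def allows(self, value: object) -> bool:
        return self.rule(value)


@dataclass
class RepresentedVariable:
    variable: Variable
    value_domain: Union[EnumeratedValueDomain, DescribedValueDomain]


gender = Variable("gender", "person")

# The same Variable represented in two different ways.
gender_numeric = RepresentedVariable(
    gender, EnumeratedValueDomain({"1": "Male", "2": "Female"}))
gender_letters = RepresentedVariable(
    gender, EnumeratedValueDomain({"M": "Male", "F": "Female"}))

age_in_years = RepresentedVariable(
    Variable("age", "person"),
    DescribedValueDomain(
        "non-negative whole number", "interval", "years",
        lambda v: isinstance(v, int) and v >= 0),
)
```

Both gender representations share a single `Variable` definition, which is the reuse that the conceptual and representational split is intended to support.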


G. Instance Variable

 

Figure 16. Instance Variable

 

88.               An Instance Variable (see Figure 16) is a Represented Variable that has been associated with a Data Set . This can correspond to a column of data in a database. For example, the “age of all the US presidents either now (if they are alive) or the age at their deaths” is a column of data described by an Instance Variable , which is a combination of the Represented Variable describing "Person’s Age" and the Value Domain of "decimal natural numbers (in years)".

 

89.               A Datum is contained within a Data Point in a Data Set . It may be defined by the measure of a Value Domain associated with a describing Instance Variable , combined with the link to a Unit (for unit data), or a Population (for dimensional data).

 

H. Information Resources

 

 

Figure 17. Information Resources

 

90.               Statistical organizations collect, process, analyse and disseminate Information Sets , which are either data ( Data Sets ) or referential metadata ( Referential Metadata Sets ).

 

91.               Each Data Set must be structured according to a Data Structure (for example, a structure for Balance of Payments, Demography, Tourism, Education etc.). In the same way, a Referential Metadata Set must be structured according to a Referential Metadata Structure (e.g. an organization’s quality framework).

 

92.               Information Resources contain Information Sets . The main purpose of the Information Resource is to aid discovery and management of Information Sets , by providing location and other information relevant to these tasks. There are two types of Information Resource . Data Resources contain Data Sets , and Referential Metadata Resources contain Referential Metadata Sets .

 

I. Data Sets

 

 

Figure 18. Data Sets

 

93.               A Data Set has Data Points. A Data Point is a placeholder (for example, an empty cell in a table) in a Data Set for a Datum. The Datum is the value that populates that placeholder (for example, an item of factual information obtained by measurement or created by a production process). A Data Structure describes the structure of a Data Set by means of Data Structure Components (Identifier Components, Measure Components and Attribute Components). These are all Represented Variables with specific roles.

 

94.               Data Sets come in different forms, for example as Administrative Registers, Time Series, Panel Data, or Survival Data, just to name a few. The type of a Data Set determines the set of specific attributes to be defined, the type of Data Structure required ( Unit Data Structure or Dimensional Data Structure ), and the methods applicable to the data.

 

95.               For instance, an administrative register is characterized by a Unit Data Structure , with attributes such as its original purpose or the last update date of each record. It contains a record identifying variable, and can be used to define a Population that is used as a frame, to replace or complement existing surveys, or as an auxiliary input to imputation. Record matching is an example of a method specifically relevant for registers.

 

96.               An example of a type of Data Set defined by a Dimensional Data Structure is a time series. It has specific attributes such as frequency and type of temporal aggregation, must contain a temporal variable, and has specific applicable methods, for example seasonal adjustment.

 

97.               Unit data and dimensional data are perspectives on data.  Although not typically the case, the same set of data could be described both ways.  Sometimes what is considered dimensional data by one organization (for example, a national statistical office) might be considered unit data by another (for example, Eurostat where the unit is the member state).  A particular collection of data need not be considered to be intrinsically one or the other. This matter of perspective is conceptual. In GSIM, the distinction is that a Unit Data Set contains data about Units and a Dimensional Data Set contains data about either Units or Populations.

 

98.               GSIM states that all Data Sets must have a structure associated with them. There are, however, cases where a Data Set has no known structure, because the structure was not stored, was lost, or is not known. This type of data may become more prevalent for statistical organizations in the future. In order for a statistical organization to use this data, the data will need to go through a process of being structured. For example, when new potential data sources are investigated for a new or changed Statistical Need, there will need to be a process in which these new data are analyzed to determine their content and structure. It is only after this process that these new Data Sets can be described using the Data Structure objects. This unstructured data is currently described by GSIM as a Process Input. Organizations could extend GSIM to capture this use case by creating a new subtype of the Information Set object.

 


J. Dimensional and Unit Data Structures

 

 

Figure 19. Data Structures

 

99.               A Dimensional Data Structure describes the structure of a Dimensional Data Set by means of Represented Variables with specific roles.

 

100.               The combination of dimensions contained in a Dimensional Data Structure creates a key or identifier of the measured values. For instance, country, indicator, measurement unit, frequency, and time dimensions together identify the cells in a cross-country time series with multiple indicators (for example, gross domestic product, gross domestic debt) measured in different units (for example, various currencies, percent changes) and at different frequencies (for example, annual, quarterly). The cells in such a multi-dimensional table contain the observation values.
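The way a combination of dimensions identifies a cell can be sketched in Python using a tuple as the key. The dimension names follow the example in the paragraph above; the function name `make_key`, the codes used ("NL", "GDP", "EUR", "A") and the observation value are placeholders invented for the illustration and are not real statistics.

```python
from typing import Dict, Tuple

# The dimensions of the Dimensional Data Structure, in a fixed order.
DIMENSIONS = ("country", "indicator", "measurement_unit", "frequency", "time")

Key = Tuple[str, ...]


def make_key(**values: str) -> Key:
    """Build the identifying key for one cell.

    Every dimension must be supplied exactly once; otherwise the
    combination would not identify a single observation value.
    """
    if set(values) != set(DIMENSIONS):
        raise ValueError(f"a key needs exactly these dimensions: {DIMENSIONS}")
    return tuple(values[name] for name in DIMENSIONS)


observations: Dict[Key, float] = {}

gdp_key = make_key(country="NL", indicator="GDP", measurement_unit="EUR",
                   frequency="A", time="2012")
observations[gdp_key] = 100.0  # placeholder value, not a real figure
```

Because the key is built in the fixed order given by `DIMENSIONS`, two requests for the same cell always produce the same tuple regardless of the order in which the dimension values are supplied.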

 

101.               A measure is the variable that provides a container for these observation values. It takes its semantics from a subset of the dimensions of the Dimensional Data Structure. In the previous example, indicator and measurement unit can be considered as those semantics-providing dimensions, whereas frequency and time are the temporal dimensions and country the geographic dimension. An example of a measure in addition to the plain 'observation value' could be 'pre-break observation value' in the case of a time series. Dimensions typically refer to Represented Variables with coded Value Domains (Enumerated Value Domains), measures to Represented Variables with uncoded Value Domains (Described Value Domains).

 

102.               A Unit Data Structure describes the structure of a Unit Data Set by means of Represented Variables with specific roles. It distinguishes between the logical and physical structure of a Data Set . A Unit Data Set may contain data on more than one type of Unit , each represented by its own record type.

 

103.               Logical Records describe the structure of such record types, independent of physical features, by referring to Represented Variables that may include a unit identification (for example, household number). A Record Relationship defines source-target relations between Logical Records.

 

K. Referential Metadata Sets

Figure 20. Referential Metadata Sets

 

104.               Information that describes the characteristics of statistics is “referential metadata”. These metadata can be broad, such as about an entire Statistical Program , or narrow, such as about an individual Data Point . Referential Metadata Resources , a special type of Information Resource, provide top-level containers for referential metadata.

 

105.               A Referential Metadata Set organizes referential metadata, whose structure is defined in a Referential Metadata Structure. A Referential Metadata Structure specifies both the Referential Metadata Subject for which referential metadata may be included, and a structured list of Referential Metadata Attributes that can be reported or authored for the given Referential Metadata Subject .

 

106.               These subjects may be any GSIM object type, or any Data Point or set of Data Points created from a specific Data Structure.

 

  • Example of a GSIM object type as a Referential Metadata Subject: Product for which there is a list specified in a Value Domain. The Value Domain specifies the list of actual Products for which referential metadata can be reported or authored using this Referential Metadata Structure.
  • Examples of Referential Metadata Attributes include status, coverage, methodology description, and quality indicator.

 

107.               A Referential Metadata Set contains the actual referential metadata reported or authored . The Referential Metadata Subject Item identifies the actual object e.g. actual Product such as Balance of Payments and International Investment Position, Australia, June 2013, or actual Data Points such as the Data Points for a single region within a Data Set covering all regions for a country.

 

108.               The Referential Metadata Content Item is the actual metadata for the identified Referential Metadata Subject Item . Each Referential Metadata Content Item contains the reported referential metadata for one Referential Metadata Attribute specified in the Referential Metadata Structure .
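The relationship between a Referential Metadata Structure and the Referential Metadata Sets that follow it can be sketched in Python. The attribute list reuses the examples given above (status, coverage, methodology description, quality indicator); the class attribute names, the `report` method and the content text are assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class ReferentialMetadataStructure:
    """Names the subject type and the attributes that may be reported."""
    subject: str
    attributes: Tuple[str, ...]


@dataclass
class ReferentialMetadataSet:
    """Holds the reported metadata for one Subject Item."""
    structure: ReferentialMetadataStructure
    subject_item: str
    content_items: Dict[str, str] = field(default_factory=dict)

    def report(self, attribute: str, text: str) -> None:
        """Store one Content Item, rejecting attributes not in the structure."""
        if attribute not in self.structure.attributes:
            raise ValueError(f"{attribute!r} is not defined in the structure")
        self.content_items[attribute] = text


structure = ReferentialMetadataStructure(
    subject="Product",
    attributes=("status", "coverage", "methodology description",
                "quality indicator"),
)

metadata_set = ReferentialMetadataSet(
    structure,
    subject_item="Balance of Payments and International Investment Position, "
                 "Australia, June 2013",
)
metadata_set.report("status", "placeholder text for the status attribute")
```

Each entry in `content_items` plays the role of one Referential Metadata Content Item: the reported value for a single attribute of the identified Subject Item.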

 

Table 1. Example of Use of GSIM Referential Metadata Objects

 

GSIM Object | ONS Statistical bulletin: Public Sector Finances, October 2013: Table 1
Referential Metadata Structure | Implicit
Referential Metadata Subject | Data Structure Component
Referential Metadata Attribute | Table footnote
Referential Metadata Set | Footnotes
Referential Metadata Subject Item | Data Structure Component: billion; PS Current Budget; PS Current Budget ex APF;…
Referential Metadata Content Item | Footnoted text

 

 

IV. Technical information

 

109.               These objects can be seen as the fundamental building blocks that support many of the other objects and relationships in the model. They form the nucleus for the application of GSIM objects. They provide features which are reusable by other objects to support functionality such as identity, versioning etc.

 

110.               The GSIM Base Group consists of two sets of objects:

 

  1. Those that give identity and administrative details that are re-usable by other information objects.
  2. Those that model the organizations and individuals that may provide or consume data and referential metadata.

 

A: Identity and Administrative Details

Figure 21. Identifiable Artefacts

 

111.               The only base artefact in GSIM that gives underlying identity is the Identifiable Artefact . It can be inherited by any class in GSIM for which identity is required.

 

112.               There is no attempt in GSIM to model the administration of items in repositories such as the maintenance agency, versioning, repository functions. However, the Identifiable Artefact does have a link to Administrative Details where such details can be added using the GSIM extension methodology.

 


B. Information Providers, Information Consumers, Organizations, and Individuals

 

 

Figure 22. Agents

 

113.               Information Providers and Information Consumers are the respective sources for and the targets of data and referential metadata collection and dissemination. Each Agent can play the role of Information Provider or Information Consumer in a particular context of collection or dissemination. The same Agent may play the role of Information Provider in one context and the role of Information Consumer in another context. For any one Agent Role there must be a single Agent that plays the role: this is the actual Organization or Individual that is the Information Provider or Information Consumer.

 

114.               If the Agent is an Organization then it is possible to specify the structure of the Organization in terms of sub Organizations or Individuals.

 


 

Annex A: Exchange Channels for Data Collection

 

115.               All data collection is modelled in GSIM using the Exchange Channel object, which represents the mechanism by which data comes into the statistical organization. This object is always extended into sub-classes, to describe specific sources of data collection. There is a growing emphasis on the use of non-survey data sources, as these often represent sources of data which can be realized more quickly and at lower cost. The model can be extended by adding further sub-classes to represent other sources, as required.

 

116.               Two common forms of data collection are the use of data from administrative registers, and the collection of data by programmatically "scraping" web sites for their content. To reflect this, GSIM models two non-survey data sources - Administrative Registers and Web Scraper Channels. It also models one survey data source - Questionnaire . The following sections describe how each of these is modelled in GSIM.

 

A: Administrative Registers

 

117.               The illustration below shows how GSIM can model administrative registers as data sources. The subtype of Exchange Channel that represents an administrative register as a source is the Administrative Register object.

 

 

Figure 23. Administrative Register

 

118.               The important information about Administrative Registers includes:

 

  • the agreement between the statistical organization and the provider of the register data,
  • the protocol for accessing the data, and
  • the structure of the data to be received.

 

119.               Each of these can be described by information objects which are inherited from the Exchange Channel by the subtype Administrative Register .

 

120.               The agreement between the statistical organization and the Information Provider is represented using a Provision Agreement object. This shows the relationship between our Administrative Register and the Organization with which the agreement exists (the Information Provider ). There is typically an agreed structure for the data - described in the data collection agreement - but this can sometimes be different from the structure of the data actually received. The Information Provider object has a relationship to the Data Structure object. This represents the agreed structure of the information to be collected from the administrative register.

 

121.               The Administrative Register object also inherits a relationship to a Protocol object from its parent Exchange Channel. The Protocol object captures the details of the technical process by which the register data is to be collected. This might be through the use of a standard mechanism such as an SDMX data exchange, a technical mechanism such as a query to a database, or even a manual process.

 

122.               The Exchange Channel object also allows for its Administrative Register sub-type to link to the Data Set actually collected, which references its own Data Structure object. By comparing the collected data and its structure against the "agreed" structure, the received data can be validated. Note that if the information being collected were referential metadata, the Referential Metadata Structure object would replace the Data Structure in the diagram, and similarly the Referential Metadata Set would replace the Data Set .
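The comparison of the received structure against the agreed structure can be sketched in Python. The sketch treats each Data Structure as a set of component names, which is a simplification; the function name `compare_structures` and the component names used are assumptions invented for the illustration.

```python
from typing import List, NamedTuple, Set


class StructureComparison(NamedTuple):
    missing: List[str]      # agreed components absent from the received data
    unexpected: List[str]   # received components that were not agreed

    @property
    def is_valid(self) -> bool:
        return not self.missing and not self.unexpected


def compare_structures(agreed: Set[str],
                       received: Set[str]) -> StructureComparison:
    """Compare the agreed Data Structure with the one actually received."""
    return StructureComparison(
        missing=sorted(agreed - received),
        unexpected=sorted(received - agreed),
    )


# Hypothetical component names for a register of businesses.
agreed_structure = {"business_id", "legal_name", "registration_date"}
received_structure = {"business_id", "legal_name", "postcode"}

result = compare_structures(agreed_structure, received_structure)
```

A fuller validation would also compare the Value Domain of each component, but a difference in component names is often the first sign that a register has changed.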

 

123.               Once a Data Set is collected, we have all of the usual objects such as the Data Point , the Instance Variable , and so on. As more Data Sets are collected, these can in turn be stored in a Data Resource , which would hold all of the data coming from the Administrative Register over time.

 

B. Web Scraping for Data Collection

 

124.               The second non-questionnaire data source to be modelled is a web scraper, as seen in the diagram below.

 

Figure 24. Web Scraping Channel

 

125.               There will be at least a notional Provision Agreement between the statistical organization collecting the data through the web scraper and each of the organizations whose sites are being scraped (the Information Provider ), even if this is only the terms and conditions of accessing the data provider's website. In many cases, Internet robots used to do web scraping are blocked from websites, and there is typically contact between the scraping organization and the data providing one, to make sure that access is not blocked, and to know when the website's structure might change.

 

126.               Although perhaps trivial, the Protocol being used will need to be recorded, being either HTTP or HTTPS (by definition, the scraping tool is operating on the web).

 

127.               Each website is scraped using a software application. Due to the varying structure of different websites, often a different software tool will be needed for each website. Further, every time the website being scraped is structurally modified, adjustments may need to be made to the software tool. The software tools themselves are represented as Process Steps in GSIM. They result from a design process administered through a Statistical Program and can be executed to collect the data programmatically.

 

128.               The management of the mappings between each website and the software tool used to scrape it is important information to capture. It is necessary to be able to describe the software tools used to scrape websites, and their link to the websites for which they are designed. This is done using the Scraping Process Map object. This object links a Process Step and one or more Information Providers (the organizations whose sites that software tool can scrape). A set of these gives the links needed to manage the mappings between the web scraping tools and the sites from which the data is collected.
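The role of the Scraping Process Map can be sketched in Python as a link between one Process Step and the Information Providers whose sites it can scrape. The attribute names, the helper `process_step_for` and the tool and provider names below are hypothetical, invented for the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ScrapingProcessMap:
    """Links a Process Step (the scraping tool) to its Information Providers."""
    process_step: str
    information_providers: Tuple[str, ...]


def process_step_for(provider: str,
                     maps: List[ScrapingProcessMap]) -> Optional[str]:
    """Find the Process Step designed for a given provider's website."""
    for entry in maps:
        if provider in entry.information_providers:
            return entry.process_step
    return None


# Hypothetical tools and providers.
scraping_maps = [
    ScrapingProcessMap("scrape_supermarket_prices",
                       ("Retailer A", "Retailer B")),
    ScrapingProcessMap("scrape_airfares", ("Airline C",)),
]
```

Keeping these links in one place makes it possible to identify which Process Step needs adjusting when a particular provider changes the structure of its website.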

 

129.               As with the Administrative Register above, the structure of the data to be collected and the information regarding the actual data collected are captured in the Data Set and Data Structure objects.

 

C: Survey Data Collection

 

Figure 25. Questionnaire

 

130.               Although more and more alternative data collection methods (such as Administrative Register sources) will be utilized by statistical organizations, it is envisaged that for the foreseeable future, surveys will continue to make extensive use of questionnaires for the purpose of data collection. As such, Questionnaire is included in GSIM as a subtype of Exchange Channel.

 

131.               The Provision Agreement establishes the relationship between the Questionnaire and the Information Provider, in the form of some agreement to provide data to the collecting organization. This is sometimes (especially in the case of collections for official business statistics) a mandatory requirement specified by law.

 

132.               Depending upon the survey it will be used for, the Questionnaire could be developed as one or more generic types. Each instance of a Questionnaire will be constructed by reference to the Questionnaire Specification. A Questionnaire could take the form of a standard Questionnaire Specification (i.e. the layout would be the same or have a relatively small number of variations) for a particular survey, or at the other extreme, the Questionnaire Specification could be tailored to each Information Provider (or Unit) selected for the survey.

 

133.               The Questionnaire Specification will consist of a top level Questionnaire Component, which will itself be made up of lower level Questionnaire Components, built up in a hierarchical manner. Each Questionnaire Component will in turn be made up of a number of Instance Question Blocks, Instance Questions, and Instance Statements.

 

134.               In its simplest form, a Questionnaire Specification would have a single Questionnaire Component made up of a number of simple Instance Question Blocks, Instance Questions and Instance Statements, but will also have associated Questionnaire Logic, which will govern the navigation and validation of Questions and responses within the Questionnaire Specification. The Questionnaire Logic will implement a number of Rules, which will carry out such work as the evaluation of the response data in terms of the range of acceptable values. In most cases, the Questionnaire Specification will be built up using several Questionnaire Component levels, each with its associated Questionnaire Logic.
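The hierarchy of Questionnaire Components and their Questionnaire Logic can be illustrated with a short sketch. The class layout, the field names and the sample range checks below are assumptions made for the example; GSIM itself does not prescribe an implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A validation Rule, here a check on a single response (hypothetical shape)."""
    question_id: str
    message: str
    is_valid: Callable[[float], bool]


@dataclass
class QuestionnaireLogic:
    """Holds the Rules attached to one Questionnaire Component."""
    rules: List[Rule] = field(default_factory=list)

    def validate(self, responses: Dict[str, float]) -> List[str]:
        """Return the messages of the Rules that the responses violate."""
        return [
            rule.message
            for rule in self.rules
            if rule.question_id in responses
            and not rule.is_valid(responses[rule.question_id])
        ]


@dataclass
class QuestionnaireComponent:
    """A component with its own logic and optional lower level components."""
    name: str
    instance_questions: List[str] = field(default_factory=list)
    logic: QuestionnaireLogic = field(default_factory=QuestionnaireLogic)
    components: List["QuestionnaireComponent"] = field(default_factory=list)

    def validate(self, responses: Dict[str, float]) -> List[str]:
        """Apply this component's logic, then that of each lower level component."""
        errors = self.logic.validate(responses)
        for component in self.components:
            errors.extend(component.validate(responses))
        return errors


# Illustrative two-level specification with range checks.
employment = QuestionnaireComponent(
    name="Employment",
    instance_questions=["hours_worked"],
    logic=QuestionnaireLogic(
        [Rule("hours_worked", "hours_worked must be between 0 and 168", lambda v: 0 <= v <= 168)]
    ),
)
top_level = QuestionnaireComponent(
    name="Household survey",
    instance_questions=["age"],
    logic=QuestionnaireLogic(
        [Rule("age", "age must be between 0 and 120", lambda v: 0 <= v <= 120)]
    ),
    components=[employment],
)
```

Calling `top_level.validate(...)` on a dictionary of responses walks the hierarchy top-down, so each level only needs to know about its own Rules.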

 

135.               Question Block, Question and Statement are reusable artifacts, which will be implemented in the Questionnaire Specification by means of the Instance Question Blocks, Instance Questions and Instance Statements respectively. It might be that the actual Question Blocks, Questions and Statements would be stored in some searchable library for use during the Questionnaire Specification development process.

 

136.               Questions can take the form of a multiple question item, and can be hierarchical. A Question will have a connection to one or more Variables, and will also be associated with a Value Domain, specifying the constraints on the values which can be assigned to the Variables in the response to the Question.
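As a rough illustration of how a Value Domain constrains the responses to a Question, the sketch below models the two kinds of Value Domain listed in the glossary (enumerated and described). The class shapes, the `accepts` method and the sample questions are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class EnumeratedValueDomain:
    """Value Domain given as an explicit set of permitted codes."""
    codes: FrozenSet[str]

    def contains(self, value: object) -> bool:
        return isinstance(value, str) and value in self.codes


@dataclass(frozen=True)
class DescribedValueDomain:
    """Value Domain given by an expression (here, a closed numeric range)."""
    minimum: float
    maximum: float

    def contains(self, value: object) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        return self.minimum <= value <= self.maximum


@dataclass(frozen=True)
class Question:
    """A Question linked to one or more Variables and one Value Domain."""
    text: str
    variables: Tuple[str, ...]
    value_domain: Union[EnumeratedValueDomain, DescribedValueDomain]

    def accepts(self, response: object) -> bool:
        """True if the response lies within the associated Value Domain."""
        return self.value_domain.contains(response)


# Illustrative questions only.
sex_question = Question(
    "What is your sex?", ("sex",), EnumeratedValueDomain(frozenset({"m", "f", "o"}))
)
age_question = Question(
    "What is your age in years?", ("age",), DescribedValueDomain(0, 120)
)
```

For instance, `sex_question.accepts("m")` is true, while `age_question.accepts(150)` is false because 150 lies outside the described range.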

 

137.               Different Protocols (modes of collection) would require different implementations of Questionnaire. For example, suppose one Questionnaire Specification is designed for collection via a web page, and a similar Questionnaire Specification, containing all the same Question Blocks and Questions, is intended for collection via a printed paper form. The two would be implemented in different instances of Questionnaire. Thus, where a multi-mode data collection strategy is adopted for a survey, a separate Questionnaire Specification would need to be developed for each Protocol (mode) employed, and each would be implemented in a different Questionnaire instance.

 

138.               The navigation and validation aspects within the Questionnaire will need to be designed with the Protocol (mode of capture) in mind. For example, if the Questionnaire is to be rendered as a paper form, then the navigation will be implemented using an Instance Statement in the form of a text instruction to the Information Provider such as "If the response to gender question is 'MALE' then go to question X". If a similar Questionnaire were to be rendered as a web form, then the navigation could be automated and the Information Provider would be automatically routed to 'question X'.
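The difference between the two modes can be sketched as two renderings of the same navigation rule. The `RoutingRule` class and both function names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RoutingRule:
    """Navigation rule: if `question` is answered with `answer`, go to `target`."""
    question: str
    answer: str
    target: str


def render_paper_instruction(rule: RoutingRule) -> str:
    """Paper mode: the rule becomes text for an Instance Statement."""
    return (
        f"If the response to {rule.question} question is "
        f"'{rule.answer}' then go to question {rule.target}"
    )


def next_question_web(rule: RoutingRule, responses: Dict[str, str], default: str) -> str:
    """Web mode: the rule is evaluated automatically to route the respondent."""
    if responses.get(rule.question) == rule.answer:
        return rule.target
    return default


# The rule from the example in the text.
rule = RoutingRule(question="gender", answer="MALE", target="X")
```

On paper, `render_paper_instruction(rule)` produces the instruction that the Information Provider must follow manually; on the web, `next_question_web(rule, {"gender": "MALE"}, default="Q2")` selects the next question without the Information Provider seeing the rule at all.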


Annex B: Glossary

 

Object

Group

Definition

Explanatory Text

Synonyms

Administrative Details

Base

A placeholder for extensions to the model based on an organization’s administrative needs.

The Administrative Details object is designed to act as a 'placeholder' to allow for future extensions to the existing model. It allows for further information to be added about the administrative details required to maintain the other objects outlined by GSIM.

 

Administrative Register

Exchange

A source of administrative information which is obtained from an external organization (or sometimes from another department of the same organization)

The Administrative Register is a source of administrative information obtained from external organizations. The Administrative Register would be provided under a Provision Agreement with the supplying organization. This administrative information is usually collected for an organization’s operational purposes, rather than for statistical purposes.

 

Agent

Base

An actor that performs a role in relation to the statistical Business Process.

An Agent may be either an Organization or an Individual. An Organization may be an entire organization or entities within a larger organization, such as departments or divisions. An Organization may have sub Agents, which may be either other Organizations within the parent Organization or Individuals that belong to that Organization.

 

Agent Role

Base

The function or activities of an Agent, in regard to their involvement in the statistical Business Process.

An Agent Role may apply to either type of Agent - an Organization or Individual. A common example would be to identify which individuals or departments within an organization provide administrative data.

 

Assessment

Business

The result of the analysis of the quality and effectiveness of any activity undertaken by a statistical organization and recommendations on how these can be improved.

An Assessment can be of a variety of types. One example may include a gap analysis, where a current state is determined along with what is needed to reach its target state. Alternately, an Assessment may compare current processes against a set of requirements, for example a new Statistical Need or change in the operating environment.

An Assessment can use various information objects as inputs, whether they are the main objects that the Assessment is about or auxiliary information objects that help accomplish the Assessment.

 

Attribute Component

Structures

The role given to a Represented Variable in the context of a Data Structure, which supplies information other than identification or measures.

For example, the publication status of an observation (e.g. provisional, final, revised).

 

Business Case

Business

A proposal for a body of work that will deliver outputs designed to achieve outcomes. A Business Case will provide the reasoning for undertaking a Statistical Support Program to initiate a new Statistical Program Design for an existing Statistical Program, or an entirely new Statistical Program, as well as the details of the change proposed.

A Business Case is produced as a result of a detailed consideration of a Change Definition. It sets out a plan for how the change described by the Change Definition can be achieved. A Business Case usually comprises various evaluations. The Business Case will specify the stakeholders that are impacted by the Statistical Need or by the different solutions that are required to implement it.

 

Business Function

Business

Something an enterprise does, or needs to do, in order to achieve its objectives.

A Business Function delivers added value from a business point of view. It is delivered by bringing together people, processes and technology (resources), for a specific business purpose.

Business Functions answer in a generic sense "What business purpose does this Business Service or Process Step serve?" Identifying the Business Function associated with each Business Service or Process Step improves the documentation of how the associated Business Services and Process Steps are used, to enable future reuse.

A Business Function may be defined directly with descriptive text and/or through reference to an existing catalogue of Business Functions. The phases and sub processes defined within GSBPM can be used as an internationally agreed basis for cataloguing high level Business Functions. A catalogue might also include Business Functions defined at a lower level than "sub process". For example, "Identify and address outliers" might be catalogued as a lower level Business Function within the "Review, validate and edit" function (5.3) defined within GSBPM.

 

Business Process

Business

The set of Process Steps to perform one or more Business Functions to deliver a Statistical Program Cycle or Statistical Support Program.

For example, a particular Statistical Program Cycle might include several data collection activities, the corresponding editing activities for each collection and the production and dissemination of final outputs. Each of these may be considered separate Business Processes for the Statistical Program Cycle.

 

Business Service

Business

A means of performing a Business Function (an ability that an organization possesses, typically expressed in general and high level terms and requiring a combination of organization, people, processes and technology to achieve).

A Business Service may provide one means of accessing a particular Business Function. The operation of a Business Service will perform one or more Business Processes.

The explicitly defined interface of a Business Service can be seen as representing a "service contract". If particular inputs are provided then the service will deliver particular outputs in compliance with specific parameters (for example, within a particular period of time).

Note: The interface of a Business Service is not necessarily IT based. For example, a typical postal service will have a number of service interfaces:

- Public letter box for posting letters

- Counter at post office for interacting with postal workers

 

Category

Concepts

A Concept whose role is to extensionally define and measure a characteristic.

Categories for the Concept of sex include: Male, Female

Note: An extensional definition is a description of a Concept by enumerating all of its subordinate Concepts under one criterion or sub division.

For example - the Noble Gases (in the periodic table) is extensionally defined by the set of elements including Helium, Neon, Argon, Krypton, Xenon, Radon. (ISO 1087-1)

class

Category Item

Concepts

An element of a Category Set.

A type of Node particular to a Category Set type of Node Set. A Category Item contains the meaning of a Category without any associated representation.

 

Category Set

Concepts

A list of Categories

A Category Set is a type of Node Set which groups Categories through the use of Category Items. The Categories in a Category Set typically have no assigned Designations (Codes).

For example:
Male, Female

 

Change Definition

Business

A structured, well-defined specification for a proposed change.

A related object - the Statistical Need - is a change expression as it has been received by an organization. A Statistical Need is a raw expression of a proposed change, and is not necessarily well-defined. A Change Definition is created when a Statistical Need is analyzed by an organization, and expresses the raw need in well-defined, structured terms.

A Change Definition does not assess the feasibility of the change or propose solutions to deliver the change - this role is satisfied by the Business Case object. The precise structure or organization of a Change Definition can be further specified by rules or standards local to a given organization. It also includes the specific Concepts to be measured and the Population that is under consideration.

Once a Statistical Need has been received, the first step is to do the conceptual work to establish what it is we are trying to measure. The final output of this conceptual work is the Change Definition.

The next step is to assess how we are going to make the measurements - to design a solution and put forward a proposal for a body of work that will deliver on the requirements of the original Statistical Need.

 

Classification Family

 

Concepts

A Classification Family is a group of Classification Series related from a particular point of view. The Classification Family is related by being based on a common Concept (e.g. economic activity).

Different classification databases may use different types of Classification Families and have different names for the families, as no standard has been agreed upon.

 

Classification Index

 

Concepts

A Classification Index is an ordered list (alphabetical, in code order etc.) of Classification Index Entries. A Classification Index can relate to one particular or to several Statistical Classifications.

A Classification Index shows the relationship between text found in statistical data sources (responses to survey questionnaires, administrative records) and one or more Statistical Classifications. A Classification Index may be used to assign the codes for Classification Items to observations in statistical collections.

A Statistical Classification is a subtype of Node Set. The relationship between Statistical Classification and Classification Index can also be extended to include the other Node Set types - Code List and Category Set.

 

Classification Index Entry

Concepts

A Classification Index Entry is a word or a short text (e.g. the name of a locality, an economic activity or an occupational title) describing a type of object/unit or object property to which a Classification Item applies, together with the code of the corresponding Classification Item. Each Classification Index Entry typically refers to one item of the Statistical Classification. Although a Classification Index Entry may be associated with a Classification Item at any Level of a Statistical Classification, Classification Index Entries are normally associated with items at the lowest Level.

A Classification Item is a subtype of Node. The relationship between Classification Item and Classification Index Entry can also be extended to include the other Node types - Code Item and Category Item.

 

Classification Item

 

Concepts

A Classification Item represents a Category at a certain Level within a Statistical Classification. It defines the content and the borders of the Category. A Unit can be classified to one and only one item at each Level of a Statistical Classification.

 

 

Classification Series

 

Concepts

A Classification Series is an ensemble of one or more Statistical Classifications, based on the same concept, and related to each other as versions or updates. Typically, these Statistical Classifications have the same name (for example, ISIC or ISCO).

 

 

Code

Concepts

A Designation for a Category.

Codes are unique within their Code List. Example: M (Male), F (Female).

 

Code Item

Concepts

An element of a Code List.

A type of Node particular to a Code List type of Node Set. A Code Item combines the meaning of the included Category with a Code representation.

 

Code List

Concepts

A list of Categories where each Category has a predefined Code assigned to it.

A kind of Node Set for which the Category contained in each Node has a Code assigned as a Designation.

For example:
1 - Male
2 - Female

 

Code Value

Concepts

An alpha-numeric string used to represent a Code.

A Code Value is a subtype of Sign: the kind of Sign used to denote the value of a Code.

 

Concept

Concepts

Unit of thought differentiated by characteristics.

 

 

Concept System

Concepts

Set of Concepts structured by the relations among them.

Here are 2 examples: 1) Concept of Sex: Male, Female, Other; 2) ISIC (the list is too long to write down).

 

Conceptual Domain

Concepts

Set of valid Concepts.

The Concepts can be described by either enumeration or by an expression.

 

Correspondence Table

Concepts

A Correspondence Table expresses the relationship between two Statistical Classifications. These are typically: two versions from the same Classification Series; Statistical Classifications from different Classification Series; a variant and the version on which it is based; or, different versions of a variant. In the first and last examples, the Correspondence Table facilitates comparability over time. Correspondence relationships are shown in both directions.

A Statistical Classification is a subtype of Node Set. The relationship between Statistical Classification and Correspondence Table can also be extended to include the other Node Sets - Code List and Category Set.

 

Data Point

Structures

A placeholder (or cell) for the value of an Instance Variable

Field in a Data Structure which corresponds to a cell in a table. The Data Point is structural and distinct from the value (the Datum) that it holds.

 

Data Resource

Structures

An organized collection of stored information made of one or more Data Sets.

Data Resources are collections of data that are used by a statistical activity to produce information. Data Resource is a specialization of an Information Resource.

data source

Data Set

Structures

An organized collection of data.

Examples of Data Sets could be observation registers, time series, longitudinal data, survey data, rectangular data sets, event-history data, tables, data tables, cubes, registers, hypercubes, and matrixes. A broader term for Data Set could be data. A narrower term for Data Set could be data element, data record, cell, field.

database, data file, file, table

Data Structure

Structures

Defines the structure of an organized collection of data (Data Set).

The structure is described using Data Structure Components that can be either Attribute Components, Identifier Components or Measure Components. Examples for unit data include social security number, country of residence, age, citizenship, country of birth, where the social security number and the country of residence are both identifying components and the others are measured variables obtained directly or indirectly from the person (Unit).

 

Data Structure Component

Structures

The role of the Represented Variable in the context of a Data Structure.

A Data Structure Component can be an Attribute Component, Measure Component or an Identifier Component.

Example of Attribute Component: The publication status of an observation such as provisional, revised.

Example of Measure Component: age and height of a person in a Unit Data Set or number of citizens and number of households in a country in a Data Set for multiple countries (Dimensional Data Set).

Example of Identifier Component: The personal identification number of a Swedish citizen for unit data or the name of a country in the European Union for dimensional data.

 

Datum

Concepts

A value.

A Datum is the actual instance of data that was collected or derived. It is the value which populates a Data Point. A Datum is the value found in a cell of a table.

value

Described Conceptual Domain

Concepts

A Conceptual Domain defined by an expression.

For example: All real numbers between 0 and 1.

Non-enumerated conceptual domain

Described Value Domain

Concepts

A Value Domain defined by an expression.

For example: All real decimal numbers between 0 and 1.

Non-enumerated value domain

Designation

Concepts

The name given to an object for identification.

The association of a Concept with a Sign that denotes it.

 

Dimensional Data Point

Structures

A placeholder (or cell) for the value of an Instance Variable with respect to either a Unit or Population.

A Dimensional Data Point is uniquely identified by the combination of exactly one value for each of the dimensions (Identifier Component) and one measure (Measure Component). There may be multiple values for the same Dimensional Data Point, that is, for the same combination of dimension values and the same measure. The different values represent different versions of the data in the Data Point. Values are only distinguished on the basis of quality, date/time of measurement or calculation, status, etc. This is handled through the mechanisms provided by the Datum information object.

cell

Dimensional Data Set

Structures

A collection of dimensional data that conforms to a known structure.

 

 

Dimensional Data Structure

Structures

Describes the structure of a Dimensional Data Set.

For example (city, average income, total population) where the city is the Identifier Component and the others are measured variables.

 

Enumerated Conceptual Domain

Concepts

A Conceptual Domain expressed as a list of Categories

For instance, the Sex Categories: 'Male' and 'Female'

 

Enumerated Value Domain

Concepts

A Value Domain expressed as a list of Categories and associated Codes.

Example - Sex Codes <m, male>; <f, female>; <o, other>.

 

Environment Change

Business

A requirement for change that originates from a change in the operating environment of the statistical organization.

An Environment Change reflects change in the context in which a statistical organization operates. Environment Changes can be of different origins and also take different forms. They can result from a precise event (budget cut, new legislation enforced) or from a progressive process (technical or methodological progress, application or tool obsolescence). Other examples of Environment Changes include the availability of a new Information Resource, the opportunity for new collaboration between organizations, etc.

 

Exchange Channel

Exchange

A means of exchanging data.

An abstract object that describes the means to receive (data collection) or send (dissemination) information.

Different Exchange Channels are used for collection and dissemination. Examples of collection Exchange Channel include Questionnaire, Web Scraper Channel and Administrative Register. The only example of a dissemination Exchange Channel currently contained in GSIM is Product. Additional Exchange Channels can be added to the model as needed by individual organizations.

 

Identifiable Artefact

Base

An abstract class that comprises the basic attributes and associations needed for identification, naming and other documentation.

An instance of any GSIM information object is an Identifiable Artefact.

 

Identifier Component

Structures

The role given to a Represented Variable in the context of a Data Structure to identify the unit in an organized collection of data.

An Identifier Component is a sub-type of Data Structure Component. For example, the personal identification number of a Swedish citizen for unit data or the name of a country in the European Union for dimensional data.

 

Individual

Base

A person who acts, or is designated to act towards a specific purpose.

 

 

Information Consumer

Exchange

A person or organization that consumes disseminated data.

The Information Consumer accesses a set of information via a Product (or potentially via another Exchange Channel), which contains one or more Presentations. The Information Consumer's access to the information is subject to a Provision Agreement, which sets out conditions of access.

 

Information Provider

Exchange

An Individual or Organization that provides collected information.

An Information Provider possesses sets of information (that it has generated, collected, produced, bought or otherwise acquired) and is willing to supply that information (data or referential metadata) to the statistical office. The two parties use a Provision Agreement to agree the Data Structure and Referential Metadata Structure of the data to be exchanged via an Exchange Channel.

information supplier, data supplier

Information Request

Business

An outline of a need for new information required for a particular purpose.

An Information Request is a special case of Statistical Need that may come in an organized form, for example by specifying on which Subject Field the information is required. It may also be a more general request that requires refinement by the statistical agency and formalization in a Change Definition.

 

Information Resource

Structures

An abstract notion that is any organized collection of information.

There currently are only two concrete sub classes: Data Resource and Referential Metadata Resource. The Information Resource allows the model to be extended to other types of resource.

 

Information Set

Structures

Organized collections of statistical content.

Statistical organizations collect, process, analyze and disseminate Information Sets, which contain data (Data Sets), referential metadata (Referential Metadata Sets), or potentially other types of statistical content, which could be included in additional types of Information Set.

 

Instance Question

Exchange

The use of a Question in a particular Questionnaire.

The Instance Question is the use of a Question in a particular Questionnaire Component. This also includes the use of the Question in a Question Block, which is a particular type of Questionnaire Component.

 

Instance Question Block

Exchange

The use of a Question Block in a particular Questionnaire.

The Instance Question Block is the use of a Question Block in a particular Questionnaire Component. This also includes the use of a Question Block in another Question Block, as it is a particular type of Questionnaire Component.

 

Instance Statement

Exchange

The use of a Statement in a particular Questionnaire.

The Instance Statement is the use of a Statement in a particular Questionnaire Component. This also includes the use of the Statement in a Question Block, which is a particular type of Questionnaire Component.

 

Instance Variable

Concepts

The use of a Represented Variable within a Data Set. It may include information about the source of the data.

The Instance Variable is used to describe actual instances of data that have been collected. Here are 3 examples:
1) Gender: Dan Gillman has gender <m, male>, Arofan Gregory has gender <m, male>, etc.
2) Number of employees: Microsoft has 90,000 employees; IBM has 433,000 employees, etc.
3) Endowment: Johns Hopkins has endowment of <3, $1,000,000 and above>, Yale has endowment of <3, $1,000,000 and above>, etc.

 

Level

Concepts

A Statistical Classification has a structure which is composed of one or several Levels. A Level often is associated with a Concept, which defines it. In a hierarchical classification the Classification Items of each Level but the highest are aggregated to the nearest higher Level. A linear classification has only one Level.

A Statistical Classification is a subtype of Node Set. The relationship between Statistical Classification and Level can also be extended to include the other Node Set types - Code List and Category Set.

 

Logical Record

Structures

Describes a type of Unit Data Record for one Unit Type within a Unit Data Set.

Examples: household, person or dwelling record.

 

Map

Concepts

A Map is an expression of the relation between a Classification Item in a source Statistical Classification and a corresponding Classification Item in the target Statistical Classification. The Map should specify whether the relationship between the two Classification Items is partial or complete. Depending on the relationship type of the Correspondence Table, there may be several Maps for a single source or target item.

The use of Correspondence Tables and Maps can be extended to include all types of Node and Node Set. This means that a Correspondence Table could map between the items of Statistical Classifications, Code Lists or Category Sets.

 

Measure Component

Structures

The role given to a Represented Variable in the context of a Data Structure to hold the observed/derived values for a particular Unit in an organized collection of data.

A Measure Component is a sub-type of Data Structure Component. For example, age and height of a person in a Unit Data Set or number of citizens and number of households in a country in a Data Set for multiple countries (Dimensional Data Set).

 

Node

Concepts

A combination of a Category and related attributes.

A Node is created as a Category, Code or Classification Item for the purpose of defining the situation in which the Category is being used.

 

Node Set

Concepts

A set of Nodes.

Node Set is a kind of Concept System. Here are 2 examples:

1) Sex Categories

      Male

      Female

      Other


2) Sex Codes

      <m, male>

      <f, female>

      <o, other>

 

 

Organization

Base

A unique framework of authority within which a person or persons act, or are designated to act, towards some purpose.

 

 

Output Specification

Exchange

Defines how Information Sets consumed by a Product are presented to Information Consumers.

The Output Specification specifies Products and defines the Presentations they contain. The Output Specification may be fully defined during the design process (such as in a paper publication or a predefined web report), or may be a combination of designed specification supplemented by user selections (such as in an online data query tool).

 

Population

Concepts

The total membership of a defined class of people, objects or events.

A population is used to describe the total membership of a group of people, objects or events based on characteristics, e.g. time and geographic boundaries.

Here are 3 examples –
1. US adult persons on 13 November 1956
2. US computer companies at the end of 2012
3. Universities in the US on 1 January 2011.

 

Presentation

Exchange

The way data and referential metadata are presented in a Product.

A Product has one or more Presentations, which present data and referential metadata from Information Sets. A Presentation is defined by an Output Specification.

Presentation can be in different forms; e.g. tables, graphs, structured data files.
Examples:

      A table of data. Based on a Data Set, the related Data Structure is used to label the column and row headings for the table. The Data Set is used to populate the cells in the table. Reference metadata is used to populate footnotes and cell notes on the table. Confidentiality rules are applied to the Data Set to suppress any disclosive cells.

      A data file based on a standard (e.g. SDMX).

      A PDF document describing a Classification.

      Any structural metadata object expressed in a standard format (e.g. DDI 3.1 XML).

      A list of Products or services (e.g. a product catalogue or a web services description language (WSDL) file).

      A web page containing Classifications, descriptions of Variables, etc.

 

 

Process Control

Business

A set of decision points which determine the flow between the Process Steps used to perform a Business Process.

The typical use of Process Control is to determine what happens next after a Process Step is executed. The possible paths, and the decision criteria, associated with a Process Control are specified as part of designing a production process, captured in a Process Control Design. There is typically a very close relationship between the design of a process and the design of a Process Control.

 

Process Control Design

Business

The specification of the decision points required during the execution of a Business Process.

The design of a Process Control typically takes place as part of the design of the process itself. This involves determining the conditional routing between the various sub-processes and services used by the executing process associated with the Process Control and specified by the Process Control Design.

It is possible to define a Process Control where the next Process Step to be executed is fixed rather than a "choice" between two or more possibilities. Where such a design is appropriate, this allows, for example, initiation of the Process Step representing the GSBPM Process phase (5) to always lead to initiation of the GSBPM sub-process Integrate Data (5.1) as the next step.

This allows a process designer to divide a Business Process into logical steps (for example, where each step performs a specific Business Function through re-use of a Business Service ) even if these process steps will always follow each other in the same order. In all cases, the Process Control Design defines and the Process Control manages the flow between Process Steps , even where the flow is "trivial". Process Design is left to focus entirely on the design of the process itself, not sequencing between steps.
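The distinction between a "trivial" (fixed) flow and a genuine decision point can be illustrated with a minimal sketch. GSIM does not prescribe any implementation; the function names (`fixed`, `choice`, `run`), the step names and the `errors` criterion below are all assumptions made for illustration only.

```python
from typing import Callable, Dict, Optional

# Hypothetical names throughout; GSIM does not prescribe an implementation.
# A control maps the outcome of a finished step to the name of the next step
# (None means the Business Process ends).
ProcessControl = Callable[[dict], Optional[str]]


def fixed(next_step: Optional[str]) -> ProcessControl:
    """A 'trivial' control: the next step is always the same."""
    return lambda outcome: next_step


def choice(predicate: Callable[[dict], bool], if_true: str, if_false: str) -> ProcessControl:
    """A decision point: route on a criterion evaluated against the step outcome."""
    return lambda outcome: if_true if predicate(outcome) else if_false


def run(steps: Dict[str, Callable[[dict], dict]],
        controls: Dict[str, ProcessControl],
        start: str, data: dict) -> list:
    """Execute steps, asking the control attached to each step what comes next."""
    trace, current = [], start
    while current is not None:
        data = steps[current](data)
        trace.append(current)
        current = controls[current](data)
    return trace


# Illustrative steps: each one takes and returns a dictionary of data.
steps = {
    "process": lambda d: d,
    "integrate": lambda d: {**d, "integrated": True},
    "edit": lambda d: {**d, "errors": 0},
    "finalise": lambda d: d,
}

# The flow is held entirely in the controls, not in the steps themselves.
controls = {
    "process": fixed("integrate"),  # always followed by the same step
    "integrate": choice(lambda d: d.get("errors", 0) > 0, "edit", "finalise"),
    "edit": fixed("finalise"),
    "finalise": fixed(None),
}
```

In this sketch the steps know nothing about sequencing, which mirrors the separation between Process Design and Process Control Design described above. Calling `run(steps, controls, "process", {"errors": 3})` would route through the editing step, while an input with no errors would skip it.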

 

Process Design

Business

The specification of how a Process Step will be performed. This includes specifying the types of Process Inputs required and the type of Process Outputs that will be produced.

A Process Design is the design time specification of a Process Step that is performed as part of a run-time Business Service. A Process Step can be as big or small as the designer of a particular Business Service chooses. From a design perspective, one Process Step can contain "sub-steps", each of which is conceptualized as a (smaller) Process Step in its own right. Each of those "sub-steps" may contain "sub-steps" within them and so on indefinitely. It is a decision for the process designer to what extent to subdivide steps. At some level it will be appropriate to consider a Process Step to be a discrete task (unit of work) without warranting further subdivision. At that level the Process Step is designed to process particular Process Inputs, according to a particular Process Method, to produce particular Process Outputs. The flow between a Process Step and any sub-steps is managed via Process Control.

 

Process Input

Business

Any instance of an information object which is supplied to a Process Step Instance at the time its execution is initiated.

Process Input might include information that is used as an input that will be transformed (e.g. a Data Set ), information that is used to control specific parameters of the process (e.g. a Rule ), and information that is used as reference to guide the process (e.g. a Code List ).

 

Process Input Specification

Business

A record of the types of inputs required for a Process Design .

The Process Input Specification enumerates the Process Inputs required at the time a Process Design is executed. For example, if five different Process Inputs are required, the Process Input Specification will describe each of the five inputs. For each required Process Input the Process Input Specification will record the type of information object (based on GSIM) which will be used as the Process Input (example types might be a Dimensional Data Set or a Classification ).

The Process Input to be provided at the time of Process Step execution will then be a specific instance of the type of information object specified by the Process Input Specification . For example, if a Process Input Specification requires a Dimensional Data Set then the corresponding Process Input provided at the time of Process Step execution will be a particular Dimensional Data Set.
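The relationship between a design-time specification (types) and run-time inputs (instances of those types) can be sketched as follows. The class names are stand-ins invented for this illustration, and the positional matching of inputs to required types is an assumption, not something GSIM specifies.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


# Stand-ins for GSIM information object types (hypothetical, empty classes).
class DimensionalDataSet:
    pass


class Classification:
    pass


@dataclass(frozen=True)
class ProcessInputSpecification:
    """Design time: one entry per required Process Input, recording its type."""
    required_types: Tuple[type, ...]

    def accepts(self, inputs: Sequence[object]) -> bool:
        """Run time: each supplied input must be an instance of the specified type."""
        if len(inputs) != len(self.required_types):
            return False
        return all(isinstance(value, expected)
                   for value, expected in zip(inputs, self.required_types))


# A specification requiring two inputs, in this order.
spec = ProcessInputSpecification((DimensionalDataSet, Classification))
```

Here `spec.accepts([DimensionalDataSet(), Classification()])` holds, whereas supplying the inputs in the wrong order, or supplying too few, does not.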

 

Process Method

Business

A specification of the technique which will be used to perform the unit of work.

The technique specified by a Process Method is independent from any choice of technologies and/or other tools which will be used to apply that technique in a particular instance. The definition of the technique may, however, intrinsically require the application of specific Rules (for example, mathematical or logical formulas).

A Process Method describes a particular method for performing a Process Step.

 

Process Output

Business

Any instance of an information object which is produced by a Process Step as a result of its execution.

Process Outputs have an attribute of Process Output Type, which has two possible values:

      Transformed Output is the result which provides the "reason for existence" of the Process Step . If that output were no longer required then there would be no need for the Process Step in its current form. Typically a Transformed Output is either a Process Input to a subsequent Process Step or it represents the final product from a statistical business process.

 

      A Process Metric records information about the execution of a Process Step . For example, how long it took to complete execution of the Process Step and what percentage of records in the Process Input was updated by the Process Step to produce the Transformed Output .
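The two Process Output Types can be illustrated with a toy Process Step that returns both kinds of output. This is only a sketch: the imputation step, the `age` field and all class and function names are assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, List


class ProcessOutputType(Enum):
    TRANSFORMED_OUTPUT = "Transformed Output"
    PROCESS_METRIC = "Process Metric"


@dataclass
class ProcessOutput:
    output_type: ProcessOutputType
    name: str
    value: Any


def impute_missing(records: List[dict], default: int) -> List[ProcessOutput]:
    """A toy Process Step: fill missing 'age' values and report how many records changed."""
    updated = 0
    result = []
    for record in records:
        if record.get("age") is None:
            record = {**record, "age": default}  # copy, so the input is left untouched
            updated += 1
        result.append(record)
    percent = 100.0 * updated / len(records) if records else 0.0
    return [
        # The reason the step exists: the transformed data.
        ProcessOutput(ProcessOutputType.TRANSFORMED_OUTPUT, "imputed records", result),
        # Information about the execution itself.
        ProcessOutput(ProcessOutputType.PROCESS_METRIC, "percent of records updated", percent),
    ]
```

With four input records of which two lack an age, the Process Metric in this sketch would report that 50 percent of records were updated.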

 

 

Process Output Specification

Business

A record of the types of outputs required for a Process Design.

The Process Output Specification enumerates the Process Outputs that are expected to be produced at the time a Process Design is executed. For example, if five different Process Outputs are expected, the Process Output Specification will describe each of the five outputs. For each expected Process Output the Process Output Specification will record the type of information object (based on GSIM) which will be used as the Process Output (example types might be a Dimensional Data Set or a Classification).

The Process Output to be provided at the time of Process Step execution will then be a specific instance of the type of information object specified by the Process Output Specification . For example, if a Process Output Specification expects a Dimensional Data Set then the corresponding Process Output provided at the time of Process Step execution will be a particular Dimensional Data Set.

 

Process Pattern

Business

A nominated set of Process Designs , and associated Process Control Designs (flow), which have been highlighted for possible reuse.

In a particular Business Process , some Process Steps may be unique to that Business Process while others may be applicable to other Business Processes . A Process Pattern can be seen as a reusable template. It is a means to accelerate design processes and to achieve sharing and reuse of design patterns which have proved effective. Reuse of Process Patterns can indicate the possibility to reuse related Business Services.

By deciding to reuse a Process Pattern, a designer is actually reusing the pattern of Process Designs and Process Control Designs associated with that Process Pattern. They will receive a new instance of the Process Designs and Process Control Designs. If they then tailor their "instance" of the Process Designs and Process Control Designs to better meet their needs they will not change the definition of the reusable Process Pattern.
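The point that tailoring an "instance" leaves the reusable Process Pattern unchanged can be shown with a small sketch based on deep copying. Representing designs as plain strings and the method name `new_instance` are simplifying assumptions made for this illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ProcessPattern:
    """A nominated set of designs and the flow between them (hypothetical model)."""
    name: str
    process_designs: List[str] = field(default_factory=list)
    control_designs: Dict[str, str] = field(default_factory=dict)

    def new_instance(self) -> Tuple[List[str], Dict[str, str]]:
        """Hand the designer independent copies of the designs and the flow."""
        return copy.deepcopy(self.process_designs), copy.deepcopy(self.control_designs)


pattern = ProcessPattern(
    name="edit and impute",
    process_designs=["edit", "impute"],
    control_designs={"edit": "impute"},
)

# A designer reuses the pattern and then tailors their own copy.
designs, controls = pattern.new_instance()
designs.append("review")
controls["impute"] = "review"
```

After the tailoring above, `pattern.process_designs` still contains only the two original designs, because the designer was working on a copy.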

 

Process Step

Business

A Process Step is a work package that performs a Business Process . A Process Step implements the Process Step Design specified in order to produce the outputs for which the Process Step was designed.

Each Process Step is the use of a Process Step Design in a particular context (e.g. within a specific Business Process ). At the time of execution a Process Step Instance specifies the actual instances of input objects (for example, specific Data Sets , specific Variables ) to be supplied.

 

Process Step Instance

Business

An executed step in a Business Process. A Process Step Instance specifies the actual inputs to and outputs from an occurrence of a Process Step.

Each Process Step is the use of a Process Step Design in a particular context (e.g. within a specific Business Process). At the time of execution a Process Step Instance specifies the actual instances of input objects (for example, specific Data Sets, specific Variables) to be supplied.

Each Process Step Instance may produce unique results even though the Process Step remains constant.

Even when the inputs remain the same, metrics such as the elapsed time to complete execution of the Process Step may vary from execution to execution. For this reason, each Process Step Instance records details of the inputs and outputs for that instance of implementing the Process Step.

In this way it is possible to trace the flow of execution of a Business Process through all the Process Steps which were involved.

 

Product

Exchange

A package of content that can be disseminated as a whole.

A Product is the only defined type of Exchange Channel for outgoing information. A Product packages Presentations of Information Sets for an Information Consumer. The Product and its Presentations are generated according to Output Specifications, which define how the information from the Information Sets it consumes is presented to the Information Consumer. The Protocol for a Product determines the mechanism by which the Product is disseminated (e.g. website, SDMX web service, paper publication).

A Provision Agreement between the statistics office and the Information Consumer governs the use of a Product by the Information Consumer . The Provision Agreement , which may be explicitly or implicitly agreed, provides the legal or other basis by which the two parties agree to exchange data. In many cases, dissemination Provision Agreements are implicit in the terms of use published by the statistics office.

For static Products (e.g. paper publications), specifications are predetermined.  For dynamic products, aspects of specification could be determined by the Information Consumer at run time. Both cases result in Output Specifications specifying Information Set data or referential metadata that will be included in each Presentation within the Product .

 

Protocol

Exchange

The mechanism for exchanging information through an Exchange Channel.

A Protocol specifies the mechanism (e.g. SDMX web service, data file exchange, web robot, face to face interview, mailed paper form) of exchanging information through an Exchange Channel .

 

Provision Agreement

Exchange

The legal or other basis by which two parties agree to exchange data.

A Provision Agreement between the statistical organization and the Information Provider (collection) or the Information Consumer (dissemination) governs the use of Exchange Channels . The Provision Agreement , which may be explicitly or implicitly agreed, provides the legal or other basis by which the two parties agree to exchange data. The parties also use the Provision Agreement to agree the Data Structure and Referential Metadata Structure of the information to be exchanged.

 

Question

Exchange

Describes the text used to elicit a response for the Concept to be measured.

A Question may be a single question used to obtain a response, or may be a multiple question, a construct which links multiple sub-questions, each with their own response.

A Question also includes a relationship to the Value Domain to document the associated response criteria for the question. A single response question will have one Value Domain associated with it, while a 'multiple question' may have more than one Value Domain .

A Question should be designed with re-use in mind, as it can be used in multiple Questionnaires.

Multiple Question

Question Block

Exchange

A set of Questions, Statements or instructions which are used together.

A Question Block should be designed for reuse, as it can be used in multiple Questionnaires . The Question Block is a type of Questionnaire Component . A statistical organization will often have a number of Question Blocks which they reuse in a number of Questionnaires . Examples of Question Blocks include:

      Household Question Block

      Income Question Block

      Employment Question Block

 

Question Module

Questionnaire

Exchange

A concrete and usable tool to elicit information from observation units .

 

This is an example of a way statistical organizations collect information (an exchange channel). Each mode should be interpreted as a new Questionnaire derived from the Questionnaire Specification .

The Questionnaire is a subtype of Exchange Channel , as it is a way in which data is obtained.

 

Questionnaire Component

Exchange

A record of the flow of a Questionnaire Specification and its use of Questions, Question Blocks and Statements.

Defines the structure of the Questionnaire Specification , as a combination of Questions, Question Blocks and Statements. It is the object which groups together all the components of a Questionnaire.

A Questionnaire Component is recursive, in that it can refer to other Questionnaire Components and accompanying Questionnaire Logic objects at a lower level. It is only at the top level that the Questionnaire Component links to the Questionnaire Specification.

Question Block

Questionnaire Logic

Exchange

Governs the sequence of Questions , Question Blocks and Statements based on factors such as the current location, the response to the previous questions etc., invoking navigation and validation rules to apply.

 

Routing

Questionnaire Specification

Exchange

The tool designed to elicit information from observation Units .

This represents the complete questionnaire design, with a relationship to the top level Questionnaire Component .

There may be many different Questionnaire Specifications , for the same surveys, or tailored to individual observation Units (respondents) so that there would be a different Questionnaire Specification for each respondent. The design would also differ depending upon the specific mode of collection the Questionnaire is designed for.

 

Record Relationship

Structures

Describes relationships between Logical Records within a Unit Data Structure . It must have both a source Logical Record and a target Logical Record in order to define the relationship.

Example: Relationship between person and household Logical Records within a Unit Data Set .

 

Referential Metadata Attribute

Structures

The role given to a Represented Variable to supply information in the context of a Referential Metadata Structure .

 

 

Referential Metadata Content Item

Structures

The content describing a particular characteristic of a Referential Metadata Subject .

A Referential Metadata Content Item contains the actual content describing a particular characteristic of a Referential Metadata Subject.

 

Referential Metadata Resource

Structures

An organized collection of stored information consisting of one or more Referential Metadata Sets .

Referential Metadata Resources are collections of structured information that may be used by a statistical activity to produce information. This information object is a specialization of an Information Resource .

 

Referential Metadata Set

Structures

An organized collection of referential metadata for a given Referential Metadata Subject .

Referential Metadata Sets organize referential metadata. Each Referential Metadata Set uses a Referential Metadata Structure to define a structured list of Referential Metadata Attributes for a given Referential Metadata Subject .

 

Referential Metadata Structure

Structures

Defines the structure of an organized collection of referential metadata ( Referential Metadata Set ).

A Referential Metadata Structure defines a structured list of Referential Metadata Attributes for a given Referential Metadata Subject .

Examples of Referential Metadata Attributes are those that describe quality information and methodologies. Examples of subject are: objects like a Questionnaire or a Classification , or collections of data like a Data Set , or any Data Point or set of Data Points created from a specific Data Structure.

Metadata Structure Definition

Referential Metadata Subject

Structures

Identifies the subject of an organized collection of referential metadata.

The Referential Metadata Subject identifies the subject of the metadata that can be reported using this Referential Metadata Structure. These subjects may be any GSIM object type, or any Data Point or set of Data Points created from a specific Data Structure.

Examples: The GSIM object type may be Product for which there is a list specified in a Value Domain . The Value Domain specifies the list of actual Products for which reference metadata can be reported or authored using this Referential Metadata Structure.

 

Referential Metadata Subject Item

Structures

Identifies the actual subject for which referential metadata is reported.

Examples are an actual Product such as Balance of Payments and International Investment Position, Australia, June 2013, or a collection of Data Points such as the Data Points for a single region within a Data Set covering all regions for a country.

 

Represented Variable

Concepts

A combination of a characteristic of a population to be measured and how that measure will be represented.

Example:

The pair (Number of Employees, Integer), where "Number of Employees" is the characteristic of the population ( Variable) and "Integer" is how that measure will be represented ( Value Domain).

 

Rule

Business

A specific mathematical or logical expression which can be evaluated to determine specific behavior.

Rules are of several types: they may be derived from methods to determine the control flow of a process when it is being designed and executed; they may be used as the input parameters of processes (e.g., imputation rules, edit rules); and they may be used to drive the logical flow of a questionnaire. There are many forms of Rules and their purpose, character and expression can vary greatly.

 

Scraping Process Map

Exchange

Maps a web scraping process to a specific website.

Scraping Process Map is an essential element of the Web Scraper Channel . The process being mapped can be a Business Service or a Process Step .

 

Sign

Concepts

Something that suggests the presence or existence of a fact, condition, or quality.

It is a perceivable object. This object is used to denote a Concept as a Designation .

 

Statement

Exchange

A report of facts in a Questionnaire

Statements are often included to provide further explanation to respondents. Example:

"The following questions are about your health".

The object is also used to represent completion instructions for the interviewer or respondent.

A Statement should be designed with re-use in mind, as it can be used in numerous Questionnaires.

Interviewer Instruction

Instruction

Statistical Classification

 

Concepts

A Statistical Classification is a set of Categories which may be assigned to one or more variables registered in statistical surveys or administrative files, and used in the production and dissemination of statistics. The Categories at each Level of the classification structure must be mutually exclusive and jointly exhaustive of all objects/units in the population of interest.

The Categories are defined with reference to one or more characteristics of a particular population of units of observation. A Statistical Classification may have a flat, linear structure or may be hierarchically structured, such that all Categories at lower Levels are sub- Categories of Categories at the next Level up. Categories in Statistical Classifications are represented in the information model as Classification Items .
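The requirement that the Categories at a Level be mutually exclusive and jointly exhaustive can be expressed as a simple check. In this sketch each Category is modelled as the set of units assigned to it; that representation, the function name and the example data are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, Set


def is_valid_level(categories: Dict[str, Set[str]], population: Set[str]) -> bool:
    """Check one Level: no unit falls in two Categories, and every unit falls in one."""
    # Mutually exclusive: every pair of Categories shares no units.
    exclusive = all(a.isdisjoint(b) for a, b in combinations(categories.values(), 2))
    # Jointly exhaustive: together the Categories cover the whole population.
    exhaustive = set().union(*categories.values()) == population
    return exclusive and exhaustive


population = {"u1", "u2", "u3"}
valid = {"A": {"u1"}, "B": {"u2", "u3"}}
overlapping = {"A": {"u1", "u2"}, "B": {"u2", "u3"}}   # u2 is in two Categories
incomplete = {"A": {"u1"}}                              # u2 and u3 are not covered
```

Only the first of the three example Levels passes the check; the second fails on exclusivity and the third on exhaustiveness.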

 

Statistical Need

Business

A requirement, request or other notification that will be considered by an organization. A Statistical Need does not necessarily have structure or format - it is a 'raw' need as received by the organization. A Statistical Need may be of a variety of types including Environmental Change or Information Request .

The Statistical Need is a proposed or imposed requirement, request or other notification as it has been received by an organization. A Statistical Need is a raw expression of a requirement, and is not necessarily well-defined. A related object - Change Definition - is created when a Statistical Need is analyzed by an organization. Change Definition expresses the raw need in well-defined, structured terms.
Once a Statistical Need has been received, the first step is to do the conceptual work to establish what it is we are trying to measure. The final output of this conceptual work is the Change Definition .

In some cases, the Statistical Need can result from the Assessment of the quality, efficiency, etc. of an existing process.

 

Statistical Program

Business

A set of activities, which may be repeated, that describes the purpose and context of a set of Business Processes within the context of the relevant Statistical Program Cycles.

The Statistical Program is one of a family of objects that provide the environmental context in which activities to produce statistics within a statistical organization are conducted. Statistical Program is the top level object that describes the purpose and objectives of a set of activities. Statistical Program will usually correspond to an ongoing activity such as a survey or output series. Some examples of Statistical Program are:

      Labour Force Survey

      Multipurpose Household Survey

      National Accounts

      Demography

      Overseas Arrivals and Departures

Related to the Statistical Program object there are Statistical Program Design and Statistical Program Cycle objects that hold the detailed information about the design and conduct of the Business Process .

In the case of the traditional approach, an organization has received a Statistical Need and produced a Change Definition and an approved Business Case . The Business Case will specify either a change to the design or methodology of an existing Statistical Program , which will result in a new Statistical Program Design ; or a change to one or more existing Statistical Programs (for example, to add an additional objective to the Statistical Program ); or result in a new Statistical Program being created.

This does not include statistical support functions such as metadata management, data management (and other overarching GSBPM processes) and design functions. These activities are conducted as part of Statistical Support Programs .

 

Statistical Program Cycle

Business

A set of activities to investigate characteristics of a given Population for a particular reference period.

A Statistical Program Cycle documents the execution of an iteration of a Statistical Program according to the associated Statistical Program Design for a certain reference period. It identifies the activities undertaken as part of the cycle, the specific resources required, the processes used, and the relevant methodological information used in this cycle, as defined by the Statistical Program Design.

 

Statistical Program Design

Business

The specification of the resources required, processes used and description of relevant methodological information about the set of activities undertaken to investigate characteristics of a given Population .

The Statistical Program Design is an object that provides the operational context in which a set of Business Processes is conducted.

A simple example is where a Statistical Program relates to a single survey, for example, the Labour Force Survey. The Statistical Program will have a series of Statistical Program Design objects that describe the methodology and design used throughout the life of the survey. When a methodological change is made to the survey, a new Statistical Program Design is created to record the details of the new design.

 

Statistical Support Program

Business

A program which is not related to the post-design cyclic production of statistical products, but is necessary to support cyclical production.

This type of program will include such functions as metadata management, data management, methodological research, and design functions. These programs correspond to the horizontal functions shown in the GSBPM, as well as programs to create new or change existing Statistical Programs .

 

Subject Field

Concepts

One or more Concept Systems used for the grouping of Concepts and Categories for the production of statistics.

A Subject Field is a field of special knowledge under which a set of Concepts and their Designations is used. For example, labour market, environmental expenditure, tourism, etc.

subject area, theme

Unit

Concepts

The object of interest in a Business Process

Here are 3 examples -
1. Individual US persons (e.g., Arofan Gregory, Dan Gillman, Barack Obama)
2. Individual US computer companies (e.g., Microsoft, Apple, IBM)
3. Individual US universities (e.g., Johns Hopkins, University of Maryland, Yale)

 

Unit Data Point

Structures

A placeholder (or cell) for the value of an Instance Variable with respect to a Unit.

This placeholder may point to multiple values representing different versions of the data. Values are only distinguished on the basis of quality, date/time of measurement or calculation, status, etc. This is handled through the mechanisms provided by the Datum information object.

cell

Unit Data Record

Structures

Contains the specific values (as a collection of Unit Data Points ) related to a given Unit as defined in a Logical Record .

For example (1212123, 48, American, United Kingdom) specifies the age in years on the 1st of January 2012 (48), the current citizenship (American), and the country of birth (United Kingdom) for a person with social security number 1212123.

 

Unit Data Set

Structures

A collection of data that conforms to a known structure and describes aspects of one or more Units .

Example: A synthetic unit record file is a collection of artificially constructed Unit Data Records , combined in a file to create a Unit Data Set .

micro data, unit data, synthetic unit record file

Unit Data Structure

Structures

Describes the structure of a Unit Data Set .

For example (social security number, country of residence, age, citizenship, country of birth) where the social security number and the country of residence are the identifying components ( Identifier Component ) and the others are measured variables obtained directly or indirectly from the person ( Unit ) and are Measure Components of the Logical Record .

file description, dataset description
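The person example above can be sketched as a structure that separates identifying components from measure components and uses them to label the values of a record. The class and method names are hypothetical, and the country of residence value used in the example ("Australia") is made up, since the source does not give one.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class UnitDataStructure:
    """Describes a record layout: identifier components first, then measure components."""
    identifier_components: Tuple[str, ...]
    measure_components: Tuple[str, ...]

    @property
    def components(self) -> Tuple[str, ...]:
        return self.identifier_components + self.measure_components

    def make_record(self, values: tuple) -> Dict[str, object]:
        """Label a tuple of values; reject tuples that do not fit the structure."""
        if len(values) != len(self.components):
            raise ValueError(f"expected {len(self.components)} values, got {len(values)}")
        return dict(zip(self.components, values))

    def key(self, record: Dict[str, object]) -> tuple:
        """The values of the identifier components, which identify the Unit."""
        return tuple(record[name] for name in self.identifier_components)


person = UnitDataStructure(
    identifier_components=("social security number", "country of residence"),
    measure_components=("age", "citizenship", "country of birth"),
)

# "Australia" is an invented value used only to complete the example record.
record = person.make_record((1212123, "Australia", 48, "American", "United Kingdom"))
```

In this sketch `person.key(record)` returns the pair of identifying values, while the remaining entries of `record` are the measured values for that person.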

Unit Type

Concepts

A Unit Type is a class of objects of interest

A Unit Type is used to describe a class or group of Units based on a single characteristic, but with no specification of time and geography.  For example, the Unit Type of “Person” groups together a set of Units based on the characteristic that they are ‘Persons’.

It concerns not only Unit Types used in dissemination, but anywhere in the statistical process. E.g. using administrative data might involve the use of a fiscal unit.

Object class (ISO 11179)

Value Domain

Concepts

The permitted range of values for a characteristic of a variable

The values can be described by enumeration or by an expression
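The two ways of describing permitted values, by enumeration or by an expression, can be sketched as two small classes with a common `permits` check. The class names, the codes "1" and "2", and the non-negative integer rule are illustrative assumptions, not part of the definition above.

```python
from typing import Callable, Iterable


class EnumeratedDomain:
    """Permitted values are listed explicitly."""

    def __init__(self, values: Iterable[object]):
        self.values = frozenset(values)

    def permits(self, value: object) -> bool:
        return value in self.values


class ExpressionDomain:
    """Permitted values are described by a rule rather than listed."""

    def __init__(self, description: str, rule: Callable[[object], bool]):
        self.description = description
        self.rule = rule

    def permits(self, value: object) -> bool:
        return bool(self.rule(value))


# Enumeration: a short list of made-up codes.
codes = EnumeratedDomain({"1", "2"})

# Expression: any non-negative integer (bool is excluded although it subclasses int).
counts = ExpressionDomain(
    "non-negative integer",
    lambda v: isinstance(v, int) and not isinstance(v, bool) and v >= 0,
)
```

A value such as `12` would be permitted by `counts` but `-1` would not; `codes` permits only the listed values.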

 

Variable

Concepts

The use of a Concept as a characteristic of a Population intended to be measured

The Variable combines the meaning of a Concept with a Unit Type, to define the characteristic that is to be measured.

Here are 3 examples -
1. Sex of person
2. Number of employees
3. Value of endowment

 

Web Scraper Channel

Exchange

A concrete and usable tool to gather information from the Internet.

 

This is an example of a way statistical organizations collect information (an Exchange Channel ). The Web Scraper Channel contains Scraping Process Maps , which map the channel to each website targeted for scraping.

 

 

 



[UML class diagram: Base Group]

Administrative Details

[UML class diagram: Administrative Details]

 

Definition

Object

Group

Definition

Explanatory Text

Synonyms

Administrative Details

Base

A placeholder for extensions to the model based on an organization’s administrative needs.

The Administrative Details object is designed to act as a 'placeholder' to allow for future extensions to the existing model. It allows for further information to be added about the administrative details required to maintain the other objects outlined by GSIM.

 

 

Attributes

To be defined on an 'as needs' basis.

 

 

 

 

Agent

[UML class diagram: Agent]

 

Definition

Object

Group

Definition

Explanatory Text

Synonyms

Agent

Base

An actor that performs a role in relation to the statistical Business Process.

An Agent may be either an Organization or an Individual . An Organization may be an entire organization or entities within a larger organization, such as departments or divisions. An Organization may have sub Agents, which may be either other Organizations within the parent Organization or Individuals that belong to that Organization .

 

 

Attributes

Name | Description | Cardinality | Value Type
Name | | 0..1 | Text
Description | | 0..1 | Text
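The Agent hierarchy described above (an Agent is either an Organization or an Individual, and an Organization may contain sub Agents) can be sketched with a few dataclasses. The field `sub_agents`, the method `individuals` and the example names are assumptions for illustration; only the optional Name and Description attributes come from the attribute list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    name: Optional[str] = None          # cardinality 0..1
    description: Optional[str] = None   # cardinality 0..1


@dataclass
class Individual(Agent):
    pass


@dataclass
class Organization(Agent):
    # Sub Agents may be other Organizations or Individuals (hypothetical field name).
    sub_agents: List[Agent] = field(default_factory=list)

    def individuals(self) -> List[Individual]:
        """Collect every Individual at any depth below this Organization."""
        found: List[Individual] = []
        for agent in self.sub_agents:
            if isinstance(agent, Individual):
                found.append(agent)
            elif isinstance(agent, Organization):
                found.extend(agent.individuals())
        return found


office = Organization(
    name="Statistics office",
    sub_agents=[
        Organization(name="Methodology division", sub_agents=[Individual(name="A")]),
        Individual(name="B"),
    ],
)
```

Because an Organization can hold both kinds of Agent, the traversal in `individuals` has to recurse into nested Organizations, which is the main design consequence of the recursive definition.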


Agent Role

 

Definition

Object

Group

Definition

Explanatory Text

Synonyms

Agent Role

Base

The function or activities of an Agent , in regard to their involvement in the statistical Business Process .

An Agent Role may apply to either type of Agent - an Organization or Individual . A common example would be to identify which individuals or departments within an organization provide administrative data.

 

 

Attributes

Name | Description | Cardinality | Value Type
Name | | 0..1 | Text
Description | | 0..1 | Text

Identifiable Artefact

[UML class diagram: Identifiable Artefact]

Note: there are many relationships between Identifiable Artefact and other information objects; these are not shown in the above diagram.

 

Definition

Object

Group

Definition

Explanatory Text

Synonyms

Identifiable Artefact

Base

An abstract class that comprises the basic attributes and associations needed for identification, naming and other documentation.

An instance of any GSIM information object is an Identifiable Artefact .

 


Attributes

Name | Description | Cardinality | Value Type
id | | 1..1 | string
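One way to read "abstract class" with a mandatory `id` (cardinality 1..1) is sketched below. The guard against direct instantiation, the non-empty check and the `DataSet` subclass are assumptions made for this illustration rather than requirements stated by GSIM.

```python
class IdentifiableArtefact:
    """Base for every information object; never instantiated on its own."""

    def __init__(self, id: str):
        if type(self) is IdentifiableArtefact:
            raise TypeError("IdentifiableArtefact is abstract; use a concrete subclass")
        if not isinstance(id, str) or not id:
            raise ValueError("id is mandatory (1..1) and must be a non-empty string")
        self.id = id


class DataSet(IdentifiableArtefact):
    """A hypothetical concrete information object used only for this example."""


data_set = DataSet("ds-001")
```

Every concrete information object in this sketch inherits the identifier from the base class, so identification is handled in one place.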

Individual

[UML class diagram: Individual]

 

Definition

Object

Group

Definition

Explanatory Text

Synonyms

Individual

Base

A person who acts, or is designated to act towards a specific purpose.

 

 

 

Attributes

Name | Description | Cardinality | Value Type
Name | | 0..1 | Text
Description | | 0..1 | Text

 

Organization

[UML class diagram: Organization]

 

Definition

Object

Group

Definition

Explanatory Text

Synonyms

Organization

Base

A unique framework of authority within which a person or persons act, or are designated to act, towards some purpose.

 

 

 

Attributes

Name | Description | Cardinality | Value Type
Name | | 0..1 | Text
Description | | 0..1 | Text


Business Group

 

Identifying and Evaluating Statistical Needs

Designing and Managing Statistical Programs



Assessment

[UML class diagram: Assessment]

 

Definition

Object

Group

Definition

Explanatory Text

Synonyms

Assessment

Business

The result of the analysis of the quality and effectiveness of any activity undertaken by a statistical organization and recommendations on how these can be improved.

An Assessment can be of a variety of types. One example may include a gap analysis, where a current state is determined along with what is needed to reach its target state. Alternately, an Assessment may compare current processes against a set of requirements, for example a new Statistical Need or change in the operating environment.

An Assessment can use various information objects as inputs, whether they are the main objects that the Assessment is about or auxiliary information objects that help accomplish the Assessment .

 

 

Attributes

Name | Description | Cardinality | Value Type
Name | | 1..1 | Text
Description | | 1..1 | Text
Date assessed | | 1..* | Date
Subject Matter Domain | | 0..* | Text
Issues | | 0..* | Text
Results | | 0..* | Text
Recommendations | | 0..* | Text
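The Cardinality column uses "min..max" notation, where "*" stands for an unbounded maximum. As a minimal sketch of how such cardinalities could be checked against a record, the Python below parses the notation and applies it to the Assessment attributes. The function names, the dictionary-of-lists record layout and the example record are assumptions made for illustration; only the attribute names and cardinalities come from the Assessment attribute table.

```python
from datetime import date
from typing import Any, Dict, List


def check_cardinality(name: str, values: List[Any], cardinality: str) -> None:
    """Raise ValueError if len(values) falls outside a 'min..max' cardinality."""
    lower_text, upper_text = cardinality.split("..")
    lower = int(lower_text)
    upper = None if upper_text == "*" else int(upper_text)  # '*' means unbounded
    count = len(values)
    if count < lower or (upper is not None and count > upper):
        raise ValueError(f"{name}: {count} value(s) violates cardinality {cardinality}")


# Cardinalities copied from the Assessment attribute table.
ASSESSMENT_CARDINALITIES: Dict[str, str] = {
    "Name": "1..1",
    "Description": "1..1",
    "Date assessed": "1..*",
    "Subject Matter Domain": "0..*",
    "Issues": "0..*",
    "Results": "0..*",
    "Recommendations": "0..*",
}


def validate_assessment(record: Dict[str, List[Any]]) -> None:
    """Check every attribute; an attribute missing from the record counts as zero values."""
    for attribute, cardinality in ASSESSMENT_CARDINALITIES.items():
        check_cardinality(attribute, record.get(attribute, []), cardinality)


# Invented example record, for illustration only.
gap_analysis = {
    "Name": ["Gap analysis of editing process"],
    "Description": ["Compares the current state with the target state."],
    "Date assessed": [date(2013, 12, 1)],
    "Issues": ["Manual outlier review is slow"],
}
validate_assessment(gap_analysis)  # raises nothing: all cardinalities are satisfied
```

Storing every attribute as a list keeps the check uniform: 1..1, 0..1, 0..* and 1..* all reduce to a comparison on the number of values.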

 


Business Case


 

Definition

Object: Business Case
Group: Business
Definition: A proposal for a body of work that will deliver outputs designed to achieve outcomes. A Business Case will provide the reasoning for undertaking a Statistical Support Program to initiate a new Statistical Program Design for an existing Statistical Program, or an entirely new Statistical Program, as well as the details of the change proposed.
Explanatory Text: A Business Case is produced as a result of a detailed consideration of a Change Definition. It sets out a plan for how the change described by the Change Definition can be achieved. A Business Case usually comprises various evaluations. The Business Case will specify the stakeholders that are impacted by the Statistical Need or by the different solutions that are required to implement it.
Synonyms: (none given)

Attributes

Name | Description | Cardinality | Value Type
Name | | 1..1 | Text
Description | | 1..1 | Text
Date initiated | | 0..1 | Date
Date approved | | 0..1 | Date
Date implementation commenced | | 0..1 | Date
Type | | 1..* | e.g. new program, permanent (indefinite) change to existing program, temporary change to existing program, cease program
Outcomes (objectives) | | 1..* | Text
Outputs (deliverables) | | 1..* | Text

Business Function


Definition

Object: Business Function
Group: Business
Definition: Something an enterprise does, or needs to do, in order to achieve its objectives.
Explanatory Text: A Business Function delivers added value from a business point of view. It is delivered by bringing together people, processes and technology (resources), for a specific business purpose.

Business Functions answer, in a generic sense, the question "What business purpose does this Business Service or Process Step serve?" Identifying the Business Function associated with each Business Service or Process Step improves the documentation of how those Business Services and Process Steps are used, which enables future reuse.

A Business Function may be defined directly with descriptive text and/or through reference to an existing catalogue of Business Functions. The phases and sub processes defined within GSBPM can be used as an internationally agreed basis for cataloguing high level Business Functions. A catalogue might also include Business Functions defined at a lower level than "sub process". For example, "Identify and address outliers" might be catalogued as a lower level Business Function within the "Review, validate and edit" function (5.3) defined within GSBPM.

Synonyms: (none given)

Attributes

Name | Description | Cardinality | Value Type
Name | | 1..1 | Text
Description | | 1..1 | Text
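The idea of a catalogue in which a locally defined Business Function sits beneath a GSBPM sub process can be sketched as a small parent-child structure. The Python below is a hypothetical illustration: the `Catalogue` class, the `parent` link and the `path` method are assumptions and not part of GSIM. The two function names are the ones used in the explanatory text for Business Function.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BusinessFunction:
    name: str                      # Name, 1..1
    description: str               # Description, 1..1
    parent: Optional[str] = None   # hypothetical link to a higher-level function


@dataclass
class Catalogue:
    """Hypothetical in-memory catalogue keyed by Business Function name."""
    entries: Dict[str, BusinessFunction] = field(default_factory=dict)

    def add(self, function: BusinessFunction) -> None:
        # A lower-level function may only refer to a parent that is already catalogued.
        if function.parent is not None and function.parent not in self.entries:
            raise KeyError(f"unknown parent: {function.parent}")
        self.entries[function.name] = function

    def path(self, name: str) -> List[str]:
        """Return the chain of names from the top-level function down to `name`."""
        chain: List[str] = []
        current: Optional[str] = name
        while current is not None:
            chain.append(current)
            current = self.entries[current].parent
        return list(reversed(chain))


catalogue = Catalogue()
catalogue.add(BusinessFunction("Review, validate and edit", "GSBPM sub process 5.3"))
catalogue.add(BusinessFunction(
    "Identify and address outliers",
    "Lower level function catalogued locally",
    parent="Review, validate and edit",
))
print(catalogue.path("Identify and address outliers"))
```

Resolving the path from a lower level function back to its GSBPM sub process is one way a Business Service or Process Step tagged with a local function could still be found under the internationally agreed heading.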

Business Process


 

Definition

Object: Business Process
Group: Business
Definition: The set of Process Steps to perform one or more Business Functions to deliver a Statistical Program Cycle or Statistical Support Program.
Explanatory Text: For example, a particular Statistical Program Cycle might include several data collection activities, the corresponding editing activities for each collection and the production and dissemination of final outputs. Each of these may be considered separate Business Processes for the Statistical Program Cycle.
Synonyms: (none given)


Attributes

Name | Description | Cardinality | Value Type
Name | | 1..1 | Text
Description | | 1..1 | Text
Date initiated | First date of validity | 0..1 | Date
Date ended | Last date of validity | 0..1 | Date
Status | | 1..1 | Extensible predefined list (e.g. New Proposal, New-Under Development, Current, Completed, Cancelled, Transferred to another Organization)
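The Status value list and the two validity dates lend themselves to a short illustration. The Python sketch below is an assumption-laden example rather than a prescribed representation: the enumeration members mirror the example values in the Status row, while the class layout, the `is_valid_on` helper and its treatment of missing dates as open-ended are invented for illustration. A Python `Enum` is closed once defined, so an organization extending the list would add members to it in its own code.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    """Example values from the Status row of the attribute table."""
    NEW_PROPOSAL = "New Proposal"
    NEW_UNDER_DEVELOPMENT = "New-Under Development"
    CURRENT = "Current"
    COMPLETED = "Completed"
    CANCELLED = "Cancelled"
    TRANSFERRED = "Transferred to another Organization"


@dataclass
class BusinessProcess:
    name: str                               # 1..1
    description: str                        # 1..1
    status: Status                          # 1..1
    date_initiated: Optional[date] = None   # first date of validity, 0..1
    date_ended: Optional[date] = None       # last date of validity, 0..1

    def is_valid_on(self, day: date) -> bool:
        """Assumed reading: a missing date leaves that side of the period open."""
        starts_ok = self.date_initiated is None or self.date_initiated <= day
        ends_ok = self.date_ended is None or day <= self.date_ended
        return starts_ok and ends_ok


# Invented example values.
collection = BusinessProcess(
    name="Data collection",
    description="Hypothetical collection activity for one Statistical Program Cycle",
    status=Status.CURRENT,
    date_initiated=date(2013, 1, 1),
)
```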

 


Business Service

Definition

Object: Business Service
Group: Business
Definition: A means of performing a Business Function (an ability that an organization possesses, typically expressed in general and high level terms and requiring a combination of organization, people, processes and technology to achieve).
Explanatory Text: A Business Service may provide one means of accessing a particular Business Function. The operation of a Business Service will perform one or more Business Processes.

The explicitly defined interface of a Business Service can be seen as representing a "service contract". If particular inputs are provided, then the service will deliver particular outputs in compliance with specific parameters (for example, within a particular period of time).

Note: The interface of a Business Service is not necessarily IT based. For example, a typical postal service will have a number of service interfaces:

- Public letter box for posting letters
- Counter at post office for interacting with postal workers

Synonyms: (none given)

Attributes

Name | Description | Cardinality | Value Type
Name | | 1..1 | Text
Description | | 1..1 | Text
Service Interface | Specifies how to communicate with the service. | 0..* | Text
Location | Specifies where the service can be accessed. | 0..1 | Text
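The "service contract" reading of a Business Service interface (given inputs, promised outputs, within stated parameters) can be sketched as follows. This Python example is hypothetical: the `ServiceContract` class, its method names and the three-day limit are invented for illustration and do not appear in GSIM. The `BusinessService` fields follow the attribute table, and the two service interfaces are the postal examples from the explanatory text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BusinessService:
    name: str                                                     # 1..1
    description: str                                              # 1..1
    service_interfaces: List[str] = field(default_factory=list)  # 0..*
    location: Optional[str] = None                                # 0..1


@dataclass
class ServiceContract:
    """Hypothetical contract: required inputs, promised outputs and a time limit."""
    required_inputs: List[str]
    promised_outputs: List[str]
    max_days: int

    def accepts(self, provided_inputs: List[str]) -> bool:
        # The contract applies only when every required input is supplied.
        return set(self.required_inputs) <= set(provided_inputs)

    def fulfilled(self, delivered_outputs: List[str], days_taken: int) -> bool:
        # All promised outputs must be delivered, and within the agreed period.
        on_time = days_taken <= self.max_days
        complete = set(self.promised_outputs) <= set(delivered_outputs)
        return on_time and complete


postal_service = BusinessService(
    name="Postal service",
    description="Delivers letters",
    service_interfaces=[
        "Public letter box for posting letters",
        "Counter at post office for interacting with postal workers",
    ],
)
letter_delivery = ServiceContract(
    required_inputs=["addressed letter", "postage"],
    promised_outputs=["letter delivered"],
    max_days=3,  # invented figure, for illustration only
)
```

Neither interface in the postal example is IT based, which is why the sketch records interfaces as plain text rather than as endpoints.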

Change Definition


Definition

Object: Change Definition
Group: Business
Definition: A structured, well-defined specification for a proposed change.
Explanatory Text: A related object - the Statistical Need - is a change expression as it has been received by an organization. A Statistical Need is a raw expression of a proposed change, and is not necessarily well-defined. A Change Definition is created when a Statistical Need is analyzed by an organization, and expresses the raw need in well-defined, structured terms.

A Change Definition does not assess the feasibility of the change or propose solutions to deliver the change - this role is satisfied by the Business Case object. The precise structure or organization of a Change Definition can be further specified by rules or standards local to a given organization. It also includes the specific Concepts to be measured and the Population that is under consideration.

Once a Statistical Need has been received, the first step is to do the conceptual work to establish what is to be measured. The final output of this conceptual work is the Change Definition.

The next step is to assess how the measurements are going to be made: to design a solution and put forward a proposal for a body of work that will deliver on the requirements of the original Statistical Need.

Synonyms: (none given)

Attributes

Name | Description | Cardinality | Value Type
Name | A human-readable identifier for the object | 0..1 | Text
Description | A human-readable description of the object | 0..1 | Text
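The two steps described for Change Definition (conceptual work that turns a raw Statistical Need into a Change Definition, followed by a proposal in the form of a Business Case) can be sketched as a simple pipeline. The Python below is a hypothetical illustration: the function names, the representation of Concepts and Population as plain text fields, and the example request are assumptions. In GSIM itself, Concepts and Population are separate information objects rather than text attributes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StatisticalNeed:
    raw_request: str  # the change expression as received; not necessarily well-defined


@dataclass
class ChangeDefinition:
    name: Optional[str] = None                          # 0..1
    description: Optional[str] = None                   # 0..1
    concepts: List[str] = field(default_factory=list)   # Concepts to be measured (simplified)
    population: Optional[str] = None                    # Population under consideration (simplified)


@dataclass
class BusinessCase:
    change: ChangeDefinition
    proposed_solution: str  # feasibility and solutions belong here, not in the Change Definition


def analyse(need: StatisticalNeed, concepts: List[str], population: str) -> ChangeDefinition:
    """Step 1 (conceptual work): restate the raw need in structured terms."""
    return ChangeDefinition(
        description=need.raw_request,
        concepts=list(concepts),
        population=population,
    )


def propose(change: ChangeDefinition, solution: str) -> BusinessCase:
    """Step 2: put forward a body of work that would deliver the change."""
    return BusinessCase(change=change, proposed_solution=solution)


# Invented example, for illustration only.
need = StatisticalNeed("Users ask for statistics on remote working")
change = analyse(need, concepts=["remote working"], population="employed persons")
case = propose(change, "Add a module to an existing household survey")
```

Keeping `proposed_solution` out of `ChangeDefinition` mirrors the separation in the text: the Change Definition states what is to be measured, and the Business Case states how.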

 

 

Environment Change


Definition

Object: Environment Change
Group: Business
Definition: A requirement for change that originates from a change in the operating environment of the statistical organization.
Explanatory Text: An Environment Change reflects change in the context in which a statistical organization operates. Environment Changes can have different origins and take different forms. They can result from a precise event (a budget cut, new legislation coming into force) or from a progressive process (technical or methodological progress, application or tool obsolescence). Other examples of Environment Changes include the availability of a new Information Resource and the opportunity for new collaboration between organizations.
Synonyms: (none given)

Attributes
Name

Description

Cardinality

Value Type

Change origin

 

1..1

Text

Legal changes

 

0..*

Text