GSIM is being adopted to specify, design, and implement components that will easily integrate into “plug’n’play” solution architectures and seamlessly link to standard exchange formats (e.g. DDI, SDMX). It is important to note that GSIM does not make assumptions about the standards or technologies used to implement the model, which leaves the Agency room to determine its own implementation strategy.
Statistics Canada is beginning to use GSIM’s Concepts and Structures Groups as the main classifiers of metadata. These groups contain the conceptual and structural metadata objects, respectively, that are used as inputs and outputs in a statistical business process. The Structures group defines the terms used in relation to data and their structure. The Concepts group defines the meaning of data, providing an understanding of what the data are measuring.
Work focuses on aligning the new GSIM-based classification with other internal metadata classification models currently in use. For instance, IBSP identifies the following types of metadata:
- Reference metadata: Describes statistical datasets and processes.
- Definitional metadata: Describes statistical data in terms meaningful to the business user community, e.g., concepts, definitions, variables, classifications, value meanings and domains.
- Quality metadata: Quality evaluation of a dataset or individual records; helps users assess the fitness of associated data for their specific purposes, e.g., CV, rolling estimates, analysts' comments about the quality of a set of records.
- Operational metadata: Links between the concepts and the physical data.
- Systems metadata: Low-level information about files, servers and infrastructure that allows the physical IT environment to be updated without re-specification by the end user.
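As a minimal sketch of how the five IBSP metadata types above could sit alongside the GSIM Concepts and Structures groups in one record, the Python below tags each item with both classifiers. The class names, field names and sample items are hypothetical, and the GSIM group assigned to each sample item is an illustrative assumption, not an agreed alignment.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MetadataType(Enum):
    """The five IBSP metadata types listed above."""
    REFERENCE = "reference"
    DEFINITIONAL = "definitional"
    QUALITY = "quality"
    OPERATIONAL = "operational"
    SYSTEMS = "systems"


class GsimGroup(Enum):
    """The two GSIM groups used as the main classifiers of metadata."""
    CONCEPTS = "Concepts"
    STRUCTURES = "Structures"


@dataclass(frozen=True)
class MetadataItem:
    name: str
    ibsp_type: MetadataType
    # None means the GSIM alignment has not been decided yet.
    gsim_group: Optional[GsimGroup] = None


def group_by_ibsp_type(items):
    """Return a dict mapping each IBSP type to the names of its items."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.ibsp_type].append(item.name)
    return dict(grouped)


# Hypothetical sample items; the GSIM assignments are assumptions.
items = [
    MetadataItem("variable definition", MetadataType.DEFINITIONAL, GsimGroup.CONCEPTS),
    MetadataItem("coefficient of variation", MetadataType.QUALITY),
    MetadataItem("server location", MetadataType.SYSTEMS),
]
grouped = group_by_ibsp_type(items)
unaligned = [item.name for item in items if item.gsim_group is None]
```

Keeping the two classifiers as separate fields lets items that have not yet been aligned with GSIM remain visible (here via `unaligned`) instead of forcing a premature mapping.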
 For example: analyst comments about their analysis, output of statistical processes; respondent comments, interviewer comments or additional information about the respondent obtained during collection.
Metadata use is not uniform across all GSBPM phases. IMDB metadata, consisting mostly of GSIM Concepts and Business objects, is used for survey design (phase 2) and dissemination (phase 7). Survey managers use the IMDB to identify existing variables for reuse. New variables and related questions will soon be documented and stored during questionnaire design as well, and will then be used by collection processes across the Agency. In the dissemination phase, the IMDB is the primary source of summary texts describing surveys, definitions of variables, related methodology, data quality and questionnaire images. Most products on the Statistics Canada website offer a link to the related IMDB survey records.
The System of National Accounts (SNA) creates and uses metadata (classifications) for the data integration sub-process (5.1) of the GSBPM. In particular, the SNA creates the Input-Output Industry Codes (IOIC), Commodity Codes (IOCC) and Institutional Sectors classifications. They are mainly used for the GDP surveys (annual, quarterly and monthly). SNA classifications are being exported to the IMDB and integrated into data warehouses for analysis via a classification web service. In addition, concordances to NAICS, NAPCS and other international classifications will be maintained in a Classification Management and Coding System (CMCS).
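A concordance of the kind described above can be thought of as a many-to-many lookup between the codes of two classifications. The sketch below is illustrative only: the codes are hypothetical placeholders, not real IOIC or NAICS values, and the function names are assumptions that do not reflect how the CMCS is implemented.

```python
from collections import defaultdict

# Hypothetical placeholder codes, NOT real IOIC or NAICS values.
CONCORDANCE_ROWS = [
    ("IOIC-X1", "NAICS-A"),
    ("IOIC-X1", "NAICS-B"),  # one source code may map to several targets
    ("IOIC-X2", "NAICS-C"),
]


def build_concordance(rows):
    """Index (source, target) pairs by source code."""
    forward = defaultdict(set)
    for source, target in rows:
        forward[source].add(target)
    return dict(forward)


def translate(code, concordance):
    """Return the sorted target codes for a source code, or [] if unmapped."""
    return sorted(concordance.get(code, ()))


concordance = build_concordance(CONCORDANCE_ROWS)
targets = translate("IOIC-X1", concordance)
```

Returning an empty list for an unmapped code, instead of raising an error, makes gaps in a concordance easy to report during integration.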
The Social Survey Processing Environment (SSPE) has its own metadata repository (see Section IV-B) which is used from design through dissemination, including survey and questionnaire metadata, codesets and codebooks. The SSPE repository does not use the same metadata objects that are in the GSIM Business or Structures groups.
Underlying the GSBPM is the implicit need for common semantics, which require some degree of harmonization and maintenance across all phases. Even common concepts like questionnaire, survey, or classification mean different things across the Agency. Enterprise Architecture Services (EAS), Methodology and Subject Matter areas have worked collaboratively to make progress in semantics work for the IBSP on a number of topics, including:
Statistics Canada’s involvement in the development of the GSIM has influenced both GSIM and the Agency’s internal semantic work. A case in point is the work done by EAS with IBSP and the Integrated Collection and Operation System (ICOS) on survey instruments and questionnaires, which helped identify the need for a flow decision object separated from flow action that was included in version 1.0 (submitted to the GSIM group for review). This semantic work has been the starting point for developing a canonical model for survey instrument and questionnaire for the SOA.
IBSP has also developed a conceptual framework and naming convention for harmonized content. SSPE has developed standardized questionnaire modules for cross-cutting household survey variables. These modules contain standard concepts, definitions, classification and wording for multiple collection modes.
 See Section IV-F-(a) for more information on this service.
 It indicates the provenance of the value for data quality purposes.
 See Section IV-F for more information on SOA canonical models.
Several projects have been identified as potential content providers or consumers of the IMDB. The IMDB now stores documentation for public use microdata files as part of the requirements for the Data Liberation Initiative (DLI), an initiative between Statistics Canada and Canadian universities to share data for social science research. Statistics Canada makes available to universities and colleges, by subscription, all of its statistical products, including microdata files, using Data Documentation Initiative (DDI) specifications.
Another initiative under development is the Research Data Centre (RDC) Metadata Project. Rather than integrating with the IMDB on a case-by-case basis (point-to-point integration), authorized applications can gain access to content through IT industry standard web services in a standards-based format (DDI). This approach is expected to reduce development costs, allow for code and component reuse across projects and foster the adoption of global standards across the Agency. It will also support the future establishment of a standards-based data and metadata management framework.
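To illustrate why a shared standards-based format supports reuse, the sketch below reads variable labels from a small XML fragment with one generic function that any consuming application could share. The fragment is a simplified, namespace-free example modelled loosely on the `var` and `labl` elements of DDI Codebook; it is not a validated DDI instance, and the variable names, labels and function name are hypothetical. No actual IMDB web service is called.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment; a real DDI document carries
# namespaces and far more structure than shown here.
SAMPLE = """<codeBook>
  <dataDscr>
    <var name="AGE"><labl>Age of respondent</labl></var>
    <var name="PROV"><labl>Province of residence</labl></var>
  </dataDscr>
</codeBook>"""


def variable_labels(xml_text):
    """Map each variable name to its label (empty string if no label)."""
    root = ET.fromstring(xml_text)
    labels = {}
    for var in root.iter("var"):
        labl = var.find("labl")
        has_text = labl is not None and labl.text
        labels[var.get("name")] = labl.text.strip() if has_text else ""
    return labels


labels = variable_labels(SAMPLE)
```

Because every consumer parses the same format, a function like `variable_labels` can be written once and reused, which is the kind of code and component reuse that point-to-point integration does not offer.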