3.1 Metadata Classification
The RDC-metadata system uses a classification that distinguishes between semantic, technical and administrative metadata. Semantic metadata include definitions of variables and other definitions as well as all kinds of methodological documentation. Technical metadata are metadata at the level of record types. Administrative metadata mainly comprise information about the responsible persons and institutions.
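As an illustration only, the three RDC categories can be modelled as an enumeration, with each metadata item carrying exactly one category. The class and item names below (`RdcCategory`, `MetadataItem`, the example items) are hypothetical and do not describe the actual RDC implementation.

```python
from dataclasses import dataclass
from enum import Enum


class RdcCategory(Enum):
    """The three categories of the RDC metadata classification."""
    SEMANTIC = "semantic"              # variable definitions, other definitions, methodological docs
    TECHNICAL = "technical"            # metadata at the level of record types
    ADMINISTRATIVE = "administrative"  # responsible persons and institutions


@dataclass(frozen=True)
class MetadataItem:
    name: str
    category: RdcCategory


# Hypothetical example items, one per category.
ITEMS = [
    MetadataItem("definition of variable 'employment status'", RdcCategory.SEMANTIC),
    MetadataItem("record type layout of the survey file", RdcCategory.TECHNICAL),
    MetadataItem("contact person for the dataset", RdcCategory.ADMINISTRATIVE),
]


def by_category(items):
    """Group metadata item names by their RDC category (every category gets a key)."""
    groups = {category: [] for category in RdcCategory}
    for item in items:
        groups[item.category].append(item.name)
    return groups
```

Because every item holds a single `RdcCategory` value, the categories cannot overlap by construction.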
The RDC-classification reflects the need to classify metadata according to different levels of abstraction. Although the RDC-system does not have a separate conceptual level in the sense of the Neuchâtel model, the term semantic metadata can be seen as synonymous with conceptual metadata. In the future we might need to supplement this classification with a contextual level acting as a mediating level between the conceptual and technical levels.
With the broadening of Destatis' approach to metadata and its involvement in the Census 2011, it soon became clear that additional classifications had to be introduced to reflect a stronger focus on the statistical process chain. However, it also became clear that there were endless possibilities for structuring and classifying metadata. We made initial experiments with the proposed CMF-classification before we realised that Sundgren (2008)** was right to state that multiple linear classifications of metadata exist and that each of them serves a purpose. Roughly following Sundgren, we decided to classify metadata by form into structured, semi-structured and unstructured metadata. Structured metadata is metadata held in metadata systems and structured according to an information or metadata model. Semi-structured metadata exists as written text in linear order, where each text file is an instance of a given template. Typical examples of this kind of metadata are quality reports (or this case study). Unstructured metadata basically consists of text files (methodological documents, etc.) that are structured only according to the author's needs and taste. This classification works fairly well in the census, where most of the metadata is of the unstructured kind.
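The form classification can be read as a simple decision rule. The sketch below assumes each metadata source records whether it lives in a model-based metadata system and whether it instantiates a template; the attributes `model` and `template` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Form(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


@dataclass(frozen=True)
class MetadataSource:
    name: str
    # Hypothetical attributes, assumed only for this sketch:
    model: Optional[str] = None     # information/metadata model of the hosting system, if any
    template: Optional[str] = None  # template the text file instantiates, if any


def classify_by_form(source: MetadataSource) -> Form:
    """Assign exactly one form category to a metadata source."""
    if source.model is not None:
        return Form.STRUCTURED       # held in a metadata system that follows a model
    if source.template is not None:
        return Form.SEMI_STRUCTURED  # linear text that instantiates a given template
    return Form.UNSTRUCTURED         # free text structured only by the author
```

For example, a quality report written against a quality report template would come out as semi-structured, while a free-form methodological paper would come out as unstructured. Checking the rules in a fixed order guarantees that each source receives exactly one category.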
In addition to using proper classifications*** we also distinguish metadata by user group and by attachment object (such as statistical activity or statistical activity instance). These distinctions do not constitute classifications in the strict sense because, as yet, we have neither an exhaustive list of user groups nor an exhaustive list of attachment objects (the latter would amount to an overarching, exhaustive metadata model). However, we do classify metadata according to the processes that use or produce them, thereby using the process model as a classification.
Apart from these classifications, the terms quality metadata and production metadata are used in the office. Quality metadata refers to an after-the-fact interpretation of metadata and applies to all metadata deemed important for evaluating data quality. Since such an evaluation must be based on existing metadata (frequently called documentation in this context), the degree to which such metadata exists is itself an important quality indicator and part of quality metadata. Production metadata is a term often heard in connection with software development; it denotes metadata used to execute and control (sub-)processes in the production of statistical data.
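One possible way to express "the degree to which such metadata exists" as a number is the share of required documentation objects that have actually been delivered. This is a minimal sketch under that assumption; the function name and the treatment of an empty requirement list are our own choices, not an indicator defined by Destatis.

```python
from __future__ import annotations


def documentation_completeness(required: set[str], delivered: set[str]) -> float:
    """Share of required documentation objects that actually exist.

    Returns a value in [0, 1]; 1.0 means every required object was delivered.
    Delivered objects that are not required are ignored.
    """
    if not required:
        return 1.0  # nothing required, so nothing is missing
    return len(required & delivered) / len(required)
```

With four required objects of which three were delivered, the function returns 0.75. Ignoring extra deliveries keeps the indicator focused on gaps rather than rewarding surplus documents.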
** Sundgren, Bo (2008): Classifications of Statistical Metadata. Paper presented at the Joint UNECE/Eurostat/OECD work session on statistical metadata (METIS), Luxembourg, 9-11 April 2008.
*** Classification here means a list of mutually exclusive categories that exhaustively classifies each object within its scope according to some explicit or implicit criteria.
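The footnote's definition can be checked mechanically: categories must not share objects, and together they must cover the whole scope. The sketch below is a hypothetical helper, not part of any Destatis system. It also shows why a grouping by user group fails the test as long as the list of groups is not exhaustive.

```python
def is_proper_classification(scope: set, categories: dict) -> bool:
    """True if the categories are mutually exclusive and exhaust the scope.

    `categories` maps a category name to the set of objects assigned to it.
    """
    seen = set()
    for members in categories.values():
        if members & seen:  # an object already belongs to another category
            return False
        seen |= members
    return seen == scope    # every object in scope is covered, nothing outside it


scope = {"quality report", "methods paper", "variable catalogue"}
by_form = {
    "structured": {"variable catalogue"},
    "semi-structured": {"quality report"},
    "unstructured": {"methods paper"},
}
# Hypothetical user groups: the list is incomplete, so objects remain unassigned.
by_user_group = {"researchers": {"variable catalogue"}}
```

Here `by_form` passes, whereas `by_user_group` fails because two objects in scope are not covered by any group.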
3.2 Metadata used/created at each phase
So far, no process model has been used at Destatis to guide the collection of metadata across all statistical activities. Within the census, however, our adapted version of the METIS GSBPM-model will be used in this way (see 2.1). For each sub-process we have established a set of metadata objects for documentation. Each documentation object can be structured, semi-structured or unstructured. Variables (as a general concept, including all object types in the Neuchâtel model), statistical units and rules for generating variables are treated as structured documentation objects. The other objects are essentially text documents, to be delivered as PDF, Word or Excel files. So far, there are 41 of these textual documentation objects. Some of them follow quite directly from their respective processes. This is the case with drafts for new statistical laws (1.4 in the process model; there are individual laws for most statistical activities), business cases for IT-systems (1.4), technical specifications for IT-systems from the client's side (1.5) and technical specifications (plus handbooks) for IT-systems from the developer's side (2.1). In other cases, more general documentation objects were requested, such as "description of output", which could be any document detailing the planned products for the census (1.1). Important aspects for the assessment of data quality were covered by documentation objects on the sub-processes coding (4.3), data editing (4.4) and imputing missing values (4.5), with one object elaborating on the intended procedure of the respective process and one consisting of after the fact
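Using the process model as a classification amounts to keying each documentation object by its sub-process code. The registry below uses sub-process codes and object titles paraphrased from the text; the `DocObject` type and the grouping function are hypothetical, and the registry is deliberately incomplete (it does not list all 41 textual objects).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DocObject:
    subprocess: str  # sub-process code in the adapted GSBPM-based model
    title: str


# Examples paraphrased from the text; not the full list of 41 objects.
REGISTRY = [
    DocObject("1.1", "description of output"),
    DocObject("1.4", "draft statistical law"),
    DocObject("1.4", "business case for IT-system"),
    DocObject("1.5", "technical specification (client side)"),
    DocObject("2.1", "technical specification and handbook (developer side)"),
    DocObject("4.3", "coding: intended procedure"),
    DocObject("4.4", "data editing: intended procedure"),
    DocObject("4.5", "imputation: intended procedure"),
]


def objects_per_subprocess(registry):
    """Group documentation object titles by the sub-process they belong to."""
    groups = defaultdict(list)
    for obj in registry:
        groups[obj.subprocess].append(obj.title)
    return dict(groups)
```

Since each object carries exactly one sub-process code, the grouping is mutually exclusive; whether it is also exhaustive depends on every sub-process of the model being represented.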
3.3 Metadata relevant to other business processes
In general, all metadata collected along the core process chain is also relevant to other business processes, albeit often at a more condensed level. The Destatis process model (featured in 2.1 as the first model) details these other business processes, which are not always part of the METIS GSBPM-model.
The processes that need more detailed metadata are "management of statistics (statistical activities)", "methodology development" and "quality and metadata management" (the last of which is not mentioned there). Apart from the core processes, management and support processes also need metadata, although mostly either in a very general form or in great detail in response to specific requests. To address this, the Statistikdatenbank will be made available to more users, with the possibility of linking it to budget and accounting systems or other resource planning software.