3.1 Metadata Classification

25. Metadata are classified according to their usage and their role in the statistical production process.

The main types of metadata according to these criteria are as follows:

  • Definitional metadata - Definitional metadata act as identifiers and descriptors of the data. They precede the data, are created and maintained independently of the data, and are used to define the data structure. Examples of definitional metadata are country names and codes, currency names and codes and their relation to the countries, definitions of the indicators, and classifications such as ISIC Rev. 2, ISIC Rev. 3, etc. These core data also define some basic metadata elements such as metadata classes, stages, sources and methods, etc. Historically, this metadata type was the first to be established (ported from the Mainframe, re-factored and formalized) in ISDE. Definitional metadata are maintained by the statistical staff using the Nomenclature Explorer (NE) tool, in strict accordance with user authorisation and ownership.
  • Implicit metadata - Implicit metadata are a special class of metadata arising from the specific usage of other metadata. A typical example is the ISIC combinations: several industry categories can be combined and reported together by a given country for a given indicator and years. In the questionnaire returned by the NSOs such a combination is expressed in the following way (see Figure 6):

The codes 1511, 1512 and 1513 are combined and reported as a single number '1234'. The combined industries are linked by the footnote a/. The system resolves this as a dummy ISIC code 1511A, defined as "1511 includes 1512 and 1513", which is used throughout the production process and appears accordingly in the publications as well as in the pre-filled Questionnaire. Other country-specific classification discrepancies can be resolved in a similar way, for example industry codes at the 3-digit level that exclude one or more specific 4-digit industry codes. Implicit metadata can also be used to define synonyms: for example, '040' is the country code of Austria and is the same as, i.e. substituted by, the ISO code 'AUT'. They can also specify aggregations, e.g. the aggregation code 'EU' is composed of the codes of the individual countries. The keywords substitute, included and excluded used in this context are called operators.
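As an illustration only, the operator mechanism described above could be represented along the following lines. This is a minimal sketch: the class name ImplicitCode, its fields and the describe() method are hypothetical and are not taken from the UNIDO system; only the operator keywords and the example codes come from the text.

```python
from dataclasses import dataclass

# Operator keywords as named in the text; everything else here is hypothetical.
OPERATORS = {"substitute", "included", "excluded"}

# Wording used when an implicit code is spelled out for the reader.
VERBS = {
    "included": "includes",
    "excluded": "excludes",
    "substitute": "is substituted by",
}


@dataclass(frozen=True)
class ImplicitCode:
    code: str        # the derived code, e.g. the dummy ISIC code "1511A"
    base: str        # the existing code the definition starts from
    operator: str    # one of OPERATORS
    operands: tuple  # the codes the operator is applied to

    def __post_init__(self):
        if self.operator not in OPERATORS:
            raise ValueError(f"unknown operator: {self.operator}")

    def describe(self) -> str:
        """Return the definition in the form used in footnotes."""
        return f"{self.base} {VERBS[self.operator]} {' and '.join(self.operands)}"


# The ISIC combination from the text.
combo = ImplicitCode("1511A", "1511", "included", ("1512", "1513"))
# The country-code synonym from the text.
synonym = ImplicitCode("040", "040", "substitute", ("AUT",))

print(combo.describe())
print(synonym.describe())
```

Keeping the operator as data rather than as separate classes means that a new combination reported by a country only requires a new record, not new code.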

  • Operational metadata - Operational metadata are generated by the process of data transformation and attributed to the respective data items. As described in the presentation of the Data Transformation phase, each data item is stored in the database with a stage indicator reflecting its credibility. The transformation process also generates "Source" and "Methods" metadata, describing the source of the data item and the methods applied to generate it.
  • System metadata - These metadata are used to drive automated processing throughout the phases of the life cycle. Examples are layout definitions for the yearbook (for each country and each edition of the yearbook) and country lists, etc., used in the automatic generation of the PDF output, as well as installation and packaging lists, directories, templates, etc. for creation of the CD product. These metadata are specific to the application in which they are used and do not relate to the data. Therefore, although stored in the centralized repository, they are maintained by each application separately and are called "Properties" of the respective process, i.e. Yearbook properties, Questionnaire properties, etc.
  • Descriptive and Methodological metadata - These form the main bulk of the metadata. They are received from the primary data reporters through the UNIDO Questionnaire and are then processed further together with the data. During this processing additional metadata can be added by the UNIDO statistical staff. Descriptive or methodological metadata can be attached at all possible levels, ranging from the complete data set down to individual data items. This is done by assigning to the metadata the same dimensions as those of the data.
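Attaching metadata through shared dimensions can be sketched as follows. This is a hedged illustration, not the actual repository design: the dimension names (country, isic, indicator, year), the convention that an unset dimension matches everything, and the sample notes are all assumptions made for the example.

```python
from typing import NamedTuple, Optional


class Dims(NamedTuple):
    """Hypothetical dimension set shared by data items and metadata."""
    country: Optional[str] = None
    isic: Optional[str] = None
    indicator: Optional[str] = None
    year: Optional[int] = None


def applies_to(meta: Dims, item: Dims) -> bool:
    """A note applies when every dimension it fixes matches the data item.

    A dimension left as None is treated as "all values", so a note with no
    dimensions set is attached to the complete data set.
    """
    return all(m is None or m == i for m, i in zip(meta, item))


# Sample notes at three levels of detail (texts are invented).
notes = [
    (Dims(), "note on the complete data set"),
    (Dims(country="040"), "note on one country"),
    (Dims("040", "1511", "employment", 2005), "note on a single data item"),
    (Dims(country="100"), "note on another country"),
]


def notes_for(item: Dims) -> list:
    """Collect every note whose dimensions cover the given data item."""
    return [text for dims, text in notes if applies_to(dims, item)]


item = Dims("040", "1511", "employment", 2005)
print(notes_for(item))
```

With this convention a single lookup returns the data-set-level, country-level and item-level notes that apply to a data item, while notes for other countries are left out.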

3.2 Metadata used/created at each phase

26. The rest of this section describes each phase of the statistical production life cycle and specifies the metadata used or created in each phase.


27. The main output of this phase is the pre-filling of the out-going UNIDO General Industrial Statistics Questionnaire with previously reported statistical data and metadata for their possible revision by the NSO. The questionnaire is created in Excel format in one of the three languages (English, French or Spanish) appropriate for the particular country. The pre-filling is automated using the available data and metadata.

Data Collection

28. Once the completed questionnaires have been received back, they are entered into the system for validation and further processing. The Excel file is read in automatically and the user has a range of tools for validation, analysis, correction, etc. (see Figure 7). While a particular questionnaire is being processed, it can be stored, with all data and metadata included, in the interim storage in XML format. The metadata can be edited or new metadata can be entered (see Figure 8 and Figure 9).

29. The preparation of appropriate statistical metadata in support of the INDSTAT databases requires concrete and well-documented metadata inputs from the primary data compilers. Thus, UNIDO requests NSOs to provide, together with available statistical data, such descriptive information through its industrial statistics country questionnaire. The key items for which the organization needs to obtain metadata include:

   o Name of the supplier of the statistical data (i.e. reporting agency),

   o Basic source of data (e.g., annual industry survey),

   o Data reporting system (major deviations from ISIC),

   o Reference period (e.g. calendar year),

   o Reference unit (type of statistical unit)
      i. Establishment
      ii. Enterprise
      iii. Other

   o Scope of the annual survey (type of reference units covered) - information on coverage and the cut-off size,

   o Employed method of data collection,

   o Employed method of enumeration (direct interview, mail or web-surveys),

   o Response rate,

   o Treatment of non-response,

   o Concepts and definitions of the variables on which data are reported (details about each indicator),

   o Related national statistical publications and

30. The provided metadata are sometimes described not from the viewpoint of international comparability but from that of national standards. In such cases the UNIDO statistical staff re-describe/rearrange the provided metadata into explicit information on the deviation from the international standard. This is often a difficult task and requires additional meta-information from the NSO concerned.

31. In addition, one or more metadata items (footnotes in the older UNIDO terminology) can be attached to each data item in the questionnaire, such as "Missing because of confidentiality reasons" or combinations of ISIC codes like "1511 includes 1512", etc. (see Figure 5).

32. The metadata provided by NSOs often do not explicitly indicate deviations from international standards. In such cases, UNIDO attempts to re-describe/re-arrange the provided metadata into explicit information concerning the deviations from the international standards. This is often a difficult task and requires additional clarifications from the NSO concerned.

33. Data for OECD member countries, collected through the joint OECD/UNIDO questionnaire and transmitted to UNIDO in Excel format, are entered into the system in a similar way and are then ready for further validation and processing. These questionnaires do not contain metadata; the metadata are extracted from other OECD publications - OECD (2003) Industrial Structure Statistics, Volume 1, Core Data.


34. The data collected by UNIDO from the NSOs and further transformed according to the quality requirements in the transformation phase constitute the major source of data for several recurrent publications produced by PCF/RST/STA. The metadata collected from the NSOs together with the data undergo the same transformation process as the data and are complemented by metadata generated by the transformation process. All resulting metadata, including the necessary structural metadata, are used in the dissemination process:

35. The data collected from the primary sources are further transformed into a ready-to-use data set. The data transformation is done in five stages, which not only constitute an operational framework for UNIDO statisticians but also provide users with an additional description of the statistics (generated metadata attributed to each data item). For details about these stages see UNIDO (1996), pp. 6-8; only a brief summary is given here:
    i. Manual detection and, if possible, correction of obvious reporting errors. The data are kept in original form (Stage 1 data). These data are used for pre-filling the following edition of the questionnaire for the particular country;
    ii. Inconsistent data are corrected using supplementary information from national publications (Stage 2 data). Stage 1 and Stage 2 data are considered as official;
    iii. Data are adjusted to eliminate the departures from the level of ISIC aggregation using national and international sources or supplementary data (Stage 3);
    iv. Missing data are estimated by UNIDO statisticians by applying related proportions or interpolation whenever applicable (Stage 4); and
    v. Provisional estimates are made for the latest year (Stage 5).

36. At the same time, Source and Method metadata are maintained for each data item. If appropriate, the provided metadata are re-described from the viewpoint of international comparability.
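The five stages and the Source and Method metadata kept for each data item could be modelled as in the sketch below. The stage member names, the DataItem fields and the sample values are hypothetical labels chosen for the example; only the stage numbers, their meaning and the rule that Stage 1 and Stage 2 data are considered official come from the text.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    """Stage indicator; member names are illustrative labels only."""
    REPORTED = 1      # obvious reporting errors corrected, original form kept
    CORRECTED = 2     # inconsistencies corrected using national publications
    ADJUSTED = 3      # adjusted to the ISIC level of aggregation
    ESTIMATED = 4     # missing data estimated by UNIDO statisticians
    PROVISIONAL = 5   # provisional estimate for the latest year


# Stage 1 and Stage 2 data are considered official.
OFFICIAL_STAGES = frozenset({Stage.REPORTED, Stage.CORRECTED})


@dataclass
class DataItem:
    value: float
    stage: Stage
    source: str   # "Source" metadata for this item
    method: str   # "Method" metadata for this item

    @property
    def is_official(self) -> bool:
        return self.stage in OFFICIAL_STAGES


reported = DataItem(1234.0, Stage.REPORTED, "annual industry survey", "as reported")
estimated = DataItem(1300.0, Stage.ESTIMATED, "UNIDO estimate", "interpolation")

print(reported.is_official, estimated.is_official)
```

Storing the stage as an ordered integer keeps the original numbering visible while still allowing comparisons such as `item.stage <= Stage.CORRECTED`.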

37. During the processing period a range of descriptive metadata also require updating, such as country names, national currencies and country groups. For example, in the 1990s, after the fall of the USSR and the break-up of Yugoslavia, a number of new sovereign states emerged in the Euro-Asia region. On the other hand, 12 EU member countries adopted the common currency, the Euro, replacing their previous national currencies. More recent changes relate to the Democratic Republic of Timor-Leste and the Republic of Montenegro; in addition, two more countries (Bulgaria and Romania) recently joined the EU, and two countries (Malta and Cyprus) adopted the Euro as their national currency.


   o To define the dissemination products - structural metadata such as country names and codes, currency names and codes, classifications, etc. are used for this purpose;

   o To guide the dissemination process - for example, the selection of data to be published in the different products depends on the degree of confidence they deserve, as identified by the stage (metadata generated in the transformation process);

   o To provide users with the information they may need to interpret the disseminated data.

39. The International Yearbook of Industrial Statistics is the main UNIDO statistical product and has been the most important medium of data dissemination for many years. The latest yearbook, released in 2008, covered data for the period from 1995 to the latest year. The country data were updated for 74 countries and are compiled from Stage 1 and Stage 2 data (as described elsewhere in this document).

40. Another medium of UNIDO data dissemination is the CD products, which might include data from all stages described earlier. Demand for CD products from national and international institutions, academia and researchers is increasing every year. For information on purchasing procedures and licensing the readers should refer to . The latest release of the CD products in 2008 covered the statistics shown in Table 5.
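The way the stage indicator guides the selection of data for the two products just described can be sketched as follows. The product keys, the dictionary layout and the sample records are assumptions made for the illustration; the stage sets follow paragraphs 39 and 40, with the caveat that CD products only "might" include all stages.

```python
# Stages admitted per product (hypothetical configuration).
PRODUCT_STAGES = {
    "yearbook": {1, 2},        # compiled from Stage 1 and Stage 2 data
    "cd": {1, 2, 3, 4, 5},     # may include data from all stages
}


def select_for_product(items, product):
    """Keep only the data items whose stage is admitted for the product."""
    allowed = PRODUCT_STAGES[product]
    return [item for item in items if item["stage"] in allowed]


# Invented sample records: one per data item, each carrying its stage.
items = [
    {"country": "040", "isic": "1511", "year": 2004, "value": 10.0, "stage": 1},
    {"country": "040", "isic": "1511", "year": 2005, "value": 11.0, "stage": 4},
    {"country": "040", "isic": "1511", "year": 2006, "value": 12.0, "stage": 5},
]

print(len(select_for_product(items, "yearbook")))
print(len(select_for_product(items, "cd")))
```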

41. Another form of data dissemination is the provision of statistics on selected variables from the different UNIDO databases for each member state, posted on the UNIDO website under the item Country Statistics. Country data on the website are presented for several years together with the figures for the world and the region, allowing comparison over time as well as against the regional level.

42. Apart from the recurrent publications listed above, industrial statistics data can be disseminated in response to ad hoc queries, mainly for internal but in some cases also for external users.

43. Figure 10, Figure 11, Figure 12 and Figure 13 show examples of metadata as presented in the different dissemination products.