Statistics Austria has no "official" classification of metadata. But during the conceptual work for BASIS 2000+, STAT+ and the integrated metadata system IMS, a multidimensional approach - similar to Bo Sundgren's proposal in working paper 7 of the METIS 2008 meeting - was worked out.
In this model metadata content is considered to be the focal point. Statistical metadata appear in many different forms: e.g., as title of a table, text in a document (describing, for example, conceptual objects like a survey, a statistical concept or a validation rule), source code statements in a software program, technical attributes of a file, and so on. In principle these metadata items can be seen as instances of a set of object types, which are connected by different kinds of relations (which themselves are part of the metadata).
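The idea of metadata items as instances of object types, linked by typed relations, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the relation kind "validated_by" and the sample items are hypothetical and are not taken from the IMS design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataItem:
    """One metadata item, an instance of an object type (hypothetical model)."""
    item_id: str
    object_type: str  # e.g. "Survey", "Concept", "ValidationRule"
    content: str


@dataclass(frozen=True)
class Relation:
    """A typed link between two items; relations are themselves metadata."""
    kind: str
    source_id: str
    target_id: str


@dataclass
class MetadataStore:
    items: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add(self, item: MetadataItem) -> None:
        self.items[item.item_id] = item

    def relate(self, kind: str, source_id: str, target_id: str) -> None:
        # Refuse links to unknown items so the graph stays consistent.
        if source_id not in self.items or target_id not in self.items:
            raise KeyError("both ends of a relation must be registered items")
        self.relations.append(Relation(kind, source_id, target_id))

    def related(self, source_id: str, kind: str) -> list:
        """Return the items reachable from source_id via relations of one kind."""
        return [
            self.items[r.target_id]
            for r in self.relations
            if r.source_id == source_id and r.kind == kind
        ]


# Illustrative data only.
store = MetadataStore()
store.add(MetadataItem("s1", "Survey", "Example survey"))
store.add(MetadataItem("r1", "ValidationRule", "turnover >= 0"))
store.relate("validated_by", "s1", "r1")
```

Keeping relations as first-class records, rather than as attributes of the items, mirrors the remark that the relations are part of the metadata.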
These different types of metadata can be investigated from varying points of view. The IMS project team differentiated between the following six dimensions:
1. Dimension "Function":
This dimension describes the purpose of metadata. Basically, metadata are required for the following reasons:
2. Dimension "Statistical Life Cycle":
Statistics production can be described as a process which transforms input data into output data via several steps and using statistical methods. In Statistics Austria, this statistical life cycle is structured in the form of "statistical projects" of different types (surveys, registers, analytical projects or systems). This is described in more detail in section [...]
3. Dimension "Users":
Statistical metadata are not an end in themselves, but are required by different groups of users for varying purposes. "User" must here be understood in a broad sense, comprising not only persons but also IT systems.
We can distinguish roughly between two main groups of users: external ones (which do not belong to the NSI) and internal users. External users are mostly "consumers" of statistics; however, they may also be providers of raw data (respondents). Internal users are often both "producers" and consumers. Among others, external users may be politicians, scientists, economic enterprises, journalists, private persons or international organisations. In general, the users in these groups differ in the amount of their previous knowledge, the level of detail they wish for in the statistical information they are seeking, and the resources at their command. From the point of view of the amount of metadata they require, one must keep in mind that this may also vary within the relatively heterogeneous groups. Furthermore, the requirements may evolve with time.
4. Dimension "active / passive":
This dimension treats the degree to which metadata play an active role in statistics production, i.e. controlling the process or automating processing steps (e.g. when an electronic questionnaire is generated automatically based on the specification of a survey's questions). With regard to efficient production of statistics one should aim at letting as many active metadata elements as possible be defined directly by the statistical subject matter [...]
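The questionnaire example can be made concrete with a minimal sketch in which the question specification is the only input and the form is derived from it. The names QuestionSpec and render_questionnaire, the two answer types and the plain-text layout are assumptions made for illustration; they do not describe the tools used at Statistics Austria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionSpec:
    """Specification of one survey question (hypothetical structure)."""
    code: str
    text: str
    answer_type: str  # "number" or "choice" in this sketch
    choices: tuple = ()


def render_questionnaire(title: str, questions: list) -> str:
    """Build a plain-text questionnaire purely from the question metadata."""
    lines = [title, "=" * len(title)]
    for number, question in enumerate(questions, start=1):
        lines.append(f"{number}. {question.text} [{question.code}]")
        if question.answer_type == "choice":
            for choice in question.choices:
                lines.append(f"   ( ) {choice}")
        elif question.answer_type == "number":
            lines.append("   ____")
        else:
            raise ValueError(f"unsupported answer type: {question.answer_type}")
    return "\n".join(lines)


# Illustrative specification only.
spec = [
    QuestionSpec("Q1", "Number of employees?", "number"),
    QuestionSpec("Q2", "Legal form?", "choice", ("Company", "Sole trader")),
]
print(render_questionnaire("Example survey", spec))
```

Because the form is generated from the specification, changing a question's text or answer options in the metadata changes the questionnaire without touching the rendering code, which is what makes these metadata "active".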
5. Dimension "formatted / unformatted":
A distinction can be drawn between formatted and unformatted data. The structure of the former is agreed beforehand (e.g., every record in a file consists of the same sequence of data fields, which in their turn exhibit prearranged characteristics such as data type, length, etc.; or a data file conforms to a predefined XML schema) and thus easily lends itself to automated processing with computers. Unformatted data on the other hand - texts, graphics, voice etc. - are much more difficult and cost more effort to process, especially with regard to IT programs "understanding" their contents. Statistical metadata often occur in unformatted form, e.g. as text [...]
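To illustrate why formatted data are easy to process automatically, the following sketch parses a fixed-width record against a layout agreed beforehand (field name, length, data type). The layout, the field names and the sample record are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field of a fixed-width record layout (hypothetical structure)."""
    name: str
    length: int
    data_type: type  # int or str in this sketch


def parse_record(line: str, layout: list) -> dict:
    """Split a fixed-width record into typed fields according to the layout."""
    expected = sum(spec.length for spec in layout)
    if len(line) != expected:
        raise ValueError(f"expected {expected} characters, got {len(line)}")
    record = {}
    position = 0
    for spec in layout:
        raw = line[position:position + spec.length]
        position += spec.length
        # int() raises ValueError if a numeric field holds non-numeric text.
        record[spec.name] = spec.data_type(raw.strip())
    return record


# Illustrative layout: 2-character region, 4-digit year, 5-digit count.
layout = [
    FieldSpec("region", 2, str),
    FieldSpec("year", 4, int),
    FieldSpec("count", 5, int),
]
record = parse_record("AT2008   42", layout)
```

The layout itself is a piece of formatted metadata: a program can use it without "understanding" the data, which is not possible for free text or graphics.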
6. Dimension "manual / automatic":
The criterion by which this dimension classifies metadata is whether they are recorded manually by the persons entrusted with planning and implementing statistical projects, or whether they are created automatically by tools [...]
Apart from these dimensions, which serve as a means to describe and understand the multilayered topic "statistical metadata", other important aspects must be taken into consideration within the context of metadata management and the development of metadata systems.
When talking about quality in statistics, in most cases the quality of data and statistical results is meant. In this context, a definition of quality as well as quality criteria have been elaborated, and many NSIs have introduced routines for quality reporting within their institutions.
Compared to data quality, the topic of "metadata quality" has received much less attention. In our opinion, the definition of quality criteria for metadata should become a central task of international working groups in the future.
This topic comprises organizational questions within an NSI (for instance: is there a central metadata unit? If yes: what are its tasks?), but also issues regarding the registration and administration of metadata items (for example access rights, stewardship, life-cycle status, locking of items while they are updated).
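The administrative attributes mentioned above (stewardship, life-cycle status, locking during updates) could be attached to a registered item roughly as follows. This is a minimal sketch: the status values, class name and locking rules are assumptions for illustration, not a description of an existing registry.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    """Hypothetical life-cycle states of a metadata item."""
    DRAFT = "draft"
    APPROVED = "approved"
    RETIRED = "retired"


@dataclass
class RegisteredItem:
    name: str
    steward: str  # person or unit responsible for the item
    status: Status = Status.DRAFT
    locked_by: Optional[str] = None

    def lock(self, user: str) -> None:
        # Only one user may hold the lock at a time.
        if self.locked_by not in (None, user):
            raise PermissionError(f"{self.name} is locked by {self.locked_by}")
        self.locked_by = user

    def rename(self, user: str, new_name: str) -> None:
        # Updates are allowed only while the caller holds the lock.
        if self.locked_by != user:
            raise PermissionError("lock the item before updating it")
        self.name = new_name

    def unlock(self, user: str) -> None:
        if self.locked_by == user:
            self.locked_by = None


# Illustrative use only.
item = RegisteredItem(name="Turnover", steward="unit-a")
item.lock("alice")
item.rename("alice", "Net turnover")
item.unlock("alice")
```

Access rights are left out of the sketch; in practice they would decide who may call lock at all.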
In the process of software development, metadata play a decisive role. In order to produce software of high quality in an economical way, the availability of tools - to support the management of "software metadata" (including the source code of the programs) and to provide services that ease the software engineers' work - has long been recognized as necessary. Especially when several programmers are cooperating in a software project, the storage and administration of all information items in a central repository seems indispensable.
The production of statistics exhibits a high degree of similarity to the production of software. However, in statistics the advantages offered by specialized tools and a centralized metadata repository are not yet generally [...]
Numerous papers point out that the development of a long-term strategy forms a necessary and fundamental basis for the step-by-step realization of an integrated metadata system. The elaboration of a "construction plan" as a flexible and extendable architecture is cost-intensive and time-consuming, but it is also an investment in a stable foundation which will pay off in the future.
Some important general goals of a metadata strategy are:
In the "4-layer model" metadata are represented as an "infrastructure" layer accompanying the phases of statistical production; in every phase newly produced metadata are stored in the metadata systems and existing metadata are accessed and perhaps re-used. A higher degree of model detail concerning different types of metadata was not attempted, however.
For the purpose of cost planning and controlling, SAP software is used.