ISIS (short for Integrated Statistical Information System) is a statistical output database which was already developed in the early 1970s and has been consistently maintained and developed further since then. It contains thousands of multi-dimensional data cubes as well as metadata of various kinds (e.g., short descriptions of the data cubes and the underlying surveys; keywords and a hierarchically structured topic tree are furnished for data searching) and implements a large part of the Statistical Data System SDS in the life cycle model. Although ISIS is still very modern from the point of view of the conceptual design of its contents, the software itself has reached the end of its life span, as only one programmer now still possesses sufficient technical know-how to maintain the mainframe Assembler and PL/I programs. Because of this, a successor system (ISIS New) is currently being developed on the basis of the Australian company Space-Time Research's SuperSTAR product range.
e-Quest is a system consisting of several tools for the metadata-driven generation of electronic questionnaires, their administration and the preliminary processing of incoming questionnaires. Subject matter experts can design the questionnaires with a user-friendly graphical editor. The active metadata thus specified are stored in XML format and then used in two ways: on the one hand, to represent the questionnaires dynamically in a Visual Basic 6.0 rich client application (which must be installed by the respondents); on the other hand, to generate JSP pages and SQL table definitions for electronic questionnaires accessible via a uniform Web questionnaire portal. e-Quest thus covers important areas of the "data production" phase.
The project "e-Quest New" is currently under way, with the goal of replacing the Visual Basic components with a Java-based solution. At the same time, it aims for better integration of the stand-alone and Web questionnaire subsystems.
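The metadata-driven principle behind e-Quest can be illustrated with a minimal Python sketch: one XML definition drives both a database table definition and a list of form fields. The real e-Quest XML schema is not described here, so the element names, attribute names, type mapping and the functions `to_create_table` and `to_form_fields` are all assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical questionnaire definition. The actual e-Quest schema is not
# documented in this text; all element and attribute names are assumptions.
QUESTIONNAIRE_XML = """
<questionnaire id="survey_demo">
  <question name="turnover" type="decimal" label="Annual turnover"/>
  <question name="employees" type="integer" label="Number of employees"/>
  <question name="remarks" type="text" label="Remarks"/>
</questionnaire>
"""

# Assumed mapping from questionnaire field types to SQL column types.
SQL_TYPES = {
    "integer": "INTEGER",
    "decimal": "DECIMAL(18,2)",
    "text": "VARCHAR(4000)",
}


def parse_questions(xml_text):
    """Return the questionnaire id and a list of (name, type, label) tuples."""
    root = ET.fromstring(xml_text)
    questions = [
        (q.get("name"), q.get("type"), q.get("label"))
        for q in root.findall("question")
    ]
    return root.get("id"), questions


def to_create_table(xml_text):
    """Derive an SQL table definition from the questionnaire metadata."""
    table, questions = parse_questions(xml_text)
    columns = [f"  {name} {SQL_TYPES[qtype]}" for name, qtype, _ in questions]
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);"


def to_form_fields(xml_text):
    """Derive simple HTML input fields from the same metadata."""
    _, questions = parse_questions(xml_text)
    return [f'{label}: <input name="{name}">' for name, _, label in questions]


if __name__ == "__main__":
    print(to_create_table(QUESTIONNAIRE_XML))
    for field in to_form_fields(QUESTIONNAIRE_XML):
        print(field)
```

Because both outputs are derived from the same definition, a change to the questionnaire only has to be made in one place, which is the property that makes the metadata "active".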
Using document management software from the company Stellent (which has since been acquired by Oracle), the publication data system PDS was created during the last few years. It stores all publications (i.e., documents of various types, ranging from tables, print publications and press releases to the so-called standard documentations) together with metadata relating to the documents. Since the Web re-launch on June 1st 2007, Stellent has also been used as a Web content management system. The subject matter experts now create Web pages in the form of standardized Word documents, which are automatically converted to HTML and copied to the correct position in Statistics Austria's website on the basis of associated metadata (in particular a hierarchical topic and navigation structure). The navigation structure is also used for generating links to related documents with data and metadata. The online directory of print publications (many of which can be downloaded free of charge as PDF files) was also implemented in the Stellent system.
In 2006 the Classification Database KDB was released. This allows Web access to almost 20 voluminous classifications such as PRODCOM, NACE, COICOP, SITC and CPA, including comments and correspondences. More than one version is available for several classifications.
So far, no application for interactive editing and processing of classifications has been developed.
STF is an XML specification which permits cross-classified tables to be stored together with extensive metadata in a hardware- and application-independent format - for long-term storage, among other uses. Converters from STF to Excel and HTML and from Excel tables to STF are supplied. When Excel tables are checked into the Stellent publication database, they are automatically converted to STF format. ISIS query results can also be stored in STF format.
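As a rough illustration of what a converter from an XML table format to HTML involves, the following Python sketch renders a small table stored as XML. It does not use the actual STF schema, which is not reproduced here: the element names, the `title` and `source` attributes and the cell values are invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical XML table with a little metadata. This is NOT the real STF
# schema; structure, names and values are assumptions for illustration.
TABLE_XML = """
<table title="Demo table" source="hypothetical survey">
  <columns><col>Region</col><col>2005</col><col>2006</col></columns>
  <row><cell>A</cell><cell>10</cell><cell>12</cell></row>
  <row><cell>B</cell><cell>7</cell><cell>9</cell></row>
</table>
"""


def table_xml_to_html(xml_text):
    """Convert the XML table into a plain HTML table string."""
    root = ET.fromstring(xml_text)
    lines = ["<table>", f"<caption>{escape(root.get('title', ''))}</caption>"]

    # Header row built from the <columns> element.
    header = "".join(
        f"<th>{escape(col.text or '')}</th>" for col in root.find("columns")
    )
    lines.append(f"<tr>{header}</tr>")

    # One HTML row per <row> element.
    for row in root.findall("row"):
        cells = "".join(f"<td>{escape(cell.text or '')}</td>" for cell in row)
        lines.append(f"<tr>{cells}</tr>")

    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(table_xml_to_html(TABLE_XML))
```

Keeping the table in a neutral XML form and deriving HTML (or Excel) from it on demand is what makes such a format suitable for long-term storage: the stored data do not depend on any one presentation tool.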
The standard documentations - which can be downloaded as PDF documents over the Web - serve as the most important source of metadata about statistical projects and the quality of the statistical results they produce. The documents exhibit a standardised chapter structure and so far describe more than 100 statistical projects or survey versions, in part in great detail (they range from 8 to 100 pages; in many cases further documents are provided as attachments which can be accessed via hyperlinks in the text). One disadvantage is that they are usually written and made available to the users of the statistics in a separate and additional work step after the fact, although they contain many documentation elements which come into existence in the early phases of planning and preparing the statistical project. Another weak point is that no quantitative quality indicators are included.
This system was implemented through a Word template. Every manager of a statistical project is obliged to use this template when compiling a standard documentation.
The main headlines are the following:
Every chapter is divided into subsections which are more or less standardized.
The calendar of planned releases is available at http://www.statistik.at/web_de/ueber_uns/veroeffentlichungstermine/index.html. It consists of two PDF files which are updated on a regular basis (in the first one releases are sorted by date, in the second by sta[...]
From the same Web address, a file with information on the dates of data transmissions to Eurostat can be downloaded. There is also a link to the advance release calendar at the SDDS site of the IMF.
The planned press releases of the upcoming week are published at http://www.statistik.at/web_de/presse/presseservice/index.html
This is an MS Access application, available only to internal users, which contains information about administrative data sources.
Metadata systems form a fundamental information infrastructure for the production of statistics. More than 15 years ago, Bo Sundgren wrote the following about this topic:
"Statistical metainformation systems (...) exhibit some characteristics, which are typical for infrastructures:
(Bo Sundgren, Organizing the Metainformation Systems of a Statistical Office, Statistics Sweden 1992)
When metadata can be utilized to standardize and automate production processes ("active metadata"; see section 3.1), the costs for the development of metadata systems (which in many cases are quite substantial) are balanced by prospective long-term monetary benefits, which may result in major cost savings. One example of this is Statistics Austria's metadata-driven electronic questionnaire system e-Quest. Compared to the development of a tailor-made electronic questionnaire for a single survey, its initial development costs were inevitably higher. But now e-Quest facilitates the cost-effective creation of electronic questionnaires. By using the system repeatedly within many different statistical projects, the break-even point was reached quickly.
The situation in the case of developing systems for the collection and administration of passive metadata is, however, quite different. Passive metadata are an integral component of statistical information. Their availability and easy accessibility contribute to the quality of statistical products, but in many cases do not result in cost reductions (they may even increase the work load of subject matter statisticians). Opportunity costs caused by the non-existence of centralized end-to-end metadata systems are rarely found in accounting systems. Thus high investments are accompanied "only" by a gradual gain in quality (which may not even be recognized by all user groups). Under these circumstances it is understandable that in times of economic crisis the willingness to invest in metadata projects is not high.
The concept of "high-quality statistics" is a dynamic one. The needs and requirements of users are changing and will probably increase in the future, e.g. with regard to the harmonization of statistics or the linkage of data with relevant metadata items (or the linkage of metadata items with related metadata items), so that they can be accessed at the push of a button. If metadata are stored in the continuous text of bulky documents, these new requirements cannot be met. The management of metadata in an "atomic" and structured form, however, is a challenge with respect to both financial resources and personnel.
The fundamental principles of metadata management, which have been defined by experts during recent years (and which can be found, for example, in part A of the Common Metadata Framework), will become more and more commonly accepted standards and state of the art for the production and dissemination of statistical information.
The task of implementing these standards can certainly not be carried out at short notice. In this respect, it is not easy to answer the question whether to continue building isolated metadata systems whenever the need for one specific system arises, or whether to strive for an integrated system based on a global architecture. The first approach is certainly less expensive in the short run and produces quicker results, but in the long term it will cause quite substantial "repair" costs.
Similar to the BASIS 2000+ concept, a modular implementation approach was a major design principle of the IMS. In order to minimize the complexity of the complete system, the individual components (subsystems) should be able to work independently, communicating with each other and the central "Registry" by means of a web service and program interface layer. Thus - considering the limited resources - stepwise realization and gradual commissioning and expansion of the IMS (in the sense of "evolution instead of revolution") should be facilitated.
Regarding the integration of previously existing legacy systems into the IMS, several options are possible. A very simple form of coupling can be realized by manually registering information objects (for example a classification from the Classification Database) in the IMS Registry. A tighter and more sophisticated integration will require some programming effort, so that a legacy system can communicate with other components of the IMS via web services.
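The simple form of coupling, in which an information object held in a legacy system is merely registered centrally, might look roughly like the following Python sketch. The actual data model and interfaces of the IMS Registry are not described in this text: the class names, field names, identifier scheme and the `kdb://` locator are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """A pointer to an information object that lives in another system."""
    object_id: str      # identifier within the registry (hypothetical scheme)
    object_type: str    # e.g. "classification"
    source_system: str  # legacy system holding the object, e.g. "KDB"
    location: str       # where the object can be retrieved (placeholder)


class Registry:
    """In-memory stand-in for a central registry of information objects."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        """Add an entry; duplicate identifiers are rejected."""
        if entry.object_id in self._entries:
            raise ValueError(f"{entry.object_id} is already registered")
        self._entries[entry.object_id] = entry

    def lookup(self, object_id):
        """Return the entry for an identifier (raises KeyError if unknown)."""
        return self._entries[object_id]

    def find_by_type(self, object_type):
        """Return all entries of a given object type."""
        return [e for e in self._entries.values() if e.object_type == object_type]


if __name__ == "__main__":
    registry = Registry()
    # Manual registration of a classification held in the legacy KDB system.
    registry.register(
        RegistryEntry("cls-nace", "classification", "KDB", "kdb://nace")
    )
    print(registry.lookup("cls-nace"))
```

In this loose coupling the registry stores only a reference, so the legacy system needs no changes. The tighter integration mentioned above would replace the manual `register` call with web service calls made by the legacy system itself.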