Collection Management System (CMS)
CMS manages high level information about "statistical activities" ("collections") undertaken by the ABS. These "statistical activities" include surveys, censuses, statistical analysis of administrative data sources and statistical "compilation" activities such as preparing the national accounts.
The basic definition of a "collection" suitable to be registered in CMS involves inputs, processing/transformation and output. Simply collating data from other collections, therefore, results in a new "product" rather than being a new "collection" in its own right.
Each collection may have many instances (cycles) - such as a monthly survey. Information can be recorded at the collection, cycle or an intermediate level called "profile". (One purpose of the "profile" level is to document small to medium "redesigns" and other changes that can occur over time within a collection.)
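The three levels described above can be pictured as a simple nested structure. The following is a minimal sketch only, not the actual CMS data model: the class names, field names and identifier formats are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cycle:
    cycle_id: str  # illustrative identifier, not the real CMS format


@dataclass
class Profile:
    """Intermediate level: the cycles that share one design of the collection."""
    profile_id: str
    description: str
    cycles: list[Cycle] = field(default_factory=list)


@dataclass
class Collection:
    collection_id: str
    title: str
    profiles: list[Profile] = field(default_factory=list)

    def find_cycle(self, cycle_id: str) -> Optional[tuple[Profile, Cycle]]:
        """Return the cycle and the profile it was run under, if registered."""
        for profile in self.profiles:
            for cycle in profile.cycles:
                if cycle.cycle_id == cycle_id:
                    return profile, cycle
        return None


# Hypothetical monthly survey with one redesign part way through.
survey = Collection("C001", "Illustrative monthly survey")
survey.profiles.append(
    Profile("P1", "Original design", [Cycle("2009-01"), Cycle("2009-02")])
)
survey.profiles.append(
    Profile("P2", "After a sample redesign", [Cycle("2009-03")])
)

profile, cycle = survey.find_cycle("2009-03")
print(profile.description)  # After a sample redesign
```

Recording the redesign at the profile level means the description does not need to be repeated for every later cycle.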
Many (but not all) end to end processing systems refer to the Collection ID and Cycle ID assigned when the relevant activity is registered in CMS. This provides a good starting point in terms of end to end "metadata glue" and means the corporate registry function of CMS is being used relatively actively.
As a repository for descriptive information about statistical activities undertaken by the ABS, however, CMS sits to one side of the processes themselves. The content is often of relatively poor quality to start with and is then poorly maintained over time, even though managers of these activities are asked to sign off on CMS content. Much of the content visible through CMS, therefore, cannot be relied upon as an accurate, up to date description of activities in the ABS.
A subset of this content is signed off to the ABS website to become visible in the ABS Directory of Statistical Sources. This disseminated content tends to be better (but not perfectly) maintained.
CMS also hosts "Quality Declarations" that are now disseminated alongside ABS data. The following is an example for monthly Labour Force data: http://www.abs.gov.au/Ausstats/abs@.nsf/0/74BA4626F8C20DF5CA2573D20018F6F9?OpenDocument
The basic design of the CMS dates back to the 1990s although it was updated to Version 5 in 2001. Its structure for describing activities doesn't correspond to the ABS Caterpillar or to the GSBPM, which was developed subsequently. In addition
- more of the information entered in CMS should be actively driving actual business processes rather than being "passive" independent documentation, and
- more of the content visible through CMS should be sourced from other stores of actively used metadata.
DDI-L is seen as providing a sound, standard means of structuring information about "study units" and groups (and sub-groups) of study units. The information model underpinning DDI-L is likely to provide a standards aligned structured "backbone" to support extensive redevelopment, or replacement, of CMS in future.
The corporate dataset registry is a widely, but not universally, used registry for defining "dataset" metadata associated with a specific unit record file, data cube etc. This metadata includes
- the set of individual "data elements" included within the dataset
- where the data is stored and how it is structured (eg field names)
- what business unit owns the dataset, when it was last updated etc
- what statistical activity (collection) produced the data
This registry is an element of the ABSDB which, as described in BHM, was developed in the 1990s. The ABSDB is intended to catalogue all available "output" datasets within the ABS and assists in their management including long term retention.
Some systems working with data in specific environments have their own dataset registries, which structure "dataset" metadata in somewhat different ways. Extending the corporate registry to cover the definition and management of "input" and "intermediate" datasets would be of value in an end to end context, including making it possible to trace metadata usage within the ABS. (Querying the metadata model currently shows, for example, which output datasets make use of a particular classification, but not which input or intermediate datasets do likewise.)
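To make the traceability point concrete, the following is a rough sketch of a registry entry carrying the kinds of metadata listed earlier, plus a query for datasets that use a given classification. It is not the ABSDB model: every name, path and identifier is hypothetical, and the "stage" field represents the suggested extension to input and intermediate datasets rather than anything that exists today.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    field_name: str                       # how the element is named in storage
    classification: Optional[str] = None  # classification used to code it, if any


@dataclass(frozen=True)
class DatasetEntry:
    dataset_id: str
    location: str        # where the data is stored
    owner: str           # business unit that owns the dataset
    last_updated: date
    collection_id: str   # statistical activity that produced the data
    stage: str           # "input", "intermediate" or "output" (assumed extension)
    elements: tuple[DataElement, ...]


def datasets_using(registry, classification, stage=None):
    """List IDs of registered datasets with an element coded to a classification."""
    return [
        entry.dataset_id
        for entry in registry
        if (stage is None or entry.stage == stage)
        and any(e.classification == classification for e in entry.elements)
    ]


# Entirely made-up registry content.
registry = [
    DatasetEntry("DS1", "/data/input/ds1", "Unit A", date(2009, 6, 1), "C001",
                 "input", (DataElement("ind_code", "INDUSTRY"), DataElement("turnover"))),
    DatasetEntry("DS2", "/data/output/ds2", "Unit A", date(2009, 7, 1), "C001",
                 "output", (DataElement("industry", "INDUSTRY"),)),
    DatasetEntry("DS3", "/data/output/ds3", "Unit B", date(2009, 7, 1), "C002",
                 "output", (DataElement("state", "STATE"),)),
]

print(datasets_using(registry, "INDUSTRY"))            # ['DS1', 'DS2']
print(datasets_using(registry, "INDUSTRY", "output"))  # ['DS2']
```

With only output datasets registered, the first query could not return DS1. That is the gap an extended registry would close.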
As the main corporate register dates back to the 1990s, the characteristics of "data elements" recognised within its model are not fully harmonised with ISO/IEC 11179, although the differences are not major. This is another driver for updating the model underpinning the current registry, in addition to the need to extend that model to better support definition and management of input and intermediate datasets.
Development of the MRR is expected to address these needs in the medium and longer term.
Classification Management System (ClaMS)
This is another system that largely dates back to the 1990s. It features a "pre Neuchatel" ABS developed model for classifications. As infrastructure it is used relatively widely (although not universally) in end to end statistical processes within the ABS. For example, in addition to being used universally as part of the defining metadata for output datasets, these classifications can be linked into metadata definition for
- the Input Data Warehouse
- processing of Household Surveys
- driving aggregation, estimation and consequential confidentialisation processes
- driving the layout of publication tables
  - eg indenting labels according to the depth of the classification item in the classification hierarchy
- labelling and describing time series
  - eg based on the classification item labels associated with each dimension of the "key" for that particular time series
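The last two uses in the list above (table layout and time series labelling) can be illustrated with a short sketch. This is not how ClaMS or the publication systems are implemented. The classification content, codes, function names and the two space indent are all invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    code: str
    label: str
    parent: Optional[str] = None  # code of the parent item, None at the top level


def depth(items: dict[str, Item], code: str) -> int:
    """Count how many ancestors an item has in the hierarchy."""
    levels = 0
    parent = items[code].parent
    while parent is not None:
        levels += 1
        parent = items[parent].parent
    return levels


def table_stub(items: dict[str, Item], indent: str = "  ") -> list[str]:
    """Row labels for a table, indented by depth (items assumed in display order)."""
    return [indent * depth(items, code) + item.label for code, item in items.items()]


def series_label(key: dict[str, str], classifications: dict[str, dict[str, Item]]) -> str:
    """Describe a time series from the item labels for each dimension of its key."""
    return "; ".join(classifications[dim][code].label for dim, code in key.items())


# Made-up classifications, for illustration only.
industry = {
    "A": Item("A", "Agriculture"),
    "A01": Item("A01", "Crop growing", "A"),
    "A011": Item("A011", "Grain growing", "A01"),
    "B": Item("B", "Mining"),
}
state = {"1": Item("1", "New South Wales")}

print(table_stub(industry))
# ['Agriculture', '  Crop growing', '    Grain growing', 'Mining']
print(series_label({"industry": "A01", "state": "1"},
                   {"industry": industry, "state": state}))
# Crop growing; New South Wales
```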
While quite useful for many systematic purposes, the current system is very weak in terms of enforcing rational reuse of classifications across the ABS. For example, while a business area might define their own version of a classification and use that version more or less on an end to end basis, they are unlikely to reuse a classification defined by another area. This is because
- it is relatively hard to find existing classifications that would be structurally suitable to be reused for the area's purpose(s)
- it is relatively easy for areas to define new classifications that meet their required specifications
- areas like to exercise full control over "their" classifications rather than being dependent on other management processes
In addition, ClaMS does not properly support the following
- detailed definitions (as opposed to labels) for individual classification items
- item by item mappings from one version of a classification to another version of the same classification
- item by item mappings from one classification to another
- "special" concepts such as "cut off values" used to translate continuous variables to categorical codes
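Two of the unsupported features above are simple to express in code. The sketch below is illustrative only: the age ranges, the codes and the correspondence between versions are invented, and the function names are not part of any ABS system. It shows "cut off values" translating a continuous variable to categorical codes, and an item by item mapping in which one old item splits into two new ones.

```python
from bisect import bisect_right

# Hypothetical cut off values: each value is the lower bound of the next category.
AGE_CUT_OFFS = [15, 25, 65]
AGE_CODES = ["0-14", "15-24", "25-64", "65+"]


def categorise(value, cut_offs, codes):
    """Translate a continuous value to a categorical code using sorted cut offs."""
    if len(codes) != len(cut_offs) + 1:
        raise ValueError("need exactly one more code than cut off values")
    return codes[bisect_right(cut_offs, value)]


# Hypothetical correspondence between two versions of one classification.
# An old item may map to several new items, so each target is a list.
OLD_TO_NEW = {
    "A1": ["A1"],
    "A2": ["A2", "A3"],  # old item A2 was split in the new version
}


def map_item(code, correspondence):
    """Return the new version item(s) corresponding to an old version item."""
    return correspondence[code]


print(categorise(14, AGE_CUT_OFFS, AGE_CODES))    # 0-14
print(categorise(15, AGE_CUT_OFFS, AGE_CODES))    # 15-24
print(categorise(64.9, AGE_CUT_OFFS, AGE_CODES))  # 25-64
print(map_item("A2", OLD_TO_NEW))                 # ['A2', 'A3']
```

A one to many mapping like the A2 case is the kind that cannot be applied mechanically to data without further rules, which is one reason proper support matters.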
At the same time, however, the levels of sophistication and complexity of classifications which can be supported within ClaMS can make it "overpowering" for users who have very simple and basic requirements.
It should be noted, also, that ClaMS is sometimes used for defining lists (eg of valid values) in addition to "proper" classifications.
Data Element Registry (DER)
The DER is an ISO/IEC 11179 based facility developed in accordance with the 2003 Strategy for End-to-End Management of ABS Metadata. It replaced a number of older "Data Item" systems.
DER was developed using a "services architecture". At the core is a repository of data elements and their building blocks (eg object classes, properties, value domains etc). There are then low level Create, Read, Update, Delete services which are in turn called by a higher level "business based" service layer. A generic user interface is supplied for the DER but it is expected that most users will be interacting with the DER as part of more general "business workflow level" metadata assembly (including reuse) tools that will work with data elements in combination with questions, question modules, collection instruments etc rather than in isolation.
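The layering described above can be sketched as follows. This is a deliberately simplified illustration rather than the DER's actual services or interfaces: the class and method names, the in-memory store and the identifier format are all assumptions. The business level service shows one plausible "business based" operation, namely reusing an existing data element before registering a new one.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class DataElement:
    element_id: str
    object_class: str   # eg "Person"
    property_name: str  # eg "Age"
    value_domain: str   # eg "Whole years"


class DataElementRepository:
    """Low level Create, Read, Update, Delete services over the core store."""

    def __init__(self):
        self._store = {}

    def create(self, element):
        if element.element_id in self._store:
            raise KeyError(f"{element.element_id} already exists")
        self._store[element.element_id] = element

    def read(self, element_id):
        return self._store[element_id]

    def update(self, element):
        if element.element_id not in self._store:
            raise KeyError(f"{element.element_id} not found")
        self._store[element.element_id] = element

    def delete(self, element_id):
        del self._store[element_id]

    def list_all(self):
        return list(self._store.values())


class DataElementService:
    """Higher level "business based" layer that calls only the CRUD services."""

    def __init__(self, repository):
        self._repository = repository
        self._ids = count(1)  # hypothetical identifier scheme

    def find_or_register(self, object_class, property_name, value_domain):
        """Reuse a matching data element if one exists, else register a new one."""
        wanted = (object_class, property_name, value_domain)
        for e in self._repository.list_all():
            if (e.object_class, e.property_name, e.value_domain) == wanted:
                return e
        element = DataElement(f"DE{next(self._ids):04d}", *wanted)
        self._repository.create(element)
        return element


repository = DataElementRepository()
service = DataElementService(repository)
first = service.find_or_register("Person", "Age", "Whole years")
second = service.find_or_register("Person", "Age", "Whole years")
print(first.element_id, first is second, len(repository.list_all()))
# DE0001 True 1
```

Because workflow tools would call the business layer rather than the store, the storage behind the CRUD services could change without affecting them.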
The first main "take up" of the DER was via the Questionnaire Development Tool (QDT) developed as part of the ISHS project. (See BHM for more information.) The second main "take up" was expected to relate to the Input Data Warehouse associated with business statistics. This meant that the first uses of DER were at the "input" end of the statistical cycle, but full end to end utilisation, including support for dissemination requirements, was expected in future.
In addition to the "data element" repository component based on the ISO/IEC 11179 Part 3 metamodel, the DER includes a more general "metadata registration" component based on ISO/IEC 11179 Part 6. The latter has been designed to be able to be separated out as a register and set of services in its own right which could support registration and management of metadata "objects" that are outside the Part 3 metamodel (eg questions, "collections", "collection instruments", datasets).
The key standards underpinning IMT (ie SDMX and DDI) have defined relationships with ISO/IEC 11179 and the architectural design of the DER lends itself to integration with the MRR. It is still expected that DER will make a broader contribution to metadata management in future, although this will now be through working with the MRR rather than the approach envisaged in the original project plan based on the 2003 strategy.
Questions, Question Modules, Collection Instruments
The ISHS project for household surveys developed new metadata repositories and associated services related to the above, as well as making use of the new corporate Data Element Registry and the existing Collection Management System.
While the actual development work on these repositories and services to date has concentrated on household survey requirements, the high level design and IT architecture were selected with an expectation that these repositories could be generalised and "corporatised" in future. This holds even if the higher level business services and workflow interfaces developed as part of ISHS, which currently interact with these repositories, remain specific to household survey processes.
Analysis suggested some extensions to the repositories and services would be required to support business statistics and other corporate uses but this should not impact existing use by household surveys.
The infrastructure developed by ISHS is still in the process of being "commissioned" for actual use by household surveys.
The initial use of these repositories and services focused on survey development and input processing but full end to end utilisation, including support for dissemination requirements, was expected in future - first by household survey processes and then more generally.
With the advent of IMT, the outlook for these facilities may be similar to the outlook for DER. Early work has already been undertaken reviewing the relationship between the "metadata framework for ISHS" (which informed the design of the repositories and services) and the information model and schemas for DDI-L – which was released around the time the main ISHS development effort was completed.