4.1 IT Architecture
The introduction of Service Oriented Architecture (SOA) into Statistics NZ was the culmination of researching industry trends and evaluating those trends against the new technical challenges that were arising in response to the BmTS.
The BmTS has three core deliverables:
- A standard and generic end-to-end process or processes to collect, process, analyse and disseminate data (Value Chain).
- An approach to data management which is disciplined and consistent (Information Architecture).
- An agreed organisation-wide technical architecture as a framework for making system decisions.
To support the first two deliverables and to ensure that the third is achieved, Statistics NZ has adopted a Service Oriented Architecture (SOA) approach. A SOA resolves the behaviour of the organisation's IT assets into a set of common behaviours, or services. Services can be business services or technical services.
The SOA is a key enabler of the BmTS. It exposes common services (business/statistical and technical) as an abstract, decoupled and consistent set of interfaces, enabling the communication of as much as possible of the process and data in Statistics NZ's core business. In addition, there are a number of benefits related to the incorporation of third-party software; this includes off-the-shelf applications, and providing services to and using services from other statistical agencies. Key aspects of the Statistics NZ SOA are that the consumer of a service can find and bind to services at runtime, and that the SOA extends to the development, deployment and management of services.
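The idea of a consumer finding and binding to a service at runtime can be illustrated with a minimal sketch. This is not the Statistics NZ service layer: the ServiceRegistry class, the service name "classification.lookup" and the sample codes are all hypothetical, and a real SOA would typically use a networked service directory rather than an in-memory dictionary.

```python
from typing import Callable, Dict


class ServiceRegistry:
    """Hypothetical in-memory registry standing in for a service directory."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[..., object]] = {}

    def publish(self, name: str, endpoint: Callable[..., object]) -> None:
        # A provider registers an implementation under an abstract service name.
        self._services[name] = endpoint

    def find(self, name: str) -> Callable[..., object]:
        # A consumer looks the service up by name at runtime.
        try:
            return self._services[name]
        except KeyError:
            raise LookupError(f"no service published as {name!r}") from None


def lookup_classification(code: str) -> str:
    # Stand-in technical service; the codes and labels are invented.
    labels = {"A": "Agriculture", "B": "Mining"}
    return labels.get(code, "Unknown")


registry = ServiceRegistry()
registry.publish("classification.lookup", lookup_classification)

# The consumer binds to whichever implementation is currently published.
service = registry.find("classification.lookup")
result = service("A")
```

Because the consumer depends only on the service name and its interface, the provider could replace the implementation without any change on the consumer side, which is the decoupling described above.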
This architecture will enable the transparent exchange of metadata between different systems and tools. The current service layer already supports some existing metadata components, such as process management (workflow) and business rules (rules engine), as well as integration with the main systems (CRM-based respondent management and call centre), tools (SAS, ETL, Blaise) and databases (SQL Server).
4.2 Metadata Management Tools
Currently within Statistics New Zealand, a programme of work is underway developing the plans for the components of the key metadata infrastructure. While the plans are still unconfirmed, the summary below addresses the current thinking, likely direction and known issues for each component.
Search and discovery, Metadata and data access/registration - These components will be developed within our wider Information Portal development. Investigations are currently underway to implement a single search tool which will facilitate access across data environments and documentation stores. While further planning is needed to produce the high-level strategy for access, search and discovery, the principles of reuse and integration are being incorporated in other components to ensure these components can be developed.
Data Definition - The data definition component will be based on metadata directly linked to the data in IDE (Input Data Environment) and ODE (Output Data Environment). The conceptual vision for the IDE and ODE is that they will contain several 'areas' which reflect the use of data. For example, in the IDE this includes a Load Area, Operational and Exceptions area (for processing), Clean area and Aggregate area (for analysis) and Data Marts (for Time Series, Longitudinal data etc). The data definition component will need to reflect the metadata needs for each of these areas and will emphasise reuse by ensuring the flow of metadata as the data flows through the environments.
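The principle that metadata flows with the data through the environment can be sketched as follows. The area names are taken from the description above, but the strictly linear ordering, the Dataset class, the promote function and the sample dataset are assumptions made purely for illustration; the Exceptions area and the Data Marts are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Area names come from the text above; the linear ordering is an assumption.
IDE_AREAS: Tuple[str, ...] = ("Load", "Operational", "Clean", "Aggregate")


@dataclass
class Dataset:
    name: str
    area: str = IDE_AREAS[0]
    # Definitions keyed by variable name; they travel with the data.
    definitions: Dict[str, str] = field(default_factory=dict)
    # Areas visited so far, kept as simple lineage metadata.
    lineage: List[str] = field(default_factory=lambda: [IDE_AREAS[0]])


def promote(dataset: Dataset) -> Dataset:
    """Move a dataset to the next area, reusing its existing definitions."""
    position = IDE_AREAS.index(dataset.area)
    if position == len(IDE_AREAS) - 1:
        raise ValueError(f"{dataset.name} is already in the final area")
    next_area = IDE_AREAS[position + 1]
    return Dataset(
        name=dataset.name,
        area=next_area,
        definitions=dict(dataset.definitions),  # reused, not re-entered
        lineage=dataset.lineage + [next_area],
    )


# Invented example dataset and definition.
raw = Dataset("retail_survey", definitions={"turnover": "Total sales excluding GST"})
clean = promote(promote(raw))
```

In this sketch the definitions entered once in the Load area are still attached when the data reaches the Clean area, and the lineage list records the path taken.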
Passive Metadata Store - The passive metadata store is currently implemented within the Lotus Notes environment and is not directly linked to other metadata stores. While the strategy for developing this component is still being planned, it is recognised that there are current issues with structure and reusability which need to be addressed. Currently passive metadata is stored in a flat structure, with one set of metadata held for each output. However, this does not recognise the complex nature of collections, where one input can be used in several outputs, an output can become an input for another collection, and inputs come in several forms (including survey collections and administrative data collections). There is also a recognised need to develop a more dynamic glossary which can be linked into multiple stores of metadata.
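The limitation of a flat, per-output structure can be seen by modelling collections as a graph instead. The sketch below is one possible representation, not the planned design: the CollectionGraph class and all collection names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class CollectionGraph:
    """Hypothetical many-to-many store of input/output relationships."""

    def __init__(self) -> None:
        self._feeds: Dict[str, Set[str]] = defaultdict(set)    # input -> outputs
        self._sources: Dict[str, Set[str]] = defaultdict(set)  # output -> inputs

    def link(self, source: str, output: str) -> None:
        self._feeds[source].add(output)
        self._sources[output].add(source)

    def outputs_of(self, source: str) -> List[str]:
        return sorted(self._feeds.get(source, ()))

    def inputs_of(self, output: str) -> List[str]:
        return sorted(self._sources.get(output, ()))

    def upstream(self, output: str) -> Set[str]:
        """All direct and indirect inputs that contribute to an output."""
        seen: Set[str] = set()
        stack = [output]
        while stack:
            node = stack.pop()
            for source in self._sources.get(node, ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


# Invented collection names, for illustration only.
graph = CollectionGraph()
graph.link("survey_a", "output_x")      # one input used in several outputs
graph.link("survey_a", "output_y")
graph.link("admin_data_b", "output_x")  # administrative data as an input
graph.link("output_x", "output_z")      # an output becomes an input
```

With this structure, metadata about survey_a is recorded once and is reachable from every output that uses it, directly or indirectly, rather than being repeated under each output.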
Classification Management - CARS (Classifications And Related Standards) is in use for classification management. The system in regular production is based on the relational model (currently implemented in Sybase) and an application for classification management (currently implemented with Centura). There is a plan to upgrade the platform to .NET/SQL and to enhance it for integration within the new SOA.
Question/Variable Library - Current thinking in regard to this component is that a tool is required which manages variable definitions as well as question use. This is likely to take the form of a reference store where variables can be configured linking variable definitions, classifications, value domains, statistical objects and collection elements (e.g. question, questionnaire etc). At this stage planning is underway to determine the feasibility of combining the development of this component with the enhancement to our Classification Management component.
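A reference store of this kind might look something like the following sketch. The field names mirror the elements listed above (definition, classification, value domain, statistical object, collection elements), but the Variable and VariableLibrary classes, the classification identifiers and the question identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Variable:
    name: str
    definition: str
    classification: str            # identifier of a classification (invented)
    value_domain: Tuple[str, ...]  # permitted values
    statistical_object: str        # e.g. the unit the variable describes
    questions: Tuple[str, ...] = ()  # collection elements that capture it


class VariableLibrary:
    """Hypothetical reference store for configured variables."""

    def __init__(self) -> None:
        self._variables: Dict[str, Variable] = {}

    def register(self, variable: Variable) -> None:
        if variable.name in self._variables:
            raise ValueError(f"variable {variable.name!r} already registered")
        self._variables[variable.name] = variable

    def get(self, name: str) -> Variable:
        return self._variables[name]

    def used_in(self, question_id: str) -> List[str]:
        # Which variables does a given question feed?
        return sorted(
            v.name for v in self._variables.values() if question_id in v.questions
        )


library = VariableLibrary()
library.register(Variable(
    name="labour_status",
    definition="Labour force status of the person",
    classification="LABOUR_STATUS_STD",
    value_domain=("employed", "unemployed", "not_in_labour_force"),
    statistical_object="Person",
    questions=("Q7",),
))
library.register(Variable(
    name="hours_worked",
    definition="Usual weekly hours worked",
    classification="HOURS_BANDS_STD",
    value_domain=("0", "1-29", "30+"),
    statistical_object="Person",
    questions=("Q7", "Q8"),
))
```

Holding the classification as a reference rather than a copy is what would make it natural to combine this component with the classification management tool.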
Business Logic - The Business Logic component encompasses the operational metadata for the statistical business process. This includes the processes used to change and transform data, the configuration which outlines the inputs and outputs, the business rules which set parameters for changing data, and the active metadata which is used to run the processes (e.g. variable identifiers, programme code). Business logic will also include quality metadata for a particular instance which defines the process that was run, rules applied, and audit trails including who, what and when, etc. Currently the storage of operational metadata is being developed with separate components such as workflow tools (K2) and transformation tools (CANCEIS, Logiplus etc). Investigations are currently underway to determine a way to integrate these components through generic storage schemas. A separate investigation is also planned looking at rule engine usage and the storage of business rules in a generic form.
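The quality metadata for a process instance (the process run, the rules applied, and who, what and when) could be captured along the lines of the sketch below. It is independent of any of the tools named above; the AuditEntry structure, the run_process function, the two rules and the user name are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Rule = Tuple[str, Callable[[Record], Record]]


@dataclass(frozen=True)
class AuditEntry:
    process: str      # what was run
    rule: str         # which rule was applied
    user: str         # who ran it
    run_at: datetime  # when it ran
    changed: bool     # whether the rule altered the record


def run_process(
    process: str, record: Record, rules: List[Rule], user: str
) -> Tuple[Record, List[AuditEntry]]:
    """Apply rules in order and return the result with its audit trail."""
    trail: List[AuditEntry] = []
    current = dict(record)
    for rule_name, apply_rule in rules:
        updated = apply_rule(dict(current))  # rules work on a copy
        trail.append(AuditEntry(
            process, rule_name, user, datetime.now(timezone.utc), updated != current
        ))
        current = updated
    return current, trail


def floor_negative_income(record: Record) -> Record:
    # Invented business rule: negative income is set to zero.
    record["income"] = max(record["income"], 0.0)
    return record


def round_income(record: Record) -> Record:
    # Invented business rule: income is rounded to a whole number.
    record["income"] = float(round(record["income"]))
    return record


rules: List[Rule] = [
    ("floor_negative_income", floor_negative_income),
    ("round_income", round_income),
]
edited, trail = run_process("edit_income", {"income": -12.4}, rules, user="analyst1")
```

Storing rules as named entries in a list, rather than hard-coding them in the process, is a small-scale version of keeping business rules in a generic form.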
Frame and Reference Stores - The business frame is in regular production and there is a link between this component and the data definition component (implemented as a reference from the IDE to a record in the business frame). A similar component for the persons and household frame is under development.
Documentation and Reports - Document One is a Lotus Notes application for the management of documentation. It is a central system for corporate document management but will be linked into the metadata environment through the information portal development.
Standards and Processes - Two aligned applications are currently in development to manage standards and processes (the Standards Framework and the gBPM Repository). The Standards Framework is a Lotus Notes application under development for the central storage of statistical and methodological standards. The tool stores standards using the generic Business Process Model as the framework so that users can access the relevant standards for any process they undertake. The application supports the development of standards within the store through the use of versioning, notification systems and the recording of consultation. The gBPM Repository is a tool for storing our corporate process models, including detailed process descriptions down to activities and tasks. Following the completion and population of the tools, further investigation will be conducted regarding their integration with other components within the metadata environment.
4.3 Standards and formats
When defining a concept-based model to be used as the overarching metadata framework, four concept-based models were reviewed: DDI, SDMX, MetaNet Reference Model v2.0 (MetaNet), and the Neuchatel Terminology Model. In December 2006 a working group determined that no one model met all the needs of Statistics NZ. A blended model was recommended, taking the best components of two models (MetaNet and SDMX) to create a single model. Further analysis by the working group provided clarity for each model evaluated, including details of risks, impacts and gaps. The group then produced a second recommendation: a primary model (MetaNet) with a secondary layer (SDMX) to treat any gaps. Selection was based on simplicity, adaptability, integration and the ability to support the business process.
During this stage the MetaNet model was analysed and mapped against internal metadata stores to assess the usability within Statistics New Zealand. As a result of this process a series of recommendations were presented for adapting the model to better meet our needs. Work has been progressing to adapt this model with a revised version due for completion early in 2008.
4.4 Version control and revisions
The broad principles for versioning have been developed; however, their application will need to be part of the individual developments. Returning to the Logical View of our data and metadata stores in section 2, it is intended that versioning will be maintained within the 'reference metadata layer'. There will also need to be versions of the data definitions within the 'structural metadata layer'; however, these will essentially be linking structures which identify the relevant versions of reference metadata.
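The relationship between the two layers can be sketched as follows: versions live in a reference store, and a data definition holds only links to specific versions. This is an illustration of the principle rather than an agreed design; the ReferenceStore and DataDefinition classes, the keys and the sample content are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class ReferenceStore:
    """Hypothetical 'reference metadata layer' holding every version."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}

    def add_version(self, key: str, content: str) -> int:
        # Versions are appended, never overwritten; numbering starts at 1.
        versions = self._versions.setdefault(key, [])
        versions.append(content)
        return len(versions)

    def get(self, key: str, version: int) -> str:
        versions = self._versions[key]
        if not 1 <= version <= len(versions):
            raise LookupError(f"{key!r} has no version {version}")
        return versions[version - 1]


@dataclass
class DataDefinition:
    """Hypothetical 'structural metadata layer' entry: links only."""

    name: str
    links: Dict[str, int] = field(default_factory=dict)  # key -> version number

    def resolve(self, store: ReferenceStore) -> Dict[str, str]:
        return {key: store.get(key, version) for key, version in self.links.items()}


# Invented reference metadata and definitions.
store = ReferenceStore()
v1 = store.add_version("industry", "Industry classification, first edition")
definition_2006 = DataDefinition("business_survey_2006", {"industry": v1})

v2 = store.add_version("industry", "Industry classification, revised edition")
definition_2007 = DataDefinition("business_survey_2007", {"industry": v2})
```

Adding the revised edition does not alter what the 2006 definition resolves to, because that definition is pinned to version 1.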
4.5 Outsourcing versus in-house development
The strategy for developing components of the metadata environment is still being developed; however, in general, a principle of "enhance first, then buy before build" is applied. A summary of the state of current developments is as follows:
- Currently the solution for our data definition is being developed in house as part of the wider development of the Input Data Environment (IDE).
- The decision has also been made that many of the business logic components will be dependent on the tools used to run transformations and workflows.
- Planning is underway to investigate the feasibility of enhancing our current classification management tool (CARS) to incorporate functionality for question and variable management.
- In the longer term we also aim to redevelop our current survey metadata tool (SIM); however, the form of this development is as yet undecided.
- Additional tools have also been adopted for managing other metadata components, e.g. documentation and report management is being enhanced through the implementation of Document ONE on top of our current Lotus Notes functionality.