METIS


5.1 IT Architecture

Overview of Stats SA's IT Architecture


Stats SA's IT environment, within which the ESDMF is developed, requires systems to adhere to the following architectural principles:

  • Integration

The system must integrate with other organisational systems. APIs will be built for the applications that need to connect to the ESDMF; however, most integration is expected to happen at the data level. With the exception of SAS, the organisation uses relational databases, and integration at this level is achieved through ODBC connections. SAS supports ODBC and, in addition, has native support for various databases.

  • Interoperability

To ensure interoperability, the ESDMF uses Java as its development standard because Java is platform independent. Developing the system as a web application also means that only a web browser is needed to access it.

  • Modularity

Development of all ESDMF components follows the organisational requirement to build modular systems that are easy to manage and flexible. The metadata management system is modularised according to the different categories of metadata.

  • Scalability

Stats SA's computer applications have to be built so that they can scale up to accommodate the inevitable growth of the organisation. Both the database designs and the storage hardware for all ESDMF components are designed to cater for such growth.

  • Flexibility

Applications must meet the diverse needs of Stats SA. These needs change over time and new ones emerge, so it is vital to develop flexible applications that can easily be changed or extended. The insistence on object-oriented programming was partly motivated by this need for flexibility, as it should minimise the "spaghetti programming" often associated with large software projects.
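The modularity and flexibility principles can be read in object-oriented terms as "one module per metadata category, behind a common interface". The sketch below is a hypothetical illustration of that idea, not ESDMF code: `MetadataCategory`, `MetadataModule`, `SurveyModule` and `ModuleRegistry` are invented names, and the category list is assumed from the metadata types mentioned elsewhere in this section.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical metadata categories; the ESDMF's actual category list may differ.
enum MetadataCategory { SURVEY, CONCEPT, CLASSIFICATION }

// Common contract that every category-specific module implements.
interface MetadataModule {
    MetadataCategory category();
    String describe();
}

// Survey metadata module: the category described as implemented so far.
final class SurveyModule implements MetadataModule {
    public MetadataCategory category() { return MetadataCategory.SURVEY; }
    public String describe() { return "Capture, edit and approve survey metadata"; }
}

// Registry that lets new modules be plugged in without changing existing ones.
final class ModuleRegistry {
    private final Map<MetadataCategory, MetadataModule> modules =
            new EnumMap<>(MetadataCategory.class);

    // Registers a module; each category may have only one module.
    void register(MetadataModule module) {
        if (modules.putIfAbsent(module.category(), module) != null) {
            throw new IllegalStateException("Module already registered for " + module.category());
        }
    }

    Optional<MetadataModule> lookup(MetadataCategory category) {
        return Optional.ofNullable(modules.get(category));
    }

    int size() { return modules.size(); }
}
```

Under this design, adding classification metadata later would mean writing one new class that implements `MetadataModule` and registering it, leaving the survey module untouched.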

 

IT Infrastructure Specification


The metadata management system is deployed on an IT infrastructure that meets a set of minimum specifications. These specifications list the items needed to run the system without going into the details of the items themselves.

  • Operating System(s)

Desktops run Microsoft Windows. The application is deployed on an open source operating system (Novell SuSE Linux).

  • Computer Network

The network architecture is based on open protocols and industry standards. It provides remote access for some employees and supports both local area (LAN) and wide area (WAN) networks.

  • Computer Servers

The system is developed as a client-server application, which requires powerful servers capable of handling intensive processing.

  • Storage

Because of the large volume of data to be generated and/or captured in the system, a well-managed storage system is needed. Stats SA uses a Storage Area Network (SAN) to provide storage management.


 

Function | Make/Model | Operating System / Database Engine | Comment

A. Development Environment
Application Server | HP BL45p, quad processor, 4 GB RAM, 2 x 72 GB HDD | SuSE Linux 10 | Make/model exceeds recommendation
Database Server | HP BL45p, quad processor, 16 GB RAM, 2 x 72 GB HDD | Oracle 10g, or Sybase ASE and Sybase IQ; Unix/Linux/Windows | Make/model exceeds recommendation
Build Server | HP DL 320, dual processor, 2 GB RAM, 2 x 72 GB HDD | SuSE Linux 10 | Make/model exceeds recommendation

B. User Acceptance Test (UAT) Environment
Application Servers | 2 x HP BL45p, quad processor, 8 GB RAM, 2 x 72 GB HDD | SuSE Linux 10 | Make/model exceeds recommendation
Database Servers | 2 x HP BL45p, quad processor, 32 GB RAM, 2 x 72 GB HDD | Oracle 10g, or Sybase ASE and Sybase IQ; Linux |

C. Production Environment
Application Servers | 2 x HP BL45p, quad processor, 8 GB RAM, 2 x 72 GB HDD | SuSE Linux 10 | Make/model exceeds recommendation
Database Servers | 2 x HP BL45p, quad processor, 32 GB RAM, 2 x 72 GB HDD | Oracle 10g, or Sybase ASE and Sybase IQ; Linux |


Table 4: Hardware and software specifications for the ESDMF infrastructure 



   
Figure 10: Hardware and software specifications for the ESDMF infrastructure 

 

Components of Metadata Management Application


The application is web-based and developed in Java. Tomcat provides the implementation of the Java Servlet API and the HTTP functionality. The application is physically divided into the following components:

  • User Interface (UI)

The user interfaces for all the metadata management system applications are web-based. This allows us to deploy the tool quickly to users in the organisation, since client workstations only need a web browser to access the server-based applications. The main supported web browsers are Microsoft Internet Explorer and Firefox.

  • Database

The application is supported by a relational database management system (RDBMS). Stats SA uses a variety of RDBMS engines. The target RDBMS engine for this project is Sybase 12.5.x; however, the project currently uses the open source RDBMS MySQL.

  • Business logic

The business logic controlling the interaction between the UI and the underlying database is coded in server-side Java. Some business logic is also coded as stored procedures, which mostly perform housekeeping within the database.

  • Application/Web Server

The application is served to the client via Tomcat, which processes Java code. Tomcat also handles HTTP calls from the web browser.
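Because the project runs on MySQL today but targets Sybase 12.5.x, one way to keep the Java business logic independent of the engine is to pick the JDBC connection URL from configuration. The sketch below assumes the MySQL Connector/J and Sybase jConnect drivers, whose URL forms are `jdbc:mysql://host:port/db` and `jdbc:sybase:Tds:host:port/db`. `DbEngine`, `DbSettings`, the `db.*` property keys, and the host, port and database names in the test are hypothetical; this is not the ESDMF's actual data-access layer.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical list of the engines mentioned in the text.
enum DbEngine { MYSQL, SYBASE_ASE }

// Immutable connection settings, e.g. loaded per environment (development, UAT, production).
record DbSettings(DbEngine engine, String host, int port, String database) {

    // Builds the JDBC URL for the configured engine.
    String jdbcUrl() {
        return switch (engine) {
            case MYSQL -> "jdbc:mysql://" + host + ":" + port + "/" + database;
            case SYBASE_ASE -> "jdbc:sybase:Tds:" + host + ":" + port + "/" + database;
        };
    }

    // Opens a connection; the matching JDBC driver must be on the classpath.
    Connection open(String user, String password) throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        return DriverManager.getConnection(jdbcUrl(), props);
    }

    // Reads settings from hypothetical db.* keys in a Properties object.
    static DbSettings fromProperties(Properties p) {
        return new DbSettings(
                DbEngine.valueOf(p.getProperty("db.engine")),
                p.getProperty("db.host"),
                Integer.parseInt(p.getProperty("db.port")),
                p.getProperty("db.name"));
    }
}
```

With this approach the move from MySQL to Sybase becomes largely a configuration and driver change for the Java code, although SQL dialect differences and the stored procedures would still need to be ported.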

5.2 Metadata Management Tools

The developed metadata management application allows Stats SA staff members to perform a number of tasks in the metadata management process. The application groups these tasks into three modules or tools. 


  • Administration module

This module is used to manage users of the system, make changes to certain categories of captured metadata, and carry out other housekeeping activities. It will also be used to administer other categories of metadata.

  • Metadata Capturing and Editing module

Survey metadata will be captured continually by the originating components whenever an instance of a given survey is required. The metadata captured here is specific to that instance of the survey. This module allows users to capture survey metadata into the system and to edit it. A special user role, the Approver, has permission to approve all captured survey metadata, at which point it is exposed for use in the organisation.

  • Query and Reporting Module

The metadata repository can be queried, and therefore reported on. A metadata report is one of the ways to document survey data. This may happen in two situations. In the first, an internal user may want to view captured metadata, and producing a report provides a structured way of viewing it. Alternatively, metadata can be viewed through the "View Metadata" functionality of the Metadata Capturing and Editing module.
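The capture-and-approve flow of the Metadata Capturing and Editing module can be pictured as a small state machine in which only approved metadata is visible to queries and reports. This is a minimal sketch assuming just two roles, a capturer and the Approver; `Role`, `Status`, `SurveyMetadataRecord` and `SurveyMetadataRepository` are hypothetical names, and the in-memory list stands in for the metadata database.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified roles: someone who captures metadata and the Approver.
enum Role { CAPTURER, APPROVER }

enum Status { CAPTURED, APPROVED }

// One captured instance of survey metadata (hypothetical fields).
final class SurveyMetadataRecord {
    private final String surveyInstance;
    private final String description;
    private Status status = Status.CAPTURED;

    SurveyMetadataRecord(String surveyInstance, String description) {
        this.surveyInstance = surveyInstance;
        this.description = description;
    }

    String surveyInstance() { return surveyInstance; }
    String description() { return description; }
    Status status() { return status; }

    // Only a user holding the Approver role may approve captured metadata.
    void approve(Role actingRole) {
        if (actingRole != Role.APPROVER) {
            throw new SecurityException("Only an Approver can approve survey metadata");
        }
        status = Status.APPROVED;
    }
}

// In-memory stand-in for the metadata repository.
final class SurveyMetadataRepository {
    private final List<SurveyMetadataRecord> records = new ArrayList<>();

    // Newly captured metadata starts in the CAPTURED state.
    SurveyMetadataRecord capture(String surveyInstance, String description) {
        SurveyMetadataRecord record = new SurveyMetadataRecord(surveyInstance, description);
        records.add(record);
        return record;
    }

    // Queries and reports only see metadata that has been approved.
    List<SurveyMetadataRecord> visibleRecords() {
        List<SurveyMetadataRecord> visible = new ArrayList<>();
        for (SurveyMetadataRecord r : records) {
            if (r.status() == Status.APPROVED) {
                visible.add(r);
            }
        }
        return visible;
    }
}
```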


  

Figure 11: Different modules of the tool 




  

Figure 12: Survey information page with navigation on the right hand side 

 

  

How the Meta-Information System Integrates with Other Stats SA Applications


Although this feature has not been implemented yet, the metadata management system, like the rest of the ESDMF, is designed to link with other statistical processing applications and data repositories. In the immediate future, the repository will allow access via the following two methods:

  • ODBC Connection by SAS


SAS can extract metadata from the repository for use as input to the data processing and analysis activities of statistical production. At this stage, an Open Database Connectivity (ODBC) connection will give SAS access to the database. Once the database is migrated from MySQL to Sybase, there will also be the option of using SAS/ACCESS Interface to Sybase.

  • APIs


Application Programming Interfaces (APIs) will be developed for each application that needs to exchange information with the metadata system. At this initial stage of the project no application uses the metadata management system in this way and therefore no API has yet been developed. 
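Since no API exists yet, the following is only a sketch of what a read-only API for consuming applications might look like, assuming approved survey metadata is exposed as simple key/value attributes. `SurveyMetadata`, `SurveyMetadataApi` and `InMemorySurveyMetadataApi` are hypothetical; a real implementation would read from the metadata database rather than an in-memory map.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Read-only view of one survey's approved metadata (hypothetical shape).
record SurveyMetadata(String surveyId, String title, Map<String, String> attributes) {
    SurveyMetadata {
        attributes = Map.copyOf(attributes); // unmodifiable defensive copy
    }
}

// Contract that a consuming application would code against.
interface SurveyMetadataApi {
    Optional<SurveyMetadata> findBySurveyId(String surveyId);
    List<String> listSurveyIds();
}

// In-memory stand-in for a repository-backed implementation.
final class InMemorySurveyMetadataApi implements SurveyMetadataApi {
    private final Map<String, SurveyMetadata> store = new TreeMap<>();

    // Makes approved metadata available to consumers.
    void publish(SurveyMetadata metadata) {
        store.put(metadata.surveyId(), metadata);
    }

    @Override
    public Optional<SurveyMetadata> findBySurveyId(String surveyId) {
        return Optional.ofNullable(store.get(surveyId));
    }

    @Override
    public List<String> listSurveyIds() {
        return List.copyOf(store.keySet()); // TreeMap keeps ids in sorted order
    }
}
```

Keeping the interface separate from its implementation would let each consuming application depend only on the contract, so the storage behind it could change without affecting those applications.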

 

5.4 Version control and revisions

Metadata is expected to change due to revisions of concepts and their definitions, changes to classifications, business rules and user requirements. Sometimes more than one version of certain metadata used for the same purpose may exist at the same time. 
In the current Survey Metadata tool, the "Edit" functionality of the application allows captured survey metadata to be revised. These revisions may only be performed by users with the requisite permissions, and revised metadata takes effect only once an assigned Approver has approved it. Survey metadata can only have a single version, so the Edit process updates the existing metadata in the repository rather than creating a new version.
Version control will be introduced when metadata categories with metadata that can have more than one version are incrementally built into the system. 
It is important to note that version control will be built into every aspect of the ESDMF.
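The current single-version behaviour, in which an edit takes effect only after an Approver accepts it and then replaces the published metadata, can be sketched as follows. `SingleVersionEntry`, its pending-edit field and its method names are hypothetical; the point of the sketch is that no history is kept, which is what the planned version control would change.

```java
import java.util.Optional;

// Single-version metadata entry: one published value plus at most one pending edit.
final class SingleVersionEntry {
    private String published;
    private String pendingEdit; // null when no edit awaits approval

    SingleVersionEntry(String approvedValue) {
        this.published = approvedValue;
    }

    String published() { return published; }

    Optional<String> pendingEdit() { return Optional.ofNullable(pendingEdit); }

    // A user with edit permission proposes a revision; the published value is unchanged.
    void proposeEdit(String revisedValue) {
        pendingEdit = revisedValue;
    }

    // Approval overwrites the published value in place; the old value is discarded.
    void approveEdit(boolean actorIsApprover) {
        if (!actorIsApprover) {
            throw new SecurityException("Edits must be approved by an assigned Approver");
        }
        if (pendingEdit == null) {
            throw new IllegalStateException("No edit is awaiting approval");
        }
        published = pendingEdit;
        pendingEdit = null;
    }
}
```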

5.5 Outsourcing versus in-house development

Development of the entire ESDMF, including the metadata management system, is outsourced. Two issues influenced the decision to outsource: Stats SA does not have enough skilled resources, and there was a need for perspectives not shaped by prior opinions of a statistical environment. This approach requires the outsourced teams to invest considerable time in understanding the organisation and analysing the requirements.
It is important to note that outsourcing was conducted in two stages. In the first stage, we outsourced the task of gathering the requirements for the whole ESDMF; these requirements detail each of its components, including the metadata management system. The second stage is the development of the system. The two stages were carried out by two different organisations, a separation intended to keep the first stage focused on requirements gathering. In this development model, the development team mainly verifies existing requirements.