Statistics Canada is moving towards a SOA. A key enabler of SOA is the Enterprise Application Integration Platform (EAIP), which allows the delivery of solutions based on metadata-driven, reusable software components and standards. Most business segments will benefit from the common core business services, standard integration platform, and workflow and process orchestration enabled by the EAIP. The platform also simplifies international sharing and co-development of applications and components.
Web services currently in use and under development by EAS are associated with information objects representing core business entities (e.g., questionnaires, classifications, tax data, business registry) that are classified into GSIM's Concepts and Structures groups. This fits nicely with GSBPM as well: services provide the inputs and outputs to GSBPM statistical processes. They satisfy a basic set of SOA principles, i.e., they are loosely coupled (consumer and service are insulated from each other), interoperable (consumers and services function across Java, .NET and SAS) and reusable (they are used in multiple higher-level orchestrations and compositions). Work continues to establish a complete framework, including discoverability (via a service registry and inventory) and governance.
At this point, Statistics Canada has a mix of services and silo-based/point-to-point integration that can be described as a combination of maturity levels 3 and 4 in terms of the Open Group Service Integration Maturity Model (OSIMM) maturity matrix (see Figure 1). During the transition years to a corporate-wide SOA, incremental changes are being made by applying SOA adoption and governance by segment, in which cross-silo services and consumers coexist with point-to-point integration of systems and data. Early adopters of SOA services include IBSP, SSPE and SNA.
Developing Data Service Centres (DSC) is a key initiative that fits into Statistics Canada's emerging SOA. The objective of the DSC is to manage statistical information as an asset – to maximize its value by improving accessibility, utility, accuracy, security and transparency through the use of a centralized inventory of statistical data holdings, associated metadata and documentation. Key statistical files and associated standard metadata (e.g., file name, type, description, creators, owners) will be registered and integrated into statistical processes via SOA. This integration will rely on a data access layer with common interfaces to access statistical files without the user needing to know their location, format and/or technology.
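As a rough illustration of that data access layer idea, the sketch below shows how a common interface can shield consumers from the location and technology of a registered file. All names here are hypothetical, not the actual DSC interfaces:

```java
// Illustrative sketch only: consumers ask for a registered statistical file
// by name and receive its content without knowing the backing store.
import java.util.HashMap;
import java.util.Map;

interface StatisticalFileStore {
    String read(String fileName); // same call regardless of location or format
}

// One possible backing store; others (SAS datasets, Oracle tables, network
// shares) would implement the same interface.
class InMemoryStore implements StatisticalFileStore {
    private final Map<String, String> files = new HashMap<>();
    void register(String name, String content) { files.put(name, content); }
    public String read(String name) {
        String c = files.get(name);
        if (c == null) throw new IllegalArgumentException("Not registered: " + name);
        return c;
    }
}

public class DataServiceCentreSketch {
    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.register("survey-2012-master", "record1,record2");
        StatisticalFileStore dsc = store; // consumers see only the interface
        System.out.println(dsc.read("survey-2012-master"));
    }
}
```

Swapping the backing store would not change consumer code, which is the point of the common data access layer.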
IMDB metadata discovery is performed via a Wiki-based solution and MetaWeb. Each Wiki page provides the context of the information and all available links. These pages are programmatically generated based on templates developed for the IMDB. MetaWeb is a JSP- and servlet-based application. Data are collected and populated into the IMDB via a Microsoft Excel IMDB Extraction/Loader, an Oracle PL/SQL IMDB Loader and MetaWeb.
The starting point for the Common Tools project (see Section VII - Figure 2) is the Questionnaire Development Tool (QDT), used to enter specifications for social survey data collection instruments. All question metadata are entered in the QDT, including question and answer category text, interviewer instructions and conditions controlling flows. The Processing and Specifications Tool (PST) then loads variable metadata such as variable name, length and type. These are linked to question metadata already entered via the QDT, so no re-entering of question or answer category text is required. Finally, the Social Survey Processing Environment (SSPE) utilities use collection layouts or schemas to generate variable metadata to be loaded into the metadata repository. Two planned tools will complete the picture: the Data Dictionary Tool (DDT), which will provide an interface to the metadata repository for updating descriptive variable metadata, and the Derived Variable Tool (DVT), which will allow entry of specifications for derived variables and will be used to produce detailed documentation for data users. Within Statistics Canada's SOA, the SSPE metadata repository will export metadata in a canonical model to the IMDB via an EAIP service under development.
Solutions and tools are needed to support other types of metadata, specifically in the GSIM Structures and Production groups.
 See Section IV-F for more information on SOA.
The following is a list of standards and formats and where they are being used:
For web services that expose information assets, not only do the underlying data evolve (both in content and structure), but so do the services that expose them. As a result, (potentially) different versions of the same data will be published and exchanged by (potentially) different versions of the same service. No centralized versioning framework for data exists, and many areas have customized versioning schemes.
For example, the IMDB allows time travel by version and effective period. A new version of a metadata item is created by copying an existing item, making the necessary changes and assigning the next version number. Each version has an interval of validity (or effective period) associated with it. In other words, the lifespan of each version of a metadata item can be determined; conversely, the version of an item in effect at a specific point in time can also be determined.
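A minimal sketch of this versioning-with-effective-periods idea (illustrative only, not the IMDB implementation; class and method names are invented):

```java
// Each version of a metadata item carries a validity interval, so the
// version in effect at any date can be recovered ("time travel").
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class MetadataItemHistory {
    record Version(int number, LocalDate from, LocalDate to, String content) {}
    private final List<Version> versions = new ArrayList<>();

    // A new version takes the next number; the previous version's effective
    // period is closed the day before the new one starts.
    void addVersion(LocalDate from, String content) {
        int next = versions.size() + 1;
        if (!versions.isEmpty()) {
            Version last = versions.remove(versions.size() - 1);
            versions.add(new Version(last.number(), last.from(),
                                     from.minusDays(1), last.content()));
        }
        versions.add(new Version(next, from, LocalDate.MAX, content));
    }

    // The version in effect at a given date, or null if none was valid then.
    Version inEffectAt(LocalDate date) {
        for (Version v : versions)
            if (!date.isBefore(v.from()) && !date.isAfter(v.to())) return v;
        return null;
    }
}
```

Both queries mentioned in the text follow directly: the lifespan of a version is its `from`/`to` pair, and `inEffectAt` recovers the version valid at a point in time.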
Service versions are identified using a three-part versioning scheme: major.minor.patch. An increment in the major version requires some of its consumers to change their code. This happens because of a major change in the service contract, e.g., at least one operation has been removed or an operation signature has changed in a way not foreseen by the extension points defined in the Web Service Description Language (WSDL) file. An increment in the minor version does not require changes to consumer applications. These are implementation changes and/or backwards-compatible changes to the interface, e.g., additions of operations or extensions to data types in the WSDL file. An increment in the patch version is only used for bug fixes.
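The compatibility rule above fits in a few lines. This sketch (hypothetical names, not an EAIP API) captures only the rule that a consumer is forced to change exactly when the major version increases:

```java
// Sketch of the three-part scheme's compatibility rule: minor and patch
// increments are backwards compatible; only a major increment can break
// existing consumers.
class ServiceVersion {
    final int major, minor, patch;
    ServiceVersion(int major, int minor, int patch) {
        this.major = major; this.minor = minor; this.patch = patch;
    }
    // A consumer built against 'old' keeps working with 'next'
    // as long as the major version is unchanged.
    static boolean consumerMustChange(ServiceVersion old, ServiceVersion next) {
        return next.major > old.major;
    }
}
```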
Service versions are designed with the goal of making them as forward and backward compatible as possible. By making the interface extensible, forward compatibility makes room for future, uncertain functional requirements. This approach is guided by knowledge and best practices in SOA interface design, XML schema design and type theory (since forward compatibility of service interfaces is essentially a special case of subtyping). Backward compatibility is achieved in the usual way: by ensuring that consumer applications developed for older versions of the service can continue to work with the new version.
Not every change to a metadata item generates a new version: versioning of different entity types (surveys, classifications, questionnaires, etc.) is handled by different sets of business rules.
Interface specification that describes the functionality and data types of a web service.
External consultants were contracted to build DDI services and tools, specifically to develop in-house DDI expertise and a set of core SOA web services around the IMDB. These services expose IMDB content in a standard format compliant with the DDI XML specification to support applications that focus on different types of metadata (e.g., surveys, variables, classifications, concepts). Rather than integrating with the IMDB on a case-by-case basis (point-to-point integration), the web services enable applications to gain access to its content in a standards-based format. This initial effort defined and implemented a core metadata service that delivers IMDB content encoded in DDI XML. A testing tool was also developed based on a set of common use cases (see Figure 3) to validate the effectiveness of the approach. The service is used to support the Data Liberation Initiative (DLI) and the Canadian Research Data Centre Network (CRDCN) Metadata projects, comprising 25 Research Data Centres (RDCs) from universities across the country. The services were developed with a Java technology stack, including some JPA components for database access that were reused in other in-house services.
In addition, EAS developed a proof-of-concept client based on JSPs, Servlets and XSLTs to transform and render the DDI XML content returned by the data service into human-readable HTML and other proprietary formats for interoperability with internal applications (e.g., SQL Server, SAS).
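The core of such a rendering client can be sketched with the JDK's built-in XSLT support. The XML fragment and stylesheet below are simplified placeholders, not the real DDI schema or the EAS stylesheets:

```java
// Transform a simplified, DDI-like XML fragment into HTML via XSLT,
// using only classes shipped with the JDK (JAXP).
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DdiToHtmlSketch {
    // A stylesheet reduced to one rule: render the survey title as a heading.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/survey'><h1><xsl:value-of select='title'/></h1></xsl:template>"
      + "</xsl:stylesheet>";

    static String toHtml(String ddiXml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter html = new StringWriter();
        t.transform(new StreamSource(new StringReader(ddiXml)), new StreamResult(html));
        return html.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml("<survey><title>Labour Force Survey</title></survey>"));
    }
}
```

A real client would fetch the DDI XML from the service and use a full stylesheet, but the transformation step is the same shape.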
See Section VII - Figure 4 for the overall architecture of the IMDB DDI services.
Statistics Canada's emerging SOA is providing the next generation of software components to be shared across the Agency. Services are reusable: they are designed to be combined with other services to create more complex solutions. In addition, generalized systems are being wrapped with a service interface to increase interoperability by shielding users from older technologies and multiple platforms.
One of the main challenges of this approach is that the same abstract information object (e.g., questionnaire, classification, T1 tax data) can be physically implemented by different data producers (and even by different data consumers) in different ways. This "impedance mismatch" has historically been addressed by point-to-point data integration, i.e., either the producer or the consumer has to conform to the other's data model. With SOA, canonical information models are created to which both producers' and consumers' models will map (SOA data integration). Canonical information models are enterprise-wide, common representations of information objects – a sort of "lingua franca" for data exchange. These models enable the organization to share and exchange enterprise information that is consistent, accurate and accessible. A mapping is a specification that describes how concepts from two different models relate to each other; at the physical level, it specifies how data are translated between the two models. Canonical models are not intended to replace the disparate set of heterogeneous physical models in use across the organization. Prescribing a single model would be impractical and counterproductive. Instead, data consumers and producers can continue to use their own models (relational database schemas, SAS files, etc.) within their own environments and map to the canonical model only when data need to be exchanged.
Within the SOA framework, canonical models are implemented as object models that are serialized into XML Schema Definition (XSD) types. Data producer and consumer schemas are mapped to the canonical object models used by services via schema mappings – object-relational mappings (ORMs) or object-XML mappings (OXMs). An inventory of canonical XSD types is currently being created; it can be referenced and reused by multiple service contracts (WSDLs) in the EAIP schema registry. These XSD types will be maintained by the service developers within the governance framework set up by the EAIP.
When exchanging data from a source database to a consumer application, there are a number of mappings involved along the way. First, data need to be extracted from a relational or multidimensional database into the canonical object model. This can be done automatically by object-relational mapping (ORM) tools when the source schema is close in structure to the canonical model, or it may require customized SQL/MDX extraction queries. At the other end of the process, the canonical object model is serialized into XML/JSON to be shipped to the client application via a web service interface. This mapping is done automatically by the EAIP tools. Finally, the client application needs to map the XML/JSON produced by the service into its own object model via an automatic de-serialization process. This process may include an XSLT transformation when the canonical model is very different from the consumer model and requires restructuring.
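The last two hops described above can be sketched as follows. Class names, the JSON shape and the consumer's restructuring are illustrative, not the EAIP canonical types:

```java
// Sketch: a canonical object is serialized for the wire, then mapped into a
// differently shaped consumer model on the other side.
class CanonicalItem {
    final String code, title;
    CanonicalItem(String code, String title) { this.code = code; this.title = title; }
    // Wire format produced by the service (hand-rolled here for brevity;
    // the platform would do this automatically).
    String toJson() {
        return "{\"code\":\"" + code + "\",\"title\":\"" + title + "\"}";
    }
}

class ConsumerRow { // the consumer's own model
    final String label;
    ConsumerRow(String label) { this.label = label; }
    // De-serialization with restructuring: this consumer keeps a single
    // "code - title" label rather than two separate fields.
    static ConsumerRow fromCanonical(CanonicalItem c) {
        return new ConsumerRow(c.code + " - " + c.title);
    }
}
```

The producer-side extraction (ORM or custom SQL/MDX) would populate `CanonicalItem` instances; everything downstream only ever sees the canonical shape.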
(a) Example: Classification service
Classifications were one of the first core business entities to use an EAIP service. The Classification canonical model is based on GSIM and Neuchâtel. The first version contains the basic classes needed to support a classification structure, namely Scheme, Levels and Items. Each Scheme consists of one or more Levels (i.e., classes), each of which consists of one or more Items (i.e., members). This model will be extended to include Versions and Variants as necessary.
To expose IMDB data in this canonical model, the IMDB's ISO/IEC 11179 Metadata Registries entities need to be mapped to GSIM/Neuchâtel. The IMDB data model does not have Scheme, Level and Item concepts (at least not with the usual GSIM/Neuchâtel semantics), so a mechanism identifies and extracts them from the IMDB physical model via SQL mappings. At the conceptual level, this can be done by defining containment mappings that are expressed as subtypes between both models.
There are parent-child hierarchies defined on Classification Level and Classification Item. The Level hierarchy is linear (each level has at most one child) and the Item hierarchy is a tree (each item may have zero or any number of children). Both hierarchies are related by a constraint: two items may be in a parent-child relationship only if their respective levels are in a parent-child relationship as well. This constraint ensures that both hierarchies remain consistent.
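The constraint can be checked mechanically. The sketch below uses a hypothetical representation and assumes levels are numbered so that a level's parent is the level directly above it (valid because the Level hierarchy is linear):

```java
// Check the consistency constraint between the Item tree and the linear
// Level hierarchy: a child item's level must be the direct child of its
// parent item's level, i.e. exactly one level deeper.
import java.util.Map;

class HierarchyConstraint {
    // itemParent: child item -> parent item; itemLevel: item -> level number.
    static boolean consistent(Map<String, String> itemParent,
                              Map<String, Integer> itemLevel) {
        for (Map.Entry<String, String> e : itemParent.entrySet()) {
            int childLevel = itemLevel.get(e.getKey());
            int parentLevel = itemLevel.get(e.getValue());
            if (parentLevel != childLevel - 1) return false; // levels not parent/child
        }
        return true;
    }
}
```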
Section VII - Figure 5 depicts the entire process of exchanging data from a source database to a consumer application.
Section VII - Figure 6 shows the relationship between both models (the canonical entities are those starting with the word "Classification"). For the purpose of defining a mapping, Classification Schemes and Levels can be viewed as subtypes of Enumerated Value Domain (EVD), whereas Classification Items are a subtype of Permissible Value (PV). Classification Schemes are a special type of EVD with no PV (i.e., Item) directly associated with them – Items are only associated with Levels. All Items associated with a given Level have different Values.
Section VII - Figures 7 and 8 show the actual physical mapping for Classification Scheme, Item and Level. The mapping is defined by UML notes (the boxes with the bent corners). The syntax of the mapping is straightforward: the "<<" symbol indicates an assignment from the attribute on the right to the attribute on the left. In addition, there are constraints on code sets from the IMDB content model.