Statistics Canada is moving towards a service-oriented architecture (SOA). A key enabler of SOA is the Enterprise Application Integration Platform (EAIP), which allows the delivery of solutions based on metadata-driven, reusable software components and standards. Most business segments will benefit from the common core business services, standard integration platform, workflow and process orchestration enabled by the EAIP. The platform also simplifies international sharing and co-development of applications and components.
Web services currently in use and under development by EAS are associated with information objects representing core business entities (e.g., questionnaires, classifications, tax data, business registry) that are classified into GSIM’s Concepts and Structures groups. This fits nicely with GSBPM as well: services provide the inputs and outputs to GSBPM statistical processes. They satisfy a basic set of SOA principles, i.e., they are loosely coupled (consumer and service are insulated from each other), interoperable (consumers and services function across Java, .NET and SAS), and reusable (they are used in multiple higher-level orchestrations and compositions). Work continues to establish a complete framework, including discoverability (via a service registry and inventory) and governance.
At this point, Statistics Canada has a combination of services and silo-based/point-to-point integration that can be described as a combination of maturity levels 3 and 4 in terms of the Open Group Service Integration Maturity Model (OSIMM) maturity matrix (see Figure 1). During the transition years to a corporate-wide SOA, incremental changes are being made by applying SOA adoption and governance by segment in which cross-silo services and consumers coexist with point-to-point integration of systems and data. Early adopters of SOA services include IBSP, SSPE and SNA.
Developing Data Service Centres (DSC) is a key initiative that fits into Statistics Canada’s emerging SOA. The objective of the DSC is to manage statistical information as an asset: to maximize its value by improving accessibility, utility, accuracy, security and transparency through the use of a centralized inventory of statistical data holdings, associated metadata and documentation. Key statistical files and associated standard metadata (i.e., file name, type, description, creators, owners, etc.) will be registered and integrated into statistical processes via SOA. This integration will rely on a data access layer with common interfaces to access statistical files without the user needing to know their location, format and/or technology.
IMDB metadata discovery is performed via a Wiki-based solution and MetaWeb. Each Wiki page provides the context of the information and all available links. These pages are programmatically generated based on templates developed for the IMDB. MetaWeb is a JSP and Servlets-based application. Data are collected and populated into the IMDB via a Microsoft Excel IMDB Extraction/Loader, an Oracle PL/SQL IMDB Loader and MetaWeb.
The starting point for the Common Tools project (see Section VII - Figure 2) is the Questionnaire Development Tool (QDT) used to enter specifications for social survey data collection instruments. All question metadata is entered in the QDT, including questions and answer category text, interviewer instructions and conditions controlling flows. The Processing and Specifications Tool (PST) then loads variable metadata such as variable name, length and type. These are linked to question metadata already entered via QDT so no re-entering of question or answer category text is required. Finally, the Social Survey Processing Environment (SSPE) utilities use collection layouts or schema to generate variable metadata to be loaded to the metadata repository. Two projected tools will complete the picture: the Data Dictionary Tool (DDT), which will provide an interface to the metadata repository for updating descriptive variable metadata, and the Derived Variable Tool (DVT), which will allow entry of specifications for derived variables and will be used to produce detailed documentation for data users. Within Statistics Canada’s SOA, the SSPE metadata repository will export metadata in a canonical model to IMDB via an EAIP service under development.
Solutions and tools are needed to support other types of metadata, specifically in the GSIM Structures and Production groups.
 See Section IV-F for more information on SOA.
The following is a list of standards and formats and where they are being used:
For web services that expose information assets, not only do the underlying data evolve (both in content and structure), but so do the services that expose them. As a result, (potentially) different versions of the same data will be published and exchanged by (potentially) different versions of the same service. No centralized versioning framework for data exists and many areas have customized versioning schemes.
For example, the IMDB allows time travel by version and effective period. A new version of a metadata item is created by copying an existing item, making the necessary changes and assigning it the immediate next version number. Each version has an interval of validity (or effective period) associated with it. In other words, the lifespan of each version of a metadata item can be determined; conversely, the version of an item in effect at a specific point in time can also be determined.
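The lookup described above can be sketched as follows. This is only an illustration, not the IMDB implementation: the `ItemVersion` class, its field names and the half-open interval convention (the start date is included, the end date is excluded) are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ItemVersion:
    """One version of a metadata item (hypothetical structure)."""
    version: int
    effective_from: date
    effective_to: Optional[date]  # None means the version is still in effect


def next_version_number(versions: List[ItemVersion]) -> int:
    """A new version receives the immediate next version number."""
    return max(v.version for v in versions) + 1


def version_in_effect(versions: List[ItemVersion], on: date) -> Optional[ItemVersion]:
    """Return the version whose effective period contains `on`, if any.

    Assumption: periods are half-open, [effective_from, effective_to).
    """
    for v in versions:
        started = v.effective_from <= on
        not_ended = v.effective_to is None or on < v.effective_to
        if started and not_ended:
            return v
    return None


# Hypothetical history of a single metadata item.
history = [
    ItemVersion(1, date(2007, 1, 1), date(2012, 1, 1)),
    ItemVersion(2, date(2012, 1, 1), None),
]
```

With this history, `version_in_effect(history, date(2010, 6, 1))` selects version 1, and a copy made from either version would be numbered `next_version_number(history)`.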
Service versions are identified using a three-part versioning scheme: major.minor.patch. An increment in the major version requires some of its consumers to change their code. This happens because of a major change in the service contract, e.g., at least one operation has been removed or an operation signature has changed in a way not foreseen by the extension points defined in the Web Services Description Language (WSDL) file. An increment in the minor version does not require changes to consumer applications. These are implementation changes and/or backwards compatible changes to the interface, e.g., additions of operations or extensions to data types in the WSDL file. An increment in the patch version is only used for bug fixes.
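A minimal sketch of how a consumer could interpret this scheme is shown below. The function names are hypothetical, and the sketch assumes that version strings always have exactly three numeric parts and that the deployed version is not older than the one the consumer was built against.

```python
from typing import NamedTuple


class ServiceVersion(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> ServiceVersion:
    """Parse a 'major.minor.patch' string such as '1.4.2'."""
    major, minor, patch = (int(part) for part in text.split("."))
    return ServiceVersion(major, minor, patch)


def change_kind(old: str, new: str) -> str:
    """Classify the step from `old` to `new` as major, minor, patch or none."""
    o, n = parse_version(old), parse_version(new)
    if n < o:
        raise ValueError("new version precedes old version")
    if n.major != o.major:
        return "major"
    if n.minor != o.minor:
        return "minor"
    if n.patch != o.patch:
        return "patch"
    return "none"


def consumer_must_change(built_against: str, deployed: str) -> bool:
    """Only a major increment may force consumers to change their code."""
    return change_kind(built_against, deployed) == "major"
```

For instance, a consumer built against `1.4.2` is unaffected by `1.5.0` or `1.4.3`, while a move to `2.0.0` signals a contract change that may require code changes.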
Service versions are designed with the goal of making them as forward and backward compatible as possible. By making the interface extensible, forward compatibility makes room for future, uncertain functional requirements. This approach is guided by knowledge and best practices in SOA interface design, XML schema design and type theory (since forward compatibility of service interfaces is essentially a special case of subtyping). Backward compatibility is achieved in the usual way: by ensuring that consumer applications developed for older versions of the service can continue to work with the new version.
Not every change to a metadata item generates a new version: versioning of different entity types (surveys, classifications, questionnaires, etc.) is handled by different sets of business rules.
Interface specification that describes the functionality and data types of a web service.
External consultants were contracted for building DDI services and tools, specifically to develop in-house DDI expertise and a set of core SOA web services around the IMDB. These services expose IMDB content in a standard format compliant with the DDI XML specification to support applications that focus on different types of metadata (e.g., surveys, variables, classifications, concepts, etc.). Rather than integrating with the IMDB on a case-by-case basis (point-to-point integration), the web services enable applications to gain access to its content in a standards-based format. This initial effort defined and implemented a core metadata service that delivers IMDB content encoded in DDI XML. A testing tool was also developed based on a set of common use cases (see Figure 3) to validate the effectiveness of the approach. The service is used to support the Data Liberation Initiative (DLI) and the Canadian Research Data Centre Network (CRDCN) Metadata projects comprising 25 Research Data Centres (RDCs) from universities across the country. The services were developed with a Java technology stack, including some JPA components for database access that were reused in other in-house services.
In addition, EAS developed a proof-of-concept client based on JSPs, Servlets and XSLTs to transform and render the DDI XML content returned by the data service into human-readable HTML and other proprietary formats for interoperability with internal applications (e.g., SQL Server, SAS).
See Section VII - Figure 4 for the overall architecture of the IMDB DDI services.
Statistics Canada’s emerging SOA is providing the next generation of software components to be shared across the Agency. Services are reusable: they are designed to be combined with other services to create more complex solutions. In addition, generalized systems are being wrapped with a service interface to increase interoperability by shielding users from older technologies and multiple platforms.
One of the main challenges of this approach is that the same abstract information object (e.g., questionnaire, classification, T1 tax data) can be physically implemented by different data producers (and even by different data consumers) in different ways. This “impedance mismatch” has historically been addressed by point-to-point data integration, i.e., either the producer or the consumer has to conform to the other’s data model. With SOA, canonical information models are created to which both producers’ and consumers’ models will map (SOA data integration). Canonical information models are enterprise-wide, common representations of information objects, a sort of “lingua franca” for data exchange. These models enable the organization to share and exchange enterprise information that is consistent, accurate and accessible. A mapping is a specification that describes how concepts from two different models relate to each other. At the physical level, it actually specifies how data are translated between two models. Canonical models are not intended to replace the disparate set of heterogeneous physical models in use across the organization. Prescribing a single model would be impractical and counterproductive. Instead, both data consumers and producers can continue to use their own models (relational database schemas, SAS files, etc.) within their own environments and map to the canonical model only when data need to be exchanged.
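The idea of mapping each local model to a single canonical model, rather than to every other local model, can be sketched as below. All names here are hypothetical: the `CanonicalQuestion` class, the two producer layouts and the consumer output format are invented for illustration and do not describe actual Statistics Canada models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalQuestion:
    """Hypothetical canonical representation of a questionnaire question."""
    identifier: str
    text: str


def from_collection_system(row: dict) -> CanonicalQuestion:
    """Map a producer that stores questions as rows with upper-case columns."""
    return CanonicalQuestion(identifier=row["Q_ID"], text=row["Q_TEXT_EN"].strip())


def from_processing_system(record: tuple) -> CanonicalQuestion:
    """Map a second producer that stores (number, wording) pairs."""
    number, wording = record
    return CanonicalQuestion(identifier=f"Q{number:03d}", text=wording)


def to_dissemination_entry(question: CanonicalQuestion) -> str:
    """Map the canonical object into one consumer's own format."""
    return f"{question.identifier}: {question.text}"
```

Each producer and consumer keeps its own layout and owns exactly one mapping to or from the canonical class, so adding a new consumer does not require touching any producer.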
Within the SOA framework, canonical models are implemented as object models that are serialized into XML Schema Definition (XSD) types. Data producer and consumer schemas are mapped to the canonical object models used by services via schema mappings: object-relational (ORMs) or object-XML (OXMs). An inventory of canonical XSD types is currently being created; it can be referenced and reused by multiple service contracts (WSDL) in the EAIP schema registry. These XSD types will be maintained by the service developers within the governance framework set up by the EAIP.
When exchanging data from a source database to a consumer application, there are a number of mappings involved along the way. First, data need to be extracted from a relational or multidimensional database into the canonical object model. This could be done automatically by object-relational mapping (ORM) tools, when the source schema is close in structure to the canonical, or it may require customized SQL/MDX extraction queries. At the other end of the process, the canonical object model is serialized into XML/JSON to be shipped to the client application via a web service interface. This mapping is done automatically by the EAIP tools. Finally, the client application needs to map the XML/JSON produced by the service into its own object model via an automatic de-serialization process. This process may include some XSLT transformation when the canonical model is very different from the consumer model and requires restructuring.
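The three mappings in this chain can be illustrated with a small, self-contained sketch. It is not the EAIP tooling: an in-memory SQLite table stands in for the source database, a hand-written SQL query stands in for the extraction step, and JSON is used for serialization. The table, column and class names are assumptions made for the example.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class CanonicalItem:
    """Hypothetical canonical object exchanged by the service."""
    code: str
    label: str


@dataclass
class ConsumerCategory:
    """Hypothetical object model used inside the client application."""
    key: str
    text: str


def extract(conn: sqlite3.Connection) -> List[CanonicalItem]:
    """Step 1: customized SQL extraction into the canonical object model."""
    rows = conn.execute("SELECT item_cd, item_desc FROM class_item ORDER BY item_cd")
    return [CanonicalItem(code=cd, label=desc) for cd, desc in rows]


def serialize(items: List[CanonicalItem]) -> str:
    """Step 2: serialize the canonical objects to JSON for the service response."""
    return json.dumps([asdict(item) for item in items])


def deserialize(payload: str) -> List[ConsumerCategory]:
    """Step 3: the client maps the JSON into its own object model."""
    return [ConsumerCategory(key=d["code"], text=d["label"]) for d in json.loads(payload)]


# Stand-in source database with a hypothetical physical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class_item (item_cd TEXT, item_desc TEXT)")
conn.executemany(
    "INSERT INTO class_item VALUES (?, ?)",
    [("11", "Agriculture"), ("21", "Mining")],
)

payload = serialize(extract(conn))
categories = deserialize(payload)
```

Only `extract` knows the source schema and only `deserialize` knows the consumer model; the JSON payload in between depends on the canonical class alone.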
(a) Example: Classification service
Classifications were one of the first core business entities to use an EAIP service. The Classification canonical model is based on GSIM and Neuchâtel. The first version contains the basic classes needed to support a classification structure, namely Scheme, Levels and Items. Each Scheme consists of one or more Levels (i.e., classes), each of which consists of one or more Items (i.e., members). This model will be extended to include Versions and Variants as necessary.
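The Scheme, Level and Item structure can be sketched as a small object model. The class names follow the text, but the attributes (`name`, `code`, `label`) and the `validate` method are assumptions for illustration and are not taken from the actual canonical model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClassificationItem:
    code: str
    label: str


@dataclass
class ClassificationLevel:
    name: str
    items: List[ClassificationItem] = field(default_factory=list)


@dataclass
class ClassificationScheme:
    name: str
    levels: List[ClassificationLevel] = field(default_factory=list)

    def validate(self) -> None:
        """Enforce 'one or more Levels' and 'one or more Items per Level'."""
        if not self.levels:
            raise ValueError(f"scheme {self.name!r} has no levels")
        for level in self.levels:
            if not level.items:
                raise ValueError(f"level {level.name!r} has no items")


# Hypothetical two-level scheme.
scheme = ClassificationScheme(
    name="Example industry classification",
    levels=[
        ClassificationLevel("Sector", [ClassificationItem("11", "Agriculture")]),
        ClassificationLevel("Subsector", [ClassificationItem("111", "Crop production")]),
    ],
)
scheme.validate()  # passes: every level has at least one item
```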
To expose IMDB data in this canonical model, the IMDB’s ISO/IEC 11179 Metadata Registries entities need to be mapped to GSIM/Neuchâtel. The IMDB data model does not have Scheme, Level and Item concepts (at least not with the usual GSIM/Neuchâtel semantics), so a mechanism identifies and extracts them from the IMDB physical model via SQL mappings. At the conceptual level, this can be done by defining containment mappings that are expressed as subtypes between both models.
There are parent-child hierarchies defined on Classification Level and Classification Item. The Level hierarchy is linear (each level has at most one child) and the Item hierarchy is a tree (each item may have zero or any number of children). Both hierarchies are related by a constraint that ensures that two items are in a parent-child relationship only if their respective levels are in a parent-child relationship as well. This constraint ensures that both hierarchies remain consistent.
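A consistency check for these two hierarchies might look like the sketch below. The dictionary-based representation (each level or item mapped to its parent, and each item mapped to its level) and the function name are assumptions chosen to keep the example short.

```python
from collections import Counter
from typing import Dict, List, Optional


def check_hierarchies(
    level_parent: Dict[str, Optional[str]],
    item_parent: Dict[str, Optional[str]],
    item_level: Dict[str, str],
) -> List[str]:
    """Return a list of constraint violations (empty when consistent)."""
    problems = []

    # Level hierarchy must be linear: each level has at most one child level.
    child_counts = Counter(p for p in level_parent.values() if p is not None)
    for level, count in child_counts.items():
        if count > 1:
            problems.append(f"level {level} has {count} child levels")

    # Items may be parent and child only if their levels are parent and child.
    for item, parent in item_parent.items():
        if parent is None:
            continue
        if level_parent.get(item_level[item]) != item_level[parent]:
            problems.append(f"item {item} and its parent {parent} are on inconsistent levels")

    return problems


# Hypothetical data: two levels, with item 111 nested under item 11.
levels = {"Sector": None, "Subsector": "Sector"}
item_levels = {"11": "Sector", "111": "Subsector", "112": "Subsector"}
good_items = {"11": None, "111": "11", "112": "11"}
bad_items = {"11": None, "111": "11", "112": "111"}  # 112 under a same-level item
```

Here `check_hierarchies(levels, good_items, item_levels)` reports no problems, while the `bad_items` variant is flagged because items 112 and 111 sit on the same level.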
Section VII - Figure 5 depicts the entire process of exchanging data from a source database to a consumer application.
Section VII - Figure 6 shows the relationship between both models (the canonical entities are those starting with the word “Classification”). For the purpose of defining a mapping, Classification Schemes and Levels can be viewed as subtypes of Enumerated Value Domain (EVD), whereas Classification Items are a subtype of Permissible Value (PV). Classification Schemes are a special type of EVD with no PV (i.e., Item) directly associated with them; Items are only associated with Levels. All Items associated with a given Level have different Values.
Section VII - Figures 7 and 8 show the actual physical mapping for Classification Scheme, Item and Level. The mapping is defined by UML notes (the boxes with the folded corners). The syntax of the mapping is straightforward: the “<<” symbol indicates an assignment from the attribute on the right to the attribute on the left. In addition, there are constraints on code sets from the IMDB content model.
---- Daniel W. Gillman, 1999: Corporate Metadata Repository (CMR) Model; U.S. Bureau of Labor Statistics.