ABS Enterprise Architecture harnesses The Open Group Architecture Framework (TOGAF) which recognises domains of business, data, applications and technology architecture. In describing "IT Architecture" below, reference is primarily made to applications and technology architecture. Connections with data architecture are also explored.
Unless otherwise noted, descriptions in this section refer back to the main metadata systems as described in Section 4.1.
The newer metadata facilities are based on a Service Oriented Architecture. The older facilities tend to have monolithic coupling of the repository, the business logic and business rules (which are built into the application rather than embedded in services) and the User Interface.
Nevertheless, selected information about the collections defined in CMS is "projected" from CMS into an Oracle database. While only a small subset of the total information held in CMS, this comprises all of the core "structural" registration details about collections, cycles and profiles. Basic (read only) "collection metadata services" based on this content on Oracle are then provided for statistical processing applications to access.
A similar approach applies in the case of classifications except a much greater percentage of the total information held in regard to classifications is both "structural" and available on Oracle.
Apart from CMS and ClaMS (which include some descriptive content held only in IBM's Lotus Notes product) the other metadata holdings are all based in Oracle. There is extensive use of Oracle Stored Procedures for reusable services/functions and some use of true web services.
In summary, more recently developed facilities, based on recent architectural standards within the ABS, tend to consist of
While SOA offers a lot of opportunities and potential, it also comes with a lot of new complexities compared with earlier approaches. It requires new understandings and a new mindset from those developers who are being asked to take up, and interact with, the available services as well as requiring the same from the business analysts and programmers within the team responsible for providing the metadata repositories and services. It can make the overall environment more complicated in some ways (eg services are calling services that call services etc and then somewhere at a low level a service is updated and everything needs to be configured appropriately to allow proper testing of that change). Implementing SOA in environments that include a lot of "legacy" processing systems that are not enabled for the new architectural directions is particularly challenging.
During 2008 it became clearer that a significant aspect of the work on establishing an updated and coherent metadata framework for the ABS amounts to defining Enterprise Information Architecture (EIA) in the context of a statistical organisation. Without a clear and coherent EIA, there is a risk each service, or each bundle of services, is delivered with its own explicit or implicit information model. The ABS could have gone from having a dozen or so environments with subtle and not so subtle differences in their underpinning information concepts and structures to having an array of services based on a plethora of different, and unreconciled, information models. On the positive side, SOA can help make EIA practical and consistent. Rather than having the same objects and relationships specified in the EIA implemented, and extended, differently across a number of different environments, a single consistent but flexible bundle of services could be used within each environment. SOA and EIA are complementary rather than alternative directions.
The IMT strategy addresses the requirement for SOA and EIA to work together. It enables common information constructs, defined according to schemas aligned with relevant standards such as SDMX and DDI, to be used consistently via service layers. These service layers enforce core business rules. They also mean application developers can work with information objects at a business level without needing to understand, and code based on, the full details of the SDMX and DDI information models. The integration with Statistical Workflow Management is also an important element of the "to be" IT Architecture.
Statistical processing applications interact with metadata via services where possible. As described in BHM, however, many ABS processing applications and third party vendor products are not yet amenable to this approach. Where this approach is used currently it most often involves the application "reading" relevant content from the metadata repository rather than writing back new or updated r…
The IMT strategy seeks to fully, and consistently, realise this approach. Some existing key applications (and repositories) may need to be "wrapped" so they can interact with the MRR on a CRUDS basis. ("S" refers to harnessing the MRR Search capabilities to support discovery, selection of relevant content to Read etc.) Other legacy applications may need to be decommissioned, through delivery of services and interfaces that take their place, and content from a number of legacy repositories will need to be migrated to the (logically) centralised repositories associated with the MRR.
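The idea of "wrapping" a legacy repository so it can be used on a CRUDS basis can be pictured with a minimal Python sketch. Every name here (`LegacyStore`, `CrudsWrapper` and their methods) is hypothetical and chosen only for illustration; this is not the MRR interface, and the search is a plain substring match standing in for real search capabilities.

```python
from typing import Dict, List, Optional


class LegacyStore:
    """Hypothetical stand-in for a legacy repository with its own idiosyncratic API."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def put_row(self, key: str, text: str) -> None:
        self.rows[key] = text

    def get_row(self, key: str) -> Optional[str]:
        return self.rows.get(key)

    def drop_row(self, key: str) -> None:
        self.rows.pop(key, None)


class CrudsWrapper:
    """Exposes Create, Read, Update, Delete and Search over the legacy store."""

    def __init__(self, store: LegacyStore) -> None:
        self._store = store

    def create(self, object_id: str, description: str) -> None:
        if self._store.get_row(object_id) is not None:
            raise KeyError(f"{object_id} already exists")
        self._store.put_row(object_id, description)

    def read(self, object_id: str) -> Optional[str]:
        return self._store.get_row(object_id)

    def update(self, object_id: str, description: str) -> None:
        if self._store.get_row(object_id) is None:
            raise KeyError(f"{object_id} not found")
        self._store.put_row(object_id, description)

    def delete(self, object_id: str) -> None:
        self._store.drop_row(object_id)

    def search(self, term: str) -> List[str]:
        # Case-insensitive substring match over descriptions (illustrative only)
        needle = term.lower()
        return sorted(k for k, v in self._store.rows.items() if needle in v.lower())
```

Callers written against the wrapper never see `put_row` or `get_row`, so the legacy store could later be replaced without changing them.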
In the meantime, as described in the introduction to 2.2, there are cases where metadata from the Corporate Metadata Repository needs to be restructured and/or repackaged relatively manually to make it suitable for use in particular processing systems.
Standards and formats currently in use for major metadata repositories are described in Section 4.1.
Under IMT, the primary standards are SDMX and DDI, interoperating with other "purpose specific" standards such as
Regardless of which standard's information model is being harnessed, content for interchange (eg to be read by applications) is typically represented in XML. In order to reduce the need to exchange large XML structures, where only a small proportion of the total information may be needed for a particular application, the XML used to describe an object can refer to sub components and related objects "by reference" rather than including all this information "in line". The calling application can then resolve the specific references (if any) which are relevant to its particular needs – once again typically resulting in smaller packages of XML than would be the case if a comprehensive set of information related to the component was included "in line".
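The difference between "by reference" and "in line" representation can be sketched in a few lines of Python. The element names (`Variable`, `ClassificationRef`), the `ref` attribute and the in-memory `REGISTRY` are all invented for this illustration; they are not taken from the SDMX or DDI schemas.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Hypothetical registry content: each object is held as its own small XML document.
REGISTRY = {
    "var-age": '<Variable id="var-age"><Label>Age</Label>'
               '<ClassificationRef ref="cls-age5"/></Variable>',
    "cls-age5": '<Classification id="cls-age5">'
                '<Label>Age in five year groups</Label></Classification>',
}


def fetch(object_id: str) -> ET.Element:
    """Return the XML for one object, leaving its references unresolved."""
    return ET.fromstring(REGISTRY[object_id])


def resolve(element: ET.Element, wanted_tags: Set[str]) -> ET.Element:
    """Swap only the reference children the caller cares about for the referenced object."""
    for index, child in enumerate(list(element)):
        if child.tag in wanted_tags and "ref" in child.attrib:
            element[index] = fetch(child.attrib["ref"])
    return element
```

An application that only needs the variable's label calls `fetch("var-age")` and ignores the reference; one that also needs the classification calls `resolve(fetch("var-age"), {"ClassificationRef"})` and pays for the larger document only then.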
While XML is used for interchange, current repositories tend to store content using RDBMS (relational database) technology. XML stores and graph databases are technologies being considered for future use to augment RDBMS approaches.
Expression in RDF format (which builds on simple XML representation) is seen as an important additional capability in future. This is seen as one advantage of harnessing standards – in many cases the community for a standard has already developed a recommended expression in RDF.
The approach to versioning has been a major point of debate within the ABS previously. As the systems have grown up at different times, their approach to version control tends to differ.
In general, where there was not seen to be a compelling case for supporting formal versioning, past developments tended to avoid that "complexity". Collections, for example, are not currently versioned. Many aspects of change over time for a collection, however, can be handled through descriptions of the "cycle" or the "profile" rather than edits to the main collection document itself.
Under IMT, however, versioning is seen as a prerequisite for active use and reuse of metadata. The structural definition of a metadata object at the time it was referenced must remain accessible even if a new version of that object is defined subsequently. This is consistent with the approach taken in standards such as SDMX and DDI. Both of these standards have a concept of objects being able to be in "draft" mode in which case they should not be referenced for production purposes. The standards do not require versioning of drafts but it is likely that the MRR will support versioning of drafts.
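A minimal Python sketch of these two rules follows: earlier versions stay retrievable after a new version is registered, and drafts are refused as production references. The class and method names are hypothetical, and the choice to give drafts their own version numbers is an assumption made for the sketch, not a description of the MRR.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class MetadataVersion:
    """One immutable version of a metadata object."""
    object_id: str
    version: int
    definition: str
    is_draft: bool = False


class VersionedRegistry:
    """Stores every version; nothing is ever overwritten."""

    def __init__(self) -> None:
        self._versions: Dict[Tuple[str, int], MetadataVersion] = {}

    def register(self, object_id: str, definition: str,
                 is_draft: bool = False) -> MetadataVersion:
        # The next version number is one more than the highest already held
        latest = max((v for (oid, v) in self._versions if oid == object_id), default=0)
        record = MetadataVersion(object_id, latest + 1, definition, is_draft)
        self._versions[(object_id, record.version)] = record
        return record

    def get(self, object_id: str, version: int) -> MetadataVersion:
        return self._versions[(object_id, version)]

    def reference_for_production(self, object_id: str, version: int) -> Tuple[str, int]:
        record = self.get(object_id, version)
        if record.is_draft:
            raise ValueError("draft versions should not be referenced for production purposes")
        return (record.object_id, record.version)
```

Because a production reference is the pair of identifier and version number, registering a later version leaves existing references pointing at the definition they were made against.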
Past debates over when a change is so fundamental that it should result in definition of a new object, rather than a new version of an existing object, remain to be addressed in the IMT context.
Past debates about changes that are so "trivial" (eg fixing a spelling mistake) that they shouldn't result in version change also remain to be finalised in the IMT context.
An example of problems from lack of appropriate support for versioning in current infrastructure is the classification system. It could benefit, for example, from the Neuchatel approach to modelling classifications, versions and variants as well as the IMT approach to not overwriting previous content.
Within the current system each registered object is essentially an independent entity (ie a "new classification"). It is possible to designate one classification as being "based on" another but this can mean many different things
Where revisions are to be made (or new versions created) as much impact analysis as possible is undertaken. This includes, for example, understanding what other metadata objects and processes refer to the object that is about to be revised (or versioned) and whether the revision will have any inappropriate impact (whether the new version should be referenced instead). The lack of fully "joined up" registries (including knowing exactly what metadata is referred to in each processing system) makes impact assessments difficult and only partially reliable in some cases.
The MRR and Statistical Workflow Management working together in future should greatly assist in this regard. While existing metadata objects and business processes will be able to continue referencing the present version of an object that is proposed to be updated/versioned, understanding these existing uses and the requirements associated with them
The preceding example illustrates the flow on impacts that versioning can have within a complex and actively used metadata registration system. If the existing metadata objects that refer to the object that just got "versioned" now need to refer to the newer version of that object, all those existing metadata objects themselves now need to get "versioned" (because they're pointing to a different version of the first object). All the objects that refer to the objects that referred to the original object now need to get assessed and potentially versioned themselves, and so on with a ripple effect potentially sweeping across the whole registry originating from just one object being versioned. (While standards such as DDI-L support the option of "late binding", they recommend against it for many purposes. Under "late binding" a reference to another object is always deemed to refer to the most recent version of that object – rather than, eg, to the specific version of the object that was current at the time the reference to it was made. "Late binding" reduces precision and leaves open the possibility that the object referred to will subsequently "evolve" in ways that contradict the initial basis for referring to it.)
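The ripple effect amounts to a walk over "refers to" links in the reverse direction. The sketch below, in Python, lists every object that directly or indirectly refers to a changed object, nearest first. The object names and the `REFERS_TO` table are invented for illustration; a real registry would supply these links, and each candidate would still be assessed rather than versioned automatically.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical "refers to" links: each key refers to the objects in its list.
REFERS_TO: Dict[str, List[str]] = {
    "questionnaire": ["question"],
    "question": ["data-element"],
    "data-element": ["classification"],
    "dataset": ["data-element"],
    "classification": [],
}


def consequential_candidates(changed: str, refers_to: Dict[str, List[str]]) -> List[str]:
    """Return every object that directly or indirectly refers to `changed`, nearest first."""
    # Invert the links so we can walk from an object to the objects that refer to it
    referred_by: Dict[str, List[str]] = {}
    for source, targets in refers_to.items():
        for target in targets:
            referred_by.setdefault(target, []).append(source)

    seen: Set[str] = {changed}
    ordered: List[str] = []
    queue = deque([changed])
    while queue:  # breadth-first, so direct referrers come before indirect ones
        current = queue.popleft()
        for referrer in referred_by.get(current, []):
            if referrer not in seen:
                seen.add(referrer)
                ordered.append(referrer)
                queue.append(referrer)
    return ordered
```

With these sample links, versioning `classification` flags `data-element` first, then `question` and `dataset`, then `questionnaire`, while versioning `questionnaire` flags nothing.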
The IMT approach supports user decision points (which may be manual or automated) in regard to the "ripple effect" of versioning. It also provides the greatest systematic support for managing initial and "consequential" versioning processes.
While external expert consultants were engaged from time to time, the existing metadata systems described in Section 4.1 were all designed and developed "in-house". Open source and other starting points for the Data Element Registry were seriously considered.
ABS (and the Australian Government) ICT Policy and Strategy is placing a greater emphasis on COTS (Commercial Off The Shelf) and GOTS (Government Off The Shelf) based solutions. "Bespoke" software development (whether through in house development or commissioning of external developers) to deliver all, or part, of a solution is seen as a last resort if other options are demonstrated not to be viable.
From an ABS perspective, however, it remains typically the case that in house staff
Ensuring solutions are consistent with Enterprise Architecture, including Service Oriented Architecture and support for relevant open standards, promotes effective integration (with minimum need to re-engineer other systems), reduces risks of "vendor lock" and facilitates end of life decommissioning (and possible replacement).
The approach to IMT aligns with these ICT strategies and policies. This includes
While not all developments related to IMT will necessarily deliver, or harness, open source components, open source is recognised as one important paradigm for sharing solutions and sustaining their evolution over time.
In addition to seeking to collaborate with other agencies, ABS is drawing on input from expert consultants to help developers understand and apply information standards such as SDMX and DDI and to assist in designing key infrastructure such as the MRR.
Development of REEM (Remote Execution Environment for Microdata) is an example of the ABS working with a vendor that shares our interest in harnessing standards such as SDMX and DDI-L. Elements of the REEM solution include
ABS implementation, as ABS.Stat, of the OECD.Stat platform is an example of harnessing an existing standards aligned shared solution and entering into a collaborative partnership (with OECD, IMF, Statistics New Zealand and Istat) to maintain and evolve that solution in future.
At present, many systems (as described in Section 4.1) used by the ABS are built in a "monolithic" fashion (combining the repository, the business logic and the user interface) and are highly customised for the ABS environment (eg they rely on both IBM Lotus Notes and Oracle databases which are configured in a particular way). CMS, ClaMS and the Dataset Registry are all in this category. While there is no in principle objection to sharing these components with other agencies, doing so in practice would be very complex both for the ABS and for the other agency. In any case, as these facilities were developed more than a decade ago and predate relevant application architecture and metadata standards, it is not anticipated any other agency would be interested in making use of these facilities in their cu…
Newer facilities such as the Data Element Registry (DER) and Questionnaire Development Tool (QDT) are architected in a manner that would make it easier to share them. Both of these facilities are designed so that a user interface interacts with the Oracle database via a "Business Services Layer" (BSL). In addition to full sharing, partial sharing could be supported (eg the ABS providing the repository and BSL, with the other agency choosing to develop its own user interface).
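The layering described here can be pictured with a small Python sketch in which the user interface never touches the database directly. This is an illustration under stated assumptions only: SQLite stands in for the Oracle repository, and the table, class and function names, as well as the sample business rule, are all hypothetical rather than taken from the DER or QDT.

```python
import sqlite3
from typing import List


class BusinessServicesLayer:
    """Hypothetical BSL: the only component that talks to the repository."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS data_element "
            "(name TEXT PRIMARY KEY, definition TEXT NOT NULL)"
        )

    def register_data_element(self, name: str, definition: str) -> None:
        # Illustrative business rule, enforced here rather than in any user interface
        if not definition.strip():
            raise ValueError("a data element needs a definition")
        self._db.execute(
            "INSERT INTO data_element (name, definition) VALUES (?, ?)",
            (name, definition),
        )
        self._db.commit()

    def list_data_elements(self) -> List[str]:
        rows = self._db.execute("SELECT name FROM data_element ORDER BY name")
        return [row[0] for row in rows]


def render_listing(bsl: BusinessServicesLayer) -> str:
    """A stand-in user interface; another agency could swap in its own."""
    return "\n".join(f"- {name}" for name in bsl.list_data_elements())
```

Under partial sharing, only `render_listing` would be replaced by the other agency, while the rule in `register_data_element` would continue to apply to whatever interface calls it.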
Sharing could be envisaged in at least two forms. One would be the ABS packaging either the full facility or some layers from the facility in a form which allowed another agency to establish a "stand alone" instance. A second form would be extending the BSL (and probably repositioning the repository) so that authorised and authenticated interactions from outside the ABS became possible in regard to the current instance of the facility. One or more external agencies might then act as registration authorities in their own right. This could have many benefits in terms of sharing, and shared development of, metadata content but would be likely to require more thought in terms of ongoing governance and support arrangements.
A third possibility, which physically "cloned" the repository (ie the first option) but supported a unified logical perspective across the original repository and the clone(s) (ie elements of the second option) would also require significant additional work.
While these facilities are deliberately more compartmentalised and self contained in design, they were not developed from the ground up with the intent of sharing beyond the ABS. Some generalisation of ABS specific aspects (eg linkages of both the DER and QDT to collection information from the CMS) would still be required.
The software the ABS has available should be able to be made available to other statistical agencies free of charge in its current form. If the ABS needed to modify the software and/or provide consultancy support in order for that software to be made operational outside the ABS then that work may need to be cost recovered. Alternatively, and preferably, it may be possible to agree a collaborative arrangement such that the existing facility is extended and generalised in a manner that benefits both the ABS and the other agency.
The ABS seeks to avoid becoming a "software house". Any sharing arrangements would be in the context of either one off provision or, preferably, some form of partnership. A relationship along the lines of the ABS acting as a provider to one or more "customers" does not fit with current ABS aspirations and directions.
A number of other ABS applications (eg ABS Autocoder and REEM) are also listed in the Sharing Advisory Board's inventory of software available for sharing.
Short of sharing software itself, the ABS is very happy to exchange details of data models, application architectures, user experiences etc with other statistical agencies.
New developments such as the MRR are being designed to be more readily sharable, in whole or part.
While the ABS has relatively few components currently that other agencies may be interested in sharing, the ABS is placing a very high priority on establishing collaborative partnerships with other agencies to develop new components, or to extend existing modern standards aligned components that already exist outside the ABS.