Contents

 

I.               The Problem Statement

II.               Common Statistical Production Architecture

A.               Scope of Architecture

B.               Service Oriented Architecture

C.               Using CSPA

D.               Impact on organizations

III.               Business Architecture

A.               Describing Statistical Production

B.               Business Architecture Principles

IV.               Information Architecture

A.               Reference frameworks and their use

B.               CSPA implementation specifications

C.               Information Architecture Principles

V.               Application Architecture

A.               Statistical Service Definitions, Specifications and Implementation Descriptions

B.               Architecture Patterns

C.               Non-Functional Requirements

D.               Implementing Protocols in a Statistical Service

E.               Application Design Principles

VI.               Technology Architecture

A.               Communication Platform

VII.               Roles

VIII.               Enablers

A.               Catalogues

B.               Governance

Annex 1: Templates

Statistical Service Definition

Statistical Service Specification

Statistical Service Implementation Description

Annex 2: Describing “Statistical Production”

Annex 3: Glossary


I.               The Problem Statement

 

1.               Many statistical organizations are facing common challenges. Two major threats to the continued efficient and effective supply of core statistics come from within statistical organizations themselves:


1) rigid processes and methods, and
2) an inflexible, ageing technology environment.


2.               Over the years, statistical organizations have built up their organizational structures, production processes, enabling statistical infrastructure and technology through many iterations and technology changes. The cost of maintaining this business model and the associated asset bases (process, statistical, technology) is becoming insurmountable and the model of delivery is not sustainable.

 

3.               Historically, statistical organizations have developed their own business processes and IT systems for producing statistical products. Therefore, although the products and the processes are conceptually very similar, the individual solutions are not (as represented in Figure 1). Every technical solution was built for a very specific purpose, with little regard for the ability to share information with adjacent applications in the statistical cycle and with limited ability to handle similar but slightly different processes and tasks. This can be referred to as 'accidental architecture', as the processes and solutions were not designed from a holistic, top-down view.

 

 

Figure 1: Accidental Architectures

 

4.               Often it is difficult to replace even one of the components supporting statistical production. Rigid processes and methods, combined with an inflexible and ageing technology environment, mean that statistical organizations find it difficult to produce data and information aligned to modern standards (for example, Data Documentation Initiative (DDI) and Statistical Data and Metadata eXchange (SDMX)) and to share them between systems. Process and methodology changes are time-consuming and expensive, resulting in an inflexible, unresponsive statistical organization.
 

5.               Many statistical organizations are modernizing and transforming their organizations using enterprise architecture to underpin their vision and change strategy. An enterprise architecture aims to create an environment which can change and support business goals. It shows what the business needs are, where the organization wants to be, and ensures that the IT strategy aligns with this. Enterprise architecture helps to remove silos, improves collaboration across an organization and ensures that the technology is aligned to the business needs. This work will enable them to standardize their organizations (see Figure 2).

 

 

Figure 2:  The result of standardization within an organization

 

6.               Statistical organizations have attempted many times over the years to share their processes, methodologies and solutions, as it has long been believed that there is value in this. The mechanism for sharing has historically meant an organization taking a copy of a component and integrating it into their environment. Examples include CANCEIS (CANadian Census Edit and Imputation System) and Banff (an editing and imputation system for business surveys). However, most cases of sharing have involved significant work to integrate the component into a different processing and technology environment.    

 

7.               Figure 3 attempts to explain why the difficulty in sharing or reuse occurs. The figure assumes that the two statistical organizations develop all their business capability and supporting components in a standard way (i.e. they have an Enterprise Architecture as shown in Figure 2). The first line of the figure shows that Canadian components have a zig zag shape and the second suggests that Sweden has components with slanted edges. If Sweden needs a new component, ideally they need a component with a slanted edge. It can be seen in the third row that while a component from Canada might support the same process and incorporate robust statistical methodologies, it will not be simple to integrate it into the Swedish environment.  

 

Figure 3: Why sharing/reuse is hard now

 

 

II.               Common Statistical Production Architecture  

 

8.                 If the official statistical industry had greater alignment at the business, information and application levels, then sharing would be easier. A number of frameworks focusing on specific areas have already been developed. Notably these include the Generic Statistical Business Process Model (GSBPM) and Generic Statistical Information Model (GSIM).  

 

9.               The Common Statistical Production Architecture (CSPA) will bring together these existing frameworks in addition to new frameworks about Statistical Services to create an agreed top level description of the 'system' of producing statistics which is in alignment with the modernization initiative. The High Level Group for the Modernization of Statistical Production and Services (HLG) has put priority on the development of CSPA during 2013.  

 

10.               The CSPA is the industry architecture for the official statistics industry.   An industry architecture is a set of agreed common principles and standards designed to promote greater interoperability within and between the different stakeholders that make up an "industry", where an industry is defined as a set of organizations with similar inputs, processes, outputs and goals (in this case official statistics). CSPA focuses on relating the strategic directions of the HLG to shared principles, practices and guidelines for defining, developing and deploying Statistical Services.  

 

11.               The CSPA provides a template architecture for official statistics. It describes:  

      What the official statistical industry wants to achieve – These are the goals and vision (or future state).

      How the industry can achieve this – These are the principles that guide decisions on strategic development and how statistics are produced.

      What the industry will have to do – The industry will need to adopt an architecture, which will require organizations to comply with the CSPA.


12.               The CSPA gives users an understanding of the different statistical production elements (i.e. processes, information, applications) that make up a statistical organization and how those elements relate to each other. It also provides a common vocabulary with which to discuss implementations, often with the aim of stressing commonality. It is an approach to enabling the vision and strategy of the statistical industry by providing a clear, cohesive and achievable picture of what is required to get there.

 

13.               The goal of the CSPA is to provide statistical organizations with a standard framework to:  

      facilitate the process of modernization

      provide guidance for transformation within statistical organizations  

      provide a basis for flexible information systems to accomplish their mission and to respond to new challenges   and opportunities

      facilitate the reuse / sharing of solutions and services and the standardization of processes, and thus a reduction in costs of production

      provide guidance for building reliable and high quality services to be shared and reused in a distributed environment (within and across statistical organizations)

      enable international collaboration initiatives for building common infrastructures and services

      foster alignment with existing industry standards such as the Generic Statistical Business Process Model (GSBPM) and the Generic Statistical Information Model (GSIM), and

      encourage interoperability of systems and processes

 

A.    Scope of Architecture

 

14.               CSPA is a statistical industry reference architecture for statistical production.   The scope of CSPA is statistical production across the processes defined by the GSBPM (i.e. it does not characterize a full enterprise architecture for a statistical organization).  

 

15.               CSPA is descriptive rather than prescriptive; its focus is to facilitate the sharing and reuse of Statistical Services both across and within statistical organizations. CSPA is not a static reference architecture; it is designed to evolve over time.

 

16.               CSPA is designed for use by investment decision makers in developed statistical organizations. While developing organizations are not excluded, a reasonable level of Enterprise Architecture maturity and a modern technical environment are required for implementation. There are options for making Statistical Services developed using CSPA available to developing statistical organizations; these will be outlined in future versions of the document.

 

17.               An important concept in architecture is the "separation of concerns".   For that reason, the architecture is separated into a number of "layers". These "layers" are:  

 

      Business Architecture   which defines what the industry does and how it is done (statistics in our case),

      Information Architecture   which builds understanding of the information, its flows and uses across the industry, and how that information is managed,

      Application Architecture   which describes the set of practices used to select, define or design software components and their relationships,   and

      Technology Architecture   which   describes the infrastructure technology underlying (supporting) the business and application layers.

 

18.               CSPA includes:  

 

      Motivations for constructing and using the CSPA through the description of requirements

      Sufficient business and information architecture descriptions and principles as are necessary for CSPA's scope

      Application architecture and associated principles for the delivery of Statistical Services

      Technology architecture and principles - limited to the delivery of Statistical Services

 

19.                It should be noted that CSPA does not include enterprise, business, application and technology architecture descriptions which are not directly aligned to the CSPA scope, nor does it include technology environments of statistical organizations.

 

20.               CSPA is positioned to align with and leverage the statistical standards GSBPM V4.0 and GSIM V1.0. During the development of CSPA, the key dependencies with the implementation levels of both GSBPM and GSIM have been identified. Where required for the development of the statistical architecture and services, CSPA will provide input to the relevant review or implementation projects.

 

B.    Service Oriented Architecture

 

21.               The value of the architecture is that it enables collaboration in developing and using Statistical Services which will allow statistical organizations to create flexible business processes and systems for statistical production more easily.  

 

22.               The architecture is based on an architectural style called Service Oriented Architecture (SOA). This style focuses on Services (or Statistical Services in this case). A service is a representation of a real world business activity with a specified outcome. It is self-contained and can be reused by a number of business processes (either within or across statistical organizations).

 

23.               A Statistical Service will perform a task in the statistical process. Statistical Services will be at different levels of granularity. An atomic or fine grained Statistical Service encapsulates a small piece of functionality.   An atomic service may, for example, support the application of a particular methodological option or a methodological step within a GSBPM sub process. Coarse grained or aggregate Statistical Services will encapsulate a larger piece of functionality, for example, a whole GSBPM sub process. These may be composed of a number of atomic services.    

 

24.               The granularity of Statistical Services should be based on a balanced consideration between the efficiency of the Statistical Service and the flexibility required for sharing purposes - larger Statistical Services will usually enable greater efficiency, whereas a finer granularity will allow greater flexibility for supporting sharing and reuse. It is envisaged that the Statistical Services will be shared or reused at one level below a GSBPM sub process.   Services, regardless of their granularity, must meet the architectural requirements and be aligned with the CSPA principles.
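
The relationship between atomic and aggregate Statistical Services can be illustrated with a minimal Python sketch. It is not part of CSPA: the class names, the two example services and the list-of-values dataset are all assumptions made for illustration. It only shows that a coarse grained service can be assembled from fine grained ones while each atomic service remains usable on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A dataset is modelled here as a list of optional numeric values
# (None marks a missing value). This is an illustrative simplification.
Dataset = List[Optional[float]]


@dataclass(frozen=True)
class AtomicService:
    """A fine grained service: one small, self-contained piece of functionality."""
    name: str
    run: Callable[[Dataset], Dataset]


@dataclass(frozen=True)
class AggregateService:
    """A coarse grained service composed of atomic services run in order."""
    name: str
    steps: List[AtomicService]

    def run(self, data: Dataset) -> Dataset:
        for step in self.steps:
            data = step.run(data)
        return data


def drop_negatives(data: Dataset) -> Dataset:
    # Hypothetical edit rule: treat negative values as invalid (missing).
    return [None if v is not None and v < 0 else v for v in data]


def impute_mean(data: Dataset) -> Dataset:
    # Hypothetical imputation method: replace missing values with the mean
    # of the observed values.
    observed = [v for v in data if v is not None]
    mean = sum(observed) / len(observed) if observed else 0.0
    return [mean if v is None else v for v in data]


edit = AtomicService("edit-negatives", drop_negatives)
impute = AtomicService("impute-mean", impute_mean)
edit_and_impute = AggregateService("edit-and-impute", [edit, impute])

result = edit_and_impute.run([10.0, -5.0, None, 20.0])
```

The aggregate service is more convenient to call, while the atomic services can be recombined in other processes, which mirrors the trade-off between efficiency and flexibility described above.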

 

25.               By adopting this common reference architecture, it   will be easier for each organization to standardize and combine the components of statistical production, regardless of where the Statistical Services   are built. As shown in Figure 4, Sweden could reuse a Statistical Service from Canada because they both use the same "shape".  

 

26.               The CSPA will facilitate the sharing and reuse of Statistical Services both across and within statistical organizations. The Statistical Services that are shared or reused across statistical organizations might be new Statistical Services that are built to comply with CSPA, or legacy/existing tools wrapped as Statistical Services which comply with the architecture. This is shown in Figure 4 by the shapes inside the building blocks. The CSPA also provides a starting point for concerted developments of statistical infrastructure and shared investment across statistical organizations. The CSPA is sometimes referred to as a "plug and play" architecture. The idea is that replacing Statistical Services should be as easy as pulling out a component and plugging another one in.

 

Figure 4: Making sharing and reuse easier
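
The "plug and play" idea, including the wrapping of a legacy tool, can be sketched in Python as follows. This is an illustration under stated assumptions, not a CSPA interface: the StatisticalService protocol, the execute method, the record format and both example services are hypothetical names introduced here.

```python
from typing import Dict, List, Protocol

Record = Dict[str, float]


class StatisticalService(Protocol):
    """Hypothetical common interface (the shared "shape" in Figure 4)."""

    def execute(self, records: List[Record]) -> List[Record]: ...


class NewRoundingService:
    """A service built from the start against the common interface."""

    def execute(self, records: List[Record]) -> List[Record]:
        return [{k: float(round(v)) for k, v in r.items()} for r in records]


def legacy_truncate(values: List[float]) -> List[float]:
    """Stand-in for an existing tool with its own, non-standard interface."""
    return [float(int(v)) for v in values]


class WrappedLegacyService:
    """Adapter that wraps the legacy tool so it presents the common interface."""

    def execute(self, records: List[Record]) -> List[Record]:
        out = []
        for r in records:
            keys = list(r)
            values = legacy_truncate([r[k] for k in keys])
            out.append(dict(zip(keys, values)))
        return out


def run_process(service: StatisticalService, records: List[Record]) -> List[Record]:
    # The process depends only on the interface, so services can be swapped.
    return service.execute(records)


data = [{"turnover": 10.6, "staff": 3.2}]
rounded = run_process(NewRoundingService(), data)
truncated = run_process(WrappedLegacyService(), data)
```

Because run_process depends only on the shared interface, either service can be pulled out and the other plugged in without changing the process itself.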

 

C.    Using CSPA

 

27.               There are a number of ways in which CSPA may be used by statistical organizations. These are outlined in the sections below.  

 

Strategic Planning

 

28.               Statistical organizations are creating and using an industry strategy ("Industry Architecture"), and this leads to projects and work programs; an example is the Proof of Concept for CSPA. They could also integrate and streamline their investment strategies. Where a statistical organization plans to contribute to and/or use CSPA in the future, it should modify and integrate its roadmaps to align with the CSPA framework. Each statistical organization needs to define a strategy to move from its current state to the common future state defined in its roadmap.

 

Development within statistical organizations

 

29.               When a statistical organization identifies the need for a new Statistical Service, there are a number of options they can pursue. In order to fill the gap, the statistical organization   can look for Statistical Services that are available in the collaborative space (that is, in the Global Artefact Catalogue).  

 

30.               If an appropriate Statistical Service is not found in the CSPA Global Artefact Catalogue, the statistical organization can either:  

      start designing and developing a new Statistical Service internally, or

      modify an existing Statistical Service to meet new functional and/or non-functional requirements.


31.               This could be done independently or in collaboration with other statistical organizations. This development work should be done in alignment with CSPA to ensure that the new Statistical Services can be added to the CSPA Global Artefact Catalogue for sharing and reuse by other statistical organizations.  
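
The sourcing options in paragraphs 29 to 31 can be sketched as a simple lookup in Python. The catalogue record, its fields and the rule for preferring "modify" over "build" are assumptions made for this illustration; the text above only states that both options exist when no suitable service is found.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical catalogue record for a shared Statistical Service."""
    name: str
    business_function: str
    capabilities: FrozenSet[str]


def sourcing_decision(
    catalogue: List[CatalogueEntry],
    business_function: str,
    required: FrozenSet[str],
) -> Tuple[str, Optional[str]]:
    """Return ("reuse" | "modify" | "build", name of the service concerned)."""
    candidates = [e for e in catalogue if e.business_function == business_function]
    # 1. Reuse a catalogued service that already meets every requirement.
    for entry in candidates:
        if required <= entry.capabilities:
            return ("reuse", entry.name)
    # 2. Assumed preference: otherwise modify the closest existing service.
    if candidates:
        closest = max(candidates, key=lambda e: len(required & e.capabilities))
        return ("modify", closest.name)
    # 3. Otherwise design and develop a new service.
    return ("build", None)


catalogue = [
    CatalogueEntry("mean-imputation", "imputation", frozenset({"numeric"})),
    CatalogueEntry("donor-imputation", "imputation",
                   frozenset({"numeric", "categorical"})),
]

decision = sourcing_decision(catalogue, "imputation", frozenset({"categorical"}))
```

In practice the choice between modifying and building would also weigh cost, licensing and the scope for collaboration with other organizations, none of which is modelled here.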

 

Vendors

 

32.               A statistical organization may choose to have a vendor develop a Statistical Service. A vendor in this case means either a third party commercial vendor or a statistical organization that is selling a product. In the case of a new Statistical Service, the statistical organization should request that it is built in accordance with CSPA. When the product already exists, statistical organizations should jointly verify whether it meets the relevant community requirements. If it does not, statistical organizations can try to influence the vendor to meet those requirements. If it does, statistical organizations can ask the vendor to integrate the product into the CSPA.

 

 

      

 

D.    Impact on organizations

 

33.               There will be a number of required changes for an organization implementing the CSPA.   Adoption of the CSPA   will require investment with a view to generating the long term benefits identified in the value proposition.  

 

34.               The main changes required at the organization level can be grouped in layers:

 

A.      People Changes

      Openness to international cooperation

      Building trust in international partners (especially as they may be building services for your organization)

      Sense of compromise (acceptance that nothing will be optimized for local use, rather it will be optimized for international or corporate use)

      Development of new functional roles to support use of the architecture (e.g. Assembler, Builder)
 

B.      Process Changes

      Adoption of an industry wide perspective

      Different approach to business process management and design

      Commitment to service (contract between different functional units)
 

C.     Technology Changes

      Setting up an adequate middleware infrastructure (messaging, repositories)

      Uplift of physical network capabilities (bandwidth, etc.)

      Management of security features

 

35.               In addition to the costs and the targeted benefits, an organization adopting the CSPA   will benefit from:

 

       a sustainable and efficient strategy for coping with legacy applications and phasing out existing ones

       a cycle in which savings from reduced production costs can be reinvested in further infrastructure transformation

       a positive image on both the national and the international/industry scene

 

 


III.               Business Architecture

 

36.               The Statistical Network [1] is currently undertaking a project on Business Architecture. The CSPA will utilize the outputs of this work.

 

37.               The definition of Business Architecture being used by the Statistical Network Business Architecture project is given below.  
 

"Business Architecture covers all the activities undertaken by a statistical organization, including those undertaken to conceptualize, design, build and maintain information and application assets used in the production of statistical outputs.   Business Architecture drives the Information, Application and Technology architectures for a statistical organization."  

 

38.               CSPA focuses on architectural considerations associated with statistical production as bounded by GSBPM. Business concerns such as

 

      ensuring that the corporate work program for a statistical organization best addresses the needs of its external stakeholders, or

      recruiting, retaining and developing staff with relevant skills

 

are not central to CSPA.  Such concerns are, however, very important considerations in an organization specific business architecture which sets out to describe the enterprise as a whole.

 

39.               Organizations that have formally defined business architecture can reference CSPA when describing aspects of their business architecture which are fundamentally in common with other producers of official statistics. 

 

A.    Describing Statistical Production

 

40.               To enable efficient and consistent documentation and understanding of CSPA, three related concepts have been adopted which are relevant for all readers in relation to “Statistical Production”. These are business function, business process and business service.

 

41.               The terms and definitions used in CSPA for these concepts are drawn from The Open Group Architecture Framework (TOGAF). TOGAF is a widely used framework for enterprise architecture. The terminology and modelling of the Production group within GSIM aligns with TOGAF (and thus the ideas presented here are consistent with GSIM).

 

42.               Following is a brief overview of these three concepts [2]. A more detailed discussion of these concepts and of statistical production is contained in Annex 2.

 

 

Business Function

 

43.               GSIM defines Business Function as something an enterprise does, or needs to do, in order to achieve its objectives.  This represents a simpler expression of the definition used in TOGAF.

 

44.               When identifying business functions, the emphasis is on an enterprise level (“whole of business”) perspective, recognizing that different parts of the business may have different detailed requirements in regard to a particular function. At the level of the business function, there is no implementation detail.

 

Business Process

 

45.               A Business Process is a series of logically related activities or tasks performed together to produce a defined set of results. Key aspects include:

 

      a process consists of a series of steps (activities/tasks)

      there is sequencing (or “flow”) between steps

      a business process is undertaken for a particular purpose

      what is represented (for the sake of simplicity and clarity) as a single step in a high level depiction of a process might – when viewed in more detail - comprise a lower level (sub)process consisting of multiple steps

 

Business Service

 

46.               A Business Service is who – or what – will undertake the work associated with each function. TOGAF defines a business service as supporting delivery of a business capability (business function) through an explicitly defined interface. Business services should be scoped to support flexible sequencing and configuration of business functions within different business processes.

 

47.               A central aim for CSPA is to enable more efficient and flexible support for statistical production as described by the GSBPM.  Future versions of CSPA will provide more guidance on how CSPA can be applied when designing, managing and performing statistical business processes.  However, the initial focus in the development of CSPA has been to clearly and consistently define (both conceptually and in practice) the Statistical Services which support statistical production.  The aim is that equivalent business functions (such as imputation) within many different statistical business processes [3] will be able to reuse (or share) the same Statistical Service in the implementation of the business service.
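
How the three concepts fit together can be sketched in Python. Everything named here is hypothetical (the BusinessService class, the two example services and the survey names); the sketch only illustrates the aim stated above, namely that the same Statistical Service supports the imputation function in more than one business process.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Values = List[float]


@dataclass(frozen=True)
class BusinessService:
    """Who or what performs a business function, behind a defined interface."""
    business_function: str
    statistical_service: Callable[[Values], Values]


def impute_missing_as_zero(values: Values) -> Values:
    # Hypothetical Statistical Service: NaN marks a missing value.
    return [0.0 if math.isnan(v) else v for v in values]


def scale_to_thousands(values: Values) -> Values:
    # Hypothetical Statistical Service: express values in thousands.
    return [v / 1000.0 for v in values]


# One Statistical Service per business function, shared by every process.
services: Dict[str, BusinessService] = {
    "imputation": BusinessService("imputation", impute_missing_as_zero),
    "unit conversion": BusinessService("unit conversion", scale_to_thousands),
}

# A business process is an ordered sequence of business functions.
processes: Dict[str, List[str]] = {
    "retail survey": ["imputation", "unit conversion"],
    "labour survey": ["imputation"],
}


def run_process(name: str, values: Values) -> Values:
    for function in processes[name]:
        values = services[function].statistical_service(values)
    return values


retail = run_process("retail survey", [2000.0, float("nan")])
labour = run_process("labour survey", [float("nan"), 5.0])
```

Both processes resolve the "imputation" function to the same service, so an improvement to that service benefits every process that uses it.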

 

48.               Key reasons for the initial focus on Statistical Services include:

 

      Common Statistical Services provide the greatest opportunity for realizing savings through collaborative development, sharing and reuse

      While common statistical standards (or reference frameworks) have been agreed for business processes (GSBPM) and for statistical information (GSIM), there is not yet a common framework for Statistical Services. CSPA is filling this gap.

      Once a common approach to the definition and implementation of Statistical Services is agreed, this will support defining business processes which make use of the common Statistical Services in a manner that supports the delivery of a specific business service, within a given context and purpose (business process) aligned to the business function (GSBPM).

 

49.               Future versions of CSPA will elaborate on this area and will draw on the GSBPM revision currently under way, which is scheduled for completion by the end of 2013.

 

B.    Business Architecture Principles

 

50.               Principles are high level decisions or guidelines that influence the way processes and systems are to be designed, built and governed. Principles are derived from the mission and values of the organization, taking into account the opportunities and threats that the organization faces.   In the CSPA, principles are used to express the high level design decisions that will shape the future statistical processes and systems.

 

Decision Principles

 

51.               Decision principles are guidelines that help organizations decide on strategic development. They provide a basis for decision making throughout an enterprise, inform how that organization sets about fulfilling its mission, and help enable sound investment decisions. The following decision principles have been selected for the CSPA. They represent the outcomes sought by the High Level Group and key elements of the United Nations Fundamental Principles of Official Statistics.

 

52.               A number of principles which are common to most organizations' business architectures (whether formally defined or not) are being identified through other initiatives, such as work within the Statistical Network. The following business architecture decision principles are being jointly developed via the CSPA project and the Statistical Network business architecture project:

 

53.                 Principle: Increase the value of our statistical assets

 

Statement: Add value to the agency's statistical assets (either directly or indirectly) through improved accessibility & clarity, relevance, coherence & comparability, timeliness & punctuality, accuracy & reliability, and interpretability throughout the output driven statistical process.

 

54.               Principle: Maintain community trust and information security

 

Statement: Operations at all levels encourage trust in the statistical organization. This includes the community's trust and confidence in the statistical organization's decision making and practices and in their ability to preserve the integrity, quality, security and confidentiality of information provided.  

 

55.               Principle: Capitalize on and influence national and international developments

 

Statement: Collaborate nationally and internationally to leverage and influence statistical and technological developments which support the development of shared statistical services.

 

56.               Principle: Sustain and grow the business

 

Statement: Statistical organizations' investment and planning are focused on long term sustainability and growth, both in terms of the organization's role and position within its own community and internationally.

 

57.               Principle: Deliver enterprise-wide benefits

 

Statement: Statistical organizations design and implement new or improved statistical business processes in a way that maximizes their value at an enterprise level.

 

58.                 Principle: Maximize the use of existing data/Minimize respondent load

 

Statement: Data for statistical purposes may be drawn from all types of sources, e.g. statistical surveys or administrative records. Statistical organizations are to choose the source with regard to quality, timeliness, costs and the burden on respondents. The respondent burden is proportionate to the needs of the users and statistical authorities monitor the respondent burden, and aim to reduce this over time.  

 

59.               Principle: Take a holistic and integrated view

 

Statement: Data, people skills and knowledge, methods, processes, standards and frameworks, systems and other resources need to be consistent, reusable and interoperable across multiple business lines within a statistical organization.  

 

Design Principles

 

60.               CSPA aims to support organizations in realizing these decision principles in practice. Specific business architecture design principles, which are consistent with the above, have been identified for CSPA. As with the decision principles, the design principles are being developed in conjunction   with the Statistical Network Business Architecture project.

 

61.               Principle: Consider all capability elements

 

Statement: Projects should address all capability elements e.g. methods, standards, processes, skills, and IT, to ensure the end result is well-integrated, measurable, and operationally effective.

 

62.               Principle: Create new for re-use and easy assembly

 

Statement: All capabilities and resources are designed for standardization and re-use, and can be assembled, reassembled and easily modified to accommodate changing user demands.

 

63.               Principle: Metadata driven processes

 

Statement: The design, composition, operation and management of statistical business processes, including all input and output interactions, will be driven via standard metadata and automated to the maximum degree possible.
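
A small Python sketch may help to show what "metadata driven" can mean in practice. It is an assumption-laden illustration, not a CSPA specification: the metadata fields and the validate function are hypothetical. The point is that the validation logic contains no variable-specific rules; adding or changing a rule means changing the metadata only.

```python
from typing import Any, Dict, List

# Hypothetical variable-level metadata; in practice this would come from a
# standard metadata repository rather than being written inline.
VARIABLES: List[Dict[str, Any]] = [
    {"name": "age", "type": int, "min": 0, "max": 120},
    {"name": "income", "type": float, "min": 0.0, "max": None},
]


def validate(record: Dict[str, Any], variables: List[Dict[str, Any]]) -> List[str]:
    """Check a record against the metadata and return a list of problems."""
    problems = []
    for var in variables:
        name = var["name"]
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if not isinstance(value, var["type"]):
            problems.append(f"{name}: expected {var['type'].__name__}")
            continue
        if var["min"] is not None and value < var["min"]:
            problems.append(f"{name}: below minimum {var['min']}")
        if var["max"] is not None and value > var["max"]:
            problems.append(f"{name}: above maximum {var['max']}")
    return problems


clean = validate({"age": 34, "income": 52000.0}, VARIABLES)
flagged = validate({"age": 150, "income": -1.0}, VARIABLES)
```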

 

64.               Principle: Open standards

 

Statement: Statistical organizations will aim to adopt open, industry recognized, and international standards where available. Statistical industry standards such as the Generic Statistical Business Process Model (GSBPM) and the Generic Statistical Information Model (GSIM) are examples of the standards to be used.

 

65.               Principle: Re-use existing before creating new

 

Statement: Existing data, people skills and knowledge, methods, processes, standards and frameworks, systems and other resources   are leveraged and re-used wherever possible.

 

66.               The Information and Application architecture design aspects of CSPA are directed by the CSPA Business Architecture design principles.

 

 

IV.               Information Architecture

 

67.               Forrester Research provides the following natural language definition of Information Architecture [4]:

 

A framework providing a structured description of an enterprise's information assets — including structured data and unstructured or semi structured content — and the relationship of those assets to business processes, business management, and IT systems.

 

68.               In other words, Information Architecture connects information assets to the business processes that need them and the IT systems that use and manage them. 

 

69.               It includes relating the coherent and consistent definition of information assets at an enterprise level to the information needs of specific business processes and IT systems in practice.  Forrester characterizes this as Information Architecture connecting definition of information on “macro” (enterprise level) and “micro” (practical use for specific business and IT purposes) levels.

 

70.               As an industry architecture, the Information Architecture set out by CSPA must provide an agreed and actionable (rather than purely conceptual) connection between

 

      the common information frameworks and implementation standards agreed within the industry (e.g. GSIM, SDMX, DDI), and

      the practical business goals and needs to be supported under CSPA, such as the ability to share and reuse Statistical Services

 

71.               It must support the needs of:

 

      business leaders, planners and process designers who are seeking to apply the Business Architecture from CSPA and who need to understand the connection between processes and information at a business level

      application architects and developers who are seeking to apply the Application Architecture from CSPA and who need to understand how Statistical Services interact with information

 

A.    Reference frameworks and their use

 

72.               The Information Architecture will identify common reference frameworks to be used for aligning communication and high level (conceptual) designs.

 

      GSBPM will be used as a common reference when recording information in regard to business processes

      GSIM will be used as a common reference when defining the information input to, and output from, business processes

      A common reference framework for recording information in regard to the definition of Statistical Services is being developed as part of CSPA.

      A common reference framework to use when describing statistical methods is a gap at this stage. [5]

 

73.               The completed Information Architecture will not only identify the reference frameworks which apply but also provide guidance on how they are applied, in combination, within CSPA.

 

 

 

 

B.    CSPA implementation specifications

 

74.               A major barrier to effective collaboration within and between statistical organizations has been the lack of common terminology.  Using GSIM as a common language will increase the ability to compare information within and between statistical organizations.  It allows all processes that lead to the production of statistics to be described in one integrated information model.

 

75.               Although GSIM can be used independently, it has been designed to work in conjunction with the GSBPM. It supports GSBPM and covers the whole statistical process. It is assumed in this document that an organization either uses GSBPM or uses another business process model, which can be mapped to GSBPM.

 

76.               In order for interoperability and reuse to be supported in practice when applying CSPA, the industry needs to do more than align conceptual designs using common frameworks. While GSIM is a conceptual framework for describing Statistical Information, when it comes to describing information objects in the real world we need to describe them in terms of standards for representing those objects physically (i.e. in practice) in a manner which is consistent with GSIM.

 

77.               It is necessary to specify how a conceptual design is to be translated into an implementation which is consistent and readily sharable in practice.

 

78.               This “standard” means of operationalizing conceptual designs can be referred to as an implementation specification (in this case, GSIM Implementation). To this end, the following has been agreed:

 

      No firm recommendation has yet been made on implementation specification for business processes. [6]  

      Depending on what information is being represented in practice, DDI and SDMX are expected to provide the primary basis for CSPA implementation specification in regard to statistical information (e.g. data and metadata).

      An implementation specification for Statistical Services is being developed as part of CSPA.

 

79.               There is a need to do more than simply refer to relevant existing standards such as SDMX and DDI.  The CSPA implementation specification for Statistical Services will specify:

 

      whether SDMX, DDI or a custom schema should be used for representing a particular GSIM information object, and

      exactly how the chosen schema will be applied for the particular purpose. In many instances there are multiple technically compliant means of achieving the same business purpose; the implementation specification will specify which should be used.

 

80.               Implementation specifications mean CSPA is prescriptive in regard to some practical details.  While it would be simpler to align with CSPA if it were less prescriptive, the practical value of alignment would be much lower.  It is often the case that two developments which have a “common conceptual basis”, but were implemented using completely unrelated approaches, are difficult and expensive to make interoperable and/or sharable (if it is possible at all).

 

81.               In addition, an organization which has already implemented a different standard, or a local specification, can “map” their existing approach to the relevant implementation specification – they are not required to “rebuild” from first principles.
 

82.               CSPA implementation specifications specify approaches which will support maximum interoperability/sharability on a cost effective basis.  If, in a particular case, an organization decides full compliance will be impossible due to operational constraints then compliance to the extent practical will, in most cases, still realize significant benefits.  In other words, while CSPA implementation specifications set the bar reasonably (but not unreasonably!) high, it is recognized not all implementations may be able to achieve it fully in practice.

 

C.    Information Architecture Principles:
 

83.               A number of principles which are common to most organizations' information architectures (whether formally defined or not) have been agreed. These are outlined below.

 

84.                 Principle:   Manage information as an asset
 

Statement:   Information is an asset that has value to the organization and must be managed accordingly.

 

85.                 Principle:   Manage the information lifecycle


Statement:   All information has a lifecycle and should be managed to provide reliable identification and versioning. All information should be managed independently of, and beyond the scope of, any single service.

 

86.                 Principle:   Protect information appropriately
 

Statement:   All personal, confidential and classified data should be protected and the data should be treated accordingly.  

 

 

 

 

87.                 Principle:   Use agreed models and standards
 

Statement:   All information used as inputs and outputs to Statistical Services should be described using a common, business-oriented, reference model. A single standard should be used to define the encoding of each type of information.  

 

88.                 Principle:   Capture information as early as possible
 

Statement:   Information should be captured in a standard structured manner at the earliest possible point in the statistical business process to ensure it can be used by all subsequent services.  

 

89.                 Principle:   Describe to ensure reuse


Statement:   All information should be described in a manner that ensures information is reusable between services. Reuse is intended to reduce duplication, additional human intervention and errors.


90.                 Principle:   Ensure there is an authoritative source
 

Statement:   Information consumed and produced by services should be sourced and updated from a single authoritative source. Information should be consistent across all relevant services.  

 

91.                 Principle:   Preserve information input into Statistical Services
 

Statement:   Information input into services must be preserved in the service output to ensure no information loss.  

 

92.                 Principle:   Described by metadata
 

Statement:   All information consumed and produced by services must be described by sufficient metadata.  

 

V.               Application Architecture

 

93.               Application Architecture is defined in CSPA as:

 

“the set of practices used to select, define or design software components and their relationships”

 

94.               The CSPA Application Architecture is based on an architectural style called Service Oriented Architecture (SOA). This style focuses on Services (or Statistical Services in this case). A service is a representation of a real world business activity with a specified outcome. It is self-contained and can be reused by a number of business processes (either within or across statistical organizations).

 

95.               Statistical Services are defined and have invokable interfaces that are called to perform business processes. SOA emphasizes the importance of loose coupling. Interactions between Statistical Services are independent, that is, they do not talk directly to each other. Organizations will need a technology solution to support communication between Statistical Services. This infrastructure (for example a communication platform) will not affect the interfaces. It should be noted that SOA is not the same as Web Services, although Web Services are often used to implement an SOA.

 

A.    Statistical Service Definitions, Specifications and Implementation Descriptions

 

96.               The level of reusability promised using an SOA is a direct result of the standardized definition of the service capabilities. CSPA has three layers to the service interfaces. These layers are described in the following paragraphs and Figure 5.

 

Statistical Service Definition

 

97.               The Statistical Service Definition is at a conceptual level. In this document, the capabilities of a Statistical Service are described in terms of the GSBPM sub process that it maps to, the business function that it performs and GSIM information objects which are the inputs and outputs.

 

98.               A template and an example of a Statistical Service Definition can be found in Annex 1.

 

Statistical Service Specification

 

99.               The Statistical Service Specification is at a logical level. In this layer, the capabilities of a Statistical Service are fleshed out into business functions that have GSIM implementation level objects as inputs and outputs. This document also includes metrics, methodologies and non-functional requirements.

 

100.               A template and an example of a Statistical Service Specification can be found in Annex 1.

 

Statistical Service Implementation Description

 

101.               The Statistical Service Implementation Description is at an implementation level. In this layer, the functions of the Statistical Service are refined into detailed operations whose inputs and outputs are GSIM implementation level objects.

 

102.               This layer fully defines the service contract, including communications protocols, by means of the Service Implementation Description. It includes a precise description of all dependencies to the underlying infrastructure and any relevant information about the configuration of the application being wrapped, when applicable.

 

103.               A template of a Statistical Service Implementation Description can be found in Annex 1.

 

 

Figure 5: Service interfaces at different levels of abstraction

 

104.               In general, there will be one Service Specification corresponding to a Service Definition, to ensure that standard data exchange can occur. At the implementation level, services may have different implementations (software dependencies, protocols) reflecting the environment of the supplying organization. Each implementation must implement the data format as specified in the Service Specification.
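The relationship between the three layers can be illustrated with a minimal sketch. It assumes simple record types; the class names, field names and example values (including the "Data Set" input and the SDMX and CSV formats) are hypothetical and are not taken from the Annex 1 templates. The sketch only shows that several implementations may exist for one specification, and that each must keep the data formats the specification fixes.

```python
from dataclasses import dataclass


@dataclass
class ServiceDefinition:
    """Conceptual level: GSBPM sub-process, business function, GSIM inputs/outputs."""
    name: str
    gsbpm_sub_process: str
    business_function: str
    gsim_inputs: tuple
    gsim_outputs: tuple


@dataclass
class ServiceSpecification:
    """Logical level: fixes the implementation-level formats that are exchanged."""
    definition: ServiceDefinition
    input_formats: dict   # information object name -> format (illustrative)
    output_formats: dict


@dataclass
class ServiceImplementationDescription:
    """Implementation level: one per supplying environment."""
    specification: ServiceSpecification
    supplier: str
    protocols: tuple
    input_formats: dict
    output_formats: dict

    def conforms(self) -> bool:
        # An implementation must use the data formats set by the specification.
        return (self.input_formats == self.specification.input_formats
                and self.output_formats == self.specification.output_formats)


# All values below are placeholders for illustration only.
definition = ServiceDefinition(
    name="Error Localization (hypothetical)",
    gsbpm_sub_process="<GSBPM sub-process>",
    business_function="Flag suspicious values",
    gsim_inputs=("Data Set",),
    gsim_outputs=("Data Set",),
)
specification = ServiceSpecification(
    definition, {"Data Set": "SDMX"}, {"Data Set": "SDMX"})
conforming = ServiceImplementationDescription(
    specification, "Organization A", ("REST",),
    {"Data Set": "SDMX"}, {"Data Set": "SDMX"})
divergent = ServiceImplementationDescription(
    specification, "Organization B", ("SOAP",),
    {"Data Set": "CSV"}, {"Data Set": "SDMX"})
```

In this sketch the two implementations differ in protocol, which is permitted, but only the first keeps the specified input format.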

 

105.               There are a number of roles identified in CSPA (see Section VII) who are involved in the definition, specification, and implementation of Statistical Services. Figure 6 illustrates the relationship between these levels.

 

 

Figure 6: Linkages between Statistical Service Definition, Specification, and Implementation

 

B.    Architecture Patterns  

 

106.               In simple terms, architecture patterns describe a re-usable solution to certain classes of problems. They explain how, when and why Statistical Services can be used, as well as the impact of using them in that way. They help a Service Assembler to identify combinations that have been used successfully in the past. Although not identified in this document, there are also anti-patterns which are examples of what should not be done.

 

107.               The benefits of using architecture patterns can be described by using the analogy of an expert chess player. To play chess, you must learn the rules and the principles (for example the value of different pieces). However, to improve and become a really good player, you need to learn the patterns used by more experienced players and apply them to your game. In the same way, you can use the principles and non-functional requirements of the CSPA, but to get the maximum benefit, Service Assemblers should learn the architecture patterns.

 

108.                 The CSPA will incorporate both the Request/Response and Publish/Subscribe patterns.

 

Request/Response pattern

 

109.               The Request/Response pattern for activating services implies a rather fixed routing of messages between services. The integration infrastructural platform implementing the process “orchestrates” the routing and executing of services. This pattern leads to less flexibility and tighter coupling between services than the Publish/Subscribe pattern described below.

 

110.               An example of how this pattern could be applied in relation to collection: each questionnaire is stored in an entity service, which exposes an operation to get the questionnaire through a service call. The indicators are computed, stored and made available through an entity service call.

 

111.               The Request/Response pattern can be used if:

       A functional style and sequential flow is required  

       It is known precisely which service interface should be called  
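The collection example above can be sketched as follows. This is a minimal illustration only: the class `QuestionnaireStore`, the functions `compute_indicator` and `orchestrate`, and the indicator itself (the share of answered questions) are all hypothetical. The point is that the caller knows exactly which operation to call and the route is fixed in sequence.

```python
class QuestionnaireStore:
    """Hypothetical entity service holding completed questionnaires."""

    def __init__(self):
        self._items = {}

    def put(self, qid, answers):
        self._items[qid] = answers

    def get(self, qid):
        # Request/Response: the caller asks for exactly this item and waits.
        return self._items[qid]


def compute_indicator(answers):
    """Hypothetical indicator: share of questions that were answered."""
    answered = sum(1 for value in answers.values() if value is not None)
    return answered / len(answers)


def orchestrate(store, qid):
    # The platform fixes the route: fetch first, then compute, in sequence.
    answers = store.get(qid)
    return compute_indicator(answers)


store = QuestionnaireStore()
store.put("q-001", {"age": 42, "income": None, "region": "north", "size": 3})
indicator = orchestrate(store, "q-001")
```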

 

Publish/Subscribe pattern

 

112.               The Publish/Subscribe pattern could be considered an asynchronous version of the Request/Response pattern.

 

113.               When an event is generated by an event source and sent to the processing middleware, it is not known which functionality is triggered next. In the Request/Response pattern, the concrete service call would have been made, but this is not the case for the Publish/Subscribe pattern. For this reason, the Publish/Subscribe pattern talks about "decoupling" rather than loose coupling.

 

114.               An example of how this pattern could be applied in relation to collection is when each completed questionnaire publishes an event that is available to subscribers downstream. Early indicators can be produced by processing collection events straight through to aggregation.

 

115.               The Publish/Subscribe pattern can be used if:

 

       All recipients that may be interested in the event should be notified  

       It is not exactly known which and how many recipients are interested in the event  

       It is not known how recipients respond to this event  

       Different recipients respond differently to the same event  

       Only one-way communication from the sender to the recipient is possible
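A minimal sketch of the Publish/Subscribe pattern follows. The `EventBus` class stands in for the processing middleware and is hypothetical, as are the topic name and the two subscribers. For brevity the bus delivers events in-process and synchronously; real middleware would normally deliver them asynchronously.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical stand-in for the processing middleware."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher does not know who receives the event, or how many.
        for handler in self._subscribers[topic]:
            handler(event)


bus = EventBus()
completed_count = []
audit_log = []

# Two recipients respond differently to the same event.
bus.subscribe("questionnaire.completed", lambda e: completed_count.append(1))
bus.subscribe("questionnaire.completed", lambda e: audit_log.append(e["id"]))

for qid in ("q-001", "q-002"):
    bus.publish("questionnaire.completed", {"id": qid})

early_indicator = sum(completed_count)
```

Adding a third subscriber would require no change to the publishing code, which is the decoupling the pattern aims for.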

 

C.    Non-Functional Requirements

 

116.               In the context of CSPA, a non-functional requirement is a requirement that relates to the operation of a system. While functional requirements define what the service does (for example, error localization), the non-functional requirements describe a performance characteristic of a system (for example, authorization of who can access the resources and functions of the service). That is, non-functional requirements determine how a service behaves rather than what it should do.

 

117.               It is important to capture non-functional requirements in the design of the services, because they have a significant influence on the software architecture of a service. The implementation of a Statistical Service provides some functional value when assembled into a value chain within an organization.  The non-functional requirements of a Statistical Service address other concerns or behaviors of the service such as performance, security, process metrics and error handling.  This section provides some guidance on these concerns.

 

 

Multilingual Support

 

118.               It is necessary to provide multilingual support to increase the usability of Statistical Services and the capacity to share them. All services must be documented at least in English in addition to the local language(s) of the organization developing the Statistical Service. It is highly recommended that organizations that have translated the documentation of a Statistical Service into additional languages make the translations available to the community.

 

Security

 

119.               For the purpose of this document, the security concern relates to controls that are put in place to mitigate the risk that a Statistical Service or the data it controls is misused.  This section provides some basic guidance on some of these controls. However, in general it is strongly advised that each Statistical Service implementation complete a Risk Assessment and document a Risk Mitigation Plan for high and extreme risks identified in the assessment.

 

Authentication and Authorization

120.               For users interacting with a Statistical Service, the process of identifying who they are (authentication) and working out what resources and functions of the service they can use (authorization) will need to be managed as part of the service.  Given that the security architecture and services are a local organization concern, our goal is to avoid excess complexity in these interactions.

 

121.               Authentication should be accomplished through interaction with the communication platform’s authentication function.

 

122.               Authorization controls are the concern of the service implementation.  Authorization will be controlled by either administrative interfaces on the service or via a GUI-based client for the specific service.

 

123.               After the release of v1.0, future iterations of CSPA will look at options to 'design in' support for single sign on.  Consideration will also be given in future CSPA versions to how the architecture can describe a common approach for the communication platform to pass authorization information along with other service context information.

 

Data at rest

124.               Data at rest is of particular interest when a Statistical Service needs to defer state (see discussion in “Service Statelessness” in section V D).   Under this circumstance, the security (e.g. encryption requirements or access control) of the data are entirely the responsibility of the Statistical Service.  Where a Statistical Service already has a functional dependency on underlying technologies or platforms, it would be reasonable to make use of security functions available in those technologies.

 

Data in transit

125.               Security of data in transit (e.g. contained within a message flow as part of a service invocation) will be considered in future iterations of the CSPA (post version 1.0).

 

Data Sensitivity

126.               Sensitivity of statistical data varies amongst organizations and at this stage the architecture does not attempt to converge on a standard definition or treatment.

 

Machine to machine certification

127.               Guidance for this will come in a future iteration of the CSPA (post version 1.0).  Organization specific implementations based on assembly time infrastructure can assure security for service communication (for example, use of a VLAN).

 

Performance

 

128.               No specific guidance is provided on the performance characteristics. However, they should be declared in the Statistical Service Implementation Description and it is recommended that examples of performance level are included.

 

Process Metrics

 

129.               A Statistical Service will generally capture metrics related to the function that it performs.  To all intents and purposes, these process metrics are treated by the Statistical Service as just one of its outputs and should be reflected as such in the Statistical Service Specification.
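As a minimal sketch of this idea, the function below returns its process metrics alongside its functional output. The function name, the rule used to flag records and the metric names are hypothetical and are chosen only for illustration.

```python
def localize_errors(records):
    """Hypothetical function: flag records with a negative value."""
    flagged = [r for r in records if r["value"] < 0]
    metrics = {"records_in": len(records), "records_flagged": len(flagged)}
    # Metrics are returned with the functional output, as one more output.
    return {"flagged": flagged, "process_metrics": metrics}


result = localize_errors([{"value": 3}, {"value": -1}, {"value": 8}])
```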

 

Error Handling

 

130.               Error handling, in this case, relates to situations where the service fails.  Error handling is left to the communication platform to handle as required.  Generally there will be protocol specific requirements for flagging errors.  The error codes and their meanings need to be documented in the Statistical Service Implementation Description.

 

D.    Implementing Protocols in a Statistical Service

 

Service Statelessness

 

131.               The principles of SOA recognize that there are often cases where Statistical Services need to keep some form of state. The specific SOA principle on Service Statelessness says that ‘Services minimize resource consumption by deferring the management of state information when necessary’.

 

132.               For the CSPA, this means that there are certain situations when the Statistical Service needs to be able to keep information within the Service until a later time. Service Statelessness means that the Statistical Service can keep information within it. However, to provide sufficient performance, the Statistical Service might need to implement the capability to store away the information in some form, so that it can be read back at a later point when the information is needed.

 

133.               When designing and building a Statistical Service, the capability of deferring state information is important in two specific situations:

 

      When the Statistical Service should support an environment where the main communication pattern is Event Messaging.

      When the Statistical Service involves human-interaction and can therefore be considered long-running.

 

134.               The capability of state deferring is not needed in the case where a Statistical Service is created that does not involve human-interaction and when the decision is made not to support Event Messaging.

 

135.               Statistical Services with the capability of deferring state need to provide an endpoint that the communication platform can use to request information about the deferred state. If the Statistical Service is invoked but it lacks some or all of the required information to perform the service invocation, the Statistical Service should handle this using error handling.
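A minimal sketch of state deferral follows, assuming the state is stored as one JSON file per invocation. The class `DeferringService`, its method names and the use of `LookupError` to signal missing information are assumptions made for illustration; CSPA does not prescribe a storage mechanism or an error type.

```python
import json
import tempfile
from pathlib import Path


class DeferringService:
    """Hypothetical long-running service that defers state to disk."""

    def __init__(self, state_dir):
        self._state_dir = Path(state_dir)

    def _path(self, invocation_id):
        return self._state_dir / f"{invocation_id}.json"

    def start(self, invocation_id, partial_input):
        # Store the state away instead of holding it in memory.
        self._path(invocation_id).write_text(json.dumps(partial_input))

    def deferred_state(self, invocation_id):
        # Endpoint the communication platform can call to inspect deferred state.
        path = self._path(invocation_id)
        if not path.exists():
            return None
        return json.loads(path.read_text())

    def resume(self, invocation_id, extra_input):
        state = self.deferred_state(invocation_id)
        if state is None:
            # Required information is missing: report it through error handling.
            raise LookupError(f"no deferred state for {invocation_id}")
        state.update(extra_input)
        self._path(invocation_id).unlink()
        return state


with tempfile.TemporaryDirectory() as tmp:
    deferring = DeferringService(tmp)
    deferring.start("inv-1", {"unit": "A17"})
    pending = deferring.deferred_state("inv-1")
    completed = deferring.resume("inv-1", {"reviewed_by": "clerk"})
    leftover = deferring.deferred_state("inv-1")
```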

 

Event Messaging and Request Response

 

136.               Communication to and from a Statistical Service could be done using several communication patterns. The two main patterns of communication that CSPA supports are Request Response and Event Messaging. These patterns differ mainly in when the information is being transferred.

 

137.               In the Request Response communication pattern, information is requested whenever a Statistical Service needs it. The information need could be triggered by either a human-interaction with a Statistical Service or when an automated Statistical Service is called from a communication platform.

 

138.               In Event Messaging, information is transferred as it is created. This means a Statistical Service does not need to request information as information is provided to all relevant Statistical Services at the time of creation.

 

139.               The decision as to whether a Statistical Service is built to support both of these communication patterns depends on the environments in which the Statistical Service will work. Supporting both communication patterns is preferred as this allows the Statistical Service to function both in Statistical Organizations using Request Response as the main pattern and in Statistical Organizations using Event Messaging as the main pattern.

 

140.               From the point of view of a Statistical Service, there is no difference between the Request-Response and Event Messaging communication patterns, as the provision of information is done in the same manner. When a communication platform supplies the information that the Statistical Service needs, it does so by calling the relevant endpoint provided by the Statistical Service.

 

141.               To support a Request Response communication pattern, the Statistical Service must provide some endpoint where an information request can be made. One possibility is that the Statistical Service is required to have an endpoint that can be supplied with parameters. These parameters would describe what information should be part of the response. For example, the parameters could include being able to specify a timeframe or a context.
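A parameterized request endpoint of this kind could look like the sketch below. The function `get_information`, its parameter names and the sample records are hypothetical; the sketch only shows parameters for a timeframe and a context selecting what is included in the response.

```python
from datetime import date

# Illustrative records only; the values carry no meaning.
RECORDS = [
    {"period": date(2013, 1, 1), "context": "retail", "value": 10},
    {"period": date(2013, 2, 1), "context": "retail", "value": 12},
    {"period": date(2013, 2, 1), "context": "housing", "value": 7},
]


def get_information(start, end, context=None):
    """Hypothetical request endpoint: parameters select what is returned."""
    return [
        r for r in RECORDS
        if start <= r["period"] <= end
        and (context is None or r["context"] == context)
    ]


response = get_information(date(2013, 2, 1), date(2013, 2, 28), context="retail")
everything_in_february = get_information(date(2013, 2, 1), date(2013, 2, 28))
```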

 

142.               To support Event Messaging, the Statistical Service needs to be able to push information out from the Statistical Service when it is created. This could be done using a configurable endpoint or message queue provided by the communication platform. Configuring the endpoint should be done by the Service Assembler.
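The push behavior can be sketched as follows, assuming the communication platform provides a queue. The class `PublishingService` and its methods are hypothetical, and Python's in-process `queue.Queue` stands in for a real message queue.

```python
import queue


class PublishingService:
    """Hypothetical service that pushes each output as soon as it exists."""

    def __init__(self):
        self._sink = None

    def configure_output(self, sink):
        # Set by the Service Assembler; the service does not choose the target.
        self._sink = sink

    def process(self, record):
        output = {"id": record["id"], "status": "processed"}
        if self._sink is not None:
            self._sink(output)
        return output


platform_queue = queue.Queue()
publisher = PublishingService()
publisher.configure_output(platform_queue.put)
publisher.process({"id": "r-1"})
delivered = platform_queue.get_nowait()
```

Because the target is injected through configuration, the same service could be assembled in another organization against a different queue without changing the service itself.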

 

How to invoke a service

 

143.               A protocol is the technical implementation of a communication mechanism. It is used to invoke the Statistical Services deployed in a CSPA implementation. 

 

144.               A Statistical Service Implementation Description must specify one or more protocols. These protocols are associated with the following aspects of a Statistical Service:

 

1.      making the Statistical Service reachable as an endpoint for invocation

2.      accessing data that is declared to be passed by reference in the Statistical Service Specification

 

145.               In the following, we provide a list of protocols that are accepted within Service implementations conforming to the CSPA specification. Protocols tagged as “recommended” should be considered in the first place because they represent established industry standards and as such they are likely to be supported in most organizations. Protocols that are accepted yet not recommended are listed for supporting legacy requirements of some organizations.

 

Protocols for invoking service endpoints

 

146.               The protocols for invoking service endpoints which are recommended by the CSPA are:

 

      SOAP Web Services - Service exposes a WSDL interface and is addressed by an HTTP URI

      REST Web Services - Service exposes a REST interface and is addressed by an HTTP URI

 

147.               There are also a number of other protocols which are acceptable. These are:

 

      Microsoft Message Queue - Service is a MSMQ consumer

      Java Messaging Service - Service is a JMS consumer

      File-based invocation – the service is “invoked” when a file is placed at a known location which results in an OS-level trigger to the service; alternatively, the service can poll the location for arrival of “message” files and treat them as service invocations

      Command line interface - Service is invoked by specifying a command line to be executed on operating system runtime accessible by the platform.
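The polling variant of file-based invocation can be sketched as below (the OS-level trigger variant is not shown). The `.msg` file extension, the `.done` renaming convention and the function name `poll_once` are assumptions made for illustration only.

```python
import tempfile
from pathlib import Path


def poll_once(inbox, handler):
    """Treat every *.msg file in `inbox` as one service invocation."""
    handled = []
    for path in sorted(Path(inbox).glob("*.msg")):
        handler(path.read_text())
        # Rename so the same file is not treated as a new invocation.
        path.rename(path.with_name(path.name + ".done"))
        handled.append(path.name)
    return handled


received = []
with tempfile.TemporaryDirectory() as inbox:
    Path(inbox, "001.msg").write_text("run edit check")
    first = poll_once(inbox, received.append)
    second = poll_once(inbox, received.append)
```

A deployed service would call `poll_once` on a timer; the second call above shows that a file already handled is not invoked again.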

 

148.               The CSPA recognizes that other protocols may be adopted in future. However, these will require further exploration:

 

      Possible use of the Stream Control Transmission Protocol (SCTP), an efficient, message-oriented transport protocol with guaranteed delivery

 

149.               In some instances, existing tools support database access. If the database is involved in transfer (and not merely as a local state storage for the service), we recommend that the database access be mediated through the http: protocol access above.

 

150.               In general, the use of an “out of band” data transfer mechanism should be avoided wherever possible, and used only in circumstances involving the need to transfer large volumes of data. It increases the coupling between the architecture and services, so its use must be managed carefully.

 

Protocols for passing data by reference

 

151.               A data reference must guarantee, in the context of a specific protocol, that it can uniquely identify a dataset, file, etc.  Passing data by reference requires the communication platform to support the resolution of an identifier through the specific protocol it refers to.

 

      FTP: Data is identified by a string specifying a FTP server URL, a full path and, optionally, server authentication information

 

      SCTP: …

 

      HTTP: Data is obtained from the reply of an HTTP GET request. This option does not include the case where the HTTP request is related to a SOAP request message (i.e. the GET body message does not incorporate a SOAP message). This option includes the case where the response is generated by a REST web service.

 

      SMB/NFS: Data is identified by a path on an SMB or NFS network file system

 

      File System: Data is identified by a path on the local file system accessible by the platform
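One way a communication platform might choose how to resolve a data reference is to dispatch on the URI scheme, as in the sketch below. The resolver names are hypothetical placeholders for whatever clients the platform provides. The sketch assumes that SMB/NFS references arrive as mounted file-system paths, and it omits SCTP because no reference format is given for it.

```python
from urllib.parse import urlparse

# Hypothetical resolver names; each would wrap the platform's real client.
RESOLVERS = {
    "ftp": "ftp_resolver",
    "http": "http_get_resolver",
    "file": "file_system_resolver",
}


def select_resolver(reference):
    """Pick the resolver for a data reference from its URI scheme."""
    scheme = urlparse(reference).scheme
    if scheme not in RESOLVERS:
        raise ValueError(f"unsupported protocol in reference: {reference!r}")
    return RESOLVERS[scheme]


ftp_choice = select_resolver("ftp://example.org/data/survey.csv")
file_choice = select_resolver("file:///srv/data/survey.csv")
```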

 

 

Figure 7: Managing large data volume transfers using “pass by reference”

 

Managing large data volumes

 

152.               As a general rule, service invocation will involve the Statistical Service receiving a message via the organization-provided communication platform. This message will contain the necessary information objects as well as the requested service.

 

153.               In certain circumstances, the service requires large data sets as inputs. Examples of this could include administrative data files or large survey response files. The problem is similar to a “pass by value” situation in that the input data is passed to the service via in-message approaches.

 

Problem

154.               There are a number of problems that can arise if we attempt to send these data sets via the messaging interface:

 

      Dataset transfer time can be slow due to messaging overhead (packing / unpacking of data, message segmentation and reassembly, etc.)

      Communications platform performance may degrade due to the load of transporting the messages between services

      Service memory requirements can increase before required use (see the State Deferral discussion)

 

Solution

155.               In order to address this problem, we provide a “pass by reference” mechanism that avoids the need to use the communication platform messaging layer to transport these large data sets.

 

156.               The approach is as follows:

 

      The data set being sent to the service is stored in a source location in a manner local to each organization – the location name is associated with a Uniform Resource Identifier (URI)

      The service consumer invokes the requested service by sending it a message containing the URI for the dataset

      The service provider receives the URI reference and when ready attempts to retrieve the dataset from repository or cache. If successful, it executes its service actions

      Upon completion, it may update or place a resulting dataset (if relevant) in the repository or cache
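
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the message layout (a dictionary with "input_uri" and "output_uri" keys), the function names and the use of file URIs in a temporary directory as the "repository" are all hypothetical and are not prescribed by CSPA.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname, urlopen


def uppercase_service(message: dict) -> None:
    """Hypothetical service provider: fetch the input by URI, store the result by URI."""
    # The data set is retrieved only when the service is ready to use it.
    with urlopen(message["input_uri"]) as source:
        data = source.read()
    # Place the resulting data set at the location named in the message.
    result_path = Path(url2pathname(urlparse(message["output_uri"]).path))
    result_path.write_bytes(data.upper())


def run_demo() -> bytes:
    """Service consumer: store the data set, then send only its URIs."""
    with tempfile.TemporaryDirectory() as repository:
        source = Path(repository) / "responses.txt"
        target = Path(repository) / "responses_out.txt"
        source.write_bytes(b"a1\nb2\n")
        # The message carries references, never the data itself.
        message = {"input_uri": source.as_uri(), "output_uri": target.as_uri()}
        uppercase_service(message)
        return target.read_bytes()
```

The message stays small regardless of the size of the data set, which is the point of the pattern; only the service that needs the data pays the cost of transferring it.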

 

157.               The implementation of a data source is local to each organization and may be implemented as part of the communications platform. Organizations may choose to implement a utility service, a repository, a file cache, or some other mechanism. URI management is also a part of local operation.

 

158.               The CSPA provides the following guidance for service input dataset retrieval protocols.

 

Recommended protocols:

      Simple http: file transfer from data source to the service logic (without additional protocols such as REST) 

 

Acceptable protocols:

      ftp: file transfer from the data source to the service logic

      Use of network file system services (such as SMB, NFS) with appropriate file reference

 

Not Recommended:

      Database retrieval using queries

 

E.    Application Design Principles

 

159.               The design principles have been selected to maximize the flexibility of the Statistical Services wrapped or developed in the context of the CSPA.  The flexibility of the Statistical Service directly impacts the level of reuse, the flexibility required of the industry vision and the ease with which a statistical organization can implement a Statistical Service.

 

160.               Principle: Use available standards

 

Statement: The design of Statistical Services should align with, and harness, relevant existing standards and frameworks wherever possible. 

 

161.               Principle: Use architecture patterns


Statement: Follow both request/response and publish/subscribe architecture patterns depending on best fit to requirements.

 

162.               Principle: Implement using GSIM


Statement: Manage standardized service contracts based on GSIM objects.

 

163.               Principle: Coupling


Statement: Enable services to be loosely coupled externally and be aware of internal coupling.

 

164.               Principle: Service Autonomy


Statement: Maximize service autonomy (completeness) to enable share-ability and reusability (External & Internal).

 

165.               Principle: Non-functional requirements


Statement: Non-functional requirements form a key input in design decisions.

 

166.               Principle: Independence between design and implementation

 

Statement: The descriptions of Statistical Services are layered in conceptual (Statistical Service Definition), logical (Statistical Service Specification) and implementation (Statistical Service Implementation Description).

 

VI.               Technology Architecture

 

167.               Technology Architecture is defined in CSPA as:

 

“The description of the infrastructure technology that underlies (supports) the business and application layers.”

 

168.               Within each statistical organization, there needs to be an infrastructural environment in which the generic services can be combined and configured to run as elements of organization-specific processes. This environment is not part of the CSPA. The CSPA assumes that each statistical organization has such an environment and makes statements about the characteristics and capabilities that such a platform must have in order to be able to accept and run Statistical Services that comply with CSPA.

 

169.               Platform for Service Communication:   A communication platform provides the capability for communication between Statistical Services. It enables inter-service communication while allowing Statistical Services to remain autonomous and adds additional capabilities for monitoring and orchestrating the information flow. To assemble a built Statistical Service, the communication platform is updated to integrate with new services. There are multiple ways of establishing a communication platform. Examples of architectural components could be BPMS, ESB, Workflow Engines, Orchestration Engines, Message Queuing and Routing.
 
170.               Platform for Configuring and Controlling Services and Processes: The Platform for Controlling Service and Process execution encompasses the functionalities and tools to support the management and maintenance of services metadata, artifacts and policies. Examples of how this mechanism could be achieved include Business Process Modelling System, Lifecycle Management, Service Monitoring and Management.

   
171.               Platform for Reporting on Services and Processes: The Platform for Reporting is responsible for enabling real-time monitoring and near-real-time presentation of user-defined business key performance indicators (KPIs). Examples of how this mechanism could be achieved are a static dashboard or Business Activity Monitoring (which also generates alerts and notifications to users when these KPIs cross specified thresholds).

 

A.    Communication Platform

 

172.               The CSPA provides guidance on the way that organizations should go about building new or wrapping existing Statistical Services.   When the time comes for an organization to use a Statistical Service that conforms to the CSPA there are some organization specific technology approaches that also need consideration.  

 

173.               CSPA does not specify how organizations will coordinate the use of Statistical Services to implement a wider business process.   Organizations will need a technology solution to support communication between Statistical Services since the Statistical Services are not to talk directly to each other.

 

174.               Where the Statistical Service being used is largely independent, the interfaces between that Statistical Service and others may be managed manually by a person. There may also be other relatively trivial uses of Statistical Services where a bespoke solution to integrate them is developed. These are sub-optimal yet pragmatic ways of achieving reuse of Statistical Services.

 

175.               Where the integration of Statistical Services is non-trivial a communications platform of some sort will usually be required.   The key functions of the communication platform are:

 

      Orchestration - managing the sequence of flow of invocations of the Statistical Services;

      Error handling - where Statistical Services fail or where the output of services contains erroneous cases that require a different treatment;

      Message payload translation - in particular where a Statistical Service does not support standard GSIM implementation objects (it is possible to offload this function to a specialized Statistical Service);

      Auditing, Logging, Activity Monitoring;

      Performance Management;

      Security
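
As a rough illustration of the first, second and fourth functions listed above (orchestration, error handling and logging), the sketch below shows a very rudimentary platform written in Python. It is a toy under stated assumptions, not a recommended implementation: the orchestrate function, the message dictionaries and the two example services are hypothetical, and real platforms (ESB, BPMS, workflow engines) provide far more.

```python
import logging
from typing import Callable

log = logging.getLogger("communication_platform")
Service = Callable[[dict], dict]


def orchestrate(steps: list[tuple[str, Service]], message: dict) -> dict:
    """Invoke services in sequence; the services never call each other."""
    for name, service in steps:
        log.info("invoking %s", name)             # auditing and logging
        try:
            message = service(message)            # orchestration
        except Exception as error:                # error handling
            log.error("%s failed: %s", name, error)
            return {"status": "failed", "failed_step": name}
    return {"status": "ok", **message}


def edit(message: dict) -> dict:
    """Toy Edit service: trims whitespace from each record."""
    return {"records": [record.strip() for record in message["records"]]}


def coding(message: dict) -> dict:
    """Toy Coding service: looks each record up in a made-up code table."""
    codes = {"a": "1", "b": "2"}
    return {"records": [codes[record] for record in message["records"]]}


# The sequence of invocations lives in the platform, not in the services.
PROCESS = [("edit", edit), ("coding", coding)]
```

Calling orchestrate(PROCESS, {"records": [" a ", "b"]}) runs both services in order, while a record that the toy Coding service cannot code is reported as a failure of the "coding" step rather than propagating to the caller.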

 

176.               Figure 8 illustrates the relationship between the elements that are specified in the CSPA and the underlying Communications Platform that is local to an Organization.

 

 

Figure 8: Statistical Service Components and Communication Platforms

 

177.               In this diagram, two Statistical Services (Edit, Coding 1) have been defined and specified in compliance with the CSPA, and their implementations are communicating with each other within the environment of a statistical organization. The Statistical Service instances communicate with each other through the organization’s communication platform – this may be a full SOA implementation (bus or broker), a CORE implementation, or some other more rudimentary platform (or no platform at all).

 

178.               It is important to state that CSPA does not prescribe the capabilities and architecture of the underlying Communications Platform – it instead assumes that an organization’s Assemblers and Configurers will be responsible for addressing how the platform supports the use of CSPA-compliant Statistical Services. This allows the CSPA and its Statistical Services to be used by the widest possible community amongst statistical organizations, all of whom may be in different stages of development and modernization.

 

179.               In the diagram one can see that there is a second Coding Statistical Service (Coding 2) that doesn’t (yet) implement the complete Statistical Service Specification due to a transitional state. An organization may optionally make use of some form of translation service to address differences between their interfaces (typically at the information encoding level). This is seen as a transitional state – the goal is to ensure that all services adhere to the Statistical Service Specification, while allowing for differences at the Statistical Service implementation and protocol levels (and in the underlying platforms).

 

 

VII.               Roles

 

180.               Using CSPA will create new functional roles within a statistical organization. These roles may already exist in some statistical organizations and may have particular people (for example, Chief Information Officer, Enterprise Architect) assigned to them. CSPA makes no recommendation about who in an organization should play these roles; each statistical organization decides who performs each role. This section and Figure 9 explain the functional roles in the form of a user story.

 

 

Figure 9: Roles in CSPA

 

Investor Story

181.               The Investor receives demand for a new data collection and needs to compare the cost of running a collection using traditional methods, with the cost of using a set of components as per CSPA. The Investor identifies existing Statistical Services to be used and identifies gaps where no Statistical Service already exists. The Investor weighs up creating a fully bespoke processing solution for the collection against having to build a new Statistical Service that fits into a set of existing Services. This would be done in consultation with other roles.

 

182.               One example would be a case where the government legislates the requirement that some data collections must support people completing their reporting obligations on-line using eForms. The Investor reviews the impact of this change and identifies that there is a gap in the Services required that supports rendering the design of an eForm for respondents to use. The Investor assesses the relative cost and how quickly the new collection capability can be implemented.

 

183.               To assess the cost the Investor needs to review what existing Statistical Services are required to implement the new statistical collection, review the Statistical Service catalogue and identify that an eForm capability is missing from the current catalogue. The requirements for the collection, e.g. the requirement to perform an early release of data after 30 days, could cause some functions to be run in a repeated fashion using coding and editing Services. Once decisions as to how the data collection is to be run are made, these are conveyed to the Designer for more detailed work.

 

Technical Specialist Roles

 

Designer Story

184.               The Designer has been given a set of business requirements at a high level from the Investor, describing what data is needed, and some parameters of the process. In order to determine what functionality is available, the Designer will consider internal and external capabilities. This will involve a search in the Statistical Services Catalogue, to determine if there are external candidates or catalogued internal candidates. When a possible internal candidate is found, there will be a decision made as to whether the existing functionality should be wrapped and exposed as a Statistical Service, or whether a new Statistical Service should be built. In the latter case, potential collaborators should be identified and negotiated with the Investor. There may not be internal existing functionality. This assumes an up-to-date Statistical Services Catalogue which is known and useable. Drivers would include lower cost, synergy with existing applications and other organizations, and a more generalized approach which will realize greater efficiencies.

 

185.               Once development has been decided, or in the case where existing functionality must be heavily modified, for each needed Statistical Service the Designer will specify the needed functionality to meet requirements. The Statistical Service is defined on a conceptual and logical level by a Service Definition using GSIM information objects and a Service Specification using GSIM implementation objects as presented in Figure 5.

 

186.               On an implementation level, decisions must be made about how to realize needed functionality using technology approaches and these are documented in a Statistical Service Implementation Description by the Builder.

 

187.               Service design across these levels includes information design, technology design, work-flow, interface design, and other relevant aspects. The Designer will address Service dependencies, Service contracts, extensibility, and all data, metadata and performance metrics. Capabilities for configuration by users will also be determined, as well as the degree of configuration to be implemented by the Builder.

 

188.               An alternative scenario is one where Statistical Services are already available, having been found in the Statistical Services Catalogue, and meeting all identified requirements. In this case, the Designer specifies which Statistical Services are to be used, and specifies this for the Assembler to work with directly.

 

Service Builder Story

189.               The Service Builder receives a Statistical Service Definition and a Statistical Service Specification from the Designer. The Statistical Service is then implemented by the Builder, who creates a Statistical Service Implementation Description. The Builder will also implement features of the Service so that the Assembler can integrate it into the local environment so it can be deployed. The Builder tests the components to support the specified functionality.

 

Service Assembler Story

190.               The Assembler will take the Statistical Service and integrate it according to the understanding of the needed business process, as expressed in the design documentation. This might take place within a process management environment. There are two cases for the Assembler: one where the required Statistical Services would be entirely assembled within the local environment, which provides a high degree of confidence in their compatibility. The second scenario involves the use of external Statistical Services, which might require extension or modification. In this latter case, issues would be communicated to the Designer and Builder for further development.

 

Non-Technical Specialist Roles

 

Configurer Story

191.               The Configurer takes the assembled process, and makes it suitable for use in the intended statistical domain. Parameters are specified according to the domain knowledge of the Configurer. Any issues with the assembled Service are communicated to the Designer, Builder, or Assembler.

 

User Story

192.               There is no single user but a chain of users along the statistical process. The user chain covers everyone from the designers of surveys, through the conduct of data collection operations, through to those who process the collected data. The User does not need to know where the data and metadata are stored - in particular the User does not need to actively manage how data flows between parts of the processing environment.

 

193.               Once the survey has been designed and administered, the next User is alerted of the arrival of survey responses, including some data which will be auto-coded, and other data which needs manual coding. Once coded, the data would go through a series of edits, including data cleaning and validation, imputation, confidentialization, etc. The User should be able to perform these operations without experiencing a high degree of frustration or complexity.

 

 

VIII.               Enablers

 

A.    Catalogues

 

194.               A primary aim of CSPA is to support efficient sharing and reuse of process patterns, information and services at an organization and international level.

 

195.               One key requisite in achieving this goal is an ability to reliably and efficiently discover what is available for reuse to support a particular business need.  This includes an ability to efficiently assess whether a potentially reusable artefact is, in fact, “fit for purpose” in practice when it comes to supporting that particular business need.

 

196.               Catalogues of reusable resources have a key role within CSPA. They provide lists and descriptions of standardized artefacts, and, where relevant, information on how to obtain and use them. The catalogues can be at many levels, from global to local. For example, it is envisaged that each statistical organization will have catalogues of processes, information objects and Statistical Services.

 

197.               However, for the purposes of this project, it is the global level that is the primary interest. The global catalogue is called the Global Artefact Catalogue. The Catalogue will provide information about resources and potential collaboration partners, helping to ensure that the modified component conforms to the requirements of the CSPA. Governance and support mechanisms and processes will need to be defined to ensure the continued relevance and utility of the catalogues.

 

198.               The catalogues should support the lifecycle management, governance and use of components and services, providing the right level of artefacts for service design, integration and transition to a production environment.

 

199.               The catalogues are not designed to be platform specific, and will not necessarily hold copies of executable code. They will, however, provide all necessary information about how to access the artefacts, including contact details for further information. They should provide sufficient contents to support the CSPA.

 

200.               The Global Artefact Catalogue will also contain project plans which will include information about developments that are planned or in progress. This information will facilitate the creation of a global roadmap of statistical organizations' developments that can be consulted to see which organizations are developing what and when.

 

201.               The types of artefacts in Catalogues are:

 

       Frameworks include artefacts such as the GSBPM and the GSIM

       Standards include DDI and SDMX

       Policies supporting governance and use, for example "no changes to GSBPM or GSIM unless they are the result of a formal approval process"

       Guidelines concern the practical use and integration of artefacts. For example the functionality of Statistical Services (especially interfaces) and descriptions of how to ‘plug’ a component in

       Other knowledge assets could include architectural principles and information on how to use the catalogues

       Project plans about developments, in progress or planned, to promote collaboration at the earliest possible stage in the development of artefacts, e.g. "Statistics New Zealand aim to develop and make available component X by (date)". This will provide the basis for a global roadmap.

 

202.               The Information Architecture will guide the approach to defining, populating and using catalogues.  This will include identifying and reusing appropriate reference frameworks related to registries.  (These reference frameworks, unlike GSIM, are not specific to statistical production.)  

 

B.                  Governance

 

203.               The exact details of the governance of CSPA are still being discussed. In the following paragraphs, the current thinking is outlined in terms of the groups involved and the role they will play.

 

High-Level Group for the Modernization of Statistical Production and Services (HLG)

 

204.               The CSPA is being developed as a project overseen by HLG. This group will be the custodians of the outputs. Whilst the architecture itself will be “owned” by the international official statistics community, it will be administered by the HLG, as top-level representatives of that community.

 

HLG Executive Board

 

205.               The HLG Executive Board will be responsible for the strategic management of on-going HLG projects. It will also prepare new project proposals for agreement by the HLG, and seek support and resources from interested organizations. Following the completion of a project, the board will monitor, and where necessary, coordinate the implementation or use of the project results.

 

206.               In particular regard to CSPA, this group will take a strategic view on where investments into Statistical Services could be made internationally. For example, they may decide that the statistical industry has a particular gap around a capability or activity such as confidentiality. The group would look at what international collaboration could be undertaken to fill this gap. The Modernization Committee on Production and Methods will provide input into these discussions.

 

Modernization Committee on Production and Methods

 

207.               The UNECE, on behalf of the international statistical community, will provide leadership for maintaining and extending CSPA to retain its relevance and value as an 'industry asset' through the Modernization Committees.

 

208.               Creating a CSPA is only the first part of the story. It is likely that the architecture, or elements of it, will need to be revised and refined from time to time to ensure continued relevance. As the architecture will be a common asset for the international official statistics community, there will need to be a clear and transparent process for change management.

 

209.               For practical purposes, proposals for changes will usually be discussed in the Modernization Committee or a specific task team under the Modernization Committee. These groups will then formulate recommendations for change, which will typically be bundled together as a package to be formally signed off.

 

210.               The HLG, Executive Board and the Modernization Committee will need to carefully balance the advantages of change, in terms of increasing relevance and usefulness, against the costs of having to implement those changes within statistical organizations. A reasonable degree of stability over time is therefore a key requirement for the architectural framework.

 

Architecture Review Board

 

211.               It is proposed that a task team is set up under the umbrella of the Modernization Committee on Production and Methods. The members of this task team will be drawn from a range of backgrounds including IT, methodology and standards.

 

212.               This group will be available to provide advice to statistical organizations who are building Statistical Services. This may take the form of reviewing the GSIM implementation that is being proposed, interpreting how to apply CSPA to a particular situation, amongst other things.

 

213.               When a statistical organization wishes to add a Statistical Service to the Global Artefact Catalogue, they will be asked to complete a self-assessment of how their Statistical Service aligns with CSPA. Then the review team will assess the service and add it to the catalogue if appropriate.

 

214.               Figure 10 summarizes these groups and roles.

Figure 10: A proposal for CSPA Governance structure

Annex 1: Templates

 

Statistical Service Definition

 

Template

 

Name

 

Level

 

GSBPM

 

Business Function

 

Outcomes

 

Restrictions

 

GSIM Inputs

 

GSIM Outputs

 

Service dependencies

 

 

Example

 

Name: Auto coding

Level: Definition

GSBPM: 5.2 Classify & Code

Business Function: This Statistical Service maps a field to a classification code

Outcomes: This results in a transformed data set that is coherent with the target classification scheme

Restrictions: None

GSIM Inputs: Unit data set, unit data structure, processing activity, classifications, codelist, Rules

GSIM Outputs: Unit data set, unit data structure, number of failed (uncoded) fields

Service dependencies:
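
For organizations that keep such definitions in a machine-readable catalogue, the template fields could be held in a simple record. The Python sketch below is only an illustration: the class name, the attribute names and the choice of a dataclass are assumptions, and only the field values come from the auto coding example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatisticalServiceDefinition:
    """One attribute per field of the Statistical Service Definition template."""
    name: str
    level: str
    gsbpm: str
    business_function: str
    outcomes: str
    restrictions: str = "None"
    gsim_inputs: tuple[str, ...] = ()
    gsim_outputs: tuple[str, ...] = ()
    service_dependencies: tuple[str, ...] = ()


# Values copied from the auto coding example; dependencies are left empty
# because the example does not list any.
auto_coding = StatisticalServiceDefinition(
    name="Auto coding",
    level="Definition",
    gsbpm="5.2 Classify & Code",
    business_function="This Statistical Service maps a field to a classification code",
    outcomes="This results in a transformed data set that is coherent "
             "with the target classification scheme",
    gsim_inputs=("Unit data set", "unit data structure", "processing activity",
                 "classifications", "codelist", "Rules"),
    gsim_outputs=("Unit data set", "unit data structure",
                  "number of failed (uncoded) fields"),
)
```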

 

 


Statistical Service Specification

 

Template

 

Statistical Service Specification: Name of Statistical Service

 

Protocol for Invoking the Service

 

This service is invoked by calling a function called “Name of Statistical Service”.

 

Describe any parameters

 

The protocol used to invoke this function should be in compliance with the guidance provided for developing Statistical Service by CSPA.

 

Input Messages

 

In GSIM terms, the inputs to this service are ……

Describe specific inputs in terms to GSIM implementation

 

Output Message

 

The outputs of the service are …… 

Describe specific outputs in terms to GSIM implementation

 

Example

Statistical Service Specification: Autocoding

 

Protocol for Invoking the Service

 

This service is invoked by calling a function called "CodeDataset". It takes the following seven parameters (all of them are expressed as URIs, i.e. all data is passed by reference):

 

1)           Location of the codelist;

2)           Location of the input dataset;

3)           Location of the structure file describing the input dataset

4)           Location of the mapping file describing which variables in the input dataset to be used

5)           Location of the output dataset generated by the service

6)           Location of the structure file describing the output dataset generated by the service

7)           Location of the process metrics file generated by the service.

 

All parameters are required.  
 

The protocol used to invoke this function is SOAP, and is in compliance with the guidance provided for developing Statistical Service by CSPA.
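
To make the parameter list concrete, the Python sketch below models the seven parameters as one request object and checks the two rules stated above: every parameter is required, and every parameter is a URI. It does not show the SOAP call itself. The class name, attribute names and the example host are hypothetical; only the order and meaning of the parameters come from the specification.

```python
from dataclasses import dataclass, fields
from urllib.parse import urlparse


@dataclass(frozen=True)
class CodeDatasetRequest:
    """The seven CodeDataset parameters, in the order listed above."""
    codelist: str            # 1) input
    input_dataset: str       # 2) input
    input_structure: str     # 3) input
    mapping: str             # 4) input
    output_dataset: str      # 5) written by the service
    output_structure: str    # 6) written by the service
    process_metrics: str     # 7) written by the service

    def validate(self) -> None:
        """Raise ValueError if a parameter is missing or is not a URI."""
        for parameter in fields(self):
            value = getattr(self, parameter.name)
            # A URI must at least carry a scheme such as http, ftp or file.
            if not value or not urlparse(value).scheme:
                raise ValueError(f"{parameter.name} must be a URI, got {value!r}")


# Hypothetical locations; "repository.example" is a placeholder host.
BASE = "http://repository.example/autocoding/"
request = CodeDatasetRequest(
    codelist=BASE + "codelist.xml",
    input_dataset=BASE + "input.txt",
    input_structure=BASE + "input_structure.xml",
    mapping=BASE + "mapping.xml",
    output_dataset=BASE + "output.txt",
    output_structure=BASE + "output_structure.xml",
    process_metrics=BASE + "metrics.xml",
)
request.validate()
```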

 

Input Messages

 

The first four parameters for the service refer to input files.
In GSIM terms, the inputs to this service are:

 

1) a NodeSet consisting of Nodes, which bring together CategoryItems, CodeItems, and other Designations (synonyms).

2) a Unit data set – the texts to be coded for a particular variable

3) a Data structure, describing the structure of the Unit data set

4) a set of Rules, describing which variables the service should use for which purpose.

 

1) The codelist to be passed in must be expressed as a DDI 3.1 instance, using the following structure. The table below shows the mapping of the conceptual GSIM objects to their encoding in DDI 3.1:

 

DDI 3.1 Element | GSIM Object
DDIInstance (@id, @agency, @version) | Processing Activity
  ResourcePackage (@id, @agency, @version) | [No conceptual object]
      Purpose (@id) | [No conceptual object]
      Logical Product (@id, @agency, @version) | [No conceptual object]
          CategoryScheme (@id, @agency, @version) | CategorySet
              Category (@id, @version) | CategoryItem
                  CategoryName | CategoryItem/Name
                  Label | Designation
          CodeScheme (@id, @agency, @version) | CodeSet
              Code | CodeItem
                  CategoryReference | [Correspondence with CategoryItem in GSIM]
                      Scheme | [Implementation Specific]
                      IdentifyingAgency | [Implementation Specific]
                      ID | [Implementation Specific]
                      Version | [Implementation Specific]
                  Value | CodeValue

 

For the sake of simplicity, it is assumed that the file contains only one CategoryScheme, that all Codes refer to Categories in the CategoryScheme, and that there is only one CodeScheme.

 

2) The unit data set is a fixed-width ASCII file containing at least a case ID (50 characters maximum) and a variable containing text strings to be coded. Each entry should be on a single line. The corresponding GSIM objects:

Data File | GSIM Object
Unit data set | Unit Data Set
Case ID | Unit Identifier Component
Text string | Attribute Component
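
A reader for this file could look like the Python sketch below. The column layout is an assumption made for illustration: the case ID is taken to occupy the first 50 columns (padded with spaces) and the text to be coded is taken to be the rest of the line. In practice the widths would come from the structure file described under input 3, and the function and constant names are hypothetical.

```python
ID_WIDTH = 50  # assumed width of the case ID column (50 characters maximum)


def parse_unit_data(lines):
    """Return (case_id, text) pairs from fixed-width lines, one entry per line."""
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore empty lines
        case_id = line[:ID_WIDTH].strip()
        text = line[ID_WIDTH:].strip()
        records.append((case_id, text))
    return records


# Two made-up entries in the assumed layout.
sample = [
    "case-001".ljust(ID_WIDTH) + "primary school teacher\n",
    "case-002".ljust(ID_WIDTH) + "bus driver\n",
]
records = parse_unit_data(sample)
```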


3) The structure of the unit data set must be expressed as a DDI 3.1 instance, using the following structure. The table below shows the mapping of the conceptual GSIM objects to their encoding in DDI 3.1:

DDI 3.1 Element | GSIM Object
DDIInstance (@id, @agency, @version) | Processing Activity
  ResourcePackage (@id, @agency, @version) | [No conceptual object]
      Purpose (@id) | [No conceptual object]
      Logical Product (@id, @agency, @version) | [No conceptual object]
          DataRelationship (@id, @version) | Record Relationship
              LogicalRecord (@allvariablesInLogicalRecord="true") | Logical Record
                  VariableScheme (@id, @agency, @version) | [No conceptual object]
                      Variable (@id, @version) | Represented Variable/Instance Variable
                          VariableName | Name
                          Representation | Value domain
                              TextRepresentation (@maxLength) | [No conceptual object]

 

Note:   For the PoC we will simply assume that variables appear in the data set as they are ordered in the DDI file. Furthermore, only one VariableScheme is assumed.

 

4) The mapping of the variables that are used by the service, to the roles they have within the coding process, must be expressed in the XML format described below. In GSIM terms, these mappings can be seen as Rules.

 

XML Element | Description
DatasetMap | Container for mappings
    Mapping | Mapping of variable to a role
          Role | Role of a variable within the service. Can have the content DataId or DataToCode
          VariableReference | Refers to the variable from input 3 (unit dataset structure) playing the given role
                  ID | ID of the variable
                  IdentifyingAgency | Agency identifying the variable
                  Version | Version of the variable
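
The Python sketch below builds and reads a mapping file with this element structure using the standard library. The element names and the two role values (DataId, DataToCode) come from the table above; the absence of an XML namespace, the function names and the variable identifiers, agency and version used as sample values are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def build_dataset_map(roles: dict) -> str:
    """Serialize {role: (variable id, agency, version)} as a DatasetMap document."""
    root = ET.Element("DatasetMap")
    for role, (variable_id, agency, version) in roles.items():
        mapping = ET.SubElement(root, "Mapping")
        ET.SubElement(mapping, "Role").text = role
        reference = ET.SubElement(mapping, "VariableReference")
        ET.SubElement(reference, "ID").text = variable_id
        ET.SubElement(reference, "IdentifyingAgency").text = agency
        ET.SubElement(reference, "Version").text = version
    return ET.tostring(root, encoding="unicode")


def read_roles(document: str) -> dict:
    """Return {role: variable id} from a DatasetMap document."""
    root = ET.fromstring(document)
    return {
        mapping.findtext("Role"): mapping.findtext("VariableReference/ID")
        for mapping in root.iter("Mapping")
    }


# Made-up variable identifiers, agency and version.
document = build_dataset_map({
    "DataId": ("V1", "example.agency", "1.0"),
    "DataToCode": ("V2", "example.agency", "1.0"),
})
roles = read_roles(document)
```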

 

Output Messages

 

The output of the service consists of three files. In GSIM terms, the outputs of this service are:

5)           a Unit data set containing the coded data for the variable concerned;

6)           a Data structure, describing the structure of this Unit data set

7)           a Process Metric, containing information about the execution of the service.

These generated files will be placed at the locations indicated by the 5th, 6th and 7th input parameters. No return parameter will be generated by the service.

 

5) The unit data set will be a fixed-width ASCII file containing (for the successfully coded entries) the case ID (50 characters maximum) followed by the Code. Each entry should be on a single line.

Data File | GSIM Object
Unit data set | Unit Data Structure
Case ID | Unit Identifier Component
Code | CodeValue

 

6) The structure of the unit data set will be expressed as a DDI 3.1 instance, using the following structure. The table below shows the mapping of the conceptual GSIM objects to their encoding in DDI 3.1:

 

 

 

 

 

 

 

 

DDI 3.1 Element

 

GSIM Object

DDIInstance (@ id, @ agency, @version)

 

Processing Activity

  ResourcePackage (@id, @agency, @version)

 

[No conceptual object]

      Purpose (@ id)

 

[No conceptual object]

      Logical Product (@id, @agency, @version)

 

[No conceptual object]

          DataRelationship (@id, @version)

 

Record Relationship

              LogicalRecord (@allvariablesInLogicalRecord="true")

 

Logical Record

                  VariableScheme (@id, @agency, @version)

 

[No conceptual object]

                      Variable (@id, @version)

 

 

Represented Variable/Instance Variable

                          VariableName

 

Name

                          Representation

 

Value domain

                              TextRepresentation (@maxLength)

 

[No conceptual object]

 

Again, for the PoC it is simply assumed that variables appear in the data set as they are ordered in the DDI file.

 

7) The Process metrics will be expressed as an XML file structured in the following way:

XML Element

Description

CodingMetrics

Container for the coding metrics

    Result (@Datetime)

Contains the results of the service execution started at the given date/time

          TotalRows

The number of rows found in the input dataset

          TotalCoded

The number of successfully coded records
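As an illustration of the structure in the table above, the following sketch builds a process metrics document with Python's standard library. The element and attribute names come from the table. The function name and the use of an ISO 8601 timestamp for `Datetime` are assumptions, since the specification does not state a date/time format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def build_coding_metrics(started: datetime, total_rows: int, total_coded: int) -> str:
    """Serialize the coding metrics as an XML string."""
    root = ET.Element("CodingMetrics")
    # ISO 8601 formatting of the start time is an assumption, not part of the spec.
    result = ET.SubElement(root, "Result", {"Datetime": started.isoformat()})
    ET.SubElement(result, "TotalRows").text = str(total_rows)
    ET.SubElement(result, "TotalCoded").text = str(total_coded)
    return ET.tostring(root, encoding="unicode")


xml_text = build_coding_metrics(datetime(2013, 6, 1, 9, 30), 1000, 950)
```

A caller can derive the coding rate from the two counts (here 950 of 1000 rows), which is the kind of information the Process Metric is intended to carry.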

 

Error Messages

 

When the coding process cannot be executed or is aborted due to some error, the service will return an error message. The following error messages can be generated by the service.

 

 

 

 

 

 

Error message

Description

  Error in input codelist

The input codelist cannot be read, is syntactically invalid or its content is inconsistent

  Error in input dataset

The input dataset, the structure file describing the dataset or the input mapping file cannot be read or contains an error.

  Other/unspecified error

Some error occurred during the coding process

 

The error message will be returned to the caller in the form of a SOAP Exception. Note that this SOAP Exception may contain an InnerException providing more detailed information about the error.  
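The error categories above could be modelled inside an implementation along the following lines. This is a sketch under stated assumptions: the class names and the `load_codelist` helper are hypothetical, the codelist is treated as a plain ASCII text file purely for illustration, and no SOAP fault is actually constructed. Python's exception chaining (`raise ... from ...`) plays a role analogous to the InnerException mentioned above.

```python
class CodingServiceError(Exception):
    """Base class; message texts mirror the error message table."""
    message = "Other/unspecified error"

    def __init__(self, detail: str = ""):
        super().__init__(self.message)
        self.detail = detail  # extra information for the caller


class CodelistError(CodingServiceError):
    message = "Error in input codelist"


class DatasetError(CodingServiceError):
    message = "Error in input dataset"


def load_codelist(path: str) -> list[str]:
    """Hypothetical loader: reads the codelist as lines of ASCII text."""
    try:
        with open(path, encoding="ascii") as handle:
            return handle.read().splitlines()
    except (OSError, UnicodeDecodeError) as exc:
        # Keep the low-level cause attached, like an inner exception.
        raise CodelistError(str(exc)) from exc
```

A service wrapper would catch `CodingServiceError` at its boundary and translate the message and the chained cause into whatever fault format the invocation protocol requires.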

 

 


Statistical Service Implementation Description

 

Template

Name

A name that identifies the Statistical Service implementation. It must be unique in the Service catalogue.

 

Version

Version number

 

Builder Organization
The owner of the Statistical Service, i.e. the Service Builder’s organization.

 

Statistical Service Definition

The link to the Statistical Service Definition document.

 

Statistical Service Specification

The link to the Statistical Service Specification document.

 

Invocation protocols

List of technical protocols supported by the service for communication. Accepted protocols are listed in this document.

 

Service Interface

Protocol-dependent specification of the information required to invoke the service.

Examples:

-           WSDL interface for SOAP Web Service protocol

-           List of HTTP request parameters for REST Web Service protocol

-           Command line specification for Command Line protocol

-           Add other examples for other supported protocols

 

Data-by-Reference protocols

For each input passed as reference, specify supported protocol(s). Accepted protocols are listed in this document.

 

Technical dependencies

List of technical requirements of the service in terms of:

-           Operating system(s) (specify version)

-           Runtime platforms – any additional software that has to be installed on the machine on which the service is installed (e.g. SAS, R, Java virtual machine, .NET runtime, J2EE container, etc. – specify version)

-           Database(s)

-           Other dependencies (libraries, packages etc.)

 

Installation documentation

Installation guide for the Service Assembler

 

Additional information

Any additional information for a Service Assembler which is deemed relevant by the Service Builder


Annex 2: Describing “Statistical Production”

 

215.               To enable the documentation and understanding of CSPA to be efficient and consistent, it is necessary to define a number of core concepts which are relevant for all readers (e.g. managers, subject matter statisticians and technologists) in regard to “Statistical Production”.

 

216.               It is particularly important to have a common understanding of the concepts Business Function, Business Process and Business Service [7]. The terms and definitions used in CSPA for these concepts are drawn from TOGAF (The Open Group Architecture Framework).  TOGAF is widely recognized and used for defining architectural frameworks.  The terminology and modelling of the Production Group within GSIM already aligns with TOGAF.

 

A traditional view of Statistical Production

 

217.               Before GSBPM, it was common to conceptualize statistical production as a “value chain” (i.e. a chain of activities that an organization operating in a specific industry performs in order to deliver a valuable product or service for the market).

 

218.               Different lines (e.g. different statistical domains) of production typically follow fundamentally similar value chains – although they are not identical in detail.  This is commonly characterized along the lines shown in Figure 11.

 

Figure 11: Simple “matrix” view of Statistical Production

 

219.               When each statistical domain’s line of production is designed and implemented largely in isolation, this is commonly referred to as a stove pipe approach to statistical production.

 

220.               This is recognized as highly inefficient. It means duplication in building and maintaining production infrastructure (for example tools, trained staff) to support production lines, as well as lost opportunities in regard to economies of scale and flexibility.

 

221.               Within an organization, the line of production for an output such as national accounts is dependent on outputs from other domains – even if these lines of production are not well integrated in terms of internal operations.

 

222.               Also, increasingly often, consumers are seeking an integrated view of the economy, society and/or environment rather than being interested only in the outputs of one line of production.

 

The modernization view of Statistical Production

 

223.               CSPA aims to facilitate a move away from stove piping within or across organizations, in order to realize the benefits associated with economies of scale and flexibility.

 

224.               An enabler for this move is to think about statistical production in a manner which recognizes that each line of production (also known as a ‘suite of Statistical Programs’) [8] is designed to address a specific set of statistical needs, and to produce a specific set of statistical outputs based on a specific set of inputs.

 

225.               A specific team of subject matter statisticians will (typically) have overall accountability for ensuring the statistical needs are understood and addressed correctly in the design and operation of a particular suite of statistical programs and that the statistical outputs achieve the agreed level of quality.

 

226.               It should also be recognized that maximum efficiency will be gained if the equivalent Business Function within each suite of statistical programs can be performed using common production infrastructure.  

 

227.               This approach complements a trend in many statistical organizations toward managing statistical production based on a matrix of horizontal (suite of statistical programs) and vertical (functional) roles, responsibilities and accountabilities (see Figure 10).

 

 

 

 

Understanding Business Function, Business Process & Business Service

 

Business Function

 

228.               GSIM currently defines Business Function as something an enterprise does, or needs to do, in order to achieve its objectives.  This represents a simpler, and more self-contained, expression of the definition used in TOGAF.

 

229.               When identifying Business Functions, the emphasis is on an enterprise level (“whole of business”) perspective. There is a recognition that different parts of the business may have different detailed requirements in regard to a particular function.  These differences in detailed requirements are one factor that can lead to equivalent business functions being performed by many different organizational units using many different tools (i.e. the stove pipe approach). CSPA aims to minimize the need for separate tools for equivalent Business Functions performed within different suites of Statistical Programs.

 

230.               Not all Business Functions that statistical organizations need relate directly to Statistical Production (and CSPA).  For example, statistical organizations need to pay their staff.  At a high level, the function of paying staff (the “payroll function”) tends to be largely in common with other enterprises (government and private sector).  Many statistical organizations already outsource some, or all, aspects of this function because generic providers with large economies of scale often offer a service which performs it cost effectively.  Even where this is not the case, statistical organizations tend not to invest in independently designing, building and maintaining tools capable of performing this function.  Typically, they choose to use externally developed tools, integrating them into their own environment and applying a greater or lesser degree of customization for the purposes of their specific organization.

 

231.               CSPA can be seen as aiming to make such a paradigm of reuse of tools more viable for supporting Business Functions related to Statistical Production – such as those described in the GSBPM.

 

232.               An example of a Business Function related to Statistical Production is “Impute missing or unreliable data”.  (This corresponds to 5.4 in the GSBPM.)  As GSBPM suggests, Business Functions can be considered at different levels of detail.  For example, a sub-function within “Impute” could be “flag values which have been imputed”.

 

233.               At the level of the Business Function, there is no detail of:

 

      what method has been used to impute estimates

      what system or other resource (e.g. a staff member) will perform the work  

 

 

 

 

Business Process

 

234.               A Business Process is a series of logically related activities or tasks performed together to produce a defined set of results. Key aspects include

 

      a process consists of a series of steps (activities/tasks)

      there is sequencing (or “flow”) between steps. Some sequencing may be conditional (for example, if the output of Step A is an error message then we undertake Activity X, otherwise we proceed immediately to Step B)

      a business process is undertaken for a particular purpose

      what is represented (for the sake of simplicity and clarity) as a single step in a high level depiction of a process might – when viewed in more detail - comprise a lower level (sub)process consisting of multiple steps

 

235.               Each Statistical Program (e.g. production of Retail Trade statistics in Australia) will be associated with a Statistical Business Process (as defined in the GSBPM) whose steps can be associated (at a high level) with Business Functions identified within GSBPM.

 

236.               Each Statistical Program , however, is undertaken for a particular set of purposes (e.g. to produce and disseminate particular outputs, with particular levels of quality, that address particular statistical needs).  For this reason, the Statistical Business Process associated with a Statistical Program:

 

      will be unique in terms of the specific inputs and outputs for the process overall, and individual steps within it

      is likely to be unique in terms of the exact sequence of process steps and process flows overall (e.g. from Collect to Disseminate). Some “patterns” of process steps and flows (process patterns) are, however, likely to be in common with other Statistical Programs in regard to particular Business Functions

 

237.               The following provides a simplified illustration of possible interactions between Business Process and Business Function in the context of different Statistical Programs.

 

A.      The Business Process for Statistical Program “X” may require that validation of certain variables takes place (GSBPM 5.3) before those variables are subject to Imputation (GSBPM 5.4).  Validation of other variables might then take place, followed by imputation of those other variables if required.

 

In this case the particular Statistical Business Process has cycled through the Business Function “Impute missing or unreliable data” twice, applying it to different content (different variables) each time.
 

B.      The Business Process for Statistical Program “Y” may not require a second “cycle” of imputation for a second set of variables.
 

C.     The Business Process for Statistical Program “Z” may impute all variables in a single cycle and then have a data quality assessment step.  One of the possible outcomes of the data quality assessment step may be to repeat the imputation function using different imputation methods and/or different parameters/thresholds for the methods that were used the first time.  

 

The Business Process for Statistical Program “Z” potentially involves a second cycle of imputation, similarly to Statistical Program “X”, but the two business processes differ in terms of the process flow that leads to a second cycle, and how the input settings differ between the first cycle and the second cycle.

 

238.               The Business Function is the same (“Impute missing or unreliable data”) in all cases but the Business Process context for performing the function is different in all five instances (including 2x2 cycles).
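The count of five instances can be checked with a small sketch that represents each Business Process as an ordered list of steps, each step naming the Business Function it invokes. The step lists are a simplification of examples A to C above: the validation step shown for program “Y”, the "Assess data quality" label and the assumption that program “Z” does take its optional second cycle are all illustrative.

```python
IMPUTE = "Impute missing or unreliable data"  # GSBPM 5.4
VALIDATE = "Validate"                          # GSBPM 5.3 (label shortened here)
ASSESS = "Assess data quality"                 # hypothetical step label

# One Business Function, three different Business Process contexts.
PROCESSES = {
    "X": [VALIDATE, IMPUTE, VALIDATE, IMPUTE],  # two cycles over different variables
    "Y": [VALIDATE, IMPUTE],                    # single cycle
    "Z": [IMPUTE, ASSESS, IMPUTE],              # second cycle triggered by assessment
}


def impute_cycles(processes):
    """Count how often each process invokes the imputation function."""
    return {name: steps.count(IMPUTE) for name, steps in processes.items()}


cycles = impute_cycles(PROCESSES)
total_instances = sum(cycles.values())
```

Here `cycles` records two invocations for “X”, one for “Y” and two for “Z”, which gives the five instances referred to above.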

 

Business Service

 

239.               Having identified what Business Functions need to be performed during a Statistical Business Process , it is necessary to identify who – or what – will undertake the work associated with each function. 

 

240.               TOGAF defines a Business Service as supporting delivery of a business capability ( Business Function ) through an explicitly defined interface. An explicitly defined interface requires the knowledge of what the service will deliver (including in what time frame) given a particular set of inputs.  This is termed the “contract” for the service.

 

241.               The definition means that not every Business Function in every Business Process is necessarily performed by a Business Service.   Exceptions include:

 

      The (individual or organizational unit) owner of the “horizontal” Business Process performs all the activities associated with that process step ( Business Function ) rather than interfacing with a Business Service that assists them.
 

      Another agent (e.g. person, organizational unit and/or IT application) performs work on behalf of the process owner but either the outputs which will be delivered are unable to be specified in advance or, while desired outputs can be specified, there is not a high level of assurance the agent will deliver in accordance with those specifications.    

 

242.               The simplification, predictability and economy of scale associated with Business Services typically make them a highly desirable means of delivering Business Functions /capabilities.

 

243.               From the perspective of the Business Process that uses the service (“the consumer”) it doesn’t matter how the outputs are produced as long as the service adheres to its “contract” (e.g. Service Level Agreement or similar) and delivers the specified business outputs.  In other words, to the consumer, the service is a simple “black box”, even if the service provider needs to undertake an elaborate process flow of their own in order to produce and deliver the outputs.

 

244.               Taking a simple example, in the past  the service provided by a “copy room” required users (consumers) of the service to provide the documents to be copied together with information about how many copies were required, what sort of paper to use, what form of binding was required, when the outputs were required and so on.  Based on their specifications, the consumer would also know in advance the cost (if any) of receiving the service they had requested.

 

245.               Having set the requirements, the service consumer did not need to concern themselves with how the service providers in the “copy room” delivered the service, as long as the outputs met the agreed parameters in terms of time, cost and quality.  If, for example, the agreed turn-around time was 8 hours then it wouldn’t matter, from the service consumer’s point of view, whether the team in the copy room

 

A.      took the full 8 hours using a slow copier and undertaking extensive manual work to apply the required binding

B.      used an ultra-fast and ultra-sophisticated copier which completed the work, including binding, in 20 minutes

C.     outsourced the work to another company that could deliver the work within the required parameters        

 

246.               If the copy room upgraded its technology to move from A to B, it would remain possible to offer the same service (although, alternatively, they could choose to offer improved services).
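The “black box” idea in the copy room example can be sketched in code as one explicitly defined interface with interchangeable providers behind it. All names below are hypothetical and the contract is reduced to output volume and turn-around time for brevity. The two providers correspond to options A and B above.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CopyRequest:
    pages: int
    copies: int
    max_hours: float  # agreed turn-around time


@dataclass(frozen=True)
class CopyResult:
    sheets_delivered: int
    hours_taken: float


class CopyService(Protocol):
    """The explicitly defined interface: the only thing the consumer sees."""
    def fulfil(self, request: CopyRequest) -> CopyResult: ...


class SlowCopier:
    def fulfil(self, request: CopyRequest) -> CopyResult:
        # Option A: slow copier plus manual binding, uses the full 8 hours.
        return CopyResult(request.pages * request.copies, hours_taken=8.0)


class FastCopier:
    def fulfil(self, request: CopyRequest) -> CopyResult:
        # Option B: fast copier, finished in 20 minutes.
        return CopyResult(request.pages * request.copies, hours_taken=20 / 60)


def contract_met(service: CopyService, request: CopyRequest) -> bool:
    """Consumer-side check: only outputs and timing matter, not the method."""
    result = service.fulfil(request)
    return (result.sheets_delivered == request.pages * request.copies
            and result.hours_taken <= request.max_hours)
```

Because `contract_met` depends only on the interface, replacing `SlowCopier` with `FastCopier` requires no change on the consumer side, which is the point made about upgrading from A to B.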

 

247.               Within many statistical organizations it is already common practice, for example, for the manager of a survey based Statistical Program to be able to draw on a Business Service provided by a corporate team of field staff to assist with collecting data rather than each program needing to maintain and manage its own team of interviewers.  In these cases there is a series of inputs for which the manager of the Statistical Program is responsible such as a questionnaire suitable for administration by the interviewers, any additional instructions for the interviewers and briefings for the interviewers about the aims and design of the Statistical Program in case respondents ask questions of them.    

 

248.               It was identified in the previous subsection that different Business Processes will require equivalent Business Functions applied to different inputs, potentially using different statistical methods and/or different parameters and thresholds.  In many cases, therefore, it is highly desirable that, for purposes of reuse and economies of scale, any Business Service designed to perform a particular function is highly configurable to meet the needs of a wide range of Statistical Business Processes.

 

249.               It is also highly desirable that services are scoped in a manner which allows them to support flexible sequencing and configuration of Business Functions within different Business Processes .

 

 

 

 

250.               For example, the GSBPM identifies several sub-functions associated with imputation including

 

A.      the identification of potential errors and gaps;

B.      imputation using one or more pre-defined methods e.g. “hot-deck” or “cold-deck”;

C.     writing the imputed data back to the data set, and flagging them as imputed;

 

251.               It would be possible to design a single Business Service that simply accepted a number of inputs and then performed all three of these functions in a set sequence.

 

252.               It may be, however,

 

      that for a particular Statistical Program there was already a preferred alternative means of performing function A but a desire to use the new service for B and C

      that for another Statistical Program there were unusual requirements (for sound business reasons) in regard to how data was flagged as imputed, but the new service would be ideal for A and B

 

253.               Even if there is already a single underlying IT application which performs A, B and C, it may be possible to provide three different services which provide access to three different aspects of the application’s functionality rather than accessing the full (IT) functionality through a single “all or nothing” service.
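A minimal sketch of this idea follows, assuming a hypothetical application class that implements A, B and C and three thin services that each expose one aspect of it. The constant fill value stands in for a real imputation method, and all names are illustrative. The usage at the end shows a Statistical Program that keeps its own method for A while reusing the services for B and C.

```python
class ImputationApplication:
    """Hypothetical single application that can perform A, B and C."""

    def find_gaps(self, records):
        # A: identify positions of missing values
        return [i for i, value in enumerate(records) if value is None]

    def impute(self, records, gaps, fill_value):
        # B: produce imputed values (a constant stands in for a real method)
        return {i: fill_value for i in gaps}

    def write_back(self, records, imputed):
        # C: write imputed values back and flag them as imputed
        values = [imputed.get(i, v) for i, v in enumerate(records)]
        flags = [i in imputed for i in range(len(records))]
        return values, flags


app = ImputationApplication()


# Three separate services, each exposing one aspect of the same application.
def identify_service(records):
    return app.find_gaps(records)


def impute_service(records, gaps, fill_value):
    return app.impute(records, gaps, fill_value)


def flag_service(records, imputed):
    return app.write_back(records, imputed)


def local_gap_finder(records):
    # A program's own preferred method for A: negative values count as gaps too.
    return [i for i, v in enumerate(records) if v is None or v < 0]


records = [10, None, -1, 12]
gaps = local_gap_finder(records)
imputed = impute_service(records, gaps, fill_value=11)
values, flags = flag_service(records, imputed)
```

With a single “all or nothing” service, this program would have had to give up its preferred gap detection in order to reuse B and C.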

 

Getting the verticals agreed

 

254.               Business Functions and Business Services are essential, interrelated, “verticals” in this simple model.

 

255.               If organizations are to be able to identify Business Services for reuse, they first need to be able to identify the Business Function they need performed and then they can identify existing Business Services capable of performing that function.

 

256.               The fact that a Business Service is capable of performing a Business Function does not, in itself, guarantee it will be suitable for reuse in regard to a particular Statistical Business Process .   For example

 

      the Business Service may not be capable of being configured to support the necessary inputs, methods or other non-negotiable requirements associated with the particular Business Function in the particular Statistical Business Process

      the Business Service may not be able to operate successfully in the local legal (e.g. IP/information security), organizational and/or IT environment

 

257.               Nevertheless, without common agreement on Business Functions it is not possible to scope Business Services on a consistent functional basis and then to discover them for potential reuse to perform specific Business Functions .

 

258.               As highlighted in examples provided above, GSBPM provides a good frame of reference for locating and discussing Statistical Business Functions .  As per the discussion on sub-functions within imputation, however, there are cases where identification of individual functions in more detail than the GSBPM currently provides would be useful.

 

259.               Having recognized this “gap”, it is NOT recommended that as a first step, independently of any specific plan to develop shared services, there is an effort made to identify and agree a comprehensive set of lower level sub-functions within the GSBPM.

 

260.               Instead it is proposed that possible sub-functions associated with a particular lowest level building block in the current GSBPM are identified and reviewed. Agreed Business Services, aligned with CSPA, can then be scoped in association with that lowest level building block.

 

Designing and performing business processes

 

261.               Business Processes are designed (the steps and their sequencing are decided) and then performed .

 

262.               In cases where Business Processes have not been formally specified, it is possible some steps will have been performed before later steps are decided upon (designed).  This is particularly common in the case of exploratory work when a broad plan exists but details of the process are decided based on the results of previous steps.

 

263.               Even where a Business Process has been designed in detail, it is possible that exceptional conditions which arise while that process is being performed will require reconsideration of the design of some steps.

 

264.               While recognizing not all aspects of design are necessarily fixed before a business process starts to be performed, delineation of “process design” and “process performance” is important within CSPA.

 

265.               Each monthly instance of producing Retail Trade statistics involves performing a Statistical Business Process , but most aspects of the design of that Statistical Business Process remain the same each month.  The more the design can be formalized as a repeatable specification, the more opportunities there are for automation and other efficiencies.  In addition, there will be greater assurance of consistency of outputs from each monthly cycle.

 

266.               Separate consideration of design is not only important for statistical activities which have regularly repeating cycles.  During design it is possible to consider the “process patterns” which have been used by other, similar, programs of statistical production.  This can lead to efficiency and consistency in the design of Statistical Business Processes .  The approach highlights opportunities to reuse (with appropriate configuration) the Statistical Services used by existing programs of statistical production to perform particular process steps.

 

 

Designing, Building, Assembling and Configuring Business Services

 

267.               The first step for a new Business Service is to design it, including establishing the scope of the service.  This includes

 

      identifying the Business Functions it will perform/support

      identifying the range of statistical methods (if more than one) for performing the Business Function which will be supported

      identifying the requirements of different Statistical Business Processes for that Business Function which will be supported in terms of configuration options     

 

268.               The service then needs to be built.  This includes ensuring the service is capable of fulfilling its service contract under all of the configurations the service is designed to support.  This may include building appropriate application services and components.  It may also include establishing a team and providing its members with training and operating protocols so they can provide a service. 

 

269.               Once a Business Service has been built, an organization starts to achieve returns on investment when Business Processes start using it.  A Statistical Business Process typically entails many steps and flows spanning many Business Functions.  A Statistical Business Process, therefore, typically uses many Business Services.  An early step in designing (or redesigning) a Statistical Business Process is identifying process patterns, and associated Business Services, it can reuse.  This can be seen as “assembling” the range of business services that are fit for supporting the needs of the particular Statistical Business Process.

 

270.               The assembly step is where the “vertical” interests of designers and builders of Statistical Business Services (obtaining as much reuse of each service as possible) meet the “horizontal” interests of the designers of statistical business processes (achieving the specific purposes, including specific statistical quality targets, of specific statistical programs in a cost effective and timely manner).

 

271.               The benefits of CSPA in terms of increased productivity and increased flexibility require, to some degree,

 

      acceptance in some cases of reuse of Business Services which, while essentially “fit for purpose”, are less than “ideal for purpose” given the specific aims of the specific Statistical Business Process

      some flexibility in the detailed design of Statistical Business Processes so they are able to reuse available Statistical Business Services, on the proviso that “horizontal” quality, time and cost requirements are still met

 

272.               Once the relevant set of Statistical Business Services has been assembled, the services need to be configured (and tested) based on the specific needs of the particular Statistical Business Process.

 



Annex 3: Glossary

Term

Definition

Application Architecture

Application Architecture (AA) classifies and hosts the individual applications describing their deployment, interactions, and relationships with the business processes of the organization (e.g. estimation, editing and seasonal adjustment tools, etc.).   AA facilitates discoverability and accessibility, leading to greater reuse and sharing. Source: Statistical Network BA definition

Architectural Pattern

The description of a particular recurring design problem that arises in different design contexts. The solution schema is specified by describing its components, their responsibilities, their relations and the ways they collaborate. Source: CSPA

Business Architecture

Business Architecture (BA) covers all the activities undertaken by an NSI, including those undertaken to conceptualize, design, build and maintain information and application assets used in the production of statistical outputs. BA drives the Information, Application and Technology architectures for an NSI. Source:   Statistical Network BA definition

Business Function

Something an enterprise does, or needs to do, in order to achieve its objectives. Source:   GSIM

Business Process

A series of logically related activities or tasks performed together to produce a defined set of results. Source: CSPA

Business Service

A defined interface for accessing business capabilities (an ability that an organization possesses, typically expressed in general and high level terms and requiring a combination of organization, people, processes and technology to achieve).

Source:   GSIM

Common Statistical Production Architecture

A set of principles for increased interoperability within and between statistical organizations through the sharing of processes and components, to facilitate real collaboration opportunities, international decisions and investments and sharing of designs, knowledge and practices. Source: CSPA

Enterprise Architecture

Enterprise architecture is about understanding all of the different elements that go to   make up the enterprise and how those elements interrelate. It is   an approach to enabling the vision and strategy of an organization,   by providing a clear, cohesive, and achievable picture of what's required to get there.   Source: Statistical Network BA definition

Global Artefact Catalogue

A list and descriptions of standardized artefacts, and, where relevant, information on how to obtain and use them. Source: CSPA

Industry Architecture

A set of agreed common principles and standards designed to promote greater interoperability within and between the different players that make up an "industry", where an industry is defined as a set of organizations with similar inputs, processes, outputs and goals.   Source: CSPA

Information Architecture

Information Architecture (IA) classifies the information and knowledge assets gathered, produced and used within the BA. It also describes the information standards and frameworks that underpin the statistical information (e.g. GSIM, DDI, SDMX). IA facilitates discoverability and accessibility, leading to greater reuse and sharing. Source: Statistical Network BA definition

Interface

A type of contract by which subsystems or components communicate. Source: CSPA

Non Functional Requirements

Non Functional Requirements are the overall factors that affect runtime behavior, system design, and user experience.   They represent areas of concern that have the potential for application wide impact. Source: CSPA

Principles

Principles are general rules and guidelines, intended to be enduring and seldom amended, that inform and support the way in which an organization sets about fulfilling its mission and business objectives. Source: Statistical Network BA definition

Protocol

Formats and rules for exchanging messages in or between computing systems. Source: CSPA

Reuse

Reuse is the concept of using a common asset (implemented component, a component definition, a pattern...) repetitively in different (or similar) contexts (for example in different business processes), and/or by different participants, and/or over time. Source: CSPA

Service

A service is a logical representation of a repeatable business activity that has a specified outcome and is self-contained, fulfils a business need for a customer (internal or external to the organization) and may be composed of other services.

Source: Statistical Network BA definition

Service Contract

A service contract is comprised of one or more published documents (called service description documents) that express meta information about a service. The fundamental part of a service contract consists of the service description documents that express its technical interface. These form the technical service contract which essentially establishes an API into the functionality offered by the service. A service contract can be further comprised of human-readable documents, such as a Service Level Agreement (SLA) that describes additional quality-of-service features, behaviors, and limitations. Source: http://serviceorientation.com/soaglossary/service_contract

Service Interface

A service interface is the abstract boundary that a service exposes. It defines the types of messages and the message exchange patterns that are involved in interacting with the service, together with any conditions implied by those messages. Source: http://www.w3.org/TR/ws-arch/#service_interface
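As an illustration only, the idea of an abstract boundary defined by message types and an exchange pattern can be sketched in Python. Every name below (EditRequest, EditResponse, RangeEditInterface, SimpleRangeEdit, run) is hypothetical and is not defined by CSPA or by the W3C source; the sketch assumes a single request-response exchange.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class EditRequest:
    """Hypothetical request message: values to check against a range."""
    values: tuple[float, ...]
    lower: float
    upper: float


@dataclass(frozen=True)
class EditResponse:
    """Hypothetical response message: positions of values outside the range."""
    failed_positions: tuple[int, ...]


class RangeEditInterface(Protocol):
    """The abstract boundary: one request-response exchange, no implementation."""

    def check(self, request: EditRequest) -> EditResponse: ...


class SimpleRangeEdit:
    """One possible implementation behind the interface (illustrative only)."""

    def check(self, request: EditRequest) -> EditResponse:
        failed = tuple(
            i for i, v in enumerate(request.values)
            if not (request.lower <= v <= request.upper)
        )
        return EditResponse(failed_positions=failed)


def run(service: RangeEditInterface, request: EditRequest) -> EditResponse:
    # The caller depends only on the interface, not on the implementation.
    return service.check(request)
```

Because the caller in this sketch sees only the message types and the exchange pattern, a different implementation could be substituted without changing the calling code.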

Service Oriented Architecture

Service-Oriented Architecture (SOA) is an architectural style that supports a way of thinking (Service Orientation) in terms of services and service-based development and the outcomes of services.


The SOA architectural style has the following distinctive features:

It is based on the design of the services – which mirror real-world business activities – comprising the enterprise (or inter-enterprise) business processes.

Service representation utilizes business descriptions to provide context (i.e., business process, goal, rule, policy, service interface, and service component) and implements services using service orchestration.

It places unique requirements on the infrastructure – it is recommended that implementations use open standards to realize interoperability and location transparency.

Implementations are environment-specific – they are constrained or enabled by context and must be described within that context.

It requires strong governance of service representation and implementation.

It requires a "Litmus Test", which determines a "good service".  

Source: The Open Group   http://www.opengroup.org/soa/source-book/soa/soa.htm

Share

Share is an ownership concept where an asset is made available to other participants for use. There are levels of sharing. A limited form of sharing would be to provide another participant with the means to replicate (make a copy of) the asset, for example by giving them the source code (i.e. they share an aspect of the asset only). A more involved form of sharing would entail that the asset has actually been made entirely common (in this case the asset is also reused). Source: CSPA

Technology Architecture

Technology Architecture (TA) describes the IT infrastructure required to support the deployment of applications and IT services, including hardware, middleware, networks, platforms, etc. Source: Statistical Network BA definition

 

 


[1] The Statistical Network is a collaboration group involving the National Statistics Organisations of Australia, Canada, Italy, New Zealand, Norway, Sweden and the United Kingdom.

[2] Descriptions are also available via the CSPA Glossary

[3] Whether these business processes relate to different subject matter domains in one statistical agency and/or equivalent subject matter domains in different agencies.

[4] While the standards body responsible for TOGAF recognises the term “Information Architecture”, the formal model underlying TOGAF refers to “Data Architecture”.  The definition from Forrester is consistent with TOGAF’s characterisation of “Data Architecture” and expressed more concisely and simply.

[5] The value agencies achieve through applying CSPA would be greater if such a framework existed. If a framework is agreed in future then it will be referenced in the Information Architecture.

[6] The probable relevance of existing standards such as BPMN and BPEL is expected to be considered at a later stage of development of CSPA.

[7] Terms noted in bold text are GSIM terms.

[8] In GSIM a ‘line of production’ would be characterized as a suite of statistical programs related to the same statistical domain or the same grouping of statistical domains (e.g. Macroeconomic Statistics and Economic Accounts). The following text uses this GSIM terminology.