*Partial translation by UN-ESCAP and UNECE
1. The Joint UNECE / Eurostat / OECD Work Sessions on Statistical Metadata (METIS) have, over the last few years, been preparing a Common Metadata Framework (CMF). Part C of this framework is entitled "Metadata and the Statistical Cycle". This part refers to the phases of the statistical business process (also known as the statistical value chain or statistical cycle) and provides generic terms to describe them.
2. During a workshop to progress the development of Part C of the CMF, held in Vienna in July 2007, the participants agreed that the model currently used by Statistics New Zealand, with the addition of 'Archive' and 'Evaluate' phases, would provide a good basis for developing a "Generic Statistical Business Process Model" (GSBPM). A first draft of the GSBPM was presented by the UNECE Secretariat at the METIS Work Session in Luxembourg in April 2008. Following two rounds of comments, another workshop was held in Lisbon in March 2009 to finalize the model. This current version of the model (version 4.0) was approved by the METIS Steering Group for public release in April 2009. It is considered final at the time of release; however, future updates may be necessary in the coming years, either to reflect experiences from implementing the model in practice, or due to the evolution of the nature of statistical production. The reader is therefore invited to check the website www.unece.org/stats/gsbpm to be sure of having the latest version.
3. The original intention was for the GSBPM to provide a basis for statistical organizations to agree on standard terminology to aid their discussions on developing statistical metadata systems and processes. The GSBPM should therefore be seen as a flexible tool to describe and define the set of business processes needed to produce official statistics. The use of this model can also be envisaged in other separate, but often related contexts such as harmonizing statistical computing infrastructures, facilitating the sharing of software components, in the Statistical Data and Metadata eXchange (SDMX) User Guide for explaining the use of SDMX in a statistical organization, and providing a framework for process quality assessment and improvement. These other purposes for which the GSBPM can be used are elaborated further in Section VI.
4. The GSBPM is intended to apply to all activities undertaken by producers of official statistics, at both the national and international levels, which result in data outputs. It is designed to be independent of the data source, so it can be used for the description and quality assessment of processes based on surveys, censuses, administrative records, and other non-statistical or mixed sources.
5. Whilst the typical statistical business process includes the collection and processing of raw data to produce statistical outputs, the GSBPM also applies to cases where existing data are revised or time-series are re-calculated, either as a result of more or better source data, or a change in methodology. In these cases, the input data are the previously published statistics, which are then processed and analyzed to produce revised outputs. In such cases, it is likely that several sub-processes and possibly some phases (particularly the early ones) would be omitted.
6. As well as being applicable for processes which result in statistics, the GSBPM can also be applied to the development and maintenance of statistical registers, where the inputs are similar to those for statistical production (though typically with a greater focus on administrative data), and the outputs are typically frames or other data extractions, which are then used as inputs to other processes.
7. Some elements of the GSBPM may be more relevant for one type of process than another, which may be influenced by the types of data sources used or the outputs to be produced. Some elements will overlap with each other, sometimes forming iterative loops. The GSBPM should therefore be applied and interpreted flexibly. It is not intended to be a rigid framework in which all steps must be followed in a strict order, but rather a model that identifies the steps in the statistical business process, and the inter-dependencies between them. Although the presentation follows the logical sequence of steps in most statistical business processes, the elements of the model may occur in different orders in different circumstances. In this way the GSBPM aims to be sufficiently generic to be widely applicable, and to encourage a standard view of the statistical business process, without becoming either too restrictive or too abstract and theoretical.
8. In some cases it may be appropriate to group some of the elements of the model. For example, phases one to three could be considered to correspond to a single planning phase. In other cases, there may be a need to add another, more detailed level to the structure presented below to separately identify different components of the sub-processes. There may also be a requirement for a formal sign-off between phases, where the output from one phase is certified as suitable as input for the next. This sort of formal approval is implicit in the model, but may be implemented in many different ways depending on organizational requirements. The GSBPM should be seen as sufficiently flexible to apply in all of the above scenarios.
9. The GSBPM comprises four levels:
10. Further levels of detail may be appropriate for certain statistical business processes or in certain organizations, but these are unlikely to be sufficiently generic to be included in this model. A diagram showing the phases (level 1) and sub-processes (level 2) is included in Section IV. The sub-processes are described in detail in Section V.
11. According to process modelling theory, each sub-process should have a number of clearly identified attributes, including:
However, these attributes are likely to differ, at least to some extent, between statistical business processes, and between organizations. For this reason these attributes are rarely mentioned specifically in this generic model. It is, however, strongly recommended that they are identified when applying the model to any specific statistical business process.
12. The GSBPM also recognizes several over-arching processes that apply throughout the nine phases, and across statistical business processes. These can be grouped into two categories: those that have a statistical component, and those that are more general and could apply to any sort of organization. The first group are considered to be more important in the context of this model; however, the second group should also be recognized as they have (often indirect) impacts on several parts of the model.
13. Over-arching statistical processes include the following. The first two are most closely related to the model, and are therefore shown in model diagrams and are elaborated further in Section VI.
14. More general over-arching processes include:
15. The GSBPM has been developed drawing heavily on the Generic Business Process Model developed by Statistics New Zealand, supplemented by input from Statistics Canada on phase 8 (Archive), and other statistical organizations with experience of statistical process modelling. However, a number of other related models and standards exist for different purposes and in different organizations, both at the national and international level. It would not be practical to give details of all national models here, but the main international models and standards are considered below, and related to the GSBPM. A diagram of this relationship is included at the end of this section, and shows that the GSBPM can also be seen as the union of the other models, since it reflects all of their components.
16. This set of guidelines and recommendations was published by the United Nations in 1999. It contains the model of the phases and processes of a survey processing system shown below. Although different in presentation to the GSBPM, the contents are largely the same.
Source: Information Systems Architecture for National and International Statistical Offices - Guidelines and Recommendations, United Nations, 1999, http://www.unece.org/stats/documents/information_systems_architecture/1.e.pdf
17. The CVD project ("Cycle de Vie des Données" or "Data Life Cycle") aims to fundamentally revise the way Eurostat treats statistical data, by providing a coherent set of concepts, metadata structures and IT tools to be applied in all statistical domains. It also aims to deliver significant benefits, such as economies of scale for the development and evolution of computing tools and the pursuit of important corporate objectives, such as quality orientation and easier mobility of domain managers. The CVD project centres on metadata as its basic integrating concept, recognizing that metadata have a ubiquitous and overwhelming role in the statistical production process. It also recognizes the GSBPM for modelling statistical business processes. SDMX standards and guidelines play a key role in the whole CVD approach, from data transmission to dissemination, as well as for the exchange of data between the components of the production system.
18. This model has been developed within the Data Documentation Initiative (DDI), an international effort to establish a standard for technical documentation describing social science data. The DDI Alliance comprises mainly academic and research institutions, hence the scope of the model below is rather different to the GSBPM, which specifically applies to official statistical organizations. Despite this, the statistical business process appears to be quite similar between official and non-official statistics producers, as is clear from the high level of consistency between the models.
19. The main differences between the models are:
20. The SDMX (Statistical Data and Metadata eXchange) standards do not provide a model for statistical business processes in the same sense as the three cases above. However, they do provide standard terminology for statistical data and metadata, as well as technical standards and content-oriented guidelines for data and metadata transfer, which can also be applied between sub-processes within a statistical organization. The use of commonly agreed data and metadata structures allows exchanged data and metadata to be mapped or translated to and from internal statistical systems. To facilitate this, the SDMX sponsors published a set of cross-domain concepts in January 2009. The use of these common concepts may help to standardise and improve data and metadata transmissions between different organisations, even when models and systems are different. As far as metadata transmission is concerned, the mapping between the metadata concepts used by different international organizations, which is also present in the SDMX content-oriented guidelines package, supports the idea of open exchange and sharing of metadata based on common terminology.
21. The relationship between the model and SDMX was discussed at the April 2008 meeting of the METIS group. The final report of that meeting (paragraph 22) records a suggestion to incorporate the model into the Metadata Common Vocabulary and/or SDMX as a cross-domain concept. SDMX aims, through the Content-oriented Guidelines, to harmonize data and metadata terminology and quality, as well as providing transmission standards. The GSBPM, in offering standard terminology for the different phases and sub-processes of the statistical business process, would seem to complement, and fit logically within, the SDMX Content-oriented Guidelines.
22. This section considers each phase in turn, identifying the various sub-processes within that phase, and describing their contents. It therefore covers levels 2 and 3 of the GSBPM.
23. This phase is triggered when a need for new statistics is identified, or feedback about current statistics initiates a review. It determines whether there is a presently unmet demand, externally and / or internally, for the identified statistics and whether the statistical organization can produce them.
24. In this phase the organization:
25. This phase is broken down into six sub-processes. These are generally sequential, from left to right, but can also occur in parallel, and can be iterative. The sub-processes are:
This sub-process includes the initial investigation and identification of what statistics are needed and what is needed of the statistics. It also includes consideration of practice amongst other (national and international) statistical organizations producing similar data, and in particular the methods used by those organizations.
This sub-process focuses on consulting with the stakeholders and confirming in detail the needs for the statistics. A good understanding of user needs is required so that the statistical organization knows not only what it is expected to deliver, but also when, how, and, perhaps most importantly, why. For second and subsequent iterations of this phase, the main focus will be on determining whether previously identified needs have changed. This detailed understanding of user needs is the critical part of this sub-process.
This sub-process identifies the statistical outputs that are required to meet the user needs identified in sub-process 1.2 (Consult and confirm need). It includes agreeing the suitability of the proposed outputs and their quality measures with users.
This sub-process clarifies the required concepts to be measured by the business process from the point of view of the user. At this stage the concepts identified may not align with existing statistical standards. This alignment, and the choice or definition of the statistical concepts and variables to be used, takes place in sub-process 2.2.
This sub-process checks whether current data sources could meet user requirements, and the conditions under which they would be available, including any restrictions on their use. An assessment of possible alternatives would normally include research into potential administrative data sources and their methodologies, to determine whether they would be suitable for use for statistical purposes. When existing sources have been assessed, a strategy for filling any remaining gaps in the data requirement is prepared. This sub-process also includes a more general assessment of the legal framework in which data would be collected and used, and may therefore identify proposals for changes to existing legislation or the introduction of a new legal framework.
This sub-process documents the findings of the other sub-processes in this phase in the form of a business case to get approval to implement the new or modified statistical business process. Such a business case would typically also include:
26. This phase describes the development and design activities, and any associated practical research work needed to define the statistical outputs, concepts, methodologies, collection instruments and operational processes. For statistical outputs produced on a regular basis, this phase usually occurs for the first iteration, and whenever improvement actions are identified in phase 9 (Evaluate) of a previous iteration.
27. This phase is broken down into six sub-processes, which are generally sequential, from left to right, but can also occur in parallel, and can be iterative. These sub-processes are:
This sub-process contains the detailed design of the statistical outputs to be produced, including the related development work and preparation of the systems and tools used in phase 7 (Disseminate). Outputs should be designed, wherever possible, to follow existing standards, so inputs to this process may include metadata from similar or previous collections, international standards, and information about practices in other statistical organizations from sub-process 1.1 (Determine need for information).
This sub-process defines the statistical variables to be collected via the data collection instrument, as well as any other variables that will be derived from them in sub-process 5.5 (Derive new variables and statistical units), and any classifications that will be used. It is expected that existing national and international standards will be followed wherever possible. This sub-process may need to run in parallel with sub-process 2.3 (Design data collection methodology), as the definition of the variables to be collected, and the choice of data collection instrument may be inter-dependent to some degree. Preparation of metadata descriptions of collected and derived variables and classifications is a necessary precondition for subsequent phases.
This sub-process determines the most appropriate data collection method(s) and instrument(s). The actual activities in this sub-process will vary according to the type of collection instruments required, which can include computer assisted interviewing, paper questionnaires, administrative data interfaces and data integration techniques. This sub-process includes the design of questions and response templates (in conjunction with the variables and classifications designed in sub-process 2.2 (Design variable descriptions)). It also includes the design of any formal agreements relating to data supply, such as memoranda of understanding, and confirmation of the legal basis for the data collection. This sub-process is enabled by tools such as question libraries (to facilitate the reuse of questions and related attributes), questionnaire tools (to enable the quick and easy compilation of questions into formats suitable for cognitive testing) and agreement templates (to help standardize terms and conditions). This sub-process also includes the design of process-specific provider management systems.
This sub-process identifies and specifies the population of interest, defines a sampling frame (and, where necessary, the register from which it is derived), and determines the most appropriate sampling criteria and methodology (which could include complete enumeration). Common sources are administrative and statistical registers, censuses and sample surveys. This sub-process describes how these sources can be combined if needed. Analysis of whether the frame covers the target population should be performed. A sampling plan should be made. The actual sample is created in sub-process 4.1 (Select sample), using the methodology specified in this sub-process.
This sub-process designs the statistical processing methodology to be applied during phase 5 (Process) and phase 6 (Analyse). This can include specification of routines for coding, editing, imputing, estimating, integrating, validating and finalising data sets.
This sub-process determines the workflow from data collection to archiving, taking an overview of all the processes required within the whole statistical production process, and ensuring that they fit together efficiently with no gaps or redundancies. Various systems and databases are needed throughout the process. A general principle is to reuse processes and technology across many statistical business processes, so existing systems and databases should be examined first, to determine whether they are fit for purpose for this specific process; then, if any gaps are identified, new solutions should be designed. This sub-process also considers how staff will interact with systems, and who will be responsible for what and when.
28. This phase builds and tests the production systems to the point where they are ready for use in the "live" environment. For statistical outputs produced on a regular basis, this phase usually occurs for the first iteration, and following a review or a change in methodology, rather than for every iteration. It is broken down into six sub-processes, which are generally sequential, from left to right, but can also occur in parallel, and can be iterative. These sub-processes are:
This sub-process describes the activities to build the collection instruments to be used during phase 4 (Collect). The collection instrument is generated or built based on the design specifications created during phase 2 (Design). A collection may use one or more modes to receive the data, e.g. personal or telephone interviews; paper, electronic or web questionnaires; SDMX hubs. Collection instruments may also be data extraction routines used to gather data from existing statistical or administrative data sets. This sub-process also includes preparing and testing the contents and functioning of that instrument (e.g. testing the questions in a questionnaire). It is recommended to consider the direct connection of collection instruments to the statistical metadata system, so that metadata can be more easily captured in the collection phase. Connection of metadata and data at the point of capture can save work in later phases. Capturing the metrics of data collection (paradata) is also an important consideration in this sub-process.
This sub-process describes the activities to build new and enhance existing software components needed for the business process, as designed in phase 2 (Design). Components may include dashboard functions and features, data repositories, transformation tools, workflow framework components, provider and metadata management tools.
This sub-process configures the workflow, systems and transformations used within the statistical business processes, from data collection, right through to archiving the final statistical outputs. It ensures that the workflow specified in sub-process 2.6 (Processing system and workflow) works in practice.
This sub-process is concerned with the testing of computer systems and tools. It includes technical testing and sign-off of new programmes and routines, as well as confirmation that existing routines from other statistical business processes are suitable for use in this case. Whilst part of this activity concerning the testing of individual components could logically be linked with sub-process 3.2 (Build or enhance process components), this sub-process also includes testing of interactions between components, and ensuring that the production system works as a coherent set of components.
This sub-process describes the activities to manage a field test or pilot of the statistical business process. Typically it includes a small-scale data collection, to test collection instruments, followed by processing and analysis of the collected data, to ensure the statistical business process performs as expected. Following the pilot, it may be necessary to go back to a previous step and make adjustments to instruments, systems or components. For a major statistical business process, e.g. a population census, there may be several iterations until the process is working satisfactorily.
This sub-process includes the activities to put the process, including workflow systems, modified and newly-built components into production ready for use by business areas. The activities include:
29. This phase collects all necessary data, using different collection modes (including extractions from administrative and statistical registers and databases), and loads them into the appropriate data environment. It does not include any transformations of collected data, as these are all done in phase 5 (Process). For statistical outputs produced regularly, this phase occurs in each iteration.
30. The Collect phase is broken down into four sub-processes, which are generally sequential, from left to right, but can also occur in parallel, and can be iterative. These sub-processes are:
This sub-process establishes the frame and selects the sample for this iteration of the collection, as specified in sub-process 2.4 (Design frame and sample methodology). It also includes the coordination of samples between instances of the same statistical business process (for example to manage overlap or rotation), and between different processes using a common frame or register (for example to manage overlap or to spread response burden). Quality assurance, approval and maintenance of the frame and the selected sample are also undertaken in this sub-process, though maintenance of underlying registers, from which frames for several statistical business processes are drawn, is treated as a separate business process. The sampling aspect of this sub-process is not usually relevant for processes based entirely on the use of pre-existing data sources (e.g. administrative data) as such processes generally create frames from the available data and then follow a census approach.
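As an illustration only, the sample selection step could be sketched as follows. The sketch assumes simple random sampling without replacement, which is just one of the methodologies that sub-process 2.4 might specify; the function name `select_sample`, the frame of numeric unit identifiers and the seed value are all hypothetical.

```python
import random


def select_sample(frame, sample_size, seed):
    """Draw a simple random sample without replacement from a frame.

    `frame` is a list of unit identifiers. A fixed seed makes the
    selection reproducible, which helps with quality assurance and
    approval of the selected sample.
    """
    if sample_size > len(frame):
        raise ValueError("sample size exceeds frame size")
    rng = random.Random(seed)
    return sorted(rng.sample(frame, sample_size))


# Hypothetical frame of 100 units, sample of 10.
frame = list(range(1, 101))
sample = select_sample(frame, 10, seed=2009)
```

Recording the seed alongside the frame version would allow the same sample to be re-drawn later, for example when coordinating samples between iterations.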
This sub-process ensures that the people, processes and technology are ready to collect data, in all modes as designed. It takes place over a period of time, as it includes the strategy, planning and training activities in preparation for the specific instance of the statistical business process. Where the process is repeated regularly, some (or all) of these activities may not be explicitly required for each iteration. For one-off and new processes, these activities can be lengthy. This sub-process includes:
This sub-process is where the collection is implemented, with the different collection instruments being used to collect the data. It includes the initial contact with providers and any subsequent follow-up or reminder actions. It records when and how providers were contacted, and whether they have responded. This sub-process also includes the management of the providers involved in the current collection, ensuring that the relationship between the statistical organization and data providers remains positive, and recording and responding to comments, queries and complaints. For administrative data, this process is brief: the provider is either contacted to send the data, or sends it as scheduled. When the collection meets its targets (usually based on response rates) the collection is closed and a report on the collection is produced.
This sub-process includes loading the collected data and metadata into a suitable electronic environment for further processing in phase 5 (Process). It may include automatic data take-on, for example using optical character recognition tools to extract data from paper questionnaires, or converting the formats of data files received from other organizations. In cases where there is a physical data collection instrument, such as a paper questionnaire, which is not needed for further processing, this sub-process manages the archiving of that material in conformance with the principles established in phase 8 (Archive).
31. This phase describes the cleaning of data records and their preparation for analysis. It is made up of sub-processes that check, clean, and transform the collected data, and may be repeated several times. For statistical outputs produced regularly, this phase occurs in each iteration. The sub-processes in this phase can apply to data from both statistical and non-statistical sources (with the possible exception of sub-process 5.6 (Calculate weights), which is usually specific to survey data).
32. The "Process" and "Analyse" phases can be iterative and parallel. Analysis can reveal a broader understanding of the data, which might make it apparent that additional processing is needed. Activities within the "Process" and "Analyse" phases may commence before the "Collect" phase is completed. This enables the compilation of provisional results where timeliness is an important concern for users, and increases the time available for analysis. The key difference between these phases is that "Process" concerns transformations of microdata, whereas "Analyse" concerns the further treatment of statistical aggregates.
33. This phase is broken down into eight sub-processes, which may be sequential, from left to right, but can also occur in parallel, and can be iterative. These sub-processes are:
This sub-process integrates data from one or more sources. The input data can be from a mixture of external or internal data sources, and a variety of collection modes, including extracts of administrative data. The result is a harmonized data set. Data integration typically includes:
Data integration may take place at any point in this phase, before or after any of the other sub-processes. There may also be several instances of data integration in any statistical business process. Following integration, depending on data protection requirements, data may be anonymized, that is, stripped of identifiers such as name and address, to help to protect confidentiality.
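A minimal sketch of the anonymization step described above, assuming records are held as dictionaries. The field names in `DIRECT_IDENTIFIERS` and the function name `anonymize` are hypothetical, and removing direct identifiers alone may not be sufficient to meet a given organization's data protection requirements.

```python
# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = frozenset({"name", "address"})


def anonymize(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of the record with direct identifiers removed."""
    return {key: value for key, value in record.items() if key not in identifiers}


integrated = {"name": "A. Person", "address": "1 Main Street", "age": 42, "region": "N"}
anonymous = anonymize(integrated)
```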
This sub-process classifies and codes the input data. For example, automatic (or clerical) coding routines may assign numeric codes to text responses according to a pre-determined classification scheme.
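The following sketch shows one simple form an automatic coding routine might take: exact matching of normalized text against a lookup table, with unmatched responses set aside for clerical coding. The classification scheme `OCCUPATION_CODES`, its codes and the function names are invented for illustration and do not correspond to any real classification.

```python
# Hypothetical classification scheme: normalized text -> numeric code.
OCCUPATION_CODES = {"teacher": 23, "nurse": 32, "farmer": 61}


def code_response(text, scheme):
    """Return the numeric code for a text response, or None if no match."""
    return scheme.get(text.strip().lower())


def code_responses(responses, scheme):
    """Split responses into automatically coded pairs and a clerical queue."""
    coded, clerical = [], []
    for response in responses:
        code = code_response(response, scheme)
        if code is None:
            clerical.append(response)  # needs manual (clerical) coding
        else:
            coded.append((response, code))
    return coded, clerical


coded, clerical = code_responses([" Teacher", "nurse ", "astronaut"], OCCUPATION_CODES)
```

Here "astronaut" has no entry in the assumed scheme, so it is routed to the clerical queue rather than being given a guessed code.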
This sub-process applies to collected micro-data, and looks at each record to try to identify (and where necessary correct) potential problems, errors and discrepancies such as outliers, item non-response and miscoding. It can also be referred to as input data validation. It may be run iteratively, validating data against predefined edit rules, usually in a set order. It may apply automatic edits, or raise alerts for manual inspection and correction of the data. Reviewing, validating and editing can apply to unit records both from surveys and administrative sources, before and after integration. In certain cases, imputation (sub-process 5.4) may be used as a form of editing.
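As a sketch of how predefined edit rules might be applied in a set order, the example below runs each rule against a record and either applies an automatic edit or raises an alert for manual inspection. The rule names, thresholds and the automatic correction are hypothetical and chosen only to show the mechanism.

```python
def run_edits(record, rules):
    """Apply edit rules in order.

    Each rule is a tuple (name, check, fix). If `check` fails and `fix`
    is given, the automatic edit is applied; if `fix` is None, the rule
    name is added to the alerts for manual inspection.
    """
    record = dict(record)  # work on a copy, keep the input unchanged
    alerts = []
    for name, check, fix in rules:
        if check(record):
            continue
        if fix is not None:
            record = fix(record)
        else:
            alerts.append(name)
    return record, alerts


# Hypothetical edit rules for a person record.
RULES = [
    ("age_in_range",
     lambda r: r["age"] is not None and 0 <= r["age"] <= 120,
     None),  # no safe automatic edit: raise an alert
    ("hours_non_negative",
     lambda r: r["hours"] >= 0,
     lambda r: {**r, "hours": abs(r["hours"])}),  # assumed sign error
]

edited, alerts = run_edits({"age": 130, "hours": -40}, RULES)
```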
Where data are missing or unreliable, estimates may be imputed, often using a rule-based approach. Specific steps typically include:
This sub-process derives (values for) variables and statistical units that are not explicitly provided in the collection, but are needed to deliver the required outputs. It derives new variables by applying arithmetic formulae to one or more of the variables that are already present in the dataset. This may need to be iterative, as some derived variables may themselves be based on other derived variables. It is therefore important to ensure that variables are derived in the correct order. New statistical units may be derived by aggregating or splitting data for collection units, or by various other estimation methods. Examples include deriving households where the collection units are persons, or enterprises where the collection units are legal units.
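One way to guarantee that variables are derived in the correct order is to sort the derivations by their dependencies before evaluating them. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the variable names and formulae in `DERIVATIONS` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical derivations: name -> (input variable names, formula).
# They are deliberately listed out of order.
DERIVATIONS = {
    "income_per_head": (
        ("total_income", "household_size"),
        lambda total_income, household_size: total_income / household_size,
    ),
    "total_income": (
        ("wages", "benefits"),
        lambda wages, benefits: wages + benefits,
    ),
}


def derive(record, derivations):
    """Add derived variables to a record, respecting their dependencies."""
    # Map each derived variable to the set of variables it depends on.
    graph = {name: set(inputs) for name, (inputs, _) in derivations.items()}
    result = dict(record)
    for name in TopologicalSorter(graph).static_order():
        if name in derivations:  # collected variables need no derivation
            inputs, formula = derivations[name]
            result[name] = formula(*(result[i] for i in inputs))
    return result


derived = derive({"wages": 300, "benefits": 100, "household_size": 4}, DERIVATIONS)
```

If two derivations depend on each other, `static_order()` raises `graphlib.CycleError`, which surfaces a specification error instead of silently producing wrong values.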
This sub-process creates weights for unit data records according to the methodology created in sub-process 2.5 (Design statistical processing methodology). These weights can be used to "gross-up" sample survey results to make them representative of the target population, or to adjust for non-response in total enumerations.
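As a hedged illustration, the sketch below computes two common kinds of weight: a simple expansion weight (population size divided by sample size, which assumes equal selection probabilities within a stratum) and a non-response adjustment that scales the weight by the ratio of selected to responding units. The actual weighting method is whatever was designed in sub-process 2.5; the function names and figures here are hypothetical.

```python
def design_weight(population_size, sample_size):
    """Expansion weight under equal-probability sampling within a stratum."""
    return population_size / sample_size


def nonresponse_adjusted_weight(base_weight, units_selected, units_responding):
    """Scale a base weight so that respondents also represent non-respondents."""
    return base_weight * units_selected / units_responding


# Sample survey: stratum of 1000 units, 50 sampled, 40 responded.
base = design_weight(1000, 50)
adjusted = nonresponse_adjusted_weight(base, 50, 40)

# Total enumeration: base weight 1, 200 units approached, 160 responded.
census_weight = nonresponse_adjusted_weight(1.0, 200, 160)
```

In the sample survey case the 40 respondents each carry a weight of 25, so the weights sum to the stratum population of 1000.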
This sub-process creates aggregate data and population totals from micro-data. It includes summing data for records sharing certain characteristics, determining measures of average and dispersion, and applying weights from sub-process 5.6 to sample survey data to derive population totals.
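The aggregation described above can be sketched as a weighted sum over records sharing a characteristic, plus a weighted mean as one measure of average. The record layout, the field names (`region`, `turnover`, `weight`) and the values are hypothetical.

```python
from collections import defaultdict


def weighted_totals(records, group_key, value_key, weight_key="weight"):
    """Estimate population totals per group as the sum of weight * value."""
    totals = defaultdict(float)
    for record in records:
        totals[record[group_key]] += record[weight_key] * record[value_key]
    return dict(totals)


def weighted_mean(records, value_key, weight_key="weight"):
    """Weighted average of a variable across all records."""
    weight_sum = sum(r[weight_key] for r in records)
    return sum(r[weight_key] * r[value_key] for r in records) / weight_sum


# Hypothetical micro-data with weights already calculated.
micro_data = [
    {"region": "N", "turnover": 10, "weight": 2.0},
    {"region": "N", "turnover": 5, "weight": 4.0},
    {"region": "S", "turnover": 7, "weight": 1.0},
]
totals = weighted_totals(micro_data, "region", "turnover")
mean_turnover = weighted_mean(micro_data, "turnover")
```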
This sub-process brings together the results of the other sub-processes in this phase and results in a data file (usually of macro-data), which is used as the input to phase 6 (Analyse). Sometimes this may be an intermediate rather than a final file, particularly for business processes where there are strong time pressures, and a requirement to produce both preliminary and final estimates.
34. In this phase, statistics are produced, examined in detail and made ready for dissemination. This phase includes the sub-processes and activities that enable statistical analysts to understand the statistics produced. For statistical outputs produced regularly, this phase occurs in every iteration. The Analyse phase and sub-processes are generic for all statistical outputs, regardless of how the data were sourced.
35. The Analyse phase is broken down into five sub-processes, which are generally sequential, from left to right, but can also occur in parallel, and can be iterative. The sub-processes are:
This sub-process is where the data collected are transformed into statistical outputs. It includes the production of additional measurements such as indices, trends or seasonally adjusted series, as well as the recording of quality characteristics.
This sub-process is where statisticians validate the quality of the outputs produced, in accordance with a general quality framework and with expectations. This sub-process also includes activities involved with the gathering of intelligence, with the cumulative effect of building up a body of knowledge about a specific statistical domain. This knowledge is then applied to the current collection, in the current environment, to identify any divergence from expectations and to allow informed analyses. Validation activities can include:
This sub-process is where the in-depth understanding of the outputs is gained by statisticians. They use that understanding to scrutinize and explain the statistics produced for this cycle by assessing how well the statistics reflect their initial expectations, viewing the statistics from all perspectives using different tools and media, and carrying out in-depth statistical analyses.
This sub-process ensures that the data (and metadata) to be disseminated do not breach the appropriate rules on confidentiality. This may include checks for primary and secondary disclosure, as well as the application of data suppression or perturbation techniques.
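As a simple illustration of primary suppression, the sketch below hides table cells whose counts fall below a minimum threshold. The threshold of 3, the decision to leave zero cells visible, and the function name are assumptions made for the example; real confidentiality rules are set by each organization. The sketch does not address secondary disclosure, where a suppressed cell could still be recovered from row or column totals.

```python
def suppress_small_cells(table, threshold=3, marker=None):
    """Replace counts below `threshold` (but above zero) with `marker`."""
    return {
        cell: (marker if 0 < count < threshold else count)
        for cell, count in table.items()
    }


# Hypothetical frequency table keyed by (region, industry).
counts = {
    ("north", "farming"): 120,
    ("north", "mining"): 2,
    ("south", "mining"): 0,
}
safe = suppress_small_cells(counts)
```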
This sub-process ensures the statistics and associated information are fit for purpose and reach the required quality level, and are thus ready for use. It includes:
36. This phase manages the release of the statistical products to customers. For statistical outputs produced regularly, this phase occurs in each iteration. It is made up of five sub-processes, which are generally sequential, from left to right, but can also occur in parallel, and can be iterative. These sub-processes are:
This sub-process manages the update of systems where data and metadata are stored for dissemination purposes, including:
Note: formatting, loading and linking of metadata should preferably take place mostly in earlier phases, but this sub-process includes a check that all of the necessary metadata are in place ready for dissemination.
This sub-process produces the products, as previously designed (in sub-process 2.1), to meet user needs. The products can take many forms including printed publications, press releases and web sites. Typical steps include:
This sub-process ensures that all elements for the release are in place, including managing the timing of the release. It includes briefings for specific groups such as the press or ministers, as well as the arrangements for any pre-release embargoes. It also includes the provision of products to subscribers.
Whilst marketing in general can be considered to be an over-arching process, this sub-process concerns the active promotion of the statistical products produced in a specific statistical business process, to help them reach the widest possible audience. It includes the use of customer relationship management tools, to better target potential users of the products, as well as the use of tools including web sites, wikis and blogs to facilitate the process of communicating statistical information to users.
This sub-process ensures that customer queries are recorded, and that responses are provided within agreed deadlines. These queries should be regularly reviewed to provide an input to the over-arching quality management process, as they can indicate new or changing user needs.
37. This phase manages the archiving and disposal of statistical data and metadata. Given the reduced costs of data storage, it is possible that the archiving strategy adopted by a statistical organization does not include provision for disposal, so the final sub-process may not be relevant for all statistical business processes. In other cases, disposal may be limited to intermediate files from previous iterations, rather than disseminated data.
38. For statistical outputs produced regularly, archiving occurs in each iteration; however, defining the archiving rules is likely to occur less regularly. This phase is made up of four sub-processes, which are generally sequential, from left to right, but can also occur in parallel, and can be iterative. These sub-processes are:
This sub-process is where the archiving rules for the statistical data and metadata resulting from a statistical business process are determined. The requirement to archive intermediate outputs such as the sample file, the raw data from the collect phase, and the results of the various stages of the process and analyse phases should also be considered. The archive rules for a specific statistical business process may be fully or partly dependent on the more general archiving policy of the statistical organization, or, for national organizations, on standards applied across the government sector. The rules should include consideration of the medium and location of the archive, as well as the requirement for keeping duplicate copies. They should also consider the conditions (if any) under which data and metadata should be disposed of. (Note - this sub-process is logically strongly linked to Phase 2 - Design, at least for the first iteration of a statistical business process).
This sub-process concerns the management of one or more archive repositories. These may be databases, or may be physical locations where copies of data or metadata are stored. It includes:
This sub-process may cover a specific statistical business process or a group of processes, depending on the degree of standardization within the organization. Ultimately it may even be considered to be an over-arching process if organization-wide standards are put in place.
This sub-process is where the data and metadata from a specific statistical business process are archived. It includes:
This sub-process is where the data and metadata from a specific statistical business process are disposed of. It includes:
39. This phase manages the evaluation of a specific instance of a statistical business process, as opposed to the more general over-arching process of statistical quality management described in Section VI. It logically takes place at the end of the instance of the process, but relies on inputs gathered throughout the different phases. For statistical outputs produced regularly, evaluation should, at least in theory, occur for each iteration, determining whether future iterations should take place, and if so, whether any improvements should be implemented. However, in some cases, particularly for regular and well established statistical business processes, evaluation may not be formally carried out for each iteration. In such cases, this phase can be seen as providing the decision as to whether the next iteration should start from phase 1 (Specify needs) or from some later phase (often phase 4 (Collect)).
40. This phase is made up of three sub-processes, which are generally sequential, from left to right, but which can overlap to some extent in practice. These sub-processes are:
Evaluation material can be produced in any other phase or sub-process. It may take many forms, including feedback from users, process metadata, system metrics and staff suggestions. Reports of progress against an action plan agreed during a previous iteration may also form an input to evaluations of subsequent iterations. This sub-process gathers all of these inputs, and makes them available for the person or team producing the evaluation.
This sub-process analyzes the evaluation inputs and synthesizes them into an evaluation report. The resulting report should note any quality issues specific to this iteration of the statistical business process, and should make recommendations for changes if appropriate. These recommendations can cover changes to any phase or sub-process for future iterations of the process, or can suggest that the process is not repeated.
This sub-process brings together the necessary decision-making power to form and agree an action plan based on the evaluation report. It should also include consideration of a mechanism for monitoring the impact of those actions, which may, in turn, provide an input to evaluations of future iterations of the process.
41. This process is present throughout the model. It is closely linked to Phase 9 (Evaluate), which has the specific role of evaluating individual instances of a statistical business process. The over-arching quality management process, however, has both a deeper and broader scope. As well as evaluating iterations of a process, it is also necessary to evaluate separate phases and sub-processes, ideally each time they are applied, but at least according to an agreed schedule. Metadata generated by the different sub-processes themselves are also of interest as an input for process quality management. These evaluations can apply within a specific process, or across several processes that use common components.
42. Quality management also involves the evaluation of groups of statistical business processes, and can therefore identify potential duplication or gaps. All evaluations should result in feedback, which should be used to improve the relevant process, phase or sub-process, creating a quality loop.
43. Quality management can take several forms, including:
44. Evaluation will normally take place within an organization-specific quality framework, and may therefore take different forms and deliver different results within different organizations. There is, however, general agreement amongst statistical organizations that quality should be defined according to the ISO 9000:2005 standard: "The degree to which a set of inherent characteristics fulfils requirements."
45. Quality is therefore a multi-faceted, user-driven concept. The dimensions of quality that are considered most important depend on user perspectives, needs and priorities, which vary between processes and across groups of users. Several statistical organizations have developed lists of quality dimensions, which, for international organizations, are being harmonized under the leadership of the Committee for the Coordination of Statistical Activities (CCSA).
46. The current multiplicity of quality frameworks enhances the importance of the benchmarking and peer review approaches to evaluation, and whilst these approaches are unlikely to be feasible for every iteration of every part of every statistical business process, they should be used in a systematic way according to a pre-determined schedule that allows for the review of all main parts of the process within a specified time period.
47. Good metadata management is essential for the efficient operation of statistical business processes. Metadata are present in every phase, either created or carried forward from a previous phase. In the context of this model, the emphasis of the over-arching process of metadata management is on the creation and use of statistical metadata, though metadata on the different sub-processes themselves are also of interest, including as an input for quality management. The key challenge is to ensure that these metadata are captured as early as possible, and stored and transferred from phase to phase alongside the data they refer to. Metadata management strategy and systems are therefore vital to the operation of this model.
48. Part A of the Common Metadata Framework identifies the following sixteen core principles for metadata management, all of which are intended to be covered in the over-arching Metadata Management process, and taken into consideration when preparing the statistical metadata system (SMS) vision and global architecture, and when implementing the SMS. The principles can be presented in the following groups:
i. Statistical Business Process Model: Manage metadata with a focus on the overall statistical business
i. Registration: Ensure the registration process (workflow) associated with each metadata element is well documented so there is clear identification of ownership, approval status, date of operation, etc.
Relationship to Statistical Cycle / Processes
i. Integrity: Make metadata-related work an integral part of business processes across the organization
i. Identify users: Ensure that users are clearly identified for all metadata processes, and that all metadata capturing will create value for them.
As stated in the section on the purpose of the GSBPM, the original aim of the work to develop this model was that it should provide a basis for statistical organizations to agree on standard terminology to aid their discussions on developing statistical metadata systems and processes. However, as the model has developed, it has become increasingly apparent that it can be used for other purposes. This has been confirmed by Statistics New Zealand, who have either applied, or plan to apply, their national version of the model in several different areas. The list below aims to highlight potential rather than recommended uses, and to inspire further ideas on how the GSBPM can be used in practice.
1. Harmonizing statistical computing architectures - The GSBPM can be seen as a model for an operational view of statistical computing architecture. It identifies the key components of the statistical business process, promotes standard terminology and standard ways of working across statistical business processes. The potential of the GSBPM as a model for statistical computing architectures will be evaluated further in the proposed European Union "ESSNet" project on a Common Reference Architecture during 2009.
2. Facilitating the sharing of statistical software - Linked to the point above, the GSBPM defines the components of statistical processes in a way that not only encourages the sharing of software tools between statistical business processes, but also facilitates sharing between different statistical organizations that apply the model. It therefore provides an input to the "Sharing Advisory Board", being created under the auspices of the UNECE / Eurostat / OECD Work Sessions on the Management of Statistical Information Systems.
3. Providing a basis for explaining the use of SDMX in a statistical organization in the Statistical Data and Metadata eXchange (SDMX) User Guide[14]. Chapter A2 of this user guide explores how SDMX applies to statistical work in the context of a business process model.
4. Providing a framework for process quality assessment and improvement - If a benchmarking approach to process quality assessment is to be successful, it is necessary to standardize processes as much as possible. The GSBPM provides a mechanism to facilitate this.
5. Better integrating work on statistical metadata and quality - Linked to the previous point, the common framework provided by the GSBPM can help to integrate international work on statistical metadata with that on data quality by providing a common framework and common terminology to describe the statistical business process.
6. Providing the underlying model for methodological standards frameworks - Methodological standards can be linked to the phase(s) or sub-process(es) they relate to and can then be classified and stored in a structure based on the GSBPM.
7. Providing a structure for storage of documents - As well as a framework for methodological standards, the GSBPM can also provide a structure for organizing and storing other documents within an organization, in conjunction with document management software tools. It can provide a basic document storage classification that allows clear links between documents and the parts of the statistical business process they relate to.
8. Providing a framework for building organizational capability - The GSBPM can be used to develop a framework to assess the knowledge and capability that already exists within an organization, and to identify the gaps that need to be filled to improve operational efficiency.
9. Providing an input to high-level corporate work planning - The national business process model developed by Statistics New Zealand has been used as an input when preparing a high-level survey programme.
10. Developing a business process model repository - Statistics New Zealand has developed a database to store process modelling outputs and allow them to be linked to their statistical business process model. They also plan to develop a Business Process Modelling Community of Practice - i.e. a regular forum to build knowledge of process modelling, to promote their business process model and increase understanding of it, and to discuss process modelling and models as enablers for process improvement.
11. Measuring operational costs - The GSBPM could conceivably be used as a basis for measuring the costs of different parts of the statistical business process. This, in turn, could help target development work to improve the efficiency of the parts of the process that are most costly.
12. Measuring system performance - Related to the point above on costs, the GSBPM can also be used to identify components that are not performing efficiently, that are duplicating each other unnecessarily, or that require replacing. Similarly it can identify gaps for which new components should be developed.
Note - this short glossary just covers some of the key terms and abbreviations used in this paper. For a more comprehensive glossary of terms related to the statistical production process see the SDMX Metadata Common Vocabulary - http://sdmx.org/?page_id=11.
CMF - Common Metadata Framework: The need for a common metadata framework emerged from discussions in international forums. The joint UNECE / Eurostat / OECD Group on Statistical Metadata (METIS) is coordinating the work to develop this framework. The aim is to organize the vast pool of information about statistical metadata into a common framework for use by national and international statistical organizations. See http://www.unece.org/stats/cmf/.
Collection / Data collection - A systematic process of gathering data for official statistics.
(Source SDMX Metadata Common Vocabulary, 2009)
For the purposes of this model, "collection" therefore includes obtaining data from administrative sources, as well as the more traditional data collection through surveys and censuses.
GSBPM - Generic Statistical Business Process Model: A flexible tool to describe and define the set of business processes needed to produce official statistics.
METIS - The joint UNECE / Eurostat / OECD Group on Statistical Metadata.
Over-arching process - Processes that apply throughout and across statistical business processes. They can be grouped into two categories: those that have a statistical component, and those that are more general, and could apply to any sort of organization.
SDMX - A set of technical standards and content-oriented guidelines, together with an IT architecture and tools, to be used for the efficient exchange and sharing of statistical data and metadata.
(Source SDMX Metadata Common Vocabulary, 2009)
Statistical business process - The complete set of sub-processes needed to support statistical production.
(Source SDMX Metadata Common Vocabulary, 2009)
Statistical metadata system - A data processing system that uses, stores and produces statistical metadata.
(Source SDMX Metadata Common Vocabulary, 2009)
1. Prepared by Steven Vale (firstname.lastname@example.org), based on previous work by Statistics New Zealand (for the first seven phases) and Statistics Canada (for the Archive phase), with considerable input and feedback from the members of the METIS group.
2. See: http://www.unece.org/stats/cmf/
3. The papers from this Workshop are available at: http://www.unece.org/stats/documents/2007.07.metis.htm
4. See: http://www.unece.org/stats/documents/ece/ces/ge.40/2008/wp.17.e.pdf
5. See: http://www.unece.org/stats/documents/2009.03.metis.htm
6. Though examples from Australia and Norway can be found at the following addresses:
7. See http://ww=
8. http://www.unece.org/stats/documents/ece/ces/ge.40/2008/zip.9.e.pdf
9. ISO 9000:2005, Quality management systems - Fundamentals and vocabulary. International Organization for Standardization
10. Current organization-specific quality frameworks, containing lists of dimensions, exist for:
OECD: http://www.oecd.org/dataoecd/26/38/21687665.pdf
Eurostat: http://epp.eurostat.ec.europa.eu/portal/page/portal/quality/documents/ess%20quality%20definition.pdf
11. See: http://www.unece.org/stats/cmf/PartA.html
12. http://circa.europa.eu/Public/irc/dsis/itsteer/library?l=/directors_13-14/proposal_essnetdoc/_EN_1.0_&a=d
13. As proposed in the report of the MSIS Task Force on Software Sharing: http://www.unece.org/stats/documents/ece/ces/ge.50/2008/crp.2.e.doc
14. See: http://sdmx.org/index.php?page_id=38, 2009 version