Governments in many countries are developing policies to make all sorts of official data more easily available to the public. The growing political focus on "open data" raises many challenges for official statistics, cutting across all domains, in terms of both inputs and outputs.
The growing availability of open data as an input raises strategic and methodological issues around the acquisition, management, processing and linking of new and often large data sets. Other strategic issues include how to respond to the threat of competition from private-sector and research organisations that are gaining access to micro-data which, in some cases, have not been available even to statistical organisations until now.
On the output side, there is growing pressure to release more statistical data, including micro-data, as open data. Services that once generated revenue for statistical organisations are now required to be provided for free. New policies and methodologies are needed to manage the release of open data, whilst ensuring compliance with the Fundamental Principles of Official Statistics, particularly with respect to data confidentiality.
In response to these issues, the CES Bureau intends to conduct an in-depth review into the impact of open data on official statistics during 2012/2013.
To stimulate discussion on this topic, MSIS 2012 included a lively panel discussion on open data.
Public Sector Information (PSI) in Europe: http://epsiplatform.eu/
OECD Recommendation on PSI principles: it provides policy guidelines designed to improve access to and increase use of public sector information through greater transparency, enhanced competition and more competitive pricing. It was adopted by the OECD Council in April 2008 and sets out the following principles:
- Openness. Maximize the availability of public sector information for use and re-use - openness as the default rule.
- Access and transparent conditions for re-use. In principle all accessible information would be open to re-use by all.
- Asset lists. Strengthening awareness of what public sector information is available for access and re-use.
- Quality. Ensuring methodical practices to enhance data quality, including through cooperation between the various government bodies involved.
- Integrity. Protecting information from unauthorized modification or from denial of authorized access.
- New technologies and long-term preservation. Improving storage technologies, using open formats, supporting access in multiple languages, and addressing technological obsolescence and long-term preservation.
- Copyright. Intellectual property rights should be respected, and copyright should be exercised in ways that facilitate re-use. Approaches range from copyright held by governments or private entities to public sector information being copyright-free.
- Pricing. PSI should be provided free of charge or, where it is not, priced as transparently as possible.
- Competition. PSI should be open to all possible users and re-users on non-exclusive terms.
- Redress mechanisms. Providing transparent complaints and appeals processes.
- Public-private partnerships. Facilitating public-private partnerships, where appropriate, to make public sector information available.
- International access and use. Supporting international co-operation for commercial re-use and non-commercial use.
- Best practices. Encouraging the wide sharing of best practices and exchange of information on implementation, training, copyright and monitoring.
One of the first acts of Barack Obama as U.S. president (January 2009) was the "Memorandum on Transparency and Open Government", which begins:
“My Administration is committed to creating an unprecedented level of openness in Government. We will work together to ensure the public trust and establish a system of transparency, public participation, and collaboration. Openness will strengthen our democracy and promote efficiency and effectiveness in Government.”
In December 2009 the Open Government Directive was issued to implement the Memorandum. The Directive set precise deadlines and requirements for federal executive departments and agencies to put the Memorandum's principles into practice, including publishing high-value data sets online in open formats. Also in 2009, the first version of data.gov was released.
In his Linked Data design note (first published in 2006), Tim Berners-Lee (TBL) placed open data on a continuum of web publishing practices, each rated with gold stars like the ones you got at school.
Here they are:
★ make your stuff available on the web (in whatever format) under an open licence, to be Open Data
★★ make it available as structured data (e.g. Excel instead of an image scan of a table)
★★★ use a non-proprietary format (e.g. CSV instead of Excel)
★★★★ use URIs to identify things, so that people can point at your stuff
★★★★★ link your data to other people's data to provide context
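The five levels can be read as a cumulative checklist, where each star presupposes the ones before it. The sketch below encodes that reading. The `Publication` class, its field names and the `star_rating` helper are hypothetical. Treating the levels as strictly cumulative is our assumption about how the scheme is applied.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical yes/no description of how a dataset is published."""
    open_license: bool     # 1 star: on the web under an open license
    structured: bool       # 2 stars: structured data, not an image scan
    non_proprietary: bool  # 3 stars: non-proprietary format such as CSV
    uses_uris: bool        # 4 stars: URIs identify the things described
    linked: bool           # 5 stars: linked to other people's data


def star_rating(pub: Publication) -> int:
    """Count stars, stopping at the first unmet level (levels are cumulative)."""
    levels = [pub.open_license, pub.structured, pub.non_proprietary,
              pub.uses_uris, pub.linked]
    stars = 0
    for met in levels:
        if not met:
            break
        stars += 1
    return stars


# A CSV file under an open license, with no URIs and no links.
csv_release = Publication(True, True, True, False, False)
print(star_rating(csv_release))  # prints 3
```

Under this reading, a linked dataset published without an open license still scores zero stars, because the first level is not met.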
Software for Open Data:
- CKAN (Comprehensive Knowledge Archive Network) is open-source "data hub" software designed to make it easier to find, share, reuse and collaboratively develop open data and content
Used by the UK (data.gov.uk) and many other sites, CKAN is sponsored by the Open Knowledge Foundation (OKFN).
- The Data.gov code was released as open source (it is a modified version of Drupal) and was also adopted by India as the Open Government Platform. In January 2013 Data.gov announced that it would move to the CKAN software.
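CKAN makes its catalogue available to programs through an HTTP "Action API". The following minimal sketch builds a `package_search` query URL and extracts dataset titles from the JSON reply. It assumes the following details of CKAN's API version 3, which should be checked against the CKAN documentation:

- the path `/api/3/action/package_search`
- the `q` and `rows` parameters
- the `{"success": ..., "result": {"results": [...]}}` reply shape

The portal address and the helper names are illustrative.

```python
import json
from urllib.parse import urlencode


def package_search_url(portal: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API (v3) package_search URL for the given portal."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_body: str) -> list[str]:
    """Return the dataset titles (or names, if untitled) from a package_search reply."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [pkg.get("title") or pkg.get("name", "")
            for pkg in payload["result"]["results"]]


# demo.ckan.org is used only as an example portal address.
print(package_search_url("https://demo.ckan.org", "population", rows=5))
# prints https://demo.ckan.org/api/3/action/package_search?q=population&rows=5
```

To query a live portal, pass the URL to `urllib.request.urlopen`, decode the body, and give it to `dataset_titles`. Keeping the network call outside the helpers makes them easy to test offline.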
The Open Knowledge Foundation produced "The Open Data Handbook", which introduces the legal, social and technical aspects of open data. It can be used by anyone but is especially useful for those working with government data. It discusses the why, what and how of open data: why to go open, what open is, and how to do open.
One of the best websites presenting Open Data related to Statistics is the World Bank Data site.
On the website you can find:
- more than 8,000 time series that users can select by country, by indicator or by "database". Databases are a kind of data hypercube in which users can select the dimensions, filters, countries and time periods of interest. Users can also save new queries and visualizations, which remain visible to other users.
- more than 850 datasets of financial data, explaining where the World Bank disbursed money, which global funds the WB manages, etc. Users can explore raw data about the World Bank's finances, slice and dice datasets, visualize the data, and share it with other site users or through social networks.
- a database providing detailed information on over 11,000 WB lending projects in over 100 countries from 1947 onwards
- microdata from more than 700 surveys, held in catalogs maintained by the World Bank and by a number of contributing external catalogs
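The same time series can also be retrieved by a program rather than through the web interface. The minimal sketch below rests on the following assumptions, which should be checked against the API documentation before use:

- It uses the World Bank's public Indicators API (version 2).
- URLs take the form `https://api.worldbank.org/v2/country/{country}/indicator/{indicator}`.
- The JSON reply is a two-element array holding paging metadata and then the data rows.

`SP.POP.TOTL` (total population) is used only as an example indicator code, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

API_ROOT = "https://api.worldbank.org/v2"  # assumed API base URL


def indicator_url(country: str, indicator: str, start: int, end: int) -> str:
    """Build an Indicators API URL for one country, one indicator and a year range."""
    params = urlencode({"format": "json", "date": f"{start}:{end}", "per_page": 1000},
                       safe=":")
    return f"{API_ROOT}/country/{country}/indicator/{indicator}?{params}"


def to_series(response_body: str) -> dict[int, float]:
    """Turn the [paging metadata, data rows] reply into a {year: value} series."""
    payload = json.loads(response_body)
    if len(payload) < 2:
        # Assumption: error replies contain only a message object, no data rows.
        raise ValueError(f"no data returned: {payload}")
    _paging, rows = payload
    return {int(row["date"]): row["value"]
            for row in rows or [] if row["value"] is not None}


# Total population of Italy, 2000-2010 (example indicator code SP.POP.TOTL).
print(indicator_url("ITA", "SP.POP.TOTL", 2000, 2010))
# prints https://api.worldbank.org/v2/country/ITA/indicator/SP.POP.TOTL?format=json&date=2000:2010&per_page=1000
```

Missing observations (null values) are dropped from the series. Queries covering many countries or years may span several pages, as reported in the paging metadata.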
In the section "The World at a Glance" users find key development indicators from the WB, presented in tables and charts.
The website also hosts an "Open Government Data Toolkit" to help countries develop their Open Data strategy, suggesting policies and scenarios for different situations.
Recently the WB also released a couple of apps to show data on smartphone platforms.
The three laws of open government data, formulated by David Eaves, are:
1. If it can’t be spidered or indexed, it doesn’t exist.
2. If it isn’t available in open and machine-readable format, it can’t engage.
3. If a legal framework doesn’t allow it to be repurposed, it doesn’t empower.
The first point means: “Can I find it?” If search engines can't find it, it doesn't exist for most citizens.
Once I have found it, the second point notes that I need to be able to work with the data, i.e. download it in a useful format (technical openness).
Here, however, we focus on the third point, what we might call “legal openness”.
For licensing purposes, when we talk about Open Data, we must distinguish:
- Data (the collection)
- Contents (individual items that are part of the collection: rows/columns)
- Structure (schema, metadata, data definition)
First of all, we must verify that licenses comply with the principles of Open Data. Here we use the principles listed by the Open Knowledge Foundation (OKFN), a foundation dedicated to promoting the creation, sharing and application of open knowledge. Its page “What is open?” lists the following principles that have an impact on licenses:
1. Access - Data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading via the Internet without charge. The work must be available in a convenient and modifiable form.
2. Redistribution - The license shall not restrict any party from selling or giving away the work, either on its own or as part of a package made from works from many different sources. The license shall not require a royalty or other fee for such sale or distribution.
3. Reuse - The license must allow for modifications and derivative works and must allow them to be distributed under the terms of the original work.
4. Absence of Technological Restriction - The work must be provided in such a form that there are no technological obstacles to the performance of the above activities (e.g. by using an open data format).
5. Attribution - The license may require the attribution of the contributors and creators to the work.
6. Integrity - The license may require as a condition for the work being distributed in modified form that the resulting work carry a different name or version number from the original work.
7. No Discrimination Against Persons or Groups - The license must not discriminate against any person or group of persons.
8. No Discrimination Against Fields of Endeavor - The license must not restrict anyone from using the work in a specific field of endeavor. For example, it may not restrict the work from being used in a business, or for genetic research.
9. Distribution of License - The rights attached to the work must apply to all to whom the work is redistributed, without the need for execution of an additional license by those parties.
10. License Must Not Be Specific to a Package - The rights attached to the work must not depend on the work being part of a particular package. If the work is extracted from that package, all parties to whom it is redistributed should have the same rights as those granted with the original package.
11. License Must Not Restrict the Distribution of Other Works - The license must not place restrictions on other works that are distributed along with the licensed work.
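Most of the principles above are yes/no conditions on a license, so they can be applied as a checklist. The following sketch shows one hypothetical way to do so. `LicenseTerms` and `open_violations` are illustrative names, not part of any OKFN tool, and the boolean fields are our own simplification of the principles.

Some principles are left out of the check:

- Principles 1 and 4 concern how the data are provided rather than the license text, so the sketch omits them.
- Attribution (5) and integrity (6) are conditions a license may impose, so requiring them never counts as a violation.

```python
from dataclasses import dataclass


@dataclass
class LicenseTerms:
    """Hypothetical yes/no summary of a license's terms (defaults = fully open)."""
    restricts_redistribution: bool = False  # principle 2
    requires_royalty: bool = False          # principle 2
    allows_derivatives: bool = True         # principle 3
    discriminates_persons: bool = False     # principle 7
    restricts_fields: bool = False          # principle 8
    needs_extra_license: bool = False       # principle 9
    package_specific: bool = False          # principle 10
    restricts_other_works: bool = False     # principle 11
    requires_attribution: bool = False      # principle 5: permitted condition
    requires_new_name: bool = False         # principle 6: permitted condition


def open_violations(terms: LicenseTerms) -> list[str]:
    """Return the license-related principles the terms break (empty = open)."""
    checks = [
        (terms.restricts_redistribution or terms.requires_royalty, "2. Redistribution"),
        (not terms.allows_derivatives, "3. Reuse"),
        (terms.discriminates_persons, "7. No discrimination against persons or groups"),
        (terms.restricts_fields, "8. No discrimination against fields of endeavor"),
        (terms.needs_extra_license, "9. Distribution of license"),
        (terms.package_specific, "10. License not specific to a package"),
        (terms.restricts_other_works, "11. License does not restrict other works"),
    ]
    # Attribution (5) and integrity (6) are deliberately not checked: a license may require them.
    return [principle for failed, principle in checks if failed]


# A hypothetical "attribution, non-commercial use only" license.
nc_only = LicenseTerms(restricts_fields=True, requires_attribution=True)
print(open_violations(nc_only))  # prints ['8. No discrimination against fields of endeavor']
```

The usage example shows why "non-commercial" clauses are incompatible with Open Data: excluding business use is exactly the kind of restriction that principle 8 forbids.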
Many of these principles are similar to those of the Open Source Definition, a list of principles maintained by the Open Source Initiative (OSI). OSI is a non-profit organization co-founded by Eric Raymond, and it is the community-recognized body for reviewing and approving open source software licenses.
Two of the main concepts that Open Data licenses must address are:
- the attribution of the work to its author (the so-called "BY" clause, see Principle 5)
- the possibility of requiring that derived works be distributed under the same license (the so-called "Share-Alike" or SA clause)

Open Data licenses differ from one another precisely in whether or not they include these clauses.
Here we compare three licenses from the OKFN's “Open Data Commons” project and three licenses from Creative Commons, a nonprofit organization that enables the sharing of knowledge through free legal tools.
The following three “Open Data Commons” licenses are suitable for Open Data:
- Open Database License (ODbL): Attribution-ShareAlike for data
- Open Data Commons Attribution License (ODC-By): Attribution for data
- Public Domain Dedication and License (PDDL): public domain, all rights waived
CC licenses are in a sense parallel to the ODC licenses, even though they were created to manage content rather than data:
- CC BY-SA: Attribution-ShareAlike for content
- CC BY: Attribution for content
- CC0: public domain, all rights waived
Recently, many statistical organizations have started releasing Open Data under CC licenses, mostly CC0. CC0 waives all rights and so grants users every freedom to re-use the data. In Open Data it is important to maximize the dissemination of data, and guaranteeing that users may sell software or services based on the data encourages more app developers to use it.
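The BY and SA clauses are what distinguish the six licenses compared above. In the following sketch, the clauses are tabulated and turned into the obligations a re-user takes on when publishing a derived work. This view is deliberately simplified and ignores the finer conditions in each license text. The keys (e.g. `ODC-ODbL`, `CC BY-SA`) are our own short labels, and `reuser_obligations` is a hypothetical helper.

```python
# Simplified view: (attribution "BY", share-alike "SA") for each license.
LICENSE_CLAUSES = {
    "ODC-ODbL": (True, True),    # ODC Open Database License
    "ODC-By": (True, False),     # ODC Attribution License
    "ODC-PDDL": (False, False),  # ODC Public Domain Dedication and License
    "CC BY-SA": (True, True),    # Creative Commons Attribution-ShareAlike
    "CC BY": (True, False),      # Creative Commons Attribution
    "CC0": (False, False),       # Creative Commons public domain dedication
}


def reuser_obligations(license_id: str) -> list[str]:
    """List what a re-user must do when publishing a work derived from the data."""
    by, sa = LICENSE_CLAUSES[license_id]
    duties = []
    if by:
        duties.append("credit the original author")
    if sa:
        duties.append("release the derived work under the same license")
    return duties


for license_id in LICENSE_CLAUSES:
    print(f"{license_id}: {reuser_obligations(license_id) or 'no obligations'}")
```

CC0 and PDDL come out with no obligations at all. This is why they allow the widest possible re-use of the data.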
In addition to the licenses released by international organisations, there are also national licenses that try to adapt to specific national laws. Below is a first list:
Statistical Organizations experiences (to be completed)
- Datasets released in The Data Hub: Eurostat datasets such as NUTS
- Datasets released with Google Public Data Explorer: data from World Bank, Eurostat, US Census, US Labour, OECD
- Data released with open license: Istat CC-BY
Creative Commons releases free, easy-to-use copyright licenses that change the copyright terms from the default of “all rights reserved” to “some rights reserved”; see http://creativecommons.org/