UNECE, Geneva, 2012
These Principles and Guidelines were produced by the Conference of European Statisticians Sharing Advisory Board and the UNECE Secretariat, with input and peer review from Participants in the 2011 joint UNECE / Eurostat / OECD meeting on Management of Statistical Information Systems (MSIS). This version was published in February 2012.
1. The move towards global standardisation in models such as the Generic Statistical Business Process Model (GSBPM) and the Generic Statistical Information Model (GSIM), combined with progress on the development of standards for exchanging data and metadata, has drawn the attention of statistical software providers to the possibilities of exchanging software internationally. This has prompted the question of how we can actively incorporate this possibility into the development of software from its inception.
4. The purposes of these guidelines are to review some of the common best practices for developing internationalised software, to highlight some of the resources and common standards in the area, and to focus on topics that may apply specifically to statistical software, such as the treatment of numbers, formatting, dates, formula treatment, and notation.
Internationalisation and Localisation
5. The process of developing software for use in multiple languages, cultures and countries can be viewed as two separate processes: internationalisation (I18N) and localisation (L10N). Internationalisation is the process of designing software so that it can accommodate different environments without changes to the code. Localisation is the more specific task of adapting the software for a particular country, language and/or region. So while internationalisation provides the mechanisms that allow more than one language to be used, localisation applies those mechanisms, for example by supplying translations.
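The split between the two processes can be illustrated with a minimal Python sketch. It is not taken from the guidelines: the catalogue contents, the message keys and the `translate` helper are all hypothetical. The lookup code is the internationalised part and never changes; adding a language means adding a catalogue, which is the localisation step.

```python
# Hypothetical message catalogues. In practice these would normally live in
# external resource files supplied by translators (the localisation step).
CATALOGUES = {
    "en": {"save": "Save", "rows_loaded": "{count} rows loaded"},
    "fr": {"save": "Enregistrer", "rows_loaded": "{count} lignes charg\u00e9es"},
}

DEFAULT_LANGUAGE = "en"


def translate(key, language, **values):
    """Return the message for `key` in `language`.

    Falls back to the default language when the language, or the key within
    that language, is missing. This function is the internationalised part:
    it contains no language-specific text.
    """
    catalogue = CATALOGUES.get(language, {})
    template = catalogue.get(key) or CATALOGUES[DEFAULT_LANGUAGE][key]
    return template.format(**values)
```

For example, `translate("save", "fr")` returns the French label, while a request for a language with no catalogue falls back to the English text.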
6. A locale is a collection of user preferences applicable to a specific language, country and/or culture. Locale identifiers usually consist of a language, often combined with a country and occasionally with further parameters specifying the code set and a modifier; for example, en_GB is the locale identifier for English as used in the UK. This means a distinction can be made between English (US) and English (UK). A locale consists of a number of elements, including, for example, the name and ISO identifier of the language, the currency, sorting requirements, numeric preferences such as thousands separators, the calendars to be used, and other elements such as text direction (left-to-right or right-to-left, horizontal or vertical).
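The structure of a locale identifier can be sketched as follows. This is a simplified illustration that assumes the common POSIX-style layout `language_TERRITORY.codeset@modifier`; the `LocaleId` type and `parse_locale` function are hypothetical names, and real identifiers can take forms that this pattern does not cover.

```python
import re
from typing import NamedTuple, Optional


class LocaleId(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# Simplified pattern: a 2-3 letter language code, then optional territory,
# code set and modifier parts, each introduced by its own separator.
_PATTERN = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:_(?P<territory>[A-Za-z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def parse_locale(identifier: str) -> LocaleId:
    """Split a locale identifier into its parts; absent parts are None."""
    match = _PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a recognised locale identifier: {identifier!r}")
    return LocaleId(**match.groupdict())
```

With this sketch, `parse_locale("en_GB")` and `parse_locale("en_US")` share the language `en` but differ in territory, which is what allows English (UK) and English (US) preferences to be kept apart.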
Principle 1: Software should be multilingual by design
16. A guide to the amount of space required for translated text is reproduced below from Oracle's "Understanding Application Development Guidelines":
Number of English Characters
Additional Space Required
400 percent or 4 characters
More than 70 characters
61. Formatting conventions differ between languages. For example, bold is commonly used for emphasis in many scripts, but in Kanji and other scripts it may cause problems with legibility. Other emphasis techniques, such as underlining, may be used instead.
62. The importance of providing software that can be used in more than one language or culture has increased. Over the last decade, the growth of the internet has fuelled the sharing of software. To benefit from future developments within the statistical community, software should be developed with the expectation that it will be used in more than one environment.
64. The guidelines above have been drawn from a review of documentation covering both practical examples of internationalising software and research. It is recognised that implementing each guideline in every application may not be feasible, but they are intended to provide an overview of best practice to work towards.
- IBM have also published a comprehensive guide to developing international software. It provides an in-depth review of the issues involved in developing software for multi-culture use and is available at http://www-01.ibm.com/software/globalization/guidelines/
Because the comparison rules for non-Unicode and Unicode data are different, when you use a SQL collation you might see different results for comparisons of the same characters, depending on the underlying data type.
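The general point, that the same characters can compare differently under different rules, can be illustrated outside SQL. The following Python sketch is an analogy only and does not reproduce the behaviour of any particular database collation. It contrasts a plain code-point sort with a sort that uses a hypothetical accent- and case-insensitive key.

```python
import unicodedata


def accent_insensitive_key(text):
    """Build a sort key that ignores accents and letter case."""
    # Decompose accented letters into base letter + combining mark,
    # then drop the combining marks and fold case.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


words = ["zebra", "Apple", "\u00e9clair"]  # "\u00e9" is precomposed e-acute

# Rule 1: compare raw Unicode code points.
by_code_point = sorted(words)

# Rule 2: compare using the accent- and case-insensitive key.
by_folded_key = sorted(words, key=accent_insensitive_key)
```

Under the first rule the accented word sorts last, because its first character has a higher code point than "z"; under the second it sorts between "Apple" and "zebra".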