KEY-from-IMAGE: A workaround for the not-so-‘intelligent’ ICR (Intelligent Character Recognition) functionality of document imaging software

Topic (iii): Innovation

Author/Developer: Gene V. Lorica (glorica@census.gov.ph, g.lorica@yahoo.com.ph)

Abstract

Based on the PNSO's experience in data capture using imaging technology, it was found that:

-           the accuracy of handwritten data extracted from survey/census forms using imaging software is not reliable and therefore not acceptable;

-           the accuracy of data extracted from OMR (Optical Mark Recognition) fields, however, ranges from 99% to 100%, so it can be considered acceptable for surveys and censuses.

Advantages of automated data extraction from survey/census forms using imaging software, compared with conventional manual keying of data

-           Data capture is much faster

-           More accurate (as claimed by document imaging software vendors)

-           Requires fewer resources

-           Electronic copies of the forms can be kept and used for a long period of time

Disadvantages of automated data extraction from survey/census forms

-           Scanners, scanner software, and imaging software are quite expensive

-           The accuracy of handwritten data extracted from survey/census forms using ICR (Intelligent Character Recognition) is very low; imaging software vendors' accuracy claims do not hold for survey/census forms

-           It is difficult, if not impossible, to implement interactive data cleaning in which corrections must be reflected in both the data file and the forms (scanned image files)

To get the best of both worlds, the PNSO opted for a hybrid data processing system: scan the forms; extract the OMR data from the scanned images of the forms using document imaging software; and key in, key-verify, and clean the handwritten data using the Key-from-Image program. Key-from-Image is a VB6 program that enables operators to encode and key-verify data directly from the scanned images of the forms, to edit or clean the encoded data interactively, and, when needed, to reflect the changes in the scanned images of the forms.
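The hybrid workflow described above can be sketched as merging, for each form, the OMR values extracted by the imaging software with the handwritten values keyed in by an operator from the scanned image. This is a minimal Python sketch under stated assumptions: the field names, the `FormRecord` structure, and the rule that every field is captured by exactly one method are hypothetical, and it is not Key-from-Image's actual (VB6) implementation.

```python
from dataclasses import dataclass, field

# Hypothetical split of form fields by capture method (illustrative only).
OMR_FIELDS = {"sex", "marital_status"}        # shaded marks, extracted by imaging software
KEYED_FIELDS = {"name", "age", "occupation"}  # handwritten, keyed from the scanned image


@dataclass
class FormRecord:
    image_file: str                              # scanned image the values belong to
    values: dict = field(default_factory=dict)   # field name -> captured value
    source: dict = field(default_factory=dict)   # field name -> "omr" or "keyed"


def assemble_record(image_file: str, omr_values: dict, keyed_values: dict) -> FormRecord:
    """Merge OMR-extracted and operator-keyed values into one record."""
    wrong_method = (set(omr_values) - OMR_FIELDS) | (set(keyed_values) - KEYED_FIELDS)
    if wrong_method:
        raise ValueError(f"fields captured by the wrong method: {sorted(wrong_method)}")
    record = FormRecord(image_file)
    for name, value in omr_values.items():
        record.values[name] = value
        record.source[name] = "omr"
    for name, value in keyed_values.items():
        record.values[name] = value
        record.source[name] = "keyed"
    missing = (OMR_FIELDS | KEYED_FIELDS) - set(record.values)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record


rec = assemble_record(
    "form_0001.tif",
    omr_values={"sex": "1", "marital_status": "2"},
    keyed_values={"name": "JUAN DELA CRUZ", "age": "34", "occupation": "FARMER"},
)
print(rec.source["age"])  # keyed
```

Keeping the capture method per field (`source`) is one way to let later verification and cleaning treat OMR-extracted and keyed items differently.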

Key-from-Image features

-           Data dictionary and output data file are compatible with CSPro (Census and Survey Processing System)

-           Drag-and-drop template preparation (using the KFI Template Editor, a separate program)

-           Can be configured for imaging with/without interpreted data items

-           Allows corrections to be annotated in the scanned images of the forms

-           Can also be used for conventional data entry

-           Built-in electronic code book

-           Skip patterns can be specified

-           Internationalization-aware
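The data dictionary, code book, and skip-pattern features can be illustrated with a small sketch. It assumes a fixed-width text record in which each item has a start column and a length (the kind of layout a CSPro dictionary describes); the item names, codes, skip rule, and right-justified padding are hypothetical, and this is neither Key-from-Image's own code nor CSPro's dictionary file format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Item:
    """One data item in a hypothetical fixed-width data dictionary."""
    name: str
    start: int                              # 1-based starting column
    length: int                             # width in characters
    codes: Optional[Dict[str, str]] = None  # electronic code book: code -> label


# Hypothetical dictionary, for illustration only.
DICTIONARY = [
    Item("age", 1, 2),
    Item("in_school", 3, 1, {"1": "Yes", "2": "No"}),
    Item("grade", 4, 2, {"01": "Grade 1", "02": "Grade 2", "10": "College"}),
]

# Hypothetical skip pattern: when in_school == "2" (No), grade must be left blank.
SKIPS = {("in_school", "2"): ["grade"]}


def skipped_items(values: Dict[str, str]) -> Set[str]:
    """Names of items that the skip pattern says must be left blank."""
    skipped = set()
    for (item_name, code), targets in SKIPS.items():
        if values.get(item_name) == code:
            skipped.update(targets)
    return skipped


def validate(values: Dict[str, str]) -> List[str]:
    """Check a keyed record against item lengths, the code book and skip pattern."""
    errors = []
    skipped = skipped_items(values)
    for item in DICTIONARY:
        value = values.get(item.name, "")
        if item.name in skipped:
            if value:
                errors.append(f"{item.name}: should be blank (skipped)")
            continue
        if not value:
            errors.append(f"{item.name}: missing")
        elif len(value) > item.length:
            errors.append(f"{item.name}: longer than {item.length} characters")
        elif item.codes is not None and value not in item.codes:
            errors.append(f"{item.name}: {value!r} not in code book")
    return errors


def to_fixed_width(values: Dict[str, str]) -> str:
    """Lay out one record as a fixed-width line (assumed: values right-justified)."""
    width = max(item.start + item.length - 1 for item in DICTIONARY)
    chars = [" "] * width
    for item in DICTIONARY:
        value = values.get(item.name, "")
        if len(value) > item.length:
            raise ValueError(f"{item.name} does not fit in {item.length} columns")
        begin = item.start - 1
        chars[begin:begin + item.length] = value.rjust(item.length)
    return "".join(chars)


record = {"age": "9", "in_school": "2", "grade": ""}
print(validate(record))              # []
print(repr(to_fixed_width(record)))  # ' 92  '
print(validate({"age": "9", "in_school": "2", "grade": "01"}))
# ['grade: should be blank (skipped)']
```

Driving validation and output layout from the same dictionary keeps the data-entry checks and the output file consistent with each other.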

 

Advantages of a hybrid data capture system

-           Data entry of handwritten entries in the forms is much faster than conventional data entry

-           Fewer data items to encode, since OMR data items are extracted from the scanned images of the forms

-           Allows verification of keyed-in and extracted data items

-           Accuracy of keyed-in data items ranges from 99% to 100% (based on sample key-verification done in the 2010 Census of Population)

-           Enables the PNSO to reuse the forms, that is, to prepare the list of sample households by keying the names and addresses directly from the scanned images of the sample household forms

-           Allows corrections to be reflected in both the data file and the scanned images of the forms
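The keying accuracy figure above comes from sample key-verification. The source does not state the formula used; a common definition, assumed here, is the share of first-pass keyed fields that agree with the values confirmed during verification against the scanned image. The function name `keying_accuracy` and the record layout are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
Mismatch = Tuple[int, str, str, str]  # (record index, field, first-pass value, verified value)


def keying_accuracy(first_pass: List[Record], verified: List[Record]) -> Tuple[float, List[Mismatch]]:
    """Field-level accuracy of first-pass keying against verified values.

    first_pass and verified hold the same sample records in the same order;
    verified contains the values confirmed against the scanned image.
    """
    if len(first_pass) != len(verified):
        raise ValueError("sample sizes differ")
    total = 0
    matched = 0
    mismatches: List[Mismatch] = []
    for idx, (keyed, checked) in enumerate(zip(first_pass, verified)):
        if keyed.keys() != checked.keys():
            raise ValueError(f"record {idx}: field sets differ")
        for name in keyed:
            total += 1
            if keyed[name] == checked[name]:
                matched += 1
            else:
                mismatches.append((idx, name, keyed[name], checked[name]))
    rate = matched / total if total else 1.0  # an empty sample has nothing wrong
    return rate, mismatches


first = [
    {"name": "ANA", "age": "34", "occupation": "FARMER"},
    {"name": "JOSE", "age": "41", "occupation": "DRIVER"},
]
checked = [
    {"name": "ANA", "age": "34", "occupation": "FARMER"},
    {"name": "JOSE", "age": "47", "occupation": "DRIVER"},
]
rate, diffs = keying_accuracy(first, checked)
print(diffs)  # [(1, 'age', '41', '47')]
```

Here 5 of 6 keyed fields agree, giving a field-level accuracy of about 83%; the mismatch list is what an operator would resolve against the scanned image.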