Automated Data Collection: Which Approach is Best?

Long gone are the days when mail survey responses were collected manually and keyed into a digital format by hand. Today, the question isn’t whether you should automate, but which automated data collection approach you should use.

The most commonly used data capture technologies in the survey industry today are OMR (optical mark recognition) and Image Scanning, each with inherent advantages and disadvantages. While both provide exceptional accuracy and cost efficiency, OMR is significantly faster while Image Scanning offers more flexibility.

Choosing a data collection technology for your project is something you will need to do early in the planning process before your survey forms are designed. Your survey research partner can help you determine which solution will work best for your unique project.

Following are detailed descriptions of these industry-best quantitative data collection technologies:

OMR Technology

OMR technology detects the presence or absence of a mark. It is the fastest data collection technology in the industry and is particularly adept at measuring the darkness of a mark to help determine whether the mark is a valid response or an erasure. OMR is commonly used in standardized school testing, such as fill-in-the-bubble test forms.
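As a rough illustration of how mark darkness can separate a valid response from an erasure, here is a minimal Python sketch. The 0.0 to 1.0 darkness scale, the two thresholds, and the function names are assumptions made for this example; actual OMR scanners use their own scales and calibrated cutoffs.

```python
from typing import Dict, Optional

# Hypothetical thresholds on a 0.0 (blank) to 1.0 (fully dark) scale.
ERASURE_MAX = 0.25   # at or below this, treat the bubble as blank or erased
VALID_MIN = 0.60     # at or above this, treat the bubble as an intended mark


def classify_mark(darkness: float) -> str:
    """Label one bubble reading as 'marked', 'blank', or 'ambiguous'."""
    if darkness >= VALID_MIN:
        return "marked"
    if darkness <= ERASURE_MAX:
        return "blank"
    return "ambiguous"


def read_question(darkness_by_choice: Dict[str, float]) -> Optional[str]:
    """Return the single marked choice, or None if the answer is unclear."""
    labels = {choice: classify_mark(d) for choice, d in darkness_by_choice.items()}
    marked = [choice for choice, label in labels.items() if label == "marked"]
    ambiguous = [choice for choice, label in labels.items() if label == "ambiguous"]
    # Accept the answer only when exactly one bubble is clearly marked
    # and no other bubble falls in the uncertain middle range.
    if len(marked) == 1 and not ambiguous:
        return marked[0]
    return None
```

In this sketch, a faint smudge left by an eraser on choice A (darkness 0.15) is ignored when choice B is filled in firmly (darkness 0.90), so the question reads as B. Two dark bubbles, or one mid-gray bubble, produce no answer.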

OMR forms are very specialized documents that require critical registration. This means that the forms must include precise “timing marks” along the edge of the form to let the OMR scanner know where to look for data. If this is not done correctly, data collection will be adversely affected. Therefore, you must work with a printer who has experience with OMR forms.

Color is also extremely important with OMR documents. Only ink colors whose PMS formulation contains no black can be used. If respondents will be allowed to use a pen, the form can be printed only in shades of red, which further limits color choices. In addition, the paper stock requires the proper reflectance and fluorescence so that the scanner will not read false marks during the data collection process.

As the forms are being scanned, the data is immediately written to the data file. OMR scanning has an accuracy rating of 99.9%, but only when the forms are filled out correctly. Respondents need to use the correct writing instrument and fill the bubbles completely to achieve this type of accuracy rating.

A drawback to OMR technology is that it requires you to produce pre-printed documents, which some clients have found to be inflexible, costly (especially with small quantities) and incapable of meeting design change requirements on short notice.

Image Scanning Technology

Image scanning uses ‘mark sense technology’ to detect marks on a form. While it looks a lot like OMR (collecting data from multiple choice questions), mark sense technology is very different. Rather than sensing marks directly, the scanner takes a bi-tonal (black and white) image of each form field and looks for differences in pixels between the scanned image and a template, revealing the marks in the process.
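The pixel comparison described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each field is a small grid of 0 (white) and 1 (black) pixels, the scanned field is already aligned with the template, and the function name is made up for the example. Production software also has to handle skew, noise, and alignment.

```python
from typing import List

Image = List[List[int]]  # bi-tonal pixels: 1 = black, 0 = white


def changed_pixel_ratio(scanned: Image, template: Image) -> float:
    """Fraction of pixels in a field that differ from the blank template."""
    same_shape = len(scanned) == len(template) and all(
        len(s_row) == len(t_row) for s_row, t_row in zip(scanned, template)
    )
    if not same_shape:
        raise ValueError("scanned field and template must have the same dimensions")
    total = sum(len(row) for row in template)
    if total == 0:
        raise ValueError("field is empty")
    # Count every pixel whose value differs between the scan and the template.
    changed = sum(
        s != t
        for s_row, t_row in zip(scanned, template)
        for s, t in zip(s_row, t_row)
    )
    return changed / total


# A blank bubble in the template: a black outline around a white interior.
template = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
# The same bubble after a respondent fills it in completely.
filled = [[1, 1, 1, 1] for _ in range(4)]

print(changed_pixel_ratio(filled, template))    # 0.25
print(changed_pixel_ratio(template, template))  # 0.0
```

A field whose ratio is well above zero contains a mark; how large the ratio must be before it counts is a tolerance set in the document definition.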

  • Time

Image scanning does take longer to process. This is because images are taken of each page, then processed against a pre-programmed template, called a document definition. Any fields that fall outside the tolerance are routed to a human verifier who reviews the field on screen and makes the appropriate choice based on the rules that have been established. Only after this step will the data be written to the data file. Testing has shown that image scan processing can take up to 40% longer than OMR depending on the rules established.

  • Flexibility

However, image scanning is much more flexible than OMR. The biggest advantage is during printing, as image scanning does not require special ink colors or the critical registration that OMR must have. Forms can be printed in black and white, and images can be stored and indexed by any field that is collected during the scanning process.

Forms can also include fields for open-ended comments (i.e., handwriting) that will be captured using a combination of ICR (Intelligent Character Recognition) software and operator review. Rules can also be established that force a field to be reviewed by an operator for editing. For example, a rule might require that all blank responses be inspected. This is a popular rule for tests administered to young students, who may have circled the choice rather than marking the bubble. Other popular rules for human editing cover double marks, light marks that do not meet the minimum threshold, missing responses, invalid IDs, out-of-range marks, and more.
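To make the idea of review rules concrete, here is a minimal Python sketch that flags fields for an operator. The six-digit ID format, the 1 to 5 answer range, and the function and field names are all assumptions chosen for the example; real rules are defined per project.

```python
import re
from typing import Dict, List

VALID_CHOICES = {1, 2, 3, 4, 5}      # hypothetical 5-point answer scale
ID_PATTERN = re.compile(r"\d{6}")    # hypothetical rule: IDs are six digits


def fields_needing_review(
    respondent_id: str, answers: Dict[str, List[int]]
) -> Dict[str, str]:
    """Map each flagged field name to the reason an operator should review it.

    `answers` maps a question name to the list of choices detected as marked.
    """
    flagged: Dict[str, str] = {}
    if not ID_PATTERN.fullmatch(respondent_id):
        flagged["respondent_id"] = "invalid ID"
    for question, marks in answers.items():
        if not marks:
            flagged[question] = "blank response"
        elif len(marks) > 1:
            flagged[question] = "double mark"
        elif marks[0] not in VALID_CHOICES:
            flagged[question] = "out-of-range mark"
    return flagged


flags = fields_needing_review(
    "12A456",
    {"Q1": [3], "Q2": [], "Q3": [2, 4], "Q4": [9]},
)
for field, reason in flags.items():
    print(field, "->", reason)
```

Only the flagged fields go to the operator; Q1 in this example passes straight through to the data file.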

Although the processing takes longer, we have found Image Scanning data to be more accurate than that of OMR because an operator reviews any questionable fields on the form. While using an operator will certainly increase the cost of collecting data, the flexibility and increased accuracy may be worth it for your project!

Automated Data Collection - Quick Reference Chart

Summary

OMR and Image Scanning are the best automated data collection technologies in the industry today. Because of its inherent flexibility, Image Scanning is the more commonly used option. But for those who can adhere to OMR’s strict requirements, there is no faster or more accurate fully automated way to collect data for multiple-choice-only questions.

Once you decide which automated data collection approach your project needs, one of the first things to prepare is a blueprint of what all the numbers mean in the data file you will get with your results. This is called a Data Schema. Check out our blog post on Creating a Data Schema.
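As a small illustration of what such a blueprint might look like, the Python sketch below maps coded values in a data file back to their meanings. The column names, question labels, and codes are invented for the example and are not taken from any real project.

```python
from typing import Dict

# Hypothetical schema: every column name, label, and code here is illustrative.
SCHEMA = {
    "Q1": {
        "label": "Overall satisfaction",
        "codes": {
            1: "Very dissatisfied",
            2: "Dissatisfied",
            3: "Neutral",
            4: "Satisfied",
            5: "Very satisfied",
        },
    },
    "Q2": {
        "label": "Would recommend",
        "codes": {1: "Yes", 2: "No"},
    },
}


def decode_row(row: Dict[str, int]) -> Dict[str, str]:
    """Translate one row of coded values into readable answers."""
    decoded = {}
    for column, value in row.items():
        field = SCHEMA[column]
        # Codes missing from the schema are reported rather than guessed.
        decoded[field["label"]] = field["codes"].get(value, f"Unknown code {value}")
    return decoded


print(decode_row({"Q1": 4, "Q2": 1}))
# {'Overall satisfaction': 'Satisfied', 'Would recommend': 'Yes'}
```

Agreeing on this mapping before the forms are designed means everyone reads the final data file the same way.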

For more information on automated data collection, data capture services, or any aspect of mail survey management, contact us today!