Mobile images make up a large portion of the traffic going through the OCR-IT OCR Cloud 2.0 API. Conventional office documents are typically black on white and scanned at 200 to 400 dpi, and OCR technology has been fine-tuned for them for over a decade. Mobile images, by contrast, vary greatly in resolution, quality and content, and present new and interesting challenges for technology and beyond. With mobile image capture, technology is no longer the only important factor: the behavior and simple actions of users can easily make or break any available technology. User behavior has therefore become much more important for ‘distributed capture’ of mobile images across a very wide network of users with different skills and hardware. The industry did not see this dependency on user actions before cell phones, because scanners, faxes, MFPs and copiers delivered predictable image quality, controlled by mature technology and requiring little user intervention.
In the following post I will describe the most common situations OCR-IT Cloud OCR encounters when processing pictures from mobile devices. Most of what follows, however, applies to OCR in general.
I will use specific examples to describe and document common issues with mobile images.
Document type: business card (one of the top 5 most frequently requested document types through the OCR-IT Cloud API; receipts are the most frequent)
Mobile device: iPhone 4 (equivalent to an average mobile camera, not a top-end camera by today’s standards)
Environment: office desk, 8 PM (winter night), one fluorescent desk lamp for lighting
To keep the explanation simple, and to show how some OCR engines operate internally, let’s review both the original and the binarized images.
Binarization: the process of converting every pixel in a color or greyscale photo to either a black or a white pixel, which effectively turns the photo into a pure black & white image.
Binarization is also a great way to visually evaluate complex color images. The core OCR engine works on the binarized image internally. Some OCR engines also use grey and color bit depth to further improve processing quality, but the effect of that additional image information is minimal compared to major image defects, can be positive or negative, and will be ignored in this discussion.
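To make the definition concrete, here is a minimal Python sketch of global binarization. It assumes the photo has already been converted to greyscale and is held as a list of rows of 8-bit values, and it picks a single threshold with Otsu's method, a common textbook choice. This is an illustration only, not the algorithm used inside OCR-IT or any particular OCR engine; the function names are hypothetical.

```python
from typing import List

Gray = List[List[int]]  # rows of 8-bit grey values: 0 = black, 255 = white


def otsu_threshold(img: Gray) -> int:
    """Pick the grey level that best splits the histogram into two classes (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_dark, weight_dark = 0, 0
    best_t, best_between = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]  # pixels at or below t form the "dark" class
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Unnormalized between-class variance: larger means better separation
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(img: Gray, threshold: int) -> Gray:
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[255 if v > threshold else 0 for v in row] for row in img]


# Usage: dark text (~20) on light paper (~230)
photo = [[20, 20, 230, 230],
         [25, 25, 235, 235]]
t = otsu_threshold(photo)
print(t, binarize(photo, t))  # 25 [[0, 0, 255, 255], [0, 0, 255, 255]]
```

Otsu's method works well when dark text and light paper form two well-separated peaks in the histogram. When poor lighting pushes those peaks together, no single threshold separates them cleanly, which is the failure seen in the underexposed sample below.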
Before we proceed to text recognition, let’s review a few sample images.
Note: the original test images were 2592 x 1935 pixels, and the Windows file properties viewer reports them as 72 dpi. The images shown below were reduced in size for illustration purposes only.
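The 72 dpi value is most likely just the nominal resolution tag stored in the file metadata; it says nothing about how many pixels actually cover each inch of the document. A quick calculation shows the difference. The sketch below assumes a standard US business card (3.5 inches wide) photographed along the long side of the frame and covering a hypothetical 80% of the 2592-pixel width; both helper names are made up for illustration.

```python
def effective_dpi(image_width_px: int, fill_fraction: float, doc_width_in: float) -> float:
    """Pixels that actually land on the document, divided by its physical width in inches."""
    return image_width_px * fill_fraction / doc_width_in


def implied_width_in(image_width_px: int, tagged_dpi: float) -> float:
    """Physical width the metadata tag would imply if it were taken literally."""
    return image_width_px / tagged_dpi


print(implied_width_in(2592, 72))            # 36.0 inches: clearly not a business card
print(round(effective_dpi(2592, 0.8, 3.5)))  # 592 dpi if the card spans 80% of the frame
```

Taken literally, 72 dpi would describe a document 36 inches wide, while in this assumed framing the card is sampled at roughly 590 dpi, above the 200 to 400 dpi range of typical office scans. In such a framing, raw resolution is not the problem; exposure, focus and reflections are what the samples below examine.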
Original sample photo # 1:
Binarized image of sample photo # 1:
OBSERVATION: The original image was underexposed (too dark). Insufficient contrast (the difference between the light background and the dark text) prevented binarization from clearing the background (to white) and solidifying the text (to black).
OCR RESULT: Poor result. Unusable for text extraction.
SOLUTION: Increase contrast in the picture by adding another light source, moving the document closer to an available light source, eliminating shadows, or using the flash.
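Underexposure like this can often be caught before an image is uploaded. The following is a minimal sketch, assuming a greyscale image held as lists of 8-bit values: it reports the mean brightness and the spread between the 5th and 95th percentile grey levels, and flags the photo as too dark or low-contrast. The thresholds `min_mean` and `min_spread` and the function names are hypothetical and illustrative, not OCR-IT settings; calibrate them on your own samples.

```python
from typing import Dict, List

Gray = List[List[int]]  # rows of 8-bit grey values: 0 = black, 255 = white


def percentile(values: List[int], p: int) -> int:
    """Nearest-rank percentile (1 <= p <= 100), using integer math to avoid rounding drift."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # ceil(p * n / 100)
    return ordered[rank - 1]


def exposure_report(img: Gray, min_mean: float = 80, min_spread: int = 100) -> Dict[str, object]:
    """Flag photos that are too dark overall or whose tones sit in a narrow band.

    min_mean and min_spread are illustrative thresholds only.
    """
    pixels = [v for row in img for v in row]
    mean = sum(pixels) / len(pixels)
    spread = percentile(pixels, 95) - percentile(pixels, 5)  # robust contrast estimate
    return {
        "mean": mean,
        "spread": spread,
        "too_dark": mean < min_mean,
        "low_contrast": spread < min_spread,
    }


print(exposure_report([[30, 40], [50, 60]]))
# {'mean': 45.0, 'spread': 30, 'too_dark': True, 'low_contrast': True}
```

A capture app could run a check like this right after the shutter and prompt the user to add light or use the flash, in line with the SOLUTION above.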
Original sample photo # 2:
Binarized image of sample photo # 2:
OBSERVATION: The image is too blurry, due to hand shake or object movement when the picture was taken. Objects are too ‘fuzzy’ for the binarization process to make out.
OCR RESULT: Poor result. Unusable for text extraction.
SOLUTION: Re-take the picture from a more stable position.
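Blur can also be estimated automatically. A widely used heuristic is the variance of the Laplacian: sharp edges produce strong positive and negative second-derivative responses, while blur smooths them out and the variance drops. Below is a minimal pure-Python sketch of that heuristic, not an OCR-IT feature; the function names are hypothetical, and `min_variance` is a placeholder threshold that depends on resolution and content and must be calibrated.

```python
from typing import List

Gray = List[List[int]]  # rows of 8-bit grey values: 0 = black, 255 = white


def laplacian_variance(img: Gray) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels; low values suggest blur.

    Requires an image at least 3 x 3 pixels.
    """
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_blurry(img: Gray, min_variance: float = 100.0) -> bool:
    """min_variance is a placeholder; calibrate it on your own sharp and blurred samples."""
    return laplacian_variance(img) < min_variance


sharp = [[0, 0, 255, 255] for _ in range(4)]  # hard black/white edge
soft = [[0, 85, 170, 255] for _ in range(4)]  # the same edge smeared into a ramp
print(laplacian_variance(sharp), laplacian_variance(soft))  # 65025.0 0.0
```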
OBSERVATION: The image has an area of reflection, which is common with glossy surfaces if a) the light source sits opposite the camera, so its reflection bounces straight into the lens, or b) the flash fires straight at the document. Text in that area is washed out by the reflection.
OCR RESULT: Poor result in the washed-out area. The rest of the text is clearly visible and produces a high level of OCR accuracy.
SOLUTION: Adjust the angle to eliminate direct reflections. If using the flash, slightly increase the distance from the paper and slightly adjust the angle so the flash reflection falls off the paper.
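Reflections tend to show up as patches of clipped, near-white pixels where the camera sensor saturated. The sketch below is a hypothetical heuristic, not an OCR-IT feature: it splits a greyscale image into square tiles and reports those in which most pixels are at or above a clip level. Tile size, `clip_level` and `max_fraction` are illustrative values. A brightly lit white card can also trip the check, so treat a flagged tile as a hint to adjust the angle rather than a verdict.

```python
from typing import List, Tuple

Gray = List[List[int]]  # rows of 8-bit grey values: 0 = black, 255 = white


def glare_tiles(img: Gray, tile: int = 32, clip_level: int = 250,
                max_fraction: float = 0.5) -> List[Tuple[int, int]]:
    """Return (row, col) origins of tiles where most pixels are clipped near pure white.

    tile, clip_level and max_fraction are illustrative values only.
    """
    h, w = len(img), len(img[0])
    flagged = []
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            block = [img[y][x]
                     for y in range(y0, min(y0 + tile, h))
                     for x in range(x0, min(x0 + tile, w))]
            clipped = sum(1 for v in block if v >= clip_level)
            if clipped / len(block) > max_fraction:
                flagged.append((y0, x0))
    return flagged


# 4x4 card: paper at 200, a saturated hot spot in the top-left 2x2 tile
card = [[255, 255, 200, 200],
        [255, 255, 200, 200],
        [200, 200, 200, 200],
        [200, 200, 200, 200]]
print(glare_tiles(card, tile=2))  # [(0, 0)]
```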
OBSERVATION: The original image has a slightly incorrect white balance, perhaps due to the fluorescent lighting, but binarization shows no issues and produces a very good black & white representation.
OCR RESULT: OCR produces an excellent, 100% accurate result on the text. Some inaccuracy is observed in the font color because of the incorrect white balance in the original photo, but it is visible only in export formats that carry font color through, such as MS Word. The TXT result is excellent. The logo was not captured in its entirety and produced a few extra characters of noise.
OCR-IT binarization and image processing algorithms have been fine-tuned for mobile images, as well as for many other document types captured on various hardware, such as scanned office documents, newspapers, magazine articles, driver’s licenses, etc. In some specific cases, a developer (you) can create better binarization for a specific use. If image pre-processing is possible, you may choose your own binarization algorithm or an external technology to prepare images for OCR before they are submitted to OCR-IT. This makes it possible to fine-tune binarization to your hardware, capture method and document type. For example, the FotoNote app for iPhone uses on-device binarization designed for mobile images from the iPhone camera. Because such pre-processing can be tuned more specifically than the OCR-IT default binarization, it should be considered an important step toward higher OCR accuracy.
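As an example of this kind of custom pre-processing, here is a minimal sketch of local (adaptive) thresholding in the style of Bradley and Roth's integral-image method: each pixel is compared with the mean of its own neighbourhood instead of one global threshold, which helps with uneven lighting such as a single desk lamp. This is not the algorithm used by FotoNote or OCR-IT; `window` and `t` are illustrative defaults, and the greyscale list-of-rows input format is an assumption.

```python
from typing import List

Gray = List[List[int]]  # rows of 8-bit grey values: 0 = black, 255 = white


def adaptive_binarize(img: Gray, window: int = 15, t: float = 0.15) -> Gray:
    """Local-mean thresholding using an integral image (Bradley-Roth style).

    A pixel becomes black when it is more than t (15%) darker than the mean of the
    window x window neighbourhood around it, clipped at the image borders.
    """
    h, w = len(img), len(img[0])
    # integral[y][x] holds the sum of img[0..y-1][0..x-1]
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    half = window // 2
    out: Gray = []
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            count = (y1 - y0) * (x1 - x0)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            # Compare pixel * count with the window sum to avoid a division
            row.append(0 if img[y][x] * count < total * (1 - t) else 255)
        out.append(row)
    return out


# Paper brightens from 40 (left) to 220 (right); two text pixels at 20 and 120
gradient = [[40, 70, 100, 130, 160, 190, 220],
            [40, 20, 100, 130, 160, 120, 220],
            [40, 70, 100, 130, 160, 190, 220]]
for line in adaptive_binarize(gradient, window=3):
    print(line)
# [255, 255, 255, 255, 255, 255, 255]
# [255, 0, 255, 255, 255, 0, 255]
# [255, 255, 255, 255, 255, 255, 255]
```

No single global threshold can produce this result: any threshold that turns the 120-level text pixel black also turns the 40, 70 and 100 paper pixels on the left black.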
If you have specific questions, or would like us to evaluate your images, please contact us or send an e-mail to support at ocr-it dot com.
CREDITS: Special thanks go to Ilya Evdokimov from WiseTREND for providing the sample images and detailed explanations of the OCR processes referenced in this post. (C) WiseTREND 2013.
OCR-IT Support Team