Speed of processing

The OCR-IT Cloud OCR API provides access to high-quality OCR from devices and environments where OCR cannot reside locally due to technical limitations and other constraints. This enables such environments to perform OCR-related tasks without the use of local resources or ongoing maintenance. In some cases, cloud-based OCR is the only option for enabling image processing and text recognition. As a result, since images are processed off-device, developers should consider several optimization techniques at every stage of their submission process. In general, the entire conversion workflow can be separated into these logical steps:

1. Image capture, creation, optimization
2. Transmission to cloud
3. Processing
4. Transmission back to source
5. Text/data processing

There are multiple actions developers can take at each stage to achieve the fastest possible processing. Let’s explore each stage separately.

1. Image capture, creation, optimization – preparation of the image for submission to processing. This is one of the most important steps in a successful workflow, since all subsequent stages depend on its result. The image should be as clear as possible to achieve a higher level of OCR accuracy. This means using techniques such as user guidance and training to capture better images, on-device quality checks, resolution checks, shake detection, image cleanup to prepare a clean, small image for transmission, and other techniques. An average 3G upload speed on an iPhone or Android device is about 0.85 Mbps (0.11 MBps), per PCWorld field tests. The average photo size is about 2.5 MB, so uploading the original photo alone will take about 23 seconds. However, if the image is binarized prior to transmission, the resulting black & white image file size can be about 30 KB,...
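To put those transfer numbers in perspective, here is a minimal sketch in plain Java that estimates upload time from payload size and link speed. The file sizes and the 0.85 Mbps figure are the rough values quoted above, treated here as assumptions for illustration.

```java
// Hedged sketch: estimate how long an image upload takes before submitting it
// to a cloud OCR service. Sizes and link speed are illustrative assumptions.
public class UploadEstimate {

    // Approximate transfer time in seconds for `bytes` over a link of `mbps` megabits/second.
    static double uploadSeconds(long bytes, double mbps) {
        double bitsPerSecond = mbps * 1_000_000.0;
        return (bytes * 8.0) / bitsPerSecond;
    }

    public static void main(String[] args) {
        long originalPhoto = 2_500_000L;   // ~2.5 MB color photo
        long binarizedImage = 30_000L;     // ~30 KB black & white image
        double linkMbps = 0.85;            // average 3G upload speed cited above

        System.out.printf("Original photo:  %.1f s%n", uploadSeconds(originalPhoto, linkMbps));
        System.out.printf("Binarized image: %.1f s%n", uploadSeconds(binarizedImage, linkMbps));
    }
}
```

The same calculation makes the case for on-device cleanup: the original photo costs roughly 23 seconds on such a link, while the binarized version uploads in well under a second.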

Guide to better mobile images (from cell phone camera) for higher quality OCR

Mobile images make up a large volume of the traffic going through the OCR-IT OCR Cloud 2.0 API. Compared to conventional office documents, which are typically black-on-white images at 200 to 400 dpi and for which OCR technology has been fine-tuned for over a decade, mobile images vary greatly in resolution, quality, and content, and present new and interesting challenges for the technology and beyond. With mobile image capture, technology is no longer the only important factor, since the behavior and simple actions of users can easily make or break any available technology. User behavior has therefore become much more important for ‘distributed capture’ of mobile images across a very wide network of users with different skills and hardware. The industry had not seen this dependency on user actions before cell phones, because image capture with scanners, faxes, MFPs, and copiers produced predictable image quality, controlled by mature technology and requiring little user intervention.

In the following post I will describe the most common situations encountered when processing pictures from mobile devices with OCR-IT Cloud OCR. However, this text should apply to any OCR in general. I will use specific examples to describe and document common issues with mobile images.

– Document type: business card (in the top 5 of the most frequently requested document types through the OCR-IT Cloud API; receipt images are the most frequent document type)
– Mobile device: iPhone 4 (equivalent to an average mobile camera, not a top-end camera by today’s standards)
– Environment: office desk, 8 PM (winter night), one fluorescent desk lamp for lighting

For simplicity of explanation, and to further explain how some OCR engines operate internally,...
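As a rough illustration of an on-device quality check before submission, the sketch below uses plain Java with javax.imageio. The resolution and brightness thresholds are illustrative assumptions, not OCR-IT recommendations, but they capture the idea of catching small or dark captures (such as the desk-lamp scenario above) before they reach the OCR engine.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Hypothetical pre-submission check: reject images that are too small or too dark
// before sending them to a cloud OCR service. Thresholds are illustrative only.
public class CaptureCheck {

    static boolean looksUsable(BufferedImage img) {
        // Small-print documents are rarely legible below roughly 1000 px on the long side.
        int longSide = Math.max(img.getWidth(), img.getHeight());
        if (longSide < 1000) return false;

        // Rough brightness estimate: sample a sparse grid and average the green channel.
        long sum = 0, samples = 0;
        for (int y = 0; y < img.getHeight(); y += 16) {
            for (int x = 0; x < img.getWidth(); x += 16) {
                sum += (img.getRGB(x, y) >> 8) & 0xFF;
                samples++;
            }
        }
        double averageBrightness = (double) sum / samples;
        return averageBrightness > 60;   // very dark captures are usually unrecoverable
    }

    public static void main(String[] args) throws Exception {
        BufferedImage img = ImageIO.read(new File(args[0]));
        System.out.println(looksUsable(img) ? "OK to submit" : "Retake the photo");
    }
}
```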

Standard Process for Managed Document Conversion and Outsourcing

The OCR-IT Document Conversion Services Team uses the following methodology and project progress tracking for every document conversion task.

Client – INITIATION: Issue the Order for Service, complete the Service Agreement, discuss project progression.

Client – PREPARATION: Prepare documents for processing. Documents should be in PDF (without password protection or content extraction limitations), TIF, JPEG, BMP, or PNG file format. Documents may be in a sub-folder structure or in a single folder. This original structure will be preserved for Delivery.

Client – SEND: Documents are provided to OCR-IT for processing. Media options are:
– FTP (OCR-IT will provide a secure FTP location)
– HDD (recommended for large volumes over several GB)
– Any other standard storage media, such as USB drive, flash card, DVD, etc.
Sending options are:
– FedEx
– FTP
– Local pickup (for urgent projects)

OCR-IT – SETUP: Documents are received and checked for transmission errors. A processing profile is created. Processing settings are confirmed with the client.

OCR-IT – PROOF RUN: A small sample set is processed using the created settings. The processed set is delivered to the client for review.

Client – PROOF CHECK: The sample is reviewed and the settings are confirmed. Upon confirmation, OCR-IT locks down the settings to be used for the entire volume.

OCR-IT – PRODUCTION: The entire volume goes into production with the confirmed settings. Progress updates are provided every 48 hours until completion.

OCR-IT – QA: Upon completion, results are checked using the following techniques (a simple automated check along these lines is sketched below):
– Total count IN = total count OUT
– File name IN = file name OUT
– Random spot check to verify successful processing (searchability) and the desired file format output (per settings)
NOTE: Manual verification of text may or may not be included, depending on the project.

OCR-IT – DELIVERY...
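The count and file-name checks in the QA stage can be approximated with a short script. The following is a hypothetical Java sketch, not part of the OCR-IT toolchain, that compares an input folder and an output folder by base file name (ignoring extensions, since output formats typically differ from input formats).

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

// Hypothetical QA sketch: verify "total count IN = total count OUT" and
// "file name IN = file name OUT" by comparing base names in two folders.
public class DeliveryCheck {

    static Set<String> baseNames(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .map(p -> p.getFileName().toString().replaceFirst("\\.[^.]+$", ""))
                        .collect(Collectors.toSet());
        }
    }

    public static void main(String[] args) throws IOException {
        Set<String> in = baseNames(Paths.get(args[0]));   // original documents
        Set<String> out = baseNames(Paths.get(args[1]));  // delivered results

        System.out.println("Total count IN:  " + in.size());
        System.out.println("Total count OUT: " + out.size());

        Set<String> missing = new TreeSet<>(in);
        missing.removeAll(out);
        missing.forEach(name -> System.out.println("Missing output for: " + name));
    }
}
```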

User Scenario: Process digital camera pictures and OCR to extract specific numbers

In this project, requested by one of our users, we provide analysis and suggestions on how to process photos of marathon runners with OCR and extract text data from these pictures. This article describes the fully automated OCR Cloud 2.0 API approach and the automated tools developers can use to process these images without human intervention. If you are interested in a semi-automated process that includes human verification options, please contact us separately.

There are several parts of this project we will discuss separately, but overall we believe it is possible to achieve a good recognition result on most good images. This can be considered a project of medium-to-hard complexity, due to multiple factors, technology limitations, and multiple decision steps in the approach. We will test several images from the same category to illustrate how OCR works internally, what limitations exist in these specific images, and what we can do to optimize output quality.

First, we will test one random image and describe every step that happens to that specific image in the background processes; the same steps will run on each image processed. The original color photograph looks like this:

NOTE: The original photographs have high resolution and are large files, around 3 MB each. The images above and below were reduced in size for this visual explanation and illustration only.

For simplicity of explanation, and to further explain how OCR engines operate internally, let’s review the binarized image next. Binarization is the process of converting every pixel in the photo to either black or white, which effectively converts the photo into...
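As a minimal illustration of the binarization step described above, the following Java sketch converts every pixel to black or white using a fixed global threshold. Real OCR engines use adaptive thresholding and additional cleanup, so this is only a simplified model of the step, not the engine's actual algorithm.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Simplified binarization sketch: every pixel becomes black or white based on a
// fixed global threshold. Production engines use adaptive, locally varying thresholds.
public class Binarize {

    static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(),
                                              BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                // Integer approximation of luminance from the RGB channels.
                int gray = (((rgb >> 16) & 0xFF) * 30
                          + ((rgb >> 8) & 0xFF) * 59
                          + (rgb & 0xFF) * 11) / 100;
                out.setRGB(x, y, gray < threshold ? 0x000000 : 0xFFFFFF);
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        BufferedImage photo = ImageIO.read(new File(args[0]));
        ImageIO.write(binarize(photo, 128), "png", new File(args[1]));
    }
}
```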

OCR-IT Demonstrates Power of Mobile OCR with Free Demo Android App with Full Source Code for Developers to Add OCR Capabilities

The OCR-IT Demo Android App and Source Code for Android OS, released by the host of the OCR Cloud 2.0 API, lets developers add the capability to process mobile images and create usable text documents in their Android apps. This sample app demonstrates how easily optical character recognition (OCR) can be implemented in Android OS applications and should spur developers to make the leap to adding OCR to newly created applications for that platform. Download the app, the full source code for this app, and view screenshots here: Android App and Source Code

“The new OCR-IT Demo Android App, for lack of a better name, and its source code are designed to help app developers bring the power of OCR to their Android apps easily and seamlessly,” said company officials at OCR-IT. “In the onslaught of new apps for Android smart phones, we believe that those that integrate OCR will stand out for users. These capabilities add significant value, since they let users do more in less time by transforming paper-based documents into usable formats. We want to help more developers bring these capabilities to end users.”

The OCR-IT Demo Android App allows users to capture images of documents with a smart phone camera and to create a document library to house the document images. The solution provides single-button access that extracts text in several predefined, hard-coded languages in seconds. Results can be viewed in two hard-coded formats: searchable PDF and plain-text TXT. In addition, the OCR-IT Demo Android App offers links to external pages that provide additional information about OCR in the Cloud, as well as transparent details that allow developers to see...
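For developers exploring how such an app hands an image off to a cloud OCR service, here is a deliberately generic sketch of the upload round trip in plain Java. The endpoint URL, request format, and response handling are placeholders and do not reflect the actual OCR Cloud 2.0 API contract; consult the published API documentation and the demo app source code for the real calls.

```java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;

// Illustrative sketch only: the endpoint, content type, and response handling are
// placeholders, not the documented OCR Cloud 2.0 API. It shows the general shape
// of a "capture, upload, receive text" round trip.
public class CloudOcrUpload {

    static String submit(Path image, String endpoint) throws IOException {
        byte[] payload = Files.readAllBytes(image);

        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(payload);
        }

        // Read whatever the service returns (e.g. plain text or a result URL).
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) sb.append(line).append('\n');
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; substitute the real service endpoint and credentials.
        System.out.println(submit(Paths.get(args[0]), "https://example.com/ocr"));
    }
}
```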

OCR for bank statements

I need to be able to OCR bank statements, including getting all the numbers and descriptions in a form that can be processed. How did you cope with the fact that every bank has a different layout?

ANSWER (from http://stackoverflow.com/questions/7362926/what-is-the-state-of-the-art-in-ocr-of-bank-statements-in-net):

We have first-hand experience, and I have done it in two different ways in the past.

Full Page OCR
First, you can take the approach of “full-page OCR” and then parse the information into your desired data format. There are a variety of engines with .NET support, such as the ABBYY Engine SDK, or even a completely free-to-start, cloud-based, on-demand OCR API (OCR Cloud 2.0, http://www.ocr-it.com/ocr-cloud-2-0-api). This is the more classic approach, which I used for over 10 years up until a few years ago. OCR provides you with a complete text-based result, and you use algorithms to extract the information. This approach is quite static and usually requires heavy programming, especially if there are multiple layout variations. There are two potentially troublesome areas to watch for in this approach:
A. Making sure that OCR provides a consistent layout and text structure so it can be parsed reliably. If there is a table without gridlines, or tabular data that could be detected as a table, OCR may work unpredictably from document to document, which essentially breaks your parsing down the road.
B. Making sure that your parsing logic can accommodate various formatting differences and multiple variations of data structures. This is pure programming that requires code changes for adjustments or updates.

Dynamic Data Capture
Second, use a modern dynamic data capture system that automates template identification and...
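To illustrate the parsing side of the full-page OCR approach described above, here is a hedged Java sketch that scans OCR'd statement lines for a trailing monetary amount and treats the rest of the line as the description. The regular expression and sample lines are illustrative assumptions only; production parsing needs per-bank rules and much more robust handling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hedged sketch of "full-page OCR, then parse": pull a trailing amount off each
// OCR'd line and keep the remainder as the description. Illustrative only.
public class StatementParser {

    // Description, whitespace, then an amount like 54.17 or 2,310.00 at end of line.
    private static final Pattern LINE =
            Pattern.compile("^(.*?)\\s+(-?\\$?\\d{1,3}(?:,\\d{3})*\\.\\d{2})\\s*$");

    public static void main(String[] args) {
        String[] ocrLines = {
            "01/12 CHECKCARD GROCERY MART 54.17",      // hypothetical OCR output
            "01/13 DIRECT DEPOSIT PAYROLL 2,310.00",
            "Page 1 of 3"                               // non-transaction line, ignored
        };

        for (String line : ocrLines) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                System.out.printf("description=%-35s amount=%s%n",
                                  m.group(1).trim(), m.group(2));
            }
        }
    }
}
```

This kind of parser is exactly where the two troublesome areas above bite: if the OCR layout shifts or a bank formats amounts differently, the pattern and the surrounding logic need code changes.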