API Documentation

OCR  CLOUD  API  GUIDE

Enhanced API
Release 2.3
Updated 2012-10-10

 

SIGN UP, TESTING AND PRICING

OPEN NEW ACCOUNT & GET PRICING:
http://www.ocr-it.com/ocr-cloud-2-0-api-subscription-plans-pricing/

The OCR-IT OCR Cloud 2.0 API also provides a free Development/Testing account for automated POST API testing.  To subscribe to a Trial account, visit: http://www.ocr-it.com/free-ocr-cloud-2-0-api-trial

For assisted testing and advice, send your image or PDF to support@ocr-it.com. One of our experienced OCR-IT technicians will convert it and return the results to you, along with recommendations on how best to process your documents. It’s free, with no obligation.  Please tell us which OCR language to use if your documents are not in English, and which output format you would like if you need anything other than TXT and Searchable PDF.

More information: http://www.ocr-it.com/ocr-cloud-2-0-api/

Technical support: support@ocr-it.com

Overview

The OCR-IT LLC OCR Web API allows you to submit OCR requests (images in PDF / TIF / PNG / JPG / BMP / PCX / DCX formats) and get back textual results (in TXT / PDF / RTF / Word / Excel / XML / CSV / others, with full Unicode support).  Multi-lingual OCR in a variety of languages (listed at the end of this document) is supported.

Key Features:

  • Temporary Cloud Storage for Images
  • Support of Common Image Formats
  • Variety of Print Types
  • Image Cleanup: Deskew, Despeckle, Remove Texture, Automatic Rotation Detection
  • Over 150 OCR Languages
  • Mixed Languages Auto-detection
  • Barcode Recognition
  • Two modes of Text Recognition: Quality, Speed
  • Specialized Text Extraction Algorithms
  • Support for Different Fonts and Print Types
  • All Popular Output Formats
  • Enhanced Error Handling

By using the various API settings, you can optimize the OCR process for a variety of sources (scans, digital camera images, etc.) and a variety of purposes (full-text indexing of articles, invoice scanning, etc.). Barcode scanning is also supported.  For assistance in optimizing the API for your particular task, please contact the Support Team.

Using the API consists of the following stages:

  1. Submit a Job
  2. Handle Job Status – one or both of the following:
    1. Check Job Status manually
    2. Get notified about Job Status automatically
  3. Get Results of a Job

1. SUBMITTING A JOB

OVERVIEW

Submit a job by sending an HTTP POST request to the following URL:

http://svc.webservius.com/v1/wisetrend/wiseocr/submit?wsvKey=[your API key]

NOTE: Make sure to include the “/submit” in the URL

The request message body should contain XML of the following format (explained in detail below):

<Job>
<InputURL>[input image URL]</InputURL>
<InputType>[input type (PDF/TIF/PNG/JPG/etc.)]</InputType> <!-- Optional -->
<NotifyURL>[job status notification URL]</NotifyURL> <!-- Optional -->
<CleanupSettings>[image cleanup settings]</CleanupSettings> <!-- Optional -->
<OCRSettings>[OCR settings]</OCRSettings> <!-- Optional -->
<OutputSettings>[Output settings]</OutputSettings> <!-- Optional -->
</Job>

The XML is case-sensitive, using proper leading letter and acronym capitalization, as listed in this documentation.

The order of XML elements should be as listed in this document.  For example, <CleanupSettings> may NOT be placed before <InputURL> within XML.

The Content-Type of the request should be “text/xml”.

The Content-Length is required.

No place to store your images online for conversion?  We got you covered with Short Term Storage!

Sometimes it is more convenient to upload the actual image file that you want OCRed, instead of giving its URL. If this is the case, you can use the convenient WebServius Short Term Storage API. It allows you to very easily store a file at a temporary URL in the cloud, which you can then pass to the OCR API.

http://www.webservius.com/cons/subscribe.aspx?p=wsv&s=sts

Please see the link above for the latest pricing details and subscription information. The WebServius Short Term Storage API offers free and paid accounts, and special discounts and some amount of free usage may be available when it is used in conjunction with the OCR-IT OCR Cloud API.

In case of success, the response will be an HTTP 200 (Success) response code, and the following XML (explained in detail below):

<JobStatus>
<JobURL>[URL for checking job status]</JobURL>
<Status>Submitted</Status>
</JobStatus>

In case of an error, an HTTP error code is returned along with XML explaining the error (see section on Error Elements at the end of this document).

 

COSTS AND CHARGES PER UNIT

Account is charged 1 Unit (as defined below) of OCR upon successful job submission, and for the rest of the pages upon successful job completion.  Please note that certain errors (such as a corrupt input file) can only be detected once you’ve already been charged 1 unit for the submission.

A single “Unit” is defined as a single page of size “international A4 format or smaller”, by actual surface area.  The smallest processing unit is 1 Unit, so even the smallest images count as 1 Unit.

Please note that if your image is larger than A4, the appropriate number of Units will be deducted using the simple formula ‘your area / A4 area, rounded up’.  A4 measures 210 x 297 mm, or 8.27 x 11.69 inches, which is an area of 96.68 sq in.  If you submit a large engineering drawing in size B2, which is 19.69 × 27.83 inches (547.97 sq in), that drawing will use 547.97/96.68 ≈ 5.67 Units, which will be charged as 6 Units.

In multi-page files such as PDFs, each page gets evaluated separately using the above criteria.  For example, if a PDF has 5 pages under A4 size, the PDF will be processed at the cost of 5 pages (Units).
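The charging rule above can be sketched in a few lines of Python. This is only an illustration for estimating costs before submission, not the service's actual billing code; the function names are hypothetical, and the A4 area of 96.68 sq in is the figure quoted in this document.

```python
import math

# A4 area in square inches, as quoted in this guide (8.27 x 11.69 in).
A4_AREA_SQ_IN = 96.68


def units_for_page(width_in: float, height_in: float) -> int:
    """Estimate Units for one page: page area / A4 area, rounded up, minimum 1."""
    area = width_in * height_in
    return max(1, math.ceil(area / A4_AREA_SQ_IN))


def units_for_document(pages) -> int:
    """Estimate Units for a multi-page file; each page is evaluated separately."""
    return sum(units_for_page(width, height) for width, height in pages)


print(units_for_page(19.69, 27.83))             # B2 drawing: 6
print(units_for_document([(8.27, 11.69)] * 5))  # five A4 pages: 5
```

Page dimensions are given in inches here; a caller working in millimetres would need to convert first.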

There is a detailed visual diagram and specifications of international sizes here:

http://en.wikipedia.org/wiki/Paper_size

Please note that quantity of export formats does not affect the Unit count charges, i.e. multiple output formats can be requested in a single request and will be charged only based on surface area, not on quantity of outputs.

 

INPUT PARAMETERS

wsvKey (required)

This is your API key, which is issued to you when you subscribe to the OCR-IT LLC OCR Cloud 2.0 API.

InputURL (required)

The URL of the image on which you want to perform OCR (must be http://, https:// or ftp://)

NOTE 1: Make sure that the InputURL is properly XML-encoded. This is especially a concern if the URL contains query parameters. For example, if your image is at:
http://example.com/images?id=565&size=large,
the job request should be:
<Job><InputURL>http://example.com/images?id=565&amp;size=large</InputURL></Job>
Note that the “&” in the original URL has turned into “&amp;”, as required by XML encoding rules.

Normally, if a standard library is used for dealing with XML, this would be done automatically. However, if you are constructing XML manually from strings, you may need to do this manually.

NOTE 2: Do not URL-encode (percent-encode) the InputURL. For example, if your image is at: http://example.com/My%20Picture.jpg

The job request should be:
<Job><InputURL>http://example.com/My Picture.jpg</InputURL></Job>
Note that a real space is used instead of the “%20” percent-encoded version.
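As a rough sketch of NOTE 1 and NOTE 2 together, the Python helper below turns a URL into an <InputURL> element using only the standard library. The function name is hypothetical. It also assumes that decoding every percent-escape (not just “%20”) is what you want, which this document does not state explicitly, so check the result for URLs that contain other escapes.

```python
from urllib.parse import unquote
from xml.sax.saxutils import escape


def input_url_element(url: str) -> str:
    """Build an <InputURL> element for `url` (illustrative helper)."""
    # NOTE 2: undo percent-encoding, so "%20" becomes a real space.
    # Assumption: all percent-escapes are decoded, not only "%20".
    decoded = unquote(url)
    # NOTE 1: XML-encode the result, so "&" becomes "&amp;".
    return "<InputURL>" + escape(decoded) + "</InputURL>"


print(input_url_element("http://example.com/images?id=565&size=large"))
print(input_url_element("http://example.com/My%20Picture.jpg"))
```

The same helper logic applies to the NotifyURL, which follows the same encoding rules.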

The image cannot exceed 20MB in size and cannot take more than 5 minutes to download.

The image must be in a supported format (see table below). If the image URL path (not counting the query string, if any) does not end in a dot followed by a supported extension (case-insensitive, see table below), the InputType parameter must be provided. E.g.:

http://example.com/scan001.tif – InputType not required (TIF auto-detected)
http://example.com/scan001.tif?resolution=high – InputType not required (TIF auto-detected)
http://example.com/scan001 – InputType required
http://example.com/scan001?format=.tif – InputType required
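The auto-detection rule illustrated by the four URLs above can be approximated on the client side as follows. This is a sketch for deciding whether to send InputType, not the server's actual detection logic; the function name is hypothetical, and the extension map is taken from the table of supported formats in this section.

```python
from urllib.parse import urlsplit

# Extension -> InputType value, from the supported formats table in this guide.
SUPPORTED_EXTENSIONS = {
    "pdf": "PDF", "bmp": "BMP", "pcx": "PCX", "dcx": "DCX",
    "jpg": "JPG", "jpeg": "JPG", "tif": "TIF", "tiff": "TIF", "png": "PNG",
}


def detect_input_type(url: str):
    """Return the InputType implied by the URL path, or None if InputType must be sent."""
    path = urlsplit(url).path              # the query string is not part of the path
    last_segment = path.rsplit("/", 1)[-1]
    if "." not in last_segment:
        return None
    extension = last_segment.rsplit(".", 1)[-1].lower()  # case-insensitive match
    return SUPPORTED_EXTENSIONS.get(extension)


for url in ("http://example.com/scan001.tif",
            "http://example.com/scan001.tif?resolution=high",
            "http://example.com/scan001",
            "http://example.com/scan001?format=.tif"):
    print(url, "->", detect_input_type(url))
```

A result of None means the URL alone is not enough and the InputType element should be included in the job.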

Supported formats and extensions are:

FORMAT  EXTENSIONS  SUPPORTED FORMAT DETAILS
PDF     pdf         Version 1.6 or earlier
BMP     bmp         2-bit – Uncompressed Black & White
                    4- and 8-bit – Uncompressed Palette
                    16-bit – Uncompressed Mask
                    24-bit – Uncompressed Palette and TrueColor
                    32-bit – Uncompressed Mask
PCX     pcx         2-bit Black & White, 4- and 8-bit Gray
DCX     dcx         2-bit Black & White, 4- and 8-bit Gray
JPG     jpg, jpeg   Jpeg: Gray, Color
                    Jpeg 2000: Gray Part 1, Color Part 1
TIF     tif, tiff   Black&White: uncompressed, CCITT3, CCITT3FAX, CCITT4, PackBits, ZIP, LZW
                    Gray: uncompressed, Packbits, JPEG, ZIP, LZW
                    TrueColor: uncompressed, JPEG, ZIP, LZW
                    Palette: uncompressed, Packbits, ZIP
                    Multi-image TIFF
PNG     png         Black&white, gray, color

InputType (optional)

Specifies the input type. Must be one of the Supported Formats (left column in the table above). Not required if the type can be auto-detected from the URL (see InputURL above).

NotifyURL (optional)

The URL to which a notification should be sent when the job succeeds or fails (see section 2 on notifications). Must be http:// or https://.

NOTE: The NotifyURL must not be URL-encoded (i.e. should use “ “ and not “%20”), and must be XML-encoded (i.e. should use “&amp;” and not “&”), just like the InputURL. See the InputURL section above for more details and examples.

CleanupSettings (optional):

Settings that control image cleanup, in the following form (every element is optional):

<CleanupSettings>
<Deskew>[true/false]</Deskew> <!-- Optional, default is ‘true’ -->
<RemoveGarbage>[true/false]</RemoveGarbage> <!-- Optional, default is ‘true’ -->
<RemoveTexture>[true/false]</RemoveTexture> <!-- Optional, default is ‘true’ -->
<RotationType>[see below]</RotationType> <!-- Optional, default is ‘Automatic’ -->
</CleanupSettings>

The settings are explained below:

Deskew (Boolean)  Specifies whether the skew angle for an image should be corrected during preprocessing. This mode is recommended if you want to automatically correct skew for images you work with. The default value is ‘true’. 

 

RemoveGarbage (Boolean)  Specifies whether garbage (excess dots that are smaller than a certain size) should be removed from the image during preprocessing. The default value is ‘true’. 

 

RemoveTexture (Boolean)  Specifies whether structured background noise should be cleared before the recognition process starts. The default value is ‘true’. 

 

RotationType (String)  Specifies what type of rotation will be performed upon the image during preprocessing. The default value is “Automatic”, which means that rotation will be detected automatically. Allowed values:

NoRotation – no rotation

Automatic – auto-detect rotation

Clockwise – rotate by 90 degrees clockwise

Counterclockwise – rotate by 90 degrees counterclockwise

Upsidedown – rotate by 180 degrees

 

OCRSettings (optional)

Settings that control image recognition; in the following form (every element is optional):

<OCRSettings>
<PrintType>[see below]</PrintType> <!-- Optional, default is ‘Print’ -->
<OCRLanguage>[see below]</OCRLanguage> <!-- Optional, default is ‘English’ -->
<SpeedOCR>[true/false]</SpeedOCR> <!-- Optional, default is ‘false’ -->
<AnalysisMode>[see below]</AnalysisMode> <!-- Optional, default is ‘MixedDocument’ -->
<LookForBarcodes>[true/false]</LookForBarcodes> <!-- Optional, default is ‘true’ -->
</OCRSettings>

The settings are described below:

PrintType (Semicolon-delimited list of strings)  Specifies the types of printed text in the image.  The default value is “Print”, which corresponds to common typographic text equivalent to laser printer output. Allowed values:

Print – modern text

Typewriter

DotMatrix

OCR_A – OCR-A text

OCR_B – OCR-B text

MICR_E13B

If you would like to recognize more than one text type in the same document, separate types with semicolons without spaces. For example, “Print;Typewriter”.

OCRLanguage (Semicolon-delimited list of strings)  This property allows you to specify which of over 200 supported languages should be used for OCR, including mixed languages within the same document.  See list of supported languages at the end of this document. The default value is “English”. To specify more than one language, separate languages with semicolons (without spaces) – for example: “English;Danish”.

SpeedOCR (Boolean)  This property provides faster recognition speed (by as much as 2-2.5 times, depending on server load) at the cost of a moderately increased error rate (1.5-2 times more errors).  On good, print-quality texts, OCR makes an average of 1-2 errors per page more in this mode, which in some cases is a small sacrifice for the substantial increase in speed. Such moderate increase in error rate can be easily tolerated in many cases, such as full text indexing with “fuzzy” searches, preliminary recognition, etc. The default value is ‘false’.

AnalysisMode (String)  Specifies how aggressively the text should be extracted. The default value is “MixedDocument”.

MixedDocument – This mode is useful if you export your text to document archives: the full page layout is retained and full-text search is available if you save in this mode.  This mode will look for images and text within an image.

TextIndexing – This mode is used to extract data from a document, including text in pictures.  Note that the OCR retains both the picture and the text in it. Text extracted from a picture block can only be exported to TXT, PDF and XML formats (XML export support is coming soon).  The data can then be used for subsequent full-text indexing and search.  The program retains the logical reading order, pictures, and tables.

TextAggressive – This mode is used to pre-process documents with many small text zones. Usually they are noisy, low-quality images that may contain text within other objects. This mode extracts all text from the image, including tables, pictures, small text areas, and noise. The result is plain text without table blocks and picture blocks.

BarcodesOnly – This mode is used to extract barcodes only.

NOTE: Barcode values are extracted in all modes as long as LookForBarcodes is true.

To understand the specific differences between these modes of image analysis, sample processing and testing is encouraged.  Results will vary greatly depending on a) your need for text vs. preserving pictures and b) your specific images.  For example, in “TextAggressive” mode, OCR will sacrifice images in order to extract as much text as possible: a road sign in the background or a license plate will be treated as potential text content.  In “MixedDocument” mode, the same road sign will be treated as a picture.  In “TextIndexing” mode, the road sign will be preserved as a picture, but any recognizable text content will still be available for searching.

LookForBarcodes (Boolean) Specifies whether barcodes should be recognized. Default is ‘true’.

OutputSettings (optional)

Settings that control text result output, in the following form (every element is optional):

<OutputSettings>
<ExportFormat>[see below]</ExportFormat> <!-- Optional, default is ‘Text;PDF’ -->
</OutputSettings>

The settings are explained below:

ExportFormat (Semicolon-delimited list of strings)  Specifies the desired formats for text output. The default value is “Text;PDF”, which corresponds to both Text and PDF output. Allowed values:

RTF – export to *.RTF (rich-text) format.  Retains full page layout and preserves pictures.  The program will automatically select the most suitable paper size when saving the recognized text and pictures.

MSWord – export to *.DOC (Microsoft Word) format.  Retains full page layout and preserves pictures.  The program will automatically select the most suitable paper size when saving the recognized text and pictures.

MSExcel – export to *.XLS (Microsoft Excel) format. 

PDF – export to *.PDF format 

DBF – export to *.DBF format

Text – export to *.TXT common formatted ASCII text-only output

CSV – export to *.CSV format

PPT – export to *.PPT format

XML – export to *.XML format

UnicodeText_UTF8 – export to *.UTF8.TXT format

UnicodeText_UTF16 – export to *.UTF16.TXT format

UnicodeCSV_UTF8 – export to *.UTF8.CSV format

UnicodeCSV_UTF16 – export to *.UTF16.CSV format

HTML – export to *.HTM format ^

UnicodeHTML – export to *.UNICODE.HTM format ^

^ HTML output will provide access to the HTML result containing all text.  If original image contained pictures along with the text data, pictures will be referenced in HTML but will not be returned.  Please test HTML output to make sure output matches your desired result.

If you would like to produce more than one output format from the same image request, separate your desired output formats with semicolons without spaces. For example, “PDF;Text;UnicodeText_UTF8”.

NOTE: You will need to know the file extension of the desired format (specified above) to retrieve the job results.
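A small Python helper can keep the ExportFormat value and the matching file extensions together. This is a sketch under assumptions: the function names are hypothetical, and the extension map is transcribed from the format list above with the leading “*.” removed; verify it against the <OutputType> values actually returned for your jobs.

```python
# ExportFormat value -> file extension, transcribed from the list above.
EXPORT_EXTENSIONS = {
    "RTF": "RTF", "MSWord": "DOC", "MSExcel": "XLS", "PDF": "PDF",
    "DBF": "DBF", "Text": "TXT", "CSV": "CSV", "PPT": "PPT", "XML": "XML",
    "UnicodeText_UTF8": "UTF8.TXT", "UnicodeText_UTF16": "UTF16.TXT",
    "UnicodeCSV_UTF8": "UTF8.CSV", "UnicodeCSV_UTF16": "UTF16.CSV",
    "HTML": "HTM", "UnicodeHTML": "UNICODE.HTM",
}


def export_format_value(formats) -> str:
    """Join format names with semicolons (no spaces), rejecting unknown names."""
    unknown = [name for name in formats if name not in EXPORT_EXTENSIONS]
    if unknown:
        # Names are case-sensitive, so "text" is rejected while "Text" is accepted.
        raise ValueError("Unknown ExportFormat value(s): " + ", ".join(unknown))
    return ";".join(formats)


def expected_extensions(formats):
    """File extensions to look for when retrieving the job results."""
    return [EXPORT_EXTENSIONS[name] for name in formats]


wanted = ["PDF", "Text", "UnicodeText_UTF8"]
print(export_format_value(wanted))   # PDF;Text;UnicodeText_UTF8
print(expected_extensions(wanted))   # ['PDF', 'TXT', 'UTF8.TXT']
```

Validating on the client side avoids spending a request on a BadExportFormat error caused by a capitalization mistake.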

 

Error Element On Failed Job Submission

If the job submission fails, you will receive an appropriate HTTP error code, as well as an <Error>Code</Error> response. The possible values of ‘Code’ are:

Code                       HTTP Error Code  Description
BadInputURL                400              InputURL is invalid or missing, or is not an HTTP/HTTPS URL
BadNotifyURL               400              NotifyURL is invalid or missing, or is not an HTTP/HTTPS URL
BadInputType               400              The specified InputType is invalid,
                                            OR
                                            InputType is missing, and auto-detected file type is not valid,
                                            OR
                                            InputType is missing, and auto-detection of file type has failed
BadRotationType            400              Rotation specified in CleanupSettings is invalid. Please note that it is case-sensitive.
BadAnalysisType            400              AnalysisMode specified in OCRSettings is invalid. Please note that it is case-sensitive.
BadPrintType               400              PrintType specified in OCRSettings is invalid. Please note that it is case-sensitive.
BadExportFormat            400              ExportFormat specified in OutputSettings is invalid. Please note that it is case-sensitive.
OCRSettingsTooComplex      400              OCRSettings are too complex. Try reducing the number of OCRLanguages and PrintTypes you are recognizing.
InternalError:ErrorNumber  500              Internal error has occurred. Contact Support

POST Examples

URL Example:

HTTP POST to http://svc.webservius.com/v1/wisetrend/wiseocr/submit?wsvKey=[your_key_here]

Message body example (simple):

<Job>
<InputURL>http://www.ocr-it.com/online_ocr/english_photo_bw.tif</InputURL>
</Job>

 

Message body example (enhanced):

<Job>
<InputURL>http://www.ocr-it.com/online_ocr/english_photo_bw.tif</InputURL>
<InputType>TIF</InputType>
<NotifyURL>http://www.example.com/ocrNotify.php</NotifyURL>
<OCRSettings>
<OCRLanguage>English</OCRLanguage>
</OCRSettings>
</Job>

 

Message body example (full):

<Job>
<InputURL>http://www.ocr-it.com/online_ocr/english_photo_bw.tif</InputURL>
<InputType>TIF</InputType>
<CleanupSettings>
<Deskew>true</Deskew>
<RemoveGarbage>true</RemoveGarbage>
<RemoveTexture>true</RemoveTexture>
<RotationType>Automatic</RotationType>
</CleanupSettings>
<OCRSettings>
<PrintType>Print</PrintType>
<OCRLanguage>English</OCRLanguage>
<SpeedOCR>false</SpeedOCR>
<AnalysisMode>MixedDocument</AnalysisMode>
<LookForBarcodes>true</LookForBarcodes>
</OCRSettings>
<OutputSettings>
<ExportFormat>Text</ExportFormat>
</OutputSettings>
</Job>

 

Response example:

<JobStatus>
<JobURL>http://api.ocr-it.com/ocr/v2/getStatus/123ABCResponseExample123ABC</JobURL>
<Status>Submitted</Status>
</JobStatus>
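The request and response shown in the POST examples can be put together in Python using only the standard library. This is a minimal sketch, not the official client: the function names and parameters are hypothetical, only a few of the optional elements are covered, and error handling is left to the caller (urllib raises urllib.error.HTTPError for the HTTP error codes described above).

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

SUBMIT_URL = "http://svc.webservius.com/v1/wisetrend/wiseocr/submit"


def build_job_xml(input_url, input_type=None, notify_url=None,
                  ocr_language=None, export_format=None) -> bytes:
    """Build the <Job> body; elements are added in the order this guide lists them."""
    job = ET.Element("Job")
    ET.SubElement(job, "InputURL").text = input_url
    if input_type:
        ET.SubElement(job, "InputType").text = input_type
    if notify_url:
        ET.SubElement(job, "NotifyURL").text = notify_url
    if ocr_language:
        ocr = ET.SubElement(job, "OCRSettings")
        ET.SubElement(ocr, "OCRLanguage").text = ocr_language
    if export_format:
        out = ET.SubElement(job, "OutputSettings")
        ET.SubElement(out, "ExportFormat").text = export_format
    # ElementTree XML-encodes text for us (for example "&" becomes "&amp;").
    return ET.tostring(job)


def submit_job(api_key: str, job_xml: bytes, timeout: float = 30.0):
    """POST the job and return (JobURL, Status) from the <JobStatus> response."""
    url = SUBMIT_URL + "?wsvKey=" + urllib.parse.quote(api_key, safe="")
    request = urllib.request.Request(
        url, data=job_xml, headers={"Content-Type": "text/xml"}, method="POST")
    # urllib sets Content-Length automatically for a bytes body.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        root = ET.fromstring(response.read())
    return ((root.findtext("JobURL") or "").strip(),
            (root.findtext("Status") or "").strip())


body = build_job_xml("http://www.ocr-it.com/online_ocr/english_photo_bw.tif",
                     input_type="TIF", ocr_language="English")
print(body.decode("ascii"))
```

Calling submit_job("your key", body) would perform the actual POST; it is not invoked here so the example can run without network access or an API key.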

 

C# Code Example

PHP Code Example

Python Code Example

Java Code Examples

For these and additional sample code please visit our Code Samples section

 

2. HANDLING JOB STATUS

There are two ways to handle job status:

- You can manually check the status of any job by sending an HTTP GET request to the <JobURL> that you received when you submitted the job.

- You can automatically get notified when the job succeeds or fails if you provide a <NotifyURL> when you submit a job. There will only be one attempt to notify you. It will be made when the job fully succeeds or fails (you will not get any intermediate status notifications). The notification will consist of an HTTP POST containing XML status information.

Regardless of which method you use, the status report is in the same format, as described below.

2.1 Status for jobs in progress

For jobs that are not yet complete, the status report looks as follows:

<JobStatus>

<JobURL>http://api.ocr-it.com/ocr/v3/getStatus/xxxxx_your_job_id_xxxxx</JobURL>

<Status>[status]</Status>

</JobStatus>

NOTE: Sub-domain may be different depending on which server responds to your initial request.  Make sure to retrieve the entire specific Job URL after your submission.

An example of job ID is “583659A247BCFE55110C2229FFEA7601”, which is a randomly generated value assigned to a job at the time of submission.

“Status” can either be “Submitted” (meaning that the job has been submitted but the image to be OCRed has not yet been downloaded), or “Processing” (meaning that the image has been downloaded and is in the process of being OCRed). Other status values (such as “Finished”) are described below, in the sections about successful/expired/failed jobs.

“JobURL” repeats the URL where updated job status may be obtained.

2.2 Status for successful jobs

For jobs that have completed successfully, the status report looks as follows:

<JobStatus>
<JobURL>http://api.ocr-it.com/ocr/v2/getStatus/xxxxx_your_job_id_xxxxx</JobURL>
<Status>Finished</Status>
<Download>
<File>
<Uri>http://api.ocr-it.com/ocr/v3/download/123456789.PDF</Uri>
<OutputType>PDF</OutputType>
</File>
<File>
<Uri>http://api.ocr-it.com/ocr/v3/download/123456789.TXT</Uri>
<OutputType>TXT</OutputType>
</File>
</Download>
</JobStatus>

NOTE: Sub-domain may be different depending on which server responds to your initial request.  Make sure to retrieve the entire specific Job URL after your submission.

There will be one <File> entry for each requested output format – by default, there will be one for TXT (plaintext) and the other for PDF. The <File> entries may appear in any order. Each contains an <OutputType> indicating the output type (file extension), and a <Uri> containing the address where the output may be downloaded.

As usual, “JobURL” repeats the URL where updated job status may be obtained.

2.3 Status for expired jobs

Job results are not guaranteed to be kept for more than 24 hours, or after initiating a “clear” command (documented below) for a particular job.  If a job has expired, it will NOT have a <Download> element, and the <Status> will be “Expired”.

2.4 Status for failed jobs

For jobs that have failed, the status report looks as follows:

<JobStatus>
<JobURL>http://api.ocr-it.com/ocr/v2/getStatus/xxxxx_your_job_id_xxxxx</JobURL>
<Status>[FailedStatus]</Status>
<Errors>
<Error>
<Code>[Code]</Code>
<Message>[Message]</Message>
</Error>
</Errors>
</JobStatus>

The <Status> may be one of the following:

FailedDownload       Could not download the image to be OCRed
FailedConversion     Could not perform OCR
FailedNoFunds        Insufficient funds for the number of pages you are attempting to OCR
FailedInternalError  Internal error, please contact Support

 

The <Errors> element may or may not be present. If it is present, it may contain one or more <Error> elements with <Code> and <Message> sub-elements that can help you debug the problem. Here are some common <Code> values:

ConvertFailed     OCR engine reported an error during conversion. Make sure that the input file is not corrupt and is not password-protected.
SubmitFailed      Could not submit the OCR job. Possibly an internal error, contact Support
DownloadRejected  Could not download the input image. Ensure that it does not exceed maximum size and that the server with the image responds promptly.
DownloadFailed    Could not download the input image. Ensure that the image URL exists and does not require authentication.

 

As usual, “JobURL” repeats the URL where updated job status may be obtained.
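Because the status report has the same format whether it is fetched with GET or received as a notification, one parser can serve both paths. The Python sketch below reads the elements described in sections 2.1 to 2.4; the function names and the returned dictionary layout are illustrative choices, not part of the API.

```python
import xml.etree.ElementTree as ET

IN_PROGRESS = {"Submitted", "Processing"}


def parse_job_status(xml_text):
    """Parse a <JobStatus> report into a plain dictionary."""
    root = ET.fromstring(xml_text)

    def text(node, tag):
        # Strip whitespace, since element text may be wrapped onto its own line.
        return (node.findtext(tag) or "").strip()

    return {
        "job_url": text(root, "JobURL"),
        "status": text(root, "Status"),
        # <File> entries may appear in any order, so key them by OutputType.
        "files": {text(f, "OutputType"): text(f, "Uri")
                  for f in root.iterfind("Download/File")},
        # <Errors> may be absent; then this list is simply empty.
        "errors": [(text(e, "Code"), text(e, "Message"))
                   for e in root.iterfind("Errors/Error")],
    }


def classify(status: str) -> str:
    """Group the documented status values into coarse outcomes."""
    if status in IN_PROGRESS:
        return "in progress"
    if status == "Finished":
        return "finished"
    if status == "Expired":
        return "expired"
    if status.startswith("Failed"):
        return "failed"
    return "unknown"


sample = """<JobStatus>
<JobURL>http://api.ocr-it.com/ocr/v2/getStatus/xxxxx_your_job_id_xxxxx</JobURL>
<Status>Finished</Status>
<Download>
<File><Uri>http://api.ocr-it.com/ocr/v3/download/123456789.PDF</Uri><OutputType>PDF</OutputType></File>
</Download>
</JobStatus>"""
report = parse_job_status(sample)
print(classify(report["status"]), report["files"])
```

A polling client would fetch the JobURL periodically and stop once classify() returns anything other than “in progress”.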

3. RETRIEVING JOB RESULTS

To get the results of the job, use the URLs from the successful job status reports (see section 2.2 above). Results will be returned with the correct Content-Type header.

NOTE: The result of processing will be deleted automatically 24 hours after submission.

CLEARING JOB RESULTS

For additional security, you may choose to delete your processing result.  Once the processing result has been picked up and is no longer needed, making a simple call will initiate the purge process immediately.

Template:
POST {base_URL}/clear/{JobID}

Example:
POST http://api.ocr-it.com/ocr/v3/clear/e83026a84aa0400897f3000883897ce9

A return status of 200 indicates successful completion.  The status URL and JobID remain valid, but all images and data become inaccessible and are queued for purging.  The job status URL should return status “Expired” after a successful clear call.

NOTE: The Content-Length is required, but should be set to 0.  Body message is not required.
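The clear call can be prepared in Python as shown below. This is a sketch only: the function name is hypothetical, and the base URL is passed in by the caller because this document gives it only as {base_URL} in the template (the example above uses http://api.ocr-it.com/ocr/v3).

```python
import urllib.request


def build_clear_request(base_url: str, job_id: str) -> urllib.request.Request:
    """Prepare POST {base_URL}/clear/{JobID} with an empty body."""
    url = base_url.rstrip("/") + "/clear/" + job_id
    # An empty bytes body plus an explicit header gives "Content-Length: 0".
    return urllib.request.Request(
        url, data=b"", headers={"Content-Length": "0"}, method="POST")


request = build_clear_request("http://api.ocr-it.com/ocr/v3",
                              "e83026a84aa0400897f3000883897ce9")
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen() would send it; a 200 response then indicates that the purge has been queued.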

 

PRIMARY AND ALTERNATE DNS

There are two (2) addresses available through two different DNS hosting providers for added reliability through redundancy.

PROVIDER # 1 (primary) ocrcloud-api.dyndns.org
PROVIDER # 2 (backup) api.ocr-it.com

All addresses can be used interchangeably, and applications that require an added protection from failing DNS can implement a primary and secondary DNS from two different hosting providers to be checked automatically.  For example, the default status report:

<JobStatus>
<JobURL>http://api.ocr-it.com/ocr/getStatus/583659A247BCFE55110C2229FFEA7601</JobURL>
<Status>[status]</Status>
</JobStatus>

is equivalent to:

<JobStatus>
<JobURL>http://ocrcloud-api.dyndns.org/ocr/getStatus/583659A247BCFE55110C2229FFEA7601</JobURL>
<Status>[status]</Status>
</JobStatus>
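One way an application might use the two host names is to derive a fallback URL from the JobURL it received and try each in turn. The Python sketch below only builds the candidate list; the function name is hypothetical, and it assumes the two hosts are interchangeable for the URL in question, as described in this section.

```python
from urllib.parse import urlsplit, urlunsplit

# Host names from the provider list above.
API_HOSTS = ["ocrcloud-api.dyndns.org", "api.ocr-it.com"]


def candidate_urls(job_url: str):
    """Return the original URL first, followed by the same URL on the other host(s)."""
    parts = urlsplit(job_url)
    hosts = [parts.netloc] + [h for h in API_HOSTS if h != parts.netloc]
    return [urlunsplit(parts._replace(netloc=host)) for host in hosts]


for url in candidate_urls(
        "http://api.ocr-it.com/ocr/getStatus/583659A247BCFE55110C2229FFEA7601"):
    print(url)
```

The URL returned by the service stays first in the list, so the alternate host is only used if a request to the original one fails.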

 

TESTING WITH FIDDLER2 (OUT-OF-BOX)

Fiddler is a Web Debugging Proxy which generates and logs HTTP(S) traffic between your computer and external servers. Fiddler allows you to inspect all HTTP(S) traffic, set breakpoints, and “fiddle” with incoming or outgoing data. Fiddler is freeware and can debug traffic from virtually any application, including Internet Explorer, Mozilla Firefox, Opera, and thousands more.

Fiddler is a useful tool in testing your OCR requests or debugging.  The following screenshot demonstrates the out-of-box setup to test any combination of settings available in OCR Cloud 2.0.  See POST Examples section for pre-set requests.

LIST OF SUPPORTED LANGUAGES

Languages with full dictionary support

 

  • ArmenianEastern
  • ArmenianGrabar
  • ArmenianWestern
  • Bashkir
  • Bulgarian
  • Catalan
  • Chinese Simplified*
  • Chinese Traditional*
  • Croatian
  • Czech
  • Danish
  • DutchBelgian
  • DutchNetherlands
  • English
  • Estonian
  • Finnish
  • French
  • German
  • GermanNewSpelling
  • Greek
  • Hebrew*
  • Hungarian
  • Indonesian
  • Italian
  • Japanese*
  • Korean*
  • Latvian
  • Lithuanian
  • Norwegian
  • NorwegianBokmal
  • NorwegianNynorsk
  • OldEnglish
  • OldFrench
  • OldGerman
  • OldItalian
  • OldSpanish
  • Polish
  • PortugueseBrazil
  • PortuguesePortugal
  • Romanian
  • Russian
  • Slovak
  • Slovenian
  • Spanish
  • Swedish
  • Tatar
  • Turkish
  • Ukrainian

NOTE: Languages marked with “*” are not available in this API release, but are available with a special account.  Consult additional documentation or contact the OCR-IT Team for further information.

Languages without dictionary support

  • Abkhaz
  • Afrikaans
  • Agul
  • Albanian
  • Altaic
  • Avar
  • Aymara
  • AzerbaijaniCyrillic
  • AzerbaijaniLatin
  • Basque
  • Belarussian
  • Bemba
  • Blackfoot
  • Breton
  • Bugotu
  • Buryat
  • Cebuano
  • Chamorro
  • Chechen
  • Chukchee
  • Chuvash
  • Corsican
  • Crimean Tatar
  • Crow
  • Dakota
  • Dargwa
  • Dungan
  • EskimoCyrillic
  • EskimoLatin
  • Even
  • Evenki
  • Faroese
  • Fijian
  • Frisian
  • Friulian
  • Gagauz
  • Galician
  • Ganda
  • GermanLuxembourg
  • Guarani
  • Hani
  • Hausa
  • Hawaiian
  • Icelandic
  • Ingush
  • Irish
  • Jingpo
  • Kabardian
  • Kalmyk
  • Karachay-Balkar
  • Karakalpak
  • Kasub
  • Kawa
  • Kazakh
  • Khakas
  • Khanty
  • Kikuyu
  • Kirghiz
  • Kongo
  • Koryak
  • Kpelle
  • Kumyk
  • Kurdish
  • Lak
  • Latin
  • Lezgin
  • Luba
  • Macedonian
  • Malagasy
  • Malay
  • Malinke
  • Maltese
  • Mansi
  • Maori
  • Mari
  • Maya
  • Miao
  • Minangkabau
  • Mohawk
  • Mongol
  • Mordvin
  • Nahuatl
  • Nenets
  • Nivkh
  • Nogay
  • Nyanja
  • Ojibway
  • Ossetian
  • Papiamento
  • Provencal
  • Quechua
  • Rhaeto-Romanic
  • RomanianMoldavia
  • Romany
  • Ruanda
  • Rundi
  • RussianOldSpelling
  • SamiLappish
  • Samoan
  • ScottishGaelic
  • Selkup
  • SerbianCyrillic
  • SerbianLatin
  • Shona
  • Somali
  • Sorbian
  • Sotho
  • Sunda
  • Swahili
  • Swazi
  • Tabassaran
  • Tagalog
  • Tahitian
  • Tajik
  • Tongan
  • Tswana
  • Tun
  • Turkmen
  • Tuvan
  • Udmurt
  • UighurCyrillic
  • UighurLatin
  • UzbekCyrillic
  • UzbekLatin
  • Welsh
  • Wolof
  • Xhosa
  • Yakut
  • Zapotec
  • Zulu
  • Adyghe
  • TokPisin

Artificial languages

  • Esperanto
  • Ido
  • Interlingua
  • Occidental

Formal languages

  • Basic
  • C_C++
  • COBOL
  • Fortran
  • Java
  • Pascal
  • SimpleChemicalFormulas
  • MICRE-13B
  • NumbersOnly

CONTACT SUPPORT

Contact support@ocr-it.com