Paper is gradually losing its importance as the main carrier of information. Where possible, electronic versions are used instead of paper documents. But how can existing archives be converted into electronic form? Special text recognition programs were created to solve this problem.
What are OCR programs and how do they work
These software products use OCR (optical character recognition) or ICR (intelligent character recognition) technology.
Programs that use OCR work as follows. The image of the text received from the scanner is divided into many fragments. For each fragment, the application generates several hypotheses about which character it contains. Each hypothesis is checked against reference patterns and given a score that reflects the degree of match. The program picks the hypothesis with the highest score, thereby "seeing" the character, and outputs it in the built-in text editor.
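The scoring step described above can be illustrated with a minimal sketch. It assumes each fragment is a tiny black-and-white bitmap and each reference pattern is a bitmap of the same size; the 3x5 glyphs, the `match_score` and `recognize` names, and the pixel-agreement score are all hypothetical and far simpler than what a commercial engine does.

```python
# Hypothetical 3x5 bitmaps standing in for the reference patterns.
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}

def match_score(fragment, template):
    """Return the fraction of pixels on which fragment and template agree."""
    total = 0
    same = 0
    for frag_row, tmpl_row in zip(fragment, template):
        for a, b in zip(frag_row, tmpl_row):
            total += 1
            same += (a == b)
    return same / total

def recognize(fragment):
    """Score every candidate character and return the best one with its score."""
    scores = {ch: match_score(fragment, tmpl) for ch, tmpl in TEMPLATES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# A noisy "T": one extra pixel is set in the fourth row.
noisy_t = ["111", "010", "010", "110", "010"]
print(recognize(noisy_t))
```

Even with one wrong pixel, the fragment agrees with the "T" pattern on 14 of 15 pixels, more than with any other candidate, so "T" is chosen.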
ICR works on the same principle, but artificial neural networks are used for character processing. The main advantages of this method are the compactness of the programs and their ability to keep learning. This makes it possible to recognize words hand-printed by a person in separate letters. However, this technology is not able to "read" cursive handwriting.
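To give a feel for how a network "learns" letter shapes, here is a minimal sketch with a single artificial neuron (a perceptron), the simplest possible neural network. The two 3x5 glyphs, the noisy samples, and all function names are hypothetical; real ICR engines use far larger networks and training sets.

```python
# Two hypothetical 3x5 reference glyphs, flattened to 15 pixels each.
GLYPH_I = "111010010010111"
GLYPH_L = "100100100100111"

def to_vector(bits):
    """Convert a string of '0'/'1' pixels into a list of ints."""
    return [int(b) for b in bits]

def flip(bits, index):
    """Return a copy of the bitmap with one pixel inverted (simulated noise)."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]

# Training set: label +1 means "I", label -1 means "L".
SAMPLES = [
    (GLYPH_I, 1), (flip(GLYPH_I, 0), 1), (flip(GLYPH_I, 13), 1),
    (GLYPH_L, -1), (flip(GLYPH_L, 2), -1), (flip(GLYPH_L, 14), -1),
]

def predict(weights, bias, bits):
    """Classify a bitmap with one neuron: the sign of the weighted pixel sum."""
    total = bias + sum(w * x for w, x in zip(weights, to_vector(bits)))
    return 1 if total > 0 else -1

def train(samples, max_epochs=100):
    """Perceptron rule: adjust the weights after every misclassified sample."""
    weights, bias = [0.0] * 15, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for bits, label in samples:
            if predict(weights, bias, bits) != label:
                for i, x in enumerate(to_vector(bits)):
                    weights[i] += label * x
                bias += label
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: training is done
            break
    return weights, bias

weights, bias = train(SAMPLES)
print([predict(weights, bias, bits) for bits, _ in SAMPLES])
```

The learned weights are just 15 numbers plus a bias, which hints at why trained recognizers can be compact; feeding in more samples and repeating the update loop is what continued learning amounts to in this toy setting.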
Dedicated OCR programs have been developed for each of the existing operating systems. The most popular for Windows are:
- ABBYY FineReader;
- OmniPage;
- Readiris Pro;
- Samsung Scan OCR Program.
In addition to PC programs, there are many online OCR services available. Among them, the most famous are FineReader Online, OnlineOCR, FreeOCR.
ABBYY FineReader 14
This software product is developed by the Russian company ABBYY and is one of the best OCR programs. It is built on the company's own engine, FineReader Engine, which provides the following features:
- Fast recognition of printed text with over 98% accuracy. The engine is tolerant of poor source image quality, so text is recognized equally well in images taken with a scanner or a camera.
- ADRT technology allows you to recognize not only text, but also its formatting: font, indents, paragraphs, columns.
- Multi-threaded image processing. This allows the program to use all processor cores (up to 4) to speed up the recognition process.
- Support for over 190 languages, including those that use neither the Latin nor the Cyrillic alphabet (Japanese, Chinese, Arabic).
- Built-in text editor allows you to check the recognition result or edit it.
- Interaction with the Office package. It allows you to export recognized text to Microsoft Word and Excel for further processing.
- Trainability. This function allows you to train the program to "read" specific letter styles, for example a non-standard font or block letters written by hand.
- Working with PDF. FineReader allows you to recognize text from this file type and "stitch" multiple scanned images into PDF or PDF/A.
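The multi-threaded processing mentioned in the list above can be sketched as follows. This is not FineReader's API: `recognize_page` is a hypothetical stand-in for a real OCR call, and the cap of 4 workers simply mirrors the figure quoted in the feature list.

```python
import os
from concurrent.futures import ThreadPoolExecutor

MAX_CORES = 4  # the cap quoted in the feature list

def recognize_page(page_name):
    """Hypothetical stand-in for an OCR call on one scanned image."""
    return f"text of {page_name}"

def recognize_all(page_names):
    """Recognize pages in parallel, using at most MAX_CORES workers."""
    workers = min(MAX_CORES, os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input pages
        return list(pool.map(recognize_page, page_names))

pages = ["scan_001.png", "scan_002.png", "scan_003.png"]
print(recognize_all(pages))
```

In Python specifically, threads do not run CPU-bound code on several cores at once, so a real recognizer written in pure Python would likely use a process pool instead; the thread pool here only illustrates how pages are divided among a limited number of workers.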
The main disadvantage of this program is the price. A perpetual license for the basic version will cost 7 thousand rubles. Versions "Business" and "Enterprise" - 12 and 39 thousand rubles, respectively. If you intend to use the program only at home, then you can download the hacked 11th or 12th version of the product from the torrent tracker.
System requirements for FineReader 14
- Processor: 32-bit or 64-bit, over 1 GHz, supporting the SSE2 instruction set (Intel Celeron M or better, AMD Athlon 64 or better).
- RAM: 1 GB. If the processor has more than one core, an additional 512 MB is required for each additional core.
- Video card: any that supports 1024 x 800 resolution.
- Hard drive: 3 GB - for installation and operation.
- Scanner: Supports TWAIN and WIA drivers.
- OS: Windows 7, 8, 8.1, 10.
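The RAM requirement above depends on the number of processor cores, so a small calculation may help. The sketch below assumes that the extra 512 MB applies to every core beyond the first; that reading, and the function name, are assumptions rather than something the vendor's wording is quoted for here.

```python
BASE_RAM_MB = 1024           # 1 GB for a single-core processor
EXTRA_RAM_PER_CORE_MB = 512  # assumed to apply to each core beyond the first

def required_ram_mb(cores):
    """Return the minimum RAM in MB for a processor with the given core count."""
    if cores < 1:
        raise ValueError("a processor has at least one core")
    return BASE_RAM_MB + EXTRA_RAM_PER_CORE_MB * (cores - 1)

for cores in (1, 2, 4):
    print(cores, "core(s):", required_ram_mb(cores), "MB")
```

Under this reading, a 4-core machine needs 1024 + 3 x 512 = 2560 MB, that is, 2.5 GB.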
Users' opinions about FineReader 14
Users speak positively about FineReader. Among its advantages they highlight the ability to recognize text from poor paper originals, a convenient and simple interface, and high image processing speed.
Among the problems that arise when using this OCR program, some users note that the image manager does not work correctly. For example, the brightness adjustment of the scanned image behaves erratically.
OmniPage
OmniPage is the main competitor of FineReader in the Russian market of OCR programs. In terms of functionality, it is very similar to its rival, but has several differences:
- Ability to start the scanning and recognition process using the scanner buttons.
- Support for 4-core processors. This allows you to reduce recognition time and convert multiple images at the same time.
- Creating your own e-library for your Kindle e-book reader.
- Automatic detection of recognized language.
The shortcomings of the program are its low speed, comparable to that of the 10th version of FineReader, and its price: a licensed copy costs $150.
System requirements for OmniPage
- Processor: 32-bit or 64-bit, clock speed over 1 GHz, Intel Pentium or better, AMD Athlon or better.
- RAM: 512 MB.
- Video card: any that supports 1024 x 800 resolution and 16-bit color depth.
- HDD: 1.1 GB for installation of all components and 100 MB for operation.
- Scanner: Supports TWAIN, WIA and ISIS drivers.
- OS: Windows XP SP3, Vista SP2 x32/x64, 7, 8.
OmniPage user opinions
Users speak sharply negatively about it, because there are problems in all parts of the program, ranging from an attractive but confusing interface to poor help documentation. The product is not adapted to work in Windows XP. It can be made to work, but it will take some time.
OmniPage also has recognition issues. It easily recognizes plain black text on scanned sheets of paper with drawings or tables, but with images from a camera or mobile phone the recognition accuracy drops to 70%, which is very inconvenient when processing large documents.
Also, the 18th version may not run due to errors in the code. To fix this problem, you need to install patch 18.01.
Readiris Pro 17
Readiris is an OCR program that matches the functionality and performance of FineReader for less money (8,000 vs. 12,000 rubles). The Pro version has the following features:
- Full PDF support: recognition, creation of files for databases, compression, and reading the text aloud.
- Support for 140 languages.
- Recognition of paper tables and texts with the ability to export to Excel and Word.
- Getting images from any scanner model.
There is also a corporate version that allows you to watermark PDFs and work with documents over 50 pages.
System requirements for Readiris Pro 17
- Processor: x86 or x64, 1 GHz or higher.
- RAM: 1 GB.
- Video card: any that supports 1024 x 800 resolution.
- Hard disk: 400 MB for installation.
- Scanner: Supports TWAIN, WIA drivers.
- OS: Windows 7, 8, 10 x32/x64.
User opinions about Readiris
Users describe this OCR software as a good and fast PDF-to-Word converter with some problems:
- A complicated interface that is not easy for a beginner to understand.
- The document is automatically rescanned whenever the scan area is changed.
- Poor technical support.
- Sometimes the program does not activate due to errors in the program code.
Samsung Scan OCR Program - what is this program?
This is free software included with Samsung 3-in-1 multifunction devices (printer, scanner, copier). It was developed in collaboration with Iris, the creators of Readiris Pro, and optimized to work with this manufacturer's MFPs. Samsung Scan OCR differs from the original Readiris in its interface, reduced functionality, and size: it occupies 40 MB on the hard disk.
Online OCR services
Online services are an alternative to resource-intensive desktop OCR programs such as FineReader. The hardware behind such services makes it possible to recognize text from images much faster than on a standalone PC. Among the services that extract text from photos, three are the most convenient: FineReader Online, FreeOCR, and OnlineOCR.
FineReader Online is a direct offshoot of the desktop version of the product. Upon registration, a new user is given 10 free pages to process, plus 5 more every month. You can remove this limitation by purchasing an annual subscription for 3,200, 5,500, or 17,800 rubles for 2,000, 5,000, or 10,000 pages, respectively. A user who already has a FineReader 14 license only needs to register and activate it for use in the online version. In this case, the user receives the number of pages corresponding to the type of license purchased: "Standard" (2,000), "Business" (5,000), or "Enterprise" (10,000).
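To compare the subscription tiers, the prices quoted above can be converted to a cost per page. This is a simple arithmetic sketch using only the figures from the text; the `PLANS` list and the function name are illustrative.

```python
# (annual price in rubles, pages included), as quoted in the text
PLANS = [(3200, 2000), (5500, 5000), (17800, 10000)]

def price_per_page(price_rub, pages):
    """Return the cost of one page in rubles, rounded to kopecks."""
    return round(price_rub / pages, 2)

for price, pages in PLANS:
    print(f"{pages} pages: {price_per_page(price, pages)} rubles per page")
```

On the figures as quoted, the tiers come to 1.60, 1.10, and 1.78 rubles per page, so the middle tier is the cheapest per page.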
The OnlineOCR.com service allows you to convert 15 images per hour (the limit for unregistered users) to text and save the results as .docx, .xlsx, or .txt files. After registration, the following become available:
- Saving to .pdf, .doc, .xls, and .rtf.
- Converting multi-page PDF files.
- The number of pages increases to 50.
If that is not enough, additional pages can be purchased in packages of 50 to 50,000.
The FreeOCR.com project differs from the previous one in that it is completely free and places no limit on the number of pages processed. The OCR engine of this site supports Russian, Ukrainian, Turkish, Vietnamese, and all European languages, 29 in total. The only drawback of this portal is that it works only with image files, which must be uploaded one at a time, since the creators did not provide a processing queue. Recognized text is output in TXT format without any formatting.
Users' opinions about online OCR services
These sites are useful when downloading and installing a full OCR program is not practical, for example, when you need to insert several lengthy quotes from a book or magazine into an essay. Among the shortcomings of such sites are that they are only nominally free (FineReader) or have weak functionality (FreeOCR, OnlineOCR).
To summarize, many OCR programs for recognizing text in images or PDF files have been created, and only the best known are listed in this article. Each user can therefore choose an OCR program for a scanner according to their requirements and budget, or use one of the many free online OCR services.