Tesseract


Tesseract is a free optical character recognition (OCR) engine originally developed by HP and currently maintained by Google. It has been rated as one of the most accurate OCR engines available. It has no layout engine, no output formatting and no GUI. It has been trained to recognize many languages, such as English, French and German, and it can also be trained to recognize other languages. Currently it can read only TIFF and BMP images.

SW Documentation: 
To run this software interactively in a Linux environment, run the following commands:
module load tesseract
tesseract imagename textfileoutputname
Tesseract transcribes the image file 'imagename' and writes the recognized text to the file 'textfileoutputname.txt' (the .txt extension is added automatically).
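To transcribe a whole directory of scanned pages, the single-file command can be wrapped in a shell loop. The sketch below is a minimal example, not an official script: it assumes a Bash shell, that 'module load tesseract' has already been run, and that the images use the .tif, .tiff or .bmp extensions. The function name ocr_dir is hypothetical. The optional second argument is passed to Tesseract's -l option to select the recognition language, and it assumes the data for that language is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the tesseract module):
# run tesseract on every TIFF/BMP image in a directory.
# Assumes `module load tesseract` has already been run in this shell.
ocr_dir() {
    local dir="${1:-.}"    # directory to scan (default: current directory)
    local lang="${2:-eng}" # language code passed to -l (assumed installed)
    local img base
    for img in "$dir"/*.tif "$dir"/*.tiff "$dir"/*.bmp; do
        [ -e "$img" ] || continue   # skip patterns that matched no files
        base="${img%.*}"            # strip extension: page1.tif -> page1
        # tesseract appends .txt to the output base, giving page1.txt
        tesseract "$img" "$base" -l "$lang" || echo "OCR failed for $img" >&2
    done
}
```

For example, 'ocr_dir /path/to/scans fra' would write one .txt file next to each image in /path/to/scans, using French language data.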
 
If you need more details, visit the official documentation.
Short Name: 
tesseract
SW Module: 
tesseract
Service Level: 
Primary
SW Category: