CROSSCAP uses the open-source Tesseract engine for text recognition (Tesseract's development has been sponsored by Google).
Scanning considerations
You will achieve the best text recognition results with source documents that contain few or no illustrations. Ideally, scan such documents at the highest feasible resolution and in bi-tonal mode.
OCR default settings
Before first use, configure all necessary OCR default settings (see chapter Program Settings, section OCR).
Continuous text recognition (Full Text OCR)
Text recognition may be performed on the entire area of every scanned image, i.e. any recognizable text will be processed. In this case, all necessary OCR settings are made in the export settings of the desired output format.
The following export formats support full-text OCR:
Word file (available only when using the ABBYY FineReader OCR engine!)
Localized text recognition (Zonal OCR)
You may also configure the text recognition engine to process specific areas within images. This is referred to as zonal OCR and may be applied in two different ways:
You may apply zonal OCR automatically, e.g. to create index data. All required settings must be made before the project starts. For details, see chapter Project settings, section Image processing.
Alternatively, you may perform zonal OCR manually during the course of a project. Recognized text may either be inserted into index fields or placed on the Windows clipboard (from where it can be transferred to other applications). For details, see chapter Menu bar functions, section Edit toolbar.
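The idea behind zonal OCR can be illustrated with a short Python sketch: only the pixels inside a rectangular zone are passed on to text recognition. This is not CROSSCAP's or Tesseract's actual implementation. The names Zone and crop_zone are hypothetical, and the image is assumed to be a simple bi-tonal grid (1 = black, 0 = white).

```python
from typing import List, NamedTuple

# Assumption: a bi-tonal image is a list of pixel rows, 1 = black, 0 = white.
Image = List[List[int]]


class Zone(NamedTuple):
    """Hypothetical rectangular OCR zone, measured in pixels."""
    left: int
    top: int
    width: int
    height: int


def crop_zone(image: Image, zone: Zone) -> Image:
    """Return only the pixels inside the zone, clipped to the image bounds."""
    top = max(zone.top, 0)
    bottom = max(zone.top + zone.height, top)
    left = max(zone.left, 0)
    right = max(zone.left + zone.width, left)
    # Slicing past the end of a list is safe in Python, so the lower and
    # right-hand edges are clipped automatically.
    return [row[left:right] for row in image[top:bottom]]


# Usage: a tiny 4 x 6 "page" with a mark in the middle.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
index_zone = Zone(left=2, top=1, width=3, height=2)
zone_pixels = crop_zone(page, index_zone)
```

In a real workflow, the cropped area would then be handed to the OCR engine, and the recognized text would end up in an index field or on the clipboard.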
Image processing functions affecting or supporting OCR
The following image processing functions will improve or affect OCR results.
Please refer to the respective sections for more information:
Color replacement - use to remove background coloring.
Deskew - use to re-align skewed text so that text lines run truly horizontal.
Line removal - use to get rid of interfering lines or frames.
Punch hole removal - use to get rid of interfering punch holes.
Despeckle - use to get rid of interfering smudges and speckles.
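To give a rough idea of what a function such as Despeckle does, the following Python sketch removes isolated black pixels from a bi-tonal image. It is a simplified illustration only, not the algorithm CROSSCAP uses. The function name despeckle and the image representation (a grid with 1 = black, 0 = white) are assumptions made for this example.

```python
from typing import List

# Assumption: a bi-tonal image is a list of pixel rows, 1 = black, 0 = white.
Image = List[List[int]]


def despeckle(image: Image) -> Image:
    """Return a copy in which black pixels without any black neighbour
    (in the surrounding 8 positions) are turned white."""
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Look at the up to 8 surrounding pixels in the original image.
            has_black_neighbour = any(
                image[ny][nx] == 1
                for ny in range(max(y - 1, 0), min(y + 2, height))
                for nx in range(max(x - 1, 0), min(x + 2, width))
                if (ny, nx) != (y, x)
            )
            if not has_black_neighbour:
                cleaned[y][x] = 0
    return cleaned


# Usage: the lone pixel in the top-left corner is a speckle,
# the connected group on the right is kept.
noisy = [
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
clean = despeckle(noisy)
```

Removing such stray pixels before recognition keeps the OCR engine from misreading them as punctuation or parts of characters.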