
Page 1: User's manual Readiris - Claro Software · I.R.I.S. detains the copyrights to the Readiris software, the OCR technology, the ICR technology, the bar code reading technology, the linguistic

I

USER’S GUIDE


1foreword.PMD 24/11/2005, 11:591


Readiris Pro

© 1990-2006 I.R.I.S. All rights reserved.
OCR technology by I.R.I.S.
Connectionist, AutoFormat and Linguistic technology by I.R.I.S.
ICR and bar code reading technology by I.R.I.S.


SAVE TIME, NO MORE RETYPING!

Congratulations on acquiring Readiris. This software package will undoubtedly be of great help in recapturing your texts, tables, graphics, bar codes and even handwritten text.

As efficient as computers are, you have to key in your information first. If you have ever retyped a 15-page report or a large table of figures, you know how tedious and time-consuming it can be. Use this state-of-the-art OCR package to enter text in your applications automatically, and you'll reach an unprecedented level of efficiency and comfort!

Scan a printed or typed document, indicate the zones of interest (or have the system detect them for you), execute the character recognition and export the document to your word processor. Documents composed of many pages are processed from start to finish in a single run. A few mouse clicks replace long hours of work as Readiris converts your paper and PDF documents into editable computer files: it's up to 40 times faster than manual retyping!

With the automatic mode of operation, the user's effort is reduced to a minimum: you start the scanning and save the text result, and Readiris takes care of all intermediate steps. After the recognition, you can send the results directly to your favorite application, be it a word processor, a spreadsheet or a web browser.

Readiris recognizes tabular data and recreates them as worksheets or as table objects inside your word processor; your numeric data are immediately ready for further processing.

Based on the Connectionist technology from I.R.I.S., Readiris represents the best OCR has to offer. Font-independent feature extraction is complemented by self-learning techniques derived from a proprietary neural network. The system can learn new characters through context analysis: linguistic knowledge about syllables and words improves the OCR performance.

Readiris supports up to 123 languages: all American and European languages are supported, including the Central-European languages, the Baltic languages, Greek and the Cyrillic (“Russian”) languages. (Optionally, you can read Hebrew documents and four Asian languages: Japanese, Simplified Chinese, Traditional Chinese and Korean.) Readiris even copes with mixed alphabets: the software detects “Western” words that pop up in Greek, Cyrillic, Hebrew and Asian documents, since many proper names, brand names and other untranscribable words are written in Western characters.

Readiris uses linguistics during the recognition phase, not after it. As a direct result, Readiris recognizes documents of all kinds with top accuracy, including low-quality documents, faxes and dot matrix printouts. It copes beautifully with badly scanned and copied documents whose characters are too light or too dark. Joined characters (“ligatures”) are resolved, and fragmented forms, such as dot matrix symbols, are recomposed.

User verification in pop-up style not only flags doubtful characters but also increases the system's precision. All solutions confirmed by the user are memorized, increasing speed and confidence as you go along. Using Readiris means making it more intelligent each time! This powerful learning tool allows you to train Readiris on special characters such as mathematical symbols and dingbats, and also to handle the distorted fonts you will find in real documents.

To increase your productivity further, Readiris not only recognizes your texts but can format them for you as well! Use “autoformatting” and Readiris recreates a facsimile copy of the scanned document: the word, paragraph and page formatting of the original document is retained.

Similar typefaces are used, and the point sizes and type styles of the source document are maintained across the recognition. The placement of columns, text blocks and graphics follows your original documents. And as Readiris supports greyscale and color scanning effortlessly, you can recapture any graphics, be they line art, black-and-white photos or color illustrations. When a document contains tables, Readiris reorganizes them in real cells and recreates the cell borders of the original tables.

In other words, Readiris allows you to archive a true copy of your documents as editable, compact text files instead of scanned images! Various levels of formatting are available; the choice is up to the user.


Bar codes that occur on a scanned page can also be read, and the same goes for handwritten text: such text can be captured as long as you write well-spaced “block letters”.

Readiris supports a wide range of popular scanners: numerous flatbed scanners, sheetfed scanners, “all-in-one” devices or “MFPs” (“multifunctional peripherals”) and digital cameras can be used. Readiris supports the Twain scanning standard: all models that have a Twain driver are seamlessly supported.

TABLE OF CONTENTS

Save Time, No More Retyping! .... III
Table of Contents .... V
Credits and Copyrights .... VII

Chapter 1: Installation
System Requirements .... 1-1
Installing the Readiris Software .... 1-1
Installing Software Options .... 1-3
Installing Related Products .... 1-6
Installed Files .... 1-7
    Read Me file and documentation .... 1-7
    Handprinting form .... 1-7
Uninstalling the Readiris Software .... 1-7
Register to Vote! .... 1-8
Comfort Isn't Laziness! .... 1-9
Installing Your Scanner under Readiris .... 1-10
Getting Product Support .... 1-11
Getting in Touch with I.R.I.S. .... 1-11

Chapter 2: Guided Tour
Starting the Software up .... 2-1
Discovering the Readiris Interface .... 2-2
Customizing the User Interface .... 2-5
Getting Started with a First Tutorial .... 2-7
Zooming in on Images .... 2-11


One, Decomposing a Scanned Image .... 2-14
One and a Half, Sorting Windows .... 2-17
Two, Windowing a Scanned Image Manually .... 2-20
Three, Saving Windowing Templates .... 2-24
Readiris Takes You around the World .... 2-26
Readiris Changes Languages As Needed .... 2-29
Defining the Document Characteristics .... 2-31
Readiris Gets More Intelligent Each Time! .... 2-33
    Learn .... 2-35
    Don't Learn .... 2-35
    Delete .... 2-36
    Undo .... 2-36
    Finish .... 2-36
    Abort .... 2-37
The Role of Font Dictionaries .... 2-37
Saving the Results in a Text File .... 2-38
Sending Output Directly to Your Application .... 2-40
Seeing the Text Result .... 2-43
Recognizing Multiple Pages .... 2-44
Printing the Images .... 2-47
Editing Multipage Documents .... 2-50
Starting a New Document .... 2-52
Recognizing Text Zones .... 2-53
Organizing the Text Output .... 2-54
Setting up Your Scanner .... 2-55
Scanning Documents .... 2-57
Let the Bad Color Not Be Seen .... 2-61
Different Devices, Different Resolution .... 2-61
Adjusting the Scanned Images .... 2-65
Saving Default Settings .... 2-71
Saving Specific Settings .... 2-72
Recognizing Pages Automatically .... 2-73
Readiris Recreates Your Document Layout .... 2-73
Columns Please, Not Frames! .... 2-78
Text Formatting, Part 2 .... 2-82
Exporting Text Several Times .... 2-83
Creating Portable Documents .... 2-83


... Or Reading Them .... 2-89
Saving Graphics Separately .... 2-91
Reading Faxes and Deferred Recognition .... 2-95
Recognizing Tables .... 2-97
Recognizing Handwritten Text .... 2-102
Reading Bars and Spaces .... 2-105
Getting On-line Help .... 2-107

CREDITS AND COPYRIGHTS

The Readiris software is designed and developed by I.R.I.S. OCR, ICR, bar code reading, Connectionist, AutoFormat and Linguistic technology by I.R.I.S. I.R.I.S. holds the copyrights to the Readiris software, the OCR technology, the ICR technology, the bar code reading technology, the linguistic technology, the on-line help system and this manual.

AutoFormat, Cardiris, Connectionist, I.R.I.S. Linguistic Technology, the I.R.I.S. logo and Readiris are trademarks of I.R.I.S.

XML parser developed by Apache. This product includes software developed by the Apache Software Foundation (www.apache.org).

Acrobat and Reader are (registered) trademarks of Adobe. Apple, AppleWorks, Mac OS and Safari are (registered) trademarks of Apple. Entourage, Excel and Word are (registered) trademarks of Microsoft.


Chapter 1: INSTALLATION

This chapter discusses the system requirements and installation of the Readiris software.

SYSTEM REQUIREMENTS

This is the minimal system configuration required to use Readiris:
- a Mac OS computer with a G3 processor;
- the operating system Mac OS X version 10.3 (“Panther”) or later. Earlier versions of the Mac OS operating system are not supported!
- 110 MB of free hard disk space.
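The two measurable minimums above (operating system version and free disk space) can be checked with a short script. The Python sketch below is purely illustrative and is not part of Readiris: it assumes that 110 MB means 110 * 1024 * 1024 bytes and that the startup disk is mounted at "/", the helper names `parse_version` and `meets_requirements` are hypothetical, and the G3 processor requirement is not checked.

```python
import platform
import shutil

# Minimums from the "System Requirements" section; the byte conversion is an assumption.
MIN_MACOS = (10, 3)                  # Mac OS X 10.3 "Panther"
MIN_FREE_BYTES = 110 * 1024 * 1024   # 110 MB, assuming 1 MB = 1024 * 1024 bytes


def parse_version(release: str) -> tuple:
    """Turn a release string such as '10.4.11' into a tuple of ints (10, 4, 11)."""
    return tuple(int(part) for part in release.split(".") if part.isdigit())


def meets_requirements(release: str, free_bytes: int) -> bool:
    """Return True if the OS release and free disk space meet the stated minimums."""
    if not release:  # platform.mac_ver() reports an empty release on non-Mac systems
        return False
    # Tuples compare element by element, so (10, 4, 11) >= (10, 3) is True.
    return parse_version(release) >= MIN_MACOS and free_bytes >= MIN_FREE_BYTES


if __name__ == "__main__":
    release = platform.mac_ver()[0]        # e.g. '10.4.11' on a Mac, '' elsewhere
    free = shutil.disk_usage("/").free     # free bytes on the startup disk (assumed "/")
    print("OK" if meets_requirements(release, free) else "Requirements not met")
```

Comparing version tuples rather than strings avoids the classic pitfall where "10.10" would sort before "10.3" as text.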

INSTALLING THE READIRIS SOFTWARE

The Readiris software is delivered compressed. To install it, you must run the installation program.

1. Insert the Readiris CD-ROM.
2. Double-click on the Readiris installer and follow the on-screen instructions.
We recommend the “easy” installation: it places all the necessary files on your hard disk, including the sample images used in the tutorial of this manual.


The installation program automatically creates the Readiris folder in the "Applications" folder.


INSTALLING SOFTWARE OPTIONS

There are two software options available for the Readiris software: the “Asian OCR add-on” and the “Hebrew OCR add-on”. The “Asian OCR add-on” allows you to read Japanese, Traditional Chinese, Simplified Chinese and Korean.


The “Hebrew OCR add-on” predictably allows you to recognize Hebrew documents.


When you install the “Asian OCR add-on”, specific documentation becomes available that explains how to recognize Asian documents.


INSTALLING RELATED PRODUCTS

Depending on the software bundle you acquired, Readiris may be supplied with an evaluation version of the related product Cardiris, a business card reader.

If this free software package is included on your Readiris CD-ROM, you install it the same way, by following the on-screen instructions. Contact I.R.I.S. to learn more about complementary software.


INSTALLED FILES

The installation program creates a folder in "Applications" that contains the Readiris files.

Read Me file and documentation
README.HTM      “Read Me” file (in HTML format)
MANUAL.PDF      User’s manual (in Adobe Acrobat format)

Handprinting form
TEMPLATE.PDF    Blank handprinting form for reprinting (in Adobe Acrobat format)
TEMPLATE.DOC    Blank handprinting form for editing (in Word format)

UNINSTALLING THE READIRIS SOFTWARE

Uninstalling the Readiris software is very easy: run the installer again, select the installation option "Uninstall" and click the "Uninstall" button. (The same goes for the software options: run the “uninstaller” of each software option to remove it.)


REGISTER TO VOTE!

Don’t forget to register your Readiris license! Doing so will allow us to keep you informed of future product developments and related I.R.I.S. products. The registration benefits, including free product support and special offers, are strictly limited to registered users.

We invite you to register your license by submitting a registration form on the I.R.I.S. web site; this method obviously requires an Internet connection! You can access the registration form with the command "Register Readiris" under the "Help" menu.

You can also register in other ways than via the web: by faxing or mailing your registration card, or by calling I.R.I.S. during working hours.


COMFORT ISN'T LAZINESS!

A few additional steps make Readiris even easier to use. Drag the Readiris application to the dock to make it available at all times. (You can drag the application away from the dock to remove it again.) Also note that the dock is personal: each user who logs on to a machine may have his own set of applications on the dock!

INSTALLING YOUR SCANNER UNDER READIRIS

Readiris relies on the Twain driver of each scanner to support it. In other words, as soon as a Twain driver is available for your scanner model, Readiris supports it effortlessly!

Here’s how you install your scanner under Readiris:
1. Install the scanner drivers using the CD-ROM that comes with your scanner. Doing so installs the Twain driver on your computer. (If necessary, study the installation instructions that accompany your scanner carefully to ensure that these drivers are installed properly.)
2. Verify that the scanner operates correctly with a scanning application other than Readiris.
3. Start up the Readiris software.
4. Select your scanner model in Readiris with the option "Scanner" in the "Preferences" command under the "Readiris" menu.


More about scanner support can be found in the “Read Me” file that comes with the Readiris software.

Don’t hesitate to contact your scanner manufacturer or its representative should there be problems with scanner drivers. Most manufacturers let you download the latest versions of their scanner drivers from their web sites.

GETTING PRODUCT SUPPORT

The Readiris on-line help details how you can get technical support. Among other things, you can contact I.R.I.S. by e-mail at the addresses [email protected] and [email protected].

Please describe the problem you experience clearly and include all relevant data concerning Readiris, your scanner and your computer system.

You may also check whether software updates are available for download. Use the command "Search for Updates" under the "Help" menu to do so.

GETTING IN TOUCH WITH I.R.I.S.

You can also contact I.R.I.S. to learn more about its range of software solutions.

The Readiris startup screen and the command "I.R.I.S. on the Internet" under the "Help" menu of Readiris bring you directly to the I.R.I.S. home page (www.irislink.com).


Chapter 2: GUIDED TOUR

Readiris is a state-of-the-art OCR package equipped with numerous advanced features. This chapter discusses all major features and offers many tips and hints on using Readiris.

STARTING THE SOFTWARE UP

Double-click on the Readiris application in the Readiris folder (in "Applications") or click the application icon on the dock.


The Readiris startup screen and the menu bar of the Readiris software are displayed. The startup screen displays the version and copyrights of the Readiris software. It also indicates the I.R.I.S. home page: enter the URL www.irislink.com in your web browser to visit the I.R.I.S. web site.

DISCOVERING THE READIRIS INTERFACE

The Readiris application not only contains a menu bar but also an image window and two button bars that give quick access to all frequent commands.


The vertical main toolbar gives quick access to all frequent general commands; the horizontal image toolbar contains all common commands you need during the image preview.

As soon as pages get processed, the third toolbar, the page toolbar on the left side, is put to use: it represents the various pages of the document and gives access to the page commands using Ctrl-click operations.


To learn which command corresponds to a certain button, hold your mouse pointer over it for a while: a tooltip will tell you what the button does. (The window pane or image zone is where the scanned images are displayed.)

The status bar displays all system information and gives information on the current image: the image size (in image pixels and in KB), the image type (bit depth) and the image resolution. (When the image window is too small, some information may not be visible.)
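The pixel size, bit depth and resolution shown in the status bar are related by simple arithmetic: pixels = page size in inches * resolution in dpi, and raw size = width * height * bit depth / 8 bytes. The Python sketch below illustrates this under stated assumptions; it is not a description of how Readiris computes its KB figure. It assumes an uncompressed bitmap with rows padded to whole bytes and 1 KB = 1024 bytes, and `pixel_dimensions` and `uncompressed_kb` are hypothetical helper names.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a page of the given size scanned at `dpi` dots per inch."""
    return (round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi))


def uncompressed_kb(width_px: int, height_px: int, bit_depth: int) -> float:
    """Raw bitmap size in KB (1 KB = 1024 bytes), ignoring file headers and compression."""
    bits_per_row = width_px * bit_depth
    bytes_per_row = (bits_per_row + 7) // 8  # assumption: each row is padded to whole bytes
    return bytes_per_row * height_px / 1024


if __name__ == "__main__":
    width, height = pixel_dimensions(210, 297, 300)  # A4 page at 300 dpi
    print(width, height)
    for depth in (1, 8, 24):  # black-and-white, greyscale, color
        print(depth, round(uncompressed_kb(width, height, depth)))
```

For an A4 page at 300 dpi this gives 2480 x 3508 pixels and roughly 1062 KB at 1 bit, 8496 KB at 8 bits and 25488 KB at 24 bits per pixel, which shows why black-and-white scanning produces far smaller images than color scanning at the same resolution.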


CUSTOMIZING THE USER INTERFACE

The image toolbar that contains the preview commands can be customized with the "Customize" button on the image toolbar (or with the command "Customize Toolbar" under the "View" menu).

Drag your favorite commands to the image toolbar, adding separators and spaces where needed. But a word of warning first: only customize the user interface when you’re sufficiently familiar with the operation of Readiris. Otherwise, you might quickly find that essential commands are missing from the image toolbar... On the other hand, no irreversible harm is ever done: just drag the default set onto the toolbar to restore the factory image toolbar!


This command also determines the layout of the buttons on that toolbar: do you prefer regular-size or small-size buttons? Should the buttons contain only an icon, only a text label, or an icon with a text label underneath?


Another minor option concerning the user interface is found in the "Prefer-ences" command under the "Readiris" menu: you can enable the “brushed-metal”or white “Aqua” look for the Readiris interface.

GETTING STARTED WITH A FIRST TUTORIAL

The best way to become familiar with the operation of Readiris is undoubtedly to use it. A number of prescanned images are provided with the software; they allow you to get started even when no scanner is connected to your computer. Let’s turn to them now.

Readiris allows you to scan images with your scanner or open prescanned images. To open prescanned images, select "File" as image source and use the "Open" button; to acquire images with your scanner, select your scanner as image source and use the "Acquire" button. (You can also set the image source with the "Preferences" command under the "Readiris" menu, and you can open or acquire images with the commands "Open Document" and "Acquire Document" under the "File" menu.)

Color, greyscale and black-and-white images are supported on an equal basis: Readiris opens FlashPix, GIF, JPEG, JPEG 2000, MacPaint, Photoshop, PICT, PNG, QuickDraw GX, QuickTime, Silicon Graphics and Targa images, TIFF images (uncompressed, PackBits and Group 3 compressed), multipage TIFF images and Windows bitmaps (BMP). (Readiris also opens Adobe Acrobat PDF documents.)

Loading prescanned images is particularly useful for converting your faxes into editable text files.

Select your hard disk as image source, click the "Open" button and go to the folder "Images" in the Readiris folder.

Open the image English.jpg in the "Images" folder; the image is read from disk and displayed in the image zone.

For every greyscale and color image, a black-and-white version is generated for the OCR process (“binarization”).
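
This guide does not describe how Readiris binarizes images, but the basic idea can be shown with the simplest possible method, a single global threshold: every pixel darker than the threshold becomes black, every other pixel becomes white. The Python sketch below assumes 8-bit greyscale values (0 = black, 255 = white) and an arbitrary threshold of 128. Both the function and the threshold are illustrative; real OCR engines may use more elaborate, locally adaptive methods.

```python
def binarize(grey: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Global-threshold binarization: 0 = black (ink), 1 = white (paper)."""
    return [[0 if value < threshold else 1 for value in row] for row in grey]


# A tiny greyscale "image": a dark stroke on light paper.
page = [
    [250, 240, 30, 245],
    [235, 20, 25, 250],
    [240, 230, 35, 255],
]
for row in binarize(page):
    print("".join("#" if pixel == 0 else "." for pixel in row))
```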

To display a greyscale or color image as black-and-white, disable the option "Image in Color" under the "View" menu.

There’s another way to import image files into Readiris: drop them on the Readiris icon. Readiris starts up and the image file is opened automatically.

The image toolbar contains all the commands you need during the image preview: tools to analyze the page, to indicate the zones of interest, to rotate the image etc.

ZOOMING IN ON IMAGES

Readiris has several commands that allow you to zoom in on the scanned image, for instance to verify the scanning quality.

Click the "View" button on the image toolbar (or go to the "View" menu) to discover the zoom levels: you can display the image at actual size, at 50% or at 200% of its actual size, fit the image to the page width, or fit the entire image in the preview window. At actual size, one screen pixel corresponds to one image pixel. (Shortcuts are available for all zoom levels!)
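
The zoom levels differ only in the scale factor between image pixels and screen pixels. The Python sketch below, which uses hypothetical mode names and a hypothetical function, shows one way to compute that factor. Fitting the entire image must respect both the window width and the window height, so it takes the smaller of the two ratios.

```python
def zoom_factor(mode: str, image_w: int, image_h: int, view_w: int, view_h: int) -> float:
    """Screen pixels per image pixel for the zoom levels listed above."""
    if mode == "actual size":  # one screen pixel per image pixel
        return 1.0
    if mode == "50%":
        return 0.5
    if mode == "200%":
        return 2.0
    if mode == "fit width":    # the page width fills the window width
        return view_w / image_w
    if mode == "fit page":     # the whole image fits inside the window
        return min(view_w / image_w, view_h / image_h)
    raise ValueError(f"unknown zoom mode: {mode!r}")


# A 2480 x 3508 pixel scan shown in a 620 x 800 pixel preview window.
for mode in ("actual size", "50%", "200%", "fit width", "fit page"):
    print(mode, round(zoom_factor(mode, 2480, 3508, 620, 800), 3))
```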

Note that the current zoom level is indicated in the window title; no zoom level is mentioned when the image fits the window or the page width.

Shift-Command-click repeatedly to cycle through the various zoom levels. You can also Command-click a region of the scanned image to zoom in at actual size immediately; Command-click a second time to zoom out again. As soon as you press the Command key over the image preview, the mouse cursor changes accordingly!

Finally, the magnifying glass allows you to zoom in on specific details of the acquired images. Click the "Magnifying Glass" button on the image toolbar (or Shift-Option-click) and drag the mouse across the image.

ONE, DECOMPOSING A SCANNED IMAGE

Now that the image is scanned, you have to indicate which parts you want to convert into editable text by drawing frames, so-called “windows”, around the zones of interest.

Actually, Readiris does this for you automatically when the option "Page Analysis" under the "Options" button (or under the "Layout" menu) is enabled. Page analysis is enabled by default.

To force Readiris to decompose the current page (because you disabled page analysis by accident, because you erased some windows by mistake and want to redo the page analysis, etc.), simply click the "Analyze Page" button on the image toolbar (or choose the command "Analyze Page" under the "Process" menu).

When you are dealing with Asian or Hebrew documents, select the document language before executing the page analysis. Specific routines are used for these languages: the interline spacing of Asian documents is in most cases bigger than in Western documents, the text is made up of small symbols (“ideograms”) that could easily be mistaken for graphic zones in a Western document, and the text may run from top to bottom and from right to left. In Hebrew documents, the text runs from right to left. And if you forgot to select the proper language, simply select it afterwards: Readiris re-executes the page analysis automatically!

Automatic page decomposition is particularly useful when recognizing columnized texts and documents with a complex page layout, possibly including graphics and tables.

Page decomposition uses three window types: text, graphic and table windows. Readiris distinguishes text blocks, tables and graphic zones containing photos, illustrations etc. on the page. (Saving graphics and recognizing tables will be discussed at length later.)

Two extra zone types are always drawn manually: bar code zones and handprinting zones. (More about bar code reading and the recognition of handwritten “block letters” later.)

A specific icon marks each zone type.

Also note that you can Ctrl-click a zone to change its type (or to delete it)!

Page analysis is fast, skew-tolerant and highly accurate: it traces complex, “irregular” shapes.

Page analysis even detects zones containing white text on a black background. Recognizing such inserts is no problem: while the preview displays the scanned document correctly on-screen, Readiris “inverts” the image whenever it needs to recognize such text blocks!

Some documents have many “stray” dots on the page, or a black border around the actual image, etc. To erase all small windows (they are assumed not to contain any text) and re-sort the remaining zones, choose the command "Delete Small Zones" under the "Layout" menu.
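
The guide does not say what counts as “small”. The Python sketch below illustrates the operation under an assumed minimum width and height of 20 image pixels (a hypothetical value), then re-sorts the remaining zones with a deliberately simple top-to-bottom, left-to-right key. The `Zone` type and its coordinates (image pixels, origin at the top left) are illustrative assumptions, not Readiris data structures.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    kind: str    # "text", "graphic", "table", ...
    left: int
    top: int
    right: int
    bottom: int


def delete_small_zones(zones: list[Zone], min_width: int = 20, min_height: int = 20) -> list[Zone]:
    """Drop zones too small to hold text, then re-sort the remaining ones."""
    kept = [z for z in zones
            if z.right - z.left >= min_width and z.bottom - z.top >= min_height]
    # Simplified re-sort: top to bottom, then left to right.
    return sorted(kept, key=lambda z: (z.top, z.left))


zones = [
    Zone("text", 100, 400, 900, 700),
    Zone("text", 5, 5, 9, 12),           # a stray dot
    Zone("graphic", 100, 100, 900, 350),
]
for number, zone in enumerate(delete_small_zones(zones), start=1):
    print(number, zone.kind, zone.top)
```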

A similar routine is applied automatically: the detection of zones on the page borders. When this routine is disabled under the "Layout" menu, page analysis ignores any zones that touch the page borders. This helps when your scanner generates black borders around the actual image, where page analysis tends to find zones that contain only “noise”. Graphic zones on the page borders are left untouched: photos often touch the page borders, background graphics in most cases cover the entire page, etc.
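
The rule can be summarized as a filter. In the Python sketch below, which uses an illustrative `Zone` type (coordinates in image pixels, origin at the top left), a zone is taken to touch the border when one of its sides lies on the page edge. That definition, the optional `margin` parameter and the function name are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    kind: str    # "text", "graphic", "table", ...
    left: int
    top: int
    right: int
    bottom: int


def ignore_border_zones(zones: list[Zone], page_w: int, page_h: int, margin: int = 0) -> list[Zone]:
    """Keep graphic zones, and any other zone that stays clear of the page borders."""
    def touches_border(z: Zone) -> bool:
        return (z.left <= margin or z.top <= margin
                or z.right >= page_w - margin or z.bottom >= page_h - margin)

    return [z for z in zones if z.kind == "graphic" or not touches_border(z)]


page_zones = [
    Zone("text", 0, 0, 2480, 40),          # "noise" along the top edge
    Zone("text", 200, 300, 2200, 900),     # a genuine text block
    Zone("graphic", 0, 1000, 2480, 2000),  # a photo bleeding off the page
]
print([z.kind for z in ignore_border_zones(page_zones, 2480, 3508)])
```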

ONE AND A HALF, SORTING WINDOWS

Readiris not only detects the various blocks, but also sorts them: by default, the zones are sorted top-down, left to right to cope with columnized documents. Numbers indicate the sort order.
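
One way to read “top-down, left to right” for a columnized page is column by column: columns are taken from left to right and, within each column, zones are read from top to bottom. The Python sketch below implements that interpretation by grouping zones whose horizontal extents overlap into columns. It is an illustrative assumption, not Readiris’s sorting algorithm, and it does not handle zones that span several columns, such as a full-width title, which would merge all columns into one.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    left: int
    top: int
    right: int
    bottom: int


def sort_columnwise(zones: list[Zone]) -> list[Zone]:
    """Order zones column by column (left to right), top-down within each column."""
    columns: list[list] = []  # each entry: [left edge, right edge, member zones]
    for zone in sorted(zones, key=lambda z: z.left):
        for column in columns:
            if zone.left < column[1] and zone.right > column[0]:  # horizontal overlap
                column[0] = min(column[0], zone.left)
                column[1] = max(column[1], zone.right)
                column[2].append(zone)
                break
        else:
            columns.append([zone.left, zone.right, [zone]])
    ordered = []
    for _, _, members in sorted(columns, key=lambda c: c[0]):
        ordered.extend(sorted(members, key=lambda z: z.top))
    return ordered


page = [
    Zone("right column, bottom", 1300, 1800, 2300, 3200),
    Zone("left column, top", 150, 300, 1150, 1700),
    Zone("right column, top", 1300, 300, 2300, 1700),
    Zone("left column, bottom", 150, 1800, 1150, 3200),
]
for number, zone in enumerate(sort_columnwise(page), start=1):
    print(number, zone.name)
```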

Of course, you can modify the sort order. To do so, click the "Sort" button (or use the command "Sort Zones" under the "Layout" menu).

The mouse cursor changes as soon as the “sort mode” is enabled. Click the windows you want to include. Windows you do not click are simply ignored, that is, excluded from recognition. It’s easy to see which zones are selected and which aren’t: the selected windows are numbered, the others aren’t.

TWO, WINDOWING A SCANNED IMAGE MANUALLY

Page analysis is the automatic way of zoning a scanned page. Alternatively, you can zone an image manually with the windowing tools of Readiris. These are available on the image toolbar and under the "Layout" menu.

(As indicated earlier, bar code and handprinting windows are always drawn manually by the user: the page analysis does not detect them for you!)

To draw a rectangle around a zone of interest, select the corresponding tool in the image toolbar (or under the "Layout" menu) and drag the cursor from the upper left corner to the lower right corner of the window. (Sides shorter than 1 mm are not allowed; such a window wouldn’t contain even a single character anyway.)
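
The number of image pixels that this 1 mm minimum represents depends on the scanning resolution, since one inch is 25.4 mm. The Python sketch below converts millimetres to pixels and checks a window against the limit. The function names are hypothetical, and whether Readiris measures the limit in exactly this way is an assumption.

```python
MM_PER_INCH = 25.4


def mm_to_pixels(mm: float, dpi: int) -> float:
    """Length in image pixels of a distance in millimetres at the given resolution."""
    return mm * dpi / MM_PER_INCH


def window_is_large_enough(width_px: int, height_px: int, dpi: int, min_side_mm: float = 1.0) -> bool:
    """True when both sides of a window are at least min_side_mm long."""
    limit = mm_to_pixels(min_side_mm, dpi)
    return width_px >= limit and height_px >= limit


for dpi in (200, 300, 400, 600):
    print(f"{dpi} dpi: 1 mm = {mm_to_pixels(1, dpi):.1f} pixels")
```

At 300 dpi, for example, a window must therefore be at least 12 pixels wide and high.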

Don’t worry if you selected the wrong zone type: you can quickly change the type by Ctrl-clicking a window.

The windows are automatically sorted in the order of creation: numbers indicate the sort order. When you hold your cursor over the status bar for a few seconds, a tooltip tells you how many zones of each type were created.

You can also frame “irregular” text blocks by drawing polygonal windows around them. Non-rectangular windows are created by merging rectangular zones: as soon as two rectangles of the same type intersect, they automatically become a single window! In a way, you’re building a house by adding one room after the other... (Creating polygonal table and bar code windows doesn’t make any sense.)
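
The merging rule, where intersecting rectangles of the same type become one window, is essentially a connected-components problem: rectangles that overlap directly, or through a chain of other rectangles, end up in the same window. The Python sketch below groups them with a small union-find structure. The `Rect` type, the function names and the set of types allowed to merge (text and graphic only, reflecting the remark that polygonal table and bar code windows make no sense) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    kind: str
    left: int
    top: int
    right: int
    bottom: int


def intersects(a: Rect, b: Rect) -> bool:
    return a.left < b.right and b.left < a.right and a.top < b.bottom and b.top < a.bottom


def merge_into_windows(rects: list[Rect], mergeable=("text", "graphic")) -> list[list[Rect]]:
    """Group same-type rectangles that intersect, directly or through a chain."""
    parent = list(range(len(rects)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            a, b = rects[i], rects[j]
            if a.kind == b.kind and a.kind in mergeable and intersects(a, b):
                parent[find(i)] = find(j)

    windows: dict[int, list[Rect]] = {}
    for i, rect in enumerate(rects):
        windows.setdefault(find(i), []).append(rect)
    return list(windows.values())


# An L-shaped text block drawn as two overlapping rectangles, plus separate zones.
rects = [
    Rect("text", 100, 100, 500, 300),
    Rect("text", 100, 250, 300, 600),
    Rect("graphic", 450, 280, 800, 600),  # overlaps, but is of another type
    Rect("text", 100, 700, 800, 900),
]
print(sorted(len(window) for window in merge_into_windows(rects)))
```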

Furthermore, manual zoning can be combined with window sorting: you can draw new windows even when the “sort mode” is enabled. You then use sorting to include a number of detected windows, and manually create other windows where the page analysis didn’t yield the appropriate results. As soon as you start creating windows in the “sort mode”, all windows you didn’t select are promptly erased!

To modify, move or delete windows, you need to select them first. To do so, choose the window selection tool in the image toolbar and click inside a window. Rectangular markers now appear at each corner and in the middle of each side of the window.

To deselect windows, click elsewhere. To select additional windows, hold down the Shift key while clicking them.

So much for selecting zones. To modify a window, select it, place the mouse cursor over a marker and drag it to change the window size.

To move a window, simply select it and drag it to another location.

To delete a window, Ctrl-click it and choose the command "Delete Zone".

To delete several windows, select them and choose the "Cut" or "Clear" command from the "Edit" menu. "Cut" moves the windows to an internal buffer, whereas "Clear" simply erases them. When you paste zones, they are inserted in their original position, and you have to drag them to their new location.

(The "Clear" command also performs an operation that concerns scanned pages rather than zones: it deletes the pages that are selected in the page toolbar. Still, it’s probably easier to drag the pages you want to delete to the trashcan at the bottom of the page toolbar...)

In fact, all familiar commands from the "Edit" menu apply to the windows: you can delete, cut, copy and paste them! The "Undo" command also applies: if you accidentally deleted, moved or resized some zones, "Undo" cancels the last operation.

Also note that shortcuts are available for all commands! For example, to erase all existing windows, choose the command "Select All" (shortcut Command-A) and then the command "Clear" (shortcut Backspace). Alternatively, use the command "Delete All Zones" under the "Layout" menu to erase all windows at once.

You are now ready to recreate the necessary layout. To restore the previous layout, choose "Undo" (shortcut Command-Z). Choose "Undo" once more to erase the windows a second time...

THREE, SAVING WINDOWING TEMPLATES

The resulting windowing layouts can be saved as zoning templates for future use with the command "Save" under the "Layout" menu, and loaded into memory with the command "Open" under the "Layout" menu.

If you have to recognize documents with a similar layout, for instance a 50-page report where the headers and footers should be excluded, a single template can be applied to zone all 50 pages.

When you load a template into memory, page analysis is disabled automatically. The zoning template remains active until you re-enable page analysis.

Actually, there’s a nice alternative to zoning templates: the preview tool "Ignore Exterior Area" limits the page decomposition to the “cropped” portion of the image.

Select this tool and frame the portion of the image you want to process. When you’re dealing with a multipage document, you can exclude the same outer zone from page analysis on every page. (Re-execute the page analysis to cancel the image “cropping”, or change the zones manually.)

READIRIS TAKES YOU AROUND THE WORLD

Assuming that the windows are correctly defined, you are now almost ready to execute the character recognition. We say “almost” because we haven’t verified the language and document settings yet.

The language setting can be found on the main toolbar.

Readiris is far from limited to English: up to 123 languages are supported! All American and European languages are supported, including the Central European languages, Greek, Turkish, the Cyrillic (“Russian”) and the Baltic languages.

Optionally, you can read Hebrew and Asian documents: the extra module “Hebrew OCR add-on” predictably offers recognition of Hebrew documents, and the software option “Asian OCR add-on” offers recognition of Japanese, Simplified Chinese, Traditional Chinese and Korean. (Simplified Chinese is used in mainland China and in Singapore, whereas Traditional Chinese is used in Hong Kong, Taiwan and Macau, and by many overseas Chinese communities.)

Also note that the British and American (or should we say “international”?) variants of the English language are distinguished. The same goes for Spanish and Mexican.

Just selecting the document language may not suffice to generate texts in “exotic” languages: you must also ensure that your operating system “handles” the output correctly! For example, how do you display Japanese or Chinese documents properly on your computer? Refer to the Readiris “ReadMe” file for more information on this subject.

Selecting the proper document language is imperative, for two reasons. First, the selected language tells the software which symbol set to recognize. Multilingual support ensures that “exotic” characters such as ç, ß, ñ, γ and ø are recognized correctly.

Second, the software makes extensive use of linguistic databases to validate its results. Suppose that you have to read the word "president" where an ink stain makes the "r" look like an "f". By looking things up in the English lexicon, Readiris detects on its own that the word "president" is being read and that it doesn’t make any sense to recognize the symbol "f". This “self-learning” technique is of course highly dependent on the linguistic context.

Linguistics also help to solve ambiguous cases, such as an "O" which might be mistaken for a '0'. Another typical example is the letter "l" and the number '1', which have an identical form in many fonts (think of texts produced on old typewriters!). The linguistic context helps to determine whether you are dealing with "l" or '1'.

The illustration below shows various shapes of '1' and "l". The shapes on the first line are unambiguous; the shapes on the second line are ambiguous, but linguistics can resolve them. When the context does not suffice, the user has to intervene.
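
The principle behind this lexicon check can be sketched in a few lines of Python. The sketch assumes that the shape classifier has already produced, for each character position, the readings it cannot tell apart. It keeps the combinations that appear in a lexicon, and falls back to the user when no combination, or more than one, survives. The tiny lexicon and the function names are hypothetical, and a real engine would also weigh recognition confidence rather than enumerate every combination.

```python
from itertools import product
from typing import Optional

# Hypothetical, tiny lexicon for illustration only.
LEXICON = {"president", "will", "1999"}


def lexicon_matches(candidates: list[str], lexicon: set[str]) -> list[str]:
    """All readings, one character per position, that form a lexicon word."""
    words = ("".join(chars) for chars in product(*candidates))
    return [word for word in words if word in lexicon]


def resolve(candidates: list[str], lexicon: set[str]) -> Optional[str]:
    """The single word the lexicon supports, or None when the user must decide."""
    matches = lexicon_matches(candidates, lexicon)
    return matches[0] if len(matches) == 1 else None


# An ink stain makes the "r" of "president" look like an "f".
print(resolve(["p", "rf", "e", "s", "i", "d", "e", "n", "t"], LEXICON))  # president
# "l" versus '1': the context decides.
print(resolve(["w", "i", "l1", "l1"], LEXICON))  # will
print(resolve(["l1", "9", "9", "9"], LEXICON))   # 1999
```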

RRRRREADIRISEADIRISEADIRISEADIRISEADIRIS C C C C CHANGESHANGESHANGESHANGESHANGES L L L L LANGUAGESANGUAGESANGUAGESANGUAGESANGUAGES AAAAASSSSS N N N N NEEDEDEEDEDEEDEDEEDEDEEDED

But that’s not all: Readiris can switch languages in the middle of a sentence without any help from the user! When Western words pop up in Greek, Cyrillic, Hebrew or Asian documents (many proper names, brand names etc. cannot be transcribed and are written in the familiar Western characters), Readiris can switch to the correct alphabet automatically. In other words, you can activate a mixed alphabet of Greek, Cyrillic, Hebrew or Asian and Western characters.

Be sure to select "Greek-English" or the appropriate Cyrillic language setting, for instance "Byelorussian-English". In other words, don’t just select "Greek" or "Byelorussian" as document language and hope that the Western symbols will come out fine!

Here’s an example where a Russian text contains some English words. Open the image file Alphabets.tif and recognize the corresponding page if you want to try it for yourself!

The end result looks like this when opened in the word processor.

To mix other languages, simply select the language with the most extensive character set. If, say, a French translation is placed alongside an English text, select French as language to ensure that accented characters such as ç, é and ù are recognized correctly.
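
This rule of thumb amounts to choosing, among the languages whose character set covers every letter in the document, the one with the largest set. The Python sketch below expresses that choice. The two character sets are heavily abridged, hypothetical stand-ins rather than Readiris’s actual symbol sets, and the function name is illustrative.

```python
from typing import Optional

# Hypothetical, abridged character sets, for illustration only.
CHARSETS = {
    "English": set("abcdefghijklmnopqrstuvwxyz"),
    "French": set("abcdefghijklmnopqrstuvwxyzàâçéèêëîïôùûüÿœæ"),
}


def pick_language(text: str, charsets: dict[str, set[str]]) -> Optional[str]:
    """Largest character set among the languages that cover every letter of the text."""
    letters = {c for c in text.lower() if c.isalpha()}
    covering = [lang for lang, chars in charsets.items() if letters <= chars]
    return max(covering, key=lambda lang: len(charsets[lang]), default=None)


print(pick_language("The boy / Le garçon", CHARSETS))  # French
```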

DEFINING THE DOCUMENT CHARACTERISTICS

Now that the language is set, we’ll turn to the other document characteristics. You can fine-tune the recognition by specifying two document features: the font type and the character pitch. (These commands do not apply to Asian documents.) Let’s clarify what this means.

Let’s start with the command "Font Type" under the "Settings" menu. The font modes separate “normal” documents from dot-matrix printed documents. “Draft” or “9-pin” dot-matrix symbols are made up of isolated, separate dots, and highly specialized recognition routines are used to recognize them.

“Letter quality” dot-matrix printing, also called “24-pin” or “NLQ” dot matrix, requires the “normal” setting, as do the printing qualities typeset, typewritten, laser printed and inkjet printed.

The setting "Automatic" means that Readiris detects the font mode automatically. Let Readiris “auto-detect” the font mode in all cases, unless you are sure dot-matrix documents are being read! (Obviously, "Automatic" is the default value.)

The status bar tooltip indicates the selected font type: automatic detection or dot matrix.

The character pitch can be set with the command "Character Pitch" under the "Settings" menu.

With fixed or “monospaced” fonts, all symbols of the font have the same width: an "i" takes up as much horizontal space on a line as a "w". Think of documents produced on a typewriter, where the carriage moves a fixed distance for each typed symbol.

A proportional pitch means that the width of a character depends on its shape. Symbols like “m” and “w” are wider and take up more horizontal space on a line than “thin” characters such as “l” or “j”. Virtually all books, magazines and newspapers are printed in proportional pitch.

The simplest solution is to always leave this option at its default value, "Automatic", which means that Readiris detects the character pitch automatically.
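
This guide does not describe how automatic pitch detection works. One plausible heuristic, sketched below in Python purely as an illustration, measures the advance of each character on a line (the distance from the start of one character to the start of the next) and looks at how much these advances vary. On a monospaced line they are nearly identical; on a proportional line they vary with the character shapes. The function name and the 10% tolerance are assumptions.

```python
from statistics import mean, pstdev


def guess_pitch(advances: list[float], tolerance: float = 0.1) -> str:
    """Classify a text line as fixed or proportional pitch from its character advances."""
    variation = pstdev(advances) / mean(advances)  # coefficient of variation
    return "fixed" if variation <= tolerance else "proportional"


typewriter_line = [24, 24, 25, 24, 23, 24]  # every symbol takes the same space
book_line = [9, 30, 14, 26, 10, 22]         # e.g. "i", "m", "e", "w", "l", "o" differ in width
print(guess_pitch(typewriter_line), guess_pitch(book_line))
```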

RRRRREADIRISEADIRISEADIRISEADIRISEADIRIS G G G G GETSETSETSETSETS M M M M MOREOREOREOREORE I I I I INTELLIGENTNTELLIGENTNTELLIGENTNTELLIGENTNTELLIGENT E E E E EACHACHACHACHACH T T T T TIMEIMEIMEIMEIME!!!!!

Once the document language and characteristics are set, you can click the "Recognize" button on the main toolbar (or choose the command "Recognize Document" under the "Process" menu).

The OCR progress is indicated on-screen. Press the Escape key or Command-period (Command-.) to abort the text recognition.

When learning is enabled, Readiris enters the interactive learning phase at the end of the recognition. Interactive learning is disabled by default.

Font training can substantially enhance the accuracy of the recognition system. When you need to read distorted or defaced characters, as found in real-life documents, or stylized font shapes that Readiris does not recognize optimally, training can overcome this temporary “failure”.

User learning is also used to train the system on special symbols that Readiris is unable to recognize, such as mathematical and scientific symbols and dingbats. For example, Readiris can be trained to recognize the "π" symbol as "pi", or a telephone dingbat as "Tel". (However, the list of recognized symbols cannot be extended with "π" or the dingbat itself!)

Interactive learning is enabled with the "Learn" button on the main toolbar (or with the option "Interactive Learning" under the "Learning" menu).

(Interactive learning does not apply to Asian documents: learning makes no sense for these languages, which use thousands of different symbols, and you’d have to be able to enter the ideograms, not an easy task on a Western keyboard!)

At the end of the recognition, Readiris displays the recognized text progressively and stops on doubtful characters or, when you are dealing with touching characters (“ligatures”), on doubtful character strings. They are always presented in their context, with the doubtful characters highlighted.

By default, unrecognized characters are represented by a tilde (the "~" symbol). This “reject” character can be changed with the "Preferences" command under the "Readiris" menu.

If necessary, enter a character (or character string) for the incorrect or unknown shape and click one of the following buttons.

Learn
You agree with the proposed solution or correct it. The program saves this doubtful character in the font dictionary as “sure”, that is, final. Future recognition will no longer require your intervention: the shape is considered learnt once and for all.

In the example above, the system stops on two joined characters, and we click "Learn" to accept a shape that cannot be confused with other characters.

Don’t Learn
You agree with the proposed solution or correct it. The difference with the "Learn" button is that the learnt symbol gets the status “unsure” in the dictionary: in future recognition, the system will propose the “learnt” solution but still require a confirmation.

This button is used for symbols that might be confused with others: a defaced "e" that might be mistaken for a "c", a damaged "t" that closely resembles an "r" etc.

The "e" above is seriously damaged (in fact it is close to the letter "c"), so you should click "Don’t Learn" to avoid confusing it with the symbol "c".

Delete
The displayed form is eliminated from the output. This button is used to ignore “noise” on the documents (spots, coffee stains etc.) that might get recognized as periods, commas and what have you, and to erase any other unwanted symbol.

Undo
Go back to correct mistakes. You can undo the last 32 decisions.

Finish
The learning process is ended, but the OCR continues in automatic mode. All subsequent decisions by the system are accepted without user validation.

Click this button when you see that the recognition is highly accurate and does not require detailed proofreading.

Abort
Don’t confuse "Finish" with the "Abort" button: with "Abort", no output is generated and you start all over; with "Finish", the text is created, it just isn't proofread in detail!
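The "Undo" limit described above behaves like a bounded history: once 32 decisions are stored, each new decision pushes the oldest one out. The Python sketch below models only that behavior with `collections.deque(maxlen=...)`; the `LearningSession` class, its method names and the action labels are hypothetical and say nothing about how Readiris stores decisions internally.

```python
from collections import deque

MAX_UNDO = 32  # limit stated in the manual


class LearningSession:
    """Hypothetical model of the interactive-learning decision history."""

    def __init__(self):
        # A deque with maxlen silently drops its oldest entry when full.
        self.history = deque(maxlen=MAX_UNDO)

    def decide(self, shape_id, action):
        # action: an illustrative label such as "learn", "dont_learn" or "delete"
        self.history.append((shape_id, action))

    def undo(self):
        # Take back the most recent decision; None when nothing is left to undo.
        return self.history.pop() if self.history else None


session = LearningSession()
for shape_id in range(40):  # 40 decisions in a row
    session.decide(shape_id, "learn")

undone = []
while (decision := session.undo()) is not None:
    undone.append(decision)
print(len(undone), undone[0], undone[-1])
```

After 40 decisions, only the last 32 (shapes 39 down to 8) can be taken back; decisions 0 to 7 have already dropped out of the history.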

THE ROLE OF FONT DICTIONARIES

The results of each training session are temporarily held in the computer’s memory but can and should be stored in files called “dictionaries” for future use.

(Don’t confuse font dictionaries with lexicons! Font dictionaries contain character shapes learnt during the interactive OCR phase; lexicons are linguistic databases that assist the recognition.)

Load font dictionaries into memory when you want to recognize similar documents, so that Readiris can take into account the shapes learnt earlier and stored in these font libraries. You could say that Readiris gets smarter each time you use it!

Initially, all input from the user is simply held in the computer’s memory. No font shapes are actually saved until you use the command "Save Dictionary" under the "Learn" menu. When you do so, all learnt shapes held in memory are stored in the font dictionary.

The command "Open Dictionary" loads font dictionaries back into memory.

The active dictionary is shown at all times in the title bar of the interactive learning window! When no dictionary has been saved yet, the name "Untitled Training" is used. Click the "Abort" button of the interactive learning if you have loaded the wrong font dictionary!

Use the command "New Dictionary" to “unload” whichever dictionary is loaded into memory.

You can also extend existing dictionaries by loading them, performing extra learning and saving them again.

Font dictionaries are limited to 500 shapes, and it is recommended that you create separate dictionaries for specific applications, for instance one per type of document. For clarity, give the font dictionaries meaningful names, for instance Report, Palatino, etc. Training no longer has any effect once the dictionary is full: the results of the learning are no longer held in memory or written to a dictionary.
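To make the “sure”/“unsure” statuses and the 500-shape limit concrete, here is a small Python model of a font dictionary. It is an illustrative sketch under stated assumptions, not the Readiris file format: each shape is identified by a plain string key standing in for the character bitmap, and `FontDictionary`, `learn` and `lookup` are hypothetical names.

```python
from dataclasses import dataclass, field

MAX_SHAPES = 500  # capacity stated in the manual


@dataclass
class FontDictionary:
    """Hypothetical font dictionary: shape key -> (text, status)."""
    name: str
    shapes: dict = field(default_factory=dict)

    def learn(self, shape_key, text, sure=True):
        """Store a shape as "sure" ("Learn") or "unsure" ("Don't Learn").

        Returns False, and stores nothing, when a new shape would exceed the limit.
        """
        if shape_key not in self.shapes and len(self.shapes) >= MAX_SHAPES:
            return False
        self.shapes[shape_key] = (text, "sure" if sure else "unsure")
        return True

    def lookup(self, shape_key):
        """Return (text, needs_confirmation), or None for an unknown shape."""
        entry = self.shapes.get(shape_key)
        if entry is None:
            return None
        text, status = entry
        return text, status == "unsure"


report = FontDictionary("Report")
report.learn("joined-r-n", "rn")            # "Learn": accepted from now on
report.learn("defaced-e", "e", sure=False)  # "Don't Learn": proposed, but confirmed
print(report.lookup("joined-r-n"))  # ('rn', False)
print(report.lookup("defaced-e"))   # ('e', True)
```

In this sketch, re-learning a shape that is already stored is still allowed when the dictionary is full; that detail is an assumption, not documented Readiris behavior.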

SAVING THE RESULTS IN A TEXT FILE

The interactive training concludes the character recognition; you will be prompted to save the OCR result to a text file. Just click "Save" for the time being.

Click the "Format" button on the main toolbar (or select the command "Text Format" under the "Settings" menu) to discover the versatile output capabilities of Readiris.

Readiris supports the file formats Text (Unicode), RTF (“Rich Text Format”), HTML and Adobe Acrobat PDF. The RTF format is used by default. Note that the file extension of the selected format is added automatically to the file name.

The option "Ask File Name and Location" determines whether you are prompted to save the recognized text at the end of the recognition phase.

SENDING OUTPUT DIRECTLY TO YOUR APPLICATION

You can also send the recognized text directly to your text application, either instead of saving a text file or in addition to it. For instance, if Microsoft Word is your target application, your word processor will be started automatically at the end of the recognition (if necessary) and the recognized text will be inserted into a new document.

The "Send to" feature offers a direct OCR link between your scanner and your Mac OS applications. Readiris exports recognized documents directly to any text-based Mac OS application: word processors such as Microsoft Word or Apple Pages, spreadsheets such as Microsoft Excel, web browsers such as Apple Safari, application suites such as AppleWorks, standard Mac OS text applications such as TextEdit and Preview, Adobe Reader, etc.

Use the option "Add Application" to “declare” an application as a possible output target; all “declared” applications remain so until they are removed with the option "Remove Application". Select "None" to temporarily disable the use of a target application.

It is recommended that you assign different applications to the various formats, so that several applications become available as output targets. To make things easier for you, you are prompted to assign target applications to the supported text formats the first time you run Readiris.

Here’s a nice tip: select your e-mail software (Apple Mail, Microsoft Entourage, etc.) as target application. When you click the "Recognize" button, a new e-mail message is created with the recognized document added as an attachment! Do you know a more direct way of distributing new material quickly...?

Note that Readiris also allows you to copy the recognized text to the clipboard, so there is no strict need to export the result to an application... or save it to a text file!

SEEING THE TEXT RESULT

In conclusion, Readiris offers several methods for saving the OCR result: copying the result to the clipboard, saving the result in a text file, exporting the recognized document directly to a target application, or even saving the result in a text file and sending the recognized document to an application at the same time.

After the OCR, the scanned document is redisplayed with its zoning, available for further processing.

You can now open the recognized text with your word processor or text editor, import it into your desktop publishing software or any other text-based application, archive it, post it on an Intranet server, etc. You have indeed converted a paper document into an editable computer file, up to 40 times faster than manual retyping! Go ahead and compare it with the image inside your Readiris window.

RECOGNIZING MULTIPLE PAGES

But how do you save the text of additional pages? In other words, how do you process documents consisting of multiple pages? It’s actually very simple: keep recognizing pages and save the results to the same file! (Make sure that file isn’t currently open, because that will prevent you from writing to it!) Also, don’t forget to put the font dictionary in append mode so that you can continue the font training comfortably.

As soon as you scan pages (or open image files) into a document, you have to decide whether you want to start a new document or complete the current document.

Answer "no" to add pages to the current document; answer "yes" to create a new document. Answering "yes" has the same effect as the command "New Document" under the "File" menu.

However, there’s a more efficient way of recognizing several pages than scanning and OCRing them one after the other: processing multipage documents directly!

To scan a document composed of several pages in one operation, enable the document feeder of your scanner. Consult the documentation of your scanner’s Twain driver to see how this works. Place the pages of your document in the automatic document feeder and start the scanning.

You can also open multiple prescanned images. To load several images, select the first image and hold down the Command key as you select additional images. To load a continuous range of images, select the first image and hold down the Shift key as you select the last image.

You can also open multipage TIFF files. When you do so, a page number is added to the “root” of the image file name. Open the sample file Multipage.tif to give it a try; the various pages are displayed one after the other. (You can press Escape or Command-. to interrupt the loading process between two pages...)

All images you scan or load into memory are added to the current document until you choose the command "New Document" under the "File" menu. Creating a new document “cleans the slate”: any document loaded into memory, containing a single page or multiple pages, is erased.

PRINTING THE IMAGES

The page toolbar gives direct access to the various pages of the document. To go to a page, click it in the page toolbar. The selected page is highlighted.

To go to the previous page, press PageUp or the Up Arrow key; to go to the next page, press PageDn or the Down Arrow key. Press Home to go to the first page and End to go to the last page.

You can quickly print the scanned images with the "Print" button on the image toolbar (or with the command "Print Images" under the "File" menu) should you need an overview of your document. The command "Print Thumbnails" under the "File" menu prints a thumbnail album of the scanned document.

But whether you print the images at legible size or as thumbnails, you don’t have to print all pages: define a page range to print only specific pages!

And when you print thumbnails, you can influence the grid size: define the number of rows and columns to determine how many thumbnails are placed on a page.
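How many sheets a thumbnail album takes follows directly from the grid you define. The Python sketch below assumes, hypothetically, that thumbnails fill each grid row by row in page order and that the page range is inclusive; the function names are illustrative only.

```python
import math


def thumbnail_sheets(first_page, last_page, rows, columns):
    """Sheets needed to print pages first_page..last_page (inclusive) as thumbnails."""
    page_count = last_page - first_page + 1
    per_sheet = rows * columns
    return math.ceil(page_count / per_sheet)


def grid_position(page, first_page, rows, columns):
    """(sheet, row, column), all 1-based, where a page's thumbnail lands."""
    index = page - first_page  # 0-based position within the printed range
    sheet, slot = divmod(index, rows * columns)
    row, column = divmod(slot, columns)
    return sheet + 1, row + 1, column + 1


# 25 pages on a 4 x 3 grid (12 thumbnails per sheet)
print(thumbnail_sheets(1, 25, rows=4, columns=3))  # 3
print(grid_position(25, 1, rows=4, columns=3))     # (3, 1, 1)
```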

The command "Page Setup" under the "File" menu selects your printer and defines its setup.

Start the recognition on the sample image Multipage.tif. If interactive learning is enabled, you go through the recognition and learning phases page by page.

When you click the "Finish" button, all decisions by the system thereafter are accepted without user validation. In other words, interactive learning is stopped for all pages; the OCR for this document continues in automatic mode.

The recognition result of multipage documents is saved in a single output file; when the recognition result is sent to a target application, multiple pages are created inside a single document.

But maybe you need to recognize just one, some or even most pages, but not all pages, of a scanned document? That’s easy too: select the required pages in the page toolbar and choose the command "Recognize Selected Pages" under the "Process" menu.

EDITING MULTIPAGE DOCUMENTS

You can edit multipage documents, mainly to correct scanning errors: you can delete pages from the document and move pages to other locations in the document to reorder them.

To delete pages, select them in the page toolbar and press the Delete key (or select the command "Clear" from the "Edit" menu). You can also drag the pages to be deleted to the trashcan below.

To move pages to a different location in the document, drag their icons to the new location.

STARTING A NEW DOCUMENT

You can use the command "New Document" under the "File" menu to close the current document.

This command “cleans the slate”: any document loaded into memory, containing a single page or multiple pages, is erased. You are now ready to create a new document.

You can also start a new document from within the current document, without using this command. As long as the OCR has not been executed, the system assumes that you want to add pages to the current document. You can for instance scan all the pages in the scanner’s autofeeder, fill the feeder again and start over: all pages scanned will compose a single document. Or you could scan a number of pages and add some image files, say, faxes. These pages again form a single document; all you have to do is change the image source in between with the "Source" button.

Once the OCR has been executed and you scan again (or load more images), you are prompted to start a new document or complete the current document.

RRRRRECOGNIZINGECOGNIZINGECOGNIZINGECOGNIZINGECOGNIZING T T T T TEXTEXTEXTEXTEXT Z Z Z Z ZONESONESONESONESONES

We now know how to recognize pages and how to process multipage documents. But can we recognize less than a page with equal comfort? We can! Ctrl-click a zone and select the command "Copy as Text": the text zone under the mouse pointer is recognized and sent to the clipboard.

The current system settings (language, font type, etc.) apply. The OCR result is placed on the clipboard as “running”, unformatted text.

The command "Copy Text Zones" from the "Layout" menu copies all text zones on the page to the clipboard. This command offers a convenient way of recognizing a single page instantly!

ORGANIZING THE TEXT OUTPUT

Saving or exporting the text means more than selecting an output method (saving a file, sending the output to a target application or the clipboard, or doing both) or defining a file name for the output file. You also select a file format and determine the appearance of the recognized text. In short, you have to decide where you want to take the text before you launch the recognition.

Some options of the "Format" button allow you to influence the look of the text output.

The text flow of the output document is directly influenced by the option "Merge Lines into Paragraphs".

Keep this option enabled to have Readiris detect the paragraphs: Readiris will then apply the normal word wrap typical of word processors. Otherwise, a carriage return is added after each line and hyphenated words remain hyphenated! Paragraph detection is enabled by default.

Let’s give an example to clear things up. When the first three lines of a column are "The new presi-", "dent waved from the balcony." and "His wife had joined him.", paragraph detection gives you the following result: "The new president waved from the balcony. His wife had joined him." The hyphenated parts of the word "president" were “reglued” and a space was added at the end of the first sentence, thus creating naturally flowing text.

Had paragraph detection not been enabled, the original layout would have been retained, with a carriage return added at the end of each line.
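The paragraph-detection example can be reproduced with a few lines of Python. This is a deliberately naive sketch, not Readiris' algorithm: it assumes the lines already belong to one paragraph and treats every line-final hyphen as a word break. The function names are illustrative.

```python
def merge_lines(lines):
    """Join the lines of one paragraph into flowing text (naive dehyphenation)."""
    text = ""
    for line in lines:
        line = line.strip()
        if text.endswith("-"):
            text = text[:-1] + line  # re-glue the word split by the hyphen
        elif text:
            text += " " + line       # a soft line break becomes a space
        else:
            text = line
    return text


def keep_lines(lines):
    """Behaviour without paragraph detection: one line break after each line."""
    return "\n".join(line.strip() for line in lines)


column = ["The new presi-", "dent waved from the balcony.", "His wife had joined him."]
print(merge_lines(column))
print(keep_lines(column))
```

A real implementation has to do better than this: a line-final hyphen may belong to a genuine compound such as "well-" / "known", which this sketch would wrongly glue into "wellknown".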

This option is not available when the PDF format is selected: Adobe Acrobat PDF files always store text line by line!

(The "Format" button contains some formatting options we haven’t discussed yet; this will be done shortly.)

SETTING UP YOUR SCANNER

Let’s set your scanner up now. It is assumed that the scanner hardware and necessary software are installed correctly on your computer system.

Actually, it’s all very easy: Readiris uses the Twain driver of each scanner to support it. In other words, as soon as there’s a Twain module available for your scanner model, Readiris supports it effortlessly!

Select your Twain driver under Readiris with the option "Scanner" of the "Preferences" command under the "Readiris" menu.

The option "Invert Image" allows you to generate “inverted” images; this option is useful to process full pages with white text on a dark background.

The selected scanner is shown in the main toolbar; the title bar of the image window and the file name in the page toolbar indicate which scanner was used to acquire the image. (In our example, page 1 was scanned with an HP Twain driver, and that Twain driver is still the active scanner.)

Refer to the Readiris “Read Me” file or to chapter 1 of this manual if you need further information.

SCANNING DOCUMENTS

Now that our scanner is set up, we want to get started scanning documents. The scanner’s Twain driver is used to set the color mode, the scanning resolution, the page format and orientation, the brightness and the contrast. (The contrast setting is only available on some scanners.)

Which scanning options are available depends on your scanner model. Refer to the software documentation that accompanies your scanner.

There are some elements you should be aware of. First of all, pay some attention to line skew. Although the page analysis and recognition are skew-tolerant, it may become difficult to zone and OCR a page correctly when the skew is too significant. Limited line skew (less than 0.5°) can be ignored because the OCR accuracy does not suffer.
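Some simple geometry shows why a small skew is harmless while a larger one gets in the way of zoning: across a line of width w, a skew angle θ shifts one end of the line by w × tan(θ) relative to the other. The Python sketch below compares that drift with the line pitch. The 6.5-inch line, the 12-point line spacing and the 300 dpi resolution are illustrative assumptions, and the comparison is a rule of thumb, not the criterion Readiris applies.

```python
import math


def vertical_drift_px(line_width_in, skew_deg, dpi=300):
    """Vertical offset in pixels between both ends of a text line skewed by skew_deg."""
    width_px = line_width_in * dpi
    return width_px * math.tan(math.radians(skew_deg))


def points_to_px(points, dpi=300):
    """Convert a typographic size in points (1/72 inch) to pixels."""
    return points / 72 * dpi


line_pitch = points_to_px(12)  # 12-point line spacing at 300 dpi
for angle in (0.5, 2.0):
    drift = vertical_drift_px(6.5, angle)
    print(f"{angle}°: drift {drift:.1f} px, line pitch {line_pitch:.0f} px")
```

At 0.5° the drift (about 17 px) stays well below the 50 px line pitch, so neighbouring lines remain separable; at 2° it reaches about 68 px and exceeds the pitch, so a horizontal band through one end of a line already touches the next line.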

The option "Page Deskewing" under the "Options" button (or under the "Settings" menu) determines whether pages that were scanned at an angle are deskewed, i.e. straightened automatically. Limited line skew is ignored. This option is disabled by default.

If you forgot to enable this option, use the command "Deskew Page" on the image toolbar (or under the "Process" menu) to “straighten” pages that were scanned at an angle.

The deskewing takes a few seconds: the image is analyzed to detect the skew angle (if any), the color or greyscale image and its black-and-white version are deskewed, and the page analysis is re-executed.

You may also need to adjust the page orientation. Use the rotation tools on the image toolbar. (Corresponding commands are found under the "Process" menu.) Three rotation directions are available: to the right, to the left and upside down. Rotation also takes a few seconds as the image itself is updated, not just the display on-screen.

However, Readiris can correct badly oriented pages for you. Enable the option "Page Orientation Detection" under the "Options" button (or under the "Settings" menu) and Readiris will correct the page orientation where needed.

You can use the image Deskew.jpg in the image folder if you want to try it. Enable the options "Page Deskewing" and "Page Orientation Detection" before you open the image, and let Readiris restore the Tower of Pisa the way we like it.

LET THE BAD COLOR NOT BE SEEN

Readiris supports black-and-white, greyscale and color images on an equal basis, so you are free to choose the color mode that best suits your needs. To include line art graphics in the recognized documents, scan in black-and-white; to include black-and-white photos, scan in greyscale; to include color pictures, scan in color.

Why reduce the bit depth of the images during the scan at all? Because greyscale and color images are slower to acquire.
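Part of the reason greyscale and color scans are slower is simply data volume. The Python sketch below computes the raw, uncompressed pixel data of one scanned page at 300 dpi for each color mode; it ignores compression, row padding and scanner overhead, and the US Letter page size is an illustrative choice.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed pixel data of one scanned page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# US Letter page (8.5 x 11 inches) scanned at 300 dpi
for mode, bpp in (("black-and-white", 1), ("greyscale", 8), ("color", 24)):
    size = raw_scan_bytes(8.5, 11, 300, bpp)
    print(f"{mode:>15}: {size / 1_000_000:.1f} MB")
```

A color page carries 24 times the pixel data of its black-and-white version (about 25.2 MB against 1.1 MB here), all of which has to travel from the scanner to the computer.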

Note that the image size, bit depth and resolution are shown on the status bar of the image window.

Readiris creates a black-and-white version of every greyscale and color image. To view a scanned image in black-and-white, disable the option "Image in Color" under the "View" menu.

DIFFERENT DEVICES, DIFFERENT RESOLUTION

Whatever your scanning mode, use a scanning resolution of 300 dpi as a rule. In all probability, this is not the default setting of your Twain driver! Select 300 dpi for normal applications; use a higher resolution of 400 dpi for small print (below 10 point) and for very degraded documents.

Readiris reads point sizes from 6 to 72 point (0.08 to 1 inch, or 0.21 to 2.54 cm).
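The inch and centimetre figures follow from the typographic point being 1/72 inch. The Python sketch below reproduces them and adds the nominal size in pixels at 300 and 400 dpi. Note that a point size measures the type body, so actual letter heights are somewhat smaller than these pixel figures.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch
CM_PER_INCH = 2.54


def points_to_inches(points):
    return points / POINTS_PER_INCH


def points_to_cm(points):
    return points_to_inches(points) * CM_PER_INCH


def points_to_pixels(points, dpi):
    """Nominal body size of the type in pixels at a given scanning resolution."""
    return points_to_inches(points) * dpi


for pts in (6, 10, 72):
    print(f"{pts:>2} pt = {points_to_inches(pts):.2f} in = {points_to_cm(pts):.2f} cm, "
          f"{points_to_pixels(pts, 300):.0f} px at 300 dpi, "
          f"{points_to_pixels(pts, 400):.0f} px at 400 dpi")
```

At 300 dpi, 10-point type gets only about 42 pixels of body height; 400 dpi raises that to about 56, which illustrates the benefit of the higher resolution for small print.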

Readiris also recognizes “drop letters”, large capitals that cover several lines. (These can of course be no bigger than 72 point!) Even inverted drop caps are recognized...

Faxes have a resolution of 100 or 200 dpi; when you create images with a digital camera, the resolution is unknown; and when you open image files, the file header may contain an incorrect resolution. To process such images hassle-free, enable the option "Process as 300 dpi" under the "Preferences" command of the "Readiris" menu. This setting applies both to direct scanning and to the opening of prescanned images.

When your images are acquired by a digital camera instead of a scanner, it is mandatory that you enable another special option, "Digital Camera", in the "Preferences" command. This parameter again applies to direct scanning and prescanned images.

By doing this, you enhance the image before it is recognized. Digital cameras pose specific challenges: they produce low-resolution images (even when you hold the camera very close to your document), and the image resolution is in any case unknown.

There are some “finer points” to be aware of when it comes to successfully recognizing images captured with a digital camera.

First of all, select the highest possible image resolution: for instance, create 2,048 x 1,536 pixel images rather than 1,024 x 768 pixel images when your camera supports both. Secondly, enable the “macro” mode of your camera to take close-ups, which is always the case when you photograph documents. (This mode was designed to capture flowers, insects, etc.) Otherwise, the images are blurred and illegible.
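A quick calculation shows why the highest camera resolution matters. Assuming, for illustration, that a US Letter page exactly fills the frame (long side against long side), the effective resolution is the number of pixels per inch along the limiting side. The Python sketch below is that back-of-the-envelope estimate, not a Readiris feature.

```python
def effective_dpi(image_w_px, image_h_px, page_w_in, page_h_in):
    """Pixels per inch on the page when it exactly fills the camera frame.

    The long side of the page is matched to the long side of the image; the
    smaller of the two densities is the limiting, effective resolution.
    """
    long_px, short_px = max(image_w_px, image_h_px), min(image_w_px, image_h_px)
    long_in, short_in = max(page_w_in, page_h_in), min(page_w_in, page_h_in)
    return min(long_px / long_in, short_px / short_in)


for width, height in ((2048, 1536), (1024, 768)):
    dpi = effective_dpi(width, height, 8.5, 11)  # US Letter page
    print(f"{width} x {height} pixels: about {dpi:.0f} dpi")
```

Under these assumptions the larger image yields about 181 dpi and the smaller one only about 90 dpi, both well below the 300 dpi recommended for scanning.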

Use little or no compression: heavy compression reduces the sharpness of the captured text. Zoom manually to frame your document. Some cameras are bundled with photo stitching software, but don’t bother using it for document capture.

Hold the camera directly above the document to avoid capturing the document at an angle. However, avoid shadows cast on the document by the camera or your hand! Keep the camera steady, and consider mounting it on a tripod when necessary.

Disable the flash when you photograph glossy paper; otherwise the image may be too light. Generally speaking, adapt the brightness and contrast to the environment: daylight, lamp light, neon light, etc. (Some cameras can be calibrated by photographing a white document.)

To give it a try, open the image Digital.jpg in the Readiris image folder and execute the recognition.

ADJUSTING THE SCANNED IMAGES

Scanning in greyscale and color isn’t just useful to save the graphics with sufficient quality; in some instances, it’s also necessary to obtain good OCR results! When text is printed on a colored background, scanning in color may create the tone differences that are lacking in black-and-white images. When there is only limited contrast between the text and the background, the background can create “noise” that makes the recognition difficult or impossible!

Think for instance of black text printed on a dark background: when you scan such a document in black-and-white, you may not be able to “drop” the background color without losing the text information as well, however much you try to adjust the scanner brightness...

As indicated earlier, powerful, intelligent routines automatically convert color and greyscale images into black-and-white, so that even tough cases get solved. Here’s how our “difficult” image is binarized by Readiris!

If necessary, you can optimize the image further for the subsequent OCR process. Select the "Adjust Image" button on the image toolbar (or the command "Adjust Image" under the "Process" menu) to do so.

When you access this command, the black-and-white version is displayed automatically. (It’s as if you disabled the option "Image in Color"!) There are some complicated concepts here, and we need to discuss them in detail.

The option "Smoothen Color or Greyscale Image" makes greyscale and color images more homogeneous by “flattening”, or smoothing out, relative differences in intensity. As a result, a sharper contrast is created between the foreground - the text - and the background - a color, artwork etc.

This preprocessing feature may seem highly technical and difficult to understand, but it certainly has its role to play: with some scanner models, this reduction of sharpness is needed to recognize color and greyscale images. Smoothening is sometimes the only way to separate text from a colored background! Below is a sample image that is simply illegible without image smoothing.

Image smoothening is also available as an option in the "Preferences" command under the "Readiris" menu.
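The manual doesn’t say which filter "Smoothen Color or Greyscale Image" uses. Purely to illustrate what “flattening” intensity differences means, here is a minimal Python sketch of a generic 3×3 mean (box) filter. It assumes a greyscale image stored as a list of rows of 0-255 values; the function name `box_smooth` is hypothetical and not part of Readiris.

```python
def box_smooth(image):
    """Return a copy of `image` in which each pixel is replaced by the mean of
    its 3x3 neighbourhood. `image` is a list of rows of greyscale values
    (0 = black, 255 = white); border pixels average only existing neighbours."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += image[ny][nx]
                        count += 1
            result[y][x] = round(total / count)
    return result


# A "textured" colored background, modelled as a checkerboard of 80 and 180.
background = [[80 if (x + y) % 2 == 0 else 180 for x in range(6)] for y in range(6)]
smoothed = box_smooth(background)
# Interior pixels now lie between 124 and 136 instead of jumping between 80 and 180.
```

After smoothing, the textured background no longer alternates between two very different tones, which makes it easier for a later binarization step to treat it as one uniform area.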

Now for the brightness. This setting determines the overall brightness of the image: any darkening or lightening of the image applies to all pixels. The objective is to get rid of the page background. We’ll give two examples. In the first example, every zone of the image is dark, so we lighten the image to eliminate the page background. The foreground - the text - remains dark enough to be detected by the binarization. In the second example, the image is so light that even the foreground text doesn’t show up in the binarized image! We darken the image so as to make the text legible.
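To make the two brightness examples concrete, here is a minimal Python sketch. It assumes a simple model that is not documented for Readiris: brightness adds the same offset to every pixel, and binarization compares each pixel with a fixed threshold of 128. The names `adjust_brightness`, `binarize` and `THRESHOLD` are hypothetical.

```python
THRESHOLD = 128  # hypothetical fixed binarization threshold


def adjust_brightness(image, offset):
    """Add `offset` to every pixel (a global change), clamped to 0..255.
    A positive offset lightens the image, a negative one darkens it."""
    return [[min(255, max(0, p + offset)) for p in row] for row in image]


def binarize(image, threshold=THRESHOLD):
    """Return 1 (black) for pixels darker than `threshold`, else 0 (white)."""
    return [[1 if p < threshold else 0 for p in row] for row in image]


# Example 1: dark background (100) around dark text (20).
dark = [[100, 20, 100]]
print(binarize(dark))                           # [[1, 1, 1]]  text and background merge
print(binarize(adjust_brightness(dark, 60)))    # [[0, 1, 0]]  background dropped

# Example 2: light background (240) around light text (150).
light = [[240, 150, 240]]
print(binarize(light))                          # [[0, 0, 0]]  text vanishes
print(binarize(adjust_brightness(light, -60)))  # [[0, 1, 0]]  text legible again
```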

The contrast determines the local contrast between the darker and lighter zones of the image. (The text is usually darker than the background; the reverse is true when you’re dealing with inverted text.) The objective is to make the character shapes stand out clearly against their (colored) background. Here’s an example where we need to increase the contrast because the default setting yields broken characters.
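One way to picture “local” contrast is to push every pixel away from the average of its neighbours before binarizing. The Python sketch below does exactly that. The 3×3 neighbourhood, the `gain` parameter and the fixed threshold of 128 are assumptions for illustration, not Readiris settings, and `enhance_local_contrast` is a hypothetical name. It shows a stroke whose faint middle pixel breaks the character until the local contrast is increased.

```python
THRESHOLD = 128  # hypothetical fixed binarization threshold


def binarize(image, threshold=THRESHOLD):
    """Return 1 (black) for pixels darker than `threshold`, else 0 (white)."""
    return [[1 if p < threshold else 0 for p in row] for row in image]


def enhance_local_contrast(image, gain):
    """Push every pixel away from the mean of its 3x3 neighbourhood by `gain`
    (gain > 1 increases local contrast); results are clamped to 0..255."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbours = [image[ny][nx]
                          for ny in range(max(0, y - 1), min(height, y + 2))
                          for nx in range(max(0, x - 1), min(width, x + 2))]
            mean = sum(neighbours) / len(neighbours)
            row.append(min(255, max(0, round(mean + gain * (image[y][x] - mean)))))
        result.append(row)
    return result


# A vertical stroke (column 2) on a light background; its middle pixel is faint (140).
stroke = [
    [200, 200, 110, 200, 200],
    [200, 200, 140, 200, 200],
    [200, 200, 110, 200, 200],
]
print(binarize(stroke))                             # stroke broken in the middle row
print(binarize(enhance_local_contrast(stroke, 3)))  # stroke continuous
```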


Note above all that no image adjustment is executed until you click the "Apply" button! Clicking "OK" executes the adjustment and closes the window. Here’s an example where we darkened the black-and-white image dramatically - though admittedly not with OCR accuracy in mind!

The first two options concern color and greyscale images; the last one, "Despeckle", exclusively concerns black-and-white images. “Despeckling” means that “parasite pixels” (also called “salt and pepper noise”) are removed from black-and-white images.


Be careful not to erase spots that are too big; otherwise you might start erasing the dots on the letter "i", portions of dot-matrix characters etc.!
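Despeckling is commonly implemented by erasing small connected groups of black pixels. The Python sketch below assumes that approach (8-connected groups, with a hypothetical `max_speck_size` limit in pixels); it is not Readiris’s documented algorithm, but it reproduces the warning above: a limit large enough to catch bigger specks also erases the dot of an "i".

```python
from collections import deque


def despeckle(bitmap, max_speck_size):
    """Return a copy of a black-and-white bitmap (1 = black, 0 = white) in which
    every 8-connected group of black pixels with at most `max_speck_size`
    pixels has been erased (set to white)."""
    height, width = len(bitmap), len(bitmap[0])
    result = [row[:] for row in bitmap]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected group of black pixels.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and bitmap[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) <= max_speck_size:
                for cy, cx in component:
                    result[cy][cx] = 0
    return result


# A letter "i" (a 4-pixel dot above a 6-pixel stem) and a one-pixel speck on the right.
page = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
]
cleaned = despeckle(page, max_speck_size=1)         # removes only the speck
too_aggressive = despeckle(page, max_speck_size=4)  # also removes the dot of the "i"
```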

If you enable the option "Despeckling" under the "Options" button (and under the "Settings" menu), despeckling is executed automatically on every page loaded into memory!

The best way to optimize images for the OCR process is this: place the adjustment window where it doesn’t prevent you from judging the image adjustments you make. Adapt the parameters - clicking "Apply" each time - until the image is crisp and clear.

SAVING DEFAULT SETTINGS

Set the program parameters correctly and click the command "Save As Default" under the "Settings" menu to save the current settings, including your scanner model, as default settings for future use.


When you quit Readiris after modifying the settings, you are invited to save the current settings as default settings.

Settings files contain more than the scanner model: they also determine whether you use interactive learning, which language and font type - for instance a normal, proportional font - your documents have, which output mode is used - for instance sending HTML texts to Apple Safari - etc. In short, all operational settings of Readiris are stored in the settings files.

The default settings are used at each program startup. To restore the default settings without having to quit Readiris, use the command "Open Default" under the "Settings" menu. (You can even reload the factory settings with a special command...)

SAVING SPECIFIC SETTINGS

You can also save specific settings to avoid having to redefine the operational parameters. The commands "Save" and "Open" under the "Settings" menu take care of this.


Let’s give an example: if you regularly have to OCR German documents, we recommend creating a settings file for this type of document. You would then select "German" as the document language, disable learning because the same typefaces are used systematically, etc.

RECOGNIZING PAGES AUTOMATICALLY

Now that our scanner is set up, we want to start capturing documents. Instead of going through all the parameters, we’ll execute automatic OCR, a very convenient way of recognizing pages.

Click the "Auto" button (or select the command "Automatic OCR" under the "Process" menu).

We will now perform fully automatic OCR, that is, we will recognize a page immediately, without any interruption. Automatic OCR means that a page is successively scanned, windowed by page analysis or a zoning template, and recognized without interactive learning. All you have to do is initiate the scanning and save the recognized text; Readiris handles the intermediate steps.

READIRIS RECREATES YOUR DOCUMENT LAYOUT

Automatic recognition, which automates the recognition process, should not be confused with autoformatting! “Autoformatting” means that Readiris recreates a facsimile copy of the scanned document: the word, paragraph and page formatting of your original document are applied.

Similar typefaces (serif and sans serif, proportional and fixed, normal and condensed) are used as in the source document, and the point sizes and typestyles (bold, italic, underlined, superscript and subscript) are maintained across the recognition. The tabs and the alignment (left, centered, right and justified) of each text block are recreated, and so are bulleted and numbered lists. E-mail addresses and web page URLs are detected and recreated as hyperlinks in the output. The placement of columns, text blocks and graphics follows your original document.

In other words, Readiris allows you to archive a true copy of your documents, in the form of an editable and compact text file instead of a scanned image!

All this implies that the sorting of windows only partially applies when “autoformatting” is used: you can include and exclude zones, but any reordering of zones is simply ignored!

Here’s an example of how it works. To get acquainted with this feature, open the image Autoform.jpg in the image folder.


Click the "Format" button on the main toolbar, select the text format RTF (“Rich Text Format”) and the layout option "Recreate Source Document". (The option "Merge Lines into Paragraphs" is enabled by default to apply word wrap within the paragraphs.) Enable the option "Ask File Name and Location" to send the reading result to an RTF file or, if Microsoft Word is installed on your computer, send the OCR result to Microsoft Word.

Whether layout reconstruction is available depends on the selected output mode. A “poor” format generating “plain” text, such as Text (Unicode), does not support advanced formatting codes and therefore cannot offer autoformatting. The RTF format does. (Moreover, RTF is a widely used text format that can be opened by any popular word processor.) The Adobe Acrobat PDF format, on the other hand, was designed to copy the look of your documents: PDF documents by nature imply autoformatting.

When the recognized text is opened in a word processor, the text looks like this without any intervention by the user.


(To see the effect correctly, you need to enable the “WYSIWYG” mode of your word processor, usually called “page layout” mode. However, if you send the recognized document directly to Microsoft Word, the page or print layout view is activated automatically!)

In short, Readiris not only recognizes your texts, but can format them for you as well. OCR isn’t just text recognition anymore; it has become document recognition as well!

COLUMNS PLEASE, NOT FRAMES!

The formatting option "Use Columns instead of Frames" determines how the “autoformatting” gets done: the text blocks, tables and graphics can be stored either in frames or in editable columns.

“Frames” are separate containers used to position several blocks of text, graphics and tables on a page. With columns, the text flows naturally from one column to the next, and columnized texts are much easier to edit.

This assumes that real columns occur on the scanned document: when the system is unable to detect columns in the source document, this formatting mode falls back on frames anyway!

If you want to try it, you can use the image Columns.tif in the image folder.


The option "Insert Column Breaks" refines the recreation of columns: it determines whether “hard” column breaks are inserted at the end of each column.


With column breaks, any text you edit, add or remove remains inside its column; no text ever flows automatically across a column break. All text that follows a column break is moved to the top of the next column!

Enable this option when you want to maintain column breaks where these were detected in the recognized document, whatever text editing gets done after the OCR. In newspapers and magazines, the various columns on a page often correspond to different article “threads”. Having text flow covertly from one column to the next may not be a good idea!

Disable this option when you have columnized body text: this ensures the natural flow of the text from one column to the next.
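The difference between hard column breaks and free-flowing columns can be sketched in a few lines of Python. This is a simplified model, assuming each column holds a fixed number of lines; `flow_into_columns` and `COLUMN_BREAK` are hypothetical names. In the example, one article (A) has grown by a line after editing.

```python
COLUMN_BREAK = object()  # hypothetical marker for a "hard" column break


def flow_into_columns(lines, lines_per_column):
    """Distribute lines over columns. Text flows on to the next column when a
    column is full; a COLUMN_BREAK forces the following text to the top of
    the next column."""
    columns = [[]]
    for line in lines:
        if line is COLUMN_BREAK:
            columns.append([])
            continue
        if len(columns[-1]) == lines_per_column:
            columns.append([])
        columns[-1].append(line)
    return columns


edited_article = ["A1", "A2", "A3", "A4"]  # article A, one line longer than before
next_article = ["B1", "B2"]

with_break = flow_into_columns(edited_article + [COLUMN_BREAK] + next_article, 3)
without_break = flow_into_columns(edited_article + next_article, 3)
print(with_break)     # [['A1', 'A2', 'A3'], ['A4'], ['B1', 'B2']]
print(without_break)  # [['A1', 'A2', 'A3'], ['A4', 'B1', 'B2']]
```

With the break, article B still starts at the top of its own column; without it, B’s first lines end up in the same column as the end of article A.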

There’s one aspect where you may decide not to recreate the source document: the page size of your output documents. What do we mean here? Let’s give some examples: you’re scanning A4 pages but you create Letter output because that format is easier to print, whereas A4 requires a manual feed. Or you may be an attorney who scans Letter documents and saves them in the Legal format.

That’s why Readiris allows you to define preferred paper sizes for the output documents. Click the button "Page Sizes" in the "Text Format" dialog.


Select the applicable and excluded paper sizes: the applicable paper sizes can be used to format the recognized documents, while the excluded sizes are never used. Sort the applicable paper sizes by dragging them to their proper place: Readiris goes through the page sizes in the indicated order and uses the first paper size that is large enough to hold the scanned document. The button "Default" re-applies the default settings. (These take the settings of your computer into account!)
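The selection rule described above is a first-fit search over the ordered list of applicable sizes. Here is a minimal Python sketch, assuming portrait orientation and comparing the scanned page’s dimensions in millimetres. The dictionary and the `choose_page_size` function are hypothetical, and what Readiris does when no applicable size fits is not documented (the sketch returns `None`).

```python
# Page dimensions in millimetres (portrait orientation).
PAGE_SIZES_MM = {
    "A4": (210.0, 297.0),
    "Letter": (215.9, 279.4),  # 8.5 x 11 in
    "Legal": (215.9, 355.6),   # 8.5 x 14 in
}


def choose_page_size(doc_width_mm, doc_height_mm, preferred_order):
    """Return the first size in `preferred_order` that is large enough to hold
    the scanned page, or None if no applicable size fits."""
    for name in preferred_order:
        width, height = PAGE_SIZES_MM[name]
        if width >= doc_width_mm and height >= doc_height_mm:
            return name
    return None


order = ["Letter", "A4", "Legal"]  # excluded sizes are simply left out of the list
print(choose_page_size(215.9, 279.4, order))  # Letter
print(choose_page_size(210.0, 297.0, order))  # A4 (too tall for Letter)
print(choose_page_size(215.0, 340.0, order))  # Legal
```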

Note that this option does not apply to HTML files - a text format designed for the Internet that doesn’t have any page formats! Nor does it apply to PDF files, which use a custom fit to recreate the source document accurately.


TEXT FORMATTING, PART 2

The other layout options are "Create Body Text" and "Retain Word and Paragraph Formatting".

Creating body text means you create a non-formatted, “running” text. The text is captured, but its formatting is entirely ignored. Use this option when you just need to recapture a text but not its layout.

(Body text is also what you get when you quickly recognize a text zone by Ctrl-clicking it and selecting the command "Copy as Text": when the recognition is done, you paste body text into your text application. The same holds when you recognize all text zones on a page at once with the command "Copy Text Zones" from the "Layout" menu.)

The option "Retain Word and Paragraph Formatting" represents the middle road: the word formatting - font type, point size and typestyle - is retained across the recognition, and so is the paragraph formatting - the tabs and the alignment.


Don’t confuse this formatting option with “full” autoformatting: this option just puts one paragraph after the other; it does not recreate columns or copy the relative position of the various zones.

EXPORTING TEXT SEVERAL TIMES

Actually, you can export the OCR results several times without repeating the recognition! Change the text format and the formatting options under the "Format" button and click the button "Recognize" again. No OCR is executed this time - unless you defined new windows or modified existing ones! Readiris just reformats the OCR results and saves them in the new text format or sends them to the target application you’ve just selected.

The same goes for any other element you change: when you add a page to your OCR job, only that page is recognized. When you create a new text zone on any page, only that zone is recognized before the results get exported.
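Readiris’s internals aren’t documented here, but the behaviour described above resembles caching recognition results per zone. The following Python sketch is a hypothetical model of that idea: `OcrJob`, `fake_ocr` and the zone tuples are illustrative names, and a modified zone counts as a new one simply because its coordinates differ.

```python
class OcrJob:
    """Hypothetical model of an OCR job that remembers the text recognized for
    each zone, so that exporting again only runs OCR on new or changed zones."""

    def __init__(self, recognize_zone):
        self._recognize_zone = recognize_zone  # the expensive OCR step (a stand-in)
        self._cache = {}                       # zone -> recognized text

    def export(self, zones, formatter):
        """Recognize any zone not seen before, then format all results."""
        texts = []
        for zone in zones:
            if zone not in self._cache:  # a new or modified zone is a new key
                self._cache[zone] = self._recognize_zone(zone)
            texts.append(self._cache[zone])
        return formatter(texts)


ocr_calls = []


def fake_ocr(zone):
    """Stand-in recognizer that records each call."""
    ocr_calls.append(zone)
    return f"text of zone {zone}"


job = OcrJob(fake_ocr)
zones = [(1, 0, 0, 100, 50), (1, 0, 60, 100, 50)]    # (page, x, y, width, height)
job.export(zones, "\n".join)                          # OCR runs on both zones
job.export(zones, " | ".join)                         # new format only: no OCR at all
job.export(zones + [(2, 0, 0, 100, 50)], "\n".join)   # only the added page's zone is recognized
print(len(ocr_calls))  # 3
```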

For instance, you could recognize a 10-page document and save it in a Word file, then quickly scan the abstract on the cover page and send it by e-mail to an impatient colleague, and finally scan the appendix - a table - and save all results in an HTML file to be posted on your company’s web site.

CREATING PORTABLE DOCUMENTS

We still need to go deeper into one format: Adobe Acrobat PDF. Readiris allows you to create text- and image-based PDF documents.


What’s the difference between these formats? When you select the format "PDF Text", Readiris creates a PDF file that contains the text result. (Graphics may be included, but only when graphic zones occur on the page - photographs, artwork etc.) In other words: the page image is not contained in the single-layered PDF file! The format "PDF Image" is also single-layered, but it only contains the scanned image, without any OCR results!

The formats "PDF Text-Image" and "PDF Image-Text" yield different results: Readiris creates a searchable PDF file that contains both the recognized text and the page image. With “text-image” PDF files, the text is placed above the page image in the two-layered PDF file; with “image-text” PDF files, the text is placed under the page image. Use the "Search" tool of Adobe Reader or the Preview application and this quickly becomes obvious!


PDF files of the type “text-image” are actually quite sophisticated: the pixels of the recognized text are erased to create a legible document! Displaying recognized text in, say, black on top of black character bitmaps would give you text with a heavy shadow...

If you want to give it a try, you can recognize the sample image Background.jpg.


All text-based PDF files encode web site URLs as active links: click them to visit the mentioned web site!


Click the "Format" button to discover some options that concern the Acrobat PDF format: "Create Bookmarks" and "Embed Fonts". (As soon as the PDF format is selected, autoformatting applies - and cannot be disabled.)

The option "Create Bookmarks" ensures that a bookmark is created for each document element - the graphics as well as the text blocks and tables. For the text zones, Readiris applies an intelligent algorithm to come up with a title, a “summary” per zone; the tables and graphics are simply numbered. (Another navigational element of PDF documents, page thumbnails, can be created dynamically by your Adobe Reader software!)

The option "Embed Fonts" embeds the fonts in the PDF files. Embedding fonts prevents font substitution when readers view and print the recognized document: it ensures that readers - whatever their computer configuration may be - see the text in its original fonts. However, embedding fonts somewhat increases the file size of the recognized documents!

... OR READING THEM

Let’s look the other way for a moment. As Readiris offers full support for the Adobe Acrobat PDF format, you can not only generate PDF files but also read them!

“Repurposing” PDF documents may be a major application of Readiris, for several reasons. First of all, it’s a way of converting images into text: open image-based PDF documents, execute the recognition and save the OCR result to a text document (in any supported text format). Text files are editable; image files are not.

Second case: you can convert image-based PDF files to text-based PDF documents. You execute the recognition on “image-only” PDF files and save the OCR results... as text-based PDF documents! Text-based PDF files are searchable and editable; “image-only” PDF files are not.

Finally, converting PDF files is a way of “unlocking” PDF content. You can recognize “read-only” PDF documents, where the text is normally inaccessible. With unprotected PDF files, the content can be retrieved (copied and saved to a text file); with “read-only” files, the content cannot be extracted. These documents can only be viewed and printed!

Two important nuances must be noted. First, Readiris does not open password-protected PDF documents, even though it gets past all other PDF security barriers! (To be specific: “master passwords” that set the permissions of PDF documents don’t bother Readiris, but “user passwords” required to open a PDF document do.) Secondly, Readiris does not convert PDF documents that contain JPEG 2000 compressed images.


Proceed as usual: load PDF files into memory the same way you open prescanned images - faxes, snapshots made with your digital camera etc. Press Escape or Command-. (Command-period) to interrupt the loading process between two pages.

There’s a specific option that concerns PDF files: you may want to indicate which pages you want to convert. If your objective is, say, to capture just one chapter of a lengthy PDF publication, it doesn’t make any sense to load the entire book into Readiris... Indicating the proper page range can save you lots of time! (This also holds for multipage TIFF images.)

You can give it a try with the file Sample.pdf in the Readiris image folder if you care to...


SSSSSAAAAAVINGVINGVINGVINGVING G G G G GRAPHICSRAPHICSRAPHICSRAPHICSRAPHICS S S S S SEPEPEPEPEPARAARAARAARAARATELTELTELTELTELYYYYY

In our PDF example, the graphic was included in the recognized text; whether this is the case depends on the formatting option "Include Graphics". Saving graphics inside the text is only possible with “full” autoformatting, not with a “poor” text format such as Text (Unicode).

Still, with Readiris, you can save graphics without performing text recognition. As Readiris supports black-and-white, greyscale and color images, you can capture line art graphics and photographs.

How? Draw a graphic zone around the illustrations, cartoons etc. you need. Graphic windows are drawn manually in the same way as text and table windows: simply select the graphic window tool on the image toolbar (or under the "Layout" menu).

As with the other window types, the status bar tooltip tells you how many graphic zones there are.

Next, choose the command "Save Page" under the "File" menu and enable the option "Graphics Only". You are prompted to specify a filename.


Determine which graphic file format you will use: select a format that’s supported by your paint or photo retouching software. Many popular graphic formats are available: JPEG, PDF, Photoshop, PICT, PNG (“Portable Network Graphics”), TIFF and Windows bitmaps (BMP).

The graphics are saved in a single file. You don’t have to limit yourself to a single graphic, but if you draw several graphic windows, they are collected, “stacked”, in a single file. (You can use the crop command of your paint or photo retouching program to separate them.)

Graphic windows with sides shorter than 1 mm are not allowed - bitmaps of that size hardly contain any information. “Irregular”, non-rectangular windows are allowed, and so are several graphics. The surface not covered by your “complex” graphic zones remains white. In the example below, two graphic zones - one in the lower left corner and the other in the upper right corner - lead to lots of white space around the actual graphics.
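The way several graphic windows end up in one file can be modelled as copying each zone into an image that spans their common bounding box, leaving everything else white. The Python sketch below assumes rectangular zones given in pixels and treats the 1 mm minimum as dpi / 25.4 pixels (about 11.8 pixels at 300 dpi). `collect_graphics` is a hypothetical name, irregular zones are not covered, and this is not Readiris’s documented implementation.

```python
MM_PER_INCH = 25.4
WHITE = 255


def collect_graphics(page, zones, dpi):
    """Copy rectangular zones (x, y, width, height in pixels) of a greyscale
    `page` into one image spanning their common bounding box. Pixels not
    covered by any zone stay white. Zones with a side under 1 mm are rejected."""
    min_side = dpi / MM_PER_INCH  # number of pixels in 1 mm at this resolution
    for x, y, w, h in zones:
        if w < min_side or h < min_side:
            raise ValueError(f"zone {(x, y, w, h)} has a side shorter than 1 mm")
    left = min(x for x, _, _, _ in zones)
    top = min(y for _, y, _, _ in zones)
    right = max(x + w for x, _, w, _ in zones)
    bottom = max(y + h for _, y, _, h in zones)
    out = [[WHITE] * (right - left) for _ in range(bottom - top)]
    for x, y, w, h in zones:
        for row in range(y, y + h):
            for col in range(x, x + w):
                out[row - top][col - left] = page[row][col]
    return out


# A 60 x 60 pixel page at 254 dpi (10 pixels per mm), entirely dark (value 0).
page = [[0] * 60 for _ in range(60)]
# One zone in the lower left corner, one in the upper right corner.
zones = [(0, 40, 20, 20), (40, 0, 20, 20)]
stacked = collect_graphics(page, zones, dpi=254)
white = sum(row.count(WHITE) for row in stacked)
print(len(stacked), len(stacked[0]), white)  # 60 60 2800
```

Only 800 of the 3,600 pixels in the stacked image come from the two zones; the rest is the white space the manual warns about.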


To send a graphic to the clipboard rather than save an image file, Ctrl-click a graphic window and select the command "Copy as Graphic": the graphic zone is ready to be pasted!

The command "Copy Graphic Zones" under the "Layout" menu copies all graphic zones to the clipboard at once. (You may have to crop them later on...)


READING FAXES AND DEFERRED RECOGNITION

Saving images as image files opens another possibility: you can save the full page and perform deferred OCR on it later on. That’s what we did with the prescanned images of our tutorials.

Simply scan a document and select the command "Save Page" under the "File" menu. (This command only saves single pages.) When you enable the option "All", you’ll be prompted to save the entire page as a graphic file. (Any windows you might have detected or drawn on the page are ignored.)

The color mode of the original image - color, greyscale or black-and-white - is always maintained.

Select an appropriate graphic format from the various formats available. When you save a document as a JPEG file for deferred OCR, make sure you maintain sufficient image quality: high JPEG compression rates degrade the image quality, and the performance of your OCR software can suffer as a consequence.


As we just indicated, the command "Save Page" only saves the current page. There’s a much more efficient way of saving your scans in graphic files for later OCR: enable the image scanning mode.

To do so, select the document type "Image" on the main toolbar (or under the "Settings" menu). Note that the "Recognize" button is now replaced by the "Send" button!

Click the "Format" button to discover what this means. You have the same flexibility as when you’re recognizing documents: you can save your scans in files or send them directly to a target application - Photoshop, the Preview application etc. (Note how the "Format" button indicates the selected graphic format!)

Clicking the "Send" button exports all scans of the current document.


Furthermore, you can save scanned images instantly by dragging them from the page toolbar to the desktop! For each selected image, a JPEG image is created “on the fly”.

Obviously, you can load the image files into memory with the "Open" button on the main toolbar (or with the corresponding command under the "File" menu).

Color, greyscale and black-and-white images are supported on an equal basis: Readiris allows you to open FlashPix images, GIF images, JPEG images, JPEG 2000 images, MacPaint images, Photoshop images, PICT images, PNG images, QuickDraw GX images, QuickTime images, Silicon Graphics images, Targa images, (uncompressed, packbits and Group 3 compressed) TIFF images, multipage TIFF images and Windows bitmaps (BMP). (Readiris also opens Adobe Acrobat PDF documents.)

This capability is particularly useful to convert your faxes into editable text files! If you have any influence over your correspondents, ask them to send faxes in “fine” quality - those faxes have the higher resolution of 200 dpi and yield better OCR results.

RECOGNIZING TABLES

So far, we’ve recognized texts and faxes and we’ve saved graphics. Let’s process a table now. Take a table of figures and scan it, or open the sample image Tables.jpg in the image folder.

Actually, the image Tables.jpg contains two tables, and that’s no coincidence! The page analysis zones them as table windows, and Readiris will reconstruct them for you, either by recreating the tables cell by cell in your spreadsheet or by inserting a table object in your word processor files.

Let’s explore the different solutions, starting with the “gridded” or “framed” table, which has borders around the cells.

Run the recognition with the layout option "Retain Word and Paragraph Formatting" or "Recreate Source Document" enabled, and the table is recreated. Open your word processor to have a look at the result: Readiris has recreated the cells and the borders one by one! (You could, of course, have included the text paragraphs in the text file as well.)

Let’s concentrate on the “ungridded” table for a moment: it has no borders around the cells. Note that the page analysis has nevertheless detected it. There’s another interesting aspect to this table: its content is purely numeric!

For optimal OCR accuracy on such tables, we can limit the recognition to numeric symbols with the "Language" button. (The numeric mode is not strictly numeric: besides the digits 0 to 9, it includes the symbols +, *, /, %, “,” (comma), “.” (dot), (, ), -, =, $, £, ¥ and •.)

Since this only works when the table doesn’t contain any alphabetic characters (otherwise the text portions won’t be recognized correctly), we activate the numeric mode only when we recognize this table, not the rest of the document.
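
The decision rule above (numeric mode only for zones without alphabetic characters) can be expressed as a small check. This is a minimal Python sketch, not a Readiris feature: the helper names `fits_numeric_mode` and `offending_characters` are hypothetical, the character set is copied from the list above, and treating spaces and line breaks as separators is an assumption.

```python
# Character set of the numeric mode, as listed in this manual.
NUMERIC_MODE_SYMBOLS = frozenset("0123456789+*/%,.()-=$£¥•")
# Treating whitespace as a cell/word separator is an assumption.
SEPARATORS = frozenset(" \t\n")


def offending_characters(text: str) -> set[str]:
    """Characters the numeric mode cannot produce, so they would be misread."""
    return {ch for ch in text if ch not in NUMERIC_MODE_SYMBOLS and ch not in SEPARATORS}


def fits_numeric_mode(text: str) -> bool:
    """True if the zone's expected content uses only numeric-mode symbols."""
    return not offending_characters(text)


if __name__ == "__main__":
    figures = "1.250,00  (3,5 %)  $ 12  £ 7 = ¥ 1300"
    headers = "Q1 Total"
    print(fits_numeric_mode(figures))              # True
    print(fits_numeric_mode(headers))              # False
    print(sorted(offending_characters(headers)))   # ['Q', 'T', 'a', 'l', 'o', 't']
```

A column header such as “Q1 Total” is exactly the kind of text portion that would be misread, which is why the numeric mode is applied to the figures-only table and not to the whole page.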

When we recognize only this table, by selecting it with the "Sort" button, we can send the OCR result directly to the spreadsheet Microsoft Excel. Select HTML as text format and Excel as target application with the "Format" button.

The spreadsheet starts up automatically and the typical table structure with rows and columns is recreated: you are immediately ready to process the data.
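
Why does HTML work as a transfer format for tables? An HTML table already encodes rows (`<tr>`) and cells (`<td>`), which a spreadsheet can map onto its own rows and columns. The following Python sketch illustrates that idea with a hypothetical helper, `rows_to_html_table`, and made-up cell values; it does not reproduce the HTML that Readiris actually writes.

```python
from html import escape


def rows_to_html_table(rows: list[list[str]]) -> str:
    """Serialize table cells as a minimal HTML table: one <tr> per row, one <td> per cell."""
    lines = ["<table>"]
    for row in rows:
        # escape() turns &, < and > (and quotes) into entities so cell text cannot break the markup.
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    recognized = [["Region", "Q1", "Q2"], ["North", "1.250", "1.310"], ["South", "980", "1.045"]]
    print(rows_to_html_table(recognized))
```

Because the row and column structure is explicit in the markup, the spreadsheet does not have to guess where one cell ends and the next begins, as it would with plain text separated by spaces.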

You may come across “ungridded” tables that the page analysis does not detect as table zones because the columns are too widely spaced: Readiris tries to avoid confusing them with columnized text blocks. To create a table window manually, click the table window tool in the image toolbar and proceed as usual. (The tooltip of the status bar can tell you how many table windows there are on the current page.)

RECOGNIZING HANDWRITTEN TEXT

We’ve recognized scanned documents, tables, faxes and snapshots taken with a digital camera; we’ve saved graphics and converted PDF documents. Readiris adds yet another reading capability: the recognition of handwritten texts.

Actually, we should say handprinted text, not handwritten text! Handwriting is used to describe continuous, “cursive” handwritten text (“longhand”). The symbols within a word or character string touch, so it is impossible to say where one symbol ends and the next starts. With handprinting, the “block letters” are separated, which makes it easier for the recognition software to isolate the individual characters.

It takes highly specialized software, “ICR” or “Intelligent Character Recognition” software, to recognize handprinted symbols. I.R.I.S.’ powerful ICR technology is based on more than one million writing samples! Readiris supports all natural writing styles, American or European: no imposed style is required.

Handprinting recognition is limited to the numerals (0-9), the uppercase letters (A-Z) and the punctuation symbols “,” (comma), “.” (dot) and “-” (hyphen).

Does that mean you can only take notes in English? No, you can scribble in French, German, Italian, Spanish etc. too, as long as you don’t write the accents and umlauts on the uppercase characters! For example, Readiris will not recognize “TÉLÉCOPIE À 4H”, “PÜNKTLICH IN ÖSTERREICH” or “PIÙ QUALITÀ”, but will recognize “TELECOPIE A 4H”, “PUENKTLICH IN OESTERREICH” and “PIU QUALITA”. Still, you cannot take notes in Greek, Russian etc.: only the Latin alphabet is supported!
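
The rewriting rules in the examples above (drop accents, write German umlauts as AE/OE/UE, stay within A-Z, 0-9 and “,” “.” “-”) can be sketched in Python. Readiris does not perform this conversion itself; it is what you do when writing. The helpers `to_handprint` and `is_handprint_legible` are hypothetical, and treating the space as a word separator is an assumption.

```python
import unicodedata

# Characters supported by handprinting recognition, as listed in this manual;
# treating the space as a word separator is an assumption.
HANDPRINT_SYMBOLS = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.- ")

# German convention used in the manual's example: Ä -> AE, Ö -> OE, Ü -> UE.
UMLAUTS = {"Ä": "AE", "Ö": "OE", "Ü": "UE"}


def to_handprint(text: str) -> str:
    """Rewrite text as it should be handprinted: uppercase, no accents, no umlauts."""
    # NFC first, so that Ü is a single character whatever form the input used.
    text = unicodedata.normalize("NFC", text.upper())
    text = "".join(UMLAUTS.get(ch, ch) for ch in text)
    # NFD splits É, À, Ù... into base letter + combining accent; drop the accents.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_handprint_legible(text: str) -> bool:
    """True if every character belongs to the supported handprinting set."""
    return all(ch in HANDPRINT_SYMBOLS for ch in text)


if __name__ == "__main__":
    for phrase in ["Télécopie à 4h", "Pünktlich in Österreich", "Più qualità", "Καλημέρα"]:
        converted = to_handprint(phrase)
        print(converted, is_handprint_legible(converted))
```

The three Latin-alphabet phrases come out as “TELECOPIE A 4H”, “PUENKTLICH IN OESTERREICH” and “PIU QUALITA”, all legible; the Greek word loses its accent but is still rejected, since only the Latin alphabet is supported.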

Where reading results are less than optimal, you can adapt your writing style and use I.R.I.S.’ optimized writing form. Consult the on-line help of Readiris to discover the writing rules. A few simple tips explain why substitutions occur and how to avoid them. The blank I.R.I.S. writing form serves as a full-page “template” that helps you fill in the block letters regularly spaced and at the right size! You can find the empty form on the Readiris CD-ROM for reprinting and editing.

So now we know how to take machine-legible handwritten notes during a meeting. How do we recognize these notes afterwards with Readiris? Draw a handprinting window around the handprinted text and execute the recognition. (You can give it a try with the sample image Handprinting.tif!)

The document characteristics (language, font type and character pitch) do not apply to handprinting. You’re limited to a basic English (or should we say “Latin”?) character set of uppercase block letters. Nor does interactive learning apply: learning does not make sense in an environment where everybody has a particular handwriting style. (As indicated, the ICR technology is based on more than one million writing samples.)

READING BARS AND SPACES

And Readiris reads bar codes too! Bar codes that appear in scanned images can be read and included as recognized data in the output documents.

Bar codes are composed of parallel bars and the spaces between them. Predefined combinations of bars and spaces represent specific characters. There are several bar code standards, or “symbologies”. All widespread bar code symbologies are supported.

Only laser-printed and inkjet-printed bar codes have sufficient quality. Exclude dot-matrix printed bar codes: they do not produce sufficient contrast and their resolution is usually limited to 60 dpi! Readiris recognizes well-contrasted bar codes best; black bars on a white background yield the best results. Most bar code types require a “quiet zone” (a blank margin) around the actual bar code. Bar codes don’t yield partial results: a missing start or stop character or an incorrect check digit always leads to a misread, a zero result!

Draw a bar code window around each bar code (the page analysis does not detect them) and execute the recognition. The bar codes are read and included in the text output. You can also Ctrl-click a bar code zone and select the command "Copy As Data": the bar code is read and sent to the clipboard. (The check characters of some bar code standards are verified but stripped from the reading results.) The sample image Barcode.tif illustrates how it works.
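
To illustrate what “verified but stripped” means, and why an incorrect check digit yields no result at all, here is a Python sketch for EAN-13, one widespread symbology. The manual does not say which symbologies Readiris strips, so EAN-13 is only an example, and `read_ean13` is a hypothetical helper that starts from the already-decoded digits.

```python
from typing import Optional


def _digits_only(s: str) -> bool:
    """True for a non-empty string of ASCII digits 0-9."""
    return s.isascii() and s.isdigit()


def ean13_check_digit(data: str) -> int:
    """Compute the EAN-13 check digit from the first 12 digits.

    Counting from the left, digits in odd positions weigh 1 and digits in even positions weigh 3.
    """
    if len(data) != 12 or not _digits_only(data):
        raise ValueError("expected exactly 12 digits")
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(data))
    return (10 - total % 10) % 10


def read_ean13(decoded: str) -> Optional[str]:
    """Verify the check digit, then strip it; return None (a misread) on any failure."""
    if len(decoded) != 13 or not _digits_only(decoded):
        return None
    payload, check = decoded[:12], int(decoded[12])
    if ean13_check_digit(payload) != check:
        return None  # no partial result: a wrong check digit rejects the whole code
    return payload


if __name__ == "__main__":
    print(read_ean13("4006381333931"))  # 400638133393 (check digit 1 verified and stripped)
    print(read_ean13("4006381333932"))  # None (check digit mismatch)
```

Returning `None` rather than the first twelve digits mirrors the all-or-nothing behavior described above: a code that fails verification produces no data.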

GETTING ON-LINE HELP

This concludes our overview of Readiris. Some last-minute information may not be included in this manual, so we recommend that you consult the on-line help system for additional information on Readiris.

Go to the "Help" menu to do so. The command "Readiris Help" allows you to navigate through the many help topics.

You can also find more information on Readiris in the “ReadMe” file and on the I.R.I.S. web site (www.irislink.com); the command "I.R.I.S. on the Internet" takes you directly to the I.R.I.S. home page.
