IMPACT Final Conference - Apostolos Antonacopoulos
Case Study: Scanning Parameters
![Page 1: IMPACT Final Conference - Apostolos Antonacopoulos](https://reader036.vdocuments.net/reader036/viewer/2022062616/54916d21b479599d0e8b4878/html5/thumbnails/1.jpg)
The Effect of Scanning Parameters on OCR Results: A Case Study
Apostolos Antonacopoulos
PRImA Lab, The University of Salford, United Kingdom
www.primaresearch.org
![Page 2: IMPACT Final Conference - Apostolos Antonacopoulos](https://reader036.vdocuments.net/reader036/viewer/2022062616/54916d21b479599d0e8b4878/html5/thumbnails/2.jpg)
Outline
- Background
- Image selection
- Methods and procedures
- Experiments
  - Experiment 1: Colour vs. greyscale vs. bitonal
  - Experiment 2: Effects of resolution
  - Experiment 3: Comparison with NLNZ images
- Conclusions
![Page 3: IMPACT Final Conference - Apostolos Antonacopoulos](https://reader036.vdocuments.net/reader036/viewer/2022062616/54916d21b479599d0e8b4878/html5/thumbnails/3.jpg)
Background
- Cost of storage is a real issue for Content Holders
- Study by Tracy Powell and Gordon Paynter of the National Library of New Zealand (D-Lib Magazine, 2009) opened a number of questions

Aims:
- Examine the effects of colour in addition to greyscale and bitonal
- Examine the effects of producing bitonal images in different ways
- Examine the effects of different resolutions
- Study the results by image rather than by average
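The storage concern behind these aims can be made concrete with simple arithmetic. The sketch below estimates the raw, uncompressed size of a scan for the three colour depths discussed. The page dimensions and resolution in the usage note are illustrative assumptions, not values from the study, and real archives normally store compressed files, so these figures are upper bounds only.

```python
# Bits per pixel for the three image types compared in the study.
# 24-bit colour and 8-bit greyscale are common defaults, assumed here.
BITS_PER_PIXEL = {"colour": 24, "greyscale": 8, "bitonal": 1}


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw size in bytes of a scan of the given physical size (inches)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


def size_table(width_in, height_in, dpi):
    """Raw size per image type for one page at one resolution."""
    return {
        mode: uncompressed_bytes(width_in, height_in, dpi, bpp)
        for mode, bpp in BITS_PER_PIXEL.items()
    }
```

For a hypothetical 10 x 10 inch region at 300 dpi, `size_table(10, 10, 300)` gives 27,000,000 bytes for colour, 9,000,000 for greyscale and 1,125,000 for bitonal. Halving the resolution divides each figure by four.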
![Page 4: IMPACT Final Conference - Apostolos Antonacopoulos](https://reader036.vdocuments.net/reader036/viewer/2022062616/54916d21b479599d0e8b4878/html5/thumbnails/4.jpg)
Image Selection
- Qualitative selection
- Parts of newspaper articles (no layout issues)
- Variety of newspapers from the British Library collection
- Quality of overall page taken into account
- Regions of different quality selected from the same page
- Only text regions selected (no graphics present)
- No additional artefacts (e.g. warping) present
![Page 5: IMPACT Final Conference - Apostolos Antonacopoulos](https://reader036.vdocuments.net/reader036/viewer/2022062616/54916d21b479599d0e8b4878/html5/thumbnails/5.jpg)
Methods and Procedures
- Regions marked using Aletheia and extracted from the main image as separate PAGE files
- Text was keyed and represented in PAGE files
- Selected (“standard”) colour reduction and binarisation methods were applied
- ABBYY FineReader Engine 9 used for OCR
- IMPACT OCR evaluation tool used
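The slides do not name the colour reduction and binarisation methods that were applied. As one plausible example of a "standard" pair, the sketch below converts colour pixels to greyscale with the ITU-R BT.601 luma weights and then binarises with Otsu's global threshold. Both choices are assumptions made for illustration, and all function names are hypothetical.

```python
def to_grey(rgb_pixels):
    """Convert (r, g, b) tuples (0-255) to grey levels with BT.601 luma weights."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_pixels]


def otsu_threshold(grey_pixels):
    """Return the threshold t that maximises between-class variance.

    Pixels with value <= t form the dark class, the rest the light class.
    """
    hist = [0] * 256
    for g in grey_pixels:
        hist[g] += 1
    total = len(grey_pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of grey levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break     # light class empty, no further split possible
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def to_bitonal(grey_pixels, threshold):
    """Map grey levels to 0 (ink) or 1 (paper) using a global threshold."""
    return [0 if g <= threshold else 1 for g in grey_pixels]
```

A global threshold is the simplest option. Locally adaptive methods often behave differently on unevenly toned newspaper pages, which is one reason bitonal images produced in different ways can give different OCR results.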
![Page 6: IMPACT Final Conference - Apostolos Antonacopoulos](https://reader036.vdocuments.net/reader036/viewer/2022062616/54916d21b479599d0e8b4878/html5/thumbnails/6.jpg)
Experiment 1: Colour/Grey/Bitonal
![Page 7: IMPACT Final Conference - Apostolos Antonacopoulos](https://reader036.vdocuments.net/reader036/viewer/2022062616/54916d21b479599d0e8b4878/html5/thumbnails/7.jpg)
Accuracy Variation per Image
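The metrics computed by the IMPACT OCR evaluation tool are not described on the slides. A common character-level measure, assumed here for illustration only, is one minus the edit distance between the keyed ground truth and the OCR output, divided by the ground-truth length. The sketch reports it per image alongside the mean, so the spread hidden by the average stays visible. Region identifiers and strings are hypothetical placeholders.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth, ocr_text):
    """1 - errors / len(ground_truth), floored at 0."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    errors = edit_distance(ground_truth, ocr_text)
    return max(0.0, 1.0 - errors / len(ground_truth))


def per_image_report(pairs):
    """pairs maps a region id to (ground_truth, ocr_text).

    Returns (accuracy per region id, mean accuracy over all regions).
    """
    per_image = {rid: character_accuracy(gt, ocr) for rid, (gt, ocr) in pairs.items()}
    mean = sum(per_image.values()) / len(per_image)
    return per_image, mean
```

Reporting the per-image values next to the mean makes it easy to spot regions where one image type or binarisation method falls far below the others.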
![Page 8: IMPACT Final Conference - Apostolos Antonacopoulos](https://reader036.vdocuments.net/reader036/viewer/2022062616/54916d21b479599d0e8b4878/html5/thumbnails/8.jpg)
Bitonal: Best Algorithm Vs. Scanner
![Page 9: IMPACT Final Conference - Apostolos Antonacopoulos](https://reader036.vdocuments.net/reader036/viewer/2022062616/54916d21b479599d0e8b4878/html5/thumbnails/9.jpg)
Original with Large Bitonal Variation
BL9_r0
![Page 10: IMPACT Final Conference - Apostolos Antonacopoulos](https://reader036.vdocuments.net/reader036/viewer/2022062616/54916d21b479599d0e8b4878/html5/thumbnails/10.jpg)
Experiment 2: Effects of Resolution
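The slide text does not say which resolutions were compared or how the lower-resolution images were obtained. One way to simulate a lower scanning resolution from a high-resolution master, assumed here purely for illustration, is to average square blocks of pixels. A factor of 2 would, for example, turn a 400 dpi greyscale image into a 200 dpi one. A real rescan at lower resolution may differ from such a downsampled image.

```python
def downsample(grey_rows, factor):
    """Reduce resolution by an integer factor by averaging factor x factor blocks.

    grey_rows is a list of equal-length rows of grey levels (0-255).
    Rows and columns that do not fill a complete block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    height = len(grey_rows)
    width = len(grey_rows[0]) if height else 0
    out = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [
                grey_rows[y + dy][x + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(round(sum(block) / len(block)))
        out.append(row)
    return out
```

Running the same OCR and evaluation steps on the original and on each downsampled copy gives one accuracy value per image per resolution, which can then be compared image by image.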
![Page 11: IMPACT Final Conference - Apostolos Antonacopoulos](https://reader036.vdocuments.net/reader036/viewer/2022062616/54916d21b479599d0e8b4878/html5/thumbnails/11.jpg)
Experiment 3: Examine NLNZ Images
![Page 12: IMPACT Final Conference - Apostolos Antonacopoulos](https://reader036.vdocuments.net/reader036/viewer/2022062616/54916d21b479599d0e8b4878/html5/thumbnails/12.jpg)
Variations in Quality and Accuracy
Other bitonal algorithm better: NLNZ1_r1
Scanner bitonal better: NLNZ4_r0
![Page 13: IMPACT Final Conference - Apostolos Antonacopoulos](https://reader036.vdocuments.net/reader036/viewer/2022062616/54916d21b479599d0e8b4878/html5/thumbnails/13.jpg)
Conclusions
- Averages do not give an accurate picture. Different decisions should be taken for different document types
- Better quality images leave room for improvement (re-OCR), especially when accuracy is far from the high 90s (%)
- Current OCR systems are not taking advantage of extra quality?
- Higher quality (at least greyscale) is an investment
- Perhaps not so high resolution for “routine” material
- “Lossy” compression is a real option
  - Better to have a high quality image with an imperceptible “loss” than a perfect low quality image!
![Page 14: IMPACT Final Conference - Apostolos Antonacopoulos](https://reader036.vdocuments.net/reader036/viewer/2022062616/54916d21b479599d0e8b4878/html5/thumbnails/14.jpg)
Further Information
PRImA: http://www.primaresearch.org
IMPACT: http://www.impact-project.eu