An E-team on statistical techniques for unsupervised segmentation and classification
E. Salerno
CNR – Istituto di Scienza e Tecnologie dell’Informazione
Pisa, Italy
Muscle Joint WP5-WP7 Focus Meeting, Rocquencourt, December 2005
Overview
• Unsupervised processing: Why?
• Statistical approach
• What we have done
• What we propose
• What we would like to share with partners
Unsupervised processing: why?
Unsupervised processing is often essential in important applications
Document image analysis
Show-through cancellation
OCR
Remote sensing
Thematization
Classification
Statistical approach
Problem setting
• A data model
• A source model
• A statistically significant data sample
• Learn the model (use statistics)
• Estimate the sources (inverse problem)
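One common way to make this problem setting concrete is a linear instantaneous mixing model. The slides do not state the model explicitly, so the form below, including the symbols x, A, s, n and W, is an assumption made for illustration only.

```latex
% Data model (assumed): M observed channels as a noisy linear mixture of N sources
\mathbf{x}(t) = A\,\mathbf{s}(t) + \mathbf{n}(t), \qquad t = 1, \dots, T
% Source model (assumed): mutually independent sources
p(\mathbf{s}) = \prod_{i=1}^{N} p_i(s_i)
% Learning the model: estimate A (or a separating matrix W) from the sample \{\mathbf{x}(t)\}
% Estimating the sources (inverse problem), noiseless square case:
\hat{\mathbf{s}}(t) = W\,\mathbf{x}(t), \qquad W \approx A^{-1}
```

Here t indexes pixels, x(t) collects the channel values at pixel t, and the statistically significant data sample is the set of all T pixels.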
Statistical approach
Methods
• Independent component analysis
• Dependent component analysis
• Bayesian approaches
Applications
• Multispectral data analysis
• Multisensor data analysis
• Multiview data analysis
What we have done in document image analysis
Original
Recovery of bleed-through
Color decorrelation
Attenuation of stains
What we have done in document image analysis
Color decorrelation
Data
Output 1
Output 2
Output 3
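A minimal sketch of color decorrelation, assuming it is carried out as a principal component transform of the color channels. The slides do not say which orthogonalization was used, and the function name decorrelate_channels and the synthetic test scan are hypothetical.

```python
import numpy as np

def decorrelate_channels(image):
    """Project an (H, W, C) image onto the eigenvectors of its channel covariance.

    Returns the decorrelated output channels (same shape as the input)
    and the orthogonal C x C transform that produced them.
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)
    centered = pixels - pixels.mean(axis=0)      # remove the mean of each channel
    cov = np.cov(centered, rowvar=False)         # C x C channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]            # strongest component first
    eigvecs = eigvecs[:, order]
    outputs = centered @ eigvecs                 # mutually uncorrelated channels
    return outputs.reshape(h, w, c), eigvecs

# Synthetic stand-in for a color scan: three random patterns mixed into R, G, B.
rng = np.random.default_rng(0)
patterns = rng.random((64, 64, 3))
mixing = np.array([[1.0, 0.6, 0.3],
                   [0.5, 1.0, 0.4],
                   [0.2, 0.5, 1.0]])             # hypothetical mixing matrix
scan = patterns @ mixing.T

outputs, transform = decorrelate_channels(scan)
output_cov = np.cov(outputs.reshape(-1, 3), rowvar=False)
print(np.round(output_cov, 6))                   # off-diagonal entries are close to zero
```

Each output channel can then be displayed as a separate grayscale image, in the manner of Output 1 to Output 3. Decorrelation removes only second-order dependence, so the outputs need not coincide with the individual document patterns.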
What we have done in document image analysis
Independent component analysis
Text extraction from ancient palimpsests
© The Owner of the Archimedes Palimpsest
Text separation from color document scans
Edge-preserving Bayesian approach
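The independent component analysis step named on this slide can be sketched as follows. This is a generic symmetric FastICA iteration with a tanh nonlinearity, written in plain NumPy; it is not the authors' software. The two synthetic signals stand in for flattened image channels, and all names are hypothetical.

```python
import numpy as np

def whiten(x):
    """Center (n_channels, n_samples) data and transform it to unit covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    whitener = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return whitener @ x, whitener

def sym_orth(w):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    eigvals, eigvecs = np.linalg.eigh(w @ w.T)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T @ w

def fastica(x, n_iter=200, tol=1e-8, seed=0):
    """Estimate independent components of x; returns (estimates, separating matrix)."""
    z, whitener = whiten(x)
    n, t = z.shape
    rng = np.random.default_rng(seed)
    w = sym_orth(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        y = w @ z
        g = np.tanh(y)                            # contrast nonlinearity
        g_prime = 1.0 - g ** 2                    # its derivative
        w_new = (g @ z.T) / t - np.diag(g_prime.mean(axis=1)) @ w
        w_new = sym_orth(w_new)
        # Converged when each row is unchanged up to sign.
        converged = np.max(np.abs(np.abs(np.diag(w_new @ w.T)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return w @ z, w @ whitener

# Two independent non-Gaussian signals, mixed by a hypothetical 2 x 2 matrix.
rng = np.random.default_rng(1)
sources = np.vstack([rng.uniform(-1.0, 1.0, 5000),
                     rng.laplace(0.0, 1.0, 5000)])
mixing = np.array([[1.0, 0.5],
                   [0.4, 1.0]])
observed = mixing @ sources

estimated, separating = fastica(observed)
# Absolute correlation between true and estimated sources.
match = np.abs(np.corrcoef(np.vstack([sources, estimated]))[:2, 2:])
print(np.round(match, 3))
```

ICA recovers the sources only up to ordering, sign and scale, which is why the check uses absolute correlations and not a direct comparison.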
What we have done in document image analysis
Main text pattern at convergence
Show-through outline at convergence
Main text outline at convergence
Show-through pattern at convergence
What we have done in document image analysis
Other document image processing applications
• Watermark extraction
• Joint deblurring and separation
• Color restoration
• Show-through cancellation/extraction from recto-verso grayscale scans
What we propose
E-team on statistical techniques for unsupervised segmentation and classification
We are looking for partners with similar interests to collaborate in
• Extensive experimentation with available procedures on multispectral document data
• Development of specific data models for color/multispectral or grayscale recto-verso document images
• Ad-hoc registration procedures for recto and verso pages
• Joint deblurring-segmentation
• Training (exploit MUSCLE fellowships)
What we would like to share with partners
• ICA software for text extraction
• Expertise in separation and deblurring procedures
• Graylevel recto-verso test database (Gerolamo Cardano’s Contradicentium Medicorum, 1663)
People at ISTI
• Anna Tonazzini
• Ercan Kuruoglu
• Emanuele Salerno
• MUSCLE Fellow(s)
• Research collaborators