An E-team on statistical techniques for unsupervised segmentation and classification E. Salerno CNR – Istituto di Scienza e Tecnologie dell’Informazione Pisa, Italy Muscle Joint WP5-WP7 Focus Meeting, Rocquencourt, December 2



An E-team on statistical techniques for unsupervised segmentation and classification

E. Salerno

CNR – Istituto di Scienza e Tecnologie dell’Informazione

Pisa, Italy

Muscle Joint WP5-WP7 Focus Meeting, Rocquencourt, December 2005


Overview

• Unsupervised processing: Why?

• Statistical approach

• What we have done

• What we propose

• What we would like to share with partners


Unsupervised processing: why?

Unsupervised processing is often essential in important applications

Document image analysis

Show-through cancellation

[Figure: sample document text, "Yo no quiero encarecerte el servicio que te hago en darte a conocer tan notable y tan honorado caballero; pero quiero que me agradezcas ..."]

OCR

Remote sensing

Thematization

Classification


Statistical approach

Problem setting

• A data model

• A source model

• A statistically significant data sample

• Learn the model (use statistics)

• Estimate the sources (inverse problem)
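One common way to make the problem setting above concrete is a linear instantaneous mixing model. The slides do not state the model explicitly, so the form below, including the symbols x, A, s, n and W, is an assumption for illustration only.

```latex
\begin{align}
  % Data model: each observation is a noisy linear mixture of the sources
  \mathbf{x}(t) &= A\,\mathbf{s}(t) + \mathbf{n}(t), \qquad t = 1,\dots,T \\
  % Source model: mutually independent sources (the ICA assumption)
  p(\mathbf{s}) &= \prod_{i=1}^{N} p_i(s_i) \\
  % Inverse problem: learn W from the data sample, then estimate the sources
  \hat{\mathbf{s}}(t) &= W\,\mathbf{x}(t), \qquad W \approx A^{-1}
\end{align}
```

Under this reading, "learning the model" means estimating A (or W) from the T samples alone, and dependent component analysis would replace the factorized source prior in the second line with one that allows correlations.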


Statistical approach

Methods

• Independent component analysis

• Dependent component analysis

• Bayesian approaches

Applications

• Multispectral data analysis

• Multisensor data analysis

• Multiview data analysis
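As an illustration of the first method listed above, here is a minimal NumPy sketch of independent component analysis using the symmetric FastICA fixed-point iteration with a tanh nonlinearity. It is not the ISTI software mentioned in these slides; the function names (`whiten`, `sym_decorrelate`, `fastica`) and the choice of FastICA are assumptions made for the example.

```python
import numpy as np

def whiten(x):
    """Center the rows of x (channels x samples) and give them unit covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    # Rows of the whitening matrix are eigenvectors scaled by 1/sqrt(eigenvalue).
    return (eigvec / np.sqrt(eigval)).T @ x

def sym_decorrelate(w):
    """Return (W W^T)^(-1/2) W, the closest orthogonal matrix to w."""
    u, _, vt = np.linalg.svd(w)
    return u @ vt

def fastica(x, n_iter=200, seed=0):
    """Estimate independent components from mixtures x (channels x samples).

    Hypothetical helper: symmetric FastICA with a tanh nonlinearity.
    Outputs are recovered only up to permutation, sign and scale.
    """
    z = whiten(x)
    n, m = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        y = w @ z
        g = np.tanh(y)
        g_prime = 1.0 - g ** 2
        # Fixed-point update: E[g(y) z^T] - diag(E[g'(y)]) W
        w = sym_decorrelate((g @ z.T) / m - np.diag(g_prime.mean(axis=1)) @ w)
    return w @ z

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sources = np.vstack([rng.uniform(-1, 1, 5000), rng.laplace(size=5000)])
    mixing = np.array([[1.0, 0.6], [0.4, 1.0]])  # assumed mixing matrix
    estimates = fastica(mixing @ sources)
    print(np.abs(np.corrcoef(np.vstack([sources, estimates]))[:2, 2:]).round(2))
```

For multispectral document data, each row of `x` would be one spectral channel flattened to a vector of pixels, and each output row would be reshaped back to an image.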


What we have done in document image analysis

Original

Recovery of bleed-through

Color decorrelation
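The slides do not say how the color decorrelation is computed. A minimal sketch, assuming a principal component analysis of the RGB channels (the function name `decorrelate_colors` is hypothetical), could look like this:

```python
import numpy as np

def decorrelate_colors(image):
    """Project the color channels of an H x W x C image onto their
    principal axes, so that the output channels are mutually uncorrelated.

    Assumption: PCA is used as the decorrelating transform; other
    orthogonalizing transforms are possible.
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1]  # strongest component first
    return (centered @ eigvec[:, order]).reshape(h, w, c)
```

In this kind of processing, one of the decorrelated channels may emphasize the main text while another emphasizes bleed-through or stains; which one does so depends on the document and has to be checked by inspection.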


What we have done in document image analysis

Color decorrelation

Attenuation of stains


What we have done in document image analysis

Independent component analysis

Text extraction from ancient palimpsests

[Figure panels: Data, Output 1, Output 2, Output 3]

© The Owner of the Archimedes Palimpsest


What we have done in document image analysis

Text separation from color document scans

Edge-preserving Bayesian approach

Main text pattern at convergence

Show-through outline at convergence

Main text outline at convergence

Show-through pattern at convergence


What we have done in document image analysis

Other document image processing applications

• Watermark extraction

• Joint deblurring and separation

• Color restoration

• Show-through cancellation/extraction from recto-verso grayscale scans
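For the recto-verso grayscale case in the last item, a simple way to picture the data model is a 2×2 linear mixture of the two page sides, with the verso scan mirrored so that it overlays the recto. The sketch below is a simplification under stated assumptions: the model is linear, the scans are already registered, and the show-through coefficients `a_rv` and `a_vr` are known. In the unsupervised setting these coefficients would have to be learned from the data, and the function name is hypothetical.

```python
import numpy as np

def remove_show_through(recto, verso, a_rv, a_vr):
    """Invert an assumed linear recto-verso mixing model.

    Assumed model, in the recto frame:
        recto         = s_recto + a_rv * s_verso
        verso_aligned = a_vr * s_recto + s_verso
    where verso_aligned is the verso scan flipped left-right.
    """
    verso_aligned = np.fliplr(verso)  # mirror the verso onto the recto frame
    mixing = np.array([[1.0, a_rv], [a_vr, 1.0]])
    observed = np.stack([recto.ravel(), verso_aligned.ravel()])
    sources = np.linalg.solve(mixing, observed)  # solve the 2x2 system per pixel
    clean_recto = sources[0].reshape(recto.shape)
    clean_verso = np.fliplr(sources[1].reshape(verso.shape))  # back to verso frame
    return clean_recto, clean_verso
```

The system is solvable whenever `a_rv * a_vr != 1`, which holds for any realistic attenuation, since show-through is weaker than the main text.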


What we propose

E-team on statistical techniques for unsupervised segmentation and classification

We are looking for partners with similar interests to collaborate in

• Extensive experimentation with available procedures on multispectral document data

• Development of specific data models for color/multispectral or grayscale recto-verso document images

• Ad-hoc registration procedures for recto and verso pages

• Joint deblurring-segmentation

• Training (exploit MUSCLE fellowships)
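The registration item in the list above is a proposal, not a described method. As a hedged starting point only, a pure integer translation between two same-sized scans can be estimated by phase correlation. The function name `estimate_shift` is hypothetical, and real recto-verso registration would likely also need to handle rotation, scaling and local deformations.

```python
import numpy as np

def estimate_shift(reference, moved):
    """Estimate the integer (dy, dx) translation such that
    moved is approximately np.roll(reference, (dy, dx), axis=(0, 1)).

    Assumption: the two images differ only by a circular translation.
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    cross = f_mov * np.conj(f_ref)
    cross /= np.abs(cross) + 1e-12  # keep only the phase
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = reference.shape
    # Map peaks in the upper half of each axis to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

For a recto-verso pair, the verso scan would first be mirrored with `np.fliplr` before estimating its shift against the recto.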


What we propose

What we would like to share with partners

• ICA software for text extraction

• Expertise in separation and deblurring procedures

• Graylevel recto-verso test database (Gerolamo Cardano’s Contradicentium Medicorum, 1663)


What we propose

People at ISTI

• Anna Tonazzini

• Ercan Kuruoglu

• Emanuele Salerno

• MUSCLE Fellow(s)

• Research collaborators
