Community Data Annotation/Curation

Upload: randi

Post on 17-Jan-2016

TRANSCRIPT

Page 1: Community Data Annotation/Curation

Community Data Annotation/Curation

Page 2: Community Data Annotation/Curation

Community Annotation/Curation

Demo Project
• Open atlas
Individuals Populations (??)

Success criteria
• Acceptance and participation by anatomy community
• Portability of tools to other projects
• At least one “good” atlas

Project cycles
• Identify customers (anatomists) and customer’s customers (radiology, surgery, algorithm developers, educators)
• “Extreme” approach, “release early, release often”

Feasibility studies
• Pick two anatomical areas (thorax, brain)

Deliverables
• Infrastructure/process
• Distributed atlas

Integration needs
• Visualization
• Federated database
• Ontologies

Issues
• Intellectual property
• Business model

Page 3: Community Data Annotation/Curation

Open Atlas: Requirements

Open data and open process
Customer GUI application
Software Toolkit
Methods for curation
Mechanism for consensus building
Mechanisms for quality control
Continuous process feedback
Provenance
Soup to nuts software
• Reference implementation: visualization, editor, registration, model extraction, etc.
• Query application
Outreach to customer’s customer
Local and web based
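
The slide asks for a consensus mechanism without naming one. As one possible illustration, the sketch below uses simple majority voting among annotators; the function name `consensus_label` and the `min_agreement` threshold are assumptions for this example, not part of the proposal.

```python
from collections import Counter
from typing import List, Optional


def consensus_label(votes: List[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the label chosen by more than `min_agreement` of the annotators.

    Returns None when there are no votes or no label reaches the threshold,
    which signals that the region needs further review.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_agreement else None


# Three annotators label the same region; two of them agree.
agreed = consensus_label(["aorta", "aorta", "vena cava"])
# Two annotators disagree, so no label wins a strict majority.
disputed = consensus_label(["aorta", "vena cava"])
```

A stricter threshold (for example `min_agreement=0.75`) would send more regions back for review, trading throughput for label quality.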

Page 4: Community Data Annotation/Curation

Open Atlas: Components

User interface
Segmentation tools + manual correction
Interface to multiple ontologies
Revision control
Automated quality assurance
Dashboards
Packaging/delivery
Data repository
API for programmatic access to data/annotations/tools
Core team
• Anatomists/Radiologists (Domain expert)
• Database design
• Ontology support
• Image analysis
• Image/Geometry editor
• Process support tools

Page 5: Community Data Annotation/Curation

Starting Points

U Wash FMA

NLM Visible Human Thorax
• Original from EAI
• Enhanced by Virtual Soldier Project

Brigham and Women’s Brain Atlas/Slicer

Page 6: Community Data Annotation/Curation

Community Data Annotation/Curation

Page 7: Community Data Annotation/Curation

Background Slides:

Open, Distributed and Collaborative Data Annotation

Bill Lorensen
Insight Software Consortium

Page 8: Community Data Annotation/Curation

Motivation

Many imaging communities are data starved
• Algorithm developers
• End users

Lots of raw data, but very little annotated data
• LIDC
• Notre Dame Biometrics Data Distribution

Page 9: Community Data Annotation/Curation

Forms of Annotation

Anatomy labels
Contours
Statistical
Anatomical landmarks
Templates
Ground truth

Page 10: Community Data Annotation/Curation

Problem Statement

Sensors are producing large amounts of data
Annotation adds value
Annotation of large data collections is expensive and error prone

Page 11: Community Data Annotation/Curation

Customers

Algorithm developers
Anatomists
Teachers
Sensor manufacturers

Page 12: Community Data Annotation/Curation

Solution

A distributed, coordinated community can efficiently and economically annotate large sets of data

• Wikipedia
• Wikimapia

Extreme programming techniques can be applied to the data annotation process

Page 13: Community Data Annotation/Curation

Examples

Anatomical atlases
Face recognition
• 2D photos
• 3D range data

Page 14: Community Data Annotation/Curation

Example – FBI Facial Reconstruction

Two data collections
• 300 CT datasets of heads
• 1000 photo and range data of faces

Challenge
• Extract models of eyes, noses and mouths from range data
• Replace eyes, noses and mouths in CT data with range data models

Page 15: Community Data Annotation/Curation

Face Template

[Figure: face template with numbered landmark points, shown in two panels: Photo and Range Data]

Page 16: Community Data Annotation/Curation

Mouth

Page 17: Community Data Annotation/Curation

Multidisciplinary Project

Image Analysis
Anatomy
Databases
Ontologies
Software Engineering
Quality Assurance
Visualization

Page 18: Community Data Annotation/Curation

Menu for Success

A community with a common vision
A pool of talented and motivated developers/scientists
A mix of academic and commercial
An organized, lightweight approach to product development
A leadership structure
Communication
A business model

Adopted from “Open Source Menu for Success”

Page 19: Community Data Annotation/Curation

Leadership Structure

Follow NCBC model
Algorithms
• Ontology creation
• Image analysis

Engineering
Driving Projects
• Open Atlas
• Radiology ground truth

Page 20: Community Data Annotation/Curation

Business Model

All core technology is open, without restriction
All NLM-supported annotation is open, without restriction
Proprietary enhancement of annotated data is allowed
Annotated data can be used in commercial products without restriction

Page 21: Community Data Annotation/Curation

Guiding Principles

Page 22: Community Data Annotation/Curation

Extreme Data Annotation

The community owns the data

Although the origin of the data is retained, others are free to correct defects and enhance each other's data

In the end, all of the data should appear as though one person annotated it
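
One way to picture "origin is retained, others are free to correct" is an annotation record that never overwrites its original contributor but appends every correction to a history. This is a minimal sketch; the `Annotation` class, its fields, and the contributor names are hypothetical and not taken from the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Annotation:
    """A label on a dataset region, with its full revision history."""
    structure: str   # current label, e.g. an anatomy term
    origin: str      # original contributor; never overwritten
    history: List[Tuple[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Record the initial contribution as the first history entry.
        if not self.history:
            self.history.append((self.origin, self.structure))

    def correct(self, editor: str, new_structure: str) -> None:
        """Let any community member fix the label; the origin stays intact."""
        self.structure = new_structure
        self.history.append((editor, new_structure))


ann = Annotation(structure="left lung", origin="alice")
ann.correct(editor="bob", new_structure="left lung, upper lobe")
```

After the correction, `ann.origin` is still the first contributor, while `ann.history` lists both the original label and the later edit.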

Page 23: Community Data Annotation/Curation

Extreme Data Annotation

Release early, release often

Although people are tempted to keep their data under wraps until it is perfect, the process encourages them to release their data as soon as it passes some minimum quality control tests

The longer the data is visible to the community, the better integrated it will be

Page 24: Community Data Annotation/Curation

Extreme Data Annotation

Continuous integration

There is no scheduled porting to databases or model formats

All new data is integrated into supported databases and data formats continuously

Page 25: Community Data Annotation/Curation

Extreme Data Annotation

Everyone agrees to keep the data free of defects

Although everyone is encouraged to submit their data early, the data must pass quality tests and integration tests nightly

A continuous QA process sends e-mails to people who check in data that does not meet quality control tests

More effectively, the community enforces the commitment through peer pressure
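
The nightly QA loop described on this slide could look roughly like the sketch below: run every quality test on every checked-in dataset and collect the failure notices per contributor. The `Submission` type, the two example tests, and the e-mail addresses are all assumptions for illustration; the slides do not specify the actual tests, and the sketch only builds the notices rather than sending mail.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class Submission(NamedTuple):
    contributor: str         # e-mail of the person who checked in the data
    dataset_id: str
    labels: Dict[str, int]   # structure name -> labeled voxel count (hypothetical)


# Hypothetical quality tests: each returns an error message, or None on success.
def has_labels(sub: Submission) -> Optional[str]:
    return None if sub.labels else "no structures labeled"


def no_empty_structures(sub: Submission) -> Optional[str]:
    empty = sorted(name for name, count in sub.labels.items() if count == 0)
    return f"empty structures: {', '.join(empty)}" if empty else None


QUALITY_TESTS: List[Callable[[Submission], Optional[str]]] = [
    has_labels,
    no_empty_structures,
]


def nightly_qa(submissions: List[Submission]) -> Dict[str, List[str]]:
    """Run every quality test on every submission.

    Returns a mapping from contributor to the failure messages a mailer
    would send them; contributors with clean data are omitted.
    """
    notices: Dict[str, List[str]] = {}
    for sub in submissions:
        for test in QUALITY_TESTS:
            message = test(sub)
            if message is not None:
                notices.setdefault(sub.contributor, []).append(
                    f"{sub.dataset_id}: {message}"
                )
    return notices


notices = nightly_qa([
    Submission("alice@example.org", "thorax-001", {"heart": 5200, "aorta": 310}),
    Submission("bob@example.org", "brain-007", {"thalamus": 0}),
])
```

Publishing the same `notices` on a shared dashboard, rather than only mailing them, would be one way to support the peer-pressure effect the slide mentions.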

Page 26: Community Data Annotation/Curation

Software/Data Analogies

Software            Data
Program             Annotated data
Text editor         Image editor
Compilation error   Collisions
Compilation         Model creation
Style               Ontology
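
In this analogy a collision plays the role of a compilation error: it is a defect that can be detected automatically before the data is accepted. As a rough sketch, assuming each anatomical model is represented as a set of voxel coordinates (a representation chosen here for simplicity, not stated in the slides), a collision check reports every pair of structures that claim the same voxel.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]


def find_collisions(models: Dict[str, Set[Voxel]]) -> List[Tuple[str, str, int]]:
    """Report every pair of structures that claim the same voxel.

    Each result is (structure_a, structure_b, number_of_shared_voxels),
    with pairs visited in alphabetical order of structure name.
    """
    collisions = []
    for name_a, name_b in combinations(sorted(models), 2):
        shared = models[name_a] & models[name_b]
        if shared:
            collisions.append((name_a, name_b, len(shared)))
    return collisions


# Hypothetical toy models: heart and aorta overlap in one voxel.
models = {
    "heart": {(0, 0, 0), (0, 0, 1)},
    "aorta": {(0, 0, 1), (0, 0, 2)},
    "trachea": {(5, 5, 5)},
}
collisions = find_collisions(models)
```

An empty result corresponds to a clean compile; any entry is a defect to fix before the data is integrated.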

Page 27: Community Data Annotation/Curation

Why NLM?

NLM produces, collects, annotates, stores and distributes data
• Medline
• Visible Human Project
• Mayo Data Collection

NLM has managed distributed, collaborative, multidisciplinary projects
• Insight Toolkit
• HPCC Internet 2

Page 28: Community Data Annotation/Curation

What is needed?

Select a pilot project
• Open Atlas Project

Select customers
Select core team
• Anatomists
• Database design
• Ontology support
• Image analysis
• Image/Geometry editor
• Process support tools

Page 29: Community Data Annotation/Curation

Open Atlas Project

Create anatomical atlases from cross-sectional image data
Semi-automatic and manual labeling of structures
Engage the anatomy community