Monthly Program Update, April 12, 2012. Andrew J. Buckler, MS, Principal Investigator. With funding support provided by the National Institute of Standards and Technology.


Page 1: Monthly Program Update April 12, 2012 Andrew J. Buckler, MS Principal Investigator

Monthly Program Update
April 12, 2012

Andrew J. Buckler, MS
Principal Investigator

With funding support provided by the National Institute of Standards and Technology

Page 2

Agenda

• Working discussion on data curation, using facilities of Iterate for storage and provenance documentation model.

• Updates on:
  – Metrology Workshop results.
  – QIBA 3A Test bed progress.

Page 3

Part of our discussion on data curation and processing workflow from last month…

// Business Requirements

FNIH, QIBA, and C-Path participants don’t have a way to provide precise specification for context for use and applicable assay methods (to allow semantic labeling):
BiomarkerDB = Specify (biomarker domain expertise, ontology for labeling);

Researchers and consortia don’t have an ability to exploit existing data resources with high precision and recall:
ReferenceDataSet+ = Formulate (BiomarkerDB, {DataService} );

Technology developers and contract research organizations don’t have a way to do large-scale quantitative runs:
ReferenceDataSet.CollectedValue+ = Execute (ReferenceDataSet.RawData);

The community lacks a way to apply definitive statistical analyses of annotation and image markup over specified context for use:
BiomarkerDB.SummaryStatistic+ = Analyze ( { ReferenceDataSet.CollectedValue } );

Industry lacks standardized ways to report and submit data electronically:
efiling transactions+ = Package (BiomarkerDB, {ReferenceDataSet} );
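The five statements on this slide read as a pipeline: Specify, Formulate, Execute, Analyze, Package. The sketch below is one possible reading of that pipeline in Python, not QI-Bench code. The class fields, the dictionary of data services, the `measure` callable, and the use of a mean as the summary statistic are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class BiomarkerDB:
    expertise: str                 # biomarker domain expertise
    ontology: str                  # ontology used for labeling
    summary_statistics: list = field(default_factory=list)


@dataclass
class ReferenceDataSet:
    name: str
    raw_data: list
    collected_values: list = field(default_factory=list)


def specify(expertise, ontology):
    """BiomarkerDB = Specify(...)"""
    return BiomarkerDB(expertise, ontology)


def formulate(db, data_services):
    """ReferenceDataSet+ = Formulate(BiomarkerDB, {DataService}).

    data_services is a hypothetical mapping of service name to raw items.
    """
    return [ReferenceDataSet(name, list(items)) for name, items in data_services.items()]


def execute(dataset, measure):
    """ReferenceDataSet.CollectedValue+ = Execute(ReferenceDataSet.RawData)."""
    dataset.collected_values.extend(measure(x) for x in dataset.raw_data)


def analyze(db, datasets):
    """BiomarkerDB.SummaryStatistic+ = Analyze({ReferenceDataSet.CollectedValue})."""
    values = [v for ds in datasets for v in ds.collected_values]
    db.summary_statistics.append(mean(values))  # mean is a placeholder statistic


def package(db, datasets):
    """efiling transactions+ = Package(BiomarkerDB, {ReferenceDataSet})."""
    return {
        "ontology": db.ontology,
        "summary_statistics": list(db.summary_statistics),
        "datasets": [ds.name for ds in datasets],
    }


# Usage with placeholder values
db = specify("example biomarker expertise", "example-ontology")
datasets = formulate(db, {"service_a": [1.0, 2.0, 3.0]})
execute(datasets[0], lambda x: x * 2)
analyze(db, datasets)
filing = package(db, datasets)
```

The `+=` in the slide notation is modeled here as appending to a list, so repeated runs accumulate values rather than overwrite them.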

Page 4

…and the associated storage model…

Subject               Predicate  Object
ClinicalUtility       is         Investigation URI
ClinicalValidity      is         Investigation URI
TechnicalPerformance  is         Investigation URI
Investigation         has        SummaryStatisticType
Investigation         has        Study URI
Study                 has        DescriptiveStatisticType
Study                 has        Protocol URI
Study                 has        Assay URI
Assay                 has        RawData URI
Assay                 has        AnnotationData URI
AIM file              is         AnnotationData URI
Mesh                  is         AnnotationData URI

Reference Data Set Manager: Heavyweight Storage with URIs
(using “Share” and “Duplicate” functions of RDSM to leverage cases across investigations)

Knowledgebase: Lightweight Storage linking to URIs
(self-generating knowledgebase from RDSM hierarchy and ISA-TAB description files)
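The subject/predicate/object rows above can be held as plain triples and queried directly. The following is a minimal in-memory sketch, assuming the " URI" suffix only indicates that the object is stored by reference, so it is dropped from the names. The helper functions are hypothetical and are not part of RDSM or the knowledgebase.

```python
from collections import deque

# Triples copied from the storage-model table (" URI" suffix dropped).
TRIPLES = [
    ("ClinicalUtility", "is", "Investigation"),
    ("ClinicalValidity", "is", "Investigation"),
    ("TechnicalPerformance", "is", "Investigation"),
    ("Investigation", "has", "SummaryStatisticType"),
    ("Investigation", "has", "Study"),
    ("Study", "has", "DescriptiveStatisticType"),
    ("Study", "has", "Protocol"),
    ("Study", "has", "Assay"),
    ("Assay", "has", "RawData"),
    ("Assay", "has", "AnnotationData"),
    ("AIM file", "is", "AnnotationData"),
    ("Mesh", "is", "AnnotationData"),
]


def objects(subject, predicate):
    """Objects linked from a subject by a predicate, in table order."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """Subjects linked to an object by a predicate, in table order."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def reachable(start):
    """Everything reachable from start by following 'has' links (breadth-first)."""
    seen, queue = [], deque([start])
    while queue:
        for child in objects(queue.popleft(), "has"):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


contents = reachable("Investigation")
annotation_kinds = subjects("is", "AnnotationData")
```

Walking the "has" links from Investigation down to RawData and AnnotationData mirrors the hierarchy the slide describes, with the knowledgebase holding only the links and the heavyweight store holding the data behind each URI.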

Page 5

…leading us to: Principles of Provenance

Central to the scientific method is the idea of replicating prior experiments such that they are transparent and verifiable.

We need to keep track of
• the origin of data
• transformation methods applied to the data
  – not just which programs
  – version information is critical
  – copies of actual programs used (git).
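The items listed above (origin, transformation, program, version, and the exact program copy) map naturally onto a small record type. This is a sketch under stated assumptions rather than the Iterate or Taverna data model: the field names, the use of SHA-256 content hashes, and passing the git commit in as a plain string are all choices made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    origin: str            # where the input data came from
    program: str           # which program transformed it
    program_version: str   # version information is critical
    program_commit: str    # identifies the exact program copy (e.g. a git commit)
    input_sha256: str      # fingerprint of the data before the transformation
    output_sha256: str     # fingerprint of the data after the transformation


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def run_with_provenance(origin, data, transform, program, version, commit):
    """Apply transform to data and return (result, provenance record)."""
    result = transform(data)
    record = ProvenanceRecord(
        origin=origin,
        program=program,
        program_version=version,
        program_commit=commit,
        input_sha256=sha256_of(data),
        output_sha256=sha256_of(result),
    )
    return result, record


# Usage with placeholder values; the origin string and commit id are made up.
result, record = run_with_provenance(
    origin="placeholder://source/dataset",
    data=b"raw pixels",
    transform=bytes.upper,
    program="example-transform",
    version="1.0.0",
    commit="0000000",
)
```

Hashing both input and output lets a later reader confirm that a stored file is the one the record describes, which is what makes a replication verifiable.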

Page 6

• Taverna keeps provenance data in a database on the machine from which the workflow is initiated

• We need to expose provenance for external users of QI-Bench
  – example: provenance of the data in an exported ISA-TAB

Page 7

Provenance architecture of Iterate

Page 8

Taverna allows access to the provenance data via a Java API.

• We have not explored this area of Taverna yet.
• Taverna’s documentation indicates this is an area under active development.

Page 9

Iterate Demonstration
• Obtaining a list of communities to which a user belongs
• Nesting a workflow
• Listing the items in a folder
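The idea of nesting one workflow inside another can be illustrated without reference to Taverna or the Iterate server. In the sketch below the `Workflow` class, the in-memory `MEMBERSHIPS` and `FOLDERS` tables, and every name in them are hypothetical stand-ins; none of this reflects the actual Iterate API.

```python
class Workflow:
    """A named sequence of steps; a step is a callable or another Workflow."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps

    def run(self, value):
        for step in self.steps:
            # A nested workflow is run exactly like a single step.
            value = step.run(value) if isinstance(step, Workflow) else step(value)
        return value


# Hypothetical stand-ins for data that would come from the server.
MEMBERSHIPS = {"alice": ["community-b", "community-a"]}
FOLDERS = {"folder-1": ["item-1", "item-2"]}


def lookup_communities(user):
    return MEMBERSHIPS.get(user, [])


def list_folder_items(folder):
    return FOLDERS.get(folder, [])


# Inner workflow: list the communities a user belongs to, sorted by name.
list_communities = Workflow("list communities", [lookup_communities, sorted])

# Outer workflow: reuses the inner one as a single step, then formats the result.
report = Workflow("membership report", [list_communities, ", ".join])

summary = report.run("alice")
items = list_folder_items("folder-1")
```

Treating a workflow as just another step is what allows the community-listing workflow to be reused unchanged inside a larger one.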

Page 10

Workflow to list community memberships in Iterate

Page 11

Workflow to list community memberships in Iterate using a nested workflow

Page 12

Provenance application in QI-Bench Demonstrators: Investigation and Studies level (ISA-TAB compliant)

Page 13

Provenance application in QI-Bench Demonstrators: Assay and Data levels (not ISA-TAB compliant yet)

Page 14

Application
• Provenance of
  – Demonstrator40 data [input for analysis]
  – Demonstrator40 Output [obviously the output]

Page 15

Application
• So we can answer
  – What is Demonstrator40_download.zip?
  – How did we get the Demonstrator40 data?
  – What was the original dataset and where did it come from?
  – What transformation on the original dataset created the Demonstrator40 data folder?
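Questions of this kind amount to walking a chain of "derived from" links back to a root. The sketch below shows that walk over a small lookup table. The slides do not say what the original dataset or the transformations actually were, so those entries are explicit placeholders, and the assumption that the zip file was produced from the Demonstrator40 data folder is also illustrative.

```python
# item -> (parent it was derived from, how it was derived)
# Every parent and transformation here is a placeholder, not a fact from the slides.
DERIVED_FROM = {
    "Demonstrator40_download.zip": ("Demonstrator40 data", "packaging step (placeholder)"),
    "Demonstrator40 data": ("original dataset (placeholder)", "transformation (placeholder)"),
}


def lineage(item):
    """Return (chain, root): the derivation steps from item back to its root."""
    chain, seen = [], set()
    while item in DERIVED_FROM and item not in seen:
        seen.add(item)  # guard against accidental cycles in the table
        parent, how = DERIVED_FROM[item]
        chain.append((item, how, parent))
        item = parent
    return chain, item


chain, root = lineage("Demonstrator40_download.zip")
for child, how, parent in chain:
    print(f"{child} <- {how} <- {parent}")
```

Each tuple in `chain` answers one of the questions above: what an item is derived from and which transformation produced it, with `root` naming the original dataset.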

Page 16

Update: Metrology Workshop results

Page 17

Update: QIBA 3A Test bed progress

Page 18

Page 19

Value proposition of QI-Bench
• Efficiently collect and exploit evidence establishing standards for optimized quantitative imaging:
  – Users want confidence in the read-outs
  – Pharma wants to use them as endpoints
  – Device/SW companies want to market products that produce them without huge costs
  – Public wants to trust the decisions that they contribute to
• By providing a verification framework to develop precompetitive specifications and support test harnesses to curate and utilize reference data
• Doing so as an accessible and open resource facilitates collaboration among diverse stakeholders

Page 20

Summary: QI-Bench Contributions
• We make it practical to increase the magnitude of data for increased statistical significance.
• We provide practical means to grapple with massive data sets.
• We address the problem of efficient use of resources to assess limits of generalizability.
• We make formal specification accessible to diverse groups of experts that are not skilled or interested in knowledge engineering.
• We map both medical as well as technical domain expertise into representations well suited to emerging capabilities of the semantic web.
• We enable a mechanism to assess compliance with standards or requirements within specific contexts for use.
• We take a “toolbox” approach to statistical analysis.
• We provide the capability in a manner which is accessible to varying levels of collaborative models, from individual companies or institutions to larger consortia or public-private partnerships to fully open public access.

Page 21

QI-Bench Structure / Acknowledgements
• Prime: BBMSC (Andrew Buckler, Gary Wernsing, Mike Sperling, Matt Ouellette)
• Co-Investigators
  – Kitware (Rick Avila, Patrick Reynolds, Julien Jomier, Mike Grauer)
  – Stanford (David Paik)
• Financial support as well as technical content: NIST (Mary Brady, Alden Dima, John Lu)
• Collaborators / Colleagues / Idea Contributors
  – Georgetown (Baris Suzek)
  – FDA (Nick Petrick, Marios Gavrielides)
  – UMD (Eliot Siegel, Joe Chen, Ganesh Saiprasad, Yelena Yesha)
  – Northwestern (Pat Mongkolwat)
  – UCLA (Grace Kim)
  – VUmc (Otto Hoekstra)
• Industry
  – Pharma: Novartis (Stefan Baumann), Merck (Richard Baumgartner)
  – Device/Software: Definiens, Median, Intio, GE, Siemens, Mevis, Claron Technologies, …
• Coordinating Programs
  – RSNA QIBA (e.g., Dan Sullivan, Binsheng Zhao)
  – Under consideration: CTMM TraIT (Andre Dekker, Jeroen Belien)
