
NIH WORKSHOP: INFORMATICS FOR DATA AND RESOURCE DISCOVERY IN ADDICTION RESEARCH

July 8, 2010

Case Study 5 (NEMO): Informatics tools to support theoretical and practical integration of human neuroscience data

Gwen Frishkoff, Ph.D.

Psychology & Neuroscience, Georgia State University

NeuroInformatics Center, University of Oregon

http://nemo.nic.uoregon.edu

Neuro–Informatics: Crossing the language divide

What the computer scientist says…

Should we write out the data to XML or RDF triples? And do you plan to use ontology rules to do complex reasoning or just use SQL to query the data?

What the neuroscientist hears…

Blah blah blah blah blah…data… blah blah blah? And blah blah blah…the data?

GOALS FOR THIS TUTORIAL

• What is an ontology & what’s it for?
  – Why bother? (Case Study: Classification of EEG/ERP data)
  – What are some “best practices” in ontology design & implementation?

• What is RDF & what’s it for?
  – How does RDF represent information?
  – How is it used to link data to ontologies?
  – How can ontology-based annotation be used to support classification of data?

Case Study 5 (NEMO): Neural ElectroMagnetic Ontologies

• The problem (pattern classification)
• The methods & tools: ontologies, RDF database
• Proof of concept (a worked example)

2-min Primer on EEG/ERP Methods

EEGs (“brainwaves,” or fluctuations in brain electrical potentials) are recorded by placing two or more electrodes on the scalp surface.

[Figure labels: 256-channel Geodesic Sensor Net; ~5,000 ms]

Event-related potentials (ERP)

ERPs (“event-related potentials”) are the result of averaging across multiple segments of EEG, time-locking to an event of interest.
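The averaging step can be sketched in a few lines of Python. This is a minimal illustration, not the NEMO pipeline: the function name `average_erp`, its parameters, and the toy single-channel recording are all assumptions made for the example.

```python
from statistics import fmean


def average_erp(eeg, event_samples, pre, post):
    """Average EEG segments time-locked to each event.

    eeg           -- list of voltage samples from one channel
    event_samples -- sample indices at which the event of interest occurred
    pre, post     -- number of samples to keep before and after each event
    """
    segments = []
    for onset in event_samples:
        start, stop = onset - pre, onset + post
        # Skip events too close to the edges of the recording.
        if start < 0 or stop > len(eeg):
            continue
        segments.append(eeg[start:stop])
    if not segments:
        raise ValueError("no complete segments to average")
    # Average across segments, sample by sample.
    return [fmean(column) for column in zip(*segments)]


# Toy recording (hypothetical values): a deflection of -4 follows each event,
# superimposed on alternating +1/-1 "noise" that cancels in the average.
eeg = [1, -1, 1, -5, 1, -1, 1, -1, -3, -1, 1, -1]
erp = average_erp(eeg, event_samples=[2, 7], pre=1, post=3)
print(erp)
```

Activity that is not time-locked to the event tends to cancel across segments, which is why the event-related deflection stands out in the average.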

ERP Patterns (“Components”)

ERP patterns are characterized by 3 dimensions:

1. TIME: peak latency, duration (WHEN in time)
2. SPACE: scalp “topography” (WHERE on scalp)
3. FUNCTION: sensitivity to experiment factors

Donchin & Duncan-Johnson, 1977

[Figure label: 120 ms]

Brain Electrophysiology (EEG/ERP): The promise (Biomarkers of addiction?)

• Tried and true method for noninvasive brain functional mapping
• Millisecond temporal resolution
• Direct measure of neuronal activity
• Portable and inexpensive
• Recent innovations give new windows into rich, multi-dimensional patterns
  – More spatial info (high-density EEG)
  – More temporal & spectral info (JTF, etc.)
  – Multimodal integration & joint recordings of EEG and fMRI
  – Specificity of different patterns beyond “reduction in P300” amplitude…

[Figure label: 1 sec]

Brain Electrophysiology (EEG/ERP): The challenge

• An embarrassment of riches
  – A wealth of data
  – A plethora of methods

• A lack of integration
  – How to compare patterns across studies, labs?
  – How to do valid meta-analyses in ERP research?

• A need for robust pattern classification
  – Bottom-up (data-driven) methods
  – Top-down (knowledge-driven) methods

An embarrassment of riches

[Figure labels: 410 ms; 450 ms; 330 ms; peak latency 410 ms]

A lack of standardization

Will the “real” N400 please step forward?

Hypothetical Database Query: Show me all the N400 patterns in data set X.

Putative “N400”-labeled patterns:

Parietal N400 ≠ Frontal N400 ≠ Parietal P600

A Need for Integration

Neural ElectroMagnetic Ontologies (NEMO)

The driving goal is to develop methods and tools to support cross-lab, cross-experiment integration of EEG and MEG data.

We bring three sets of methods & tools to bear on this goal:

• A set of formal (OWL) ontologies for representation of EEG/MEG and ERP/ERF data
• A suite of tools for ontology-based annotation and analysis of EEG and ERP data
• An RDF database that stores annotated data from our NEMO ERP consortium and supports ERP pattern classification via SPARQL queries

Case Study 5 (NEMO): Neural ElectroMagnetic Ontologies

• The challenge (EEG pattern classification)
• The methods & tools: ontologies, RDF database
• Proof of concept (a worked example)

What’s an ontology & what’s it for?

“Highly semantically structured”

What does this mean & what does it buy us?

Ontologies for high-level, explicit representation of domain knowledge

theoretical integration*

*NOTE: We can record pattern definitions from the literature in the ontology without committing to the truth of these records now and forever.

Science evolves… So do ontologies!!

Maryann: “Avoid ontology wars…”

Ontology design principles (based on OBO Foundry recommendations)

1. Factor the domain to generate modular (“orthogonal”) ontologies that can be reused and integrated for other projects
2. Reuse existing ontologies (esp. foundational concepts) to define basic (low-level) concepts
3. Validate definitions of high-level concepts using bottom-up (data-driven) as well as top-down (knowledge-driven) methods
4. Collaborate with a community of experts on the design and testing of ontology-based tools for data representation and analysis

Factoring the ERP domain

[Figure labels: 1 sec; TIME; SPACE]

FUNCTION: Modulation of pattern features (time, space, amplitude) under different experiment conditions

Overview: NEMO Ontologies

– NEMO core modules:
  • NEMO_spatial
  • NEMO_temporal
  • NEMO_functional
  • NEMO_ERP
  • NEMO_data

– NEMO backend:
  • NEMO_relations
  • NEMO_imports
  • NEMO_deprecated
  • NEMO_annotation_properties

ERP spatial subdomain


International 10-10 EEG Electrode Locations

ITT electrode location Fz (medial frontal)

Scalp surface “regions of interest”

[Figure labels: LEFT, MEDIAL, RIGHT; FRONTAL, TEMPORAL, PARIETAL, OCCIPITAL]

Reuse in dev’t of NEMO Spatial

• BFO (Basic Formal Ontology): “UPPER ONTOLOGY”
• FMA (Foundational Model of Anatomy): “MIDLEVEL ONTOLOGY”

ERP temporal subdomain


Early (“exogenous”) vs. Late (“endogenous”) ERP patterns

• EARLY: ~0-150 ms after event (e.g., stimulus onset)
• MID-LATENCY: ~151-500 ms after event (e.g., stimulus onset)
• LATE: 501 ms or more after event (e.g., stimulus onset)
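As a rough sketch, the three latency windows can be written as a small lookup function. The function name is hypothetical, and the cut-offs are treated as hard boundaries here even though the slide marks them as approximate (“~”).

```python
def latency_window(peak_latency_ms):
    """Map a peak latency (ms after the event) to one of the three windows."""
    if peak_latency_ms < 0:
        raise ValueError("latency must be measured after the event")
    if peak_latency_ms <= 150:
        return "EARLY"
    if peak_latency_ms <= 500:
        return "MID-LATENCY"
    return "LATE"


for latency in (120, 410, 600):
    print(latency, latency_window(latency))
```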

NEMO Temporal Ontology

Collaboration in dev’t of NEMO ERP


NEMO Functional Ontology

Angela Laird (BrainMap)

Jessica Turner (BIRN, now part of Neurolex)

CogPO

http://brainmap.org/scribe/index.html

“Cognitive ontologies”: Formalization of experiment metadata

CARMEN Project: Development of MINI (Frank Gibson & colleagues)

Reconstituting the ERP domain…


Frishkoff, Frank, et al., 2007

Validation through application of NEMO ontologies in modeling ERP data

Case Study 5 (NEMO): Neural ElectroMagnetic Ontologies

• The challenge (EEG pattern classification)
• The methods & tools: ontologies, RDF database
• Proof of concept (a worked example)

Ontologies for high-level, explicit representation of domain knowledge

theoretical integration

RDF to support principled mark-up of data for meta-analysis

practical integration

NEMO International Language & Literacy Consortium

Formed in 2007

• Tim Curran, University of Colorado
• Kerry Kilborn, University of Glasgow
• Dennis Molfese, University of Louisville
• Chuck Perfetti, University of Pittsburgh
• John Connolly, McMaster University

What is RDF and what is it for?

RDF graph (data model)

Annotating EEG/ERP data

Pattern Labels = Functional attributes + Temporal attributes + Spatial attributes

Robert M. Frank

Concepts coded in OWL NEMO ontology

Data coded in RDF NEMO database

HOW?

Annotating Data in RDF

• Data Annotation
  – The process of marking up or “tagging” data with meaningful symbols; tags may come from an ontology linked to a URI

• URI (Uniform Resource Identifier)
  – A compact sequence of characters that identifies an abstract or physical resource (typically located on the Web)

• RDF (Resource Description Framework)
  – RDF is a directed, labeled graph (data model) for representing information (typically on the Web)

*See Glossary (http://www.seiservices.com/nida/1014080/ReadingRoom.aspx)

Recall: The goal is to formulate pattern definitions, use them to classify data, and ultimately to revise them based on meta-analysis results

Observed Pattern = “N400” iff

• Event type is onset of meaningful stimulus (e.g., word) AND
• Peak latency is between 300 and 500 ms AND
• Scalp region of interest (ROI) is centroparietal AND
• Polarity over ROI is negative (<0)
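To make the four criteria concrete, here is a minimal Python sketch of the definition as a predicate. The `Observation` class, its field names, and the `MEANINGFUL_STIMULI` values are hypothetical, and polarity is assumed to be negative when the mean amplitude over the ROI is below zero.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One observed ERP pattern (field names are illustrative only)."""
    event_type: str           # e.g. "word_onset"
    peak_latency_ms: float    # time of maximal amplitude after the event
    roi: str                  # scalp region of interest
    mean_amplitude_uv: float  # mean amplitude over the ROI, in microvolts


# Hypothetical labels for "onset of a meaningful stimulus".
MEANINGFUL_STIMULI = {"word_onset", "picture_onset"}


def is_n400(obs):
    """Return True only if all four criteria of the definition hold."""
    return (
        obs.event_type in MEANINGFUL_STIMULI
        and 300 <= obs.peak_latency_ms <= 500
        and obs.roi == "centroparietal"
        and obs.mean_amplitude_uv < 0
    )


candidate = Observation("word_onset", 410.0, "centroparietal", -2.5)
print(is_n400(candidate))
```

Because the definition is a conjunction, failing any single criterion (for example, a frontal ROI) is enough to reject the “N400” label.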

The rule (just the temporal criterion) as it appears in Protégé

Protégé rendering

OWL/RDF rendering

Typical tabular representation of summary ERP data

Peak latency measurement

ERP observation (pattern extracted from “raw” ERP data)

The “RDF Triple”

In RDF form: <001> <type> <NEMO_0000093>

Subject – Predicate – Object

In natural language:

The data represented in row A is an instance of (“is a”) some ERP pattern.

That is, measurements (cells) are “about” ERP patterns (rows).

In graph form:

RDF Triple #2

In RDF form: <002> <type> <NEMO_0745000>

Subject – Predicate – Object

In natural language:

The data represented in cell Z (row A, column 1) is an instance of (“is a”) a peak latency temporal measurement (i.e., the time at which the pattern is of maximal amplitude).

RDF Triple #3

This graph represents an assertion, expressed in RDF: <002> <is_peak_latency_measurement_of> <001>

In natural language:

The data represented in cell Z is a temporal property of the ERP pattern represented in row A.
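The three triples can be held in a plain Python list to show how a triple store answers simple pattern-matching questions. This is only a sketch: the `match` helper is hypothetical, the identifiers are the shorthand from the slides rather than full URIs, and the third triple takes the measurement (002) as its subject, matching the natural-language reading.

```python
# The three annotation triples, as (subject, predicate, object).
TRIPLES = [
    ("001", "type", "NEMO_0000093"),  # row A is an ERP pattern
    ("002", "type", "NEMO_0745000"),  # cell Z is a peak latency measurement
    ("002", "is_peak_latency_measurement_of", "001"),  # cell Z describes row A
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Which measurements are "about" pattern 001?
measurements_of_001 = [
    s for s, _, _ in match(
        TRIPLES, predicate="is_peak_latency_measurement_of", obj="001"
    )
]
print(measurements_of_001)
```

Leaving a field as `None` plays the role of a variable in a graph query: it matches anything in that position.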

Recall: Pattern definition is encoded in the ontology (not in RDF data rep!)

This is the inference that we want to make

Pattern classification is the goal

Case Study 5 (NEMO): Neural ElectroMagnetic Ontologies

• The challenge (EEG pattern classification)
• The methods & tools: ontologies, RDF database
• Proof of concept (a worked example)

Formulating pattern rules in the ontology

First, we write the rule in semi-natural language:

IF (1) 001 type ERP_spatiotemporal_pattern
   and (2) 002 type peak_latency_measurement_datum
   and (3) 002 is_peak_latency_measurement_of 001
   and (4) 002 has_numeric_value X
   and (5) 500 >= X >= 300 (X has datatype decimal)

(in reality, there are spatial, temporal, & functional criteria…)

THEN (6) 001 type N400_pattern
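The same rule can be sketched as a small Python function over a list of triples, with each clause number marked in a comment. This is an illustration of the logic only, not how an OWL reasoner or the NEMO tools evaluate it; the function name and the sample data (including the second, non-matching pattern) are made up for the example.

```python
def classify_n400(triples):
    """Return the (pattern, 'type', 'N400_pattern') triples implied by the rule."""
    facts = set(triples)
    inferred = []
    for datum, predicate, pattern in triples:
        if predicate != "is_peak_latency_measurement_of":  # clause (3)
            continue
        if (pattern, "type", "ERP_spatiotemporal_pattern") not in facts:  # (1)
            continue
        if (datum, "type", "peak_latency_measurement_datum") not in facts:  # (2)
            continue
        for s, p, value in triples:
            # Clauses (4) and (5): the datum has a numeric value in [300, 500].
            if s == datum and p == "has_numeric_value" and 300 <= value <= 500:
                inferred.append((pattern, "type", "N400_pattern"))  # clause (6)
                break
    return inferred


# Hypothetical data: pattern 001 peaks at 410 ms, pattern 003 at 120 ms.
data = [
    ("001", "type", "ERP_spatiotemporal_pattern"),
    ("002", "type", "peak_latency_measurement_datum"),
    ("002", "is_peak_latency_measurement_of", "001"),
    ("002", "has_numeric_value", 410.0),
    ("003", "type", "ERP_spatiotemporal_pattern"),
    ("004", "type", "peak_latency_measurement_datum"),
    ("004", "is_peak_latency_measurement_of", "003"),
    ("004", "has_numeric_value", 120.0),
]
print(classify_n400(data))
```

Note that the input triples are never modified; the classification is returned as new triples, which keeps the recorded data separate from the inferences drawn from it.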

Translating the rule into OWL/RDF

Next, we convert the rule to a SPARQL query by replacing natural language terms with the corresponding URIs (tags) from the NEMO ontology:

• type → rdf:type
• ERP_spatiotemporal_pattern → NEMO_0000093
• peak_latency_measurement → NEMO_0745000
• is_measurement_of → NEMO_9278000
• has_numeric_value → NEMO_7943000
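One way to picture the substitution is to generate the query text from the term table. The sketch below is an assumption about what such a query could look like, not the query shown in the talk: `NEMO_NS` is a placeholder namespace (the real NEMO base URI is not given here), and the variable names and function name are illustrative. Only the `rdf:` namespace and the NEMO identifiers come from established sources.

```python
NEMO_NS = "http://example.org/nemo#"  # placeholder, NOT the real NEMO namespace

# Natural-language terms and their NEMO identifiers, as listed on the slide.
TERMS = {
    "ERP_spatiotemporal_pattern": "NEMO_0000093",
    "peak_latency_measurement": "NEMO_0745000",
    "is_measurement_of": "NEMO_9278000",
    "has_numeric_value": "NEMO_7943000",
}


def build_n400_query(low_ms=300, high_ms=500):
    """Build a SPARQL SELECT query for the temporal criterion of the rule."""
    t = TERMS
    return f"""PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX nemo: <{NEMO_NS}>
SELECT ?pattern ?latency
WHERE {{
  ?pattern rdf:type nemo:{t['ERP_spatiotemporal_pattern']} .
  ?datum   rdf:type nemo:{t['peak_latency_measurement']} .
  ?datum   nemo:{t['is_measurement_of']} ?pattern .
  ?datum   nemo:{t['has_numeric_value']} ?latency .
  FILTER (?latency >= {low_ms} && ?latency <= {high_ms})
}}"""


query = build_n400_query()
print(query)
```

Each line of the WHERE clause corresponds to one clause of the semi-natural-language rule, and the FILTER expresses the 300 to 500 ms latency window.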

Executing the query

Finally, we load Virtuoso’s SPARQL interface (http://nemo.nic.uoregon.edu:8890/sparql), paste the query into the Query text box, and click Run Query.

… And Virtuoso returns the following results (for example):

As a result, we can deduce that ERP observations 0002, 0003, 0004, 0006, and 0140 are N400 pattern instances… QED
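The same query could in principle be submitted programmatically rather than through the web form. The sketch below only builds the HTTP request and does not send it; it assumes the endpoint follows the standard SPARQL protocol (`query` parameter, JSON results via the `Accept` header) and makes no claim that the server is still reachable. The function name and the sample query are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://nemo.nic.uoregon.edu:8890/sparql"  # URL given on the slide


def make_sparql_request(query):
    """Build (but do not send) an HTTP GET request for a SPARQL SELECT query."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = make_sparql_request("SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
# To actually run it, pass `request` to urllib.request.urlopen and parse the
# response body with json.load; that step is left out because it needs
# network access to a live endpoint.
```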

Cycles of Knowledge discovery & Knowledge Engineering (i.e., Onto Dev’t)

Take-home message from CARMEN project:

“Raw data remains static; metadata evolves.”

(note this implies that the ontology also evolves!)

“Data integrity is preserved; the science has room to develop”

NEMO Database Design

Linking Shared Data & Resources (http://linkeddata.org/)

NIF, NEMO, CARMEN, HeadIT

NOW YOU SHOULD KNOW…

• What is an ontology & what’s it for?
  – Why bother?
  – What are some “best practices” in ontology design & implementation?

• What is RDF & what’s it for?
  – How does RDF represent information?
  – How is it used to link data to ontologies?
  – How can ontology-based annotation be used to support classification of data?

Acknowledgments

Funding from the National Institutes of Health (NIBIB), R01-MH084812 (Dou, Frishkoff, Malony)

NEMO Ontology Task Force
• Robert M. Frank (NIC)
• Dejing Dou (CIS)
• Paea LePendu (CIS)
• Haishan Liu (CIS)
• Allen Malony (NIC, CIS)
• Snezana Nikolic (PSY, GSU)

NEMO EEG/MEG Data Consortium
• Tim Curran (U. Colorado)
• Dennis Molfese (U. Louisville)
• John Connolly (McMaster U.)
• Kerry Kilborn (Glasgow U.)
• Charles Perfetti (U. Pittsburgh)

Special thanks to:
• Maryann Martone & associates (NIF)
• Jessica Turner (cogPO)
• Angela Laird (BrainMap)
• Scott Makeig & Jeff Grethe (EEGLAB/HeadIT)

www.nemo.nic.uoregon.edu