
Page 1: Marco Brandizi Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report Feb 2005

Marco Brandizi

PhD Course in Computer Science (Corso di Dott. in Informatica), Univ. Milano Bicocca

XIX Cycle (XIX Ciclo)

Progress Report

Feb 2005

Page 2:

Agenda

Microarrays and Gene Expression overview

A Knowledge Management System for uA (microarray) data management

Motivations

What to model and where to start from

First elaborations

Ongoing work and future

Page 3:

Gene Expression and Microarrays

Page 4:

DNA

gene

mRNA

protein

Genes Machine

Cell/Life

Page 5:

Microarray Data / Details

Page 6:

Microarray Data

Page 7:

Microarrays Data Mgmt Issues

Exp. data vs. seq. data:

Context dependent (living system, exp. conditions)

Lack of a standard unit of measure

Several normalization methods

Multiple platforms and methods

No standard for data annotation

Vocabularies and terminology coherence

Details about: experiment, source, protocols, exp. conditions

Page 8:

Microarrays Data Mgmt Issues / 2

Evidence about data quality

What to store?

Raw images

Computed values

Normalized values

How to find data

Systems aware of complex vocabularies (ontologies)

Data mining and exp. comparison tools

Data access control

Page 9:

Issues => MIAME/MAGE

Page 10:

MIAME Experiment Modelling

Page 11:

GCA DB

Page 12:

GCA DB

Page 13:

GCA DB

Page 14:

Need for a KMS for uA data management

Page 15:

The uA Experiment Cycle

Page 16:

“Closing the loop”

Page 17:

“Closing the loop”

Page 18:

uA KMS: What to model?

Page 19:

Knowledge management... what?

Genes

Textual annotations, literature

Interactions, pathways.

Genes collections (functional families, clusters)

Experiment and Experimental Conditions

Keyword/ontology based searches

Tested conditions searches

Expression Values

Navigation

Same transcriptome/trend/correlation/pattern

Page 20:

Knowledge management... what?

Chips

Keyword searches

Annotations about chip quality, protocols to be used, etc.

People

“Is expert in ...”

“Works with ...”

“Is studying ...”

Their ranking is X (based on publications, user preferences, etc.)

Page 21:

Knowledge management... what?

“Does IL-2 regulate something and under what conditions?”

Interactions of gene: IL2

Note added by Norman on Dec 10, 2003, deduced from text, not confirmed [see original text]:

IL2 --> UPREGULATES --> IL10, IL12
  ON --> Cells --> Type: DC
  UNDER CONDITION: LPS Stimulus
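The note above could be kept as a structured record instead of free text, so that the question "does IL-2 regulate something, and under what conditions?" becomes a simple lookup. A minimal sketch in Python; the class name `InteractionNote`, its field names and the helper `regulated_by` are assumptions made for illustration and do not come from the slides:

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class InteractionNote:
    """One user-supplied statement about a gene interaction (hypothetical schema)."""
    source_gene: str
    relation: str                  # e.g. "UPREGULATES"
    target_genes: Tuple[str, ...]
    cell_type: str                 # the "ON --> Cells --> Type" part of the note
    condition: str                 # the "UNDER CONDITION" part of the note
    author: str
    added_on: date
    evidence: str                  # how the statement was obtained
    confirmed: bool = False


note = InteractionNote(
    source_gene="IL2",
    relation="UPREGULATES",
    target_genes=("IL10", "IL12"),
    cell_type="DC",
    condition="LPS Stimulus",
    author="Norman",
    added_on=date(2003, 12, 10),
    evidence="deduced from text",
)


def regulated_by(notes, gene):
    """List (relation, targets, condition) for every note whose source is `gene`."""
    return [(n.relation, n.target_genes, n.condition)
            for n in notes if n.source_gene == gene]
```

Calling `regulated_by([note], "IL2")` returns the relation, the target genes and the condition recorded in the note, while `note.confirmed` keeps track of the "not confirmed" status.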

Page 22:

Knowledge management... what?

Page 23:

Knowledge management... what?

Page 24:

Knowledge management... what?

Page 25:

uA KMS: Where to start from?

Page 26:

What to do first?

Gene Expression Formal Model

Focused on GE measures

Oriented to the “closing the loop” goal

Several things to start from

Ontologies and Inference Systems

Similar models already defined

Other similar systems

Page 27:

Defining a GE Model

Start Point: Ontologies and Inference Systems

XML->RDF->...->OWL, and related tools (e.g. Protégé, Racer, Jena)

Logics, particularly Description Logic

Inferential Systems and Languages (e.g. Prolog)

Page 28:

Defining a GE Model

Start Point: Similar models already defined

"Modeling Gene Expression", Proceedings of NETTAB/2004, www.loa-cnr.it. A model of GE in Description Logic, but without a focus on microarrays and expression intensities

Page 29:

Defining a GE Model

Start Point: Similar models already defined

Very similar to the previous work, but with tools for annotating and querying microarray chips

Yet it does not seem focused on the annotation of data, assays, etc.

Page 30:

Defining a GE Model

Start Point: Other similar systems

Synapsia by Agilent, very similar, but not focused on uAs

Hybrow, www.hybrow.org, computer-aided hypothesis evaluation

The Notebook Project, www.notebook.org, a bio-KMS based on SOAP and P2P

Sarini, M., Blanzieri, E., Giorgini, P., and Moser, C. (2004), From actions to suggestions: supporting the work of biologists through laboratory notebooks

Page 31:

Defining a GE Model

Start Point: Other similar systems

Page 32:

uA KMS: Toward a GE Model

Page 33:

Defining a GE Model

Gene Expression Formal Model

Basic elements: genes, hybridizations, experiments

Page 34:

Defining a GE Model

Gene Expression Formal Model

Basic elements: annotated sets

Page 35:

Defining a GE Model

Gene Expression Formal Model

Basic elements: annotated sets

Page 36:

Gene Expression Entities

Page 37:

Entities Grouping

EntityCollection ::= Cluster of DataSet | Cluster of Entity

Cluster of DataSet ::=
    Cluster of DataSet
  | GeneCluster of DataSet.GeneSet
  | HybCluster of DataSet.HybSet

Cluster of Entity ::=
    Cluster of Entity
  | Set of Entity

All entities in a cluster are of the same type. E.g. a cluster of genes contains hierarchically grouped sets of genes, and only genes.

NOTE: the grammar used here is VERY informal!
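The recursive "Cluster of Entity" rule and the same-type constraint can be illustrated with a small tree structure. This is only a sketch under assumptions: the Python classes `Gene`, `Hybridization` and `Cluster`, and the methods `members` and `entity_type`, are hypothetical names chosen here, not part of the model in the slides.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Gene:
    public_id: str


@dataclass(frozen=True)
class Hybridization:
    name: str


@dataclass
class Cluster:
    """A node of a hierarchical grouping: children are sub-clusters or flat sets."""
    children: List[Union["Cluster", frozenset]] = field(default_factory=list)

    def members(self):
        """Flatten the hierarchy into a single set of entities."""
        out = set()
        for child in self.children:
            out |= child.members() if isinstance(child, Cluster) else child
        return out

    def entity_type(self):
        """Return the single entity type in the cluster, or raise if types are mixed."""
        types = {type(e) for e in self.members()}
        if len(types) > 1:
            raise TypeError("a cluster must contain entities of one type only")
        return types.pop() if types else None


genes = [Gene("IL2"), Gene("IL10"), Gene("IL12")]
# A two-level cluster: one nested sub-cluster plus one flat set.
gene_cluster = Cluster([Cluster([frozenset(genes[:2])]), frozenset(genes[2:])])
```

Here `gene_cluster.entity_type()` yields `Gene`, while a cluster mixing genes and hybridizations would raise `TypeError`, which mirrors the "only genes" rule stated above.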

Page 38:

Entities Grouping

GeneSet ::= Set of Gene

HybSet ::= Set of Hybridization

Set of X ::= { x : x IS-A X }

Singleton ( C ) ::= { S : S = Set of C AND #S = 1 }

Page 39:

Annotations

Annotation ::= EntityCollection => AnnotationSet

Annotations attach useful information to Gene Expression data.

Page 40:

Annotations/Basics

Annotation ( EmptySet ) ::= EmptySet

Annotation ( Singleton ( Entity e ) ) ::= Attributes ( e ) U BaseAnnotation ( e )

Page 41:

Annotations/Basics

BaseAnnotation ( Any ) ::=

To be decided; the first idea is a set of:

Name/Value/Type, and Description like in MAGE

External Reference, with URI, or attachment

Graph attachment, "vectoring" values, e.g. PCA with component values, scatter plots wit

Annotation Author

Annotation Date

Security/Access references

Similar to the classes Extendable, Describable, Identifiable of MAGE-OM

An Entity may annotate another Entity, e.g. the author of an Experiment

Page 42:

Annotations/Basics

Attributes ( Entity e ) ::=

Set of < attrib, value, type > for each declared attribute of Entity

Attributes may be declared in JavaBean fashion, optionally providing a mapping for the type and semantics of each attribute

Annotation ( GeneSet GS ) ::=

BaseAnnotation ( GS )

U Annotation ( g ) : g BELONGS GS

U BiologicalAnnotation ( GS )
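The definitions of Annotation for the empty set, for a single entity and for a gene set compose into one union. A rough Python sketch follows; it assumes, for brevity, that an annotation item is a plain (name, value) pair rather than the < attrib, value, type > triple, and every class and function name in it is hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Assumption: an annotation item is a (name, value) pair.
Item = Tuple[str, str]


@dataclass(frozen=True)
class Gene:
    public_id: str                              # a declared attribute
    base: FrozenSet[Item] = frozenset()         # BaseAnnotation(g)


@dataclass(frozen=True)
class GeneSet:
    genes: FrozenSet[Gene] = frozenset()
    base: FrozenSet[Item] = frozenset()         # BaseAnnotation(GS)
    biological: FrozenSet[Item] = frozenset()   # BiologicalAnnotation(GS)


def attributes(g: Gene) -> FrozenSet[Item]:
    """Attributes(e): one item per declared attribute of the entity."""
    return frozenset({("public_id", g.public_id)})


def annotation_of_gene(g: Gene) -> FrozenSet[Item]:
    """Annotation(Singleton(e)) = Attributes(e) U BaseAnnotation(e)."""
    return attributes(g) | g.base


def annotation_of_set(gs: GeneSet) -> FrozenSet[Item]:
    """BaseAnnotation(GS) U Annotation(g) for each g in GS U BiologicalAnnotation(GS)."""
    out = set(gs.base) | set(gs.biological)
    for g in gs.genes:
        out |= annotation_of_gene(g)
    # An empty GeneSet with no annotations yields the empty set,
    # matching Annotation(EmptySet) = EmptySet.
    return frozenset(out)
```

With this shape, annotations made on individual genes are inherited by every set that contains them, which is what the union over "g BELONGS GS" expresses.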

Page 43:

Annotations/Biological Ann.

BiologicalAnnotation ( GS ) ::=

Allows tagging the gene set with the biological reason why the genes have been grouped

E.g.:

belonging to the functional family of apoptosis

in the KEGG pathway about IL-2

under GO ID #10234

Page 44:

Annotations/Data Sets

Annotation ( Cluster of DataSet ds ) ::=

BaseAnnotation ( ds ) U Annotation ( < all entities in ds > )

Meaning of clustering

Clustering method / algorithm

Algorithm annotations, e.g. parameter values

Cluster includes the case of a flat set (not a tree), and the sub-cases:

gene/hyb filtering ( genes have been filtered in from another data set )

value transformation ( normalization, PCA, average over replicas )

Page 45:

Annotations/Examples

Types of annotations / searches:

Generic

<attribute> LIKE <pattern>

<value> BETWEEN ( <lo>, <hi> )

<author> IS author

Genes

public_id LIKE pattern

REGULATION ( g1, g2, ... gn )

g1 REGULATES | DOWN_REGULATES | UP_REGULATES | PROMOTE | INHIBITS ( g1, g2, ... gn )

geneX IN_PATHWAY ( p )
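The generic searches (LIKE, BETWEEN) can be read as predicates over annotation records. The Python sketch below is only one possible reading: it assumes records are plain dictionaries, uses shell-style wildcards (`*`, `?`) for LIKE, and the `score` field and all function names are invented for the example.

```python
from fnmatch import fnmatchcase


def like(attribute, pattern):
    """<attribute> LIKE <pattern>, using shell-style wildcards (* and ?)."""
    return lambda ann: fnmatchcase(str(ann.get(attribute, "")), pattern)


def between(attribute, lo, hi):
    """<value> BETWEEN ( <lo>, <hi> ), with both bounds inclusive."""
    return lambda ann: attribute in ann and lo <= ann[attribute] <= hi


def search(records, *predicates):
    """Return the records whose annotations satisfy every predicate."""
    return [r for r in records if all(p(r) for p in predicates)]


# Hypothetical annotation records; "score" is an invented numeric annotation.
genes = [
    {"public_id": "IL2", "author": "Norman", "score": 0.9},
    {"public_id": "IL10", "author": "Norman", "score": 0.4},
    {"public_id": "TNF", "author": "misterX", "score": 0.7},
]

hits = search(genes, like("public_id", "IL*"), between("score", 0.5, 1.0))
```

Predicates compose by conjunction, so `hits` contains only the records that match both the pattern and the range.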

Page 46:

Annotations/Examples

DataSet

geneSet1 SIMILAR_PROFILE geneSet2 IN_DATASET ds

hybSet1 SIMILAR_PROFILE hybSet2 IN_DATASET ds

Not necessarily computed; may simply be annotated.

CORRELATION ( dSet1, dSet2 ... dSetN, value )

annotates the correlation of the expression values

Page 47:

Annotations/Time Courses

Time profiles may be shifted relative to each other, or it may happen that when one gene is up another is down.

Page 48:

Annotations/Time Courses

Time profiles may be shifted relative to each other, or it may happen that when one gene is up another is down.

geneSet1 SIMILAR_TIME_PROFILE geneSet2, a particular case of SIMILAR_PROFILE

geneSet1 TIME_AFTER geneSet2 ...

geneSet1 TIME_BEFORE geneSet2 ...

geneSet1 TIME_SHIFT geneSet2 ...

geneSet1 TIME_OPPOSED geneSet2

geneSet1 TIME_OPPOSED_SHIFT geneSet2
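The slides treat these relations as annotations, not necessarily as computed results. Still, a tool could suggest a label from the data. The Python sketch below is one possible heuristic and not the author's method: it correlates two equally long profiles at small time lags and proposes a label from the best match. The function names, the `max_lag` and `threshold` parameters, and the mapping from correlation sign and lag to labels are all assumptions; TIME_AFTER and TIME_BEFORE are left out because their meaning is not spelled out above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def time_relation(p1, p2, max_lag=2, threshold=0.9):
    """Suggest (label, lag) for profile p1 relative to p2, or None if nothing fits."""
    best_lag, best_r = 0, 0.0
    # Try lag 0 first so that an unshifted match wins any tie.
    for lag in sorted(range(-max_lag, max_lag + 1), key=abs):
        # Positive lag: p1[t + lag] is compared with p2[t], i.e. p1 trails p2.
        a = p1[lag:] if lag >= 0 else p1[:lag]
        b = p2[:len(p2) - lag] if lag >= 0 else p2[-lag:]
        r = pearson(a, b)
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    if abs(best_r) < threshold:
        return None
    if best_r > 0:
        label = "SIMILAR_TIME_PROFILE" if best_lag == 0 else "TIME_SHIFT"
    else:
        label = "TIME_OPPOSED" if best_lag == 0 else "TIME_OPPOSED_SHIFT"
    return label, best_lag


reference = [0, 1, 4, 9, 4, 1, 0, 0]
delayed = [0, 0, 1, 4, 9, 4, 1, 0]      # same shape, one time point later
mirrored = [-v for v in reference]      # up where the reference is down
```

A suggestion produced this way would still be stored as an ordinary annotation, so a biologist can confirm or reject it.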

Page 49:

Annotations/Comparisons

“+/-” graphs. Common graphs of gene interactions that are evident from comparison experiments. Modeled via the previously shown constructs.

Page 50:

Annotations/Set Graphs

Observing Euler-Venn diagrams is very common. Modeled via Aset theory operations

Page 51:

Operators

Operations and relations with data

When storing the result of an operation, the source of the result may be annotated and the annotations composed coherently:

gset = gset1 U gset2

save ( gset, annotation )

gset is saved with:

further annotation provided by the user

SOURCE ( UNION, gset1, gset2 )

all annotations coming from gset1 and gset2 belong to gset too
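The union example can be sketched as a function that builds the result set together with its provenance. This is a minimal Python illustration, assuming annotations are plain tuples; `AnnotatedSet` and `union` are hypothetical names, and the persistence step `save` is left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class AnnotatedSet:
    name: str
    members: FrozenSet[str]
    annotations: FrozenSet[Tuple]   # assumption: each annotation is a plain tuple


def union(name, a, b, user_annotations=()):
    """gset = gset1 U gset2, inheriting annotations and recording the source."""
    source = ("SOURCE", "UNION", a.name, b.name)
    return AnnotatedSet(
        name=name,
        members=a.members | b.members,
        annotations=a.annotations | b.annotations
        | frozenset(user_annotations) | {source},
    )


gset1 = AnnotatedSet("gset1", frozenset({"IL2"}), frozenset({("author", "Norman")}))
gset2 = AnnotatedSet("gset2", frozenset({"IL10"}), frozenset({("pathway", "IL-2")}))
gset = union("gset", gset1, gset2, [("note", "merged by hand")])
```

The resulting `gset` carries the user's note, the SOURCE record, and everything inherited from `gset1` and `gset2`, matching the three points listed above.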

Page 52:

Operators

Aset theory operations:

EntityCollection U EntityCollection ... U EntityCollection = EntityCollection

EntityCollection INTERSECTION EntityCollection ... INTERSECTION EntityCollection = EntityCollection

EntityCollection - EntityCollection = EntityCollection

Compositions:

new Cluster ( geneSet1, geneSet2, geneSet3 ... geneSetN, geneSetAnnotation )

new Cluster ( cluster1, cluster2, clusterAnnotation )

Relations on single entities

gene1 DOWN_REGULATES ( gene2, gene3 ) AUTHOR misterX
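A relation statement such as the last line could be turned into a structured record with a small parser. Since the grammar in these slides is explicitly informal, the concrete syntax accepted below (a subject, an upper-case relation name, a parenthesised object list, an optional AUTHOR clause) is an assumption, as are the names `STATEMENT` and `parse_relation`.

```python
import re

# Assumed surface syntax: <subject> <RELATION> ( <obj>, ... ) [AUTHOR <name>]
STATEMENT = re.compile(
    r"^\s*(?P<subject>\w+)\s+(?P<relation>[A-Z_]+)\s*"
    r"\(\s*(?P<objects>[^)]*)\)\s*"
    r"(?:AUTHOR\s+(?P<author>\w+))?\s*$"
)


def parse_relation(text):
    """Parse one relation statement into a dictionary, or raise ValueError."""
    m = STATEMENT.match(text)
    if m is None:
        raise ValueError(f"not a relation statement: {text!r}")
    objects = tuple(o.strip() for o in m.group("objects").split(",") if o.strip())
    return {
        "subject": m.group("subject"),
        "relation": m.group("relation"),
        "objects": objects,
        "author": m.group("author"),   # None when no AUTHOR clause is given
    }


parsed = parse_relation("gene1 DOWN_REGULATES ( gene2, gene3 ) AUTHOR misterX")
```

The parsed record keeps the author alongside the relation, so the statement can be stored as an annotation like any other.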

Page 53:

Ongoing and future...

Page 54:

What's next?

Refinements of GCA, study of BASE

Study of ontology tools and ontology reasoners

Better definition of the GE Model

Review with biologists

Cooperation with Ontology Groups (proposals are welcome...)

Page 55:

To be continued...