Marco Brandizi, Corso di Dott. in Informatica, Univ. Milano Bicocca, XIX Ciclo: Progress Report

DESCRIPTION

Marco Brandizi, Corso di Dott. in Informatica, Univ. Milano Bicocca, XIX Ciclo. Progress Report, Feb 2005. Agenda: Microarrays and Gene Expression overview; A Knowledge Management System for uA data management; Motivations; What to model and where to start from; First elaborations.

TRANSCRIPT

Page 1: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Marco Brandizi

Corso di Dott. in Informatica, Univ. Milano Bicocca

XIX Ciclo

Progress Report

Feb 2005

Page 2: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Agenda

Microarrays and Gene Expression overview

A Knowledge Management System for uA data management

Motivations

What to model and where to start from

First elaborations

Ongoing work and future

Page 3: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Gene Expression and Microarrays

Page 4: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

DNA

gene

mRNA

protein

Genes Machine

Cell/Life

Page 5: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Microarray Data / Details

Page 6: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Microarray Data

Page 7: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Microarrays Data Mgmt Issues

Exp. data vs. seq. data:

Context dependent (living system, exp. conditions)

Lack of standard unit of measure

Several normalization methods

Multiple platforms and methods

No standard for data annotation

Vocabularies and terminology coherence

Details about: experiment, source, protocols, exp. conditions

Page 8: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Microarrays Data Mgmt Issues / 2

Evidence about data quality

What to store?

Raw Images

Computed values

Normalized values

How to find data

Systems aware of complex vocabularies (ontologies)

Data mining and exp. comparison tools

Data access control

Page 9: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Issues => MIAME/MAGE

Page 10: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

MIAME Experiment Modelling

Page 11: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

GCA DB

Page 12: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

GCA DB

Page 13: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

GCA DB

Page 14: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Need for a KMS for uA data management

Page 15: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

The uA Experiment Cycle

Page 16: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

“Closing the loop”

Page 17: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

“Closing the loop”

Page 18: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

uA KMS: What to model?

Page 19: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Knowledge management... what?

Genes

Textual annotations, literature

Interactions, pathways.

Genes collections (functional families, clusters)

Experiment and Experimental Conditions

Keyword/ontology based searches

Tested conditions searches

Expression Values

Navigation

Same transcriptome/trend/correlation/pattern

Page 20: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Knowledge management... what?

Chips

Keyword searches

Annotations about chip quality, protocols to be used, etc.

People

“Is expert in ...”

“Works with ...”

“Is studying ...”

Its ranking is X (based on publications, user preferences, etc.)

Page 21: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Knowledge management... what?

“Does IL-2 regulate something and under what conditions?”

Interactions of gene: IL2

Note added by Norman on Dec 10, 2003, deduced from text, not confirmed [see original text]:

IL2 --> UPREGULATES --> IL10, IL12

ON --> Cells --> Type: DC

UNDER CONDITION: LPS Stimulus

Page 22: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Knowledge management... what?

Page 23: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Knowledge management... what?

Page 24: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Knowledge management... what?

Page 25: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

uA KMS: Where to start from?

Page 26: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

What to do first?

Gene Expression Formal Model

Focused on GE measures

Oriented to “closing the loop” goal

Several things to start from

Ontologies and Inference Systems

Similar models already defined

Other similar systems

Page 27: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Defining a GE Model

Start Point: Ontologies and Inference Systems

XML->RDF->...->OWL, and related tools (e.g. Protégé, Racer, Jena)

Logics, particularly Description Logic

Inferential Systems and Languages (ex.: Prolog)

Page 28: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Defining a GE Model

Start Point: Similar models already defined

"Modeling Gene Expression", Proceedings of NETTAB/2004, www.loa-cnr.it. A model of GE in Description Logic, but without a focus on microarrays and expression intensities

Page 29: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Defining a GE Model

Start Point: Similar models already defined

Very similar to previous work, but with tools for annotation/querying of microarray chips

Yet, it does not seem to be focused on the annotation of data, assays, etc.

Page 30: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Defining GE Model

Start point: Other similar systems

Synapsia by Agilent, very similar, but not focused on uAs

Hybrow, www.hybrow.org, a computer-aided hypothesis evaluation

The Notebook Project, www.notebook.org, a bio-KMS based on SOAP and P2P

Sarini, M., Blanzieri, E., Giorgini, P., and Moser, C. (2004). From actions to suggestions: supporting the work of biologists through laboratory notebooks

Page 31: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Defining GE Model

Start point: Other similar systems

Page 32: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

uA KMS: Toward a GE Model

Page 33: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Defining GE Model

Gene Expression Formal Model

Basic elements: genes, hybridizations, experiments

Page 34: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Defining GE Model

Gene Expression Formal Model

Basic elements: annotated sets

Page 35: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Defining GE Model

Gene Expression Formal Model

Basic elements: annotated sets

Page 36: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Gene Expression Entities

Page 37: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Entities Grouping

EntityCollection ::= Cluster of DataSet | Cluster of Entity

Cluster of DataSet ::= Cluster of DataSet | GeneCluster of DataSet.GeneSet | HybCluster of DataSet.HybSet

Cluster of Entity ::= Cluster of Entity | Set of Entity

All entities in a cluster are of the same type. E.g., a cluster of genes contains hierarchically grouped sets of genes, and only genes.

NOTE: The grammar used here is VERY informal!

Page 38: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Entities Grouping

GeneSet ::= Set of Gene

HybSet ::= Set of Hybridization

Set of X ::= { x : x IS-A X }

Singleton ( C ) ::= { S : S = Set of C AND #S = 1 }
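The grouping rules above can be illustrated with a small Python sketch. This is only one possible reading of the informal grammar, not the author's implementation: the class names `EntitySet` and `Cluster`, the `name` field, and the `leaves` helper are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Entity:
    """Base class for model entities; `name` is a hypothetical identifier."""
    name: str


@dataclass(frozen=True)
class Gene(Entity):
    pass


@dataclass(frozen=True)
class Hybridization(Entity):
    pass


@dataclass(frozen=True)
class EntitySet:
    """'Set of X': a flat set whose members are all of one entity type."""
    entity_type: type
    members: FrozenSet[Entity]

    def __post_init__(self):
        for m in self.members:
            if not isinstance(m, self.entity_type):
                raise TypeError(f"{m!r} is not a {self.entity_type.__name__}")

    def is_singleton(self) -> bool:
        # Singleton ( C ): a set of C with exactly one element
        return len(self.members) == 1


@dataclass(frozen=True)
class Cluster:
    """'Cluster of Entity': hierarchically grouped sets of one entity type."""
    entity_type: type
    children: Tuple[Union["Cluster", EntitySet], ...]

    def __post_init__(self):
        for c in self.children:
            if c.entity_type is not self.entity_type:
                raise TypeError("all entities in a cluster must be of the same type")

    def leaves(self) -> FrozenSet[Entity]:
        """Collect every entity reachable through the hierarchy."""
        out = set()
        for c in self.children:
            out |= c.members if isinstance(c, EntitySet) else c.leaves()
        return frozenset(out)
```

A GeneSet is then `EntitySet(Gene, ...)` and a HybSet is `EntitySet(Hybridization, ...)`; the type check in each constructor enforces the "only genes" rule stated for clusters of genes.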

Page 39: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Annotations

Annotation ::= EntityCollection => AnnotationSet

Annotations allow Gene Expression data to be tracked together with useful information.

Page 40: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Annotations/Basics

Annotation ( EmptySet ) ::= EmptySet

Annotation ( Singleton ( Entity e ) ) ::= Attributes ( e ) U BaseAnnotation ( e )

Page 41: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Annotations/Basics

BaseAnnotation ( Any ) ::=

To be decided; the first idea is a set of:

Name/Value/Type, and Description, as in MAGE

External Reference, with URI, or attachment

Graph attachment, "vectoring" values, e.g. PCA with component values, scatter plots with ...

Annotation Author

Annotation Date

Security/Access references

Similar to the classes Extendable, Describable, Identifiable of MAGE-OM

Entity annotates another Entity, e.g. Exp author

Page 42: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Annotations/Basics

Attributes ( Entity e ) ::=

Set of < attrib, value, type > for each declared attribute of Entity

An attribute may be declared in JavaBean fashion, optionally providing a mapping for the type and semantics of the attribute

Annotation ( GeneSet GS ) ::= BaseAnnotation ( GS ) U Annotation ( g ) : g BELONGS GS U BiologicalAnnotation ( GS )
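As a rough illustration of how the annotation of a gene set composes from its parts, here is a minimal Python sketch. It assumes annotation items are plain (name, value, type) triples and that attributes, base annotations and biological annotations live in three dictionaries keyed by identifier; both the storage layout and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# An annotation item is modeled as a (name, value, type) triple (assumption).
Item = Tuple[str, str, str]
Store = Dict[str, Iterable[Item]]


def annotation_of_gene(gene_id: str, attributes: Store, base: Store) -> Set[Item]:
    """Annotation ( Singleton ( e ) ) = Attributes ( e ) U BaseAnnotation ( e )."""
    return set(attributes.get(gene_id, ())) | set(base.get(gene_id, ()))


def annotation_of_gene_set(
    set_id: str,
    gene_ids: Iterable[str],
    attributes: Store,
    base: Store,
    biological: Store,
) -> Set[Item]:
    """Annotation ( GS ) = BaseAnnotation ( GS ) U annotations of each member
    gene U BiologicalAnnotation ( GS )."""
    result = set(base.get(set_id, ()))
    for g in gene_ids:
        result |= annotation_of_gene(g, attributes, base)
    result |= set(biological.get(set_id, ()))
    return result
```

With no member genes and no stored annotations the function returns the empty set, which matches the rule Annotation ( EmptySet ) ::= EmptySet.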

Page 43: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Annotations/Biological Ann.

BiologicalAnnotation ( GS ) ::=

Allows for tagging the gene set with a biological meaning, i.e. why the genes have been grouped. E.g.:

belonging to the functional family of apoptosis

in the KEGG pathway about IL-2

under GO ID #10234

Page 44: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Annotations/Data Sets

Annotation ( Cluster of DataSet ds ) ::=

BaseAnnotation ( ds ) U Annotation ( < all entities in ds > )

Meaning of clustering

Clustering method / algorithm

Algorithm annotations, e.g. parameter values

Cluster includes the case of a flat set (not a tree), and these sub-cases:

gene/hybs filtering ( genes have been filtered in from another data set )

values transformation ( normalization, PCA, average on replicas )

Page 45: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Annotations/Examples

Types of annotations / searches:

Generic

<attribute> LIKE <pattern>

<value> BETWEEN ( <lo>, <hi> )

<author> IS author

Genes

public_id LIKE pattern

REGULATION ( g1, g2, ... gn )

g1 REGULATES | DOWN_REGULATES | UP_REGULATES | PROMOTE | INHIBITS ( g1, g2, ... gn )

geneX IN_PATHWAY ( p )

Page 46: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Annotations/Examples

DataSet

geneSet1 SIMILAR_PROFILE geneSet2 IN_DATASET ds

hybSet1 SIMILAR_PROFILE hybSet2 IN_DATASET ds

Not necessarily computed; may simply be annotated.

CORRELATION ( dSet1, dSet2 ... dSetN, value )

annotates the expression values correlation
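The generic LIKE and BETWEEN searches could be sketched as follows. The slides do not define the pattern syntax, so this example assumes SQL-style wildcards (`%` for any run of characters, `_` for a single character); the function names and the dictionary layout of the annotation store are likewise hypothetical.

```python
import re
from typing import Any, Callable, Dict, List


def like(value: str, pattern: str) -> bool:
    """SQL-style LIKE (assumed semantics): '%' matches any run, '_' one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value) is not None


def between(value: float, lo: float, hi: float) -> bool:
    """<value> BETWEEN ( <lo>, <hi> ), taken here as inclusive on both ends."""
    return lo <= value <= hi


def search(
    annotations: Dict[str, Dict[str, Any]],
    attribute: str,
    predicate: Callable[[Any], bool],
) -> List[str]:
    """Return the ids of entities whose `attribute` satisfies `predicate`."""
    return sorted(
        entity_id
        for entity_id, attrs in annotations.items()
        if attribute in attrs and predicate(attrs[attribute])
    )
```

For instance, `search(genes, "public_id", lambda v: like(v, "NM_%"))` would correspond to the `public_id LIKE pattern` search on genes.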

Page 47: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Annotations/Time Courses

Time profiles may be shifted relative to each other, or one gene may be up when another is down.

Page 48: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Annotations/Time Courses

Time profiles may be shifted relative to each other, or one gene may be up when another is down.

geneSet1 SIMILAR_TIME_PROFILE geneSet2, a particular case of SIMILAR_PROFILE

geneSet1 TIME_AFTER geneSet2 ...

geneSet1 TIME_BEFORE geneSet2 ...

geneSet1 TIME_SHIFT geneSet2 ...

geneSet1 TIME_OPPOSED geneSet2

geneSet1 TIME_OPPOSED_SHIFT geneSet2
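The slides treat these relations as annotations and do not say how they would be derived. Purely as an illustration, one way a tool might suggest such an annotation is a lagged Pearson correlation between two time profiles. The technique, the threshold and the `max_lag` parameter are assumptions of this sketch, not part of the model.

```python
from math import sqrt
from typing import Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_lag(p1: Sequence[float], p2: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Return (lag, r) with the largest |r|; a positive lag means p2 follows p1."""
    best = (0, 0.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = p1[: len(p1) - lag], p2[lag:]
        else:
            a, b = p1[-lag:], p2[: len(p2) + lag]
        n = min(len(a), len(b))
        if n < 3:  # too few overlapping time points to correlate
            continue
        r = pearson(a[:n], b[:n])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


def classify(p1, p2, max_lag: int = 2, threshold: float = 0.9) -> Optional[str]:
    """Suggest a time-course relation name, or None if the correlation is weak."""
    lag, r = best_lag(p1, p2, max_lag)
    if abs(r) < threshold:
        return None
    if r > 0:
        return "SIMILAR_TIME_PROFILE" if lag == 0 else "TIME_SHIFT"
    return "TIME_OPPOSED" if lag == 0 else "TIME_OPPOSED_SHIFT"
```

The sign of the returned lag could additionally distinguish TIME_AFTER from TIME_BEFORE; any suggestion produced this way would still need to be confirmed and saved as an annotation by a user.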

Page 49: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Annotations/Comparisons

“+/-” graphs. Common graphs of gene interactions that are evident from comparison experiments. Modeled via the previously shown constructs.

Page 50: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Annotations/Set Graphs

Observing Euler-Venn diagrams is very common. Modeled via Aset theory operations

Page 51: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Operators

Operations and relations with data

When storing the result of an operation, the source of the result may be annotated, and the annotations composed coherently:

gset = gset1 U gset2

save ( gset, annotation )

gset is saved with:

further annotation provided by the user

SOURCE ( UNION, gset1, gset2 )

all annotations coming from gset1 and gset2 belong to gset too
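The union-with-provenance behaviour described here can be sketched in a few lines of Python. This is a minimal illustration under assumed data shapes: gene sets are plain Python sets, annotations are sets of hashable items, and the function name `union_with_provenance` and the tuple used for the SOURCE record are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple


def union_with_provenance(
    name1: str,
    set1: Set[str],
    ann1: Iterable[Hashable],
    name2: str,
    set2: Set[str],
    ann2: Iterable[Hashable],
    user_annotation: Iterable[Hashable] = (),
) -> Tuple[Set[str], Set[Hashable]]:
    """Compute gset = gset1 U gset2 and compose the annotation to save with it."""
    members = set1 | set2
    # Annotations of both operands are inherited by the result.
    annotation = set(ann1) | set(ann2) | set(user_annotation)
    # Record where the result came from: SOURCE ( UNION, gset1, gset2 ).
    annotation.add(("SOURCE", ("UNION", name1, name2)))
    return members, annotation
```

Intersection and difference could record their source in the same way, with a different operator name in the SOURCE record.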

Page 52: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Operators

Aset theory operations:

EntityCollection U EntityCollection ... U EntityCollection = EntityCollection

EntityCollection INTERSECTION EntityCollection ... INTERSECTION EntityCollection = EntityCollection

EntityCollection - EntityCollection = EntityCollection

Compositions:

new Cluster ( geneSet1, geneSet2, geneSet3 ... geneSetN, geneSetAnnotation )

new Cluster ( cluster1, cluster2, clusterAnnotation )

Relations on single entities

gene1 DOWN_REGULATES ( gene2, gene3 ) AUTHOR misterX
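A relation on single entities, such as the DOWN_REGULATES example, might be stored as a small record and queried to answer questions like "does this gene regulate something, and under what conditions?". The sketch below is an assumption about one possible representation; the `Relation` class, its field names and the `regulated_by` helper are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Relation names taken from the annotation examples in the slides.
REGULATION_KINDS = {
    "REGULATES", "DOWN_REGULATES", "UP_REGULATES", "PROMOTE", "INHIBITS",
}


@dataclass(frozen=True)
class Relation:
    subject: str                      # the regulating gene
    kind: str                         # one of REGULATION_KINDS
    targets: Tuple[str, ...]          # the regulated genes
    author: Optional[str] = None      # who asserted the relation
    condition: Optional[str] = None   # experimental condition, if any

    def __post_init__(self):
        if self.kind not in REGULATION_KINDS:
            raise ValueError(f"unknown relation kind: {self.kind}")


def regulated_by(
    relations: Iterable[Relation], gene: str
) -> List[Tuple[str, str, Optional[str]]]:
    """List (kind, target, condition) for every relation asserted about `gene`."""
    return [
        (r.kind, t, r.condition)
        for r in relations
        if r.subject == gene
        for t in r.targets
    ]
```

The example from the slide would be written as `Relation("gene1", "DOWN_REGULATES", ("gene2", "gene3"), author="misterX")`.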

Page 53: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

Ongoing and future...

Page 54: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

What's next?

Refinements of GCA, study of BASE

Study of ontology tools and ontology reasoners

Better definition of GE Model

Review with biologists

Cooperation with Ontology Groups (proposals are welcome...)

Page 55: Marco Brandizi      Corso di Dott. in Informatica, Univ. Milano Bicocca XIX Ciclo Progress Report

To be continued...