Marco Brandizi

PhD Programme in Computer Science, University of Milano-Bicocca

XIX Cycle

Progress Report

Feb 2005

Agenda

Microarrays and Gene Expression overview

A Knowledge Management System for uA data management

Motivations

What to model and where to start from

First elaborations

Ongoing work and future

Gene Expression and Microarrays

DNA

gene

mRNA

protein

Genes Machine

Cell/Life

Microarray Data / Details

Microarray Data

Microarrays Data Mgmt Issues

Exp. data vs. seq. data:

Context dependent (living system, exp. conditions)

Lack of standard unit of measure

Several normalization methods

Multiple platforms and methods

No standard for data annotation

Vocabularies and terminology coherence

Details about: experiment, source, protocols, exp. conditions

Microarrays Data Mgmt Issues / 2

Evidence about data quality

What to store?

Raw Images

Computed values

Normalized values

How to find data

Systems aware of complex vocabularies (ontologies)

Data mining and exp. comparison tools

Data access control

Issues => MIAME/MAGE

MIAME Experiment Modelling

GCA DB

Need for a KMS for uA data management

The uA Experiment Cycle

“Closing the loop”

uA KMS: What to model?

Knowledge management... what?

Genes

Textual annotations, literature

Interactions, pathways.

Genes collections (functional families, clusters)

Experiment and Experimental Conditions

Keyword/ontology based searches

Tested conditions searches

Expression Values

Navigation

Same transcriptome/trend/correlation/pattern

Knowledge management... what?

Chips

Keyword searches

Annotations about chip quality, protocols to be used, etc.

People

“Is expert in ...”

“Works with ...”

“Is studying ...”

Its ranking is X (based on publications, user preferences, etc.)

Knowledge management... what?

“Does IL-2 regulate something and under what conditions?”

Interactions of gene: IL2

Note added by Norman on Dec 10, 2003, deduced by text, not confirmed [see original text]:

IL2 --> UPREGULATES --> IL10, IL12

ON --> Cells --> Type: DC

UNDER CONDITION: LPS Stimulus

uA KMS: Where to start from?

What to do first?

Gene Expression Formal Model

Focused on GE measures

Oriented to “closing the loop” goal

Several things to start from

Ontologies and Inference Systems

Similar models already defined

Other similar systems

Defining a GE Model

Start Point: Ontologies and Inference Systems

XML->RDF->...->OWL, and related tools (e.g. Protégé, Racer, Jena)

Logics, particularly Description Logic

Inferential Systems and Languages (e.g. Prolog)

Defining a GE Model

Start Point: Similar models already defined

"Modeling Gene Expression", Proceedings of NETTAB/2004, www.loa-cnr.it. A model of GE in Description Logic, but without focus on microarrays and expression intensities

Defining a GE Model

Start Point: Similar models already defined

Very similar to previous work, but with tools for annotation/querying of microarray chips

Yet it does not seem focused on the annotation of data, assays, etc.

Defining GE Model

Start point: Other similar systems

Synapsia by Agilent, very similar, but not focused on uAs

Hybrow, www.hybrow.org, a computer-aided hypothesis evaluation tool

The Notebook Project, www.notebook.org, a bio-KMS based on SOAP and P2P

Sarini, M., Blanzieri, E., Giorgini, P., and Moser, C. (2004), From actions to suggestions: supporting the work of biologists through laboratory notebooks

uA KMS: Toward a GE Model

Defining GE Model

Gene Expression Formal Model

Basic elements: genes, hybridizations, experiments

Defining GE Model

Gene Expression Formal Model

Basic elements: annotated sets

Gene Expression Entities

Entities Grouping

EntityCollection ::= Cluster of DataSet | Cluster of Entity

Cluster of DataSet ::=
    Cluster of DataSet
  | GeneCluster of DataSet.GeneSet
  | HybCluster of DataSet.HybSet

Cluster of Entity ::=
    Cluster of Entity
  | Set of Entity

All entities in a cluster are of the same type. E.g. a cluster of genes contains hierarchically grouped sets of genes, and only genes. NOTE: the grammar used here is VERY informal!

Entities Grouping

GeneSet ::= Set of Gene

HybSet ::= Set of Hybridization

Set of X ::= { x : x IS-A X }

Singleton ( C ) ::= { S : S = Set of C AND #S = 1 }
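The grouping rules above can be sketched in Python. This is only an illustrative reading of the informal grammar, not the model's implementation: the class names `Gene`, `Hybridization`, `Cluster` and the helpers `set_of` and `is_singleton` are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    public_id: str


@dataclass(frozen=True)
class Hybridization:
    name: str


def set_of(entity_type, items):
    """Set of X ::= { x : x IS-A X }. Rejects anything that is not an X."""
    items = frozenset(items)
    for item in items:
        if not isinstance(item, entity_type):
            raise TypeError(f"{item!r} is not a {entity_type.__name__}")
    return items


def is_singleton(s):
    """Singleton ( C ): a set with exactly one member (#S = 1)."""
    return len(s) == 1


@dataclass
class Cluster:
    """Cluster of Entity ::= Cluster of Entity | Set of Entity (a tree of same-type sets)."""
    entity_type: type
    children: list  # each child is a Cluster or a frozenset built by set_of

    def members(self):
        """Flatten the hierarchy into one set, enforcing a single entity type."""
        result = set()
        for child in self.children:
            if isinstance(child, Cluster):
                if child.entity_type is not self.entity_type:
                    raise TypeError("mixed entity types in one cluster")
                result |= child.members()
            else:
                result |= set_of(self.entity_type, child)
        return result
```

A gene cluster is then a tree whose leaves are gene sets, for example `Cluster(Gene, [set_of(Gene, [Gene("IL2")]), Cluster(Gene, [set_of(Gene, [Gene("IL10")])])])`.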

Annotations

Annotation ::= EntityCollection => AnnotationSet

Annotations attach useful information to Gene Expression data, so that the data can be tracked.

Annotations/Basics

Annotation ( EmptySet ) ::= EmptySet

Annotation ( Singleton ( Entity e ) ) ::= Attributes ( e ) U BaseAnnotation ( e )

Annotations/Basics

BaseAnnotation ( Any ) ::=

To be decided; the first idea is a set of:

Name/Value/Type, and Description like in MAGE

External Reference, with URI, or attachment

Graph attachment, "vectoring" values, ex: PCA with components values,

scatter plots wit

Annotation Author

Annotation Date

Security/Access references

Similar to the classes Extendable, Describable, Identifiable of MAGE-OM

An Entity may annotate another Entity, e.g. the author of an experiment

Annotations/Basics

Attributes ( Entity e ) ::=

Set of < attrib, value, type > for each declared attribute of Entity

Attributes may be declared in JavaBean fashion, optionally providing a mapping for the type and semantics of each attribute

Annotation ( GeneSet GS ) ::=
    BaseAnnotation ( GS )
  U Annotation ( g ) : g BELONGS GS
  U BiologicalAnnotation ( GS )
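The recursive definitions of Annotation can be read as the following Python sketch. It assumes, purely for illustration, that base and biological annotations are kept in dictionaries keyed by entity or by gene set, and that attributes are read from the object's instance fields rather than from a JavaBean-style declaration; none of these choices come from the slides.

```python
def attributes(entity):
    """Attributes ( e ): one <attrib, value, type> triple per instance attribute."""
    return {(name, value, type(value).__name__) for name, value in vars(entity).items()}


class Gene:
    """Hypothetical entity with two declared attributes."""

    def __init__(self, public_id, symbol):
        self.public_id = public_id
        self.symbol = symbol


def annotation_of_entity(entity, base):
    """Annotation ( Singleton ( e ) ) ::= Attributes ( e ) U BaseAnnotation ( e )."""
    return attributes(entity) | base.get(entity, set())


def annotation_of_gene_set(gene_set, base, biological):
    """BaseAnnotation ( GS ) U Annotation ( g ) for each g in GS U BiologicalAnnotation ( GS )."""
    if not gene_set:
        return set()  # Annotation ( EmptySet ) ::= EmptySet
    result = set(base.get(gene_set, set()))
    for gene in gene_set:
        result |= annotation_of_entity(gene, base)
    return result | biological.get(gene_set, set())
```

Gene sets are passed as `frozenset` objects so that they can serve as dictionary keys.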

Annotations/Biological Ann.

BiologicalAnnotation ( GS ) ::=

Allows tagging the gene set with a biological meaning, i.e. why the genes have been grouped

Ex.:

belonging to functional family of apoptosis

in the KEGG pathway about IL-2

under GO ID #10234

Annotations/Data Sets

Annotation ( Cluster of DataSet ds ) ::=

BaseAnnotation ( ds ) U Annotation ( < all entities in ds > )

Meaning of clustering

Clustering method / algorithm

Algorithm annotations, e.g. parameter values

Cluster includes the case of a flat set (not a tree), and sub-cases:

gene/hybs filtering ( genes have been filtered in from another data set )

values transformation ( normalization, PCA, average on replicas )

Annotations/Examples

Types of annotations / searches:

Generic

<attribute> LIKE <pattern>

<value> BETWEEN ( <lo>, <hi> )

<author> IS author

Genes

public_id LIKE pattern

REGULATION ( g1, g2, ... gn )

g1 REGULATES | DOWN_REGULATES | UP_REGULATES | PROMOTE | INHIBITS ( g1, g2, ... gn )

geneX IN_PATHWAY ( p )
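The generic search forms above (LIKE, BETWEEN) could be prototyped as composable predicates. The sketch below is an assumption about how such searches might be evaluated over plain dictionaries: shell-style wildcards from Python's `fnmatch` stand in for the unspecified pattern syntax, and the function names `like`, `between` and `search` are hypothetical.

```python
from fnmatch import fnmatchcase


def like(attribute, pattern):
    """<attribute> LIKE <pattern>, using shell-style wildcards as a stand-in syntax."""
    return lambda record: fnmatchcase(str(record.get(attribute, "")), pattern)


def between(attribute, lo, hi):
    """<value> BETWEEN ( <lo>, <hi> ), with inclusive bounds."""
    return lambda record: attribute in record and lo <= record[attribute] <= hi


def search(records, *predicates):
    """Return the records that satisfy every predicate, in input order."""
    return [r for r in records if all(p(r) for p in predicates)]
```

For instance, `search(records, like("public_id", "IL*"), between("value", 1.0, 3.0))` keeps the records whose `public_id` starts with `IL` and whose `value` lies in the given range.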

Annotations/Examples

DataSet

geneSet1 SIMILAR_PROFILE geneSet2 IN_DATASET ds

hybSet1 SIMILAR_PROFILE hybSet2 IN_DATASET ds

Not necessarily computed; the relation may simply be annotated.

CORRELATION ( dSet1, dSet2 ... dSetN, value )

annotates the correlation of the expression values

Annotations/Time Courses

Time profiles may be shifted relative to each other, or one gene may be up when another is down.

geneSet1 SIMILAR_TIME_PROFILE geneSet2, a particular case of SIMILAR_PROFILE

geneSet1 TIME_AFTER geneSet2 ...

geneSet1 TIME_BEFORE geneSet2 ...

geneSet1 TIME_SHIFT geneSet2 ...

geneSet1 TIME_OPPOSED geneSet2

geneSet1 TIME_OPPOSED_SHIFT geneSet2
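The slides do not define how these relations are decided, and they may simply be annotated by hand. As one possible aid, the sketch below suggests a relation from two averaged time profiles by looking for the lag with the strongest Pearson correlation. The lag search, the 0.9 threshold and the function names are assumptions made for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def suggest_time_relation(a, b, max_lag=2, threshold=0.9):
    """Suggest a time-course relation between profiles a and b, or None.

    Each lag in [-max_lag, max_lag] is tried, smallest absolute lag first;
    the lag with the strongest absolute correlation wins.
    """
    best_lag, best_r = 0, 0.0
    for lag in sorted(range(-max_lag, max_lag + 1), key=abs):
        if lag >= 0:
            xs, ys = a[: len(a) - lag], b[lag:]
        else:
            xs, ys = a[-lag:], b[: len(b) + lag]
        r = pearson(xs, ys)
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    if abs(best_r) < threshold:
        return None
    if best_r > 0:
        return "SIMILAR_TIME_PROFILE" if best_lag == 0 else "TIME_SHIFT"
    return "TIME_OPPOSED" if best_lag == 0 else "TIME_OPPOSED_SHIFT"
```

The suggested label would still be stored as an annotation, so a biologist can confirm or override it.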

Annotations/Comparisons

“+/-” graphs. Common graphs of gene interactions that are evident from comparison experiments. Modeled via the previously shown constructs.

Annotations/Set Graphs

Observing Euler-Venn diagrams is very common. Modeled via Aset theory operations

Operators

Operations and relations with data

When storing the result of an operation, the source of the result may be annotated and the annotations composed coherently:

gset = gset1 U gset2

save ( gset, annotation )

gset is saved with:

further annotation provided by user

SOURCE ( UNION, gset1, gset2 )

all annotations coming from gset1 and gset2 belong to gset too
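The annotation composition described for the union can be sketched as follows. The `AnnotatedSet` class and the tuple encoding of annotations are hypothetical; the sketch only shows the three ingredients named above: user annotation, a SOURCE record, and inherited annotations.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedSet:
    name: str
    members: frozenset
    annotations: set = field(default_factory=set)


def union(name, left, right, user_annotations=()):
    """gset = gset1 U gset2, with annotations composed as described above."""
    annotations = set(user_annotations)                           # further annotation provided by the user
    annotations.add(("SOURCE", "UNION", left.name, right.name))   # provenance of the result
    annotations |= left.annotations | right.annotations           # inherited from both operands
    return AnnotatedSet(name, left.members | right.members, annotations)
```

Intersection and difference would follow the same pattern, changing only the set operation and the operator name stored in the SOURCE record.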

Operators

Aset theory operations:

EntityCollection U EntityCollection ... U EntityCollection = EntityCollection

EntityCollection INTERSECTION EntityCollection ... INTERSECTION EntityCollection = EntityCollection

EntityCollection - EntityCollection = EntityCollection

Compositions:

new Cluster ( geneSet1, geneSet2, geneSet3 ... geneSetN, geneSetAnnotation )

new Cluster ( cluster1, cluster2, clusterAnnotation )

Relations on single entities

gene1 DOWN_REGULATES ( gene2, gene3 ) AUTHOR misterX
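A relation of this kind can be stored as a small record and queried, which is one way to answer questions such as “Does IL-2 regulate something and under what conditions?”. The `Relation` fields and the `regulated_by` helper below are assumptions for illustration; the model itself does not prescribe a storage format.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    subject: str
    predicate: str
    targets: tuple
    author: str
    condition: str = ""


def regulated_by(relations, gene):
    """List (target, predicate, condition) for every regulation asserted for gene."""
    hits = []
    for rel in relations:
        # REGULATES, UP_REGULATES and DOWN_REGULATES all end with "REGULATES".
        if rel.subject == gene and rel.predicate.endswith("REGULATES"):
            hits.extend((target, rel.predicate, rel.condition) for target in rel.targets)
    return hits
```

Keeping the author on each record preserves who asserted the relation, as in the AUTHOR clause above.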

Ongoing and future...

What's next?

Refinements of GCA, study of BASE

Study of ontology tools and ontology reasoners

Better definition of GE Model

Review with biologists

Cooperation with Ontology Groups (proposals are welcome...)

To be continued...
