Mapping Transcription Mechanisms from Multimodal Genomic Data

Hsun-Hsien Chang, Michael McGeachie, and Marco F. Ramoni
Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School
March 10, 2010

Page 1: Mapping Transcription Mechanisms from Multimodal Genomic Data

Harvard Medical School

Mapping Transcription Mechanisms from Multimodal Genomic Data

Hsun-Hsien Chang, Michael McGeachie, and Marco F. Ramoni

Children’s Hospital Informatics Program

Harvard-MIT Division of Health Sciences and Technology

Harvard Medical School March 10, 2010

Page 2: Mapping Transcription Mechanisms from Multimodal Genomic Data

Information Flow in Multimodal Genomic Data

• Genetic Variants
– 100k – 1000k SNPs
– 250k copy number variations (CNVs)
– 250k methylation measurements

• Transcripts
– 50k mRNA expression levels
– 50k microRNA expression levels
– 1.5M exon expression / splicing

[Figure: arrows labeled "Information" linking genetic variants and transcripts]

Page 3: Mapping Transcription Mechanisms from Multimodal Genomic Data

Expression Quantitative Trait Loci (eQTLs)

• Connection from variant to expression is an information channel
– A DNA locus modulating the expression level of a gene = eQTL
• Cis (trans) eQTLs are the genetic variants located close to (far away from) genes.
• Identifying cis-eQTLs is easier
– Focusing on cis-eQTLs reduces search space
– trans eQTLs?
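The cis/trans distinction above can be sketched as a distance rule. This is a minimal illustration, not the authors' definition: the `Locus` type, the `classify_eqtl` helper, the 1 Mb window, and the coordinates are all assumptions introduced here.

```python
from dataclasses import dataclass

# Hypothetical window size: 1 Mb is a common convention, not a value from the talk.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Locus:
    chromosome: str
    position: int  # base-pair coordinate


def classify_eqtl(snp: Locus, gene: Locus, window_bp: int = CIS_WINDOW_BP) -> str:
    """Return "cis" if the SNP lies near the gene on the same chromosome, else "trans"."""
    same_chromosome = snp.chromosome == gene.chromosome
    if same_chromosome and abs(snp.position - gene.position) <= window_bp:
        return "cis"
    return "trans"


# Made-up coordinates, for illustration only.
gene = Locus("21", 42_000_000)
near_snp = Locus("21", 42_300_000)
far_snp = Locus("21", 15_000_000)
other_chr_snp = Locus("7", 42_000_000)

print(classify_eqtl(near_snp, gene))
print(classify_eqtl(far_snp, gene))
```

Restricting the search to pairs labeled "cis" is what shrinks the search space; the trans pairs are the ones that remain expensive to test.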

Page 4: Mapping Transcription Mechanisms from Multimodal Genomic Data

Clinical Study on Pediatric Leukemia

• Cancer: based on genetic modification (variants) and cellular malfunction (gene expression)
• Identification of eQTLs helps understand molecular mechanisms in cancer and provides biological insight.
• Clinical study of acute lymphoblastic leukemia (ALL)
– The most common malignancy in children, nearly one third of all pediatric cancers.
– A few cases are associated with inherited genetic syndromes (e.g., Down syndrome, Bloom syndrome, Fanconi anemia), but the cause remains unknown.
• Data
– 29 patients.
– Genotyped 100,000 SNPs (Affymetrix Human Mapping 100K).
– Profiled 50,000 gene expressions (Affymetrix HG-U133 Plus 2.0).

Page 5: Mapping Transcription Mechanisms from Multimodal Genomic Data

Challenges in Finding eQTLs

• Compare the distribution of each variant to the levels of each expression measurement
– Computational
    • All pairs of variants vs. expressions is costly
    • Usually discretize expression levels (Pensa et al., BioKDD, 2004)
– Multiple testing considerations
• Understanding
– Too many associations to test via laboratory science
    • Computational methods of biological discovery
    • Want to summarize main informational (biological) pathways
• Answer: Use transcriptional information
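To make the cost and multiple-testing points concrete, here is a small arithmetic sketch using the dataset sizes quoted for the ALL study. The significance level and the Bonferroni correction are assumptions chosen for illustration; the talk does not say which correction, if any, was used.

```python
# Dataset sizes quoted for the ALL study.
n_snps = 100_000
n_transcripts = 50_000

# Every SNP paired with every transcript.
n_tests = n_snps * n_transcripts

# Assumed family-wise significance level and a Bonferroni-style per-test threshold.
alpha = 0.05
bonferroni_threshold = alpha / n_tests

print(f"{n_tests:,} pairwise tests")
print(f"per-test threshold: {bonferroni_threshold:.1e}")
```

With only 29 patients, a per-test threshold this small is very hard to reach, which is one way to read the motivation for summarizing associations rather than testing each pair in isolation.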

Page 6: Mapping Transcription Mechanisms from Multimodal Genomic Data

Transcriptional Information Channel

[Figure: SNP X and expression Y connected by the "Transcription Channel"]

• SNPs are modeled as binomial variables.
• Expressions are modeled as log-normal variables.
• Info Theory: measures Entropy, H(X)
• Mutual Information quantifies information flow:
• Higher MI is achieved by larger σ² and smaller σ_k², i.e., when expression level Y is more likely modulated by SNP X.
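The mutual information formula itself is not reproduced above. One form consistent with the statement about the two variances is sketched here as an assumption, not necessarily the authors' expression. It takes p_k to be the probability that SNP X has genotype k, σ² to be the overall variance of log-expression, and σ_k² to be the variance of log-expression within genotype k, and it treats log-expression as Gaussian both overall and within each genotype. Because mutual information is unchanged by an invertible transform of Y, working with log Y is equivalent to working with Y.

```latex
I(X;Y) \;=\; h(\log Y) - h(\log Y \mid X)
\;\approx\; \tfrac{1}{2}\log \sigma^{2} \;-\; \tfrac{1}{2}\sum_{k} p_{k} \log \sigma_{k}^{2}
```

Under this form, a large σ² together with small σ_k² means the genotype explains most of the spread in expression, which matches the bullet above. Since a Gaussian has the largest entropy for a given variance, the right-hand side is an upper bound on the exact value when only the within-genotype distributions are truly Gaussian.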

Page 7: Mapping Transcription Mechanisms from Multimodal Genomic Data


• Transcript Y is modulated by SNP X:

• Transcript Y is independent of SNP X:
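The two cases above can be contrasted numerically. The following is a minimal sketch on simulated data, assuming a Gaussian approximation of log-expression; the function name `gaussian_mutual_information`, the sample size, and the effect size are illustrative assumptions and not taken from the talk.

```python
import numpy as np


def gaussian_mutual_information(genotypes, log_expression):
    """Estimate MI (in nats) as 0.5*log(var) - 0.5*sum_k p_k*log(var_k).

    Assumes log-expression is roughly Gaussian overall and within each genotype,
    and that every genotype group has at least two distinct values.
    """
    genotypes = np.asarray(genotypes)
    log_expression = np.asarray(log_expression, dtype=float)
    mi = 0.5 * np.log(log_expression.var())
    for k in np.unique(genotypes):
        group = log_expression[genotypes == k]
        p_k = group.size / log_expression.size
        mi -= 0.5 * p_k * np.log(group.var())
    return mi


rng = np.random.default_rng(0)
n_samples = 300  # far more than the 29 patients in the study, to keep the demo stable

# SNP as a binomial allele count (0, 1, or 2 copies).
genotypes = rng.binomial(2, 0.5, size=n_samples)

# Case 1: the mean of log-expression shifts with genotype (Y modulated by X).
modulated = 2.0 * genotypes + rng.normal(0.0, 0.5, size=n_samples)

# Case 2: log-expression ignores genotype (Y independent of X).
independent = rng.normal(0.0, 0.5, size=n_samples)

mi_modulated = gaussian_mutual_information(genotypes, modulated)
mi_independent = gaussian_mutual_information(genotypes, independent)
print(f"modulated:   {mi_modulated:.3f} nats")
print(f"independent: {mi_independent:.3f} nats")
```

In the modulated case the overall variance is much larger than the within-genotype variances, so the estimate is clearly positive; in the independent case the two are nearly equal and the estimate stays close to zero.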

Page 8: Mapping Transcription Mechanisms from Multimodal Genomic Data

Transcriptional Information Map

[Figure: map linking SNP nodes (X) to transcript nodes (Y1 to Y9)]

Page 9: Mapping Transcription Mechanisms from Multimodal Genomic Data


ALL Transcriptional Information Map of Chr21

Page 10: Mapping Transcription Mechanisms from Multimodal Genomic Data

Cluster Genes and SNPs into Networks

[Figure: transcriptional information map with SNP nodes X1 to X9 and transcript nodes Y1 to Y9]

Page 11: Mapping Transcription Mechanisms from Multimodal Genomic Data

Cluster Genes and SNPs into Networks

[Figure: one cluster containing SNPs X1, X3, X4, X8 and transcripts Y1, Y2, Y9]

• We can further infer the optimal modulation patterns using Bayesian networks.
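One simple way to group linked SNPs and transcripts is to take connected components of the map. This is a sketch under that assumption; the talk does not spell out the clustering rule, and the edge list below is hypothetical because the figure's links are not available in the transcript.

```python
from collections import defaultdict


def cluster_linked_nodes(edges):
    """Group nodes into connected components of an undirected graph."""
    neighbors = defaultdict(set)
    for snp, transcript in edges:
        neighbors[snp].add(transcript)
        neighbors[transcript].add(snp)

    clusters, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical SNP-transcript links, for illustration only.
edges = [
    ("X1", "Y1"), ("X1", "Y2"), ("X3", "Y2"), ("X3", "Y9"),
    ("X4", "Y9"), ("X8", "Y9"), ("X5", "Y5"), ("X7", "Y7"),
]
clusters = cluster_linked_nodes(edges)
for cluster in clusters:
    print(cluster)
```

Each resulting cluster can then be handed to a separate network-inference step, which keeps every inference problem small.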

Page 12: Mapping Transcription Mechanisms from Multimodal Genomic Data

Bayesian Networks

• Bayesian networks are directed acyclic graphs:
– Nodes correspond to random variables.
– Directed arcs encode conditional probabilities of the target nodes on the source nodes.

[Figure: example network over nodes A, B, X, Y, Z]

– p(X) depends on (A,B)
– p(Z|X,Y) independent of (A,B)
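As a worked example of how such a graph encodes a joint distribution, assume the example network has arcs A→X, B→X, X→Z, and Y→Z. This arc set is an assumption that agrees with the two statements above; the figure itself is not reproduced. Each node then contributes one factor conditioned only on its parents:

```latex
p(A, B, X, Y, Z) \;=\; p(A)\, p(B)\, p(Y)\, p(X \mid A, B)\, p(Z \mid X, Y)
```

The last factor shows why Z is independent of (A, B) once X and Y are known: A and B influence Z only through X.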

Page 13: Mapping Transcription Mechanisms from Multimodal Genomic Data

Infer Bayesian Networks in Individual Clusters

[Figure: cluster network including transcripts Y1, Y2, Y9]

• Step 1: Use TIM as the initial network.
• Step 2: Bayesian network infers SNP-SNP connections.

Page 14: Mapping Transcription Mechanisms from Multimodal Genomic Data


A Bayesian Network Inferred from Chr21 TIM

Page 15: Mapping Transcription Mechanisms from Multimodal Genomic Data

Information Theoretic Network Analysis

• Find hubs, motifs, guilds, etc.
– Abstract edges
– Global patterns -> local patterns
– Reveal emergent properties
– Information theoretic approach using Data Compression

• Alterovitz G, and Ramoni MF, “Discovering biological guilds through topological abstraction,” AMIA Annu Symp Proc, pp. 1-5, 2006.

Page 16: Mapping Transcription Mechanisms from Multimodal Genomic Data


Identified Fundamental Components

Reference: Alterovitz and Ramoni, AMIA Annu Symp Proc, pp. 1-5, 2006.

Page 17: Mapping Transcription Mechanisms from Multimodal Genomic Data

Identification of Cis- and Trans-eQTL

• RIPK4, 21q22.3
– Related to Down syndrome
– RIPK4 has 5 (trans) SNPs in q11.2 (shown as blue in the figure) affecting its expression.

[Figure: network around RIPK4]

Page 18: Mapping Transcription Mechanisms from Multimodal Genomic Data

Identification of Cis- and Trans-eQTL

• CYYR1, 21q21.1
– Recently discovered.
– Encodes a cysteine and tyrosine-rich protein.
– Recent study found a correlation with neuroendocrine tumors.
– TIM shows CYYR1 modulated by SNPs across the q arm of chromosome 21.
– DSCAM related to Down’s syndrome
– DSCAM-CYYR1 interaction leads to ALL?

[Figure: network with DSCAM labeled]

Page 19: Mapping Transcription Mechanisms from Multimodal Genomic Data

Complete TIM Algorithm

[Figure: flowchart with the following stages]

• Compute Transcriptional Information (inputs: Genetic Variant, Transcript)
• Group Linked SNPs and Transcripts (Cluster 1 ... Cluster N)
• Infer Network in Individual Clusters (Cluster 1 ... Cluster N)
• Network Topology Analysis and Summary

Page 20: Mapping Transcription Mechanisms from Multimodal Genomic Data

Transcriptional Information Maps

• Make large multimodal genetic datasets amenable to transcriptional analysis
• Identify
– Modulation patterns between genetic variants and transcripts.
– Cis and trans eQTL.
• Analysis of pediatric ALL helps identify biological hypotheses regarding connection to Down’s syndrome

Page 21: Mapping Transcription Mechanisms from Multimodal Genomic Data

Questions?

Thanks to

Prof. Marco F. Ramoni, Dr. Hsun-Hsien Chang, Dr. Gil Alterovitz, Children’s Hospital Informatics Program, Brigham and Women’s Hospital