“Software” of life

Post on 14-Dec-2015

Genomes to function

Lessons from genome projects

• Most genes have no known function

• Most genes with a known function had it assigned from sequence-similarity matches to other organisms

• Need methods to experimentally assay gene activity on a genome-wide scale

Measure expression on genome-wide scale: DNA Microarrays

[Figure: microarray comparing Condition 1 RNA with Condition 2 RNA; labels: gene enriched in condition 1, gene enriched in condition 2; 17,997 genes, 94% of genome]
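As a rough illustration of how enrichment between two conditions might be called from microarray intensities, here is a minimal sketch. The gene names, intensity values, pseudocount, and the two-fold cutoff are all made-up assumptions for the example, not values from the slides.

```python
import math

def log2_ratio(cond1, cond2, pseudocount=1.0):
    """log2 of condition 1 over condition 2; the pseudocount avoids division by zero."""
    return math.log2((cond1 + pseudocount) / (cond2 + pseudocount))

def call_enrichment(ratio, cutoff=1.0):
    """Label a gene from its log2 ratio (cutoff=1.0 means a two-fold change)."""
    if ratio >= cutoff:
        return "enriched in condition 1"
    if ratio <= -cutoff:
        return "enriched in condition 2"
    return "unchanged"

# Hypothetical spot intensities: gene -> (condition 1, condition 2)
intensities = {
    "geneA": (800.0, 100.0),
    "geneB": (100.0, 790.0),
    "geneC": (500.0, 520.0),
}

calls = {gene: call_enrichment(log2_ratio(c1, c2))
         for gene, (c1, c2) in intensities.items()}
```

Real analyses would add normalization and replicate handling, which this sketch leaves out.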

Global Analyses of Gene Expression

• Collect all microarrays from the world

• Gene activity across thousands of conditions

[Figure: expression matrix of genes (20k) by conditions (~5k)]

Digital Age of Biology

• Biologists drowning in data

• Bottleneck now is developing computational resources for discovery

• Think Genbank before BLAST...

Discovering Gene Function on a Global Scale

• Gene Networks

• Search Engines

Matt Weirauch, Corey Powell, Chad Chen, Charlie Vaske, Alex Williams, Martina Koeva

Gene Networks

• link 2 genes together if they are co-activated in multiple organisms

• build networks from all the links

• discover function from a gene’s links

• understand bigger picture of gene regulation
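The linking rule in the bullets above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes each organism's expression matrix has one row per ortholog group (the same row index means the same group in every organism), and it uses a Pearson correlation threshold as a stand-in for "co-activated". The threshold and the `min_organisms` parameter are assumptions.

```python
import numpy as np
from itertools import combinations

def coexpression_links(expr, threshold=0.8):
    """Pairs of row indices whose Pearson correlation exceeds the threshold."""
    corr = np.corrcoef(expr)
    n = corr.shape[0]
    return {(i, j) for i, j in combinations(range(n), 2) if corr[i, j] > threshold}

def conserved_links(expr_by_organism, min_organisms=2, threshold=0.8):
    """Keep a link only if it is seen in at least min_organisms organisms."""
    counts = {}
    for expr in expr_by_organism:
        for pair in coexpression_links(expr, threshold):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair for pair, seen in counts.items() if seen >= min_organisms}

# Toy data: three organisms, three ortholog groups, 30 conditions each.
t = np.linspace(0.0, 1.0, 30)
org1 = np.vstack([np.sin(2 * np.pi * t), 2 * np.sin(2 * np.pi * t) + 1, np.cos(2 * np.pi * t)])
org2 = np.vstack([t, 3 * t - 2, (t - 0.5) ** 2])
org3 = np.vstack([t, (t - 0.5) ** 2, 2 * t])

links = conserved_links([org1, org2, org3])
```

Groups 0 and 1 are correlated in two of the three toy organisms, so only that pair survives; the link between groups 0 and 2 appears in one organism and is dropped.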

Principle #1

Gene networks are “scale free”

• Scale free – gene networks may arise from processes like the expansion of the WWW

[Figure: some links on the WWW]
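One commonly cited growth process that produces scale-free networks is preferential attachment, where each new node links to an existing node with probability proportional to that node's current number of links. The slides do not name a specific model, so the sketch below is an assumption chosen for illustration; the function name and parameters are hypothetical.

```python
import random
from collections import Counter

def grow_network(n_nodes, seed=0):
    """Grow a network one node at a time using preferential attachment."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]  # each node appears once per link it already has
    for new in range(2, n_nodes):
        # Picking uniformly from endpoints favors well-connected nodes.
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints += [new, target]
    return edges

edges = grow_network(2000)
degree = Counter(node for edge in edges for node in edge)
hubs = degree.most_common(3)  # a few nodes collect many links
```

Most nodes end up with a single link while a few hubs accumulate many, which is the signature the slide refers to.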

Principle #2

Genes self assemble into modular subcomponents

http://www.cse.ucsc.edu/~jstuart/multispecies

[Figure: percent of cores (y-axis) versus core size (x-axis); series: Network, Random]

Principle #3

Coordinated activity is a signature of gene function

[Figure: gene network map with regions labeled: proliferation; transcription; ribosome biogenesis; ribosomal subunits; respiration; protein modification; secretion; fatty acid metab.; tissue growth; neuronal; immune response; development / hox genes; cell polarity, cell structure; plus a "Newly evolved" label]

Proteasome “module”

[Figure: proteasome module network; legend: integrator, subunits]

Principle #4

Local network topology reports on gene function

[Figure: top 3 integrators]

Integrators have more cis-regulatory complexity

[Figure: cis-regulatory complexity; legend: integrators, subunits]

integrators have different phenotypes

[Figure: genes (%) (y-axis) by phenotype class (x-axis: WT, UNC, LVA, STP, RUP, EMB, PCH, GRO, Other); series: integrators, subunits]

Current Directions for Gene Networks

• Gene isoform networks to capture alternative splicing

• Predict drug targets from synthetic lethal nets (w/ Lokey Lab)

Gene Isoform Networks

• Most human genes (>60%) are alternatively spliced.

• Alternative splicing gives rise to different proteins from the same gene

• The particular variant expressed can be very important (e.g. sex determination in flies)

• The functional implications of alt. splicing in humans are still largely unexplored.

• Isoform-level analysis provides a higher-resolution understanding of gene expression and its relationship to health & disease

Splicing Microarrays

• Measure particular subparts of the gene structure (e.g. exon-exon junctions)

• Data now available for human and mouse tissue compendiums

• Infer isoforms from expression of subparts across the tissues

• Identify isoform modules

A functional network of gene isoforms

[Figure: isoform patterns, isoform network]

• assemble into modules

• functional signatures

• global network design

Search Engines

Search engines to discover gene function

identify every member of a pathway

[Figure: Retinoblastoma pathway (slide from Art Owen)]

gene recommender

[Figure, built up over several slides: query → search for regulating conditions → regulating conditions → search for new candidates → query + “hits”]

gene recommender procedure

query: Rb, hda-1, lin-36, rba-2, lin-9

1. Score experiments
2. Score genes

hits: dpl-1, rba-2, K12D12.1, Rb, R06C7.8, hda-1, B0464.6, R06F6.1, T16G12.5, F55A3.7, plk-1, lin-9, lin-36

computational validation

query (no Rb): hda-1, lin-36, rba-2, lin-9

1. Score experiments
2. Score genes

hits:
1. rba-2
2. lin-9
3. dpl-1
4. R06C7.8
5. hda-1
6. B0464.6
7. R06F6.1
8. K12D12.1
9. T16G12.5
10. F55A3.7
11. plk-1
12. Rb
13. lin-36
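The two-step procedure (score experiments, then score genes) might look roughly like the sketch below. The scoring formulas here are simplified stand-ins chosen for illustration and are assumptions, not the published gene recommender statistics; the gene names and expression values are hypothetical. One query gene (`q3`) is held out, mirroring the validation idea of leaving a known member out of the query.

```python
import numpy as np

def recommend(expr, genes, query, n_experiments=3):
    """Return (selected experiment indices, genes ranked best first).

    expr is a genes x experiments matrix, assumed already normalized per gene.
    """
    q = expr[[genes.index(g) for g in query]]
    # Step 1: score experiments. High when the query genes show a strong,
    # consistent signal (assumed score: |mean| / (std + 1)).
    exp_score = np.abs(q.mean(axis=0)) / (q.std(axis=0) + 1.0)
    top = np.argsort(exp_score)[::-1][:n_experiments]
    # Step 2: score genes by closeness to the query's average profile
    # in the selected experiments (smaller distance ranks higher).
    centroid = q[:, top].mean(axis=0)
    distance = np.abs(expr[:, top] - centroid).mean(axis=1)
    ranked = [genes[i] for i in np.argsort(distance)]
    return top, ranked

genes = ["q1", "q2", "q3", "hit", "other"]
expr = np.array([
    [2.0, 1.8, 0.1, -0.2, -2.0],   # q1
    [2.1, 2.0, -0.3, 0.4, -1.9],   # q2
    [1.9, 2.2, 0.5, -0.5, -2.1],   # q3 (held out of the query)
    [2.0, 1.9, 1.5, -1.5, -2.0],   # hit
    [-1.0, 0.0, 0.2, 0.1, 1.0],    # other
])

top, ranked = recommend(expr, genes, query=["q1", "q2"])
```

In this toy matrix the held-out gene ranks ahead of the unrelated gene, which is the kind of check the validation slide describes.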

Searching 1 organism

[Figure: precision at 50% recall (y-axis) per pathway (x-axis): Ribosome*, Calcium Channels, Glycolysis*, Electron Transport*, Proteasome*, tRNA Synthetases, Fatty Acid Deg*, TCA Cycle, Translation Factors, Cell cycle, Cholesterol*, Collagen]
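For readers unfamiliar with the metric on the y-axis, precision at 50% recall can be computed from a ranked hit list as in this small sketch. The function name and the toy gene labels are hypothetical.

```python
import math

def precision_at_recall(ranked, members, recall=0.5):
    """Precision at the first rank where the given fraction of members is found."""
    members = set(members)
    if not members:
        raise ValueError("members must not be empty")
    needed = math.ceil(recall * len(members))
    found = 0
    for k, gene in enumerate(ranked, start=1):
        if gene in members:
            found += 1
            if found >= needed:
                return found / k
    return 0.0  # the list never reaches the requested recall

# Toy example: 4 known pathway members, 2 of them found within the top 3.
ranked = ["a", "x", "b", "y", "c", "d"]
score = precision_at_recall(ranked, members={"a", "b", "c", "d"})
```

Here half of the four members are recovered by rank 3, so the precision at 50% recall is 2/3.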

Multiple Species Search Engine

[Figure: multiple-species search diagram. Species: Bacteria (H.pyl), Yeast (S.cer), Plant (A.tha), Worm (C.ele), Fly (D.mel), Human (H.sap). Labels: H.sap query; ortholog map; per-species hits (H.pyl hits, S.cer hits, A.tha hits, C.ele hits, D.mel hits, H.sap hits); clade-level hits (Ecdy hits, Anim hits, Opis hits, Euk hits, Cell hits)]

Orthology Map

H.sap cell cycle query and its orthologs:

H.sap    C.ele    D.mel
CDK4     cdk-4    Cdk4
MCM5     mcm-5    Mcm5
MCM7     mcm-7    Mcm7
E2F1     n/a      E2f
PCNA     pcn-1    Mus209
HDAC1    hda-1    Rpd3
…        …        …

[Figure: worked example of the search; the label GR appears at the H.sap, C.ele and D.mel searches]

H.sap hits: MCM3 (8), MCM6 (9), MCM5 (28), HDAC1 (69), RBBP4 (86), RPA1 (428), BUB1 (1866), ...

C.ele hits: mcm-3, rpa-1, mcm-6, bub-1, rba-2, hda-1, ...

D.mel hits: Hdac1, Bub1, Mcm6, Rpa1, Mcm3, ...

H.sap BTPs of C.ele hits: MCM3 (6), RPA1 (9), MCM6 (15), BUB1 (24), RBBP4 (25), HDAC1 (114), ...

H.sap BTPs of D.mel hits: HDAC1 (3), BUB1 (21), MCM6 (26), RPA1 (48), MCM3 (60), ...

Ecdy hits: MCM6* (1), BUB1* (2), HDAC1* (3), MCM3* (4), RPA1 (5), ...

Anim hits: MCM3* (1), MCM6* (2), HDAC1* (3), MCM5* (4), RBBP4 (5), ...
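The orthology map step can be pictured as a simple lookup that translates the human query into each species' gene names before searching that species. The sketch below uses only the six gene correspondences listed on the slide; the dictionary layout and the function name are assumptions made for illustration.

```python
# Ortholog correspondences as listed on the slide (None where the slide shows n/a).
ORTHOLOGS = {
    "CDK4":  {"C.ele": "cdk-4", "D.mel": "Cdk4"},
    "MCM5":  {"C.ele": "mcm-5", "D.mel": "Mcm5"},
    "MCM7":  {"C.ele": "mcm-7", "D.mel": "Mcm7"},
    "E2F1":  {"C.ele": None,    "D.mel": "E2f"},
    "PCNA":  {"C.ele": "pcn-1", "D.mel": "Mus209"},
    "HDAC1": {"C.ele": "hda-1", "D.mel": "Rpd3"},
}

def translate_query(query, species):
    """Map human gene names to a species, skipping genes with no listed ortholog."""
    translated = []
    for gene in query:
        ortholog = ORTHOLOGS.get(gene, {}).get(species)
        if ortholog is not None:
            translated.append(ortholog)
    return translated

human_query = ["CDK4", "MCM5", "MCM7", "E2F1", "PCNA", "HDAC1"]
worm_query = translate_query(human_query, "C.ele")
fly_query = translate_query(human_query, "D.mel")
```

The worm query ends up one gene shorter than the fly query because E2F1 has no worm entry in the slide's map.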

Related genes sort to the top of the search lists

Multiple species search is more precise

immunological synapse

Gene product   Comment
CD8 antigen   query
unknown tyrosine kinase   lymphocyte specific
T-cell receptor zeta   query
CD2 antigen   participates in T-cell activation
CD4 antigen (p55)   query
unknown Src-like adaptor   negative regulator of T-cell receptor signaling
CD8 antigen   query
unknown transcription factor   T-cell specific
paired box gene 8 (PAX8)   new association

Search Engine Directions

• Network Recommender
– Search gene networks for pathway members
– Incorporate multiple data sources in search
– Faster than scanning raw data

• Discriminative search engines
– E.g. identify genes coregulated with DNA damage genes more so than S-phase genes

Network Recommender

[Figure: networks searched: coexpression, synthetic lethal, physical protein interactions]

Iterative Propagation Algorithm

1. Given a set of genes in a pathway A

2. Score gene g based on how connected it is to predicted pathway members in network i
• $S_i(g) = \sum_h w^i_{gh}\, p(h) \,/\, \sum_h w^i_{gh}$,
• where h ranges over neighbors of g in network i

3. Compute the posterior that each gene g is in the pathway
• Construct a positive distribution $P(S_i(g) \mid g \in A)$
• Construct a negative distribution $P(S_i(g) \mid g \notin A)$

4. Set $p(g) = \prod_i P(g \in A \mid S_i(g))$
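The four steps above can be sketched in a few lines of NumPy. Several details are assumptions made for this illustration, since the slide does not specify them: the positive and negative distributions are modeled as Gaussians, the prior is the fraction of genes in the seed set, a floor (`min_std`) keeps the Gaussians from collapsing, and known pathway members are held at p = 1 between iterations.

```python
import numpy as np

def neighbor_scores(W, p):
    """S_i(g): weighted average of p(h) over the neighbors h of g in one network."""
    totals = W.sum(axis=1)
    weighted = W @ p
    return np.divide(weighted, totals, out=np.zeros_like(weighted), where=totals > 0)

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def propagate(networks, seed, n_iter=5, min_std=0.2):
    """Iteratively estimate p(g) from a list of weight matrices and seed gene indices."""
    n = networks[0].shape[0]
    in_A = np.zeros(n, dtype=bool)
    in_A[seed] = True
    prior = in_A.mean()          # assumed prior: fraction of genes in the seed set
    p = in_A.astype(float)
    for _ in range(n_iter):
        posterior = np.ones(n)
        for W in networks:
            s = neighbor_scores(W, p)
            pos = gaussian_pdf(s, s[in_A].mean(), max(s[in_A].std(), min_std))
            neg = gaussian_pdf(s, s[~in_A].mean(), max(s[~in_A].std(), min_std))
            # Bayes rule per network, multiplied across networks (step 4).
            posterior *= prior * pos / (prior * pos + (1 - prior) * neg)
        p = posterior
        p[in_A] = 1.0            # assumption: known members stay fixed
    return p

def two_cliques(weight):
    """Toy network: genes 0-3 fully linked, genes 4-7 fully linked, no cross links."""
    W = np.zeros((8, 8))
    for group in (range(0, 4), range(4, 8)):
        for a in group:
            for b in group:
                if a != b:
                    W[a, b] = weight
    return W

coexpression, physical = two_cliques(1.0), two_cliques(0.5)
p = propagate([coexpression, physical], seed=[0, 1, 2])
```

With genes 0 to 2 as the seed pathway, gene 3 (linked to all three seeds in both toy networks) receives a high score, while genes 4 to 7 stay near zero.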

Network Recommender Performance

[Figure: precision (y-axis) versus recall (x-axis)]

Network Recommender Results

Network Recommender for cell cycle

- physical protein interaction

- gene coexpression

Supplemental Material

Genetic interactions

[Figure: percent interactions (y-axis) versus network distance (x-axis: 1, 3, 5, 7, 9, 11, 13, no path); series: % Synth Leth, % Background]
