(c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu. Do not reproduce without permission. Gerstein.info/talks (c) 2007


Page 1:

Research @ GersteinLab.org

• Human Genome Annotation (pseudogenes): Characterizing the function of non-coding regions, focusing on protein fossils and novel transcriptionally active regions (Pseudogene.org + Tiling.GersteinLab.org)

• Molecular Networks: Using molecular networks to integrate & mine functional genomics information and describe protein function on a large scale (Networks.GersteinLab.org)

• Macromolecular motions: Analyzing select populations of 3D-structures in detail, trying to understand their flexibility in terms of packing (MolMovDB.org)

Page 2:

Research @ GersteinLab.org

• Human Genome Annotation (pseudogenes): Characterizing the function of non-coding regions, focusing on protein fossils and novel transcriptionally active regions (Pseudogene.org + Tiling.GersteinLab.org)

• Molecular Networks: Using molecular networks to integrate & mine functional genomics information and describe protein function on a large scale (Networks.GersteinLab.org)

• Macromolecular motions: Analyzing select populations of 3D-structures in detail, trying to understand their flexibility in terms of packing (MolMovDB.org)

Page 3:

Do not reproduce without permission. Lectures.GersteinLab.org (c) 2007

[IHGSC, Nature 409, 2001] [Venter et al., Science 291, 2001]

2001: Most of the genome is not coding (only ~1.2% exon). It consists of elements such as repeats, regulatory regions, non-coding RNAs, origins of replication, pseudogenes, segmental duplications... What do these elements do? How should they be annotated?

Page 4:

[IHGSC, Nature 409, 2001] [ENCODE Consortium, Nature 447, 2007]

2007: Pilot results from ENCODE Consortium on decoding what the bases do
- 1% of Genome (30 Mb in 44 regions)
- Tiling Arrays to assay Transcription & Binding
- Multi-organism sequencing and alignment
- Careful Annotation
- Variation Data

Snyder, Weissman, Miller

Page 5:

Identifiable Features of a Pseudogene (RPL21)

Gerstein & Zheng. Sci Am 295: 48 (2006).

Page 6:

Very Different Distribution of Genes and Pseudogenes in Different Organisms

[Figure: No. of Genes, No. of Pseudogenes, and Genome Size [log Mb] for E. coli K-12, E. coli O157, Yeast, Worm, Fly, Mouse, Human, Cress, Rickettsia, Leprosy, and Plague; axis: Number of Genes / Pseudogenes. Per-organism values are not recoverable from the transcript.]

Zhang & Gerstein (2004) Curr Opin Genet Dev 14: 328 + Harrison & Gerstein (2002) J Mol Biol 318: 1155

Page 7:

History of Pseudogene Preservation

Absent / Present with Disablement / Present without Disablement

Zheng et al. (2007) Gen. Res.

Based on alignment from ENCODE MSA group

Representative pseudogenes drawn from 201 total (labeled A B C D E F)

Page 8:

Using phastOdd value to examine neutral evolution of pseudogenes

Most are good candidates for studying mutational processes

A few non-proc. pseudogenes are under constraint

Zheng et al. (2007) Gen. Res.

Page 9:

Ex. Pseudogene Intersecting Transcriptional Evidence

Composite ChIP hit

Special pseudogene tracks in browser

Track labels: diTAG, CAGE, TARs, ChIP-chip

Zheng et al. (2007) Gen. Res.

Page 10:

Research @ GersteinLab.org

• Human Genome Annotation (pseudogenes): Characterizing the function of non-coding regions, focusing on protein fossils and novel transcriptionally active regions (Pseudogene.org + Tiling.GersteinLab.org)

• Molecular Networks: Using molecular networks to integrate & mine functional genomics information and describe protein function on a large scale (Networks.GersteinLab.org)

• Macromolecular motions: Analyzing select populations of 3D-structures in detail, trying to understand their flexibility in terms of packing (MolMovDB.org)

Page 11:

Toward Systematic Ontologies for Function, using Networks

All of SCOP entries
  ENZYME
    1 Oxidoreductases
      1.1 Acting on CH-OH
        1.1.1 NAD and NADP acceptor
          1.1.1.1 Alcohol dehydrogenase
          1.1.1.3 Homoserine dehydrogenase
      1.5 Acting on CH-NH
    3 Hydrolases
      3.1 Acting on ester bonds
        3.1.1 Carboxylic ester hydrolases
          3.1.1.1 Carboxylesterase
          3.1.1.8 Cholinesterase
      3.4 Acting on peptide bonds
  NON-ENZYME
    1 Metabolism
      1.1 Carb. metab.
        1.1.1 Polysach. metab.
          1.1.1.1 Glycogen metab.
          1.1.1.2 Starch metab.
      1.2 Nucleotide metab.
    3 Cell structure
      3.1 Nucleus
      3.8 Extracel. matrix
        3.8.2 Extracel. matrix glycoprotein
          3.8.2.1 Fibronectin
          3.8.2.2 Tenascin

General similarity / Functional class similarity / Precise functional similarity

General Networks [Eisenberg et al.]

Hierarchies & DAGs [Enzyme, Bairoch; GO, Ashburner; MIPS, Mewes, Frishman]

Interaction Vectors [Lan et al, IEEE 90:1848]

Page 12:

Networks occupy a midway point in terms of level of understanding

1D: Complete Genetic Parts List

~2D: Bio-molecular Network Wiring Diagram

3D: Detailed structural understanding of cellular machinery

[Jeong et al., Nature, 411:41] [Fleischmann et al., Science, 269:496]

Page 13:

Networks as a universal language

Disease Spread [Krebs]

Protein Interactions [Barabasi]

Social Network

Food Web

Neural Network [Cajal]

Electronic Circuit

Internet [Burch & Cheswick]

Page 14:

TopNet – an automated web tool

[Yu et al., NAR (2004); Yip et al., Bioinfo. (2006); similar tools include Cytoscape.org, Ideker, Sander et al.]

(vers. 2: "TopNet-like Yale Network Analyzer")

Normal website + Downloaded code (JAVA) + Web service (SOAP) with Cytoscape plugin

Page 15:

Yeast Regulatory Network: a platform for integration

142 transcription factors

3,420 target genes

7,074 regulatory interactions

From integrating data from Snyder et al... TRANSFAC

[Figure labels: Transcription Factors, Target Genes]
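
The three counts on this slide imply rough average connectivities, which the short sketch below works out. It assumes every interaction is a single directed edge from one of the 142 transcription factors to one of the 3,420 target genes; the slide does not state this, and the variable names are illustrative only.

```python
# Counts as stated on the slide.
n_tfs = 142
n_targets = 3420
n_interactions = 7074

# Assumption: each interaction is one directed edge from a TF to a target gene.
mean_targets_per_tf = n_interactions / n_tfs
mean_regulators_per_target = n_interactions / n_targets

print(f"mean targets per TF: {mean_targets_per_tf:.1f}")
print(f"mean regulators per target: {mean_regulators_per_target:.2f}")
```

Under that assumption a transcription factor regulates about 50 genes on average, while a target gene has about two regulators. These are averages only and say nothing about how connectivity is distributed.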

Page 16:

Yeast Regulatory Hierarchy: the Middle-managers Rule

[Yu et al., PNAS (2006)]

Page 17:

Yeast Network Similar in Structure to Government Hierarchy with Respect to Middle-managers

Page 18:

Characteristics of Regulatory Hierarchy: Middle Managers are Information Flow Bottlenecks

[Yu et al., PNAS (2006)]
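
One common way to quantify an "information flow bottleneck" is betweenness centrality: the number of shortest paths that pass through a node. The sketch below is a plain-Python version of Brandes' algorithm for an unweighted directed graph, run on a made-up toy hierarchy. Whether this matches the exact measure used in Yu et al. is an assumption, and the node names and edges are hypothetical.

```python
from collections import deque

def betweenness(edges):
    """Unnormalized betweenness for an unweighted directed graph (Brandes)."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, [])
    score = {node: 0.0 for node in adj}
    for source in adj:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {node: [] for node in adj}
        n_paths = {node: 0 for node in adj}
        dist = {node: -1 for node in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting the farthest nodes first.
        dep = {node: 0.0 for node in adj}
        for w in reversed(order):
            for v in preds[w]:
                dep[v] += n_paths[v] / n_paths[w] * (1.0 + dep[w])
            if w != source:
                score[w] += dep[w]
    return score

# Hypothetical toy hierarchy: one top regulator, one middle manager, three targets.
toy_edges = [("top", "mid"), ("mid", "g1"), ("mid", "g2"), ("mid", "g3")]
scores = betweenness(toy_edges)
bottleneck = max(scores, key=scores.get)
print(bottleneck, scores[bottleneck])  # prints: mid 3.0
```

In the toy graph every path from the top regulator to a target runs through the middle node, so it gets the only non-zero score.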

Page 19:

Research @ GersteinLab.org

• Human Genome Annotation (pseudogenes): Characterizing the function of non-coding regions, focusing on protein fossils and novel transcriptionally active regions (Pseudogene.org + Tiling.GersteinLab.org)

• Molecular Networks: Using molecular networks to integrate & mine functional genomics information and describe protein function on a large scale (Networks.GersteinLab.org)

• Macromolecular motions: Analyzing select populations of 3D-structures in detail, trying to understand their flexibility in terms of packing (MolMovDB.org)

Page 20:

Gerstein.info/talks (c) 2003. Do not reproduce without permission.

Surveying structural flexibility on a proteomic scale

• Questions
- How do we describe a wide range of structural variability in standard terms?
- Can we develop simple models to explain constraints on protein flexibility?
- What information about flexible hinge location is encoded in sequence?

Page 21:

MolMovDB.org

Page 22:

Example "Morph": MBP

2 Known Crystal Structures (endpoints, not necessarily same seq.)

Std. Geometric Stats. (from structure comparison)

Pathway Interpolation
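
The slide does not say how the pathway between the two endpoint structures is interpolated. As a minimal sketch, the code below assumes the simplest scheme: straight-line interpolation of Cartesian coordinates between two conformations that are already superimposed and have a one-to-one atom correspondence. The function name and the toy coordinates are hypothetical. A realistic morph would also need superposition, handling of endpoints with differing sequences, and some refinement of the intermediate frames.

```python
def interpolate_pathway(start, end, n_frames):
    """Linearly interpolate atom coordinates from `start` to `end`.

    `start` and `end` are equal-length lists of (x, y, z) tuples with
    matching atom order; the result includes both endpoints.
    """
    if len(start) != len(end):
        raise ValueError("endpoint structures must have matching atoms")
    if n_frames < 2:
        raise ValueError("need at least the two endpoint frames")
    frames = []
    for k in range(n_frames):
        t = k / (n_frames - 1)  # 0.0 at the first endpoint, 1.0 at the last
        frame = [
            tuple(a + t * (b - a) for a, b in zip(p, q))
            for p, q in zip(start, end)
        ]
        frames.append(frame)
    return frames

# Hypothetical two-atom endpoints: the second atom moves 2 units along y.
open_form = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
closed_form = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
morph = interpolate_pathway(open_form, closed_form, 3)
print(morph[1])  # prints: [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
```

Straight-line interpolation can pass atoms through each other or distort bond geometry, which is why it is offered here only as an illustration of the idea.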

Page 23:

Interdigitating structure of protein interfaces constrains motion

Page 24:

Packing Tools - Voronoi software to calculate packing volumes (geometry.molmovdb.org)

Page 25:

Small Shearing Domain Motions: Molybdenum-binding protein & GAPDH

[Lawson] [Wonacott]

Page 26:

Proteins With Shear Motions are Often Divided into Layers

GAPDH Hexokinase [Steitz]

Page 27:

Transferrin: Interdomain Hinges

[Baker]

Page 28:

Transferrin hinge involves absence of steric constraints (continuously maintained interfaces), esp. at hinge

Page 29:

Proteomics Research @ GersteinLab.org

• Macromolecular motions: Analyzing select populations of 3D-structures in detail, trying to understand their flexibility in terms of packing (MolMovDB.org)

• Molecular Networks: Using molecular networks to integrate & mine functional genomics information and describe protein function on a large scale (Networks.GersteinLab.org)

• Human Genome Annotation (protein fossils): Characterizing the function of non-coding regions, focusing on protein fossils and novel transcriptionally active regions (Pseudogene.org + Tiling.GersteinLab.org)

Page 30:

RNAi: Birth of a Field in the Literature Culminating in the 2006 Nobel

Source: Gerstein & Douglas. PLoS Comp. Bio. 3:e80 (2007)

PubNet.GersteinLab.org