Functional Mapping: A statistical model for mapping dynamic genes


Upload: dylan-colding

Post on 19-Jan-2016


Page 1: Functional Mapping A statistical model for mapping dynamic genes

Functional Mapping

A statistical model for mapping dynamic genes

Page 2: Functional Mapping A statistical model for mapping dynamic genes

Simple regression model for univariate trait

Phenotype = Genotype + Error

$$y_i = x_i \mu_j + e_i$$

where $x_i$ is the indicator for QTL genotype, $\mu_j$ is the mean for genotype $j$, and $e_i \sim N(0, \sigma^2)$.

Recall: Interval mapping for a univariate trait

Note: the QTL genotype is unobservable (missing data).

Page 3: Functional Mapping A statistical model for mapping dynamic genes

A simulation example (F2)

[Figure: trait distribution histogram; x-axis: trait value (0 to 15), y-axis: frequency (0 to 500); component curves for qq, Qq and QQ together with the overall trait distribution]

The overall trait distribution is composed of three distributions, each one coming from one of the three QTL genotypes, QQ, Qq, and qq.
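A minimal Python sketch of how a simulated F2 trait of this kind could be generated. The function name, the parameter values (m, a, d, sigma) and the seed are illustrative assumptions, not the values behind the slide's figure; the genotypic means m+a, m+d and m-a follow the parameterization on the next slide.

```python
import numpy as np

def simulate_f2_trait(n, m=8.0, a=3.0, d=1.0, sigma=1.0, seed=0):
    """Draw n trait values from a three-genotype normal mixture (F2 design).

    All default values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    # Genotype codes: 2 = QQ, 1 = Qq, 0 = qq, with F2 frequencies 1/4, 1/2, 1/4.
    genotype = rng.choice([2, 1, 0], size=n, p=[0.25, 0.5, 0.25])
    # Genotypic means indexed by code: qq = m - a, Qq = m + d, QQ = m + a.
    means = np.array([m - a, m + d, m + a])
    y = rng.normal(loc=means[genotype], scale=sigma)
    return genotype, y

genotype, y = simulate_f2_trait(2000)
```

A histogram of `y` then shows the overall distribution, while grouping `y` by `genotype` recovers the three hidden components.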

Page 4: Functional Mapping A statistical model for mapping dynamic genes

Solution: consider a finite mixture model

[Figure: three normal densities along the trait axis, one per QTL genotype, centered at m-a, m+d and m+a]

Genotypic values: QQ = m+a, Qq = m+d, qq = m-a

Page 5: Functional Mapping A statistical model for mapping dynamic genes

We use a finite mixture model for estimating genotypic effects (F2)

QTL genotype (j):  QQ  Qq  qq
Code:              2   1   0

$$y_i \sim p(y_i \mid \omega, \mu) = \omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)$$

where

$f_j(y_i)$ is a normal density with mean $\mu_j$ and variance $\sigma^2$

$\mu = (\mu_2, \mu_1, \mu_0)$

$\omega$ = conditional probability of the QTL genotype given the flanking markers
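The mixture density above can be evaluated numerically as in the following sketch. It assumes the conditional probabilities are supplied as an n-by-3 array whose columns are ordered QQ, Qq, qq; the function names and that column ordering are illustrative choices, not part of the original slides.

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def mixture_loglik(y, omega, mu, sigma):
    """Log-likelihood of the three-component mixture.

    y     : (n,) phenotypes
    omega : (n, 3) conditional QTL genotype probabilities (assumed order QQ, Qq, qq)
    mu    : (3,) genotype means in the same order
    sigma : common residual standard deviation
    """
    y = np.asarray(y, dtype=float)
    omega = np.asarray(omega, dtype=float)
    mu = np.asarray(mu, dtype=float)
    dens = normal_pdf(y[:, None], mu[None, :], sigma)  # shape (n, 3)
    return float(np.sum(np.log(np.sum(omega * dens, axis=1))))
```

When a row of `omega` is (1, 0, 0), that individual contributes exactly the log-density of the QQ component, which is a quick sanity check on the implementation.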

Page 6: Functional Mapping A statistical model for mapping dynamic genes

Data Structure

Columns: subject; marker genotypes (M1, M2, …, Mm); phenotype (y); conditional probability of QTL genotype QQ(2), Qq(1), qq(0).

Subject   M1      M2      …   Mm   Phenotype (y)   QQ(2)   Qq(1)   qq(0)
1         AA(2)   BB(2)   …        y1              ω2|1    ω1|1    ω0|1
2         AA(2)   BB(2)   …        y2              ω2|2    ω1|2    ω0|2
3         Aa(1)   Bb(1)   …        y3              ω2|3    ω1|3    ω0|3
4         Aa(1)   Bb(1)   …        y4              ω2|4    ω1|4    ω0|4
5         Aa(1)   Bb(1)   …        y5              ω2|5    ω1|5    ω0|5
6         Aa(1)   bb(0)   …        y6              ω2|6    ω1|6    ω0|6
7         aa(0)   Bb(1)   …        y7              ω2|7    ω1|7    ω0|7
8         aa(0)   bb(0)   …        y8              ω2|8    ω1|8    ω0|8
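The conditional probabilities in the last three columns depend on the flanking marker genotypes and on the position of the putative QTL between them. The sketch below computes them for an F2 by enumerating the gametes of the F1. It assumes an F1 in coupling phase (all capital alleles inherited from one parent), no crossover interference, and recombination fractions `r1` (marker 1 to QTL) and `r2` (QTL to marker 2); these assumptions and all names are illustrative and are not stated on the slide.

```python
import itertools
import numpy as np

def gamete_probs(r1, r2):
    """Probability of each F1 gamete (m1, q, m2); allele 1 = capital, 0 = lower case."""
    probs = {}
    for m1, q, m2 in itertools.product([0, 1], repeat=3):
        p1 = r1 if m1 != q else 1.0 - r1  # recombination between marker 1 and QTL
        p2 = r2 if q != m2 else 1.0 - r2  # recombination between QTL and marker 2
        probs[(m1, q, m2)] = 0.5 * p1 * p2
    return probs

def f2_conditional_probs(r1, r2):
    """Map (marker 1 code, marker 2 code) -> array [P(QQ), P(Qq), P(qq)].

    Genotype codes count capital alleles (2, 1, 0), as in the table above.
    """
    gametes = gamete_probs(r1, r2)
    joint = {}
    # An F2 individual is the union of two independent F1 gametes.
    for (ga, pa), (gb, pb) in itertools.product(gametes.items(), repeat=2):
        key = (ga[0] + gb[0], ga[2] + gb[2])
        qtl = ga[1] + gb[1]
        joint.setdefault(key, np.zeros(3))[2 - qtl] += pa * pb
    # Normalize the joint probabilities within each marker genotype class.
    return {key: vec / vec.sum() for key, vec in joint.items()}

probs = f2_conditional_probs(0.1, 0.1)
```

Each entry of `probs` sums to one, and individuals sharing the same flanking marker genotype share the same row of conditional probabilities.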

Page 7: Functional Mapping A statistical model for mapping dynamic genes

Human Development

Robbins 1928, Human Genetics, Yale University Press

Page 8: Functional Mapping A statistical model for mapping dynamic genes

Tree growth

It looks messy, but there are simple rules underlying the complexity.

Page 9: Functional Mapping A statistical model for mapping dynamic genes

The dynamics of gene expression

• Gene expression is dynamic throughout the lifetime.

• There exist genetic factors that govern the development of an organism, involving:
  - Those constantly expressed throughout the lifetime (called deterministic genes)
  - Those periodically expressed (e.g., regulation genes)

• Environmental factors such as nutrition, light and temperature also play a role.

• We are interested in identifying which gene(s) govern(s) the dynamics of a developmental trait using a procedure called Functional Mapping.

Page 10: Functional Mapping A statistical model for mapping dynamic genes

Stem diameter growth in poplar trees

Ma et al. (2002) Genetics

Page 11: Functional Mapping A statistical model for mapping dynamic genes

Poplar tree - height & diameter

Page 12: Functional Mapping A statistical model for mapping dynamic genes

Mouse growth

[Figure: mouse growth curves; x-axis: week (1 to 10), y-axis: weight (0 to 50)]

A: male; B: female

Page 13: Functional Mapping A statistical model for mapping dynamic genes

Developmental Pattern of Genetic Effects

Wu and Lin (2006) Nat. Rev. Genet.

[Figure: panels of developmental trajectories comparing QTL genotypes QQ and Qq]

Page 14: Functional Mapping A statistical model for mapping dynamic genes

Data Structure

Cross design:
Parents: AA × aa
F1: Aa × Aa
F2: AA (¼), Aa (½), aa (¼)

Columns: sample; marker genotypes (1, 2, …, m); phenotype (y) at times t1, t2, …, tT; conditional probability of QTL genotype QQ(2), Qq(1), qq(0).

Sample   Marker 1   Marker 2   …   m   y(t1)   y(t2)   …   y(tT)   QQ(2)   Qq(1)   qq(0)
1        2          2          …       y1(1)   y1(2)   …   y1(T)   ω2|1    ω1|1    ω0|1
2        2          2          …       y2(1)   y2(2)   …   y2(T)   ω2|2    ω1|2    ω0|2
3        1          1          …       y3(1)   y3(2)   …   y3(T)   ω2|3    ω1|3    ω0|3
4        1          1          …       y4(1)   y4(2)   …   y4(T)   ω2|4    ω1|4    ω0|4
5        1          1          …       y5(1)   y5(2)   …   y5(T)   ω2|5    ω1|5    ω0|5
6        1          0          …       y6(1)   y6(2)   …   y6(T)   ω2|6    ω1|6    ω0|6
7        0          1          …       y7(1)   y7(2)   …   y7(T)   ω2|7    ω1|7    ω0|7
8        0          0          …       y8(1)   y8(2)   …   y8(T)   ω2|8    ω1|8    ω0|8

Page 15: Functional Mapping A statistical model for mapping dynamic genes

Mapping methods for dynamic traits

• Traditional approach: treat the trait measured at each time point as a univariate trait and map it with traditional QTL mapping approaches such as interval or composite interval mapping.

• Limitations:
  - A single-trait model ignores how gene expression changes over time, and is too simple because it does not consider the underlying biological principle of development.

• A better approach: incorporate the biological principle into a mapping procedure to understand the dynamics of gene expression, using a procedure called Functional Mapping (pioneered by Wu and group).

Page 16: Functional Mapping A statistical model for mapping dynamic genes

Functional Mapping (FunMap)

A general framework, pioneered by Dr. Wu and his colleagues, to map QTLs that affect the pattern and form of development over a time course

- Ma et al., Genetics 2002

- Wu et al., Genetics 2004 (highlighted in Nature Reviews Genetics)

- Wu and Lin, Nature Reviews Genetics 2006

While traditional genetic mapping is a combination of classic genetics and statistics, functional mapping combines genetics, statistics and biological principles.

Page 17: Functional Mapping A statistical model for mapping dynamic genes

Data structure for an F2 population

Columns: sample; phenotype at times 1, 2, …, T; marker genotypes 1, 2, …, m.

Sample   y(1)   y(2)   …   y(T)   Marker 1   Marker 2   …   Marker m
1        y11    y21    …   yT1    1          1          …   0
2        y12    y22    …   yT2    -1         1          …   1
3        y13    y23    …   yT3    -1         0          …   1
4        y14    y24    …   yT4    1          -1         …   0
5        y15    y25    …   yT5    1          1          …   -1
6        y16    y26    …   yT6    1          0          …   -1
7        y17    y27    …   yT7    0          -1         …   0
8        y18    y28    …   yT8    0          1          …   1
n        y1n    y2n    …   yTn    1          0          …   -1

• There are nine groups of two-marker genotypes, 22, 21, 20, 12, 11, 10, 02, 01 and 00, with sample sizes n22, n21, …, n00.

• The conditional probabilities of QTL genotypes QQ (2), Qq (1) and qq (0) given these marker genotypes are denoted ω2|i, ω1|i and ω0|i.

Page 18: Functional Mapping A statistical model for mapping dynamic genes

Univariate interval mapping

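$$L(y) = \prod_{i=1}^{n} \left[ \omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right]$$

$$f_j(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(y_i - \mu_j)^2}{2\sigma^2} \right\}, \quad j = 2, 1, 0 \text{ for QQ, Qq, qq}$$

The Lander-Botstein model estimates $(\mu_2, \mu_1, \mu_0, \sigma^2, \text{QTL position})$.

Multivariate interval mapping

$$L(\mathbf{y}) = \prod_{i=1}^{n} \left[ \omega_{2|i} f_2(\mathbf{y}_i) + \omega_{1|i} f_1(\mathbf{y}_i) + \omega_{0|i} f_0(\mathbf{y}_i) \right]$$

Vector $\mathbf{y}_i = (y_i(1), y_i(2), \ldots, y_i(T))$

$$f_j(\mathbf{y}_i) = \frac{1}{(2\pi)^{T/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{y}_i - \mathbf{u}_j)^{T} \Sigma^{-1} (\mathbf{y}_i - \mathbf{u}_j) \right\}$$

Vectors $\mathbf{u}_j = (\mu_{j1}, \mu_{j2}, \ldots, \mu_{jT})$

Residual variance-covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1T} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{T1} & \sigma_{T2} & \cdots & \sigma_T^2 \end{pmatrix}$$

The unknown parameters: $(\mathbf{u}_2, \mathbf{u}_1, \mathbf{u}_0, \Sigma, \text{QTL position})$ [3T + T(T-1)/2 + T parameters]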

Page 19: Functional Mapping A statistical model for mapping dynamic genes

Functional mapping: the framework

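Observed phenotype: $\mathbf{y}_i = [y_i(1), \ldots, y_i(T)] \sim MVN(\mathbf{u}_j, \Sigma)$

Mean vector: $\mathbf{u}_j = [\mu_j(1), \mu_j(2), \ldots, \mu_j(T)]$, $j = 2, 1, 0$

(Co)variance matrix:

$$\Sigma = \begin{pmatrix} \sigma^2(1) & \sigma(1,2) & \cdots & \sigma(1,T) \\ \sigma(2,1) & \sigma^2(2) & \cdots & \sigma(2,T) \\ \vdots & \vdots & \ddots & \vdots \\ \sigma(T,1) & \sigma(T,2) & \cdots & \sigma^2(T) \end{pmatrix}$$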

Page 20: Functional Mapping A statistical model for mapping dynamic genes

An innovative model for genetic dissection of complex traits by incorporating mathematical aspects of biological principles into a mapping framework

Functional Mapping

Provides a tool for cutting-edge research at the interplay between gene action and development

Functional mapping does not estimate (u2, u1, u0, Σ) directly; instead, it estimates the biologically meaningful parameters that model them.

Page 21: Functional Mapping A statistical model for mapping dynamic genes

The Finite Mixture Model

$$L(\omega, \Theta_p, \Theta_q \mid M, y) = \prod_{i=1}^{n} \left[ \omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right]$$

Three statistical issues:

- Modeling mixture proportions, i.e., genotype frequencies at a putative QTL
- Modeling the mean vector
- Modeling the (co)variance matrix

Page 22: Functional Mapping A statistical model for mapping dynamic genes

Modeling the developmental Mean Vector

• Parametric approach
  - Growth trajectories: logistic curve
  - HIV dynamics: bi-exponential function
  - Biological clock: Van der Pol equation
  - Drug response: Emax model

• Nonparametric approach
  - Legendre function (orthogonal polynomial)
  - Spline techniques

Page 23: Functional Mapping A statistical model for mapping dynamic genes

Example: Stem diameter growth in poplar trees

Ma et al., Genetics 2002

Page 24: Functional Mapping A statistical model for mapping dynamic genes

Logistic Curve of Growth: A Universal Biological Law (West et al.: Nature 2001)

Modeling the genotype-dependent mean vector,

$$\mathbf{u}_j = [u_j(1), u_j(2), \ldots, u_j(T)] = \left[ \frac{a_j}{1 + b_j e^{-r_j}}, \frac{a_j}{1 + b_j e^{-2 r_j}}, \ldots, \frac{a_j}{1 + b_j e^{-T r_j}} \right]$$

Instead of estimating $\mathbf{u}_j$, we estimate the curve parameters $\Theta_p = (a_j, b_j, r_j)$.

Number of parameters to be estimated in the mean vector

Time points   Traditional approach   Our approach
5             3 × 5 = 15             3 × 3 = 9
10            3 × 10 = 30            3 × 3 = 9
50            3 × 50 = 150           3 × 3 = 9
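As a small illustration of the logistic mean vector and of the parameter saving, the Python sketch below evaluates the curve at t = 1, …, T and counts mean parameters under both approaches for three genotypes. Function names and the example parameter values are assumptions made for illustration only.

```python
import numpy as np

def logistic_mean(a, b, r, T):
    """Mean vector a / (1 + b * exp(-r * t)) evaluated at t = 1, ..., T."""
    t = np.arange(1, T + 1)
    return a / (1.0 + b * np.exp(-r * t))

def mean_param_counts(T, n_genotypes=3, n_curve_params=3):
    """Return (traditional, functional) numbers of mean parameters."""
    traditional = n_genotypes * T              # one mean per genotype per time point
    functional = n_genotypes * n_curve_params  # (a, b, r) per genotype
    return traditional, functional

# Hypothetical curve parameters, for illustration only.
u = logistic_mean(a=30.0, b=8.0, r=0.6, T=10)
counts = mean_param_counts(50)
```

The functional count stays at 9 however many time points are measured, which is the point of the table above.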

Page 25: Functional Mapping A statistical model for mapping dynamic genes

Modeling the Covariance Matrix

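Stationary parametric approach
- Autoregressive (AR) model with log transformation

Nonstationary parametric approach
- Structured antedependence (SAD) model
- Ornstein-Uhlenbeck (OU) process

$$\Sigma = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\ \rho & 1 & \rho & \cdots & \rho^{T-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1 \end{pmatrix}$$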
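The AR(1) structure needs only two parameters, whatever the number of time points. A minimal sketch of how such a matrix can be built, assuming equally spaced time points; the function name is an illustrative choice.

```python
import numpy as np

def ar1_cov(rho, sigma2, T):
    """AR(1) covariance matrix with entries sigma2 * rho ** |s - t|."""
    idx = np.arange(T)
    lag = np.abs(idx[:, None] - idx[None, :])  # matrix of time lags |s - t|
    return sigma2 * rho ** lag

cov = ar1_cov(rho=0.5, sigma2=2.0, T=3)
```

The diagonal holds the common variance and the correlation decays geometrically with the lag between two time points.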

Page 26: Functional Mapping A statistical model for mapping dynamic genes

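Functional interval mapping

$$L(\mathbf{y}) = \prod_{i=1}^{n} \left[ \omega_{2|i} f_2(\mathbf{y}_i) + \omega_{1|i} f_1(\mathbf{y}_i) + \omega_{0|i} f_0(\mathbf{y}_i) \right]$$

Vector $\mathbf{y}_i = (y_i(1), y_i(2), \ldots, y_i(T))$

$$f_2(\mathbf{y}_i) = \frac{1}{(2\pi)^{T/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{y}_i - \mathbf{u}_2)^{T} \Sigma^{-1} (\mathbf{y}_i - \mathbf{u}_2) \right\}$$

$$f_1(\mathbf{y}_i) = \frac{1}{(2\pi)^{T/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{y}_i - \mathbf{u}_1)^{T} \Sigma^{-1} (\mathbf{y}_i - \mathbf{u}_1) \right\}$$

$$f_0(\mathbf{y}_i) = \frac{1}{(2\pi)^{T/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{y}_i - \mathbf{u}_0)^{T} \Sigma^{-1} (\mathbf{y}_i - \mathbf{u}_0) \right\}$$

$$\mathbf{u}_2 = \left( \frac{a_2}{1 + b_2 e^{-r_2}}, \frac{a_2}{1 + b_2 e^{-2 r_2}}, \ldots, \frac{a_2}{1 + b_2 e^{-T r_2}} \right)$$

$$\mathbf{u}_1 = \left( \frac{a_1}{1 + b_1 e^{-r_1}}, \frac{a_1}{1 + b_1 e^{-2 r_1}}, \ldots, \frac{a_1}{1 + b_1 e^{-T r_1}} \right)$$

$$\mathbf{u}_0 = \left( \frac{a_0}{1 + b_0 e^{-r_0}}, \frac{a_0}{1 + b_0 e^{-2 r_0}}, \ldots, \frac{a_0}{1 + b_0 e^{-T r_0}} \right)$$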
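The functional interval mapping likelihood can be evaluated as in the sketch below, which combines logistic mean curves with a multivariate normal density. The AR(1) covariance is used here as one possible choice for Σ, and the untransformed phenotype is modeled; the function names, argument layout and the QQ, Qq, qq column ordering are assumptions for illustration, not the FunMap implementation.

```python
import numpy as np

def logistic_mean(a, b, r, T):
    """Mean vector a / (1 + b * exp(-r * t)) at t = 1, ..., T."""
    t = np.arange(1, T + 1)
    return a / (1.0 + b * np.exp(-r * t))

def ar1_cov(rho, sigma2, T):
    """AR(1) covariance matrix with entries sigma2 * rho ** |s - t|."""
    idx = np.arange(T)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

def mvn_logpdf(Y, mean, cov):
    """Row-wise log density of N(mean, cov) for an (n, T) array Y."""
    Y = np.asarray(Y, dtype=float)
    T = Y.shape[1]
    resid = Y - mean
    _, logdet = np.linalg.slogdet(cov)
    solved = np.linalg.solve(cov, resid.T).T      # rows are Sigma^{-1} (y_i - u_j)
    quad = np.einsum("ij,ij->i", resid, solved)   # quadratic form per individual
    return -0.5 * (T * np.log(2.0 * np.pi) + logdet + quad)

def funmap_loglik(Y, omega, curve_params, rho, sigma2):
    """Log-likelihood of the mixture with logistic means and AR(1) covariance.

    Y            : (n, T) phenotypes over T time points
    omega        : (n, 3) conditional QTL genotype probabilities (QQ, Qq, qq)
    curve_params : three (a, b, r) triples in the same genotype order
    """
    Y = np.asarray(Y, dtype=float)
    omega = np.asarray(omega, dtype=float)
    T = Y.shape[1]
    cov = ar1_cov(rho, sigma2, T)
    logf = np.column_stack(
        [mvn_logpdf(Y, logistic_mean(a, b, r, T), cov) for a, b, r in curve_params]
    )
    # Shift by the row maximum before exponentiating to limit underflow.
    shift = logf.max(axis=1, keepdims=True)
    mix = np.sum(omega * np.exp(logf - shift), axis=1)
    return float(np.sum(shift[:, 0] + np.log(mix)))
```

A full mapping analysis would maximize this function over the curve and covariance parameters at each putative QTL position; the sketch only evaluates it for given values.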

Page 27: Functional Mapping A statistical model for mapping dynamic genes

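Estimation

$$\log L(\Theta_p, \Theta_q \mid y, M) = \sum_{i=1}^{n} \log \left[ \sum_{j=0}^{2} \omega_{j|i} f_j(y_i) \right]$$

For any parameter $\Theta$ contained in $(\Theta_p, \Theta_q)$:

$$\frac{\partial}{\partial \Theta} \log L(\Theta_p, \Theta_q \mid y, M) = \sum_{i=1}^{n} \sum_{j=0}^{2} \frac{\omega_{j|i} \frac{\partial}{\partial \Theta} f_j(y_i)}{\sum_{j'=0}^{2} \omega_{j'|i} f_{j'}(y_i)}$$

$$= \sum_{i=1}^{n} \sum_{j=0}^{2} \frac{\omega_{j|i} f_j(y_i)}{\sum_{j'=0}^{2} \omega_{j'|i} f_{j'}(y_i)} \cdot \frac{1}{f_j(y_i)} \frac{\partial}{\partial \Theta} f_j(y_i)$$

$$= \sum_{i=1}^{n} \sum_{j=0}^{2} \Pi_{j|i} \frac{\partial}{\partial \Theta} \log f_j(y_i)$$

where

$$\Pi_{j|i} = \frac{\omega_{j|i} f_j(y_i)}{\sum_{j'=0}^{2} \omega_{j'|i} f_{j'}(y_i)}$$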

Page 28: Functional Mapping A statistical model for mapping dynamic genes

The EM algorithm

E step: Calculate the posterior probability of QTL genotype j for individual i that carries a known marker genotype

$$\Pi_{j|i} = \frac{\omega_{j|i} f_j(y_i)}{\sum_{j'=0}^{2} \omega_{j'|i} f_{j'}(y_i)}$$

M step: Solve the log-likelihood equations

$$\frac{\partial}{\partial \Theta} \log L(\Theta_p, \Theta_q \mid y) = 0$$

Iterations are made between the E and M steps until convergence.
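To make the E and M steps concrete, the following sketch runs EM for the simpler univariate mixture (interval mapping of a single trait), where the M step has a closed form. In functional mapping the M step has to be solved for the curve and covariance parameters instead (see Ma et al. 2002); this code is an illustrative stand-in, not that derivation, and all names are assumptions. It also assumes every genotype has nonzero total posterior weight.

```python
import numpy as np

def normal_pdf(y, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at y."""
    return np.exp(-0.5 * (y - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

def em_univariate(y, omega, mu_init, sigma2_init, n_iter=200, tol=1e-8):
    """EM for a normal mixture with known mixing proportions omega (n, J)."""
    y = np.asarray(y, dtype=float)
    omega = np.asarray(omega, dtype=float)
    mu = np.asarray(mu_init, dtype=float).copy()
    sigma2 = float(sigma2_init)
    prev = -np.inf
    for _ in range(n_iter):
        # E step: posterior probability of each QTL genotype for each individual.
        weighted = omega * normal_pdf(y[:, None], mu[None, :], sigma2)
        total = weighted.sum(axis=1, keepdims=True)
        post = weighted / total
        loglik = float(np.log(total).sum())  # log-likelihood at the current parameters
        # M step: posterior-weighted means and pooled residual variance.
        mu = (post * y[:, None]).sum(axis=0) / post.sum(axis=0)
        sigma2 = float((post * (y[:, None] - mu[None, :]) ** 2).sum() / len(y))
        if loglik - prev < tol:
            break
        prev = loglik
    return mu, sigma2, post, loglik
```

The returned `post` array holds the posterior probabilities from the last E step, one row per individual.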

Page 29: Functional Mapping A statistical model for mapping dynamic genes

EM continued

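The likelihood function:

$$L(\mathbf{z}) = \prod_{i=1}^{n} \sum_{j} \omega_{j|i} f_j(\mathbf{z}_i)$$

$$f_j(\mathbf{z}_i) = \frac{1}{(2\pi)^{T/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{z}_i - \log \mathbf{u}_j)' \Sigma^{-1} (\mathbf{z}_i - \log \mathbf{u}_j) \right\}$$

where $\mathbf{z}_i$ is the log-transformed phenotype vector of individual $i$ and

$$\mathbf{u}_j = \left( \frac{a_j}{1 + b_j e^{-r_j}}, \frac{a_j}{1 + b_j e^{-2 r_j}}, \ldots, \frac{a_j}{1 + b_j e^{-T r_j}} \right)$$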

Page 30: Functional Mapping A statistical model for mapping dynamic genes

Statistical Derivations

M-step: update the parameters (see Ma et al. 2002, Genetics for details)

Page 31: Functional Mapping A statistical model for mapping dynamic genes

Testing QTL effect: Global test

• Instead of testing the mean difference at every time point for different genotypes, we test the difference of the curve parameters.

• The existence of a QTL is tested by

$$H_0: (a_2, b_2, r_2) = (a_1, b_1, r_1) = (a_0, b_0, r_0) \quad \text{vs.} \quad H_1: \text{at least one of the equalities does not hold}$$

• H0 means the three mean curves overlap and there is no QTL effect.

• Likelihood ratio test with permutation to assess significance:

$$LR = -2 \left[ \log L(\tilde{\Theta}) - \log L(\hat{\Theta}) \right]$$

where the notation "~" and "^" indicate parameters estimated under the null and the alternative hypothesis, respectively.
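A permutation threshold for the LR statistic could be obtained along the following lines. The sketch assumes a user-supplied function `max_lr_fn(Y, markers)` that refits the model and returns the largest LR over all scanned positions; that function, its signature and the default settings are hypothetical and not part of the original slides.

```python
import numpy as np

def lr_statistic(loglik_null, loglik_alt):
    """LR = -2 [log L(null) - log L(alternative)]."""
    return -2.0 * (loglik_null - loglik_alt)

def permutation_threshold(Y, markers, max_lr_fn, n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum LR under permutation.

    Phenotype rows are shuffled relative to marker rows, which breaks any
    marker-trait association while keeping each trajectory intact.
    """
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y)
    stats = np.empty(n_perm)
    for k in range(n_perm):
        perm = rng.permutation(len(Y))
        stats[k] = max_lr_fn(Y[perm], markers)  # hypothetical user-supplied scan
    return float(np.quantile(stats, 1.0 - alpha))
```

An observed LR above the returned threshold would be declared significant at level `alpha`. Because whole rows of `Y` are permuted, the within-individual time structure is preserved.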

Page 32: Functional Mapping A statistical model for mapping dynamic genes

Testing QTL effect: Regional test

• Regional test: to test in which time period $[t_1, t_2]$ the detected QTL triggers an effect, we can test the difference of the area under the curve (AUC) for different QTL genotypes, i.e.,

$$H_0: AUC_{AA}\big|_{t_1}^{t_2} = AUC_{Aa}\big|_{t_1}^{t_2} = AUC_{aa}\big|_{t_1}^{t_2}$$

where

$$AUC_j\big|_{t_1}^{t_2} = \int_{t_1}^{t_2} \frac{a_j}{1 + b_j e^{-r_j t}}\, dt = \frac{a_j}{r_j} \left[ \log\left(b_j + e^{r_j t_2}\right) - \log\left(b_j + e^{r_j t_1}\right) \right], \quad j = 1, 2, 3$$

• Permutation tests can be applied to assess statistical significance.
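The closed-form AUC of the logistic curve can be checked against numerical integration, as in this short sketch. The parameter values and function names are illustrative assumptions.

```python
import numpy as np

def logistic_auc(a, b, r, t1, t2):
    """Closed-form area under a / (1 + b * exp(-r * t)) between t1 and t2."""
    return (a / r) * (np.log(b + np.exp(r * t2)) - np.log(b + np.exp(r * t1)))

def logistic_auc_numeric(a, b, r, t1, t2, n_grid=10001):
    """Trapezoidal approximation of the same area, used as a cross-check."""
    t = np.linspace(t1, t2, n_grid)
    g = a / (1.0 + b * np.exp(-r * t))
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(t)))

# Hypothetical curve parameters and time window.
closed = logistic_auc(10.0, 5.0, 0.5, 1.0, 8.0)
numeric = logistic_auc_numeric(10.0, 5.0, 0.5, 1.0, 8.0)
```

The two values agree to within the discretization error of the trapezoidal rule, and the closed form reduces to a(t2 - t1) when b = 0.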

Page 33: Functional Mapping A statistical model for mapping dynamic genes

Applications

• Several real examples are used to show the utility of the functional mapping approach.

• Application I is about a poplar growth data set.

• Application II is about a mouse growth data set.

• Application III is about a rice tiller number growth data set.

Page 34: Functional Mapping A statistical model for mapping dynamic genes

Application I: A Genetic Study in Poplars

Genetic design

Parents: AA × aa
F1: Aa × AA
BC: AA (½), Aa (½)

Page 35: Functional Mapping A statistical model for mapping dynamic genes

Stem diameter growth in poplar trees

Ma, Casella & Wu, Genetics 2002

$$g(t) = \frac{a}{1 + b e^{-rt}}$$

a: Asymptotic growth

b: Initial growth

r: Relative growth rate

Page 36: Functional Mapping A statistical model for mapping dynamic genes

Differences in growth across ages

[Figure: poplar data; panels: untransformed and log-transformed]

Page 37: Functional Mapping A statistical model for mapping dynamic genes

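Modeling the covariance structure

Stationary parametric approach

First-order autoregressive model (AR(1))

$$\Sigma = \sigma^2 \begin{pmatrix} 1 & \rho & \cdots & \rho^{T-1} \\ \rho & 1 & \cdots & \rho^{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \cdots & 1 \end{pmatrix}, \qquad \Theta_q = (\rho, \sigma^2)$$

Multivariate Box-Cox transformation to stabilize variance (Box and Cox, 1964)

Transform-both-sides (TBS) technique to preserve the interpretability of growth parameters (Carroll and Ruppert, 1984; Wu et al., 2004). For a log transformation (i.e., $\lambda = 0$), both the phenotype and the growth curve are log-transformed:

$$\log y_i(t) = \log g_j(t) + e_i(t)$$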

Page 38: Functional Mapping A statistical model for mapping dynamic genes

Functional mapping incorporating logistic curves and the AR(1) model

[Figure: QTL detection profiles; panels: results by FunMap and results by interval mapping, with the detected QTL marked]

FunMap has higher power to detect the QTL than the traditional interval mapping method does.

Ma, Casella & Wu, Genetics 2002

Page 39: Functional Mapping A statistical model for mapping dynamic genes

Application II: Mouse Genetic Study

Detecting Growth Genes

Data supplied by Dr. Cheverud at Washington University

Page 40: Functional Mapping A statistical model for mapping dynamic genes

Mouse Linkage Map

Page 41: Functional Mapping A statistical model for mapping dynamic genes

Body Mass Growth for Mouse

[Figure: body mass growth curves; x-axis: week (1 to 10), y-axis: weight (0 to 50)]

510 individuals measured over 10 weeks

Parents: AA × aa
F1: Aa × Aa
F2: AA (¼), Aa (½), aa (¼)

Page 42: Functional Mapping A statistical model for mapping dynamic genes

Functional mapping

Genetic control of body mass growth in mice

Zhao, Ma, Cheverud & Wu, Physiological Genomics 2004

Page 43: Functional Mapping A statistical model for mapping dynamic genes

Application III: functional mapping of PCD QTL

• Rice tiller development is thought to be controlled by genetic factors as well as environments.

• The development of tiller number growth undergoes a process called programmed cell death (PCD).

Page 44: Functional Mapping A statistical model for mapping dynamic genes

Genetic design

Parents: AA × aa
F1: Aa
DH: AA (½), aa (½)

Page 45: Functional Mapping A statistical model for mapping dynamic genes

Joint model for the mean vector

• We developed a joint modeling approach in which the growth and death phases are modeled by different functions.

• The growth phase is modeled by a logistic growth curve to fit the universal growth law.

• The death phase is modeled by orthogonal Legendre functions to increase the fitting flexibility.

Page 46: Functional Mapping A statistical model for mapping dynamic genes
Page 47: Functional Mapping A statistical model for mapping dynamic genes

Cui et al. (2006) Physiological Genomics

Page 48: Functional Mapping A statistical model for mapping dynamic genes

QTL trajectory plot

Page 49: Functional Mapping A statistical model for mapping dynamic genes

Advantages of Functional Mapping

• Incorporate biological principles of growth and development into genetic mapping, thus, increasing biological relevance of QTL detection

• Provide a quantitative framework for hypothesis tests at the interplay between gene action and developmental pattern

- When does a QTL turn on?

- When does a QTL turn off?

- What is the duration of genetic expression of a QTL?

- How does a growth QTL pleiotropically affect developmental events?

• The mean-covariance structures are modeled by parsimonious parameters, increasing the precision, robustness and stability of parameter estimation

Page 50: Functional Mapping A statistical model for mapping dynamic genes

Functional Mapping: toward high-dimensional biology

• A new conceptual model for genetic mapping of complex traits

• A systems approach for studying sophisticated biological problems

• A framework for testing biological hypotheses at the interplay among genetics, development, physiology and biomedicine

Page 51: Functional Mapping A statistical model for mapping dynamic genes

Functional Mapping: Simplicity from complexity

• Estimating fewer biologically meaningful parameters that model the mean vector,

• Modeling the structure of the variance matrix by developing powerful statistical methods, leading to few parameters to be estimated,

• The reduction of dimension increases the power and precision of parameter estimation