A Bayesian Statistical Approach to Modeling Gene Regulatory Pathways in Human Placental Data Elinor Velasquez Dept. of Biology San Francisco State University

Post on 12-Jan-2016


A Bayesian Statistical Approach to Modeling Gene Regulatory Pathways in Human Placental Data

Elinor Velasquez
Dept. of Biology
San Francisco State University

Outline of talk

• Introduction
• The experimental approach: Obtaining placenta data
• The experimental approach: Modeling gene regulatory networks
• Results from experiments
• Conclusions and future work
• Acknowledgements

Introduction

Overall goal

http://www.biotechnologycenter.org/hio/assets/hisimages/placenta/placenta44.jpg

To use a bioinformatics model to better understand the human placenta

The human placenta

http://www.uchsc.edu/winnlab/index.html

The basal plate in the placenta

Site of known anatomical abnormalities in preeclampsia

http://www.uchsc.edu/winnlab/projects.html

EGFR pathway

• EGFR, cell surface receptor for epidermal growth factors

• Potentially important gene for the placenta

British Journal of Cancer (2006) 94, 184 – 188

EGFR regulates gene expression

EGFR

ANGPT2 CSPG2 DCN

Causal relationships

EGFR

ANGPT2 CSPG2 DCN

Example of a gene regulatory network

[Diagram: example network with six nodes, Gene 1 through Gene 6]

Definition of a Bayesian network

• There exist nodes (disks)

• There are edges (arrows) between some of the nodes

• Causality is implied by the edges

• Acyclic

[Diagram: example network with six nodes, Gene 1 through Gene 6]
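The definition above (nodes, directed edges, no cycles) can be sketched in a few lines of Python. The edges in `example_network` are hypothetical, since the slide diagram itself is not recoverable from the transcript, and `is_acyclic` is an illustrative helper name rather than anything taken from the talk.

```python
from collections import deque

def is_acyclic(children_of):
    """Return True if the directed graph has no directed cycle (Kahn's algorithm)."""
    nodes = set(children_of) | {c for cs in children_of.values() for c in cs}
    indegree = {n: 0 for n in nodes}
    for cs in children_of.values():
        for c in cs:
            indegree[c] += 1
    # Repeatedly remove nodes that have no remaining incoming edges.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for c in children_of.get(n, ()):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # Every node can be removed exactly when the graph has no cycle.
    return removed == len(nodes)

# Hypothetical edges: each key is a parent, each value lists its children.
example_network = {
    "Gene 1": ["Gene 2", "Gene 3"],
    "Gene 2": ["Gene 4"],
    "Gene 3": ["Gene 5"],
    "Gene 5": ["Gene 6"],
}

# Adding an edge from Gene 6 back to Gene 1 would close a cycle.
cyclic_network = dict(example_network, **{"Gene 6": ["Gene 1"]})
```

Here `is_acyclic(example_network)` holds, while `is_acyclic(cyclic_network)` does not, so only the first graph qualifies as a Bayesian network structure.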

The experimental approach: Obtaining placenta data

Data collected from microarrays

• Data come from 36 experiments conducted by Virginia Winn et al. at the SJ Fisher lab, UCSF
• Gene expression profiling experiments

[Figure labels: 45,000 dots (25-mer oligo probe sets) representing the human genome; cRNA; hybridization]

Traditional “spotted” arrays

What is a probe set?

• Several oligonucleotides designed to hybridize to various parts of the mRNA generated from a single gene

Probe set

mRNA

gene

Affymetrix GeneChips

Microarray data

Virginia Winn et al. centered the normalized log2 intensity values on the median value of each probe set.

Each probe set has 36 data points, spread over 5 time segments:

Time segment   1            2            3            4            5
Values         x1 ... x6    y1 ... y9    z1 ... z6    w1 ... w6    s1 ... s9
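As a rough illustration of the data layout, the sketch below centers one probe set on its median and splits the 36 values into the five time segments (6, 9, 6, 6, and 9 values). The intensities are made-up numbers and the function names are hypothetical; this is not the preprocessing code used by Winn et al.

```python
from statistics import median

# Segment sizes taken from the slide: x1..x6, y1..y9, z1..z6, w1..w6, s1..s9.
SEGMENT_SIZES = (6, 9, 6, 6, 9)

def center_on_median(values):
    """Subtract the probe set's median from every value."""
    m = median(values)
    return [v - m for v in values]

def split_into_segments(values, sizes=SEGMENT_SIZES):
    """Cut the flat list of values into consecutive time segments."""
    if len(values) != sum(sizes):
        raise ValueError("expected %d values, got %d" % (sum(sizes), len(values)))
    segments, start = [], 0
    for size in sizes:
        segments.append(values[start:start + size])
        start += size
    return segments

# Hypothetical log2 intensities for one probe set (36 values).
probe_set = [8.0 + 0.25 * (i % 5) for i in range(36)]
centered = center_on_median(probe_set)
segments = split_into_segments(centered)
```

After centering, the median of `centered` is zero, and `segments[3]` holds the six values for the fourth time segment.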

Microarray data

• Red denotes up-regulated expression and green denotes down-regulated expression relative to the median value
• Genes differentially expressed in the basal plate of placentas: rows contain data from a single basal plate cRNA sample and columns correspond to a single probe set.

http://www.uchsc.edu/winnlab/index.html

Summary of data used in bioinformatics experiments

• 36 placentas
• 45,000 probe sets
• Time-series data from 14-16 weeks to term

[Plot: gene egfr]

The experimental approach: Modeling gene regulatory networks

Outline of bioinformatics experimental design

Step 1. Create a naïve Bayesian network using the probe set data
Step 2. Score the naïve Bayesian network
Step 3. Randomly add/delete an edge and rescore the Bayesian network
Step 4. Continue until best score reached
Step 5. Combine probe sets to create the gene regulatory network

[Diagram: network with nodes PS1, PS2, PS3, and PS4]

Four probe sets (Three genes)

Define naïve Bayesian network

• Choose a root node
• All other nodes branch off of the root node
• PS1 is the parent node

[Diagram: PS1 as the root node, with PS2, PS3, and PS4 branching off of it]
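A naïve network of this kind is easy to write down as a mapping from each node to its list of parents. The sketch below assumes that representation; `build_naive_network` is a hypothetical helper name, not a function from any of the packages used in the talk.

```python
def build_naive_network(root, nodes):
    """Map every node to its parents: the root has none, all others have the root."""
    if root not in nodes:
        raise ValueError("root must be one of the nodes")
    return {node: ([] if node == root else [root]) for node in nodes}

probe_sets = ["PS1", "PS2", "PS3", "PS4"]
naive = build_naive_network("PS1", probe_sets)

# The same structure written as a set of (parent, child) edges.
naive_edges = {(parent, child) for child, parents in naive.items() for parent in parents}
```

With PS1 as the root, `naive_edges` contains exactly the three edges PS1 to PS2, PS1 to PS3, and PS1 to PS4.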

Step 1: Create a naïve Bayesian network using probe set data

• Use data from one time segment
• Choose Weeks 23-24 data (6 placentas)
• Choose 4 probe sets

PS1

PS2 PS3 PS4

Placenta data for Weeks 23-24

Probe set   ID       Gene
PS1         201984   EGFR
PS2         236034   ANGPT2
PS3         211148   ANGPT2
PS4         204620   CSPG2

PS2 and PS3 both correspond to ANGPT2.

Step 2: Score the naïve Bayesian network

• We want to score this network:

PS1

PS2 PS4 PS3

The network score is a function of conditional probabilities

• The conditional probability, Prob(N | Pa(N)), is the probability of the child node N given its parent Pa(N)

• Example: Given that the parent node PS1 has an expression value of 10, what is the probability that its child node, PS4, has an expression value of 8?

PS1

PS4

Conditional probability

• EGFR (PS1) is the parent node and has value 10
• CSPG2 (PS4) is the child node and has value 8 two times out of six
• Conditional probability = 2/6

PS1

PS4
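The 2/6 above can be reproduced by simple counting. The sketch below uses six made-up samples in which PS1 is always 10 and PS4 equals 8 twice; the sample values and the function name are assumptions for illustration, not the placenta data.

```python
from fractions import Fraction

def conditional_probability(samples, parent, parent_value, child, child_value):
    """Estimate P(child = child_value | parent = parent_value) by counting samples."""
    with_parent = [s for s in samples if s[parent] == parent_value]
    if not with_parent:
        raise ValueError("parent value never observed")
    hits = sum(1 for s in with_parent if s[child] == child_value)
    return Fraction(hits, len(with_parent))

# Hypothetical discretized expression values for six placentas.
samples = [
    {"PS1": 10, "PS4": 8},
    {"PS1": 10, "PS4": 8},
    {"PS1": 10, "PS4": 7},
    {"PS1": 10, "PS4": 9},
    {"PS1": 10, "PS4": 6},
    {"PS1": 10, "PS4": 5},
]

p = conditional_probability(samples, "PS1", 10, "PS4", 8)
```

`Fraction` keeps the result exact, so `p` equals 2/6 (stored in lowest terms as 1/3).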

Score for a Bayesian network

The score of the naive network equals the product of all the nonzero conditional probabilities associated with the network:

$$P(N_1, N_2, N_3, N_4) = \prod_{i=1}^{4} P(N_i \mid pa(N_i))$$
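One possible reading of this score, sketched below, estimates every conditional probability by counting and multiplies the nonzero estimates together. The exact counting convention used in the talk is not stated, so the interpretation, the toy data, and the function names here are all assumptions.

```python
from collections import Counter
from fractions import Fraction

def conditional_probabilities(data, child, parents):
    """Estimate P(child value | parent values) for every observed combination."""
    joint, marginal = Counter(), Counter()
    for row in data:
        parent_values = tuple(row[p] for p in parents)
        joint[(parent_values, row[child])] += 1
        marginal[parent_values] += 1
    return {key: Fraction(n, marginal[key[0]]) for key, n in joint.items()}

def network_score(data, parents_of):
    """Multiply all nonzero conditional probabilities over all nodes."""
    score = Fraction(1)
    for child, parents in parents_of.items():
        for p in conditional_probabilities(data, child, parents).values():
            if p != 0:  # counting only yields observed combinations, so this always holds
                score *= p
    return score

# Hypothetical discretized data for a two-node network PS1 -> PS4.
data = [
    {"PS1": 10, "PS4": 8},
    {"PS1": 10, "PS4": 8},
    {"PS1": 10, "PS4": 7},
    {"PS1": 10, "PS4": 7},
    {"PS1": 10, "PS4": 7},
    {"PS1": 10, "PS4": 9},
]
toy_score = network_score(data, {"PS1": [], "PS4": ["PS1"]})
```

For this toy data the factors are 1 (for PS1), 1/3, 1/2, and 1/6, so `toy_score` is 1/36.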

Score for the naïve Bayesian network

$$P(N_1, N_2, N_3, N_4) = \frac{1}{39366} \approx 2.54 \times 10^{-5}$$

PS1

PS2 PS3 PS4

Step 3: Randomly add/delete an edge and rescore the Bayesian network

The score becomes $1/78732 \approx 1.27 \times 10^{-5}$.

PS1

PS2

PS4

PS3

Step 4. Continue until best score reached

• Since the score is a probability, we want the score to be high.

• The naïve network is the better choice between the two networks, so we pick it as our final network.

PS1

PS2 PS3 PS4

Step 5. Combine probe sets to create the gene regulatory network

[Diagram: gene regulatory network with nodes EGFR, ANGPT2, and CSPG2]
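Combining probe sets amounts to relabeling each probe-set node with its gene and merging duplicates. The sketch below uses the probe-set-to-gene assignments from the Weeks 23-24 slide; the merging rule (union of edges, dropping edges between probe sets of the same gene) is an assumption, since the talk does not spell it out.

```python
# Assignments from the talk: PS2 and PS3 both measure ANGPT2.
PROBE_SET_TO_GENE = {
    "PS1": "EGFR",
    "PS2": "ANGPT2",
    "PS3": "ANGPT2",
    "PS4": "CSPG2",
}

def collapse_to_genes(probe_edges, probe_to_gene):
    """Turn (parent, child) probe-set edges into a set of gene-level edges."""
    gene_edges = set()
    for parent, child in probe_edges:
        gene_parent, gene_child = probe_to_gene[parent], probe_to_gene[child]
        if gene_parent != gene_child:  # assumed rule: ignore edges within one gene
            gene_edges.add((gene_parent, gene_child))
    return gene_edges

# Final probe-set network from Step 4 (the naive network).
final_probe_edges = {("PS1", "PS2"), ("PS1", "PS3"), ("PS1", "PS4")}
gene_edges = collapse_to_genes(final_probe_edges, PROBE_SET_TO_GENE)
```

The two ANGPT2 probe sets merge into one node, leaving the edges EGFR to ANGPT2 and EGFR to CSPG2.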

40 probe sets (26 genes)

Gene regulatory pathway for 26 genes

Step 1. Create a naïve Bayesian network using 40 probe sets for each time segment
Step 2. Score the naïve Bayesian network
Step 3. Randomly add/delete an edge and rescore the Bayesian network
Step 4. Continue until best score reached
Step 5. Combine probe sets to create the gene regulatory network for the placenta

Step 1. Create a naïve Bayesian network using 40 probe sets for each time segment

Create a naïve Bayesian network

[Diagram: naïve Bayesian network; the nodes shown include PS1 through PS9]

Step 2. Score the naïve Bayesian network

Score for a Bayesian network

The score of the naive network equals the product of all the nonzero conditional probabilities associated with the network:

$$P(N_1, \ldots, N_{40}) = \prod_{i=1}^{40} P(N_i \mid pa(N_i))$$

Step 3. Randomly add/delete an edge and rescore the Bayesian network

Step 4. Continue until best score reached

With four probe sets, at least two Bayesian networks were constructed:

[Diagrams: two networks on PS1, PS2, PS3, and PS4: the naïve network and the network rescored in Step 3]

Exhaustive search

• To be certain that we have the best scoring network, we need to construct all possible networks from our naïve networks

• With four probe sets, we constructed only one network other than the naïve network

• How do we construct all possible networks?

How do we construct all possible networks?

• 1 probe set: 1 Bayesian network
• 2 probe sets: 2 possible Bayesian networks
• 3 probe sets: 12 possible Bayesian networks
• 4 probe sets: 144 possible Bayesian networks
• 5 probe sets: > 4800 possible Bayesian networks!
• 6 probe sets: … ??
• And so on…
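To get a feel for how fast the search space grows, the sketch below counts every labeled directed acyclic graph on n nodes using Robinson's recurrence. This is a standard counting result that is not taken from the talk, and it counts all DAGs, so its numbers differ from the counts on the slide, which may follow a different convention (for example, a restricted family of networks).

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )

dag_counts = [count_dags(n) for n in range(1, 7)]
digits_for_40 = len(str(count_dags(40)))
```

Under this convention the counts for 1 to 6 nodes are 1, 3, 25, 543, 29281, and 3781503, and `digits_for_40` gives the number of decimal digits in the count for 40 probe sets, which is far beyond what an exhaustive search could cover.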

Welcome to “Modern Heuristics”

• Step 1. Representation of a model
• Step 2. The scoring function
• Step 3. Defining the search problem
• Step 4. Consider local optima

[Figure: score plotted against local change]

Step 1: Representation of the model

• The model is a gene regulatory pathway.
• We are going to assume a Bayesian model for our probe sets: [Diagram: network with nodes PS1, PS2, PS3, and PS4]
• The number of possible pathways is too large to allow an exhaustive search for the best Bayesian network.

Step 2: The scoring function

• The fair coin, p(X = heads) = ½
• What happens if the coin is unfairly weighted?
• We need to re-think probability:

$$p(X) = \int p(x) \, r(x) \, dx$$

• r(x) is a weight function.

Step 2. The scoring function

• The scoring function is a probability

• Assume the network has a Dirichlet distribution, which serves as the weight function for the conditional probabilities.

www.wikipedia.com

Step 2. The scoring function

The probability of a fixed network equals the product of the conditional probabilities times the Dirichlet distribution:

$$P(N) = \prod_{i=1}^{40} P(N_i \mid pa(N_i)) \, D(N_i)$$

such that

$$D(N_i) = \prod \theta_i^{\alpha_i - 1}(N_i)$$
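The Dirichlet factor can be illustrated with a few lines of code. The sketch below evaluates only the unnormalized product of the θ values raised to α − 1, as written above; treating `theta` as a probability vector and `alpha` as the matching concentration parameters is an assumption, as are the example numbers.

```python
from math import prod

def dirichlet_weight(theta, alpha):
    """Unnormalized Dirichlet weight: product of theta_k ** (alpha_k - 1)."""
    if len(theta) != len(alpha):
        raise ValueError("theta and alpha must have the same length")
    return prod(t ** (a - 1) for t, a in zip(theta, alpha))

# With all alpha equal to 1 the weight is flat: every theta gets weight 1.
flat = dirichlet_weight([0.5, 0.5], [1, 1])

# Larger alpha values favor probability vectors near the center.
centered = dirichlet_weight([0.5, 0.5], [2, 2])
skewed = dirichlet_weight([0.9, 0.1], [2, 2])
```

With `alpha = [2, 2]`, the balanced vector gets a larger weight than the skewed one, which is how the prior reweights the conditional probabilities in the score.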

Step 3: Defining the search problem

What it means to search:
a. Construct a first network (use a naïve Bayesian network)
b. Score the first network using the scoring function
c. Perform the hill-climbing algorithm

Step 3. Defining the search problem

The hill-climbing algorithm:

• Randomly choose a node
• “Search” in the neighborhood of that node for the best scoring network
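A minimal sketch of this search is given below, assuming that the "neighborhood" of a node means all networks reachable by adding or deleting one edge that touches it. The toy scoring function simply rewards closeness to a made-up target network; it stands in for the real scoring function and is not the Dirichlet-weighted score from the talk.

```python
import random

def creates_cycle(edges, new_edge):
    """True if adding new_edge (parent, child) would close a directed cycle."""
    parent, child = new_edge
    stack, seen = [child], set()
    while stack:  # a cycle appears exactly when parent is reachable from child
        node = stack.pop()
        if node == parent:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(c for (p, c) in edges if p == node)
    return False

def neighbors_of(edges, node, nodes):
    """Yield networks that differ by one added or deleted edge touching node."""
    for other in nodes:
        if other == node:
            continue
        for edge in ((node, other), (other, node)):
            if edge in edges:
                yield edges - {edge}
            elif not creates_cycle(edges, edge):
                yield edges | {edge}

def hill_climb(nodes, start_edges, score, rng):
    """Visit nodes in random order; take the best improving move; stop when none helps."""
    current = frozenset(start_edges)
    improved = True
    while improved:
        improved = False
        order = list(nodes)
        rng.shuffle(order)
        for node in order:
            best = max(neighbors_of(current, node, nodes), key=score, default=current)
            if score(best) > score(current):
                current, improved = best, True
                break
    return current

# Toy setup: start from the naive network and climb toward a hypothetical target.
NODES = ["PS1", "PS2", "PS3", "PS4"]
NAIVE = {("PS1", "PS2"), ("PS1", "PS3"), ("PS1", "PS4")}
TARGET = frozenset({("PS1", "PS2"), ("PS2", "PS3"), ("PS1", "PS4")})

def toy_score(edges):
    return -len(edges ^ TARGET)  # 0 is the best possible score

result = hill_climb(NODES, NAIVE, toy_score, random.Random(0))
```

On this toy score every mismatch can be repaired by a single acyclic move, so the climb reaches `TARGET`. A real score can have local maxima where no single-edge change helps.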

Step 4. Consider local optima

• Hill climbing is a traditional search technique

• Can get caught on local maxima

• Step 4 is to keep choosing random nodes.

From http://content.answers.com/

[Figure: score plotted against local change; the randomly chosen node is the origin]

Software

• Weka software package, written by members of the University of Waikato, New Zealand, http://www.cs.waikato.ac.nz/~ml/people.html

• DEAL, R package, written by Susanne G. Bøttcher, Claus Dethlefsen, http://www.math.auc.dk/novo/deal

• BayesNet Toolbox, Matlab package, written by Kevin Murphy, http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html

• ExpressionNet, written by Jingchun Zhu, http://expressionnet.sourceforge.net/

Results from experiments

26 genes

Ingenuity network

[Figure: Ingenuity network. Node labels: IGFBP1, PLAU, MRC2, ATP5E, ERG, PECAM1, IL2RB, CECAM1, CYP19A1, USP6NL, EGFR, ADAM9, GLB1, CCNG2, RAP2B, P4HA1, BAMBI, INHBA, CSPG2, DCN, COL5A2, COL5A1, COL3A1, COL1A2, SPP1, ANGPT2, SFRP1]

Results for 26 genes

• 40 probe sets (26 genes)
• Data come from five different time intervals:
  1. 14 – 16 gestational weeks
  2. 18 – 19 gestational weeks
  3. 21 gestational weeks
  4. 23 – 24 gestational weeks
  5. 37 – 40 gestational weeks

Time segment: 14 – 16 weeks

[Figure: network for this time segment, with the same gene labels as the Ingenuity network]

Time segment: 18 – 19 weeks

[Figure: network for this time segment, with the same gene labels as the Ingenuity network]

Time segment: 21 weeks

[Figure: network for this time segment, with the same gene labels as the Ingenuity network]

Time segment: 23 – 24 weeks

[Figure: network for this time segment, with the same gene labels as the Ingenuity network]

Time segment: 37 – 40 weeks

[Figure: network for this time segment, with the same gene labels as the Ingenuity network]

How to display data

• One of the most pressing questions in bioinformatics research is how to display the data effectively

• We have two solutions:
  1. An interaction map
  2. Geometrical considerations

An interaction map for 26 genes

Geometrical considerations

• Will illustrate with the gene egfr
• egfr encodes the epidermal growth factor receptor (EGFR)
  • Functions on the cell surface
  • Activated by binding of its specific ligands
  • Responsible for many pathways in animal models

Gene egfr regulated by:

Genes on a dodecahedron: Gene regulatory network for egfr

Adapted from http://www.math.cornell.edu/~mec/2003-2004/geometry/platonic/dodecahedron.jpg

[Figure: genes shown on the dodecahedron: PLAU, CCNG2, COL1A2, CSPG2, INHBA, DCN. On backside: PECAM1, ANGPT2, IGFBP1, MRC2, SPP1, USP6NL]

Conclusions

• We can predict gene regulatory networks using Bayesian networks as an intermediate step

• When we leave the arrows in the network, we are able to show causal relationships between the genes

• Interaction maps and use of geometry are novel ways to display gene behavior

Future Directions

• A three-dimensional viewer with numerical values will be implemented to use with the Weka software

• Use molecular genetics techniques to validate a portion of the results

• Design a genetic programming algorithm (evolutionary algorithm) to create a Bayesian network

Acknowledgements

San Francisco State University: Leticia Márquez-Magaña, Chris Smith, Frank Bayliss, Juan Castellon, Ernesto Flores, Rebecca Garcia, Alba Gutierrez, Jainee Lewis, Rebecca Mendez, Cylyn Cruz, Jasmin Reyes, Jackie Robinson, Peter Thorsen

My family

UC San Francisco: Susan Fisher, Matthew Gormley

M.B.R.S.-R.I.S.E. Grant 5 - R25-GM59298
