Bio277 Lab 3: Finding Transcription Factor Binding Motifs

Adapted from a lab written by Prof. Terry Speed. Jess Mar, Department of Biostatistics, Quackenbush Lab, DFCI, [email protected]

Upload: yin

Post on 18-Feb-2016


Page 1: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

Bio277 Lab 3: Finding Transcription Factor Binding Motifs

Adapted from a Lab Written by Prof Terry Speed

Jess Mar

Department of Biostatistics

Quackenbush Lab [email protected]

Page 2: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

Outline

Analyze cell cycle gene expression data.

Cluster cell cycle data using hierarchical clustering.

Visualize cell cycle clusters.

Find motifs in these clusters and visualize them using sequence logos.

Page 3: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

The Cell Cycle

Page 4: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

Cell Cycle Data Set

• Experiments assayed mRNA expression patterns over the duration of at least one cell cycle.

• Custom cDNA microarray platform.

• RNA samples from Saccharomyces cerevisiae cell culture.

• 3 methods of synchronization: α-factor arrest, cdc15, elutriation.

Today's data: α-factor arrest (blocks cell division in G1).

~6000 genes x 17 time points.

Sampled at 7 min intervals over 120 min, starting at time zero.

See http://cellcycle-www.stanford.edu

Paper: Spellman et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 1998, p3273.

Page 5: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

Experimental Data

From the ~6000 yeast genes, we have chosen to focus on those involved in key biological processes (such as cell cycle, oxidative phosphorylation and nucleotide metabolism).

Read the data into R:

dat <- read.table("ccexpdata.txt", header=TRUE, sep="\t")
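Before clustering, it can help to confirm that the table was read as expected. The following is a minimal sketch, assuming the file is tab-delimited with gene names in the first column and one column per time point. The toy file, its gene identifiers and its values are hypothetical and only stand in for ccexpdata.txt.

```r
# Toy stand-in for ccexpdata.txt (hypothetical values, 3 genes x 4 time points).
toy.file <- tempfile(fileext = ".txt")
writeLines(c("t0\tt7\tt14\tt21",
             "YAL001C\t0.10\t-0.20\t0.35\tNA",
             "YAL002W\t-0.50\t0.15\t0.40\t0.05",
             "YAL003W\t0.22\t0.30\t-0.10\t-0.45"), toy.file)

# The header has one fewer field than the data rows, so read.table()
# uses the first column as row names.
toy.dat <- read.table(toy.file, header = TRUE, sep = "\t")

print(dim(toy.dat))                # genes x time points
print(head(row.names(toy.dat)))    # gene identifiers
n.missing <- sum(is.na(toy.dat))   # number of missing expression values
print(n.missing)
```

The same three checks (dim, row.names, is.na) can be run directly on dat once the real file has been loaded.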

Objective: find transcription factor binding sites implicated in the cell cycle.

• How do we search for these binding sites?

• Where do we begin to search?

Page 6: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

Linking Gene Expression and Promoters

One canonical representation of gene regulation.

Genes that are regulated by the same transcriptional program share similar expression patterns.

But co-expression does not always imply co-regulation.

We look to upstream promoter regions to see if we can elucidate common regular expression patterns.

Statistically over-represented patterns are potential transcription factor binding sites.
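One simple way to make "over-represented" concrete is a hypergeometric test: count how many promoters in a cluster contain a pattern and compare that with how often the pattern occurs in a background set. The sketch below is illustrative only. The sequences and the pattern are hypothetical toy values, only one strand is searched, and the motif finding tools listed later in this lab use more elaborate models.

```r
# Hypothetical toy promoter sequences; real ones would be retrieved from a sequence tool.
cluster.seqs <- c("TTACGCGTAA", "GGACGCGTCC", "ATATATATAT", "CCACGCGTTT")
background.seqs <- c(cluster.seqs,
                     "ATGCATGCAT", "TTTTGGGGCC", "CATCATCATC", "GGACGCGTAA",
                     "AAAACCCCGG", "TGTGTGTGTG", "CTAGCTAGCT", "GATTACAGAT")
motif <- "ACGCGT"   # example pattern, chosen for illustration

# Count sequences that contain the pattern at least once.
hits.cluster    <- sum(grepl(motif, cluster.seqs, fixed = TRUE))
hits.background <- sum(grepl(motif, background.seqs, fixed = TRUE))

# Hypergeometric upper tail: probability of drawing at least hits.cluster
# pattern-containing sequences in a sample the size of the cluster.
p.value <- phyper(hits.cluster - 1,
                  hits.background,
                  length(background.seqs) - hits.background,
                  length(cluster.seqs),
                  lower.tail = FALSE)
print(c(hits.cluster = hits.cluster, hits.background = hits.background))
print(p.value)
```

A small p-value suggests the pattern occurs in the cluster more often than chance would predict, which makes it a candidate binding site rather than a confirmed one.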

Page 7: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

Building Gene Expression Clusters

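# Pairwise Euclidean distances between gene expression profiles
distMat <- dist(dat, method="euclidean")
# Hierarchical clustering and dendrogram
clustObj <- hclust(distMat)
plot(clustObj)

How many clusters should we use?

# Cut the dendrogram into 15 clusters and tabulate the cluster sizes
cluster.labels <- cutree(clustObj, 15)
print(table(cluster.labels))

The cluster distribution looks like:

barplot(table(cluster.labels), xlab="Cluster", ylab="Number of genes")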
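The choice of 15 clusters is a judgment call. One way to explore it is to cut the same tree at several values of k and compare the resulting cluster sizes. This is a minimal sketch on simulated data; the toy matrix and the candidate values of k are assumptions, not part of the lab data.

```r
set.seed(1)
# Hypothetical toy matrix: 60 "genes" x 17 time points.
toy <- matrix(rnorm(60 * 17), nrow = 60,
              dimnames = list(paste0("gene", 1:60), NULL))

toy.clust <- hclust(dist(toy, method = "euclidean"))

# cutree() accepts a vector of k values and returns one column per k.
ks <- c(5, 10, 15)
labels.by.k <- cutree(toy.clust, k = ks)

# Smallest and largest cluster size for each choice of k.
size.summary <- sapply(seq_along(ks), function(j) {
  range(table(labels.by.k[, j]))
})
dimnames(size.summary) <- list(c("smallest", "largest"), paste0("k=", ks))
print(size.summary)
```

Many singleton clusters at a given k may indicate that the tree is being cut too finely for motif finding, since a cluster needs several promoters for a shared pattern to stand out.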

Page 8: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

Visualizing Clusters

Let's plot the first 8 clusters:

par(mfrow=c(2,4))   # 2 x 4 grid of panels
for( i in 1:8 ){
  titleLab <- paste("Cluster ", i, sep="")
  expr.prof <- as.matrix(dat[cluster.labels == i,])
  # Draw the first gene to set up the axes, then overlay every gene in the cluster
  plot(expr.prof[1,], ylim=range(expr.prof, na.rm=TRUE), type="l", xlab="Time", ylab="Expression", main=titleLab)
  apply(expr.prof, 1, lines)
}

Page 9: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

par(mfrow=c(2,4))
for( i in 9:15 ){
  titleLab <- paste("Cluster ", i, sep="")
  expr.prof <- as.matrix(dat[cluster.labels == i,])
  plot(expr.prof[1,], ylim=range(expr.prof, na.rm=TRUE), type="l", xlab="Time", ylab="Expression", main=titleLab)
  apply(expr.prof, 1, lines)
}
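The two plotting loops share the same body, so they could be wrapped in a small function. The sketch below is one possible variant, not the lab's own code: plotCluster is a hypothetical helper, it uses matplot() instead of plot() plus lines(), and it adds a mean profile that the original plots do not have. It runs on simulated data.

```r
# Hypothetical helper: draw every gene profile of cluster i in a single panel.
plotCluster <- function(expr, labels, i) {
  expr.prof <- as.matrix(expr[labels == i, , drop = FALSE])
  # matplot() draws one line per column, so genes are transposed into columns.
  matplot(t(expr.prof), type = "l", lty = 1, col = "grey40",
          xlab = "Time", ylab = "Expression",
          main = paste("Cluster", i))
  # Overlay the cluster mean profile in red.
  lines(colMeans(expr.prof, na.rm = TRUE), col = "red", lwd = 2)
  invisible(nrow(expr.prof))   # number of genes plotted
}

set.seed(2)
toy <- matrix(rnorm(40 * 17), nrow = 40)   # toy data: 40 genes x 17 time points
toy.labels <- cutree(hclust(dist(toy)), 4)

pdf(NULL)                                  # discard graphics output in this demo
par(mfrow = c(2, 2))
n.plotted <- sapply(1:4, function(i) plotCluster(toy, toy.labels, i))
invisible(dev.off())
print(n.plotted)
```

Removing the pdf(NULL) and dev.off() lines shows the panels on screen. With the lab data, a call such as plotCluster(dat, cluster.labels, 12) would draw a single cluster.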

Page 10: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

Exporting Expression Clusters

Write out the gene names in each cluster into a text file:

for( i in 1:15 ){
  cluster.genes <- row.names(dat)[cluster.labels == i]
  fileName <- paste("cluster", i, ".txt", sep="")
  write(cluster.genes, fileName)
}

Are they there?

dir()
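Beyond listing the directory, reading a file back confirms that it holds one gene name per line, which is a convenient form for pasting into a web tool. A minimal sketch with hypothetical toy labels, written to a temporary directory rather than the working directory:

```r
# Hypothetical toy labels standing in for cluster.labels (names are gene IDs).
toy.labels <- c(YAL001C = 1, YAL002W = 2, YAL003W = 1, YAL005C = 2)
out.dir <- tempdir()   # assumption: keep the demo files out of the working directory

for (i in sort(unique(toy.labels))) {
  genes <- names(toy.labels)[toy.labels == i]
  writeLines(genes, file.path(out.dir, paste0("cluster", i, ".txt")))
}

# Read one file back: one gene name per line.
cluster1 <- readLines(file.path(out.dir, "cluster1.txt"))
print(cluster1)
```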

Page 11: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

Retrieving Promoter Sequences

Let's focus on Cluster 12.

We can retrieve the promoter sequences for these genes using a tool called RSAT (Regulatory Sequence Analysis Tools):

http://rsat.scmbb.ulb.ac.be/rsat//RSAT_home.cgi

When working on yeast genomics, another great resource is: http://www.yeastgenome.org/

Page 12: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

TF Motif Finding Tools

MEME
http://meme.sdsc.edu/meme/meme.html

AlignACE
http://atlas.med.harvard.edu/cgi-bin/alignace.pl

BioProspector
http://ai.stanford.edu/~xsliu/BioProspector/

Page 13: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

Making Sequence Logos

WebLogo
http://weblogo.berkeley.edu/logo.cgi

SEQLOGO
http://ep.ebi.ac.uk/EP/SEQLOGO/
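In a sequence logo, the height of each column is the information content of that position, measured in bits. The sketch below computes it from a set of aligned sites as 2 minus the Shannon entropy of the base frequencies. The sites are hypothetical, and the small-sample correction that logo tools may apply is omitted.

```r
# Hypothetical aligned binding sites (all the same length).
sites <- c("ACGCGT", "ACGCGT", "ACGCGA", "TCGCGT")

# Character matrix: rows = sites, columns = positions.
site.mat <- do.call(rbind, strsplit(sites, ""))

# Position frequency matrix: rows = A, C, G, T; columns = positions.
bases <- c("A", "C", "G", "T")
pfm <- apply(site.mat, 2, function(col) table(factor(col, levels = bases)))
freq <- pfm / nrow(site.mat)

# Information content per position in bits: 2 + sum(p * log2(p)),
# treating 0 * log2(0) as 0.
info <- apply(freq, 2, function(p) 2 + sum(ifelse(p > 0, p * log2(p), 0)))
print(round(info, 3))
```

Fully conserved positions reach the maximum of 2 bits, while variable positions score lower. Within a column, each letter is drawn with height proportional to its frequency times the column's information content.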

Page 14: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

TRANSFAC Database

http://www.gene-regulation.com/pub/databases.html#transfac

Database on eukaryotic transcription factors and their cis-acting regulatory binding sites.

SITE: gives information on (regulatory) transcription factor binding sites within eukaryotic genes.

GENE: describes the gene to which a site (or group of sites) belongs.

FACTOR: describes the proteins binding to these sites.

CELL: gives brief information about the cellular source of proteins that have been shown to interact with the sites.

CLASS: contains some background information about the transcription factor classes.

MATRIX: gives nucleotide distribution matrices for the binding sites of transcription factors.
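A nucleotide distribution matrix can be turned into log-odds weights and slid along a promoter to score candidate sites. The following is a minimal sketch under stated assumptions: the count matrix is hypothetical (not a TRANSFAC entry), the pseudocount of 1 and the uniform background of 0.25 are arbitrary choices, scanSequence is a hypothetical helper, and only one strand is scanned.

```r
# Hypothetical count matrix: rows = bases, columns = motif positions.
counts <- matrix(c(8, 0,  0, 1,
                   1, 9,  0, 0,
                   0, 0, 10, 0,
                   1, 1,  0, 9),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("A", "C", "G", "T"), NULL))

# Log-odds weights with a pseudocount of 1 and a uniform background.
probs <- sweep(counts + 1, 2, colSums(counts + 1), "/")
pwm <- log2(probs / 0.25)

# Hypothetical helper: score every window of a DNA string against the matrix.
scanSequence <- function(dna, pwm) {
  chars <- strsplit(dna, "")[[1]]
  w <- ncol(pwm)
  starts <- seq_len(length(chars) - w + 1)
  sapply(starts, function(s) {
    window <- chars[s:(s + w - 1)]
    # Pick the weight of the observed base at each motif position and sum.
    sum(pwm[cbind(match(window, rownames(pwm)), seq_len(w))])
  })
}

scores <- scanSequence("TTACGTAA", pwm)
best.start <- which.max(scores)
print(round(scores, 2))
print(best.start)
```

The highest-scoring window starts at position 3, where the sequence matches the matrix consensus ACGT. A real scan would also score the reverse complement and use a threshold to decide which windows count as hits.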

Page 15: Bio277 Lab 3: Finding Transcription Factor Binding Motifs

Public Data Repositories for Gene Expression Studies

http://www.ebi.ac.uk/microarray-as/aer/?#ae-main[0]

2701 experiments available.

Expression profiles derived from 180 experiments, 112 510 genes available.

http://www.ncbi.nlm.nih.gov/projects/geo/

3916 expression platforms.

174 783 samples.