Using Public Databases to Inform Research Questions

Post on 16-Apr-2017

Day 1

Large Reference Epigenome Projects: Understanding Natural Variation

Understanding Epigenetic Variation

• International consortia have pursued the establishment of reference epigenomes for a large number of cell types and conditions

• The Epigenome Roadmap was published in Nature in February 2015

• Main findings of the NIH Roadmap Epigenomics Program

The Beginnings

• ENCODE (the Encyclopedia of DNA Elements) started in 2003 to identify functional elements in the non-coding parts of the genome

• Understand how the genome is packaged, regulated, and read

• In 2012, Nature, Genome Research, and Genome Biology together published 30 papers on the results of the ENCODE project

• While ENCODE transformed technology and data analysis, clinical application was limited

• Most results were from a small number of cell lines

ENCODE Data types:

• After the human genome had been mapped, it was clear the epigenome needed to be explored as well

https://www.encodeproject.org/

ENCODE to Roadmap

• Funded by NIH, the Roadmap Epigenomics Project was established to generate epigenomic data from primary cells and tissues from both healthy individuals and patients with diseases (e.g. cancer, neurodegenerative and autoimmune disease)

• To the right are the tissues and cell types profiled

• 127 human tissues and cell types

PMID: 25693563

Cell Types in Roadmap

• Many of the adult tissues investigated were broken down by cell type or region

• e.g. blood into several types of immune cell, and the brain into regions including the hippocampus and dorsolateral prefrontal cortex

PMID: 25693562

• Samples were profiled for histone modification patterns, DNA accessibility, DNA methylation, and RNA expression

Data Sets Available in Roadmap

Many more cell/tissue groups…

To see full list: http://www.nature.com/nature/journal/v518/n7539/fig_tab/nature14248_F2.html

111 reference epigenomes from Roadmap with 16 additional epigenomes from ENCODE

International Human Epigenome Consortium (IHEC)

• Roadmap represents an early component of IHEC

• IHEC plans to determine the epigenomes of every cell type in the human body, estimated at about a thousand

• IHEC brings together data from several consortia, including:

  • ENCODE

  • Roadmap

  • BLUEPRINT (http://www.blueprint-epigenome.eu/), which aims to decipher the epigenomes of more than 100 different types of blood cells

  • Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC)

Using these Reference Data Sets

• What questions can these data sets answer?

• What questions can they not answer?

• How can we use this data to inform our study questions?

Let's Explore the Data

• There are a couple of ways to view and download this reference data

• I am going to provide a few examples

• The Human Epigenome Atlas provides interactive visualization and download

• Not going to focus on download because there are many ways to view the data online

• http://www.genboree.org/epigenomeatlas/index.rhtml

Epigenetic Data Browsers: Getting an Idea of Content

UCSC Genome Browser

• Very user friendly, can search for a region or a gene

• Directions:

  • Go to: http://genome.ucsc.edu/

  • Click on “Genome Browser”

  • Search for a favorite gene, then click “submit”

• Things to try:

  • Zooming

  • Adding tracks

  • Getting track info
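
The gene search above can also be written as a link, which is handy for sharing a view. A minimal Python sketch, assuming the browser's hgTracks page takes `db` and `position` query parameters and accepts a gene symbol as the position; the helper name `ucsc_browser_url` is made up for illustration:

```python
from urllib.parse import urlencode

# Assumed entry point for the track display page of the UCSC Genome Browser.
UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_browser_url(position: str, assembly: str = "hg19") -> str:
    """Build a browser link for a gene symbol or a chr:start-end range.

    `position` is passed through as typed in the search box; `assembly`
    defaults to hg19 to match the Roadmap exercises in these slides.
    """
    query = urlencode({"db": assembly, "position": position})
    return f"{UCSC_HGTRACKS}?{query}"


# Example: a link for NANOG on hg19 (paste it into a web browser).
nanog_link = ucsc_browser_url("NANOG")
```

Coordinates such as chr1:1000-2000 can be passed the same way; `urlencode` escapes the colon.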

Viewing the Roadmap Data

• From Genboree you can export the Roadmap data to the Genome Browser, but there are two browsers from WashU that are great for viewing this data

• WashU Epigenome Browser

  • Supports multiple organisms, visualizes chromatin-interaction data (e.g. Hi-C), performs gene set view, gene plot, and many other capabilities

• The Roadmap Epigenome Browser

  • Powered by the WashU Epigenome Browser, but specific to the Roadmap data

WashU Epigenome Browser

• http://epigenomegateway.wustl.edu/browser/

• Click on “Human hg19”

• Then click on “Public Hubs”

• Then click “Reference human epigenomes from Roadmap Epigenomics Consortium”

WashU Epigenome Browser

• Then click “Load” Roadmap Data from GEO

• Then click “Loaded” to select the samples to view

WashU Epigenome Browser

• Can select and view all epigenetic marks for one tissue

• Walk through together

• Can view one type of epigenetic mark for multiple tissues

• Prefer to do this in Roadmap Epigenome Browser

Roadmap Epigenome Browser

• http://epigenomegateway.wustl.edu/browser/roadmap/

• Click on “hg19” and then “Load”

• Can then select epigenetic mark

• Can select region or gene to interrogate

• Click submit

Select Region

Select mark

Can see how the data clusters

Exploring SNP data in Ensembl

• http://useast.ensembl.org/index.html

• Can also explore in dbSNP but I prefer this interface

• Enter an rs number in the search box, e.g. rs1801133
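
The same lookup can be done without the web page. A minimal Python sketch, assuming the Ensembl REST service at rest.ensembl.org and its variation endpoint; the helper names are made up, and the fields read in `summarize` (`name`, `most_severe_consequence`, `mappings`, `location`) are assumptions about the JSON that comes back, so missing fields are tolerated:

```python
import json
from urllib.request import urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def variation_url(rs_id: str, species: str = "human") -> str:
    """URL of the variation record for one rs number, requested as JSON."""
    return f"{ENSEMBL_REST}/variation/{species}/{rs_id}?content-type=application/json"


def fetch_variation(rs_id: str, species: str = "human", timeout: int = 30) -> dict:
    """Download and decode the record (needs network access)."""
    with urlopen(variation_url(rs_id, species), timeout=timeout) as response:
        return json.load(response)


def summarize(record: dict) -> dict:
    """Keep a few fields of interest; absent fields come back as None or []."""
    return {
        "name": record.get("name"),
        "most_severe_consequence": record.get("most_severe_consequence"),
        "locations": [m.get("location") for m in record.get("mappings", [])],
    }
```

Usage, with network access: `summarize(fetch_variation("rs1801133"))`.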

Exploring SNP data in Ensembl

• Can explore population genetics from the 1000 Genomes Project

• Can look at LD

  • Must select a reference population
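
LD for a SNP can also be pulled programmatically once a reference population is chosen. A minimal Python sketch, assuming the Ensembl REST `ld` endpoint and a 1000 Genomes population name such as `1000GENOMES:phase_3:CEU`; the helper names, the 0.8 cutoff, and the `r2` field (assumed to arrive as text) are illustrative assumptions:

```python
import json
from urllib.request import urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def ld_url(rs_id: str, population: str, species: str = "human") -> str:
    """URL listing variants in LD with `rs_id` in one reference population."""
    return (f"{ENSEMBL_REST}/ld/{species}/{rs_id}/{population}"
            "?content-type=application/json")


def fetch_ld(rs_id: str, population: str, timeout: int = 60) -> list:
    """Download the LD pairs (needs network access)."""
    with urlopen(ld_url(rs_id, population), timeout=timeout) as response:
        return json.load(response)


def strong_ld(pairs: list, min_r2: float = 0.8) -> list:
    """Keep pairs whose r2 is at least `min_r2`; r2 is converted from text."""
    return [pair for pair in pairs if float(pair["r2"]) >= min_r2]
```

Usage, with network access: `strong_ld(fetch_ld("rs1801133", "1000GENOMES:phase_3:CEU"))`. Changing the population changes the answer, which is why the web interface asks for one.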

Exploring Other Public Data in NCBI Epigenomics

• http://www.ncbi.nlm.nih.gov/epigenomics

• Can browse experiments or samples and then view results

Exploring Other Public Data in NCBI Epigenomics

• Choose “Browse Experiments”

• Choose species, biological source and all features

• “Select all” Experiment IDs

• Click “View on Genome”

Exploring Other Public Data in NCBI Epigenomics

• Can view in UCSC Genome Browser

• Click “View at UCSC”

Exploring Other Public Data: GEO

GEO: Gene Expression Omnibus

• A public functional genomics data repository

• Both array and sequencing data stored

• Visit GEO DataSet Site: http://www.ncbi.nlm.nih.gov/gds/?term=

• Can search a research question of interest

• Many ways to download data into R – can download processed data or raw data

  • Can download directly from GEO

  • Can use R library GEOquery

  • Both minfi and RnBeads have functions to download GEO data and format it for their specific purposes

Task: Replicating some Roadmap Results

Practicing with these viewers and data sets

Brain Epigenomics

• Paper: Dissecting neural differentiation regulatory networks through epigenetic footprinting

• One of the papers published in the February 2015 issue of Nature

• Going to replicate some of the results presented in Supplementary Figure 1 and Figure 1

Brain Epigenomics: Supp. Figure 1

• Identified 3,396 differentially expressed genes between undifferentiated ES cells and the first four neural progenitor cell stages

• Pluripotency-associated genes such as OCT4 and NANOG are downregulated

• We’re going to look at H9 cells, H9 derived neuronal progenitor cells, and H9 derived neuron culture cells

• Data not available yet for all the cell types presented in paper

Brain Epigenomics: Supp. Figure 1

• Go to: http://epigenomegateway.wustl.edu/browser/

• Going to load expression data for these 3 cell types

  • Go to Human hg19

  • Go to Public Hubs

  • Click on “Roadmap Epigenomics Interactive Analysis Hub”

  • Load “Complete Consolidated Dataset”

  • Load Expression Data (under ES/iPS cells) for:

    • H9 cells

    • H9 derived neuronal progenitor cells

    • H9 derived neuron culture cells

• Explore some of the genes in Supplementary Figure 1 (e.g. NANOG)

Brain Epigenomics: Supp. Figure 1

• Look at some of the genes in the figure all together by creating a gene set

• Click “Apps”

• Select Gene and region set

• Upload the list of genes in the comments of slide

• Can try “Gene set view” to see all the genes at once

• Under apps look at “Scatter plot” to compare to plots
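
The gene list from the slide comments is not part of this transcript, so here is a minimal Python sketch for preparing one from the genes named on these slides. It assumes the browser takes one gene symbol per line and may not recognize the alias OCT4, whose official symbol is POU5F1; the helper name and alias table are made up for illustration:

```python
# Aliases seen in the slides mapped to official gene symbols.
ALIASES = {"OCT4": "POU5F1"}


def gene_set_text(genes) -> str:
    """Return the genes one per line, aliases resolved, duplicates dropped."""
    seen = set()
    symbols = []
    for gene in genes:
        symbol = gene.strip()
        symbol = ALIASES.get(symbol, symbol)
        if symbol and symbol not in seen:
            seen.add(symbol)
            symbols.append(symbol)
    return "\n".join(symbols) + "\n"


# Genes mentioned on these slides; extend with the genes from the figure.
gene_text = gene_set_text(["OCT4", "NANOG", "SOX2"])
```

The resulting text can be pasted into the gene set box or saved to a file first.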

Brain Epigenomics: Figure 1

Going to replicate part of this figure

Brain Epigenomics: Figure 1

• From the previous exercise we can see the downregulation of NANOG in the derived cells

• Now can add in additional epigenetic marks

  • Do they seem to be associated with regulation?

• Can also try to recreate Figure 1 for SOX2

  • Use H9 derived neuronal progenitor cells

Exploring Epigenomic Annotation of Genetic Variants

• Another paper published in the February issue

• Epigenomic annotation of genetic variants using the Roadmap Epigenome Browser

• Going to look at multiple sclerosis–associated SNPs annotated using epigenomic and expression data from 31 primary human tissues (orange) and cells (light green).

• The region associated with rs756699 has H3K4me1 mostly confined to immune-related cell types (solid black box).

• The closest gene, TCF7 (3.8 kb downstream), also shows high expression in the same group of cell types

• The region surrounding rs307896 has H3K4me1 signal in all tissues and cell types (dashed black box).

• SNP rs307896 lies in an intron of SAE1, a gene that is also expressed in all the samples

Exploring Epigenomic Annotation of Genetic Variants

• http://epigenomegateway.wustl.edu/browser/roadmap/

• Can also do this in WashU Epigenome Browser, but this has a nice clustering capability

• Add H3K4me1 and RNA-seq (have to use search)

• Load data

• Click hg19 to select the samples – select primary cells and tissues

• In window click “chr” to select the gene TCF7, zoom out

• Click the track header (e.g. “H3K4me1”) to add “Annotation track”

• Go to population variation

• Add dbSNP
