Regulatory Genomics Lab - University of Illinois Urbana

Regulatory Genomics Lab

Saurabh Sinha

Regulatory Genomics | Saurabh Sinha | 2018 1

PowerPoint by Saba Ghaffari

Exercise

In this exercise, we will do the following:

1. Use Galaxy to manipulate a ChIP track for BIN in D. melanogaster.

2. Subject the peak set to the MEME suite.

3. Compare MEME motifs with Fly Factor Survey motifs for BIN.

4. Subject the peak set to a gene set enrichment test.

Step 0A: Local Files

For viewing and manipulating the files needed for this laboratory exercise, insert your flash drive.

Denote the path to the flash drive as the following:

[course_directory]

We will use the files found in:

[course_directory]/06_Regulatory_Genomics/data/

Step 0B: Logging into Galaxy

Go to: http://galaxy.illinois.edu/galaxy

Click Enter.

Click Login.

Input your login credentials.

Click Login.

Computational Prediction of Motifs

In this exercise, we will upload a ChIP track of the transcription factor BIN in Drosophila melanogaster to Galaxy.

After performing various file manipulations, we will use the MEME suite to identify a motif from the top 100 ChIP regions.

Subsequently, we will compare our predicted motif with the experimentally validated motif for BIN at Fly Factor Survey.

Step 1: Accessing Input Files

At the top of the page, click Shared Data.

Then click Histories.

Step 2: Accessing Input Files

Click dm3 Data

You should see this page.

Click Import History.

Step 3: Accessing Input Files

Click Import

You should see an imported history like the following.

Step 4: Upload BIN ChIP Track to Galaxy

Click “Get Data” on the left panel and then Upload File.

Click “Choose local file”, indicated by the black arrow in the picture below, and then upload our ChIP file:

[course_directory]/06_Regulatory_Genomics/data/BIN_Fchip_s11_1000.gff

Set the Type to gff.

Set Genome to dm3.

Click Start
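
It can help to know how the uploaded track is laid out. A GFF line has nine tab-separated columns, with the score in column 6. The following is a minimal Python sketch of reading one such line. The GffRecord type, the parse_gff_line function, and the example line are made up for illustration; they are not taken from BIN_Fchip_s11_1000.gff, and the sketch assumes every line carries a numeric score.

```python
from typing import NamedTuple


class GffRecord(NamedTuple):
    """One GFF line (hypothetical helper type for this sketch)."""
    seqname: str
    source: str
    feature: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    score: float    # column 6; assumed numeric here
    strand: str
    frame: str
    attribute: str


def parse_gff_line(line: str) -> GffRecord:
    """Split a tab-separated GFF line into its nine columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, found {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attribute = fields
    return GffRecord(seqname, source, feature, int(start), int(end),
                     float(score), strand, frame, attribute)


# Made-up example line, not real BIN ChIP data.
example_line = "chr2L\tChIP\tpeak\t1000\t1500\t7.25\t.\t.\tID=peak1"
record = parse_gff_line(example_line)
print(record.seqname, record.start, record.end, record.score)
```

Column 6 is the value that the next step sorts on.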

Step 5: Sort ChIP Track By Score

Click on “Filter and Sort” and then Sort.

Under Sort Dataset, select our ChIP track.

Under on column, select column: 6.

Under with flavor, select Numerical sort.

Under everything in, select Descending order.

Click Execute.
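
Outside Galaxy, the same sort can be sketched in a few lines of Python. This is only an illustration of the settings above (column 6, numerical, descending), not Galaxy's implementation; sort_by_score and the three example lines are made up. The numerical flavor matters: compared as text, "12.0" would sort before "3.5".

```python
def sort_by_score(lines, column=6):
    """Sort tab-separated lines by a numeric column, highest first.

    `column` is 1-based, matching the Galaxy form.
    """
    def score(line):
        return float(line.rstrip("\n").split("\t")[column - 1])

    return sorted(lines, key=score, reverse=True)


# Made-up GFF-style lines; the last column is a label for readability.
peaks = [
    "chr2L\tChIP\tpeak\t100\t200\t3.5\t.\t.\tp1",
    "chr3R\tChIP\tpeak\t500\t650\t12.0\t.\t.\tp2",
    "chrX\tChIP\tpeak\t900\t990\t7.1\t.\t.\tp3",
]
sorted_peaks = sort_by_score(peaks)
for line in sorted_peaks:
    print(line)
```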

Step 6: Obtain Top 100 ChIP Regions

Click on "Text Manipulation" and Select First.

Under Select first, enter 100 lines.

Under from, select our sorted ChIP data.

Click Execute.
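
Selecting the first 100 lines of a sorted file is a simple truncation. A minimal Python sketch, assuming the lines are already sorted by score; select_first is a hypothetical helper, not a Galaxy function.

```python
from itertools import islice


def select_first(lines, n=100):
    """Return the first n items of any iterable of lines."""
    return list(islice(lines, n))


# Made-up stand-in for 250 sorted ChIP lines.
sorted_lines = [f"peak_{i}" for i in range(250)]
top_100 = select_first(sorted_lines, 100)
print(len(top_100), top_100[0], top_100[-1])
```

Because the input is sorted in descending order, the first 100 lines are the 100 highest-scoring ChIP regions.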

Step 7: Extract DNA of Top 100 ChIP Regions

Click on Fetch Sequences.

Click on Extract Genomic DNA.

Under Fetch sequences for intervals in, select our top 100 ChIP regions.

Set Interpret features when possible to No.

Set Source for Genomic Data to History and use the dm3.fasta file as the reference.

Set Output data type to FASTA.

Click Execute.
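
Conceptually, this step slices the reference genome at each interval and writes the pieces out as FASTA. The sketch below shows the idea in Python under several assumptions: the genome is a small in-memory dictionary (the real dm3.fasta is far larger), coordinates are 1-based and inclusive as in GFF, strand is ignored, and the header format is invented for this example rather than copied from Galaxy. The function name extract_fasta is hypothetical.

```python
def extract_fasta(intervals, genome):
    """Return FASTA text for (chrom, start, end) intervals.

    Coordinates are treated as 1-based and inclusive (GFF style),
    so the Python slice is genome[chrom][start - 1:end].
    """
    records = []
    for chrom, start, end in intervals:
        sequence = genome[chrom][start - 1:end]
        records.append(f">{chrom}:{start}-{end}\n{sequence}")
    return "\n".join(records)


# Toy genome and intervals, made up for illustration.
toy_genome = {"chr2L": "ACGTACGTAA", "chrX": "TTGACCGGTA"}
toy_intervals = [("chr2L", 2, 5), ("chrX", 1, 3)]
fasta_text = extract_fasta(toy_intervals, toy_genome)
print(fasta_text)
```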

Step 8: Download The Data

When finished, click the download icon to download the file to our desktop.

This has already been done for you.

The resulting sequences are in the following file:

[course_directory]/06_Regulatory_Genomics/data/BIN_top_100.fasta

Step 9: Submit to MEME

In this step, we will submit the sequences to MEME.

Go to the following address:

http://meme-suite.org/tools/meme

Upload your sequence file here.

Enter your email address here.

Leave other parameters as default.

Click “Start Search”.

DO NOT RUN THIS NOW. MEME TAKES A VERY LONG TIME.

Step 9A: Analyzing MEME Results

Go to the following web address:

The webpage contains a summary of MEME’s findings.

It is also available in the results directory:

[course_directory]/06_Regulatory_Genomics/results/MEME.html

Let’s investigate the top hit.

Step 9B: Analyzing MEME Results

To the right is a LOGO of our predicted motif, showing the per-position relative abundance of each nucleotide.

At the bottom are the aligned regions in each of our sequences that helped produce this motif. As the p-value increases (becomes less significant), matches show greater divergence from our LOGO.
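
The "per-position relative abundance" behind a LOGO can be computed directly from a set of aligned sites. The following Python sketch counts nucleotides column by column. The function position_frequencies and the five-letter sites are made up for illustration and are not MEME output; a sequence logo additionally scales each column by its information content, which this sketch leaves out.

```python
from collections import Counter


def position_frequencies(sites):
    """Return one {base: relative frequency} dict per aligned column."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        matrix.append({base: counts[base] / len(sites) for base in "ACGT"})
    return matrix


# Arbitrary toy alignment, not real BIN binding sites.
toy_sites = ["ACGTA", "ACGTT", "ACCTA", "TCGTA"]
matrix = position_frequencies(toy_sites)
for position, column in enumerate(matrix, start=1):
    print(position, column)
```

A site that matches the high-frequency base at every position resembles the LOGO closely; sites with weaker p-values deviate in more columns.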

Step 9C: Analyzing MEME Results

Other predicted motifs do not seem as plausible.

Step 10A: Comparison with Experimentally Validated Motif for BIN

FlyFactorSurvey is a database of TF motifs in Drosophila melanogaster.

Go to the following link to view the motif for BIN:

http://pgfe.umassmed.edu/ffs/TFdetails.php?FlybaseID=FBgn0045759

Step 10B: Comparison with Experimentally Validated Motif for BIN

Actual BIN Motif

There is strong agreement between the actual motif and the reverse complement of MEME’s best motif. This indicates MEME was actually able to find the motif from the top 100 ChIP regions for this TF.

Best MEME Motif

Best MEME Motif, Reverse Complemented
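
A motif found on one strand may match a database motif only after reverse complementing, because the two can describe the same site read from opposite strands. A minimal Python sketch of the operation, for both a sequence and a frequency matrix; the function names and toy values are made up for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    """Reverse the sequence and swap each base for its complement."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def reverse_complement_matrix(matrix):
    """Reverse the column order and swap A<->T and C<->G in each column."""
    return [
        {COMPLEMENT[base]: freq for base, freq in column.items()}
        for column in reversed(matrix)
    ]


print(reverse_complement("AACGT"))

# Toy two-column frequency matrix, not a real motif.
toy_matrix = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.5, "G": 0.5, "T": 0.0},
]
rc_matrix = reverse_complement_matrix(toy_matrix)
print(rc_matrix)
```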

Gene Set Enrichment Analysis

In this exercise, we will extract the nearby genes for each one of the ChIP peaks for BIN.

We will then subject the nearby genes to enrichment analysis tests on various Gene Ontology gene sets utilizing DAVID.

Step 11A: Acquire Nearby Genes

In this step, we will acquire all genes in Drosophila melanogaster using the UCSC Main Table Browser:

https://genome.ucsc.edu/

Step 11B: Acquire Nearby Genes

Ensure the following settings are configured.

Click get output and then get BED.

Step 11C: Acquire Nearby Genes

Go back to the Galaxy server.

Click Get Data and then Upload File.

Click Choose local file and then upload our gene file:

[course_directory]/06_Regulatory_Genomics/results/flygenes.bed

Set the Type to bed.

Set Genome to dm3.

Click Start

Step 11D: Acquire Nearby Genes

Select Operate on Genomic Intervals

Then Select Fetch Closest non-overlapping interval feature.

Step 11E: Acquire Nearby Genes

For “For every interval feature in”, select our original ChIP track.

For “Fetch closest features from”, select the UCSC genes track we just downloaded.

Click Execute
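
The idea behind fetching the closest non-overlapping feature can be sketched in Python as follows. This is an illustration under stated assumptions, not the Galaxy tool's code: both tracks are assumed to use the same closed-interval coordinates, ties go to the first gene encountered, and upstream and downstream genes are treated alike. The function name and the toy peak and genes are made up.

```python
def closest_non_overlapping(peak, genes):
    """Return (gene_name, distance) for the nearest gene that does not
    overlap the peak, or (None, None) if the chromosome has none."""
    chrom, peak_start, peak_end = peak
    best_name, best_distance = None, None
    for gene_chrom, gene_start, gene_end, name in genes:
        if gene_chrom != chrom:
            continue
        if gene_start <= peak_end and peak_start <= gene_end:
            continue  # intervals overlap, so skip this gene
        if gene_start > peak_end:
            distance = gene_start - peak_end   # gene lies after the peak
        else:
            distance = peak_start - gene_end   # gene lies before the peak
        if best_distance is None or distance < best_distance:
            best_name, best_distance = name, distance
    return best_name, best_distance


# Toy data, made up for illustration.
toy_peak = ("chr2L", 1000, 1200)
toy_genes = [
    ("chr2L", 100, 500, "geneA"),    # 500 bp before the peak
    ("chr2L", 1100, 1300, "geneB"),  # overlaps the peak
    ("chr2L", 1500, 2000, "geneC"),  # 300 bp after the peak
    ("chrX", 1201, 1300, "geneD"),   # different chromosome
]
nearest = closest_non_overlapping(toy_peak, toy_genes)
print(nearest)
```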

Step 12A: Cut Out Genes

The resulting file has the list of nearby genes in CG format in the 12th column.

We are only interested in the genes, so we need to cut them out using the CUT tool.

Under Text Manipulation click Cut

Step 12B: Cut Out Genes

For Cut Columns type c12 to denote column 12.

Under Delimited By select Tab

Under From, select the track we just generated: the ChIP peaks paired with their closest FlyBase genes.

Click Execute.
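
Cutting column 12 from a tab-delimited file amounts to splitting each line on tabs and keeping one field. A minimal Python sketch, where cut_column is a hypothetical helper and the two rows, including the CG-style IDs, are placeholders rather than real output:

```python
def cut_column(lines, column=12, delimiter="\t"):
    """Return the given 1-based column from each delimited line."""
    return [line.rstrip("\n").split(delimiter)[column - 1] for line in lines]


# Placeholder rows with 12 tab-separated fields each.
# The last field stands in for a CG-format transcript ID.
filler = [f"f{i}" for i in range(1, 12)]
rows = [
    "\t".join(filler + ["CG0001-RA"]),
    "\t".join(filler + ["CG0002-RB"]),
]
transcript_ids = cut_column(rows, column=12)
print(transcript_ids)
```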

Step 12C: Download The Data

When finished, click the download icon to download the file to our desktop.

This has already been done for you.

The resulting list of transcript IDs is in the following file:

[course_directory]/06_Regulatory_Genomics/results/cg_transcript.txt

Step 13A: Convert IDs

The enrichment tool we will use doesn’t accept genes in this format.

We will use the FlyBase ID converter to convert these transcript IDs into FlyBase transcript IDs.

Step 13B: Convert IDs

Go to http://flybase.org/static_pages/downloads/IDConv.html

Upload our cg_transcript.txt file and hit Go.

On the next page, click file, uniq IDs only to download the file of converted IDs.

Step 14A: Gene Set Enrichment - DAVID

Move the resulting file from the previous analysis to the course directory and rename it: (This has already been done for you.)

[course_directory]/06_Regulatory_Genomics/results/fb_transcripts.txt

With the correct IDs for transcripts of genes near ChIP peaks, we now wish to perform a gene set enrichment analysis on various gene sets.

A tool that allows us to do this from a web interface is DAVID located at the following address:

https://david-d.ncifcrf.gov/summary.jsp

Step 14B: Gene Set Enrichment - DAVID

We will perform a Gene Set Enrichment Analysis on our transcript list (gene list) and see what GO categories we are significantly enriched in.

Analyze the gene list with Functional Annotation Tool

Click Choose File and select our fb_transcripts.txt file.

Under Select Identifier select FLYBASE_TRANSCRIPT_ID.

Under Step 3: List Type check Gene List.

Click Submit List.

Step 14C: Gene Set Enrichment - DAVID

On the next page, select Functional Annotation Chart.

Our gene set appears to be enriched for GO terms in the BP_FAT (biological process) category!

This is consistent with the activity of the BIN transcription factor in the literature.
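
To see what "enriched" means numerically, the sketch below computes a plain one-sided hypergeometric p-value in Python: the probability of seeing at least the observed number of annotated genes in our list by chance. This is a generic textbook test, not DAVID's exact calculation (DAVID reports a modified Fisher exact p-value, the EASE score), and the function name and all numbers are made up for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) when drawing `sample` genes without replacement
    from `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 20 genes in total, 5 annotated with the term,
# 4 genes near peaks, all 4 of them annotated.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, overlap=4)
print(p_value)
```

A small p-value means the overlap between the gene list and the GO term is larger than random sampling would usually produce.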
