Post on 24-Aug-2020

Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi

University of Virginia, Department of Computer Science

deepmotif.org

[Figure: an image classifier labels a picture “Dog”. In the same way, a sequence classifier labels the DNA sequence ATGCGATCAAGTCTG as “Protein Binding Site”, but how it arrives at that label is unknown (“???”).]

DeMo Dashboard - Lanchantin, Singh, Wang, & Qi University of Virginia

Deep Motif Dashboard: Opening the black box for deep-learning based genomic sequence classifications

Outline: Introduction, TFBS Classification Task, Neural Models, Visualization Methods, Evaluation and Results

Transcription Factor Binding Sites (TFBSs)

[Figure: a DNA sequence (GCGACGAATCG AACGATATGCT CATATCATTTC TGTCAAG CTCGAGTC TATCAAGCTG) with segments marked as binding sites of the transcription factors TF1, TF2, and TF3.]

TFBS Classification Datasets

[Figure: one dataset of example sequences per transcription factor (TF1, TF2, TF3). Sequences shown: AACGATATGCT, TGTCAAGCAAG, ATATCGATATA, AGCATATGCGA; GCGACGAATCG, CTCGAGTCTCA, CGATATGCTTC, AAGAAGCATTA; CATATCATTTC, TATCAAGCTGG, CGAATGCATAC, ACGACGATTAT.]

Deep Motif (DeMo) Dashboard Approach

[Figure: a two-step approach (1., 2.) applied to the input sequence GAAGCTTGTACGCTATGGACTCGATCGAATCGCATGTCATGAGATCATGCTTCATCTCTCGATCGAATCGCATATGTGTCAACTATGCTCTCGAA, with sequences labeled TFBS or NO TFBS.]

Neural Network Models

3 Neural Models, each mapping an input sequence to the probability of a binding site:

1. Convolutional (CNN): short local patterns, or motifs
2. Recurrent (RNN): long-term dependencies
3. Convolutional-Recurrent (CNN-RNN): long-term dependencies among motifs
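To make the CNN's role as a detector of short local patterns concrete, here is a minimal sketch of a single convolutional filter sliding over a one-hot encoded sequence, followed by max pooling and a sigmoid. The one-hot encoding, the hand-set filter for the hypothetical motif `GATA`, and the bias value are all illustrative assumptions, not the trained models from the talk.

```python
import math

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element 0/1 vectors (A, C, G, T)."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]

def conv_scan(encoded, filt):
    """Slide one filter (width x 4 weights) over the sequence; one score per window."""
    width = len(filt)
    return [
        sum(filt[j][k] * encoded[i + j][k] for j in range(width) for k in range(4))
        for i in range(len(encoded) - width + 1)
    ]

def binding_probability(seq, filt, bias):
    """Max-pool the filter responses, then squash the best match with a sigmoid."""
    best = max(conv_scan(one_hot(seq), filt))
    return 1.0 / (1.0 + math.exp(-(best + bias)))

# Hypothetical filter rewarding the motif GATA: weight 1 for each matching base.
MOTIF_FILTER = one_hot("GATA")

p_with = binding_probability("ATGCGATAAGTCTG", MOTIF_FILTER, bias=-3.0)
p_without = binding_probability("ATGCGCTCAAGTCTG", MOTIF_FILTER, bias=-3.0)
print(round(p_with, 3), round(p_without, 3))
```

A real CNN learns many such filters from data; an RNN or CNN-RNN would additionally carry state along the sequence to capture dependencies between positions or between motif matches.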

Visualization Methods

1. Saliency Maps
2. Temporal Output Values
3. Class Optimization

1. Saliency Map

Which nucleotides are most important for classification?

[Diagram: input sequence X → model → score S+ (positive binding site); the highlighted quantity is labeled “saliency map”.]

Text analogy: for the input “This movie has one of the best plots I have seen” with score S+ (positive sentiment), the highlighted words are those important for classification.

[Figure: a positive test sequence and its saliency map; highlighting marks nucleotides important for the prediction.]
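As a rough illustration of the idea, the sketch below computes a per-nucleotide saliency for a toy scorer. It assumes the common gradient-times-input formulation (the absolute value of the derivative of S+ with respect to each input entry, multiplied by that entry) and estimates gradients with central differences. The scorer `toy_score`, its motif, and its position are hypothetical stand-ins for a trained model.

```python
import math

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element 0/1 vectors (A, C, G, T)."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]

def toy_score(x):
    """Hypothetical stand-in for S+: sigmoid of a linear match to GATA at positions 2-5."""
    motif, start = "GATA", 2
    z = -2.0
    for j, base in enumerate(motif):
        z += x[start + j][NUCLEOTIDES.index(base)]
    return 1.0 / (1.0 + math.exp(-z))

def saliency_map(score_fn, x, eps=1e-5):
    """Per-position saliency: sum over channels of |dS+/dx * x|, via central differences."""
    saliency = []
    for i in range(len(x)):
        total = 0.0
        for k in range(4):
            original = x[i][k]
            x[i][k] = original + eps
            up = score_fn(x)
            x[i][k] = original - eps
            down = score_fn(x)
            x[i][k] = original  # restore the input before moving on
            grad = (up - down) / (2 * eps)
            total += abs(grad * original)
        saliency.append(total)
    return saliency

sequence = "CCGATACC"
saliency = saliency_map(toy_score, one_hot(sequence))
for base, value in zip(sequence, saliency):
    print(base, round(value, 3))
```

With a neural network, the gradient would come from backpropagation rather than finite differences; the finite-difference version is used here only to keep the example dependency-free.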

2. Temporal Output Values

What are the model’s predictions at each timestep of the DNA sequence?

Check the RNN’s prediction score at every timestep as it reads the input from the beginning to the end of the sequence.

[Diagram: input sequence X → model → score S+ (positive binding site).]

Text analogy: in “I don’t like the actors, but I really enjoyed this movie” (positive sentiment overall), each word is marked as a negative or a positive sentiment prediction.

[Figure: a positive test sequence with the RNN forward output and the RNN backward output; each position is marked as a negative or a positive binding site prediction.]
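The per-timestep readout can be sketched with a tiny scalar RNN, shown below. All weights are hand-set, hypothetical values chosen for illustration; a trained model would use learned, higher-dimensional parameters. The backward output is obtained by reading the sequence from its end and realigning the scores with the original positions.

```python
import math

# Hypothetical, hand-set weights for a one-unit RNN (illustration only).
INPUT_WEIGHT = {"A": 0.8, "C": -0.6, "G": 0.5, "T": -0.4}
RECURRENT_WEIGHT = 0.9
OUTPUT_WEIGHT = 2.0

def temporal_outputs(seq):
    """Return the RNN's prediction score after reading each prefix of seq."""
    hidden = 0.0
    scores = []
    for base in seq:
        hidden = math.tanh(RECURRENT_WEIGHT * hidden + INPUT_WEIGHT[base])
        scores.append(1.0 / (1.0 + math.exp(-OUTPUT_WEIGHT * hidden)))
    return scores

def label(score):
    """Mark a timestep as a positive (+) or negative (-) prediction."""
    return "+" if score >= 0.5 else "-"

sequence = "CTCTGATAAG"
forward = temporal_outputs(sequence)
# Backward pass: read from the end, then reverse to realign with positions.
backward = temporal_outputs(sequence[::-1])[::-1]

print("seq     ", " ".join(sequence))
print("forward ", " ".join(label(s) for s in forward))
print("backward", " ".join(label(s) for s in backward))
```

Plotting the two rows under the sequence shows where along the sequence the prediction flips from negative to positive, which is the question this visualization asks.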

Visualization Methods

Sequence specific:
1. Saliency Maps
2. Temporal Output Values

TF specific:
3. Class Optimization

3. Class Optimization

For a particular TF, what does the optimal binding site sequence look like?

[Diagram: unknown input “?” → model → score S+ (positive binding site for TF “CBX3”).]

The optimization seeks the input X that maximizes the score S+, where X is the input sequence and S+ is the probability of sequence X being a positive binding site.

[Figure: the optimal binding site found for TF “CBX3”.]
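The search for a score-maximizing input can be illustrated with a deliberately simple substitute technique: greedy coordinate ascent over discrete nucleotides, treating the scorer as a black box. This is not the optimization used in the talk, which is not specified here; the scorer `toy_score`, its motif, and the sequence length are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"

def toy_score(seq):
    """Hypothetical stand-in for S+: rewards each match to GATA at positions 3-6."""
    motif, start = "GATA", 3
    matches = sum(1 for j, base in enumerate(motif) if seq[start + j] == base)
    return 1.0 / (1.0 + math.exp(-(matches - 2.0)))

def optimize_class(score_fn, length, max_sweeps=10):
    """Greedy coordinate ascent: keep any single-base change that raises the score."""
    seq = ["A"] * length
    best = score_fn(seq)
    for _ in range(max_sweeps):
        improved = False
        for i in range(length):
            for candidate in NUCLEOTIDES:
                if candidate == seq[i]:
                    continue
                previous = seq[i]
                seq[i] = candidate
                trial = score_fn(seq)
                if trial > best:
                    best = trial
                    improved = True
                else:
                    seq[i] = previous  # revert a change that did not help
        if not improved:
            break
    return "".join(seq), best

optimal, score = optimize_class(toy_score, length=10)
print(optimal, round(score, 3))
```

Positions the scorer ignores keep their starting value, so only the recovered motif region is informative. Greedy search can also stall in a local optimum on a real model, which is one reason gradient-based optimization over a relaxed input is a common alternative.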

Experimental Setup

Dataset
● Alipanahi et al. “Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning”. Nature Biotechnology 2015.
● 108 cancer cell TFs (train separate model for each TF)
● Each sequence is 101-length centered around ChIP-seq peak

Models
● Test several variations of 3 different models (CNN, RNN, CNN-RNN)

Evaluation
● Compare models using AUC scores on test set
● Evaluate visualization methods manually and by motif matching
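For reference, the AUC used to compare models can be computed from test-set labels and predicted scores as the probability that a randomly chosen positive outranks a randomly chosen negative. The sketch below uses a simple pairwise count; the labels and scores are made-up values, not results from the talk.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    has the higher score; a tie counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up test-set labels and predicted binding probabilities for one TF.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = auc_score(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of examples; for large test sets a rank-based computation or a library routine would be the practical choice. With one model per TF, the per-TF AUC values can then be summarized across all 108 TFs.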

Model Accuracy (AUC Scores)

[Figure: AUC scores by model, with the group “Our Models” marked.]

1. Saliency Maps

[Figure: saliency map for GATA1; highlighting marks nucleotides important for the prediction.]

2. Temporal Output Values

[Figure: temporal output values for NFYB; each position is marked as a positive or a negative binding site prediction.]

Saliency Map AND Temporal Output Values

[Figure: saliency map and temporal output values together for NFYB.]

3. Class Optimization

[Figure: class optimization result for GATA1.]

DeMo Dashboard

Deep Motif (DeMo) Dashboard Contributions and Results

1. Comparative analysis of 3 different neural models on TFBS task

● CNN-RNNs perform the best

2. Presented 3 different visualization techniques to understand the predictions of neural models

● Although TFBSs are influenced by motifs, the interactions among motifs are also important

Thank You!

UVA Machine Learning and Biomedicine Group

Ritambhara Singh, Beilun Wang, Dr. Yanjun Qi

code available at: deepmotif.org
