2016 10-27 timbers


Combining phenome and genome to uncover the genetic basis for naturally occurring differences in development and behavior

CSHL Biological Data Science Meeting, 2016

Tiffany Timbers, Ph.D.

Dept. of Statistics & Master of Data Science Program, University of British Columbia

@TiffanyTimbers #BIODATA16

1

What is the genetic basis of individuality?

2

What is the genetic basis of individuality? (behaviour & development)

3

Potential way to answer:

Combine genome & phenome

4

Caenorhabditis elegans as a model in the lab

• Easy to work with and store in lab

• Easy genetics/genome

• Complete neuronal wiring diagram

• Exhibits many well-characterized behaviours

5

Caenorhabditis elegans also is a wild animal!

… individuals can shift during the L1 stage to an alternative developmental route and enter a pre-dauer stage (L2d), followed by the non-feeding diapause stage called dauer (an alternative L3 stage) (Figure 3). Dauer larvae are resistant to various stresses and can survive for several months without food. Upon their return to more favorable conditions, dauer larvae feed again and resume development.

Population demographic surveys at the local scale in orchards and woods indicate that C. elegans has a boom-and-bust lifestyle (Felix and Duveau, 2012). C. elegans metapopulations evolve in a fluctuating environment where optimal habitats are randomly distributed in space and time (Figure 4A). A cycle of colonization of a food source likely begins when one to several dauer larvae discover a fruit or stem, exit the dauer stage and seed a growing population of up to 10⁴ feeding nematodes at different life-cycle stages (Figure 3). Some moderate-sized populations found in rotting fruits and stems do not contain any dauer larvae, but larger ones always include only adults, L1, L2d and dauers (Felix and Duveau, 2012). As a food source runs low, dauers may leave it to explore the neighboring environment for new islands of resources. Most of them will fail.

Developmental regulation and the behavior of dauer larvae are central to the C. elegans lifestyle. Dauer larvae display active locomotion and a specific behavior called nictation, where they stand on their tail and wave their body in the air. Remarkably, dauers may also congregate to form a column and nictate as a group (Felix and Duveau, 2012) (see the video; http://www.wormatlas.org/dauer/behavior/Images/DBehaviorVID4.mov). These behaviors are thought to help dauers to find passing invertebrate hosts that they can use for their dispersal, such as isopods, snails and slugs. Together, dauer physiology and behavior suggest that this developmental stage plays a key role in C. elegans’ stress resistance, long-distance dispersal, and possibly its overwintering capacity.

Over the year, in surveys performed in France and Germany, C. elegans populations in rotting fruits typically peak in the fall, with proliferation possible in spring through to early winter (Felix and Duveau, 2012; Petersen et al., 2014). This …

Figure 2. The habitat of C. elegans at different scales. (A–D) Landscapes that correspond to the macroscale C. elegans habitat; all are relatively humid areas where C. elegans has been found: (A) wet shrubland; (B) urban garden; (C) riverbank; and (D) fruit trees. (E–G) Bacteria-rich decomposing vegetal substrates, corresponding to the microscale C. elegans habitat: (E) Arum stem; (F) oranges and (G) plums. (H) Detail of a rotting apple at the stage where C. elegans proliferates. Springtails (white) and a mite are examples of animals that share the bacteria-rich habitat of C. elegans and that are potential carriers and/or predators (see also Table 1). (I) C. elegans nematodes on an E. coli lawn, just coming out of a rotten fruit. (J) Scanning electron micrograph of C. elegans infected with the fungus Drechmeria coniospora. Image credits: Marie-Anne Felix.

DOI: 10.7554/eLife.05849.003

Frezal & Felix, "C. elegans outside the Petri dish" (Feature article, The natural history of model organisms), eLife 2015;4:e05849. DOI: 10.7554/eLife.05849

6

[Figure: map plotted by longitude (long) and latitude (lat)]

~ 40 genetically diverse wild-isolate C. elegans strains

C. Loucks

• Genomes sequenced to > 25X
• 630 541 unique single nucleotide variants (SNVs)
• 65 360 missense, 1015 nonsense and 545 splice mutations
• 14 602 genes have at least one non-synonymous or splice mutation
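Gene-level counts like the last bullet come from collapsing per-variant annotations onto genes. A minimal sketch, assuming each SNV has already been annotated with one gene ID and one consequence class; the tuple layout, the consequence labels and the toy records are hypothetical (a real pipeline would read them from an annotated VCF):

```python
from collections import defaultdict

# Consequence labels treated as "non-synonymous or splice" (hypothetical labels).
FUNCTIONAL = {"missense", "nonsense", "splice"}

def summarize_variants(variants):
    """Summarize annotated SNVs.

    `variants`: iterable of (variant_id, gene_id, consequence) tuples.
    Assumes one gene/consequence annotation per SNV; repeated IDs are skipped.
    Returns (unique SNVs, counts per consequence, genes with >= 1 functional SNV).
    """
    seen = set()
    per_class = defaultdict(int)
    genes_hit = set()
    for variant_id, gene_id, consequence in variants:
        if variant_id in seen:
            continue  # count each unique SNV once
        seen.add(variant_id)
        per_class[consequence] += 1
        if consequence in FUNCTIONAL:
            genes_hit.add(gene_id)
    return len(seen), dict(per_class), len(genes_hit)

# Toy records (made up): two functional SNVs in two different genes.
toy = [
    ("I:100:A>G", "geneA", "missense"),
    ("I:250:C>T", "geneA", "synonymous"),
    ("II:75:G>A", "geneB", "nonsense"),
    ("III:10:T>C", "geneC", "synonymous"),
    ("I:100:A>G", "geneA", "missense"),  # duplicate record, ignored
]
n_snv, per_class, n_genes = summarize_variants(toy)
print(n_snv, per_class, n_genes)
```

Variants that overlap several genes would need one record per gene and a different de-duplication rule; the sketch deliberately ignores that case.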

7

C. elegans can sense and generally avoid CO2

• CO2 may serve as an important predictor of food, mates, and/or predators.

[Diagram: stimulus timeline (0 sec, 100 sec, 300 sec); CO2 pulse of 750 msec at 4 psi; response: rapid burst of turns, then normal movement]

8

High-content behavioural and morphological screening using the Multi-Worm Tracker

[Diagram: stimulus delivery (CO2), image extraction, post-experiment analysis]

Swierczek & Giles et al. Nature Methods 2011

10

The Multi-Worm Tracker records many features as a time series (~25 frames/sec):

• area
• speed
• angular speed
• length
• width
• kink
• direction bias
• path length
• curve
• direction consistency
• x-y coordinates
• orientation
• sideways rolling speed
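Several of these features can be derived from the centroid x-y time series alone. A minimal sketch, assuming a constant 25 frames/sec and centroid coordinates in arbitrary length units; the Multi-Worm Tracker's own analysis software uses its own definitions and smoothing, so `kinematics` is a hypothetical helper for illustration, not the tracker's code:

```python
import math

FPS = 25.0  # assumed constant frame rate (~25 frames/sec)

def kinematics(xs, ys, fps=FPS):
    """Per-step speed, angular speed and total path length from centroid coordinates."""
    dt = 1.0 / fps
    speeds, headings = [], []
    for i in range(1, len(xs)):
        dx, dy = xs[i] - xs[i - 1], ys[i] - ys[i - 1]
        speeds.append(math.hypot(dx, dy) / dt)  # distance per second
        # direction of travel in radians; a stationary step gives 0 here,
        # which a real tracker would handle more carefully
        headings.append(math.atan2(dy, dx))
    ang_speeds = []
    for i in range(1, len(headings)):
        d = headings[i] - headings[i - 1]
        d = math.atan2(math.sin(d), math.cos(d))  # wrap heading change into (-pi, pi]
        ang_speeds.append(abs(d) / dt)  # radians per second
    path_length = sum(s * dt for s in speeds)
    return speeds, ang_speeds, path_length

# A worm moving two steps along x, then turning 90 degrees.
speeds, ang_speeds, path_length = kinematics([0, 1, 2, 2], [0, 0, 0, 1])
print(speeds, ang_speeds, path_length)
```

Wrapping the heading difference matters: without it, a small turn across the ±π boundary would register as a near-full rotation.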

11

C. elegans wild isolates exhibit a wide variety of polymorphisms

12

13

14

Polymorphism in locomotion (distance travelled) 30 s after CO2 stimulus
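One way to compute this measure from tracked positions is sketched below. It reads "distance travelled 30 s after CO2 stimulus" as the path length covered during the 30 s window that starts at the stimulus (an interpretation), then averages over the worms of each strain. The (t, x, y) track format, the 100 s stimulus time and the strain names in the example are hypothetical:

```python
import math
from statistics import mean

def distance_in_window(track, t_start, window=30.0):
    """Path length covered by one worm between t_start and t_start + window.

    `track` is a time-sorted list of (t_seconds, x, y) samples.
    """
    pts = [(x, y) for t, x, y in track if t_start <= t <= t_start + window]
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))

def strain_means(tracks_by_strain, t_stim, window=30.0):
    """Mean post-stimulus distance per strain; maps strain -> list of tracks."""
    return {
        strain: mean(distance_in_window(tr, t_stim, window) for tr in tracks)
        for strain, tracks in tracks_by_strain.items()
    }

def line_track(speed, t0=90, t1=140):
    """Toy track: a worm moving along x at a constant speed, sampled once per second."""
    return [(float(t), speed * t, 0.0) for t in range(t0, t1 + 1)]

tracks = {"strainA": [line_track(1.0), line_track(2.0)], "strainB": [line_track(0.5)]}
result = strain_means(tracks, t_stim=100.0)
print(result)
```

Per-worm distances are kept separate until the final average, so the same function can feed a per-worm distribution plot instead of a strain mean.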

15

Are we looking at all the important aspects of the phenotype?

How would we know if we are?

The behavioural phenome is very large; how do we focus without losing potentially important information?

16

A machine learning approach: Iterative denoising trees (IDT) to reduce dimensionality of time series behaviour data

17

- Discovery of potentially meaningful relationships and structures in large heterogeneous datasets

- Originally demonstrated for use in text mining to identify meaningful, implicit, and previously unknown information in an unstructured corpus (Giles et al., 2008)

- Used successfully to reduce dimensionality of behavioral time series from Drosophila larvae (Vogelstein et al., Science, 2014)
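The core IDT idea is to re-estimate a low-dimensional representation separately within each node of a divisive tree, so that structure masked by noise at the top level can surface further down. The sketch below is heavily simplified and assumes: a single principal component (found by power iteration) as each node's embedding, an exact one-dimensional 2-means split, and fixed size/depth stopping rules. The published method (Giles et al., 2008) and its use by Vogelstein et al. make more careful choices (e.g. model selection at each split), so this only illustrates the recursion; all function names are hypothetical:

```python
import random

def top_pc_scores(rows, iters=100, seed=0):
    """Project centered `rows` onto their first principal component (power iteration)."""
    d = len(rows[0])
    mu = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    centered = [[r[j] - mu[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        w = [0.0] * d  # w = X^T X v, accumulated row by row
        for r in centered:
            s = sum(a * b for a, b in zip(r, v))
            for j in range(d):
                w[j] += s * r[j]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break  # all rows identical: nothing to project
        v = [x / norm for x in w]
    return [sum(a * b for a, b in zip(r, v)) for r in centered]

def split_1d(scores):
    """Exact 2-means in one dimension: try every threshold between sorted scores."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    vals = [scores[i] for i in order]
    best_sse, best_k = float("inf"), 1
    for k in range(1, len(vals)):
        left, right = vals[:k], vals[k:]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((x - ml) ** 2 for x in left) + sum((x - mr) ** 2 for x in right)
        if sse < best_sse:
            best_sse, best_k = sse, k
    return order[:best_k], order[best_k:]

def idt(rows, idx=None, min_size=4, max_depth=5, depth=0):
    """Leaves (lists of row indices) of a simplified iterative denoising tree."""
    if idx is None:
        idx = list(range(len(rows)))
    if len(idx) < 2 * min_size or depth >= max_depth:
        return [idx]
    scores = top_pc_scores([rows[i] for i in idx])  # re-embed using this node only
    left, right = split_1d(scores)
    if min(len(left), len(right)) < min_size:
        return [idx]
    return (idt(rows, [idx[i] for i in left], min_size, max_depth, depth + 1)
            + idt(rows, [idx[i] for i in right], min_size, max_depth, depth + 1))

# Toy data: three tight groups of five 2-D points centred at x = 0, 10 and 20.
jitter = [(0.0, 0.0), (0.1, 0.2), (-0.1, 0.1), (0.2, -0.1), (0.0, -0.2)]
data = [[cx + dx, dy] for cx in (0.0, 10.0, 20.0) for dx, dy in jitter]
leaves = idt(data)
print(sorted(sorted(leaf) for leaf in leaves))
```

In this toy case the root split peels off one group and the second level separates the other two, giving one leaf per group.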


18

|worm1|strain1|val1 @ time1|...|val1 @ timei|...|valj @ time1|...|valj @ timei|
|...|
|wormn|strain1|val1 @ time1|...|val1 @ timei|...|valj @ time1|...|valj @ timei|
|...|
|worm1|strainm|val1 @ time1|...|val1 @ timei|...|valj @ time1|...|valj @ timei|
|...|
|wormn|strainm|val1 @ time1|...|val1 @ timei|...|valj @ time1|...|valj @ timei|

7800 × 9600 matrix (one row per worm; columns hold each feature's value at each time point)
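This layout puts each worm on one row, with every feature's time series concatenated along the columns. A sketch of building such rows, assuming each worm's data are stored as a dict of equal-length feature time series; the feature names, toy sizes and the `to_wide_rows` helper are hypothetical, and the actual feature and time-point counts behind the 7800 × 9600 matrix are not given here:

```python
def to_wide_rows(worms, features):
    """Flatten per-worm feature time series into one row per worm.

    `worms` is a list of (worm_id, strain, {feature: [values over time]}).
    Column order: all time points of features[0], then of features[1], and so on.
    """
    n_t = len(worms[0][2][features[0]])
    header = ["worm", "strain"]
    for f in features:
        header += [f"{f}@t{i + 1}" for i in range(n_t)]
    rows = []
    for worm_id, strain, series in worms:
        row = [worm_id, strain]
        for f in features:
            if len(series[f]) != n_t:
                raise ValueError(f"{worm_id}: {f} has {len(series[f])} time points, expected {n_t}")
            row.extend(series[f])
        rows.append(row)
    return header, rows

# Toy data (hypothetical values): 2 worms, 2 features, 3 time points.
worms = [
    ("worm1", "strain1", {"speed": [0.1, 0.2, 0.3], "area": [1.0, 1.1, 1.2]}),
    ("worm2", "strain2", {"speed": [0.0, 0.5, 0.4], "area": [0.9, 0.9, 1.0]}),
]
header, rows = to_wide_rows(worms, ["speed", "area"])
print(header)
print(rows[0])
```

Requiring every series to have the same length keeps each column aligned to one feature at one time point across all worms.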


[Figure: behaviour tree with leaves numbered 1–17; axis label: distinguishable CO2 behavior-types]

• CO2 responses from ~40 wild isolates may result in ~20 distinct sub-behaviours

• Groupings still need to be optimized & verified (work in progress)

19

Use the behavior-tree to derive phenotypic profiles for each strain

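| strain | behav1 | behav2 | behav3 | behav4 | behav5 | … | behavk | total |
| strain1 | 0.85 | 0.05 | 0.02 | 0.04 | 0.03 | … | 0.01 | 1.0 |
| strain2 | 0.05 | 0.02 | 0.04 | 0.03 | 0.01 | … | 0.85 | 1.0 |
| strain3 | 0.90 | 0.01 | 0.02 | 0.00 | 0.01 | … | 0.01 | 1.0 |
| … | … | … | … | … | … | … | … | … |
| strainm | 0.02 | 0.00 | 0.01 | 0.2 | 0.00 | … | 0.90 | 1.0 |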
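Each row of this table can be read as the fraction of a strain's worms assigned to each behaviour-type (leaf) of the tree, which is why every row totals 1.0. A sketch under that reading (an assumption), with made-up leaf assignments and a hypothetical `strain_profiles` helper:

```python
from collections import Counter, defaultdict

def strain_profiles(assignments, n_types):
    """Fraction of each strain's worms falling in each behaviour-type.

    `assignments` is an iterable of (strain, behaviour_type) pairs with
    behaviour_type in 1..n_types (e.g. the tree leaf a worm ends up in).
    """
    counts = defaultdict(Counter)
    for strain, btype in assignments:
        counts[strain][btype] += 1
    profiles = {}
    for strain, c in counts.items():
        total = sum(c.values())
        profiles[strain] = [c[k] / total for k in range(1, n_types + 1)]
    return profiles

# Made-up assignments: 20 worms of strain1, 10 worms of strain2, 3 behaviour-types.
toy = [("strain1", 1)] * 17 + [("strain1", 2)] * 3 + [("strain2", 3)] * 10
profiles = strain_profiles(toy, n_types=3)
print(profiles)
```

Normalising by each strain's worm count makes strains with different numbers of tracked animals directly comparable.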

20

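Sequence kernel association test (SKAT) (think GWAS for NGS data)

Linear model: $y_i = \alpha_0 + \boldsymbol{\alpha}^\top \mathbf{X}_i + \boldsymbol{\beta}^\top \mathbf{G}_i + \varepsilon_i$, where $y_i$ is the disease/phenotype, $\mathbf{X}_i$ the covariates (e.g. age, sex) and $\mathbf{G}_i$ the variant set (e.g. a gene) of individual $i$

Null model: $H_0: \boldsymbol{\beta} = 0$, i.e. $y_i = \alpha_0 + \boldsymbol{\alpha}^\top \mathbf{X}_i + \varepsilon_i$

Wu et al., Am. J. Hum. Genet., 2011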

• Whole-genome sequencing can detect rare variants

• Group variants into genes/windows and test for association

• Variants can be assigned weights: $f(G_i)$

• Can easily obtain a p-value for each gene, which will need to be adjusted for multiple comparisons
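A common way to adjust per-gene p-values for multiple comparisons is to control the false discovery rate. A minimal sketch of the Benjamini-Hochberg step-up adjustment, one standard choice (the slides do not say which correction was used); the gene names and p-values are invented:

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

genes = ["geneA", "geneB", "geneC", "geneD"]
p = [0.01, 0.04, 0.03, 0.20]
q = bh_adjust(p)
for g, qv in zip(genes, q):
    print(g, round(qv, 3))
```

With 14 602 genes carrying at least one non-synonymous or splice variant, an FDR-style correction is usually far less conservative than a Bonferroni cut-off.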

SKAT has been used successfully to identify causal genes for single phenotypes in C. elegans (Timbers et al., PLoS Genetics, 2016)
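For a continuous phenotype, the SKAT statistic for one variant set is a weighted sum of squared per-variant scores, Q = Σ_j (w_j S_j)², where S_j = Σ_i g_ij (y_i − ŷ_i) uses residuals from the null model and w_j is, by default in Wu et al. (2011), the Beta(1, 25) density at the variant's minor allele frequency. The sketch below is simplified and not the SKAT R package: it assumes an intercept-only null model (no covariates), genotypes coded as minor-allele counts 0/1/2, and a permutation p-value instead of the mixture-of-chi-squares calculation used in practice. Function names and the toy data are hypothetical:

```python
import random

def beta_1_25(maf):
    """Beta(1, 25) density at `maf`: 25 * (1 - maf) ** 24."""
    return 25.0 * (1.0 - maf) ** 24

def skat_q(y, G):
    """SKAT-style score statistic Q = sum_j (w_j * S_j)^2, intercept-only null model.

    y: n phenotype values; G: n x p matrix of minor-allele counts (0/1/2).
    """
    n, p = len(y), len(G[0])
    ybar = sum(y) / n
    resid = [yi - ybar for yi in y]  # residuals under the null (no covariates)
    q = 0.0
    for j in range(p):
        col = [G[i][j] for i in range(n)]
        maf = sum(col) / (2.0 * n)  # G is assumed to count the minor allele
        s_j = sum(g * r for g, r in zip(col, resid))  # score for variant j
        q += (beta_1_25(maf) * s_j) ** 2
    return q

def perm_pvalue(y, G, n_perm=999, seed=1):
    """Permutation p-value for Q (valid here because the null has no covariates)."""
    rng = random.Random(seed)
    q_obs = skat_q(y, G)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if skat_q(y_perm, G) >= q_obs:
            hits += 1
    return q_obs, (hits + 1) / (n_perm + 1)

# Toy gene with two rare variants in 8 strains (made-up genotypes and phenotypes).
G = [[2, 0], [2, 0], [0, 0], [0, 0], [0, 2], [0, 0], [0, 0], [0, 0]]
y = [5.0, 5.2, 1.0, 1.1, 0.9, 1.2, 1.0, 0.8]
q_obs, p_value = perm_pvalue(y, G)
print(q_obs, p_value)
```

The Beta(1, 25) weight decays quickly with allele frequency, so rarer variants contribute more to Q; adding covariates would require regressing them out first and a permutation scheme that respects them.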

21

Multivariate Rare-Variant Association Test to derive a list of candidate genes driving CO2 behavior-types

$H_0: \boldsymbol{\beta} = 0$

MURAT: a multivariate SKAT method

Sun et al., Eur. J. Hum. Genet. 2016

22

Summary of a work in progress:

1. Use IDT to reduce dimensionality of time series behaviour data

2. Perform MURAT rare-variant analysis to identify candidate genes/regions

3. Confirm roles of candidates using standard genetic and cell biology methods

4. Use CRISPR to swap variants between wild-isolate strains

https://github.com/LerouxLab/Celegans_wild_isolate_behaviour

23

Simon Fraser University: Michel Leroux, Catrina Loucks**, Jinko Graham & SFU Statistical Genetics Working Group

University of British Columbia: Don Moerman, Stephane Flibotte, Mark Edgley


UBC MDS is hiring a Data Science Teaching Fellow!

24