algorithmic analysis of human dna replication timing from discrete micro array data
8/3/2019 Algorithmic Analysis of Human DNA Replication Timing From Discrete Micro Array Data
http://slidepdf.com/reader/full/algorithmic-analysis-of-human-dna-replication-timing-from-discrete-micro-array 1/33
Algorithmic Analysis of Human
DNA Replication Timing from
Discrete Microarray Data
Prof. Rushen Chahal
Thesis Statement
The DNA replication timing profile can be
reconstructed efficiently and accurately
from discrete time points.
(Glossary)
Presentation Outline
Biology background
Microarray technology
Experimental data
- Challenges
Algorithms
Research Plans
- Replication timing
- Origins
- Scale up
Why Study DNA Replication?
Natural Science
- DNA is the blueprint for organisms
  It must be passed on (organism, cell)
Engineering
- Gene therapy
  Insertion, deletion, modification
- Cancer is unchecked replication
... A G G T C G A C A C ...
... T C C A G C T G T G ...
Human genome > 3 billion bp
Replication rate ~ 1000 bp/min
Serial replication: 5.7 years
Observed S-phase: 6 to 10 hours (speedup > 5000)
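The figures on this slide can be checked with a short calculation. This is a minimal sketch using the rounded genome size and replication rate from the slide; the 365.25-day year and the variable names are assumptions made for illustration.

```python
# Back-of-the-envelope check of the serial replication time and speedup.
GENOME_BP = 3_000_000_000          # human genome, rounded as on the slide
RATE_BP_PER_MIN = 1_000            # approximate replication rate
MIN_PER_YEAR = 60 * 24 * 365.25    # assumption: 365.25-day year

serial_minutes = GENOME_BP / RATE_BP_PER_MIN
serial_years = serial_minutes / MIN_PER_YEAR
serial_hours = serial_minutes / 60

# Speedup relative to the observed 6 to 10 hour S-phase.
speedup_slowest = serial_hours / 10
speedup_fastest = serial_hours / 6

print(f"serial replication: {serial_years:.1f} years")
print(f"speedup: {speedup_slowest:.0f} to {speedup_fastest:.0f}")
```

Because the genome is slightly larger than 3 billion bp, the speedup at a 10 hour S-phase lands just above 5000, matching the slide.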
Background
Prokaryotes
- E. coli
  DnaA binds to oriC
Eukaryotes
- ORC
- S. cerevisiae (yeast)
  ARS 11 bp consensus
- Mapping of origins
- Human
  No known consensus
  Few origins characterized
Genome Tiling Microarrays
Interrogation at genomic scale
- Large increase in data
Microarray data analysis
Array of probes tiles genome
ATGGACTACGGATCAGTAAATCGATTAGGCACCAGATCAAGTACGATCCAGAGTACATAGCATACCATGACTAGA
TACCTGATGCCTAGTCATTTAGCTAATCCGTGGTCTAGTTCATGCTAGGTCTCATGTATCGTATGGTACTGATCT
Zoomed region:
GAGTACATAGCATACCATGACTAGA
CTCATGTATCGTATGGTACTGATCT
PM probe: GAGTACATAGCATACCATGACTAGA
MM probe: GAGTACATAGCATACCATGACTAGA with one base mismatched (A)
Cross-hybridization
- Repeats not tiled
  Gaps in genome
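The PM/MM probe pairing can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the non-overlapping 25-base step is chosen to keep the example small, and the rule that the MM probe complements the middle base of the PM probe is an assumption based on the common tiling-array convention, not something the slide states.

```python
# Sketch of tiling a sequence with 25-mer perfect-match (PM) probes and
# deriving a mismatch (MM) partner for each. All names are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
PROBE_LEN = 25

def tile_probes(sequence, step=PROBE_LEN):
    """Return PM probes of PROBE_LEN bases starting every `step` bases."""
    last_start = len(sequence) - PROBE_LEN
    return [sequence[i:i + PROBE_LEN] for i in range(0, last_start + 1, step)]

def mismatch_probe(pm):
    """Assumption: the MM probe complements the middle base of the PM probe."""
    mid = len(pm) // 2
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]

# Top strand from the slide, split across two literals for readability.
genome = ("ATGGACTACGGATCAGTAAATCGATTAGGCACCAGATCAAGTACGATCCA"
          "GAGTACATAGCATACCATGACTAGA")
probes = tile_probes(genome)
pm = probes[-1]            # the probe highlighted on the slide
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

A real array design would also skip repeats and leave gaps where no unique probe exists, as the slide notes.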
Image analysis computes intensity of each array probe
The Cell Cycle
[Figure: cell cycle diagram with S-phase highlighted; the start of S-phase is marked as hour 0]
Profiling DNA Replication Timing
Ideal: f(chr, bp) = rtime
Isolate DNA replicated in
discrete parts of S-phase
- One cell is not enough
- Synchronize S-phase entry
  Apply drugs
  Release together
- Synchronization error
- Label in two hour intervals
Allelic Variation
- mf(chr, bp) = {rtime1, rtime2, ...}
Allelic Variation
Fluorescent in-situ Hybridization
(FISH)
- Replication timing at a given site
[Figure: two FISH panels over the time points 0hr, 2hr, 4hr, 6hr, 8hr, 10hr: temporally specific replication (TS) and temporally non-specific replication (TNS)]
What is the Problem?
Reconstruct a continuous replication profile
- Temporally (time points)
- Spatially (probes)
from noisy data
- Biological experiments
- Synchronization error
- Microarray artifacts
efficiently
- Genomic data (> 3 billion bp)
Initial Analysis
Tiling Analysis Software (TAS)
- Wilcoxon Rank Sum test in sliding window
  Assess enrichment of treatment over control
- Window slides to get p-value for each probe
O(kn) time complexity
- n = # probes on array
- k = # probes in a window
  k scales linearly with window size
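The windowed test can be sketched as follows. This is not the TAS implementation. It is a minimal illustration that assumes a one-sided rank-sum test with a normal approximation (average ranks for ties, no tie correction of the variance) and probes sorted by position. The function names and the half_window parameter are hypothetical. Because each window is re-sorted from scratch, the work per probe grows with the window size k, which is the cost the slide refers to.

```python
import math

def rank_sum_pvalue(treatment, control):
    """One-sided p-value that treatment tends to exceed control.

    Normal approximation to the Wilcoxon rank-sum statistic, using average
    ranks for ties and no tie correction of the variance.
    """
    n1, n2 = len(treatment), len(control)
    pooled = sorted([(v, 0) for v in treatment] + [(v, 1) for v in control])
    rank_sum = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2            # mean of ranks i+1 .. j
        rank_sum += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    z = (rank_sum - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability

def windowed_pvalues(positions, treatment, control, half_window):
    """p-value per probe from all probes within half_window bp of it."""
    pvalues = []
    lo = hi = 0
    n = len(positions)
    for i in range(n):
        while positions[i] - positions[lo] > half_window:
            lo += 1
        while hi < n and positions[hi] - positions[i] <= half_window:
            hi += 1
        pvalues.append(rank_sum_pvalue(treatment[lo:hi], control[lo:hi]))
    return pvalues
```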
New Analysis
Thesis Statement (revisited):
The DNA replication timing profile can be
reconstructed efficiently and accurately from
discrete time points.
Incorporate information from all time points
- Continuous view of replication timing (TR50)
Address temporally non-specific replication
Scale up to the whole genome efficiently
Allelic Variation Examples
[Figure: two signal distributions over the time axis 0, 2, 4, 6, 8, 10 hr]
Temporally specific replication: signal fractions 0, 0, 1/1, 0, 0; TR50 = 5
Temporally non-specific replication: signal fractions 1/6, 1/6, 1/3, 0, 1/3; TR50 = 5
Challenge: From distribution of array signal, determine replication category.
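Both example distributions give a TR50 of 5 hours, which is why TR50 alone cannot separate the two categories. The sketch below shows one way to compute TR50 from per-interval signal. It assumes signal is spread uniformly inside each two hour labeling interval, so the 50% crossing is found by linear interpolation. That interpolation rule and the function name are assumptions, chosen because they reproduce the values on the slide.

```python
from fractions import Fraction

def tr50(signal, interval_hours=2):
    """Hours into S-phase by which half of the total signal has accumulated.

    Assumption: signal is uniform within each labeling interval, so the
    cumulative curve is interpolated linearly inside the crossing interval.
    """
    total = sum(signal)
    if total == 0:
        return None                       # no-signal probe
    half = Fraction(total) / 2
    cumulative = Fraction(0)
    for index, value in enumerate(signal):
        value = Fraction(value)
        if value > 0 and cumulative + value >= half:
            fraction_needed = (half - cumulative) / value
            return float((index + fraction_needed) * interval_hours)
        cumulative += value
    return None

specific = [0, 0, 1, 0, 0]
non_specific = [Fraction(1, 6), Fraction(1, 6), Fraction(1, 3), 0, Fraction(1, 3)]
print(tr50(specific), tr50(non_specific))
```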
Temporal Specificity Algorithm
// Is there evidence that all alleles are replicating together?
If ( max sum of two adjacent time points ≥ 5/6 * total sum )
then {probe is temporally specific}
// Is at least one allele replicating apart from the majority?
Else If ( max sum of two adjacent time points not including
the maximum time point ≥ 1/3 * total sum )
then {probe is temporally non-specific}
// Isolated signal is not strong enough to be an allele.
Else
{probe is temporally specific}
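The pseudocode above can be sketched as runnable code. This is a hedged sketch, not the author's implementation. It assumes the two comparisons are "at least" (≥), that ties for the maximum time point go to the earliest one, and that "not including the maximum time point" means adjacent pairs that do not contain it. Integer cross-multiplication avoids rounding when the inputs are counts.

```python
def classify_probe(signal):
    """Label a probe 'TS' or 'TNS' from its per-interval signal (a list).

    Assumptions: comparisons are >=, ties for the maximum time point go to
    the earliest one, and pairs containing the maximum are excluded in the
    second test.
    """
    total = sum(signal)
    pairs = [(i, signal[i] + signal[i + 1]) for i in range(len(signal) - 1)]
    # Evidence that all alleles replicate together.
    if 6 * max(s for _, s in pairs) >= 5 * total:
        return "TS"
    peak = signal.index(max(signal))
    away = [s for i, s in pairs if peak not in (i, i + 1)]
    # At least one allele replicates apart from the majority.
    if away and 3 * max(away) >= total:
        return "TNS"
    # Isolated signal is too weak to be an allele.
    return "TS"

print(classify_probe([0, 0, 6, 0, 0]))   # all signal in one interval
print(classify_probe([1, 1, 2, 0, 2]))   # the 1/6, 1/6, 1/3, 0, 1/3 example
```

Probes with no signal at all would need to be filtered out before this step, since the first test is trivially true when the total is zero.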
Plotting TR50
[Figure: TR50 (hours, axis 2 to 8) plotted against chromosomal position (33 to 34 million bp)]
Smoothed TR50 curve recovers replication pattern
Local minima: possible locations of replication origins
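Picking candidate origins from a smoothed curve can be sketched like this. The centered moving average is only a stand-in smoother chosen to keep the example self-contained, and the function names and half_width parameter are hypothetical. Positions where the smoothed TR50 is lower than both neighbors replicate earlier than their surroundings, which is why they are candidate origins.

```python
def moving_average(values, half_width):
    """Centered moving average; a simple stand-in for the real smoother."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half_width): i + half_width + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

def local_minima(values):
    """Indices whose value is strictly lower than both neighbors."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]

# Hypothetical per-probe TR50 values (hours) along a chromosome segment.
raw_tr50 = [6, 5, 4, 3, 4, 5, 6, 5, 4, 5, 6]
smoothed = moving_average(raw_tr50, half_width=1)
candidates = local_minima(smoothed)
print(candidates)
```

A wider smoothing window removes more noise-driven minima but can also merge nearby origins, so the window size is a parameter to tune.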
Segregation Algorithm
Sliding window passes over probes to generate intervals
- Ratio of TSP to TNSP determines temporal specificity
- Average TR50 determines timing category
[Figure: state diagram over the categories TNS, Early, Mid, Late]
Ratio < 2-to-1: TNS
Ratio ≥ 2-to-1 & Avg < 3.4: Early
Ratio ≥ 2-to-1 & 3.4 ≤ Avg ≤ 3.9: Mid
Ratio ≥ 2-to-1 & Avg > 3.9: Late
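The per-window decision can be sketched as a small function. This is a reading of the slide's diagram, not a confirmed specification. It assumes that a window counts as temporally specific when TS probes outnumber TNS probes by at least 2-to-1, and that 3.4 and 3.9 are TR50 cutoffs in hours with the boundary values falling in Mid. The function and parameter names are hypothetical.

```python
def classify_window(ts_count, tns_count, avg_tr50,
                    early_below=3.4, late_above=3.9):
    """Category for one sliding-window interval.

    Assumptions: the ratio test is 'TS probes at least 2-to-1 over TNS
    probes', and the cutoffs are hours of average TR50 over TS probes.
    """
    if ts_count < 2 * tns_count:
        return "TNS"
    if avg_tr50 < early_below:
        return "Early"
    if avg_tr50 > late_above:
        return "Late"
    return "Mid"

print(classify_window(10, 6, 3.0))   # ratio below 2-to-1
print(classify_window(10, 5, 3.0))   # ratio exactly 2-to-1, early average
```

Windows with too few probes would be reported separately as low probe density before this test is applied.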
Research Plan: Profile Generation
[Figure: pipeline flowchart; stages and outputs listed below]
Input: arrays for the 0-2hr, 2-4hr, 4-6hr, 6-8hr, 8-10hr intervals
Probe Classification (Temporal Specificity Algorithm) & TR50 Calculation
- No Signal Probes
- TNS Probes
- TS Probes & TR50
Segregation Algorithm (Sliding Window)
- Low Probe Density
- TNS Regions
- TS Regions
Join Intervals
- Joined TNS Regions
- Joined TS Regions
Mask TS probes with JTS Regions
- TS Probes that fall into JTS Regions
TR50 Smoothing
- Smoothed TR50
Segregate JTS Regions into 1/3's based on STR50
- Early
- Mid
- Late
Join Intervals
- Joined Early
- Joined Mid
- Joined Late
Parameters to evaluate:
- Segregation Algorithm: sliding window size, minimum probe density
- Join Intervals: minimum interval size
Evaluation
Concordance of biological phenomena
- Segregation intervals vs. FISH
- STR50 local minima vs. other origin methods
- Correlation with other biological data
  Gene density: early replication
  AT content: late replication
  Gene expression: early replication
  Activating acetylation/methylation: early replication
Performance on random data
- Large quantity of TNS replication
Research Plan: Replication Origins
Drive DNA replication pattern
Smoothed TR50 local minima
- Cleaned up with new profiles
Other biological assays
- Early labeling fragments
- Nascent strands
- Bubble trapping
- ORC binding
Approach and Evaluation
Correlation between methods
- Consensus sets
Motif analysis
- Positional attributes
  Replication timing
  Proximity to genes
Evaluation is difficult (few validated origins)
- Agreement between methods
- Testing proposed correlations
- Paper in preparation
Scaling Up to Whole Genome
Pilot 1% to 100% of human genome
- Algorithms developed with scalability in mind
  Incremental update sliding windows: linear time
Performance based evaluation
- If 100% data available
  Profile multiple runs
- Else
  Profile many 1% runs
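The incremental-update idea can be sketched as follows. Instead of recomputing each window from scratch, running totals are adjusted as probes enter and leave, so every probe is added once and removed once and the pass is linear in the number of probes. The tuple layout, the trailing (rather than centered) window, and all names are assumptions made to keep the sketch short.

```python
from collections import deque

def sliding_window_stats(probes, window_bp):
    """Yield (position, ts_count, tns_count, avg_tr50) per probe in one pass.

    probes: list of (position, label, tr50) sorted by position, where label
    is 'TS' or 'TNS' and tr50 is only used for TS probes. The window covers
    probes at most window_bp behind the current one (an assumption).
    """
    window = deque()
    ts_count = tns_count = 0
    tr50_sum = 0.0
    for position, label, tr50 in probes:
        # Add the entering probe to the running totals.
        window.append((position, label, tr50))
        if label == "TS":
            ts_count += 1
            tr50_sum += tr50
        else:
            tns_count += 1
        # Remove probes that have fallen out of the window.
        while position - window[0][0] > window_bp:
            _, old_label, old_tr50 = window.popleft()
            if old_label == "TS":
                ts_count -= 1
                tr50_sum -= old_tr50
            else:
                tns_count -= 1
        avg = tr50_sum / ts_count if ts_count else None
        yield position, ts_count, tns_count, avg

example = [(0, "TS", 2.0), (100, "TS", 4.0), (200, "TNS", 0.0), (1000, "TS", 6.0)]
results = list(sliding_window_stats(example, window_bp=250))
print(results)
```

Over very long runs the running sum can accumulate floating-point error, so a production version might periodically recompute it from the window contents.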
Implementation Details
Java
- Class representation of proprietary microarray files
- Algorithms to process raw microarray data
- Diagnostic tools
Perl
- Scripts to process intermediate and final data
- Correlations, data transformation, quality assurance
R statistical language
- Smoothing, statistical plots, correlation studies
Shell scripts
- Automated processing of microarray sets
Current/Expected Contributions
Algorithms, Software Infrastructure, Analysis
Probe-by-probe TR50 analysis
- Temporal Specificity Algorithm
  Combinatorial analysis of allele locations
Segregation Algorithm
- TNS, Early, Mid, Late replicating areas
  Used to design validation experiments
Smoothed TR50 profile
- Local minima provide candidate origin set
Linear algorithms enable scale up
Randomness testing
Publications
Completed:
ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004 Oct 22; 306(5696):636-40.
ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. {In Press, to appear in June 14, 2007 issue}
Karnani N., Taylor C., Malhotra A., Dutta A. Pan-S replication patterns and chromosomal domains defined by genome tiling arrays of ENCODE genomic areas. Genome Research. {In Press, to appear in June 2007 issue}
UCSC Browser Tracks: TR50, Smoothed TR50, Local Minima, Segregation
In Progress:
Multi-million dollar NIH grant for scale up to full human genome
Paper detailing origin methods, correlations, etc.
Why is this work computer science?
Fred Brooks: The Computer Scientist as Toolsmith II
- "Hitching our research to someone else's driving problems, and
solving those problems on the owners' terms, leads us to richer
computer science research."
Not an incremental improvement
- Algorithmic techniques and analysis used to solve a problem
previously addressed inadequately with a statistical approach that
performed poorly
Collaboration outside of engineering disciplines enhances visibility, funding opportunities, and demand for CS work
Developed algorithms, time complexity analysis,
combinatorial analysis, feedback to experimental design
Will this work lead to any CS
publications?
The Nature article focused on analysis of the
biological data and includes descriptions of
some of my algorithms
The Genome Research paper and origins paper will also contain writeups of my algorithms and
analysis techniques
The Pacific Symposium on Biocomputing
focuses on algorithms and computational
techniques
Isn't your approach too simple?
The approach isn't simple:
- Combinatorial analysis
- Temporal specificity algorithm (many iterations)
- Probewise computation to deal with binding affinity
- Incremental updating sliding windows
  Cross-hybridization
  Synchronization error
- Smoothing parameterization
- Linear algorithms for scale up
Can't your algorithm be replaced by
a well-known statistical method?
HMMs were used for segregation of intervals
- Performed poorly in comparison to my algorithm
  Less accurate categorization of replication intervals
  Prone to rapid oscillation, producing tiny intervals
  Parameterization was difficult
Lowess smoothing is a statistical method
- Parameterization was not easy
What are the biggest challenges in
this work?
Noise!
- The data to analyze comes from biological experiments with
several sources of noise that compound upon one another
Biology
- I haven't had a course in biology since 10th grade
Microarrays
- New, evolving technology we're still learning to deal with
Data size
- Hundreds of GB of data to process
- Replicates, failed experiments
- Algorithms must be efficient
What kind of career are you aiming
for after graduation, and why?
Teaching Computer Science (Small College)
- I enjoyed learning in my undergraduate curriculum with
meaningful interactions with professors
- I taught Discrete Math at UVa in Fall '02 and Spring '03
  Enjoyable, but 60-70 students too large
Post-doctoral (Biological Computing)
- Many opportunities around the world
- Further exploration of the field
How will you know when your
work/thesis is done?
Research is never really done, but you have to
declare victory at some point
The replication profiling algorithms I've developed
already perform quite well
- I have concrete plans to improve and finalize them