Heshan Lin: Accelerating Short Read Mapping, Local Realignment, and a Discovery on a Graphics...
synergy.cs.vt.edu
Accelerating Sequence Analysis on Graphics Processing Unit (GPU)
Wu Feng and Heshan Lin
Department of Computer Science
NGS: Democratizing DNA Sequencing
Source: www.genome.gov
Sequencing will be available to the masses in the near future
Bottleneck Shift -> Computation
[Figure: ChIP-Seq … transcriptome sequencing, complete genome re-sequencing, and metagenomics, grouped around BIG data]
Graphics Processing Unit (GPU)
Graphics & gaming -> general-purpose computing
Ubiquitously available: desktops, laptops, iPads
“Personalized Supercomputer”
• About 10x the peak compute of a CPU
• 512 cores
• ~10^12 FLOPS (1 TFLOPS)
• On par with the power of a supercomputer in 2004
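The 10^12 figure is a theoretical peak: roughly cores × clock rate × floating-point operations each core can complete per cycle. A minimal sketch of that arithmetic, assuming a hypothetical 1.0 GHz clock and 2 FLOPs per core per cycle (one fused multiply-add); neither number comes from the slide:

```python
def peak_flops(cores: int, clock_hz: float, flops_per_core_per_cycle: int) -> float:
    """Theoretical peak throughput: every core retires its maximum FLOPs every cycle."""
    return cores * clock_hz * flops_per_core_per_cycle


# 512 cores is from the slide; the 1.0 GHz clock and 2 FLOPs/cycle
# (one fused multiply-add) are assumed values for illustration only.
gpu_peak = peak_flops(cores=512, clock_hz=1.0e9, flops_per_core_per_cycle=2)
print(f"{gpu_peak:.3e} FLOPS")  # 1.024e+12 FLOPS
```

This is an upper bound that assumes every core does useful arithmetic on every cycle; memory stalls and branching keep real applications below it.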
Traditional CPU Cores
[Figure: a traditional CPU core: control (fetch/decode), ALU, execution context (registers), out-of-order control logic, branch predictor, memory prefetcher, and data cache. Courtesy of K. Fatahalian]
Optimized for single-thread performance
Power Density will Increase
[Figure: power density (W/cm², log scale from 1 to 10,000) of Intel processors from the 4004 through the Pentium, 1970 to 2010, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface. Source: Borkar, De (Intel)]
Power densities are too high to keep junctions at low temperatures
GPU: Optimized for Throughput
Use much simpler cores
Use vectorization (SIMD) to replicate simple ALUs under one shared instruction stream
[Figure: a simple core (control, ALU, execution context) beside a SIMD core in which one control (fetch/decode) unit drives many ALUs, each with its own execution context (registers), plus a shared execution context. Courtesy of K. Fatahalian]
Take with a Grain of Salt
Raw compute power != application performance
Not all applications are suitable for GPUs
Developing fully optimized code for GPUs is non-trivial and requires rethinking the computation
A GPU core is MUCH SLOWER than a CPU core
Need a lot of parallelism to hide memory latency
Reduce branching as much as possible
Think of an army of synchronized snails
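Two of these cautions can be made concrete with a toy cost model (an illustration, not a description of any specific GPU). Lanes in a SIMD group share one instruction stream, so a branch on which the lanes disagree costs the sum of both paths. By Little's law, the number of memory requests that must be in flight to hide latency equals latency times request rate. The 10-cycle path costs and the 400-cycle latency below are hypothetical:

```python
from __future__ import annotations


def simd_branch_cycles(lane_takes_then: list[bool], then_cycles: int, else_cycles: int) -> int:
    """Cycles for one SIMD group to run an if/else in lockstep (toy model).

    The lanes share one instruction stream, so the group executes every path that
    at least one lane takes; lanes not on the current path are masked off.
    """
    cycles = 0
    if any(lane_takes_then):
        cycles += then_cycles
    if not all(lane_takes_then):
        cycles += else_cycles
    return cycles


def requests_in_flight(latency_cycles: int, requests_per_cycle: float) -> float:
    """Little's law: outstanding memory requests needed to keep the memory pipe busy."""
    return latency_cycles * requests_per_cycle


WARP = 32  # NVIDIA GPUs schedule threads in groups (warps) of 32
uniform = [True] * WARP                              # every lane takes the same path
divergent = [lane % 2 == 0 for lane in range(WARP)]  # even and odd lanes disagree
print(simd_branch_cycles(uniform, 10, 10))    # 10: only one path executed
print(simd_branch_cycles(divergent, 10, 10))  # 20: both paths executed
# Hypothetical: 400-cycle memory latency, one new request issued per cycle.
print(requests_in_flight(400, 1.0))           # 400.0 requests must be in flight
```

A divergent warp pays for both paths even though each lane uses only one, which is why branch-light code matters; and because each snail is slow, thousands of them must be queued up to keep memory busy.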
GPU Potential for Sequence Alignment
Why sequence alignment?
Fundamental in sequence analysis
Computationally intensive
Preliminary study
Algorithm        Description                       Speedup
RMAP             Short read mapping                10x
Smith-Waterman   Optimal sequence alignment        30x
BLASTP           Sequence database search          6.5x
Indel realigner  Locally realign mismatched reads  Ongoing
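Smith-Waterman fills a dynamic-programming matrix in which each cell depends on its left, upper, and upper-left neighbors, so all cells on one anti-diagonal are independent of each other. Below is a minimal score-only sketch that walks the matrix anti-diagonal by anti-diagonal, assuming a linear gap penalty and hypothetical scores (match +2, mismatch -1, gap -1). It shows where GPU parallelism can come from; it is not necessarily how the implementation behind the 30x figure works:

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score with a linear gap penalty (scores are assumptions)."""
    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1];
    # row 0 and column 0 stay 0 (an alignment may start anywhere).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    # Sweep anti-diagonals d = i + j. Each cell reads only anti-diagonals d-1 and
    # d-2, so all cells of one d could be computed in parallel on a GPU.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh local alignment
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # a[i-1] against a gap
                H[i][j - 1] + gap,    # b[j-1] against a gap
            )
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("ACGT", "AGT"))  # 5: ACGT vs A-GT = 2 - 1 + 2 + 2
```

In a wavefront GPU version, the inner loop over `i` is the part spread across threads; another common strategy is to give each thread a whole database sequence to align.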
Lessons Learnt
CPU-optimized code may be difficult to accelerate on GPUs: BLASTP 6.5x vs. Smith-Waterman 30x
Requires rethinking algorithm design: a scalable but less optimal algorithm is better on GPUs
Example: RMAP
Originally used a hash table to find matches (O(n))
Switched to a slower binary search algorithm (O(n log n))
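The RMAP trade-off can be illustrated with a k-mer (seed) index over the reference. This is a minimal sketch with hypothetical function names, not RMAP's code. The hash index answers each lookup in expected O(1), so n lookups cost O(n); the sorted index needs a binary search per lookup, which is where the extra log factor comes from. One plausible reason the sorted version suits GPUs better is that it is a single flat array every thread can search independently, with no hash-bucket chains or dynamic allocation; the slide itself does not state the reason.

```python
from __future__ import annotations

from bisect import bisect_left
from collections import defaultdict


def build_hash_index(ref: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(ref) - k + 1):
        index[ref[pos:pos + k]].append(pos)
    return dict(index)


def build_sorted_index(ref: str, k: int) -> list[tuple[str, int]]:
    """Flat array of (k-mer, position) pairs sorted by k-mer, then position."""
    return sorted((ref[pos:pos + k], pos) for pos in range(len(ref) - k + 1))


def lookup_hash(index: dict[str, list[int]], seed: str) -> list[int]:
    # Expected O(1) per seed.
    return index.get(seed, [])


def lookup_sorted(index: list[tuple[str, int]], seed: str) -> list[int]:
    # O(log N) binary search to the first entry for this seed (positions are >= 0,
    # so (seed, -1) sorts just before it), then a scan over the run of equal k-mers.
    i = bisect_left(index, (seed, -1))
    hits = []
    while i < len(index) and index[i][0] == seed:
        hits.append(index[i][1])
        i += 1
    return hits


reference = "ACGTACGTTACG"
hash_index = build_hash_index(reference, k=3)
sorted_index = build_sorted_index(reference, k=3)
print(lookup_hash(hash_index, "ACG"))      # [0, 4, 9]
print(lookup_sorted(sorted_index, "ACG"))  # [0, 4, 9]
```

Both indexes return the same hits; the sorted one trades asymptotic speed for a memory layout with no pointers to chase.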
Opportunities
[Figure: Smith-Waterman, Needleman-Wunsch, BWA, BLAST, and Bowtie placed on accuracy vs. time axes, alongside the question "Next-gen algorithm?"]
Compute the Cure Initiative
Partnership between NVIDIA and VT
Goal: Leverage GPU power to fight cancer
Current focus: a GPU-accelerated sequence alignment framework
http://www.nvidia.com/object/compute-the-cure.html
Conclusion
Democratizing DNA sequencing requires more accessible HPC resources
GPUs present both opportunities and challenges
Initial results are promising
For more information: Synergy website, http://synergy.cs.vt.edu