accelerating spark genome sequencing in cloud—a data driven approach, case studies and beyond:...

ACCELERATING SPARK GENOME SEQUENCING IN CLOUD – A DATA DRIVEN APPROACH, CASE STUDIES AND BEYOND

Yingqi (Lucy) Lu

Mulugeta Mammo

Eric Kaczmarek

Intel Corporation

Legal Disclaimer

• Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.

• No computer system can be absolutely secure.

• Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.

Intel, the Intel logo, Xeon, Xeon Phi, Lake Crest, etc. are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others. © 2017 Intel Corporation

Spark Deployment Is Moving to Cloud

[Figure: Cloud vs. On-premises]


Cloud vs. On-premises (pros of cloud)

+ Quick deployment
+ Elasticity
+ Manageability/maintenance


Cloud vs. On-premises (cons of cloud)

- Don’t expect the same performance as on-premises
- Limited performance counters available
- Need to re-profile and retune your application

Cloud vs. On-Premises

“Do I need 10 instances with 2 cores per instance and network-attached storage, or a single instance with 20 cores and attached storage?”


It depends.

The performance of your application in a cloud environment is directly affected by how you partition resources.

Compute vs. IO

The same Spark application run on three setups:

• Setup #1 (36 cores, 9 storage disks): CPU cycles spent waiting on IO, so computation is wasted
• Setup #2 (12 cores, 9 storage disks): CPU fully utilized but IO underutilized, so storage is wasted
• Setup #3 (15 cores, 9 storage disks): CPU fully utilized and IO fully utilized, giving the best ROI

Pay attention to the IO vs. core ratio.
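To make the core vs. disk trade-off concrete, here is a back-of-envelope model. It is an assumption for illustration, not the method used in the talk: each busy core is assumed to consume a fixed IO bandwidth and each disk to supply a fixed bandwidth. The 60 MB/s per core and 100 MB/s per disk figures are hypothetical, chosen only so that the balance point lands at 15 cores for 9 disks as in Setup #3; real values would have to come from profiling.

```java
// Back-of-envelope IO vs. core balance check. The bandwidth figures passed to
// classify() are hypothetical placeholders, not measurements from the talk.
class IoCoreBalance {

    enum Bottleneck { IO_BOUND, CPU_BOUND, BALANCED }

    /**
     * Compares aggregate IO demand from busy cores with aggregate disk supply.
     * Assumes every busy core consumes perCoreMBps of storage bandwidth and
     * every disk delivers perDiskMBps; within +/-5% of supply counts as balanced.
     */
    static Bottleneck classify(int cores, int disks, double perCoreMBps, double perDiskMBps) {
        double demand = cores * perCoreMBps;
        double supply = disks * perDiskMBps;
        double tolerance = 0.05 * supply;
        if (demand > supply + tolerance) {
            return Bottleneck.IO_BOUND;   // cores stall waiting on IO
        }
        if (demand < supply - tolerance) {
            return Bottleneck.CPU_BOUND;  // disks sit partly idle
        }
        return Bottleneck.BALANCED;
    }

    public static void main(String[] args) {
        int disks = 9;
        // Core counts of the three setups above; 60 MB/s per core and
        // 100 MB/s per disk are made-up values chosen for illustration.
        for (int cores : new int[] {36, 12, 15}) {
            System.out.printf("%d cores / %d disks (%.2f cores per disk): %s%n",
                    cores, disks, (double) cores / disks,
                    classify(cores, disks, 60.0, 100.0));
        }
    }
}
```

With these made-up numbers the model reports Setup #1 as IO bound, Setup #2 as CPU bound, and Setup #3 as balanced. Real workloads rarely have constant per-core IO demand, so a result like this is only a starting point for measurement.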


Partition Resources in the Cloud

Starting from the on-premises baseline, profile:

• Spark application and Java Virtual Machine
– Hot functions
– Lock contention
– Java garbage collection
• System*
– Processor
– Network and storage
– Memory

* Be aware of which tools and counters are available in the cloud; not all of them will work.

Case Study – Genome Analysis Toolkit (GATK)

Structured programming framework designed to enable rapid development of efficient and robust analysis tools for next-generation DNA sequencers

– Industry standard for analyzing human genome sequencing data
– Developed by the Broad Institute of MIT and Harvard

Profile Application and Java VM

Java Flight Recorder
– Ships with Oracle JDK
– Thread lock contention
– Hot functions
– Garbage collection

[Figure: Java Flight Recorder views for hot functions, lock contention, and garbage collection]
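The talk profiled with Java Flight Recorder. As a hedged sketch, assuming a HotSpot JDK 11 or later (where JFR ships in OpenJDK and exposes the jdk.jfr API), the same lock-contention and GC events can be captured from code. The example starts a recording, forces the main thread to wait about 200 ms on a monitor another thread holds, requests a GC, and counts the recorded jdk.JavaMonitorEnter and jdk.GarbageCollection events. The contended workload, the class name, and the 10 ms threshold are arbitrary choices for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Sketch (JDK 11+, HotSpot): capture lock-contention and GC events with the
// jdk.jfr API around an artificial, deliberately contended workload.
class JfrContentionProbe {

    private static final Object LOCK = new Object();

    static List<RecordedEvent> recordContendedWorkload() throws Exception {
        try (Recording recording = new Recording()) {
            // Record monitor waits longer than 10 ms (arbitrary threshold) and all GCs.
            recording.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ofMillis(10));
            recording.enable("jdk.GarbageCollection");
            recording.start();

            CountDownLatch lockHeld = new CountDownLatch(1);
            Thread holder = new Thread(() -> {
                synchronized (LOCK) {
                    lockHeld.countDown();
                    try {
                        Thread.sleep(200); // keep LOCK busy so the main thread must wait
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            holder.start();
            lockHeld.await();          // holder now owns LOCK
            synchronized (LOCK) {      // main thread blocks here: a contended monitor enter
                System.out.println("main thread acquired the lock");
            }
            holder.join();
            System.gc();               // request a collection (ignored if explicit GC is disabled)

            recording.stop();
            Path file = Files.createTempFile("contention", ".jfr");
            recording.dump(file);
            List<RecordedEvent> events = RecordingFile.readAllEvents(file);
            Files.deleteIfExists(file);
            return events;
        }
    }

    // Counts events of one JFR event type, e.g. "jdk.JavaMonitorEnter".
    static long count(List<RecordedEvent> events, String type) {
        return events.stream()
                .filter(e -> e.getEventType().getName().equals(type))
                .count();
    }

    public static void main(String[] args) throws Exception {
        List<RecordedEvent> events = recordContendedWorkload();
        System.out.println("contended monitor enters: " + count(events, "jdk.JavaMonitorEnter"));
        System.out.println("GC events: " + count(events, "jdk.GarbageCollection"));
    }
}
```

For Spark executors, recordings are more commonly started with a JVM option such as -XX:StartFlightRecording and inspected in JDK Mission Control; the programmatic API is mainly useful for automated checks.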

Lock Contention Example

• Spark application using SynchronizedMap, resulting in heavy lock contention (50+% of time spent waiting on locks)

• Replacing SynchronizedMap with ConcurrentHashMap improved performance by 3.5x
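A minimal sketch of that swap, using a hypothetical multi-threaded k-mer counter since the talk does not show its code. Both maps produce the same counts; the difference is that every operation on a Collections.synchronizedMap wrapper takes one shared lock, while ConcurrentHashMap does not serialize all updates on a single map-wide lock. The sketch only illustrates the change and does not reproduce the 3.5x measurement.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: a shared counter updated from many threads. The k-mer workload is
// hypothetical; only the choice of map mirrors the slide.
class SharedCounts {

    // Before: every operation on the wrapper synchronizes on one shared lock.
    static Map<String, Long> synchronizedCounts() {
        return Collections.synchronizedMap(new HashMap<>());
    }

    // After: ConcurrentHashMap avoids a single map-wide lock, so threads
    // updating different keys rarely block each other.
    static Map<String, Long> concurrentCounts() {
        return new ConcurrentHashMap<>();
    }

    // Each worker thread counts every key once into the shared map.
    static Map<String, Long> countInParallel(Map<String, Long> counts, List<String> keys, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (String key : keys) {
                    counts.merge(key, 1L, Long::sum); // atomic on both map types
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> kmers = List.of("ACGT", "CGTA", "GTAC", "ACGT");
        System.out.println(countInParallel(synchronizedCounts(), kmers, 4));
        System.out.println(countInParallel(concurrentCounts(), kmers, 4));
    }
}
```

Because merge(key, 1L, Long::sum) is atomic on both map types, the switch changes how much threads contend, not the correctness of the counts.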

Uncovering a Scala Scalability Issue

• The problem resides in Scala APIs and is caused by highly concurrent instanceof checks in the Java VM

• The problem worsens as the number of threads inside the Java VM increases

Scala API Fix


• Use polymorphism instead of instanceof!

• 1.6x performance improvement in the critical stage and 1.3x across the entire workload

• Code changes released in Scala 2.12.0

• https://issues.scala-lang.org/browse/SI-9823

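A minimal sketch of the refactoring pattern behind "use polymorphism instead of instanceof", with hypothetical read types. It is not the SI-9823 patch, and it does not reproduce the JVM-level slowdown, which the talk reports only under many concurrent threads. The instanceof version tests a marker interface on every element of a hot loop; the polymorphic version lets each subclass report its own base count through a virtual call, so the loop performs no type test.

```java
import java.util.List;

// Sketch of replacing instanceof dispatch with virtual dispatch. The read
// types are hypothetical; this is not the actual SI-9823 change.
class DispatchStyles {

    // Marker interface that the instanceof version tests for.
    interface Paired {}

    abstract static class Read {
        final int length;

        Read(int length) {
            this.length = length;
        }

        // Polymorphic hook: each subclass reports the bases it contributes.
        abstract long bases();
    }

    static final class SingleEndRead extends Read {
        SingleEndRead(int length) { super(length); }
        @Override long bases() { return length; }
    }

    static final class PairedEndRead extends Read implements Paired {
        PairedEndRead(int length) { super(length); }
        @Override long bases() { return 2L * length; }
    }

    // Before: a type test against an interface on every element.
    static long basesViaInstanceof(List<Read> reads) {
        long total = 0;
        for (Read r : reads) {
            total += (r instanceof Paired) ? 2L * r.length : r.length;
        }
        return total;
    }

    // After: a virtual call dispatched on the receiver, with no type test.
    static long basesViaPolymorphism(List<Read> reads) {
        long total = 0;
        for (Read r : reads) {
            total += r.bases();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Read> reads = List.of(new SingleEndRead(100), new PairedEndRead(150), new SingleEndRead(50));
        System.out.println(basesViaInstanceof(reads) + " " + basesViaPolymorphism(reads));
    }
}
```

Both methods compute the same total; the design choice is to move the type-specific behavior into the classes themselves so the hot path never has to ask which type it is holding.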

Beyond Scala and Spark

• The instanceof scalability issue also affects other Java applications
– Apache Cassandra: https://issues.apache.org/jira/browse/CASSANDRA-12787
– A similar fix resulted in 61% better throughput and a 15% reduction in 99th percentile latency

Garbage Collection Example

• The hottest GC function is PSPromotionManager::copy_to_survivor_space

• Tuning the following parameters improved performance by 10%:
– -XX:SurvivorRatio
– -XX:InitialTenuringThreshold
– -XX:MaxTenuringThreshold

[Figure: an object moving through Eden, Survivor Space #1, Survivor Space #2, and the Old Generation]
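PSPromotionManager belongs to HotSpot's Parallel (Parallel Scavenge) collector. As a hedged sketch, the code below shows the nominal arithmetic behind -XX:SurvivorRatio, which is the ratio of Eden to one survivor space: the young generation holds Eden plus two survivor spaces, so each survivor space is young / (ratio + 2). It then reads the values the running JVM actually uses for the three tuned flags through HotSpotDiagnosticMXBean (HotSpot only). The 1000 MB young generation and ratio of 8 are hypothetical inputs, and the Parallel collector's adaptive sizing may resize the spaces at runtime. In Spark, such flags are typically passed to executors through spark.executor.extraJavaOptions.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Sketch: nominal young-generation split implied by -XX:SurvivorRatio, plus a
// check of which tenuring/survivor values the running HotSpot JVM uses.
class SurvivorSizing {

    // SurvivorRatio = Eden : one survivor space; young = Eden + 2 survivors,
    // so each survivor space is young / (ratio + 2).
    static long survivorSpaceMB(long youngGenMB, int survivorRatio) {
        return youngGenMB / (survivorRatio + 2);
    }

    // Whatever is not in the two survivor spaces is Eden.
    static long edenMB(long youngGenMB, int survivorRatio) {
        return youngGenMB - 2 * survivorSpaceMB(youngGenMB, survivorRatio);
    }

    public static void main(String[] args) {
        // Hypothetical 1000 MB young generation with -XX:SurvivorRatio=8.
        System.out.println("survivor: " + survivorSpaceMB(1000, 8)
                + " MB each, eden: " + edenMB(1000, 8) + " MB");

        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        for (String flag : new String[] {"SurvivorRatio", "InitialTenuringThreshold", "MaxTenuringThreshold"}) {
            VMOption option = hotspot.getVMOption(flag);
            System.out.println(flag + " = " + option.getValue() + " (origin: " + option.getOrigin() + ")");
        }
    }
}
```

Printing the flag origin is a quick way to confirm that options passed on the command line actually reached the executor JVM rather than falling back to defaults.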

Profile System

• The baseline shows up to 40% of CPU cycles spent waiting on IO

• With the same total number of cores, changing the core vs. storage ratio from 32:1 to 4:1 provides a 1.4x performance improvement

[Chart: relative throughput with 1 storage disk/VM; 6 VMs with 32 vCPUs/VM = 1.0, 48 VMs with 4 vCPUs/VM = 1.4]
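Both configurations keep the total at 192 vCPUs (6 x 32 = 48 x 4) with one disk per VM, so the number of disks grows as the VMs get smaller. A hedged sketch of that reasoning: enumerate every way to split a fixed vCPU budget into identical VMs and pick the layout whose cores-per-disk ratio is closest to a target. The target ratio, the class name, and the helper methods are hypothetical; the target would come from profiling the workload, and the talk does not prescribe one.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a fixed vCPU budget into identical VMs with a fixed number of
// disks per VM and choose the layout closest to a target cores-per-disk ratio.
class LayoutPlanner {

    static final class Layout {
        final int vms;
        final int vcpusPerVm;
        final int disksPerVm;

        Layout(int vms, int vcpusPerVm, int disksPerVm) {
            this.vms = vms;
            this.vcpusPerVm = vcpusPerVm;
            this.disksPerVm = disksPerVm;
        }

        double coresPerDisk() {
            return (double) vcpusPerVm / disksPerVm;
        }

        @Override
        public String toString() {
            return vms + " VMs x " + vcpusPerVm + " vCPUs, " + disksPerVm + " disk(s)/VM";
        }
    }

    // Every divisor of the budget gives one layout with identical VMs.
    static List<Layout> layouts(int totalVcpus, int disksPerVm) {
        List<Layout> result = new ArrayList<>();
        for (int perVm = 1; perVm <= totalVcpus; perVm++) {
            if (totalVcpus % perVm == 0) {
                result.add(new Layout(totalVcpus / perVm, perVm, disksPerVm));
            }
        }
        return result;
    }

    // Picks the layout whose cores-per-disk ratio is nearest the target.
    static Layout closestTo(int totalVcpus, int disksPerVm, double targetCoresPerDisk) {
        Layout best = null;
        for (Layout candidate : layouts(totalVcpus, disksPerVm)) {
            double distance = Math.abs(candidate.coresPerDisk() - targetCoresPerDisk);
            if (best == null || distance < Math.abs(best.coresPerDisk() - targetCoresPerDisk)) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // 192 vCPUs with 1 disk per VM, as in the chart above.
        for (Layout layout : layouts(192, 1)) {
            System.out.println(layout + " -> " + layout.coresPerDisk() + " cores/disk");
        }
        System.out.println("closest to 4 cores/disk: " + closestTo(192, 1, 4.0));
    }
}
```

The same enumeration applies to the earlier question of 10 instances with 2 cores each versus one 20-core instance: the core budget is fixed, and the layout choice decides how much storage bandwidth each core gets.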

Summary

• Spark deployment is moving from on-premises to cloud

• The cloud environment enables elastic deployment but also brings the challenge of repartitioning resources

• Profiling applications and understanding their behavior leads to significant performance improvements

Acknowledgments

Agata Gruza, Intel Corporation

Olasoji Denloye, Intel Corporation


Thank You.

Yingqi (Lucy) Lu: [email protected]

Mulugeta Mammo: [email protected]

Eric Kaczmarek: [email protected]


Data & Analytics