dancing elephants - efficiently working with object stories from apache spark and apache hive

30
© Hortonworks Inc. 2011 – 2017 All Rights Reserved Dancing Elephants: Working with Object Storage in Apache Spark and Hive Steve Loughran Sanjay Radia April 2017

Upload: dataworks-summithadoop-summit

Post on 22-Jan-2018

332 views

Category:

Technology


3 download

TRANSCRIPT

Page 1: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Dancing Elephants:Working with Object Storage in Apache Spark and HiveSteve LoughranSanjay Radia

April 2017

Page 2: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Why?

• No upfront hardware costs• Data ingress for IoT, mobile apps• Cost effective if sufficiently agile

Page 3: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Object Stores are a key part of agile cloud applications

⬢ It's the only persistent store for in-cloud clusters

⬢ Object stores are the source and final destination of work

⬢ Cross-application data storage

⬢ Asynchronous data exchange

⬢ External data sourcesAlso useful in physical clusters!

Page 4: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Cloud Storage Integration: Evolution for Agility

HDFS

Application

HDFS

Application

GoalEvolution towards cloud storage as the persistent Data Lake

Input Output

Backup Restore

InputOutput

Upload

HDFS

Application

Input

Output

tmp

AzureAWS –today

Page 5: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

ORCdatasets

inbound

Elastic ETL

HDFS

external

Page 6: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

datasets

external

Notebooks

library

Page 7: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Streaming

Page 8: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Danger: Object stores are not filesystems∗

Cost & Geo-distribution overConsistency and Performance

∗for more information, please re-read

Page 9: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

A Filesystem: Directories, Files Data

/

work

pending

part-00

part-01

00

00

00

01

01

01

complete

part-01

rename("/work/pending/part-01", "/work/complete")

Page 10: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Object Store: hash(name)⇒data

00

00

00

01

01

s01 s02

s03 s04

hash("/work/pending/part-01")["s02", "s03", "s04"]

copy("/work/pending/part-01","/work/complete/part01")

01

010101

delete("/work/pending/part-01")

hash("/work/pending/part-00")["s01", "s02", "s04"]

Page 11: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Often: Eventually Consistent

00

00

00

01

01

s01 s02

s03 s04

01

DELETE /work/pending/part-00

GET /work/pending/part-00

GET /work/pending/part-00

200

200

200

Page 12: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

The dangers of Eventual Consistency

⬢ Temp Data leftovers

⬢ List inconsistency means new data may not be visible

⬢ Lack of atomic rename() can leave output directories inconsistent

You can get bad data and not even notice

Page 13: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

org.apache.hadoop.fs.FileSystem

hdfs s3awasb adlswift gs

Page 14: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

s3:// —“inode on S3”

s3n://“Native” S3

s3a:// Replaces s3n

swift://OpenStack

wasb://Azure WASB

Phase I: Stabilize S3A

oss://Aliyun

gs://Google Cloud

Phase II: speed & scale

adl://Azure Data Lake

2006

2007

2008

2009

2010

2011

2012

2013

2014

2015

2016

2017

s3://Amazon EMR S3

History of Object Storage Support

Phase III: scale & consistency

(proprietary)

Page 15: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Make Apache Hadoop at home in the cloud Step 1: Hadoop runs great on AzureStep 2: Beat EMR on EC2

✔✔

Page 16: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Problem: S3 Analytics is too slow/broken

1. Analyze benchmarks and bug-reports 2. Fix Read path for Columnar Data3. Fix Write path4. Improve query partitioning5. The Commitment Problem

Page 17: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

getFileStatus()read()

LLAP (single node) on AWS

TPC-DS queries at 200 GB

scale

readFully(pos)

Page 18: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

HDP 2.6/Hadoop 2.8 transforms I/O performance!

// forward seek by skipping streamfs.s3a.readahead.range=256K

// faster backward seek for Columnar Storagefs.s3a.experimental.input.fadvise=random

// enhanced data uploadfs.s3a.fast.output.enabled=true

—see HADOOP-11694 for lots more!

Page 19: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

benchmarks !=your queriesyour datayour VMsyour directory tree

…but we think we've made a good start

Page 20: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

S3 Data Source 1TB TPCDS LLAP- vs Hive 1.x:

0

500

1,000

1,500

2,000

2,500

LLAP-1TB-TPCDS

Hive-1-1TB-TPCDS

1 TB TPC-DS ORC DataSet

3 x i2x4x Large (16 CPU x 122 GB RAM x 4 SSD)

Page 21: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Apache Spark

Object store work appliesNeeds tuningCommits to S3 "trouble"

Page 22: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

spark-default.conf

spark.hadoop.fs.s3a.readahead.range 256Kspark.hadoop.fs.s3a.experimental.input.fadvise random

spark.sql.orc.filterPushdown truespark.sql.orc.splits.include.file.footer truespark.sql.orc.cache.stripe.details.size 10000spark.sql.hive.metastorePartitionPruning true

spark.sql.parquet.filterPushdown truespark.sql.parquet.mergeSchema falsespark.hadoop.parquet.enable.summary-metadata false

Page 23: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

The S3 Commitment Problem

⬢ rename() depended upon for atomic transaction

⬢ Time to copy() + delete() proportional to data * files

⬢ Compared to Azure Storage, S3 is slow (6-10+ MB/s)

⬢ Intermediate data may be visible

⬢ Failures leave storage in unknown state

Page 24: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Spark's Direct Output Committer? Risk of Corruption of data

Page 25: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

S3guardFast, consistent S3 metadataHADOOP-13445

Page 26: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

DynamoDB as fast, consistent metadata store

00

00

00

01

01

s01 s02

s03 s04

01

DELETE part-00200

HEAD part-00200

HEAD part-00404

PUT part-00200

00

Page 27: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Netflix Staging Committer

1. Saves output to local files file://

2. Task commit: upload to S3A as multipart PUT —but do not complete it

3. Job committer completes all uploads from successful tasks; cancels others.

Outcome:

⬢ No work visible until job is commited

⬢ Task commit time = data/bandwidth

⬢ Job commit time = POST * #files

Page 28: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Availability

Read + Write in HDP 2.6 and Apache Hadoop 2.8 S3Guard: preview of DDB integration soon Zero-rename commit: work in progress

Look to HDCloud for the latest work!

Page 29: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved

Big thanks to:

Rajesh Balamohan

Mingliang Liu

Chris Nauroth

Dominik Bialek

Ram Venkatesh

Everyone in QE, RE

+ everyone who reviewed/tested, patches and added their own, filed bug reports and measured performance

Page 30: Dancing Elephants - Efficiently Working with Object Stories from Apache Spark and Apache Hive

© Hortonworks Inc. 2011 – 2017 All Rights Reserved31

© Hortonworks Inc. 2011 – 2017. All Rights Reserved

Questions?

[email protected] @[email protected] @srr