The state of SQL-on-Hadoop in the Cloud
By Nicolas Poggi
Lead researcher – Big Data Frameworks
Data Centric Computing (DCC) Research Group
BDOOP Meetup – Dec 2016
Agenda
• Intro and motivation
• PaaS services overview
  • Instances comparison
  • SW and HW specs
  • Elasticity
  • Perf metrics
• SQL benchmark
  • Test methodology
  • Evaluations
• Execution times
  • Data size scalability
  • Price / performance
• PaaS evolution over time
  • SW and HW improvements
• Summary
  • Lessons learned
  • Conclusions & future work
ALOJA: towards cost-effective Big Data
• Open research project for automating the characterization and optimization of Big Data deployments
• Open-source benchmarking-to-insights platform and tools
• Largest public Big Data repository
• Community collaboration with industry and academia
• Preliminary to this study:
  • Big Data Benchmark Compendium (TPC-TC '15)
  • The Benefits of Hadoop as PaaS (Hadoop Summit EU '16)
http://aloja.bsc.es
[Diagram: Big Data benchmarking, online repository, Web / ML analytics]
Motivation of the SQL-on-Hadoop study
• Extend the ALOJA platform to survey popular PaaS SQL Big Data cloud solutions, using Hive [to begin with]
• First approach to the services, from an end-user's perspective
  • Using the public cloud (and public pricing), online docs, and resources
  • Medium-size test deployments and data (8 data nodes, up to 1TB)
• Evaluate and compare out-of-the-box behavior (default VMs and config)
  • Architectural differences, readiness, competitive advantages
  • Scalability, price, and performance
Disclaimer: this is a snapshot of out-of-the-box price and performance during March-July 2016. Performance, and especially costs, change often. We use non-discounted pricing. I/O costs are complex to estimate for a single benchmark.
Platform-as-a-Service Big Data
• Cloud-based managed Hadoop services
  • Ready-to-use Hive, Spark, …
• Simplified management
• Deploys in minutes, on-demand, elastic
  • You select the instance type and the number of processing nodes
• Pay-as-you-go and pay-as-you-process models
• Optimized for general purpose
  • Fine-tuned to the cloud provider's architecture
Surveyed Hadoop/Hive PaaS services
• Amazon Elastic MapReduce (EMR)
  • Released: Apr 2009
  • OS: Amazon Linux AMI 4.4 (RHEL-like)
  • SW stack: EMR (custom, 4.7*)
  • Instances: m3.xlarge and m4.xlarge
• Google Cloud Dataproc (CDP)
  • Released: Feb 2016
  • OS: Debian GNU/Linux 8.4
  • SW stack: custom, v1
  • Instances: n1-standard-4 and n1-standard-8
• Azure HDInsight (HDI)
  • Released: Oct 2013
  • OS: Windows Server and Ubuntu 14.04.5 LTS
  • SW stack: HDP-based (v2.3 and 2.4**)
  • Instances: A3, D3 v1-2, and D4 v1-2
• Rackspace Cloud Big Data (CBD)
  • Released: Oct 2013
  • OS: CentOS 7
  • SW stack: HDP (2.3)
  • API: OpenStack (+ Lava)
  • Instances: Hadoop 1-7, 1-15, 1-30, OnMetal 40
We selected defaults and general-purpose VMs, plus on-premises results as a baseline. * EMR v5 released in Aug 2016. ** HDI offers HDP 2.5 since Sept 2016.
Systems-Under-Test (SUTs): VM/instance specs, elasticity, perf characterization
Focus: 8 data nodes, up to 1TB data size
SUTs: Tech specs and costs
Notes:
• Default cloud SKUs have 4 cores and ~15GB of RAM in all providers (~4GB of RAM per core)
• Prices vary greatly
• Rackspace defaults to the high-end OnMetal
| Provider | Instance type | Default? | Cores/node | RAM/node (GB) | RAM/core (GB) | Data nodes | Cluster cost/hour | Shared | Includes I/O costs | Cost/5TB/hr* | Deploy time (min) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Amazon EMR (us-east-1) | m3.xlarge | Yes | 4 | 15 | 3.8 | 8 | $3.36 | Yes | Yes | | ~10 |
| | m4.xlarge | | 4 | 16 | 4 | 8 | $2.99 | Yes | No (with EBS) | $0.07 | ~10 |
| Google CDP (europe-west1-b) | n1-standard-4 | Yes | 4 | 15 | 3.8 | 8 | $1.81 | Yes | No | $0.18 | ~1 |
| | n1-standard-4 + 1 SSD | | 4 | 15 | 3.8 | 8 | $1.92 | Yes | No | $0.18 | ~1 |
| | n1-standard-8 | | 8 | 30 | 3.8 | 8 | $3.61 | Yes | No | $0.18 | ~1 |
| Azure HDI (South Central US) | A3 (Large) | (old def.) | 4 | 7 | 1.8 | 8 | $2.70 | Yes | No | $0.17 | ~25 |
| | D3 v1 and v2 | Yes | 4 | 14 | 3.5 | 8 | $5.25 | Yes | No | $0.17 | ~25 |
| | D4 v1 and v2 | | 8 | 28 | 3.5 | 8 | $10.48 | Yes | No | $0.17 | ~25 |
| Rackspace CBD (Northern Virginia, IAD) | hadoop1-7 | | 2 | 7 | 3.5 | 8 | $2.72 | Yes | Yes (local $0.00; Cloud Files $0.07) | | ~25 |
| | hadoop1-15 | (2nd) | 4 | 15 | 3.8 | 8 | $5.44 | Yes | Yes | | ~25 |
| | hadoop1-30 | | 8 | 30 | 3.8 | 8 | $10.88 | Yes | Yes | | ~25 |
| | OnMetal 40 | Yes | 40 | 128 | 3.2 | 4 | $11.80 | No | Yes | | ~25 |
| On-premises | 2012 (12 cores/64GB) | | 12 | 64 | 5.3 | 8 | $3.50** | No | Yes | $0.00 | N/A |

* Cost/5TB/hr: the tests need 5TB of raw HDFS storage, so this storage cost is used.
** On-premises cost is an estimate based on a 3-year lifetime including support and maintenance (see refs.).
SUTs: Elasticity and I/O

| Provider | Instance type | Elasticity | Storage |
|---|---|---|---|
| Amazon EMR | m3.xlarge | Compute (and EBS option) | 2x40GB local SSD / node |
| | m4.xlarge | Compute and EBS (fixed size) | EBS size defined on deploy |
| Google CDP | n1-standard-4 | Compute and GCS (fixed size) | GCS size defined on deploy |
| | n1-standard-4 + 1 SSD | | 1x375GB SSD* + GCS |
| | n1-standard-8 | | GCS size defined on deploy |
| Azure HDI | A3 (Large) | Compute and storage | Elastic (WASB) |
| | D3 v1 and v2 | | Elastic (WASB) + 200GB local SSD |
| | D4 v1 and v2 | | Elastic (WASB) + 400GB local SSD |
| Rackspace CBD | hadoop1-7 | Compute (Cloud Files option) | 1.5TB SATA / node |
| | hadoop1-15 | | 2.5TB SATA / node |
| | hadoop1-30 | | 5TB SATA / node |
| | OnMetal 40 | | 2x1.5TB SSD / node |
| On-premises | 2012 (12 cores/64GB) | No | 6x1TB SATA / node |

* Supports up to 4 SSD drives.
SUTs: CPU, MEM, and NET characterization
[Chart: MEM ops/s per SKU (sysbench, 1 thread, default block size)]
• EMR m3.x, CBD hadoop1-x, and HDI A3 use older-generation CPUs
• OnMetal is much higher
[Chart: NET iperf Gb/s per SKU (1 thread, 100GB)]
• EMR < 10 Gb/s
• CDP < 8 Gb/s
• HDI < 6 Gb/s (SKU dependent)
• CBD < 5 Gb/s
• OnPrem 1 Gb/s
Higher is better. Benchmarked using ALOJA cluster-bench, sysbench, and iperf.
SUTs: HDFS I/O at 1TB
[Chart: DFSIO_read MB/s and DFSIO_write MB/s per SKU]
• CBD: OnMetal is the highest overall; the Cloud SKUs have very low write throughput
• CDP: SSDs are used as tiered storage
• EMR: SKU dependent, low MB/s
• HDI: throughput is SKU dependent
• Most SUTs stay below 50 MB/s read/write, except OnMetal and CDP with SSD
Higher is better. Benchmarked using ALOJA Hadoop-Examples.
SQL-on-Hadoop benchmarking: Methodology and evaluations
Benchmark suite: TPC-H (derived)
• DB industry standard for decision support
  • Well-understood and accepted benchmark (since '99)
  • Audited results available online
• 22 "real world" business queries
  • Complex joins, grouping, nested queries (Q6 is sketched below)
• Defines scale factors for data
• DDLs and queries from the D2F-Bench project:
  • Includes a Hive adaptation with ORC tables
  • Repo: https://github.com/Aloja/D2F-Bench
  • Based on https://github.com/hortonworks/hive-testbench
  • Changes make it HDP-agnostic
• Supports other engines: Spark, Pig, Impala, Drill, …
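To make the workload concrete, here is TPC-H Q6 (the I/O-bound scan used later in the evaluations) in HiveQL. This is a sketch with the standard substitution-parameter defaults; the exact text used in the runs is the D2F-Bench version.

```sql
-- TPC-H Q6 (forecasting revenue change): a single-table scan with
-- a selective filter; standard substitution-parameter defaults.
SELECT SUM(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= '1994-01-01'
  AND l_shipdate < '1995-01-01'
  AND l_discount BETWEEN 0.05 AND 0.07
  AND l_quantity < 24;
```

Q6 touches only lineitem, so it mostly measures scan throughput (and how well ORC predicate filtering works), which is why the deck uses it as the I/O-bound scan probe.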
[Diagram: TPC-H 8-table schema]
Test methodology
• ALOJA-BENCH as the driver
• Test methodology
  • Queries run from 1 to 22, sequentially, to try to avoid caches
  • [At least] 3 repetitions
  • Query ALL (Q ALL) as the full run
  • Power runs (no concurrency)
• Data sizes: 1GB, 10GB, 100GB, 500GB*, 1TB
• Metrics: execution time and cost
• Comparisons
  • Q ALL (full run)
  • Scans: Q1 and Q6
  • Joins: Q2 and Q16
  • Q16 is the most "complete" single query
• Process and settings
  • TPC-H datagen CSVs converted to Hive ORC tables (a DDL sketch follows below)
  • Each system runs with its own hive.settings
  • On-prem built from the source repo
* 500GB is not a standard TPC-H scale factor, but 300GB is.
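As a sketch of the CSV-to-ORC conversion step, assuming pipe-delimited TPC-H dbgen output already in HDFS (the path and table names here are hypothetical; D2F-Bench ships its own DDLs):

```sql
-- External table over the raw dbgen output (path is hypothetical)
CREATE EXTERNAL TABLE lineitem_text (
  l_orderkey BIGINT, l_partkey BIGINT, l_suppkey BIGINT, l_linenumber INT,
  l_quantity DOUBLE, l_extendedprice DOUBLE, l_discount DOUBLE, l_tax DOUBLE,
  l_returnflag STRING, l_linestatus STRING, l_shipdate STRING,
  l_commitdate STRING, l_receiptdate STRING, l_shipinstruct STRING,
  l_shipmode STRING, l_comment STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LOCATION '/tpch/sf1000/lineitem';

-- Materialize the benchmark table in ORC; the queries then run
-- against the ORC copy, with ORC settings left at their defaults.
CREATE TABLE lineitem STORED AS ORC
AS SELECT * FROM lineitem_text;
```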
SUTs Performance and Scalability
• Execution times
• Scalability to data size
• Query drill-down
• Latency test
Exec times by SUT: 8dn 100GB Q ALL
[Chart: full TPC-H execution times per SKU, grouped by CBD, CDP, EMR, HDI, OnPrem]
Notes:
• Results show execution times for the full TPC-H run on SKUs with 8 data nodes at 100GB, except for CBD OnMetal, which has 4 data nodes.
• CBD: OnMetal is fast; the Cloud SKUs scale with SKU size.
• CDP: the SSD version gives only marginal gains over the regular one; n1-std8 is only 30% faster than n1-std4.
• EMR: m4.xlarge (EBS only) is 18% faster than m3.xlarge (local SSD + EBS).
• HDI: scales with SKU size; D4v2 is the fastest result, D3v2 the fastest default.
• OnPrem: poor results with M/R.
• A3s and CBD Cloud show high variability.
Exec times by SKU: 8dn 1TB Q ALL
[Chart: full TPC-H execution times per SKU at 1TB, grouped by CBD, CDP, EMR, HDI]
Notes:
• Results show execution times for the full TPC-H run on SKUs with 8 data nodes at 1TB, except for CBD OnMetal, which has 4 data nodes.
• At 1TB, lower-end systems obtain poorer performance.
• CBD: OnMetal is fast (2nd fastest overall); Cloud hadoop1-7 cannot process 1TB, and hadoop1-15 and 1-30 give similar but poor results.
• CDP: the SSD version is slightly slower than the regular one; n1-std8 is 2x faster than n1-std4 (as expected).
• EMR: m4.xlarge is 15% faster than m3.xlarge.
• HDI: scales with SKU size; D4v2 is the fastest result.
• OnPrem: improves its results compared with 100GB.
Data size scalability of defaults: up to 1TB (Q ALL)
[Chart: data scale factor from 100GB to 1TB for the default SUTs]
Notes:
• The chart shows the data scale factor from 100GB to 1TB for the default SUTs with 8 data nodes, except for CBD OnMetal, which has 4.
• Comparing default instances, CDP has the poorest scalability, then EMR.
• On-prem scales linearly up to 1TB.
• HDI and OnMetal can scale to larger sizes.
Exec times of defaults: Scans vs. Joins at 1TB
• Scans (parallelizable): Q1 (CPU-bound), Q6 (I/O-bound)
• Joins (less parallelizable): Q2, Q16
[Charts: scan and join execution times; both panels show the defaults with 4 cores]
Notes: Q1 (I/O + CPU) is slow on the CDP and EMR systems; the same holds for Q16. CBD shows inverted times for Q2 and Q16 compared with the other systems. OnMetal is fastest for I/O and joins, followed by HDI D3v2.
CPU utilization of default VMs: Q16 at 1TB
[Charts: CPU utilization traces per SUT]
Notes: EMR shows high CPU and system time. HDI shows a different pattern (spikes). CDP shows high iowait. CBD OnMetal shows low usage; hadoop1-15 very high iowait.
Configurations
Notes: CDP and CBD run Java 1.8; all providers use OpenJDK. HDI is the only service that enables Tez and configures Hive performance options. (An illustrative hive.settings fragment follows the table.)

| Category | Config | EMR | CDP | HDI | CBD (OnMetal) | On-prem |
|---|---|---|---|---|---|---|
| System | Java version | OpenJDK 1.7.0_111 | OpenJDK 1.8.0_91 | OpenJDK 1.7.0_101 | OpenJDK 1.8.0_71 | JDK 1.7 |
| HDFS | File system | EBS / S3 | GCS (hadoop v.) | WASB | Local + Swift + S3 | Local |
| | Replication | 3 | 2 | 3 | 2 | 3 |
| | Block size | 128MB | 128MB | 128MB | 256MB | 128MB |
| | File buffer size | 4KB | 64KB | 128KB | 256KB | 64KB |
| M/R | Output compression | SNAPPY | False | False | SNAPPY | False |
| | IO factor / MB | 48 / 200 | 10 / 100 | 100 / 614 | 100 / 358 | 10 / 100 |
| | Memory MB | 1536 | 3072 | 1536 | 2048 | 1536 / 4096 |
| Hive | Engine | MR | MR | Tez | MR | MR / Tez |
| | ORC config | Defaults | Defaults | Defaults | Defaults | Defaults |
| | Vectorized exec | False | False | Enabled | False | Enabled |
| | Cost-based opt | False | Enabled | Enabled | Enabled | Enabled |
| | Enforce bucketing | False | False | True | False | True |
| | Optimize bucket map join | False | False | True | False | True |
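For illustration, the table rows map onto standard Hive/Hadoop properties that a per-system hive.settings file would set. The fragment below roughly mirrors the HDI column; the property-to-row mapping is our assumption, not the exact files used in the study.

```sql
-- Roughly the HDI column of the table above (illustrative values)
SET hive.execution.engine=tez;               -- Hive engine: Tez
SET hive.vectorized.execution.enabled=true;  -- vectorized exec
SET hive.cbo.enable=true;                    -- cost-based optimizer
SET hive.enforce.bucketing=true;             -- enforce bucketing
SET hive.optimize.bucketmapjoin=true;        -- bucket map join
SET mapreduce.task.io.sort.factor=100;       -- IO factor
SET mapreduce.task.io.sort.mb=614;           -- IO sort MB
SET mapreduce.map.memory.mb=1536;            -- memory MB
SET mapreduce.map.output.compress=false;     -- output compression
```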
Latency test: Exec times by SKU, 8dn 1GB Q16
[Chart: Q16 execution times at 1GB per SKU, grouped by CBD, CDP, EMR, HDI]
Notes:
• Results show execution times for query 16 at 1GB, except for CBD OnMetal, which has 4 data nodes.
• HDI D3v2 and D4v2 have the lowest times ("lowest latency").
• Then the CDP systems.
• OnPrem M/R has the worst results.
(Q16's full text is sketched below.)
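For reference, this is Q16 (called the most "complete" single query above) in its standard TPC-H form with default substitution parameters. Hive versions of that era had limited NOT IN subquery support, so testbench derivatives typically rewrite it, for example as an outer join.

```sql
-- TPC-H Q16: join of part/partsupp plus an anti-join on supplier,
-- with grouping, distinct aggregation, and multi-key ordering.
SELECT p_brand, p_type, p_size,
       COUNT(DISTINCT ps_suppkey) AS supplier_cnt
FROM partsupp
JOIN part ON p_partkey = ps_partkey
WHERE p_brand <> 'Brand#45'
  AND p_type NOT LIKE 'MEDIUM POLISHED%'
  AND p_size IN (49, 14, 23, 45, 19, 3, 36, 9)
  AND ps_suppkey NOT IN (
    SELECT s_suppkey FROM supplier
    WHERE s_comment LIKE '%Customer%Complaints%')
GROUP BY p_brand, p_type, p_size
ORDER BY supplier_cnt DESC, p_brand, p_type, p_size;
```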
Price / Performance
Price and execution times assume:
• Only the cost of running the benchmark, or full 24/7 utilization
• No provisioning time or idle time
• By-the-second billing
(A worked example of the cost model follows.)
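A minimal worked example of this cost model, assuming ideal per-second billing at the cluster rates from the specs table:

\[
\mathrm{cost} \;=\; \mathrm{rate}_{\$/\mathrm{h}} \times \frac{t_{\mathrm{run}}\,[\mathrm{s}]}{3600}
\]

For instance, a 3:11:57 (11,517 s) full run on an 8-node CDP n1-standard-4 cluster at $1.81/hour would cost 1.81 x 11517/3600, about $5.79. The "best cost" and "best time" in the tables that follow come from the best of the repetitions, so the cheapest and the fastest run need not be the same one.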
Price/Performance 100GB (Q ALL)
Notes:
• Shows the price/performance ratio by SUT
• Lower in price and time is better
• Chart zoomed to differentiate clusters
Price assumptions:
• Measures only the cost of running the benchmark in seconds; cluster setup time is ignored

| Rank | Cluster | Best cost | Best time |
|---|---|---|---|
| 1 | CDP-n1std4-8 | $6.37 | 3:11:57 |
| 2 | CDP-n1std4-1SSD-8 | $6.55 | 3:06:44 |
| 3 | EMR-m4.xlarge-8 | $8.18 | 2:40:24 |
| 4 | HDI-D3v2-HDP24-8 | $8.74 | 1:36:45 |
| 5 | CDP-n1std8-8 | $9.35 | 2:27:57 |
| 6 | HDI-D4v2-HDP24-8 | $10.20 | 0:57:29 |
| 7 | EMR-m3.xlarge-8 | $10.79 | 3:08:49 |
| 8 | HDI-A3-8 | $11.96 | 4:10:04 |
| 9 | M100-8n | $13.10 | 3:32:29 |
| 10 | HDI-D4-8 | $15.08 | 1:24:59 |
| 11 | CBD-hadoop1-7-8 | $19.16 | 7:02:33 |
| 12 | CBD-OnMetal40-4 | $19.31 | 1:38:12 |
| 13 | CBD-hadoop1-15-8 | $26.45 | 4:51:41 |

Chart highlights: cheapest run, fastest run, and most cost-effective SUT.
Price/Performance 1TB (Q ALL)
Notes:
• Shows the price/performance ratio by SUT
• Lower in price and time is better
• Chart zoomed to differentiate clusters
Price assumptions:
• Measures only the cost of running the benchmark in seconds; cluster setup time is ignored

| Rank | Cluster | Best cost | Best time |
|---|---|---|---|
| 1 | HDI-D3v2-HDP24-8 | $39.63 | 7:18:42 |
| 2 | HDI-D4v2-HDP24-8 | $42.02 | 3:56:45 |
| 3 | M100-8n | $42.85 | 11:34:50 |
| 4 | CDP-n1std8-8 | $44.91 | 11:50:46 |
| 5 | CDP-n1std4-8 | $46.49 | 23:21:05 |
| 6 | CDP-n1std4-1SSD-8 | $50.53 | 24:00:52 |
| 7 | EMR-m4.xlarge-8 | $54.26 | 17:44:01 |
| 8 | HDI-D4-8 | $62.75 | 5:53:32 |
| 9 | CBD-OnMetal40-4 | $67.77 | 5:44:36 |
| 10 | EMR-m3.xlarge-8 | $69.92 | 20:23:01 |
| 11 | HDI-A3-8 | $74.83 | 27:42:56 |
| 12 | CBD-hadoop1-15-8 | $128.44 | 23:36:37 |

Chart highlights: cheapest run, fastest run, and most cost-effective SUT.
HW and SW improvements
PaaS provider improvements over time (tests on 4 data nodes)
HDI default HW improvement: 4 nodes Q ALL
[Charts: run time at 1TB; scalability from 1GB to 1TB; variability]
Notes: Test comparing performance improvements of HDI default VM instances, from A3 to D3 and D3v2 (30% faster CPU at the same price), on HDP 2.3.
SW: HDP version 2.3 to 2.4 improvement on HDI D3 v1, 4 nodes Q ALL 100GB
[Charts: run time at 100GB; scalability from 1GB to 1TB]
Notes: Test comparing the migration to HDP 2.4. The D3s improved by 35% and can now run 1TB on 4 data nodes without modifications (no more namenode swapping). Larger nodes see smaller improvements.
SW: EMR version 4.7 to 5.0 improvement on m4.xlarge, 4 nodes Q ALL
[Charts: run time at 1TB; scalability from 1GB to 1TB]
Notes: Test comparing performance improvements in EMR 5.0 (Hive 2.1, Tez by default, Spark 2.0). EMR 5.0 achieves a 2x improvement at 4 nodes.
Summary
Lessons learned, findings, conclusions, references
Remarks / Findings
• Setting up and fine-tuning Big Data stacks is complex and requires an iterative process
• Cloud services continuously optimize their PaaS offerings for general-purpose use
  • All tune M/R, YARN, and their custom file storages
  • They update HW (and prices) over time; you might need to re-deploy to get the benefits
• Room for improvement
  • Only HDI fine-tunes Hive; what about other new services? (Spark, Storm, R, HBase)
  • All are updating to Hive and Spark v2 (and enabling Tez, tuning ORC)
  • CDP is upgrading its HDP version
• Beware: commodity VMs != commodity bare metal for Big Data
  • Errors… originally this was to be a 4-node comparison…
  • Variability is an issue for low-end, old-gen VMs; so are scalability and reliability. Less of an issue on newer VMs
• Scale by adding more nodes, not disks
  • Network throttling is not apparent on an 8-data-node cluster, but for larger clusters…
Summary: Similarities
• Similar defaults for cloud-based services: 4 cores, ~16GB RAM, local SSDs
  • ~4GB RAM / core; good enough for Hadoop / Hive
• Elasticity
  • All allow on-demand scaling up
  • Mixed mode of local + remote storage
• Fast networking
  • Especially EMR; HDI depends on VM size
  • Required for networked storage…
• Most deploy in < 25 minutes

Summary: Differences
• CBD offers OnMetal as the default: a high-end, non-shared system
  • What about in-memory systems (Spark, Graph/Graph)?
• Elasticity
  • But not all support down-scaling / stopping (delete only)
  • HDI relies completely on remote disks (local only for temp)
• Pricing, very different!
  • EMR, CBD, and HDI bill per hour; CDP per minute
  • But similar overall price/performance (<30%)
• CDP deploys in 1 minute
The state of SQL-on-Hadoop in the Cloud
• Providers have successfully integrated on-demand Big Data services
  • Most are on the path to pay-what-you-process models
  • Completely disaggregating storage from compute
  • Giving more elasticity to your data and needs
  • Multiple clusters; pay only for what you use; planning-free; governance
• What about performance and reliability?
  • Providers are upgrading and defaulting to newer-gen VMs
    • Faster CPUs, SSDs (local and remote), the end of rotational disks?, fast networks
  • As well as keeping the SW up to date
    • Newer versions, security and performance patches, tuned for their infrastructure
• Is it price-performant?
  • Yes, at least at medium size. The cost is in compute, so you pay for what you use!
• For ALOJA, this work is the basis for future research
More info:
• Upcoming publication: The state of SQL-on-Hadoop in the Cloud
  • Data release and more in-depth technical analysis
• ALOJA benchmarking platform and online repository
  • http://aloja.bsc.es and http://aloja.bsc.es/publications
• SPEC Research Big Data working group
  • http://research.spec.org/working-groups/big-data-working-group.html
Thanks, questions?
Follow up / feedback : Nicolas.Poggi@bsc.es
Twitter: @ni_po