Accumulo Summit 2015: HDFS Short Circuit Local Read Performance Benchmarking with Apache Accumulo and HBase

HDFS Short Circuit Local Read Performance in Accumulo and HBase
Michael Ridley | Solutions Architect

Author: accumulo-summit

Posted on 15-Jul-2015


TRANSCRIPT


Agenda
- Explanation of HDFS short circuit local reads
- Cluster configuration
- Testing methodology
- Accumulo results
- HBase results
- Thoughts on possible future research
- Q&A

What are HDFS short circuit local reads?
Why are we here?

HDFS short circuit local reads
- Typical communication between an HDFS client and the HDFS datanode is over a TCP socket.
- In cases where the read happens to occur on the same host as the datanode serving the data, it is more efficient to avoid the socket overhead.
- HDFS provides a facility to bypass the datanode in these cases: the block file descriptors are handed to the client over a UNIX domain socket, and the client then reads the local files directly.

Speaker note: The more recent implementation of short circuit local reads avoids the security issues that were present in the original implementation and is now secure.
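Whether reads are in fact being short-circuited can be confirmed from the client side. Below is a minimal sketch (not from the deck) using the Hadoop 2.x client read-statistics API; the class name, buffer size, and file argument are illustrative.

    // Sketch: confirm that reads of a locally replicated file are served via
    // short circuit rather than over the datanode's TCP socket.
    // Assumes an hdfs-site.xml with the short circuit settings is on the
    // classpath and that the default filesystem is HDFS.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

    public class ShortCircuitCheck {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataInputStream in = fs.open(new Path(args[0]))) {
          byte[] buf = new byte[64 * 1024];
          while (in.read(buf) != -1) {
            // Drain the file so the read statistics are populated.
          }
          HdfsDataInputStream hin = (HdfsDataInputStream) in;
          System.out.println("total bytes read:         "
              + hin.getReadStatistics().getTotalBytesRead());
          System.out.println("short circuit bytes read: "
              + hin.getReadStatistics().getTotalShortCircuitBytesRead());
        }
      }
    }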

Speaker note: CDH enables short circuit local reads by default on datanodes but does not enable them on the HDFS gateway role. For some components they are enabled by default (e.g. HBase, Impala). In general, CDH tries to mirror upstream defaults in that regard.

How did we test?
Cluster stats, testing methodology, etc.

Cluster configuration
- Benchmarking was performed on a 40-node cluster using 36 tablet servers/region servers.
- All testing was performed on CDH 5.3.3 using the latest Cloudera build of Accumulo 1.6.
- Tablet servers and region servers were configured with 4 GB heap.
- Cluster installation and configuration was performed via Cloudera Manager using parcels.

Speaker note: CDH 5.3.3 with the Cloudera 1.6.0-cdh5.1.4 build of Accumulo and stock CDH 5.3.3 HBase, which is based on upstream 0.98.6 (plus patches).

HDFS configuration
- The primary hdfs-site.xml configuration property is dfs.client.read.shortcircuit.
- It must be set both on the datanodes and on the clients (tablet servers and region servers).
- The property dfs.domain.socket.path specifies the path to the local socket file.
- Additional performance could be possible by setting dfs.client.read.shortcircuit.skip.checksum to true (not tested in this experiment); see the sketch below.
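For reference, a minimal hdfs-site.xml sketch of the three properties named above (not taken from the deck). The socket path is illustrative, and on the cluster described here these values would be managed through Cloudera Manager.

    <!-- hdfs-site.xml sketch: enable short circuit local reads.
         Must be present on the datanodes and on the HDFS clients
         (tablet servers / region servers). -->
    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    <property>
      <!-- UNIX domain socket used to hand block file descriptors to the
           client; the path shown is illustrative. -->
      <name>dfs.domain.socket.path</name>
      <value>/var/run/hdfs-sockets/dn</value>
    </property>
    <property>
      <!-- Optional further speedup; left at false (untested) in this
           experiment. -->
      <name>dfs.client.read.shortcircuit.skip.checksum</name>
      <value>false</value>
    </property>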

Testing methodology - YCSB
- Performance testing of Accumulo and HBase was performed using the YCSB benchmark suite, available from https://github.com/brianfrankcooper/YCSB.
- Stock YCSB does not work with modern versions of HBase, so the https://github.com/apurtell/ycsb/tree/new_hbase_client fork was used for HBase testing.
- Two YCSB workloads were used: a small workload with 10 byte rows and a larger workload with 100 KB rows (see the sketch below).
- Each workload was run with HDFS short circuit local reads disabled and then with short circuit local reads enabled.

Testing methodology - YCSB (continued)
- Each YCSB workload was run ten times.
- In the results, the first iteration is dropped because in some cases the services were restarted and the first run included JVM JIT warm-up overhead.
- YCSB benchmarks include two phases: load and run.
- Tables were flushed after the load phase to empty the memtable, and OS disk caches were flushed.
- The YCSB workloads were 100% read operations.
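A sketch of what the workload definition could look like for the 100% read, 10 byte case described above. The record and operation counts are illustrative (the deck does not state them), and the 10 byte row is encoded here as a single 10 byte field.

    # workload-10byte (sketch): 100% read operations over 10 byte rows.
    # recordcount and operationcount are illustrative; not stated in the deck.
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    recordcount=1000000
    operationcount=1000000
    readproportion=1.0
    updateproportion=0
    scanproportion=0
    insertproportion=0
    # One 10 byte field per row; fieldlength=100000 for the 100 KB workload.
    fieldcount=1
    fieldlength=10

The load and run phases would then be invoked as, for example, bin/ycsb load accumulo -P workload-10byte and bin/ycsb run accumulo -P workload-10byte; the exact binding name depends on the YCSB build in use (the apurtell fork for HBase).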
Testing methodology - caching
- Each test was performed with caching enabled and disabled.
- For Accumulo, the table properties table.cache.block.enable and table.cache.index.enable were set to true or false.
- The tablet server properties tserver.cache.data.size and tserver.cache.index.size were set to 0 or 2G.
- For HBase, the property BLOCKCACHE was set to true or false.

Speaker note: BLOCKCACHE was set to true or false for the ycsb column family upon table creation.

Testing methodology - pre-splitting
- For both HBase and Accumulo the table was pre-split for better distribution.
- A splits file was used with 100 splits.
- The same splits were used for each workload (see the sketch below).
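A sketch of how these settings might be applied (the commands are not taken from the deck; the table name usertable and the splits path are illustrative, while the ycsb family name is from the speaker note above):

    # Accumulo shell: per-table cache flags and the 100-way pre-split.
    config -t usertable -s table.cache.block.enable=true
    config -t usertable -s table.cache.index.enable=true
    addsplits -t usertable -sf /tmp/splits.txt

    # The tablet server cache sizes (tserver.cache.data.size and
    # tserver.cache.index.size) were set to 0 or 2G in the tablet server
    # configuration, here via Cloudera Manager.

    # HBase shell: block cache toggled on the ycsb column family at table
    # creation, pre-split from the same splits file.
    create 'usertable', {NAME => 'ycsb', BLOCKCACHE => 'true'}, {SPLITS_FILE => '/tmp/splits.txt'}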

So what did we learn?
On to the results!

Accumulo results
[Charts: 10 Byte Workload and 100 KB Workload]

10 Byte Workloads

Without Caching:
SCR Enabled     | Median Average Latency (us) | Median 95th Percentile (ms)
No              | 25669.35859                 | 12
Yes             | 22518.62248                 | 12
% Change        | -12.27%                     | 0.00%
Absolute Change | 3150.73611                  | 0

With Caching:
SCR Enabled     | Median Average Latency (us) | Median 95th Percentile (ms)
No              | 23226.63196                 | 6
Yes             | 22124.00868                 | 6
% Change        | -4.75%                      | 0.00%
Absolute Change | 1102.62328                  | 0

100 KB Workloads

Without Caching:
SCR Enabled     | Median Average Latency (us) | Median 95th Percentile (ms)
No              | 1723347.444                 | 456
Yes             | 1589593.573                 | 459
% Change        | -7.76%                      | 0.66%
Absolute Change | 133753.871                  | -3

With Caching:
SCR Enabled     | Median Average Latency (us) | Median 95th Percentile (ms)
No              | 1739103.513                 | 479
Yes             | 1143180.924                 | 470
% Change        | -34.27%                     | -1.88%
Absolute Change | 595922.589                  | 9

Speaker note: This slide is more for reference than for presentation.

Accumulo single node results
[Charts: 10 Byte Workload and 100 KB Workload]

Speaker note: The single node 10 byte performance improvement is more significant, probably because the entire data set fits in the block cache, so after the first read a lot is being served from the memtable.

10 Byte Workloads

Without Caching:
SCR Enabled     | Median Average Latency (us) | Median 95th Percentile (ms)
No              | 5335.53057                  | 6
Yes             | 4228.43033                  | 5
% Change        | -20.75%                     | -16.67%
Absolute Change | 1107.10024                  | 1

With Caching:
SCR Enabled     | Median Average Latency (us) | Median 95th Percentile (ms)
No              | 5264.11565                  | 6
Yes             | 1607.29118                  | 2
% Change        | -69.47%                     | -66.67%
Absolute Change | 3656.82447                  | 4

100 KB Workloads

Without Caching:
SCR Enabled     | Median Average Latency (us) | Median 95th Percentile (ms)
No              | 223443.433                  | 402
Yes             | 210199.774                  | 316
% Change        | -5.93%                      | -21.39%
Absolute Change | 13243.659                   | 86

With Caching:
SCR Enabled     | Median Average Latency (us) | Median 95th Percentile (ms)
No              | 227722.941                  | 416
Yes             | 189733.018                  | 313
% Change        | -16.68%                     | -24.76%
Absolute Change | 37989.923                   | 103

Speaker note: This slide is more for reference than for presentation.

HBase results
[Charts: 10 Byte Workload and 100 KB Workload]

10 Byte Workloads

Without Caching:
SCR Enabled     | Median Average Latency (us) | Median 95th Percentile (ms)
No              | 2047.91269                  | 1
Yes             | 1549.74452                  | 0
% Change        | -24.33%                     | -100.00%
Absolute Change | 498.16817                   | 1

With Caching:
SCR Enabled     | Median Average Latency (us) | Median 95th Percentile (ms)
No              | 1264.74923                  | 595423
Yes             | 1255.20502                  | 0
% Change        | -0.75%                      | -100.00%
Absolute Change | 9.54421                     | 595423

100 KB Workloads

Without Caching:
SCR Enabled     | Median Average Latency (us) | Median 95th Percentile (ms)
No              | 192503.1168                 | 118
Yes             | 182921.3385                 | 71
% Change        | -4.98%                      | -39.83%
Absolute Change | 9581.7783                   | 47

With Caching:
SCR Enabled     | Median Average Latency (us) | Median 95th Percentile (ms)
No              | 183588.0945                 | 44
Yes             | 181761.4176                 | 47
% Change        | -0.99%                      | 6.82%
Absolute Change | 1826.6769                   | -3

Speaker note: This slide is more for reference than for presentation.
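A note on reading these tables (the convention is inferred from the numbers rather than stated in the deck): % Change is computed relative to the SCR-disabled run, (Yes - No) / No, so negative values mean lower latency with short circuit reads enabled, while Absolute Change is No - Yes, so positive values mean an improvement. For example, for the 10 byte Accumulo workload without caching: (22518.62248 - 25669.35859) / 25669.35859 = -12.27%, and 25669.35859 - 22518.62248 = 3150.73611 us saved.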

Where to from here?
Possible future research directions.

Future research possibilities
- Testing with a more diverse set of workloads to better understand which workloads benefit most from short circuit local reads.
- Memory profiling during benchmarking to understand HDFS client memory overhead.

Q&A
Any questions?

Thank you
Michael Ridley
[email protected]