Achieving 100K Queries per Hour with Hive on Tez


Upload: hadoop-summit

Post on 16-Jan-2017


TRANSCRIPT

Page 1: Achieving 100k Queries per Hour on Hive on Tez

May 1, 2023

Achieving 100k Queries per Hour with Hive on Tez

Page 2: Achieving 100k Queries per Hour on Hive on Tez

About Yahoo! JAPAN

The Largest Portal Site in Japan

65 billion pageviews / month

2.1 billion pageviews / day

Page 3: Achieving 100k Queries per Hour on Hive on Tez

YDN Report

What is YDN Report?
• Report for the Yahoo Display Ads Network (YDN)

Batch Reporting over Massive Dataset
• 13 months, 800B+ rows of data
• Adding 3.3B+ rows of data per day

Highly Parallel Workload
• 100K reports per hour
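To put the headline target in per-second terms, here is a small worked sketch. The only input taken from the slides is the 100K reports per hour figure; the function name is illustrative, and the result is an average that says nothing about bursts.

```python
# Convert the hourly target from the slides into a sustained per-second rate.
SECONDS_PER_HOUR = 3600


def per_hour_to_per_second(per_hour: float) -> float:
    """Return the average rate per second for a given hourly volume."""
    return per_hour / SECONDS_PER_HOUR


target_qph = 100_000  # from the slides: 100K reports per hour
target_qps = per_hour_to_per_second(target_qph)
mean_gap_ms = 1000 / target_qps  # average time between query arrivals

print(f"{target_qps:.1f} queries/s, one query every {mean_gap_ms:.0f} ms on average")
```

In other words, the cluster has to start and finish roughly 28 reports every second, sustained.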

Page 4: Achieving 100k Queries per Hour on Hive on Tez

YDN Report Query

Typical Query
• Query is relatively simple
• Answers “How many clicks did I get last week?”

SELECT account, yyyymmdd, sum(total_imps), sum(total_click), ...
FROM table_x
WHERE yyyymmdd >= xxx
  AND yyyymmdd < xxx
  AND account = xxx ...
GROUP BY account, yyyymmdd, ...;
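The shape of this query can be tried out on a toy table. The sketch below runs it against an in-memory SQLite database rather than Hive, so it only illustrates the filter-and-aggregate pattern. The rows, the account names, and the `daily_report` helper are made up for illustration; the column names come from the query above.

```python
import sqlite3

# Toy stand-in for table_x; the rows are made-up illustration data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_x (account TEXT, yyyymmdd INTEGER, "
    "total_imps INTEGER, total_click INTEGER)"
)
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?, ?, ?)",
    [
        ("acct_a", 20160801, 1000, 10),
        ("acct_a", 20160801, 500, 5),
        ("acct_a", 20160802, 800, 8),
        ("acct_b", 20160801, 300, 3),  # other account: filtered out
        ("acct_a", 20160810, 900, 9),  # outside the date range: filtered out
    ],
)


def daily_report(conn, account, start, end):
    """Per-day impressions and clicks for one account (start inclusive, end exclusive)."""
    return conn.execute(
        "SELECT account, yyyymmdd, sum(total_imps), sum(total_click) "
        "FROM table_x "
        "WHERE yyyymmdd >= ? AND yyyymmdd < ? AND account = ? "
        "GROUP BY account, yyyymmdd "
        "ORDER BY yyyymmdd",
        (start, end, account),
    ).fetchall()


rows = daily_report(conn, "acct_a", 20160801, 20160808)
for row in rows:
    print(row)
```

Each report touches one account and a short date range, which is why the individual queries are cheap and the challenge is concurrency rather than query complexity.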

Page 5: Achieving 100k Queries per Hour on Hive on Tez

Test Setup

Page 6: Achieving 100k Queries per Hour on Hive on Tez

Hive Performance Recap

Hive is fast: interactive response
• ORC columnar file format
• Cost-based optimizer (CBO)
• Vectorized SQL engine
• Tez execution engine (replacing MapReduce)

[Figure: Hive 0.10 (batch processing) to Hive 1.2 (human interactive, 5 seconds): 100-150x query speedup]

Page 7: Achieving 100k Queries per Hour on Hive on Tez

Hive on Tez Query Execution

A query execution is essentially put together from
• Client execution [0s if done correctly]
• Optimization [HiveServer2] [~0.1s]
• Metadata lookups [HCatalog, Metastore] [very fast in Hive 0.14]
• Application Master creation [4-5s]
• Container allocation [3-5s]
• Tez task execution on YARN
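Adding up the stages that carry a number gives a rough idea of the fixed cost per query. This is a sketch only: it assumes the stages run one after another and that no Application Master or container is reused between queries, which the slides do not state. The dictionary and function names are illustrative.

```python
# Per-stage (low, high) estimates in seconds, copied from the list above.
# Stages without a number on the slide are left out.
FIXED_OVERHEADS_S = {
    "optimization (HiveServer2)": (0.1, 0.1),
    "Application Master creation": (4.0, 5.0),
    "container allocation": (3.0, 5.0),
}


def overhead_range(stages):
    """Sum the low and high estimates across all stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low_s, high_s = overhead_range(FIXED_OVERHEADS_S)
print(f"overhead before any task runs: {low_s:.1f}s to {high_s:.1f}s")
```

Under these assumptions most of the fixed cost is Application Master creation and container allocation, not the query itself.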

[Diagram: Client (running testing tool) opens N connections to each of HiveServer2 Server #1 and HiveServer2 Server #2; Metastore and Metastore DB; Tez AM and Tez containers on YARN and HDFS]

Page 8: Achieving 100k Queries per Hour on Hive on Tez

Mini Test

Mini Setup Tested
• 50 nodes
• 450B rows dataset
• Achieved 15K queries per hour

So, can we get 100K qph on 700 nodes?

We thought it should be easy, but…
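The reason it looked easy is simple proportional scaling. The sketch below reproduces that naive projection from the mini test numbers; the linear model is exactly the assumption that turned out not to hold, and the function name is illustrative.

```python
def linear_qph(measured_qph: float, measured_nodes: int, target_nodes: int) -> float:
    """Throughput at target_nodes if throughput grew in proportion to node count."""
    return measured_qph * target_nodes / measured_nodes


mini_qph, mini_nodes = 15_000, 50  # mini test from the slides
full_nodes = 700                   # full cluster size from the slides
goal_qph = 100_000

projected = linear_qph(mini_qph, mini_nodes, full_nodes)
print(f"naive projection: {projected:,.0f} qph ({projected / goal_qph:.1f}x the goal)")
```

On paper the full cluster has about twice the headroom needed, so the shortfall has to come from shared services that do not scale with node count.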

Page 9: Achieving 100k Queries per Hour on Hive on Tez

The Bottlenecks at Scale

Challenges at Scale
• Hive Metastore Server
• YARN Resource Manager
• Datanode Hotspot
• YARN ATS

Page 10: Achieving 100k Queries per Hour on Hive on Tez

Hive Metastore Server

Use Local Metastore
• Before: HS2 -> Metastore Server -> Metastore DB
• After: HS2 (local metastore) -> Metastore DB

Page 11: Achieving 100k Queries per Hour on Hive on Tez

Hive Metastore Server

Use Local Metastore
• Throughput: 7.6K -> 22K qph

Page 12: Achieving 100k Queries per Hour on Hive on Tez

Pending Apps

YARN ResourceManager Scalability
• Too many pending apps

Page 13: Achieving 100k Queries per Hour on Hive on Tez

Pending Apps

YARN ResourceManager Scalability
• Too many pending apps
• Resolution: increase yarn.resourcemanager.amlauncher.thread-count
• Throughput: 22K -> 26K qph

Page 14: Achieving 100k Queries per Hour on Hive on Tez

Pending Containers

YARN ResourceManager Scalability
• Too many pending containers

Page 15: Achieving 100k Queries per Hour on Hive on Tez

Pending Containers

YARN ResourceManager Scalability
• Too many pending containers
• Resolution: increase tez.am-rm.heartbeat.interval-ms.max
• Throughput: 26K -> 72.5K qph

Page 16: Achieving 100k Queries per Hour on Hive on Tez

Datanode Hotspot

Last Hour Problem
• Connection timeout and disk access error
• Many queries access recently added data

Page 17: Achieving 100k Queries per Hour on Hive on Tez

Datanode Hotspot

Last Hour Problem
• Resolution: increase HDFS replication factor
• Throughput: 72.5K -> 95K qph
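The throughput figures quoted across the tuning slides can be lined up to see how much each step contributed. The numbers below are the ones on the slides; the step labels are paraphrases, and the helper name is illustrative.

```python
# (tuning step, throughput in queries per hour after the step), from the slides.
STEPS = [
    ("baseline (remote metastore)", 7_600),
    ("local metastore", 22_000),
    ("more AM launcher threads", 26_000),
    ("higher max AM-RM heartbeat interval", 72_500),
    ("higher HDFS replication factor", 95_000),
]


def step_gains(steps):
    """Return (name, multiplier over the previous step) for each tuning step."""
    return [
        (name, qph / prev_qph)
        for (_, prev_qph), (name, qph) in zip(steps, steps[1:])
    ]


for name, gain in step_gains(STEPS):
    print(f"{name}: x{gain:.2f}")
print(f"overall: x{STEPS[-1][1] / STEPS[0][1]:.1f}")
```

The two largest multipliers come from the local metastore and the heartbeat interval change, each close to a 3x gain on its own.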

Page 18: Achieving 100k Queries per Hour on Hive on Tez

Other Tunings

Other Tunings We Did
• Container reuse timeout
• YARN capacity scheduler node locality delay
• Tez shuffle keep-alive
• TCP fin_wait

Notes on YARN ATS
• Disabling YARN ATS gives higher throughput
• Trade-off: losing YARN log aggregation

Page 19: Achieving 100k Queries per Hour on Hive on Tez

End of first half

Page 20: Achieving 100k Queries per Hour on Hive on Tez

Yohei Abe@Yahoo! JAPAN

Real-life Hive LLAP at Yahoo! JAPAN

Aug 2016

Page 21: Achieving 100k Queries per Hour on Hive on Tez

Agenda

• Hive LLAP at Yahoo! JAPAN
• Tuning
• Performance Result
• Future Work

Page 22: Achieving 100k Queries per Hour on Hive on Tez

Hive LLAP at Yahoo! JAPAN

Page 23: Achieving 100k Queries per Hour on Hive on Tez

Hive on Tez

Hive on Tez is able to produce 100K reports/hour

Page 24: Achieving 100k Queries per Hour on Hive on Tez

Hive on Tez+LLAP

How does Hive on Tez+LLAP handle 100K reports?

• How many servers?
• What tuning?

Page 25: Achieving 100k Queries per Hour on Hive on Tez

What is LLAP

Page 26: Achieving 100k Queries per Hour on Hive on Tez

What is LLAP?

LLAP is for sub-second query processing

• Persistent daemons
• Caching data

Page 27: Achieving 100k Queries per Hour on Hive on Tez

What is LLAP?

[Diagram: Tez: the Tez AppMaster runs work in Tez containers, which are created dynamically. Tez+LLAP: the Tez AppMaster runs work in LLAP daemons, which are persistent.]

Page 28: Achieving 100k Queries per Hour on Hive on Tez

Basic Tuning

Page 29: Achieving 100k Queries per Hour on Hive on Tez

LLAP test cluster

Server node: Xeon E5-2660v2 2.2GHz / 2 CPU / 128GB MEM / 10GBase-T 2 port

Slave node: 45 nodes
HiveServer2 node: 10 nodes
Hadoop 2.7.1
Hive 2.1.0-snapshot
Tez 0.8.3

Page 30: Achieving 100k Queries per Hour on Hive on Tez

Parameters

Some basic parameters need to be changed

Performance is very slow with the default values

Page 31: Achieving 100k Queries per Hour on Hive on Tez

Threading model

[Diagram: hive.llap.daemon.num.executors sets the number of executor threads; hive.llap.io.threadpool.size sets the number of I/O threads, which read the data]

Page 32: Achieving 100k Queries per Hour on Hive on Tez

Executor thread pool

hive.llap.daemon.num.executors (default 4)
• The number of JVM threads for query execution
• Set this to the number of vCPUs
• 40 in our case

Page 33: Achieving 100k Queries per Hour on Hive on Tez

Performance: executor thread

Page 34: Achieving 100k Queries per Hour on Hive on Tez

I/O thread pool

hive.llap.io.threadpool.size (default 10)
• Number of I/O threads
• Set this to the number of vCPUs
• 40 in our case
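The sizing rule on the two thread-pool slides can be written down as a tiny helper. This is a sketch of the rule as stated (both pools equal to the vCPU count), not a general recommendation. The property names are the ones on the slides; the function name is illustrative.

```python
import os


def llap_thread_settings(vcpus: int) -> dict:
    """Map both LLAP thread-pool properties to the vCPU count, as the slides suggest."""
    if vcpus < 1:
        raise ValueError("vcpus must be positive")
    return {
        "hive.llap.daemon.num.executors": vcpus,
        "hive.llap.io.threadpool.size": vcpus,
    }


# os.cpu_count() reports logical CPUs on the machine running this script;
# it can return None, so fall back to 1 in that case.
local_vcpus = os.cpu_count() or 1
print(f"logical CPUs on this machine: {local_vcpus}")

for key, value in llap_thread_settings(40).items():  # 40 vCPUs, as on the test cluster
    print(f"{key}={value}")
```

On the test cluster this raises the executor pool from the default of 4 to 40 and the I/O pool from the default of 10 to 40.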

Page 35: Achieving 100k Queries per Hour on Hive on Tez

Performance: I/O thread

Page 36: Achieving 100k Queries per Hour on Hive on Tez

Memory

hive.llap.daemon.memory.per.instance.mb: executor memory, JVM on-heap (java -Xmx …)

hive.llap.io.memory.size: I/O memory, JVM off-heap

Page 37: Achieving 100k Queries per Hour on Hive on Tez

Performance (compared to Tez)

Page 38: Achieving 100k Queries per Hour on Hive on Tez

Performance: QPS

Page 39: Achieving 100k Queries per Hour on Hive on Tez

100K / hour?

LLAP 45 nodes (test cluster)

max: 24 qps ≈ 87K queries/hour

70 nodes for 100K (if it scales linearly)

Page 40: Achieving 100k Queries per Hour on Hive on Tez

Advanced Tuning

Page 41: Achieving 100k Queries per Hour on Hive on Tez

Advanced Tuning

hive.llap.client.consistent.splits

false (default) => Use file locality for selecting the LLAP daemon

true => LLAP daemons are selected evenly (by hash distribution)
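The difference between the two modes can be illustrated with a toy model. This is not Hive's split-assignment code: the hashing scheme, the host names, the file path, and both function names are assumptions made purely for illustration.

```python
import hashlib


def pick_daemon_by_hash(file_path, offset, daemons):
    """Pick a daemon from a stable hash of the split, ignoring where the blocks live."""
    key = f"{file_path}:{offset}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(daemons)
    return daemons[index]


def pick_daemon_by_locality(block_hosts, daemons):
    """Prefer a daemon on a host that holds the data; otherwise take the first daemon."""
    for host in block_hosts:
        if host in daemons:
            return host
    return daemons[0]


daemons = [f"node{i:02d}" for i in range(1, 46)]  # 45 hypothetical LLAP hosts
split = ("/warehouse/table_x/part-00000.orc", 0)  # hypothetical split

print(pick_daemon_by_hash(*split, daemons))
print(pick_daemon_by_locality(["node07", "node21", "node33"], daemons))
```

In the hash-based variant the same split always maps to the same daemon regardless of where its HDFS blocks are stored, while the locality variant follows the block placement.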

Page 42: Achieving 100k Queries per Hour on Hive on Tez

Recap: LLAP

Each node runs both an LLAP daemon and a DataNode

Page 43: Achieving 100k Queries per Hour on Hive on Tez

hive.llap.client.consistent.splits

[Chart: Locality vs. No Locality]

Page 44: Achieving 100k Queries per Hour on Hive on Tez

Future Work

Page 45: Achieving 100k Queries per Hour on Hive on Tez

Web UI

Page 46: Achieving 100k Queries per Hour on Hive on Tez

Web UI (HIVE-11526)

LLAP daemon exposes basic metrics on port 15002 (default)

Included in Hive 2.1

Contributed by Yahoo! JAPAN

Page 47: Achieving 100k Queries per Hour on Hive on Tez

Web UI (HIVE-14030)

HIVE-11526 covers only a single daemon

HIVE-14030 provides an aggregated view of an LLAP cluster (not yet in master)

Contributed by Yahoo! JAPAN

Page 48: Achieving 100k Queries per Hour on Hive on Tez

ACL

Page 49: Achieving 100k Queries per Hour on Hive on Tez

Hive Column-level ACL

[Diagram: SQL goes through HS2 and LLAP, which run on YARN and HDFS]

GOAL: Column-level ACL

ANSWER(?): HiveServer2 can do it

Page 50: Achieving 100k Queries per Hour on Hive on Tez

Direct Access to HDFS breaks everything

[Diagram: HS2 and LLAP on YARN and HDFS; M/R, Pig, and Spark access HDFS directly under Storage Based Authorization and break SQL Standard Based ACLs]

But accessing HDFS directly (not through Hive) breaks the security model.

Other solutions (not only Hive) are necessary

Page 51: Achieving 100k Queries per Hour on Hive on Tez

Future Directions

[Diagram: M/R, Pig, and Spark read through LlapInputFormat, which checks SQL Based ACLs; HS2 and LLAP run on YARN and HDFS]

LlapInputFormat checks ACLs with HS2 on behalf of other applications. HIVE-13441 HIVE-12991

see LlapDump.java

Page 52: Achieving 100k Queries per Hour on Hive on Tez

Summary

Page 53: Achieving 100k Queries per Hour on Hive on Tez

Summary

• Throughput is greatly improved by LLAP

• Some tunings are necessary

• LLAP is also effective for batch processing

Page 54: Achieving 100k Queries per Hour on Hive on Tez

Q & A