unorthodox paths to high performance - qcon ny 2016

67
UNORTHODOX PATHS TO HIGH PERFORMANCE

Upload: alex-rasmussen

Post on 15-Apr-2017

100 views

Category:

Software


2 download

TRANSCRIPT

Page 1: Unorthodox Paths to High Performance - QCon NY 2016

UNORTHODOX PATHS TO HIGH PERFORMANCE

Page 2: Unorthodox Paths to High Performance - QCon NY 2016

@ALEXRAS

Page 3: Unorthodox Paths to High Performance - QCon NY 2016

TRITONSORT (NSDI 2011)THEMIS (SOCC 2012)

MapReduce Really Fast

Sort Really Fast

Page 4: Unorthodox Paths to High Performance - QCon NY 2016

2011!!!!!

2010!!

2014!!!

Page 5: Unorthodox Paths to High Performance - QCon NY 2016

THIS TALK: HOW WE DID IT

Page 6: Unorthodox Paths to High Performance - QCon NY 2016

LANGUAGE FRAMEWORK

TOOLS API

Page 7: Unorthodox Paths to High Performance - QCon NY 2016

1. SOFTWARE-HARDWARE CO-DESIGN

2. BUILDING FOR EXPERIMENTATION

3. CAREFULLY MANAGING MEMORY

Page 8: Unorthodox Paths to High Performance - QCon NY 2016

MOTIVATION

Page 9: Unorthodox Paths to High Performance - QCon NY 2016

[SORT] IS AN EXCELLENT TEST OF THE INPUT-OUTPUT ARCHITECTURE OF A COMPUTER AND ITS OPERATING SYSTEM.

"A measure of transaction processing power" Datamation 1985

Page 10: Unorthodox Paths to High Performance - QCon NY 2016

THE SORT BENCHMARK‣ SORTING K/V PAIRS (RECORDS) ‣ MANY CATEGORIES, VARIANTS ‣ TODAY: GRAYSORT (100TB)

Page 11: Unorthodox Paths to High Performance - QCon NY 2016

2009: YAHOO! SORTS 100TB IN 173 MINUTESON 3452 HADOOP NODES

Page 12: Unorthodox Paths to High Performance - QCon NY 2016

2.79 MBPS PER NODE

9.6 GB PER SECOND578 GB PER MINUTE

3452 NODES

Page 13: Unorthodox Paths to High Performance - QCon NY 2016

THE GAP IN THEORETICAL PER NODE PERFORMANCE AND WHAT IS ACTUALLY ACHIEVED HAS BECOME GLARINGLY LARGE.

Anderson and Tucek“Efficiency Matters!” SIGOPS OSR 2010

Page 14: Unorthodox Paths to High Performance - QCon NY 2016

HOW CAN WE DO

BETTER?

Page 15: Unorthodox Paths to High Performance - QCon NY 2016

HARDWARE & SOFTWARE CO-DESIGNED

FOR WORKLOAD

Page 16: Unorthodox Paths to High Performance - QCon NY 2016

UNDERSTANDING THE PROBLEM

Page 17: Unorthodox Paths to High Performance - QCon NY 2016

HOW TO MAXIMIZE PER-NODE SPEED?

Page 18: Unorthodox Paths to High Performance - QCon NY 2016

CPU RAM NETWORK DISK ~15 disks/NIC

10Gbps

Page 19: Unorthodox Paths to High Performance - QCon NY 2016

‣8 CORES ‣16 DISKS

‣24 GB RAM ‣10 GBPS NIC

Page 20: Unorthodox Paths to High Performance - QCon NY 2016

WHAT ARE THE EXPENSIVE

OPERATIONS?

Page 21: Unorthodox Paths to High Performance - QCon NY 2016

SEEKING

ROTATION

Page 22: Unorthodox Paths to High Performance - QCon NY 2016

WRITING"☠ SEEKING

Page 23: Unorthodox Paths to High Performance - QCon NY 2016

KEEP OFF THE DISKTWO READS + TWO WRITES PER RECORD

Page 24: Unorthodox Paths to High Performance - QCon NY 2016

SEEK INFREQUENTLY

BIG READS BIG WRITES

Page 25: Unorthodox Paths to High Performance - QCon NY 2016

WHICH DISTRIBUTED SORTING ALGORITHM?

Page 26: Unorthodox Paths to High Performance - QCon NY 2016

MERGESORT1 2 8 7 12 103 5 4 6 9 11

31 2 4 5 6 7 98 12 10 11

1 2 3 4 5 6 7 8 9 1210 11

Page 27: Unorthodox Paths to High Performance - QCon NY 2016

THE TROUBLE WITH MERGESORTAT SCALE, MANY SORTED CHUNKS

FETCHING RANDOMLY CAUSES SEEKS

Page 28: Unorthodox Paths to High Performance - QCon NY 2016

DISTRIBUTION SORT

3 1 24 56

1 2 8 7 12 103 5 4 6 9 11

8 7 9 12 10 11

31 2 4 65 7 8 9 1210 11

Page 29: Unorthodox Paths to High Performance - QCon NY 2016

DISTRIBUTION SORT

‣ BIG WRITES IN FIRST PASS ‣ SEQUENTIAL I/O IN SECOND PASS

Page 30: Unorthodox Paths to High Performance - QCon NY 2016

NOW, TO BUILDING!

Page 31: Unorthodox Paths to High Performance - QCon NY 2016

IT WON'T JUST MAGICALLY

BE FAST

Page 32: Unorthodox Paths to High Performance - QCon NY 2016

EXPERIMENTATION!

Page 33: Unorthodox Paths to High Performance - QCon NY 2016

FLEXIBLE MODULAR GRAPHS

Page 34: Unorthodox Paths to High Performance - QCon NY 2016

STAGE

Page 35: Unorthodox Paths to High Performance - QCon NY 2016

READ HASH SEND

RECEIVE PARTITION WRITE

NETWORK

READ SORT WRITE

Page 36: Unorthodox Paths to High Performance - QCon NY 2016

SYNTHETIC HASH SEND

RECEIVE PARTITION SINK

NETWORK

REALLY FAST DISKS?

Page 37: Unorthodox Paths to High Performance - QCon NY 2016

READ HASH SINK

SYNTHETIC PARTITION WRITE

NETWORK

REALLY FAST NETWORK?

Page 38: Unorthodox Paths to High Performance - QCon NY 2016

DIFFERENT SORTS?READ RADIX SORT WRITE

READ QUICKSORT WRITE

READ TIMSORT WRITE

Page 39: Unorthodox Paths to High Performance - QCon NY 2016

MEASUREMENT

Page 40: Unorthodox Paths to High Performance - QCon NY 2016

HOW LARGE ARE WRITES?

WHERE IS THE BOTTLENECK?

ARE STAGES BLOCKED? IDLE?

Page 41: Unorthodox Paths to High Performance - QCon NY 2016

TONS AND TONS OF LOGS

Page 42: Unorthodox Paths to High Performance - QCon NY 2016
Page 43: Unorthodox Paths to High Performance - QCon NY 2016

AGGREGATES

Page 44: Unorthodox Paths to High Performance - QCon NY 2016

TIME-SERIES

Page 45: Unorthodox Paths to High Performance - QCon NY 2016

RUNTIME INFO

Page 46: Unorthodox Paths to High Performance - QCon NY 2016

ORGANIZING LOGS/2016/06/14/frob_widgets_1.tar.bz2

Page 47: Unorthodox Paths to High Performance - QCon NY 2016

ORGANIZING LOGS/2016/06/14/frob_widgets_1.tar.bz2 /cluster_nodes.txt /node.conf

Page 48: Unorthodox Paths to High Performance - QCon NY 2016

ORGANIZING LOGS/2016/06/14/frob_widgets_1.tar.bz2 /cluster_nodes.txt /node.conf /notes.md

Page 49: Unorthodox Paths to High Performance - QCon NY 2016

SMARTLY CONTROLLING MEMORY

Page 50: Unorthodox Paths to High Performance - QCon NY 2016
Page 51: Unorthodox Paths to High Performance - QCon NY 2016

BUFFER POOLS ARE FAST AND SIMPLE AND INFLEXIBLE

Page 52: Unorthodox Paths to High Performance - QCon NY 2016

MALLOC ISSIMPLE AND FLEXIBLE

AND DANGEROUS

Page 53: Unorthodox Paths to High Performance - QCon NY 2016

By default, Linux follows an optimistic memory allocation strategy. This means that when malloc() returns non-NULL there is no guarantee that the memory really is available.

In case it turns out that the system is out of memory, one or more processes will be killed by the OOM killer.

Page 54: Unorthodox Paths to High Performance - QCon NY 2016
Page 55: Unorthodox Paths to High Performance - QCon NY 2016

ARE WE SOLVING THE RIGHT PROBLEM?

Page 56: Unorthodox Paths to High Performance - QCon NY 2016

WHAT IF MALLOC WAITED?WAITING PROVIDES BACKPRESSURE

CALLERS CAN BE SCHEDULED INTERFACE STAYS SIMPLE

DECISIONS CAN BE GLOBAL

Page 57: Unorthodox Paths to High Performance - QCon NY 2016

POOLS QUOTAS

CONSTRAINTS

Page 58: Unorthodox Paths to High Performance - QCon NY 2016

WRAPPING UP

Page 59: Unorthodox Paths to High Performance - QCon NY 2016

300 MBPS PER NODE

15.6 GB PER SECOND938 GB PER MINUTE

52 NODES

Page 60: Unorthodox Paths to High Performance - QCon NY 2016

632 MBPS PER NODE

112.6 GB PER SECOND6757 GB PER MINUTE

178 NODES

Page 61: Unorthodox Paths to High Performance - QCon NY 2016

LESSONS LEARNED:

Page 62: Unorthodox Paths to High Performance - QCon NY 2016

BOTTLENECKS SHAPE YOUR

ARCHITECTURE

Page 63: Unorthodox Paths to High Performance - QCon NY 2016

STRUCTURE SOFTWARE FOR EXPERIMENTATION

AND MEASUREMENT

Page 64: Unorthodox Paths to High Performance - QCon NY 2016

SAVE YOUR LOGS SAVE YOUR CONFIG SAVE YOUR NOTES

Page 65: Unorthodox Paths to High Performance - QCon NY 2016

SOMETIMES YOU NEED MORE CONTROL THAN THE OS WILL GIVE YOU

Page 66: Unorthodox Paths to High Performance - QCon NY 2016

GOING FAST IS HARD

Page 67: Unorthodox Paths to High Performance - QCon NY 2016

THANKS