Scalability Update, 2007

Kris Kennaway, The FreeBSD Project, [email protected]

May 17, 2007

Progress on FreeBSD SMP performance and scalability since BSDCan Dev Summit 2006

Kris Kennaway The FreeBSD Project [email protected] Scalability Update, 2007


Outline

1. A case study: SQL database performance

2. Major SMP changes since May 2006

3. Remnants of the Giant lock

4. Future work and new SMP goals


A Case Study: SQL database performance

Why Database?

- Off-the-shelf benchmark (sysbench; /usr/ports/benchmarks/sysbench/)
- Seems to be a reasonable benchmark (as opposed to super-smack; 1-byte I/O!)
- FreeBSD did not perform well compared to Linux; excellent motivator for performance improvements

What Database?

- MySQL 5.0.37 (thread-based)
- PostgreSQL 8.2.4 (process-based + SysV IPC)


Configuration details

- amd64 system, 4*dual-core 2.2GHz CPUs
- 3GB RAM (working set fits in RAM)
- sysbench OLTP test
  - Complex transaction-based queries
  - Read-only; no disk access to avoid benchmarking disk performance
- multithreaded benchmark client
- clients and servers on the same system
- communication via UNIX domain socket
- 2 minute runs, after a warm-up run with 8 clients.


MySQL: Progress in FreeBSD 5.x and 6.x

[Figure: transactions/sec (0 to 4000) vs. concurrency (# client threads, 2 to 20). Series: 5.5-4bsd-c_r, 5.5-4bsd-kse, 6.2-4bsd-kse, 6.2-4bsd-thr.]


MySQL: The competition (FreeBSD vs Linux)

[Figure: transactions/sec (0 to 4000) vs. concurrency (# client threads, 2 to 20). Series: 5.5-4bsd-c_r, 5.5-4bsd-kse, 6.2-4bsd-kse, 6.2-4bsd-thr, linux 2.6.19.]


MySQL: Progress in FreeBSD 7.0 through February 2007

[Figure: transactions/sec (0 to 4000) vs. concurrency (# client threads, 2 to 20). Series: 5.5-4bsd-c_r, 5.5-4bsd-kse, 6.2-4bsd-kse, 6.2-4bsd-thr, linux 2.6.19, 7.0-4bsd-thr-feb9, 7.0-ule-thr-feb9.]


MySQL: Where we are today

[Figure: transactions/sec (0 to 4000) vs. concurrency (# client threads, 2 to 20). Series: 5.5-4bsd-c_r, 5.5-4bsd-kse, 6.2-4bsd-kse, 6.2-4bsd-thr, linux 2.6.19, 7.0-4bsd-thr-feb9, 7.0-ule-thr-feb9, 7.0-4bsd-thr-may13, 7.0-ule-thr-may13.]


Linux: Blame it on malloc? glibc vs tcmalloc

Source: http://ozlabs.org/~anton/linux/sysbench/


Linux: Not the whole story?

- Some reports on LKML of difficulty reproducing the tcmalloc result on 8-core systems
- tcmalloc claimed not to be a good general-purpose allocator
- A process-global semaphore acquired by all Linux futex operations.
  - Agrees with the asymptotically single-threaded performance measured on amd64

...not our problem!


MySQL: jemalloc vs phkmalloc

[Figure: transactions/sec (0 to 4000) vs. concurrency (# client threads, 2 to 20). Series: 7.0-ule-thr-may13, 7.0-ule-thr-phkmalloc-may13.]


PostgreSQL

[Figure: transactions/sec (0 to 5000) vs. concurrency (# client threads, 2 to 20). Series: 5.5-4bsd, 6.2-4bsd, 7.0-4bsd-cvs-may13, 7.0-ule-cvs-feb01, 7.0-ule-cvs-may13, 7.0-ule-phkmalloc-may13.]


Progress since BSDCan 2006 (highlights)

- More efficient lock profiling
  - Still very inefficient; explore a worker thread model?
- UNIX domain socket locking; fine-grained locking for better concurrency
- Experimental scheduler locking work (on-going)
- reader-writer (rw) locks (non-sleepable)
- Optimized sx locks (sleepable); now an efficient low-level primitive
- File descriptor locking; home-brewed msleep lock → sx


Progress since BSDCan 2006 II

- ULE 2.0. Now significantly out-performing 4BSD on the SMP workloads that have been benchmarked.
  - A remaining performance anomaly: poor CPU affinity at intermediate loads on dual-core systems
  - Benchmarking on other workloads required
- Giant pushdown
- Network stack optimization and regularization
- libthr is now the default thread library in 7.0 (finally!)
  - Should also be default on 6.x
- CAM subsystem locking (scottl)


Conclusion of the SMPng project

The SMPng project was launched in June 2000 and was a major focus of development in FreeBSD 5.x and above.

“The goal of the SMPng Project is to decompose the Giant lock into a number of smaller locks, resulting in reduced contention (and improved SMP performance).”

The SMPng project was formally concluded in February 2007 after achieving this goal.


Remnants of the Giant lock

The FreeBSD 7.0 kernel is mostly Giant-free. Remaining Giant-locked systems are:

- TTY locking. In most configurations the Giant lock has now effectively devolved onto a Giant TTY lock.
- Some drivers (some CAM SIMs, pseudo-devices)
- Some filesystems (MSDOS, SMB, NFSv4, ...)
- Some network protocols (IPX over IP, IPv6 ND6/MLD6, netatm, ...)
- newbus (scottl; work in progress)
- USB, Firewire
- Parts of VM (contigmalloc, ...)
- Some scattered use elsewhere: sysctls, file operations, sysarch(), ...

See http://wiki.freebsd.org/SMPTODO for the up-to-date list of Giant-locked code and project ownership.


Danish axe time!


Legacy Giant-locked code to be removed in 7.0

- KAME IPSec; in favour of Fast IPSec (gnn, bz)
- I4B ISDN stack (rwatson)
- Legacy drivers: an arl awi cnw ce cp cs ctau cx en ex fe hfa idt ie if_ic oltr mn pcf pdq ray sbni sbsh snc sr tx wl xe (rwatson)
- netatm? (rwatson, Skip Ford)


Future work I: locking primitives

Legacy/home-grown locking primitives

- lockmgr
  - Proof of concept from ups
  - 1000 concurrent stat calls of the same file on an 8-core: reduces system time by 95% and real time by 82%.
  - lockmgr to be rewritten by Attilio Rao for Google SoC 2007
- Any others lurking?
- msleep() is a pessimal synchronization/serialization primitive!
  - Lesson of SMPng: when the lock owner is running on another CPU it is almost always much better to spin instead of always going to sleep immediately.
  - Use standard primitives instead of rolling your own.
- Consolidation of primitives; too many?
  - sema()
  - msleep()/tsleep()/wakeup()/... → cv_*()?
  - ...?
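
The "spin before sleeping" lesson can be illustrated with a small user-space sketch. This is not kernel code and not how FreeBSD's adaptive locks are written: the class name `SpinThenBlockLock`, the `spin_limit` parameter, and the helper `run_workers` are hypothetical, and because user-space Python cannot see whether the lock owner is currently running on another CPU, a bounded number of non-blocking attempts stands in for that check.

```python
import threading


class SpinThenBlockLock:
    """Toy lock: try a bounded number of non-blocking acquires, then block.

    Assumption: the bounded spin replaces the kernel's "is the owner
    running on another CPU?" test, which is not observable from here.
    """

    def __init__(self, spin_limit=100):
        self._lock = threading.Lock()
        self.spin_limit = spin_limit
        self.spin_acquires = 0    # acquisitions that succeeded while spinning
        self.block_acquires = 0   # acquisitions that fell back to sleeping

    def acquire(self):
        # Spin phase: cheap retries in the hope the owner releases soon.
        for _ in range(self.spin_limit):
            if self._lock.acquire(blocking=False):
                self.spin_acquires += 1   # safe: updated while holding the lock
                return
        # Sleep phase: give up the CPU until the lock becomes available.
        self._lock.acquire()
        self.block_acquires += 1          # safe: updated while holding the lock

    def release(self):
        self._lock.release()


def run_workers(lock, n_threads=4, n_iters=1000):
    """Increment a shared counter under the lock from several threads."""
    counter = [0]

    def worker():
        for _ in range(n_iters):
            lock.acquire()
            counter[0] += 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Comparing `spin_acquires` with `block_acquires` after a run gives a rough feel for how often a short spin avoids a sleep; the numbers say nothing about kernel behaviour.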


Future work II: General

- Look for opportunities for shared locking - but profile carefully!
  - e.g. namei() is an “obvious” candidate for using a rwlock, but this slows it down significantly.
    - probably because it exposes lockmgr to concurrent access.
- select() locking (giant select lock)
  - How can it be optimized? This is the major remaining source of lock contention/wait time for the SQL benchmarks (contended on 50-75% of acquisitions)
- SysV IPC; POSIX advisory locks
  - Sanitize, explore finer-grained locking
- Remove non-MPSAFE subsystem crutches
- VM
  - vm page queue mutex is heavily contended
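
For readers unfamiliar with shared locking, here is a minimal reader-writer lock sketch. It is a user-space illustration built on `threading.Condition`, not the kernel rwlock; the class `RWLock` and its method names are hypothetical. It shows the property that makes shared locking attractive (many readers may hold the lock at once, writers are exclusive) and does nothing to prevent writer starvation.

```python
import threading


class RWLock:
    """Toy shared/exclusive lock: many readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of threads holding the lock shared
        self._writer = False    # True while one thread holds it exclusively

    def acquire_read(self):
        with self._cond:
            # Readers only wait for an active writer, never for each other.
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()   # a waiting writer may now proceed

    def acquire_write(self):
        with self._cond:
            # A writer needs the lock completely idle.
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()       # wake both readers and writers
```

As the slide warns, allowing readers to run concurrently can expose contention in whatever they call next, so a change like this should be measured rather than assumed to help.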


Future work III: Filesystem and Scheduler

Filesystem work

- More MPSAFE filesystems needed
  - MSDOS: Brian Chu, Google SoC 2007
- Shared lookups for UFS
  - Enables other work: rwlock for namei()

Scheduler

- scheduler lock is abused for non-scheduler purposes (rusage, ldt, vmmeter, timers)
  - //depot/user/attilio/attilio_schedlock
- global scheduler lock appears to be a barrier to scaling on ≥ 8 CPUs.
  - WIP to explore per-CPU scheduler locks with ULE; jeffr, attilio
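
The idea of replacing one global scheduler lock with per-CPU locks can be sketched in user space as follows. This is only an illustration of the locking layout, not the ULE design: the class `PerCpuRunQueues`, its methods, and the fallback of taking work from another CPU's queue are all assumptions made for the example.

```python
import threading
from collections import deque


class PerCpuRunQueues:
    """Toy run queues: one deque and one lock per CPU (hypothetical layout)."""

    def __init__(self, ncpus):
        self.queues = [deque() for _ in range(ncpus)]
        self.locks = [threading.Lock() for _ in range(ncpus)]

    def enqueue(self, cpu, task):
        # Only the target CPU's lock is taken; other CPUs are not disturbed.
        with self.locks[cpu]:
            self.queues[cpu].append(task)

    def dequeue(self, cpu):
        # Fast path: the local queue, guarded by the local lock.
        with self.locks[cpu]:
            if self.queues[cpu]:
                return self.queues[cpu].popleft()
        # Slow path (an illustrative choice): take work from another CPU,
        # holding only one lock at a time so no lock ordering is needed.
        for other in range(len(self.queues)):
            if other == cpu:
                continue
            with self.locks[other]:
                if self.queues[other]:
                    return self.queues[other].pop()
        return None
```

With a single global lock every `enqueue` and `dequeue` would serialize; here two CPUs working on their own queues never touch the same lock.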


Future work IV: Network stack

- Explore methods for improving parallelism in network processing
- Driver work
  - e.g. bce driver performs poorly with concurrent send/receive
  - others not yet evaluated
- multiple netisr worker threads
  - e.g. loopback transport; very easy to saturate a single netisr
  - important for services that cannot use unix domain socket for local transport
- Optimization work to support 10ge (kmacy)
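
One way to picture "multiple netisr worker threads" is a dispatcher that hashes each flow to a fixed worker, so different flows are processed in parallel while packets within a flow stay in order. The sketch below is a user-space analogy, not the netisr implementation; `worker_for`, `dispatch`, and the use of CRC32 as the hash are assumptions made for the example.

```python
import queue
import threading
import zlib


def worker_for(flow_id: bytes, nworkers: int) -> int:
    """Map a flow to a worker; the same flow always lands on the same worker."""
    return zlib.crc32(flow_id) % nworkers


def dispatch(packets, nworkers=4):
    """Feed (flow_id, seq) packets to nworkers threads; return per-worker logs."""
    queues = [queue.Queue() for _ in range(nworkers)]
    handled = [[] for _ in range(nworkers)]

    def worker(i):
        while True:
            pkt = queues[i].get()
            if pkt is None:          # sentinel: no more packets
                return
            handled[i].append(pkt)   # stand-in for protocol processing

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nworkers)]
    for t in threads:
        t.start()
    for flow_id, seq in packets:
        queues[worker_for(flow_id, nworkers)].put((flow_id, seq))
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return handled
```

Because each flow is pinned to one queue, per-flow ordering is preserved without any lock shared between workers; the trade-off is that a single heavy flow can still saturate one worker.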


New SMP Objectives: consolidation and optimization

1. Identify additional workloads to be optimized.
   - You can help! We need to identify good benchmarks modelling real-world performance cases.
   - Ideally should be simple to set up and operate.
   - If you have a test case I will be happy to work with you to profile it.

2. Consolidate performance on 8-core systems without regressing performance on < 8 CPU cores

3. Ready to push scaling upwards to 16 and 32-core systems
   - ...but need access to hardware
   - sun4v? Not currently stable.
   - Need access to 16-core amd64 hardware


An empirical observation

The scaling of FreeBSD n.x appears to be governed by

$\mathrm{NCPUs} \sim 2^{n-4}, \quad n \geq 4$
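
Reading the relation as NCPUs ∼ 2^(n−4) for major version n ≥ 4, a few values can be tabulated. The function name `predicted_ncpus` is hypothetical, and the relation is the slide's tongue-in-cheek rule of thumb rather than a measured law.

```python
def predicted_ncpus(n: int) -> int:
    """CPU count up to which FreeBSD n.x scales, per the slide's rule of thumb."""
    if n < 4:
        raise ValueError("the relation is only stated for n >= 4")
    return 2 ** (n - 4)


for n in range(4, 8):
    print(f"FreeBSD {n}.x: ~{predicted_ncpus(n)} CPUs")
# prints ~1, ~2, ~4 and ~8 CPUs for 4.x, 5.x, 6.x and 7.x respectively
```

The value for 7.x matches the 8-core system used for the benchmarks in this talk.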
