Java Concurrent Optimization: Concurrent Queue

Upload: min-zhou

Post on 15-Jan-2015

DESCRIPTION

A step-by-step optimization of a BlockingQueue, raising throughput from about 3 million to about 110 million ops.

TRANSCRIPT

Page 1: Java Concurrent Optimization: Concurrent Queue

Concurrent Queues

Author: Zhou Chen (周忱) | Data Platform, DXP | Weibo: @MinZhou

Email: [email protected]

Page 2: Java Concurrent Optimization: Concurrent Queue

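Java Concurrency Optimization: Blocking Queues

About me

• Alias: Zhou Chen (周忱, chén)

• Real name: Zhou Min (周敏)

• Weibo: @MinZhou

• Twitter: @minzhou

• Joined Taobao in June 2010

• Former leader of Taobao's Hadoop & Hive development team

• Currently a "temp worker" on Yunti's cross-datacenter project

• Hive contributor

• Free and open-source software enthusiast

Data eXchange Platform | zhouchen.zm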

Page 4: Java Concurrent Optimization: Concurrent Queue

What is a queue?

Page 6: Java Concurrent Optimization: Concurrent Queue

Uses of queues

Page 7: Java Concurrent Optimization: Concurrent Queue

ArrayBlockingQueue & LinkedBlockingQueue

• BlockingQueue

• ArrayBlockingQueue: array-based implementation

• LinkedBlockingQueue: linked-list-based implementation

• About 3 million ops
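
A baseline like the one above can be reproduced with a small harness. The following is a minimal sketch, not the author's benchmark: the class name `BlockingQueueBaseline`, the capacity of 1024, and the message count are assumptions chosen for illustration, and the measured throughput depends on the machine and JVM.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical harness for illustration; not taken from the slides.
final class BlockingQueueBaseline {

    /** Passes count integers from one producer thread to the caller and returns their sum. */
    static long transfer(BlockingQueue<Integer> queue, int count) {
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.put(i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        try {
            for (int i = 0; i < count; i++) {
                sum += queue.take(); // blocks while the queue is empty
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return sum;
    }

    public static void main(String[] args) {
        int count = 1_000_000;
        long start = System.nanoTime();
        long sum = transfer(new ArrayBlockingQueue<>(1024), count);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("sum=%d, about %.0f ops/s%n", sum, count / seconds);
    }
}
```

Passing a `LinkedBlockingQueue` to `transfer` instead exercises the linked-list implementation with the same code.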

Page 8: Java Concurrent Optimization: Concurrent Queue

Performance problems of queues

• Linked lists are evil for performance

• Write contention on three variables: head, tail, and size

• Coarse-grained locks on put/take and offer/poll

• GC pressure

Page 9: Java Concurrent Optimization: Concurrent Queue

The Single Writer Principle

• Time needed to increment one variable 500,000,000 times

Method                                   Time (ms)
Single thread, long                            300
Single thread, volatile long                 4,700
Single thread, AtomicLong (CAS)              5,700
Two threads, AtomicLong (CAS)               18,000
Single thread, synchronized + long          10,000
Two threads, synchronized + long           118,000
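
The single-threaded rows of this comparison can be approximated with a sketch like the one below. It is an illustration under assumptions, not the code that produced the numbers above: the class `IncrementCost` and the iteration count are made up here, the two-thread rows are left out, and the JIT may optimize the plain `long` loop heavily, so absolute timings will differ.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

// Hypothetical micro-benchmark sketch; treat the timings as rough indications only.
final class IncrementCost {
    private long plain;
    private volatile long vol;
    private final AtomicLong atomic = new AtomicLong();
    private long locked;

    long incrementPlain(long n) {
        for (long i = 0; i < n; i++) plain++;
        return plain;
    }

    long incrementVolatile(long n) {
        // vol++ is not atomic; it is safe here only because a single thread writes.
        for (long i = 0; i < n; i++) vol++;
        return vol;
    }

    long incrementAtomic(long n) {
        for (long i = 0; i < n; i++) atomic.incrementAndGet();
        return atomic.get();
    }

    long incrementSynchronized(long n) {
        for (long i = 0; i < n; i++) {
            synchronized (this) {
                locked++;
            }
        }
        return locked;
    }

    static void time(String label, LongUnaryOperator body, long n) {
        long start = System.nanoTime();
        long result = body.applyAsLong(n);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%-22s %6d ms (count=%d)%n", label, millis, result);
    }

    public static void main(String[] args) {
        long n = 100_000_000L; // smaller than the 500,000,000 used on the slide
        IncrementCost c = new IncrementCost();
        time("long", c::incrementPlain, n);
        time("volatile long", c::incrementVolatile, n);
        time("AtomicLong", c::incrementAtomic, n);
        time("synchronized + long", c::incrementSynchronized, n);
    }
}
```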

Page 10: Java Concurrent Optimization: Concurrent Queue

Step 1: Ring buffer

• No write contention, so no lock is needed, and not even CAS

• Use the volatile keyword to make writes visible to the other thread

• No need to maintain size

• About 11 million ops
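
The points above can be sketched as a single-producer, single-consumer ring buffer. This is a minimal illustration, assuming exactly one producer thread and one consumer thread; the class name `VolatileRingQueue` is hypothetical and the code is not taken from the author's repository.

```java
// Sketch of a single-producer/single-consumer ring buffer (hypothetical class).
final class VolatileRingQueue<E> {
    private final Object[] buffer;
    private volatile long head; // next slot to read; written only by the consumer
    private volatile long tail; // next slot to write; written only by the producer

    VolatileRingQueue(int capacity) {
        buffer = new Object[capacity];
    }

    /** Producer side: returns false when the buffer is full. */
    boolean offer(E e) {
        long currentTail = tail;
        if (currentTail - head == buffer.length) {
            return false; // full; no separate size field is needed
        }
        buffer[(int) (currentTail % buffer.length)] = e;
        tail = currentTail + 1; // volatile write publishes the element
        return true;
    }

    /** Consumer side: returns null when the buffer is empty. */
    @SuppressWarnings("unchecked")
    E poll() {
        long currentHead = head;
        if (currentHead == tail) {
            return null; // empty
        }
        int index = (int) (currentHead % buffer.length);
        E e = (E) buffer[index];
        buffer[index] = null; // let the element be collected
        head = currentHead + 1; // volatile write frees the slot for the producer
        return e;
    }
}
```

Each index has exactly one writer, which is why neither a lock nor a CAS appears anywhere in the sketch.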

Page 11: Java Concurrent Optimization: Concurrent Queue

Memory barriers

• Load buffer

• Store buffer

• CPU serializing and fence instructions

– CPUID

– SFENCE

– LFENCE

– MFENCE

• LOCK-prefixed instructions

Page 12: Java Concurrent Optimization: Concurrent Queue

Step 2: lazySet

• AtomicXXX.lazySet() guarantees StoreStore ordering

• but does not guarantee StoreLoad ordering

• Guarantees eventual consistency

• A lightweight volatile write

• Backed by Unsafe.putOrderedXXX

• About 17 million ops

"This is a niche method that is sometimes useful when fine-tuning code using non-blocking data structures. The semantics are that the write is guaranteed not to be re-ordered with any previous write, but may be reordered with subsequent operations (or equivalently, might not be visible to other threads) until some other volatile write or synchronizing action occurs."

-- Doug Lea
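
Applied to a single-producer, single-consumer ring buffer, the idea might look like the sketch below. The class `LazySetRingQueue` and its `transfer` helper are hypothetical names used for illustration; the sketch assumes one producer and one consumer, and it is not the author's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: the index updates use lazySet (an ordered store) instead of a full volatile write.
final class LazySetRingQueue<E> {
    private final Object[] buffer;
    private final AtomicLong head = new AtomicLong(); // written only by the consumer
    private final AtomicLong tail = new AtomicLong(); // written only by the producer

    LazySetRingQueue(int capacity) {
        buffer = new Object[capacity];
    }

    boolean offer(E e) {
        long currentTail = tail.get();
        if (currentTail - head.get() == buffer.length) {
            return false; // full
        }
        buffer[(int) (currentTail % buffer.length)] = e;
        tail.lazySet(currentTail + 1); // element store cannot be reordered after this store
        return true;
    }

    @SuppressWarnings("unchecked")
    E poll() {
        long currentHead = head.get();
        if (currentHead == tail.get()) {
            return null; // empty
        }
        int index = (int) (currentHead % buffer.length);
        E e = (E) buffer[index];
        buffer[index] = null;
        head.lazySet(currentHead + 1);
        return e;
    }

    /** Demo: a producer thread sends 0..count-1; the caller consumes them and returns the sum. */
    static long transfer(int capacity, int count) {
        LazySetRingQueue<Integer> queue = new LazySetRingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                while (!queue.offer(i)) {
                    Thread.yield(); // queue full, let the consumer run
                }
            }
        });
        producer.start();
        long sum = 0;
        int received = 0;
        while (received < count) {
            Integer item = queue.poll();
            if (item == null) {
                Thread.yield(); // queue empty, let the producer run
            } else {
                sum += item;
                received++;
            }
        }
        return sum;
    }
}
```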

Page 13: Java Concurrent Optimization: Concurrent Queue

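Step 3: Optimize the modulo

• Replace % with & (2^k - 1), which requires the capacity to be a power of two

• About 22 million ops

    // Before: index computed with modulo
    public boolean offer(final E e) {
        …
        buffer[(int) (currentTail % buffer.length)] = e;
        …
    }

    // After: index computed with a bit mask, where mask = buffer.length - 1
    public boolean offer(final E e) {
        …
        buffer[(int) currentTail & mask] = e;
        …
    }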
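
The equivalence only holds when the capacity is a power of two, so a queue typically rounds the requested capacity up first. The helper below is a small sketch of that arithmetic; the class and method names are hypothetical.

```java
// Sketch of the index arithmetic; names are illustrative, not from the slides.
final class IndexMask {

    /** Rounds n up to the next power of two; valid for 1 <= n <= 2^30. */
    static int nextPowerOfTwo(int n) {
        return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    /** Slot index computed with the modulo operator. */
    static int indexByModulo(long sequence, int capacity) {
        return (int) (sequence % capacity);
    }

    /** Slot index computed with a bit mask; capacity must be a power of two. */
    static int indexByMask(long sequence, int capacity) {
        return (int) (sequence & (capacity - 1));
    }
}
```

For a non-negative sequence and a power-of-two capacity, both methods return the same slot, but the mask avoids an integer division on every operation.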

Page 14: Java Concurrent Optimization: Concurrent Queue

False Sharing

Page 15: Java Concurrent Optimization: Concurrent Queue

Step 4: Eliminate false sharing

• About 40 million ops

    import java.util.concurrent.atomic.AtomicLong;

    public class PaddedAtomicLong extends AtomicLong {
        private static final long serialVersionUID = 1L;

        public PaddedAtomicLong() {
        }

        public PaddedAtomicLong(final long initialValue) {
            super(initialValue);
        }

        // Padding so the value does not share a cache line with a neighbouring counter.
        public long p1, p2, p3, p4, p5, p6;
    }
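
The effect of the padding can be observed with a sketch like the following, in which two threads each update their own counter. It rests on assumptions: the two unpadded `AtomicLong` objects are likely, but not guaranteed, to be allocated next to each other, and field layout is ultimately decided by the JVM. The class `FalseSharingDemo` and its nested `Padded` class are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch for observing false sharing; results vary by JVM and hardware.
final class FalseSharingDemo {

    /** Same padding idea as PaddedAtomicLong; the JVM is free to lay fields out differently. */
    static final class Padded extends AtomicLong {
        private static final long serialVersionUID = 1L;
        public long p1, p2, p3, p4, p5, p6;
    }

    /** Increments each counter n times on its own thread; returns elapsed nanoseconds. */
    static long run(AtomicLong a, AtomicLong b, long n) {
        Thread t1 = new Thread(() -> {
            for (long i = 0; i < n; i++) a.incrementAndGet();
        });
        Thread t2 = new Thread(() -> {
            for (long i = 0; i < n; i++) b.incrementAndGet();
        });
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long n = 100_000_000L;
        System.out.printf("unpadded: %d ms%n", run(new AtomicLong(), new AtomicLong(), n) / 1_000_000);
        System.out.printf("padded:   %d ms%n", run(new Padded(), new Padded(), n) / 1_000_000);
    }
}
```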

Page 16: Java Concurrent Optimization: Concurrent Queue

CPU Cache

Page 17: Java Concurrent Optimization: Concurrent Queue

Effect of memory layout on performance

• Tests

– Read memory sequentially

– Access randomly within one memory page, then move to another page and access randomly within it

– Fully random access

• https://gist.github.com/coderplay/4453283
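
The three access patterns can be sketched as follows. This is not the code from the gist above; it is a simplified illustration that assumes 4 KiB pages, visits the pages in order for the second pattern, and uses a hypothetical class name `AccessPatterns`.

```java
import java.util.Random;

// Sketch of the three access patterns; timings depend heavily on hardware.
final class AccessPatterns {
    static final int WORDS_PER_PAGE = 4096 / Long.BYTES; // assumes 4 KiB pages

    static int[] sequential(int n) {
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        return order;
    }

    /** Fisher-Yates shuffle of order[from, to). */
    static void shuffle(int[] order, int from, int to, Random rnd) {
        for (int i = to - 1; i > from; i--) {
            int j = from + rnd.nextInt(i - from + 1);
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
    }

    /** Random order inside each page-sized block, blocks visited one after another. */
    static int[] randomWithinPages(int n, Random rnd) {
        int[] order = sequential(n);
        for (int start = 0; start < n; start += WORDS_PER_PAGE) {
            shuffle(order, start, Math.min(n, start + WORDS_PER_PAGE), rnd);
        }
        return order;
    }

    static int[] fullyRandom(int n, Random rnd) {
        int[] order = sequential(n);
        shuffle(order, 0, n, rnd);
        return order;
    }

    /** Reads every element of data in the given order. */
    static long sum(long[] data, int[] order) {
        long total = 0;
        for (int index : order) total += data[index];
        return total;
    }

    public static void main(String[] args) {
        int n = 1 << 22; // 4M longs, 32 MiB
        long[] data = new long[n];
        Random rnd = new Random(42);
        int[][] orders = {sequential(n), randomWithinPages(n, rnd), fullyRandom(n, rnd)};
        String[] labels = {"sequential", "random within page", "fully random"};
        for (int k = 0; k < orders.length; k++) {
            long start = System.nanoTime();
            long total = sum(data, orders[k]);
            System.out.printf("%-20s %5d ms (sum=%d)%n", labels[k], (System.nanoTime() - start) / 1_000_000, total);
        }
    }
}
```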

Page 18: Java Concurrent Optimization: Concurrent Queue

Cache Line

cat /sys/devices/system/cpu/cpu0/cache/index0/*

Page 19: Java Concurrent Optimization: Concurrent Queue

Step 5: Optimize memory layout

• Use a direct ByteBuffer

• Use Unsafe to align memory to page boundaries

• Keep memory contiguous

• About 68 million ops
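
A rough idea of the first and third points is sketched below: a single-producer, single-consumer ring of `long` values stored in one contiguous off-heap block. Several things here are assumptions rather than the author's design: the class name `DirectLongRing`, the restriction to `long` payloads, and the omission of the Unsafe-based page alignment. The sketch also relies on absolute `ByteBuffer` accesses being published by the index updates, which works in practice on current JVMs but is not promised by the `ByteBuffer` documentation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: off-heap, contiguous storage for a ring of long values (no page alignment).
final class DirectLongRing {
    private final ByteBuffer buffer; // one contiguous off-heap block
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // written only by the consumer
    private final AtomicLong tail = new AtomicLong(); // written only by the producer

    DirectLongRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        buffer = ByteBuffer.allocateDirect(capacity * Long.BYTES).order(ByteOrder.nativeOrder());
        mask = capacity - 1;
    }

    boolean offer(long value) {
        long currentTail = tail.get();
        if (currentTail - head.get() > mask) {
            return false; // full
        }
        buffer.putLong(((int) currentTail & mask) * Long.BYTES, value); // absolute write
        tail.lazySet(currentTail + 1);
        return true;
    }

    /** Returns the next value, or ifEmpty when nothing is available. */
    long poll(long ifEmpty) {
        long currentHead = head.get();
        if (currentHead == tail.get()) {
            return ifEmpty;
        }
        long value = buffer.getLong(((int) currentHead & mask) * Long.BYTES); // absolute read
        head.lazySet(currentHead + 1);
        return value;
    }
}
```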

Page 20: Java Concurrent Optimization: Concurrent Queue

Step 6: yield() vs LockSupport.parkNanos(1)

• Fewer StoreLoad barriers

• Less cache-coherence noise, and therefore a better cache hit rate

• About 110 million ops
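
The two waiting strategies differ only in what a thread does when it finds the queue empty (or full). The sketch below makes that choice pluggable. It uses `ConcurrentLinkedQueue` purely as a stand-in so that the example is self-contained; the class `IdleStrategies` and its members are hypothetical and do not come from the author's code.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;

// Sketch: the same consumer loop with two interchangeable idle strategies.
final class IdleStrategies {
    /** Busy strategy: keeps the thread runnable and polling the shared indices. */
    static final Runnable YIELD = Thread::yield;
    /** Backoff strategy: parks briefly, so the waiting thread touches shared state less often. */
    static final Runnable PARK = () -> LockSupport.parkNanos(1L);

    /** Polls until expected items have been received, idling whenever the queue is empty. */
    static long consume(Queue<Integer> queue, int expected, Runnable idle) {
        long sum = 0;
        int received = 0;
        while (received < expected) {
            Integer item = queue.poll();
            if (item == null) {
                idle.run();
            } else {
                sum += item;
                received++;
            }
        }
        return sum;
    }

    /** Demo: a producer thread offers 0..count-1 while the caller consumes with the given strategy. */
    static long demo(int count, Runnable idle) {
        Queue<Integer> queue = new ConcurrentLinkedQueue<>(); // stand-in for the ring buffer
        Thread producer = new Thread(() -> {
            for (int i = 0; i < count; i++) queue.offer(i);
        });
        producer.start();
        long sum = consume(queue, count, idle);
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("yield: " + demo(1_000_000, YIELD));
        System.out.println("park:  " + demo(1_000_000, PARK));
    }
}
```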

Page 21: Java Concurrent Optimization: Concurrent Queue

Other optimizations

• Preallocate the ring buffer: zero GC

• Produce and consume in batches

• Wait-free

• Ops can reach 220 million!

• CPU affinity
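
Preallocation and batch consumption can be combined in a sketch like the one below: the storage is a `long[]` allocated once, and the consumer reads the producer's index once per batch rather than once per element. The class `BatchDrainRing` is a hypothetical illustration for one producer and one consumer, not the author's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Sketch: preallocated storage plus batch draining on the consumer side.
final class BatchDrainRing {
    private final long[] buffer; // allocated once; no per-element allocation afterwards
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // written only by the consumer
    private final AtomicLong tail = new AtomicLong(); // written only by the producer

    BatchDrainRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        buffer = new long[capacity];
        mask = capacity - 1;
    }

    boolean offer(long value) {
        long currentTail = tail.get();
        if (currentTail - head.get() > mask) {
            return false; // full
        }
        buffer[(int) currentTail & mask] = value;
        tail.lazySet(currentTail + 1);
        return true;
    }

    /** Hands every available value to handler and returns how many were consumed. */
    int drain(LongConsumer handler) {
        long currentHead = head.get();
        long available = tail.get() - currentHead; // one read of tail covers the whole batch
        for (long i = 0; i < available; i++) {
            handler.accept(buffer[(int) (currentHead + i) & mask]);
        }
        if (available > 0) {
            head.lazySet(currentHead + available); // one index update per batch
        }
        return (int) available;
    }
}
```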

Page 22: Java Concurrent Optimization: Concurrent Queue

Food for thought

• Multiple consumers

• Multiple producers

Page 23: Java Concurrent Optimization: Concurrent Queue

Tools

• top

• vmstat

• lscpu

• perf

• Valgrind tools suite

• OProfile

• SystemTap

• numactl

• Intel VTune

• Intel PTU

• Intel PCM + ksysguard

• MAT

Page 24: Java Concurrent Optimization: Concurrent Queue

Code

$ git clone https://github.com/coderplay/javaopt.git

$ java -cp bin javaopt.queue.QueuePerfTest n

Page 25: Java Concurrent Optimization: Concurrent Queue

Recommended reading

• What every programmer should know about memory

• Intel® 64 and IA-32 Architectures Software Developer Manuals

• The Art of Multiprocessor Programming

• The JSR-133 Cookbook for Compiler Writers (Java Memory Model)

• My blog: http://coderplay.javaeye.com

Page 26: Java Concurrent Optimization: Concurrent Queue

Q & A
