Ceph Performance on OpenStack - Barcelona Summit
TRANSCRIPT
Ceph Performance on OpenStack (Over 50,000 Benchmarks!)
Open Standard Cloud Association (OSCA)
Takehiro Kudou, Hitachi Solutions, Ltd.
(http://www.slideshare.net/tkkd/)
Takanori Suzuki, Dell Japan Inc.
OpenStack Summit Barcelona 2016 #vBrownBag
OSCA Introduction
Founded in 2012, the Open Standard Cloud Association (OSCA) partners with Japan's leading companies and public organizations to solve technology problems and to accelerate next-generation open-standard cloud technology and its commercial adoption.
Today, OSCA v2.0 expands its scope of activity to IoT solutions.
OSCA Activity
OSCA is always open to everyone
• Proof of Concept, Technical Blog, Whitepaper
• Seminar, Event (Lightning Talk, Networking event for Engineers)
Current working projects
• Performance assessments (Ceph, I/O on Docker, NFVI for vCPE)
• Others (Networking OS comparison on whitebox switches, BigData/IoT Labs)
OpenStack working group Initiatives
• OpenStack on Ceph performance assessment / Design Guide
• OpenStack for NFV solution
• Physical/VM to OpenStack Migration Guide
• Network Design Guide for OpenStack
• Neutron plug-in OVS vs MidoNet comparison
• Swift Step by Step Guide
© Hitachi Solutions, Ltd. 2016. All rights reserved.
Takehiro Kudou
Lead Engineer, Research & Development Department, Hitachi Solutions, Ltd.
Oct. 27, 2016
Ceph Performance on OpenStack (Over 50,000 Benchmarks!) ~Benchmark Result~
Contents
1. Benchmark Method
2. Result and Analysis on Ceph 1.3
3. Result on Ceph 2.0 BlueStore (a tasting of Ceph 2.0 BlueStore)
1. Benchmark Method
1-1 Benchmark Environments
■ Benchmark from 27 instances to 18 OSD disks
Instances on RHEL-OSP7 (Juno) / RHEL-OSP9 (Mitaka)
- 3 OpenStack compute servers
  - 9 instances per compute server (27 instances total)
OSD servers [RHCS 1.3 (Hammer) / RHCS 2.0 (Jewel)]
- 6 Ceph OSD servers (18 OSD disks total)
  - Journal (1.3 only): 320 GB SSD x1 (50 GB x 3 partitions)
  - OSD disks: 600 GB SAS 10k rpm x3 (JBOD)
1-2 Benchmark Procedure
■ fio parameter options
ssh user@<instance IP> fio -rw=[read/write] -size=1G -ioengine=libaio -iodepth=4 -invalidate=1 -direct=1 -name=test.bin -runtime=120 -bs=[BlockSize]k -numjobs=[Jobs] -group_reporting > <file name> & ssh ... (and so on for each instance)
Benchmark duration: 2 minutes per run, about 96 hours in total.
- VMs: 1 | 2 | 3 | 9 | 27 ... (1+2+3+9+27)
- [read/write]: randread | randwrite ... x 2
- [BlockSize]: 4 | 16 | 32 | 64 | 128 ... x 5
- [Jobs]: 1 | 4 | 8 | 16 ... x 4
- 3 loops ... x 3
- OSD servers: 3 | 4 | 5 | 6 ... x 4
- Ceph 1.3 | Ceph 2.0 ... x 2
= 40,320 benchmarks, plus about 10,000 test runs (over 50,000 in total). We encountered trouble along the way!
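The multiplication above can be sketched as a POSIX-shell parameter sweep. This is our illustrative reconstruction, not the original harness: the loop structure and variable names are assumptions, only the parameter values come from the slide, and the real fio/ssh invocation is left as a comment.

```shell
# Tally how many fio runs the parameter sweep produces.
total=0
for vms in 1 2 3 9 27; do            # number of instances running fio in parallel
  for rw in randread randwrite; do   # workload
    for bs in 4 16 32 64 128; do     # block size (KiB)
      for jobs in 1 4 8 16; do       # fio -numjobs
        for loop in 1 2 3; do        # 3 loops
          for osds in 3 4 5 6; do    # OSD-server counts
            for ceph in 1.3 2.0; do  # Ceph versions
              # One fio process is launched on each of the $vms instances, e.g.:
              #   ssh user@<instance IP> fio -rw=$rw -bs=${bs}k -numjobs=$jobs ...
              total=$((total + vms))
            done
          done
        done
      done
    done
  done
done
echo "$total"   # prints 40320, i.e. (1+2+3+9+27) x 2 x 5 x 4 x 3 x 4 x 2
```

At 2 minutes per run this is what accumulates into roughly 96 hours of benchmark time per sweep configuration.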
2. Result and Analysis on Ceph 1.3
2-1 Read Benchmark Result
- No impact when the number of OSD servers changes.
- Total throughput exceeded 20 GB/s (160 Gbps), even though the servers have only 10 Gbps x3 NICs.
- Something (the memory cache?) seems to have influenced the benchmark data.
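One way to test the memory-cache hypothesis (our suggestion, not part of the original benchmark) is to drop the Linux page cache on each OSD server between runs: fio's -direct=1 and -invalidate=1 only bypass caching inside the guest, while reads can still be served from the page cache on the Ceph nodes. A minimal sketch, requiring root on the OSD servers:

```shell
# Drop the page cache, dentries and inodes (root required).
# On a non-root shell this prints a warning instead of failing.
drop_caches() {
  sync                                  # flush dirty pages first
  if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches   # 3 = page cache + dentries + inodes
    echo "caches dropped"
  else
    echo "need root to drop caches"
  fi
}
drop_caches
```

If read throughput falls back toward what the NICs and disks can actually deliver after dropping caches, the anomalous numbers were cache hits.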
2-2 Write Benchmark Result
- Performance increased linearly with the number of OSD servers.
- Performance is not good! The SSD journal cache seems to have no impact on either IOPS or throughput in this case.
(zoomed view of the graph)
2-3 Cause of the Write Slow Problem
Continuous incoming data grows the write queue, write access to the HDDs increases, a sync is eventually forced, and clients must wait until the sync finishes.
OSD disk parameter: --filestore_max_sync_interval 10 (default: 5 seconds)
Performance depends on:
- HDD rpm
- the number of HDDs
If the HDD side cannot keep up, the journal SSD does NOT work effectively.
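The filestore_max_sync_interval parameter mentioned above can be set persistently in ceph.conf. A minimal fragment, with the value from the slide; the [osd] section placement follows standard Ceph convention:

```ini
# ceph.conf fragment: raise the FileStore sync interval
# from its 5-second default to the 10 seconds used in this benchmark,
# giving the journal more room to absorb write bursts.
[osd]
filestore max sync interval = 10
```

It can also be injected at runtime without restarting the OSDs, e.g. `ceph tell osd.* injectargs '--filestore_max_sync_interval 10'`.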
3. Result on Ceph 2.0 BlueStore: a Tasting of Ceph 2.0 BlueStore
3-1 Ceph 2.0 BlueStore Overview
・BlueStore, a new OSD backend
- A Tech Preview feature of Red Hat Ceph Storage 2 (Jewel)
- Direct access to the block device (journal bypass)
・Benchmark environment
- RHEL-OSP9 (Mitaka)
- RHCS 2 (Jewel) with BlueStore
・There are critical bugs
Ref.) http://www.slideshare.net/sageweil1/bluestore-a-new-faster-storage-backend-for-ceph
http://redhatstorage.redhat.com/2016/06/23/the-milestone-of-red-hat-ceph-storage-2/
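As context for the Tech Preview status: in Jewel-era builds, BlueStore was gated behind an explicitly scary experimental flag. The fragment below is a sketch based on the upstream Jewel documentation, not on the exact RHCS 2 configuration used in this benchmark; verify the flag name against your release.

```ini
# ceph.conf fragment (Jewel): BlueStore must be unlocked explicitly,
# which is why the slide warns about critical bugs.
[global]
enable experimental unrecoverable data corrupting features = bluestore rocksdb
```

OSDs were then typically prepared with `ceph-disk prepare --bluestore /dev/sdX`.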
3-2 Trouble 1
■ The fio binary was broken.
[cloud-user@cephbs-15 ~]$ fio
Segmentation fault
・Hex dump comparison
[cloud-user@cephbs-15 ~]$ md5sum /usr/bin/fio
0ff2a797ba777aced3c7979a1309ff6c /usr/bin/fio
[cloud-user@cephbs-16 ~]$ md5sum /usr/bin/fio
4f50ea445bd7a8aaae17abcd323dc3c5 /usr/bin/fio
The data from offset 3000 (hex) onward were LOST!!!!
We hit the bug that causes dirty blobs!!
Ref.) Re: segfault in bluestore during random writes (http://www.spinics.net/lists/ceph-devel/msg31384.html)
os/bluestore: refactor dirty blob tracking along with some related fixes #10215 (https://github.com/ceph/ceph/pull/10215)
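The md5sum comparison above generalizes to a quick corruption check: compare each copy of a file against a known-good reference hash. This is our sketch of the technique, not the original procedure; in the benchmark it would run over ssh against /usr/bin/fio on each instance, while the demo below uses throwaway local files.

```shell
# Flag files whose md5sum differs from a known-good reference hash.
check_md5() {
  ref="$1"; file="$2"
  sum=$(md5sum "$file" | awk '{print $1}')
  if [ "$sum" = "$ref" ]; then
    echo "OK $file"
  else
    echo "BAD $file"    # candidate for the dirty-blob corruption
  fi
}

# Demo: 'good' matches the reference, 'bad' differs by one byte.
printf 'fio' > /tmp/good
printf 'f1o' > /tmp/bad
REF=$(md5sum /tmp/good | awk '{print $1}')
check_md5 "$REF" /tmp/good   # prints: OK /tmp/good
check_md5 "$REF" /tmp/bad    # prints: BAD /tmp/bad
```

Any instance reporting BAD for its fio binary is a candidate victim of the BlueStore corruption bug referenced above.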
3-3 Trouble 2
・Half of the OSDs were BROKEN!!
Summary
・Ceph 1.3
- Read performance: extremely high (hitting the memory cache?)
- Write performance: not so good under heavy I/O; the journal cache is not effective under heavy, long-running write access patterns.
・Ceph 2.0 BlueStore
- Not mature enough as a Tech Preview
- We recommend waiting until it becomes a bit more stable.
Special Thanks
•Hirotada Sasaki, Red Hat K.K.
•Masayoshi Hibino, Dell Japan Inc.
•Kazuho Hirahara, Hitachi Solutions, Ltd.
You can download the Ceph 1.3 benchmark graphs (Japanese document):
http://ja.community.dell.com/techcenter/m/mediagallery/3739/download
Ceph Performance on OpenStack
(Over 50,000 Benchmarks!)
Open Standard Cloud Association (OSCA)
Linux is a trademark of Linus Torvalds. The OpenStack(R) Word Mark and OpenStack Logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation or the OpenStack community. OSCA™ (Open Standard Cloud Association) is a trademark of Dell Japan Inc. PowerEdge, Dell and the Dell logo are trademarks of Dell Inc. RED HAT is a registered trademark of Red Hat, Inc. Other company and product names mentioned in this document may be the trademarks of their respective owners.
This session's article, graphs and drawings are provided for reference purposes ONLY. The data are private results that we measured under a SPECIFIC set of circumstances. We do NOT guarantee the information.
The rights to this session's article, graphs and drawings are reserved by OSCA, Hitachi Solutions, Ltd., Red Hat K.K. and Dell Japan Inc. No reproduction is allowed without prior permission.