

Page 1: SUSE Enterprise Storage · 2017. 3. 15. · SUSE Storage architectural benefits Exabyte scalability No bottlenecks or single points of failure Industry-leading functionality Remote

SUSE Enterprise Storage

Your First Ceph Cluster

Michal JuraSenior Software Engineer

Linux HA/Cloud Developer

[email protected]


2

Agenda

● Ceph overview through an engineer's eyes
● Hardware planning
● Deploying Ceph
● Deploying SUSE OpenStack Cloud and SUSE Storage


Brief intro to SUSE Storage / Ceph


4

SUSE Storage

● SUSE Storage is based upon Ceph
● SUSE Storage 1.0 has already been released
  ● Based upon the Ceph Firefly v0.80 release
● Latest upstream releases: Ceph Giant v0.87 and Ceph Hammer v0.94


5

SUSE Storage architectural benefits

● Exabyte scalability
● No bottlenecks or single points of failure
● Industry-leading functionality
  ● Remote replication, erasure coding
  ● Cache tiering
  ● Unified block, file and object interface
  ● Thin provisioning, copy on write
● 100% software based; can use commodity hardware
● Automated management
  ● Self-managing, self-healing


6

Expected use cases

● Scalable cloud storage
  ● Provides block storage for the cloud
  ● Allows host migration
● Cheap archival storage
  ● Uses erasure coding (similar in spirit to RAID 5/6)
● Scalable object store
  ● This is what Ceph is built upon


7

More exciting things about Ceph

● Tunable for multiple use cases:
  ● for performance
  ● for price
  ● for recovery
● Configurable redundancy:
  ● at the disk level
  ● at the host level
  ● at the rack level
  ● at the room level
  ● ...


8

Let's do it again

● Only two main components:
  ● MON (Cluster Monitor Daemon) maintains cluster state
  ● OSD (Object Storage Daemon) stores the data
● Hash-based data distribution (CRUSH)
  ● (Usually) no need to ask where data is
  ● Simplifies data balancing
● Ceph clients communicate with OSDs directly
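The point of hash-based placement is that any client can compute an object's location instead of asking a metadata server. Real CRUSH also honors failure domains and device weights; the sketch below is only a simplified rendezvous-hashing illustration of the idea (all names are hypothetical, not Ceph APIs):

```python
import hashlib

def place_object(obj_name: str, osds: list, replicas: int = 3) -> list:
    """Toy stand-in for CRUSH: deterministically map an object name to
    `replicas` distinct OSDs by ranking hash(object + osd). Every client
    that knows the OSD list computes the same placement, so no lookup
    table and no central directory are needed."""
    ranked = sorted(
        osds,
        key=lambda osd: hashlib.sha256((obj_name + osd).encode()).hexdigest(),
    )
    return ranked[:replicas]

osds = ["osd.%d" % i for i in range(12)]
targets = place_object("rbd_data.1234", osds)  # same result on every client
```

Because placement is a pure function of the object name and the OSD set, re-balancing after membership changes is also just a recomputation, which is the "simplifies data balancing" point above.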


Hardware planning


10

Questions to answer

• How much net storage, and in what tiers?
• How many IOPS?
  ‒ Aggregated
  ‒ Per VM (average)
  ‒ Per VM (peak)
• What to optimize for?
  ‒ Cost
  ‒ Performance


11

Network

• Choose the fastest network you can afford
• Switches should be low latency with a fully meshed backplane
• Separate public and cluster networks
• The cluster network should typically have twice the public bandwidth
  ‒ Incoming writes are replicated over the cluster network
  ‒ Re-balancing and re-mirroring also utilize the cluster network
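The 2x rule of thumb above falls out of replication fan-out: with 3 replicas, the primary OSD forwards each incoming write to two peers over the cluster network. A minimal sketch of that estimate (a back-of-the-envelope helper, not a Ceph tool):

```python
def cluster_network_bandwidth(client_write_gbps: float, replicas: int = 3) -> float:
    """Steady-state cluster-network traffic from replication: the primary
    OSD forwards each write to (replicas - 1) peers, so cluster traffic is
    roughly (replicas - 1) x the public write bandwidth. Re-balancing and
    recovery add bursts on top of this baseline."""
    return client_write_gbps * (replicas - 1)

cluster_network_bandwidth(10.0)  # 10 Gb/s of client writes -> ~20 Gb/s cluster traffic
```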


12

Networking (public and private)

• Ethernet (1, 10, 40 GbE)
  ‒ Reasonably inexpensive (except for 40 GbE)
  ‒ Can easily be bonded for availability
  ‒ Use jumbo frames
• InfiniBand
  ‒ High bandwidth
  ‒ Low latency
  ‒ Typically more expensive
  ‒ No RDMA support in Ceph yet; use IPoIB instead


13

Storage node

• CPU
  ‒ Number and speed of cores
• Memory
• Storage controller
  ‒ Bandwidth, performance, cache size
• SSDs for the OSD journal
  ‒ SSD-to-HDD ratio
• HDDs
  ‒ Count, capacity, performance


14

Best practices and magic numbers

• Ceph-OSD sizing
  ‒ Disks
    ‒ 8-10 SAS HDDs per 1 x 10G NIC
    ‒ ~12 SATA HDDs per 1 x 10G NIC
    ‒ 1 x SSD for the write journal per 4-6 OSD drives
  ‒ RAM
    ‒ 1 GB of RAM per 1 TB of OSD storage space
  ‒ CPU
    ‒ 0.5 CPU cores / 1 GHz of a core per OSD disk (1-2 CPU cores for SSD drives)
• Ceph-MON sizing
  ‒ 1 ceph-mon node per 15-20 OSD nodes (minimum 3 per cluster)
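The ratios above can be turned into a quick node-sizing helper. This is a sketch applying the slide's rules of thumb at their conservative ends (one NIC per 10 HDDs, one journal SSD per 6 OSDs); the function name and output fields are illustrative, not any SUSE tool:

```python
import math

def size_osd_node(sas_hdds: int, tb_per_hdd: float) -> dict:
    """Apply the rule-of-thumb ratios from the slide (guidelines, not
    hard limits): 8-10 SAS HDDs per 10G NIC, 1 journal SSD per 4-6 OSDs,
    1 GB RAM per TB of OSD storage, 0.5 CPU cores per OSD disk."""
    return {
        "nics_10g": math.ceil(sas_hdds / 10),     # upper bound of 8-10 HDDs/NIC
        "journal_ssds": math.ceil(sas_hdds / 6),  # upper bound of 4-6 OSDs/SSD
        "ram_gb": math.ceil(sas_hdds * tb_per_hdd),
        "cpu_cores": math.ceil(sas_hdds * 0.5),
    }

size_osd_node(20, 1.8)  # the 20 x 1.8 TB OSD node used later in the deck
```

For the 20-disk node it yields 2 x 10G NICs, 4 journal SSDs, 36 GB RAM and 10 cores, which lines up with the reference configuration on the performance-optimized slide.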


15

IOPS calculation: more magic numbers

• For a typical "unpredictable" workload we usually assume:
  ‒ 70/30 read/write IOPS split
  ‒ ~4-8 KB random read pattern
• To estimate Ceph IOPS efficiency we usually take:
  ‒ 4-8 KB random read: 0.88
  ‒ 4-8 KB random write: 0.64

Based on benchmark data and semi-empirical evidence


16

Performance optimized Ceph

• Ceph-MON
  ‒ CPU: 6 cores
  ‒ RAM: 65 GB
  ‒ HDD: 1 x 1 TB SATA
  ‒ NIC: 1 x 1 Gb/s, 1 x 10 Gb/s
• Ceph-OSD
  ‒ CPU: 2 x 6 cores
  ‒ RAM: 64 GB
  ‒ HDD (OS drives): 2 x 500 GB SATA
  ‒ HDD (OSD drives): 20 x 1.8 TB SAS (10k RPM, 2.5 inch)
  ‒ SSD (write journals): 4 x 128 GB SSD
  ‒ NIC: 1 x 1 Gb/s, 4 x 10 Gb/s (2 bonds: 1 for ceph-public and 1 for ceph-replication)
• Computing the size of the Ceph cluster
  ‒ 1000 / (20 * 1.8 TB * 0.85) * 3 ≈ 98 servers to serve 1 petabyte net
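The cluster-size arithmetic above generalizes to: net capacity divided by usable capacity per node, times the replica count. A sketch reproducing the slide's numbers (function name is illustrative):

```python
def servers_for_net_capacity(net_tb: float, hdds_per_node: int = 20,
                             tb_per_hdd: float = 1.8, fill_ratio: float = 0.85,
                             replicas: int = 3) -> float:
    """Server count for a replicated cluster: raw storage must cover
    `replicas` copies of the net capacity, and each node contributes
    hdds * capacity * fill_ratio of usable raw space (the 0.85 fill
    ratio leaves headroom so OSDs never run full)."""
    usable_per_node = hdds_per_node * tb_per_hdd * fill_ratio
    return net_tb / usable_per_node * replicas

servers_for_net_capacity(1000)  # ~98 servers per petabyte net, as on the slide
```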


17

Expected performance level

• 500 VMs in the data center
• Conservative drive rating: 250 IOPS
• Cluster read rating:
  ‒ 250 * 20 * 98 * 0.88 = 431,200 IOPS
  ‒ Approx. 800 IOPS of capacity available per VM
• Cluster write rating:
  ‒ 250 * 20 * 98 * 0.64 = 313,600 IOPS
  ‒ Approx. 600 IOPS of capacity available per VM
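The cluster ratings above are the per-drive rating scaled by drive count and the efficiency factors from the earlier slide. A sketch of that calculation (helper names are illustrative):

```python
def cluster_iops(drive_iops: int, drives_per_node: int, nodes: int,
                 efficiency: float) -> int:
    """Aggregate cluster IOPS = per-drive rating x total drive count
    x the empirical Ceph efficiency factor (0.88 read, 0.64 write)."""
    return round(drive_iops * drives_per_node * nodes * efficiency)

reads = cluster_iops(250, 20, 98, 0.88)   # 431,200 read IOPS
writes = cluster_iops(250, 20, 98, 0.64)  # 313,600 write IOPS
per_vm_reads = reads // 500               # ~862 read IOPS available per VM
```

Dividing by the 500 VMs gives roughly 860 read and 620 write IOPS of headroom per VM, which the slide rounds down to approx. 800 and 600.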


21

Adding more nodes

• Capacity increases

• Total throughput increases

• IOPS increase

• Redundancy increases

• Latency unchanged

• Eventually: network topology limitations

• Temporary impact during re-balancing


22

Adding more disks to a node

• Capacity increases

• Redundancy increases

• Throughput might increase

• IOPS might increase

• Internal node bandwidth is consumed

• Higher CPU and memory load

• Cache contention

• Latency unchanged


23

OSD file system

• btrfs
  ‒ Typically better write throughput performance
  ‒ Higher CPU utilization
  ‒ Feature rich: compression, checksums, copy on write
  ‒ The choice for the future!
• XFS
  ‒ Good all-around choice
  ‒ Very mature for data partitions
  ‒ Typically lower CPU utilization
  ‒ The choice for today!


24

Impact of caches

• Cache on the client side
  ‒ Typically the biggest impact on performance
  ‒ Does not help with write performance
• Server OS cache
  ‒ Low impact: reads have already been cached on the client
  ‒ Still helps with readahead
• Battery-backed caching controller
  ‒ Significant impact for writes


25

Impact of SSD journals

• SSD journals accelerate bursts and random write IO

• For sustained writes that overflow the journal, performance degrades to HDD levels

• SSDs help very little with read performance

• SSDs are very costly
  ‒ ... and consume storage slots, lowering density

• A large battery-backed cache on the storage controller is highly recommended if not using SSD journals


26

Hard disk parameters

• Capacity matters
  ‒ Often, the highest density is not the most cost effective
• On-disk cache matters less
• The reliability advantage of enterprise drives is typically marginal compared to their cost
  ‒ Buy more drives instead
• RPM:
  ‒ Higher RPM increases IOPS and throughput
  ‒ ... but also increases power consumption
  ‒ 15k drives are still quite expensive


27

Impact of redundancy choices

• Replication:
  ‒ n exact, full-size copies
  ‒ Potentially increased read performance due to striping
  ‒ Increased cluster network utilization for writes
  ‒ Rebuilds can leverage multiple sources
  ‒ Significant capacity impact
• Erasure coding:
  ‒ Data split into k parts plus m redundancy codes
  ‒ Better space efficiency
  ‒ Higher CPU overhead
  ‒ Significant CPU and cluster network impact, especially during rebuild
  ‒ Cannot be used directly with block devices (see next slide)
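The space-efficiency trade-off above is easy to quantify: replication keeps 1/n of raw capacity usable, while erasure coding keeps k/(k+m). A minimal sketch comparing the two:

```python
def usable_fraction_replication(n: int) -> float:
    """n full copies of the data: only 1/n of raw capacity is usable."""
    return 1.0 / n

def usable_fraction_ec(k: int, m: int) -> float:
    """k data chunks + m redundancy chunks: k/(k+m) of raw capacity is
    usable, and any m chunks may be lost without losing data."""
    return k / (k + m)

usable_fraction_replication(3)  # ~0.33 usable with 3x replication
usable_fraction_ec(4, 2)        # ~0.67 usable with a 4+2 profile
```

A 4+2 profile tolerates two failures, like 3x replication, at twice the usable capacity; the price is the CPU and rebuild overhead listed above.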


28

Cache tiering

• Multi-tier storage architecture:
  ‒ One pool acts as a transparent write-back overlay for another
  ‒ e.g., a 3-way replicated SSD pool over an erasure-coded HDD pool
  ‒ Can flush based on relative or absolute dirty levels, or on age
  ‒ Adds configuration complexity and requires workload-specific tuning
  ‒ Also available: read-only mode (no write acceleration)
  ‒ Some downsides (e.g., no snapshots)
• A good way to combine the advantages of replication and erasure coding


29

Federated gateways


30

SUSE Enterprise Storage 2.0

• The new and faster civetweb integration for RADOS Gateway

• iSCSI Ceph Gateway

• Network-based key server for encryption of OSD disks and journals

• This release is based on the stable Ceph 0.94 Hammer release


Deploy SUSE Enterprise Storage


32

Deploying SUSE Enterprise Storage

• Ceph-deploy method


33

Deploying SUSE Enterprise Storage

• Crowbar provisioning and orchestration system


34

About Ceph layout

● Ceph needs 1 or more MON nodes
  ● In production, 3 nodes are the minimum
● Ceph needs 3 or more OSD nodes
  ● Can be fewer in testing
● Each OSD should manage a minimum of 15 GB
  ● Smaller is possible


Corporate Headquarters
Maxfeldstrasse 5
90409 Nuremberg
Germany

+49 911 740 53 0 (Worldwide)
www.suse.com

Join us on:www.opensuse.org

35


Unpublished Work of SUSE LLC. All Rights Reserved.
This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE LLC. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.

General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.