Ceph Object Storage at Spreadshirt
June 2015
Jens Hadlich, Chief Architect
About Spreadshirt
Spread it with Spreadshirt
A global e-commerce platform for everyone to create, sell and buy ideas on clothing and accessories across many points of sale.
• 12 languages, 11 currencies
• 19 markets
• 150+ shipping regions
• Community of >70,000 active sellers
• €72M revenue (2014)
• >3.3M items shipped (2014)
Object Storage at Spreadshirt
• What?
  – Store and read primarily user-generated content, mostly images
• Typical sizes: a few dozen KB to a few MB
• Some tens of terabytes (TB) of data
• Read > Write
• "Never change a running system"?
  – The current solution from the early days (big storage + lots of files / directories) doesn't work anymore
    • Regular UNIX tools become unusable in practice
    • Not designed for "the cloud" (e.g. replication is an issue)
  – Growing number of users → more content
  – Goal: build a truly global platform (multiple regions and data centers)
Ceph
• Why Ceph?
  – Vendor independent
  – Open source
  – Runs on commodity hardware
  – Local installation for minimal latency
  – Existing knowledge and experience
  – S3 API (see the sketch below)
    • Simple bucket-to-bucket replication
  – A good fit also for < 1 PB
  – Easy to add more storage
  – (Can be used later for block storage)
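Because RadosGW implements the S3 API, standard S3 clients work against it unchanged. A minimal sketch using boto; the endpoint, credentials, bucket and object names are placeholders, not Spreadshirt's actual setup:

    import boto
    import boto.s3.connection

    # Hypothetical endpoint and credentials; RadosGW serves the S3 API over HTTP.
    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
        host='rgw.example.local',
        is_secure=False,  # plain HTTP inside the LAN
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

    bucket = conn.create_bucket('user-images')      # hypothetical bucket
    key = bucket.new_key('designs/12345.png')       # hypothetical object name
    key.set_contents_from_filename('12345.png')     # write
    key.get_contents_to_filename('12345-copy.png')  # read back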
Ceph Object Storage Architecture
Overview
[Diagram: a client speaks HTTP (S3 or Swift API) to the Ceph Object Gateway. Behind it sits RADOS (reliable autonomic distributed object store): a lot of nodes and disks, i.e. many OSDs plus monitors, connected by a public network and a separate cluster network.]
Ceph Object Storage Architecture
A little more detailed
[Diagram: the client speaks HTTP (S3 or Swift API) to RadosGW (the Ceph Object Gateway), which talks to RADOS via librados. An odd number of monitors forms the quorum. Each OSD node combines some SSDs (for journals) with more HDDs as JBOD (no RAID). Public network: 1G; cluster network: 10G (the more the better).]
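RadosGW is itself just a librados client, and the same RADOS cluster can be addressed directly. A minimal sketch using the python-rados bindings; the config path and pool name are placeholders:

    import rados

    # Hypothetical config path and pool name; connects with the default client identity.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    ioctx = cluster.open_ioctx('images')      # hypothetical pool
    ioctx.write_full('hello', 'hello world')  # store a RADOS object
    print(ioctx.read('hello'))                # read it back

    ioctx.close()
    cluster.shutdown()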
Ceph Object Storage Architecture
Initial Setup (planned)
[Diagram: clients reach the cluster over HTTP (S3 or Swift API) via HAProxy; a RadosGW runs on each node, and 3 of the nodes also run a monitor. Cluster nodes: 3 x SSD (journal / index) + 9 x HDD (data) each. Public network: 2 x 1G, IPv4. Cluster network (OSD replication): 2 x 10G, IPv6.]
Ceph Object Storage Performance
Some smoke tests
• How fast is RadosGW? Get an impression.
  – Response times (read / write)
    • Average?
    • Percentiles (P99)?
  – Compared to AWS S3?
• A very minimalistic test setup
  – 3 VMs (KVM), all running RadosGW, monitor and OSD
    • 2 cores, 4 GB RAM, 1 OSD each (15 GB + 5 GB), 10G network between nodes, HAProxy (round-robin), LAN, HTTP
  – No further optimizations (a sketch of such a test follows below)
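Not the actual tool behind the numbers on the next slide, just a hedged sketch of such a smoke test: random reads of small objects through the S3 endpoint from a fixed number of parallel threads, reporting average and P99 response times. It reuses the boto bucket from the earlier example and assumes objects named 0..9999 were uploaded beforehand:

    import random
    import time
    from concurrent.futures import ThreadPoolExecutor

    from boto.s3.key import Key

    def timed_read(bucket):
        # Fetch one random small object; return the response time in milliseconds.
        key = Key(bucket, str(random.randint(0, 9999)))
        start = time.time()
        key.get_contents_as_string()
        return (time.time() - start) * 1000.0

    def smoke_test(bucket, requests=10000, threads=16):
        # Issue `requests` random reads from `threads` parallel workers.
        with ThreadPoolExecutor(max_workers=threads) as pool:
            latencies = sorted(pool.map(lambda _: timed_read(bucket), range(requests)))
        avg = sum(latencies) / len(latencies)
        p99 = latencies[int(len(latencies) * 0.99)]
        print('avg %.1f ms, P99 %.1f ms over %d requests' % (avg, p99, requests))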
Ceph Object Storage Performance
Some smoke tests
• How fast is RadosGW?
  – Random read and write
  – Object size: 4 KB
• Results: pretty promising!
  – E.g. 16 parallel threads, read:
    • Avg: 9 ms
    • P99: 49 ms
    • >1,300 requests/s
Ceph Object Storage Performance
Some smoke tests
• Compared to Amazon S3?
  – Comparing apples and oranges (unfair, but interesting)
    • HTTP vs. HTTPS, LAN vs. WAN etc.
• Response times
  – Random read, object size: 4 KB, 4 parallel threads, client location: Leipzig

                  Ceph S3    AWS S3 (eu-central-1)   AWS S3 (eu-west-1)
    Location      Leipzig    Frankfurt               Ireland
    Avg           6 ms       25 ms                   56 ms
    P99           47 ms      128 ms                  374 ms
    Requests/s    405        143                     62
Global Availability
• 1 Ceph cluster per data center
• S3 bucket-to-bucket replication (sketched below)
• Multiple regions, local delivery
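The bucket-to-bucket replication itself is not spelled out in these slides; a hedged sketch of the idea, using boto buckets as in the earlier connection example, is a one-way sync that copies new or changed objects between clusters:

    def replicate(src_bucket, dst_bucket):
        # Naive one-way sync between buckets on two S3-compatible clusters.
        for src_key in src_bucket.list():
            dst_key = dst_bucket.get_key(src_key.name)
            # For simple (non-multipart) uploads the ETag is the content MD5,
            # so a matching ETag means the object is already in sync.
            if dst_key is None or dst_key.etag != src_key.etag:
                data = src_key.get_contents_as_string()
                dst_bucket.new_key(src_key.name).set_contents_from_string(data)

In practice a replicator would work incrementally (e.g. from a change log) rather than re-listing the whole bucket on every run.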
To be continued ...
Thank You! [email protected]