A simple object storage system for web applications - USENIX · 2019-12-18
A simple object storage system for web applications
December 12, 2012
Motivation
Most web content is static and shared
Traditional NAS systems inefficient and costly for content distribution
Every interface to content is unique per application
Background circa 2006
Google file system
Cluster file systems
Gluster
Lustre
IBrix
Scalable NAS
Isilon
Onstor
Parallel file systems
pNFS
Oceanstore
First attempt – IBrix
Commodity hardware
Scalable metadata
Scalable cluster
Good resilience
Problems
Hierarchical metadata
Weak metadata replication
Client software required
Client and server version mismatches
Second attempt – Object store
Purpose built
Commodity hardware
Open source software components
Linux
Tomcat
Java
MySQL
Simple external API
Manageability prioritized
Requirements
Shared nothing components
Scalable metadata
Separate metadata and data system components
Asymmetric components allowed
Multi-site capable
RESTful external API
POST
GET
DELETE
Requirements (continued)
Multi-tenant
Strong data protection:
Availability
Durability
Background checking and recovery
External security but internal access control
Extended object metadata
Modular
Performance monitoring – external system
Hardware monitoring – internal and external together
Implementation
[Figure: architecture diagram. Components: user/application clients, HSS load balancer VIP, HSS storage nodes, HSS RW / RO / Admin MySQL load balancer VIPs, admin console. Labeled links: HTTP requests and returns, MySQL replication, admin tasks.]
Write example
POST request to VIP from client
Load balancer selects storage server
Calculate OID
Write file locally
Update DB with new OID and server owner
Create second replica copy
Update DB with OID and second server owner
Return OID to client
Set replication flag in DB to create third replica
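The write steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the presenters' implementation: `StorageNode`, `handle_post`, the dict standing in for the DB, and the list standing in for the replication flag are all hypothetical names, and the slides do not say how the OID is calculated or how the second server is chosen, so a random UUID and "first peer" are assumptions.

```python
import uuid


class StorageNode:
    """Hypothetical stand-in for one storage node; a dict plays the local disk."""

    def __init__(self, name):
        self.name = name
        self.files = {}

    def write(self, oid, data):
        self.files[oid] = data


def handle_post(data, local, peers, db, replication_queue):
    """Run the write steps on the node the load balancer selected."""
    # Calculate OID (assumption: random UUID; the slides do not give the method).
    oid = uuid.uuid4().hex
    local.write(oid, data)                       # write file locally
    db.setdefault(oid, []).append(local.name)    # DB: new OID and server owner
    second = peers[0]                            # assumption: placement policy unknown
    second.write(oid, data)                      # create second replica copy
    db[oid].append(second.name)                  # DB: OID and second server owner
    # The slide sets the flag after responding; in this sketch it is set just
    # before returning. The third replica is created later, out of band.
    replication_queue.append(oid)
    return oid                                   # OID goes back to the client
```

In this sketch the client sees the OID only once two copies exist, and the third copy is deferred, which matches the order of the steps listed on the slide.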
Read example
GET request to VIP from client
Load balancer selects storage server
Storage server checks local cache for OID
Cache miss causes OID lookup in DB
DB returns location of all replicas
Storage server retrieves one of the replicas
Storage server returns the file to the requestor
If the file is above the redirect threshold send 302 redirect
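A rough sketch of the read steps, under stated assumptions: the redirect threshold is taken to be a file size, the 302 is taken to point at a node that holds a replica, and the threshold value, URL layout, and replica choice (first in the list) are invented for illustration. `handle_get` and its arguments are hypothetical names.

```python
REDIRECT_THRESHOLD = 1024 * 1024  # bytes; assumed value, not given in the slides


def handle_get(oid, cache, db, nodes, self_name):
    """Return (status, body_or_location) for a GET served by node `self_name`.

    cache: dict oid -> bytes held locally
    db:    dict oid -> list of node names holding a replica
    nodes: dict node name -> dict oid -> bytes (stand-in for remote disks)
    """
    if oid in cache:                      # local cache hit
        return 200, cache[oid]
    owners = db.get(oid)                  # cache miss: OID lookup in DB
    if not owners:
        return 404, None                  # file not in DB
    owner = owners[0]                     # assumption: pick the first replica
    data = nodes[owner][oid]
    if owner != self_name and len(data) > REDIRECT_THRESHOLD:
        # Large file held elsewhere: redirect instead of proxying the bytes.
        return 302, f"http://{owner}/{oid}"   # hypothetical URL layout
    return 200, data
```

Redirecting large files would keep the selected server from relaying the whole payload, which is one plausible reason for the threshold.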
Common failures
DB unavailable for write – 502 server error
Write failure of initial file – 500 server error
Write failure of second replica – retry
File not in DB – 404 not found
File retrieved corrupt or unavailable
Use different replica
Schedule replication to proper number of required replicas
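The last failure case (use a different replica, then schedule re-replication) might look like the sketch below. The slides do not say how corruption is detected, so the SHA-256 checksum comparison is an assumption, as are the names `read_with_fallback`, `checksums`, and `repair_queue`.

```python
import hashlib


def read_with_fallback(oid, owners, nodes, checksums, repair_queue):
    """Try each replica in turn, skipping copies that are missing or corrupt.

    Returns the file bytes, or None if no replica is usable. Any bad replicas
    found along the way are queued so the replica count can be restored.
    """
    bad = []
    result = None
    for owner in owners:
        data = nodes.get(owner, {}).get(oid)
        # Assumption: a stored SHA-256 digest is how corruption is detected.
        if data is not None and hashlib.sha256(data).hexdigest() == checksums[oid]:
            result = data
            break
        bad.append(owner)                 # unavailable or corrupt: try the next one
    if bad:
        repair_queue.append((oid, bad))   # schedule replication back to full count
    return result
```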
Features
Automatic file expiration configurable by application
OID can be specified for application flexibility
Frequently accessed files are cached on all servers
Usage accounting
Some statistics
99.5% of all requests take less than 100ms
99.9% of all requests take less than 500ms
Over 200M requests in a single day
Over 400M objects managed
165TB of objects served per month
40+ applications storing files
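For a sense of scale, the daily and monthly figures above can be turned into average rates. This is back-of-the-envelope arithmetic only, assuming decimal terabytes and a 30-day month, and using the "over" figures as lower bounds; it says nothing about peak load.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# "Over 200M requests in a single day", averaged over that day.
requests_per_day = 200_000_000
avg_requests_per_second = requests_per_day / SECONDS_PER_DAY

# "165TB of objects served per month" (assumed: 1 TB = 10**12 bytes, 30 days).
bytes_per_month = 165 * 10**12
avg_bytes_per_second = bytes_per_month / (30 * SECONDS_PER_DAY)

print(f"{avg_requests_per_second:.0f} requests/s")       # about 2315 requests/s
print(f"{avg_bytes_per_second / 10**6:.1f} MB/s")        # about 63.7 MB/s
```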
Future enhancements
Containers for objects – improve performance and reliability
Better geographic awareness – location affinity and latency improvements
Storage tiers – better resource allocation and performance
Improved modularity – different storage and metadata backends
Demo
Store a file through basic web UI
See where it is stored
Retrieve the copies
Delete the file
Fail to retrieve the deleted file
Look at some of the admin UI
Questions?