Yosuke Hara Nov 14th, 2012
On the Road
1Wednesday, November 14, 12
Table of contents
1. Dive into the Web
2. The Seed and the Sower
3. To the Growth
4. To the Bloom (the Future)
1. Dive into the Web
The Internet of Things: http://www.isgtw.org/visualization/internet-things
As of 2011
60 SECONDS - THINGS THAT HAPPEN ON INTERNET EVERY SIXTY SECONDS: http://www.go-gulf.com/blog/60-seconds
X3.0
Amazon S3 - 905 Billion Objects and 650,000 Requests/Second: http://bit.ly/HqAlYV
650,000 req/sec
2. The Seed and the Sower
Awareness
The Seed and the Sower
Set up a hypothesis
Awareness - “Why?”
Survey
Unclear / Convinced
1. Low ROI
2. Possibility of a SPOF (single point of failure)
3. Storage expansion is difficult while data keeps growing
?
The Seed and the Sower
Will face the situation
The Seed and the Sower
Need to store and manage a huge number of files at low cost.
The Seed and the Sower
3. To the Growth
Architecture
Architecture
法隆寺: http://bit.ly/UnzytR
Hōryū-ji: http://bit.ly/fbfOrF
“法隆寺 - Hōryū-ji”
Architecture is “POWER”
For whom?
For what?
How?
Find!
Architecture
Availability: fault tolerance, automatic recovery, split-brain measures, I/O strategy, structure of the system layers
Scalability: elastic storage cluster, nodes able to join and leave
Performance: cache mechanism, I/O strategy, structure of the system layers
Need to # of servers
Administration: manage and monitor nodes in the cluster
Use Case: for Web applications, for the Cloud
Architecture
To Realize
HIGH Availability (Reliability)
HIGH Cost Performance
HIGH Scalability
Architecture
[Diagram: LeoFS system overview]
Request from Web Application(s) or Browser -> Load Balancer -> LeoFS-Gateway: S3-API, REST over HTTP (80/443)
LeoFS-Gateway <-> LeoFS-Storage: RPC (4369)
LeoFS-Storage (three nodes shown): Storage Engine/Router, META, Object Store
LeoFS-Manager: RPC (4369), Monitor (SNMP) (4000,4010,4020), GUI Console (10020, 10021)
Gateway (Stateless Proxy): HTTP Request/Response Handling, with Object Cache
Manager: System Management, Ring Monitor, Node State Monitor
Storage: Object Storage, Metadata Storage + Replicator/Recoverer, Queue
Estimation / Review
Design
Simulation
Architecture
Basic Detail
Architecture
Gateway: Stateless Proxy for HTTP, with Object Cache
Storage Cluster
Manager Cluster: State/Process Monitor
Links between Gateway, Storage Cluster, and Manager Cluster: Erlang RPC
LeoFS Architecture - Gateway / Storage
Gateway: REST over HTTP (S3-API); get, put, delete, head; redundant-manager; membership (fault-detection); RPC
Storage: RPC; membership (fault-detection); redundant-manager; replicator; queue; read-repairer
Storage Engine: Object Storage, Metadata
LeoFS Architecture - LeoFS Gateway
From Applications: S3-API
Gateway: “Stateless Proxy”, “Cowboy”
Object Cache [ LRU, Slab allocator, Skip graph ]
Consistent Hashing: horizontal distribution across Storage Nodes
Replicate when using RPC
Cowboy: Erlang light-weight HTTP server - http://www.ninenines.eu/
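The gateway slide names consistent hashing as the way objects are spread horizontally across storage nodes. The following is a minimal Python sketch of that general technique, not LeoFS's actual Erlang implementation. The class name `HashRing`, the node names, the MD5 hash, and the count of 64 virtual nodes per server are all assumptions chosen for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes      # virtual nodes per physical node (assumed value)
        self._ring = []           # sorted list of (hash, node) pairs
        for node in nodes:
            self.join(node)

    @staticmethod
    def _hash(value):
        # MD5 is used only as a well-spread hash here, not for security.
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")

    def join(self, node):
        """Add a node by placing its virtual nodes on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def leave(self, node):
        """Remove a node; only keys it owned move to other nodes."""
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key, replicas=1):
        """Return up to `replicas` distinct nodes, walking clockwise from the key."""
        if not self._ring:
            return []
        start = bisect.bisect_right(self._ring, (self._hash(key), ""))
        found = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == replicas:
                    break
        return found


ring = HashRing(["storage_0", "storage_1", "storage_2"])
owners = ring.lookup("bucket/photo.jpg", replicas=2)
```

Because each key maps to the first virtual node clockwise from its hash, a node joining or leaving only moves the keys adjacent to its own virtual nodes, which is what makes the "able to join/leave nodes" goal cheap.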
LeoFS Architecture - LeoFS Storage Engine
Request From Gateway
LeoFS Storage: replicator, repairer, queue, ...
leo-object-storage
Metadata: Keeps an in-memory index of all data.
Object Storage: Log-structured (append-only) object store.
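The storage engine slide pairs a log-structured (append-only) object store with an in-memory index of all data. A minimal Python sketch of that pairing follows. It is a toy model under stated assumptions: the class `AppendOnlyStore`, the use of an in-memory `io.BytesIO` as the container, and the CRC32 checksum are hypothetical choices, not the leo-object-storage API.

```python
import io
import zlib


class AppendOnlyStore:
    """Toy log-structured store: data is only appended, an index finds it."""

    def __init__(self, container=None):
        # Any seekable binary file object works; BytesIO keeps the sketch self-contained.
        self._container = container if container is not None else io.BytesIO()
        self._index = {}  # key -> (offset, size, checksum), held in memory

    def put(self, key, data):
        """Append the bytes and point the index entry at the new copy."""
        offset = self._container.seek(0, io.SEEK_END)
        self._container.write(data)
        self._index[key] = (offset, len(data), zlib.crc32(data))

    def get(self, key):
        """Read one object with a single seek, verifying its checksum."""
        offset, size, checksum = self._index[key]
        self._container.seek(offset)
        data = self._container.read(size)
        if zlib.crc32(data) != checksum:
            raise IOError(f"checksum mismatch for {key!r}")
        return data

    def delete(self, key):
        """Drop the index entry; the bytes stay in the log until compaction."""
        self._index.pop(key, None)

    def container_size(self):
        return self._container.seek(0, io.SEEK_END)


store = AppendOnlyStore()
store.put("photo.jpg", b"v1")
store.put("photo.jpg", b"version-2")  # overwrite appends, it never rewrites in place
latest = store.get("photo.jpg")
```

Overwrites and deletes leave stale bytes behind in the container, which is why an append-only design needs a compaction step to reclaim space.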
LeoFS Architecture - LeoFS Storage Engine - Data Structure
<Metadata>
{VNodeId, Key} | KeySize | CustomMeta Size | File Size | Offset | Version | Time-stamp | Checksum
Annotations: "for retrieving a file (object)", "for sync"
<Object>
Header (Metadata - Fixed length): Checksum | KeySize | DataSize | User-MetaSize | Offset | Version | Time-stamp
Body (Variable Length): {VNodeId, Key} | Actual File | User-Meta
Footer (8B)
<Object Container>
Super-block | Object-1 | Object-2 | Object-3 | Object-4 | Object-5
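The object layout on this slide (a fixed-length header, a variable-length body, and an 8-byte footer) can be sketched with Python's `struct` module. Only the field names and the 8-byte footer come from the slide. The field widths, the big-endian byte order, the all-zero footer contents, the CRC32 checksum, and the function names `pack_object` and `unpack_object` are assumptions made for illustration, and the key is stored as plain bytes rather than a {VNodeId, Key} pair.

```python
import struct
import zlib

# Assumed widths: checksum(4) key_size(2) data_size(4) user_meta_size(2)
# offset(8) version(4) timestamp(8) = 32 bytes, big-endian, no padding.
HEADER = struct.Struct(">IHIHQIQ")
FOOTER = b"\x00" * 8  # the slide gives only the footer size (8B); contents assumed


def pack_object(key, data, user_meta=b"", offset=0, version=0, timestamp=0):
    """Serialize one object as header + key + data + user_meta + footer."""
    header = HEADER.pack(zlib.crc32(data), len(key), len(data),
                         len(user_meta), offset, version, timestamp)
    return header + key + data + user_meta + FOOTER


def unpack_object(buf, start=0):
    """Parse one object at `start`; return (record, offset of the next object)."""
    (checksum, key_size, data_size, meta_size,
     offset, version, timestamp) = HEADER.unpack_from(buf, start)
    pos = start + HEADER.size
    key = buf[pos:pos + key_size]
    pos += key_size
    data = buf[pos:pos + data_size]
    pos += data_size
    user_meta = buf[pos:pos + meta_size]
    pos += meta_size
    if buf[pos:pos + len(FOOTER)] != FOOTER:
        raise ValueError("footer not found where the header said it would be")
    if zlib.crc32(data) != checksum:
        raise ValueError("checksum mismatch")
    record = {"key": key, "data": data, "user_meta": user_meta,
              "offset": offset, "version": version, "timestamp": timestamp}
    return record, pos + len(FOOTER)


# Two objects back to back, as in an object container (super-block omitted).
first = pack_object(b"a.jpg", b"JPEGDATA", user_meta=b"type=photo")
second = pack_object(b"b.txt", b"hello", offset=len(first), version=1)
container = first + second
record, next_offset = unpack_object(container)
```

Because the header is fixed length and carries every size, a reader can walk the container object by object without any external index, which is useful for rebuilding metadata or for compaction.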
"Less is more" and "God is in the details"
Ludwig Mies van der Rohe: http://en.wikipedia.org/wiki/Ludwig_Mies_van_der_Rohe
Strategy
“Gradual Development”
Estimation / Review
Implement / Fix / Improvement
Benchmark / TEST
Strategy
Incubation Production
Strategy
“SCRUM”
Scrum (development): http://en.wikipedia.org/wiki/Scrum_(development)
Plan
Media Platform
Application / Log Collector
Search / Analysis
PaaS / IaaS
Plan
“Final image”
DATA-HUB
Estimation
Implement / Fix / Improvement
Benchmark / TEST
Plan
Prototype1,2 Phase1 Phase2
Plan
FROM “Photo Storage” TO “Cloud Storage”
Timeline: Feb 2010, June 2011, July 2012, Nov 2012
Started R&D
ARIA (Photo Storage) released
LeoFS (Cloud Storage System) released
Core Functions
a. Object Cache
b. Compaction
c. Rebalance
d. S3 Compatibility
e. Large-object Support
f. Multi-tenant
g. Multi-layer cache
5MB 100MB a few GB
From Photo Storage To Cloud Storage
1st step as Cloud Storage
Specialize in “Photo”
2011 - Phase1
5MB 100MB a few GB
From Photo Storage To Cloud Storage
Aim to be the “DATA-HUB” in the Cloud
Handle various unstructured data
2012 - Phase2
S3FS-C
Goal
4. To the Bloom
Fusion & Improvement
To the Bloom
Amazon Dynamo: Query Model, ACID
Facebook Haystack: Storage Engine
SEDA: An Architecture for Well-Conditioned, Scalable Internet Services
Hybrid = P2P + Manager: Storage Cluster / Manager, split-brain measures
Amazon S3 API: Gateway, Manager
SEDA: http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf
Amazon Dynamo: http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf
Facebook Haystack: http://www.facebook.com/note.php?note_id=76191543919
“Connecting the dots”
Richard St. John's 8 secrets of success
Richard St. John's 8 secrets of success: http://www.ted.com/talks/richard_st_john_s_8_secrets_of_success.html
Wrap Up
Wrap Up
1. Awareness - Why? Why? Why?
2. Architecture is “POWER”
3. Gradual Development
4. Fusion and Improvement
5. 8 secrets of success
Thank you for your time
LeoFS - http://www.leofs.org