2015 Storage Developer Conference. © Huawei Technologies Co. All Rights Reserved.
Implement Object Storage with SMR based Key-Value Store
[email protected] [email protected]
Huawei Technologies Co.
Agenda
• Object Storage Market Overview
• Object Storage Design with SMR based Key-Value Store
• Summary and Future Works
Massive Data Storage Trend
• Big Data
• Cloud
• BYOD
• Media & Entertainment
• Virtualization
• SDS
2020 total capacity: 40 ZB (Gartner)
A Revolution That Will Transform How We Live, Work, and Think ... (Kenneth Cukier)
Components and Characteristics of Massive Data
Data Components
• Video
• Music
• Picture
• Data file
Characteristics
• Seldom updated
• Undefined value
• Large capacity and high growth speed
• Long storage time
[Chart: 25% vs. 75% (unstructured data)]
Object Storage Technology matches these requirements
SMR matches Object Storage Market
Object Storage Requirement
• Huge data volume needs large-capacity drives.
• Competitive TCO needs cheap storage media.
• Write-once, rarely modified data matches SMR's write-out-of-place feature.
SMR Technology Background
• Type: Drive Managed, Host Aware, Host Managed
• Standard: ZBC, ZAC
• Industry: all HDD vendors will release SMR drives in 2015~2016
Huawei cooperation with HDD vendors on SMR:
• http://events.linuxfoundation.org/sites/events/files/slides/SMR%20in%20Linux%20Systems%20-%20Vault.pdf
• http://www.hgst.com/company/media-room/press-releases/HGST-Delivers-Worlds-First-10TB-Enterprise-HDD-for-Active-Archive-Applications
Agenda
• Object Storage Market Overview
• Object Storage Design with SMR based Key-Value Store
• Summary and Future Works
Huawei Object Storage Architecture
• Standalone Key-Value Store: provides simple KV access on HDD/SSD
• Distributed Object Pools: provide redundant KV access, such as replication and erasure coding
• Services: provide S3/Swift-like access
[Diagram labels: Infrastructure; Standalone Key-Value Store; Distributed Object Pools; Services (S3/Swift); Protocols; Cluster Control; Cloud Management]
Why Key-Value (KV) based, not Logical Block Address (LBA)
[Diagram labels: Standalone Drive Layer; Distribution Layer; metadata; Complicated (Huge Metadata); Simple (LBA); Complicated (KV); Simple Name Policy; get() / put()]
Huawei Key Value Store(KVS) Data Model
[Diagram: an SSD/HDD KVS holds multiple containers; each container holds multiple objects]
• One KVS has multiple containers (pools).
• Every container has its own policy, such as key size, value size, shared or reserved allocation, delete policy, etc.
• Objects are accessed through the KV API, and an object can store metadata.
Key Value Store(KVS) API
Pool (Container) operations:
• Create(name, config_file, pl_id)
• Destroy(name)
• Open(name, mode, pl_id)
• Close(pl_id)
• Set_prop(pl_id, prop, value)
• Get_prop(pl_id, prop, value)
• Get_stats(pl_id, stats)
• Xcopy(src, dest, flag, regex, regex_len)
• …
Object operations:
• put(pl_id, key, value, kv_props, put_opts)
• get(pl_id, key, value, kv_props, get_opts)
• del(pl_id, key, value, kv_props, del_opts)
• Iter_open(pl_id, flag, regex, regex_len, limit, *iter_id)
• Iter_next(pl_id, iter_id, kvarray)
• Iter_close(pl_id, iter_id)
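To make the shape of this API concrete, here is a minimal in-memory sketch in Python. It is not Huawei's implementation: the class name `KVStore`, the return values, and the simplified argument lists (no config file, modes, or option structs) are assumptions made for illustration only. It mirrors the pool open/close calls, put/get/del, and the regex-filtered iterator.

```python
import re


class KVStore:
    """Toy in-memory stand-in for the pool/object API (illustrative only)."""

    def __init__(self):
        self._pools = {}    # pool name -> {key: (value, kv_props)}
        self._open = {}     # pl_id -> pool name
        self._iters = {}    # iter_id -> remaining (key, value) pairs
        self._next_id = 1

    def _new_id(self):
        new_id = self._next_id
        self._next_id += 1
        return new_id

    def create(self, name):
        if name in self._pools:
            raise KeyError(f"pool {name!r} already exists")
        self._pools[name] = {}

    def open(self, name):
        if name not in self._pools:
            raise KeyError(f"no such pool {name!r}")
        pl_id = self._new_id()
        self._open[pl_id] = name
        return pl_id

    def close(self, pl_id):
        del self._open[pl_id]

    def _pool(self, pl_id):
        return self._pools[self._open[pl_id]]

    def put(self, pl_id, key, value, kv_props=None):
        self._pool(pl_id)[key] = (value, dict(kv_props or {}))

    def get(self, pl_id, key):
        return self._pool(pl_id)[key]

    def delete(self, pl_id, key):
        del self._pool(pl_id)[key]

    def iter_open(self, pl_id, regex, limit=None):
        # Snapshot the matching keys in sorted order at open time.
        pattern = re.compile(regex)
        matches = [(k, v) for k, (v, _) in sorted(self._pool(pl_id).items())
                   if pattern.search(k)]
        if limit is not None:
            matches = matches[:limit]
        iter_id = self._new_id()
        self._iters[iter_id] = matches
        return iter_id

    def iter_next(self, iter_id, count=1):
        remaining = self._iters[iter_id]
        batch, self._iters[iter_id] = remaining[:count], remaining[count:]
        return batch

    def iter_close(self, iter_id):
        del self._iters[iter_id]
```

A caller would create and open a pool, then use the returned `pl_id` for every object operation, in the same way the slide's calls all take `pl_id` as their first argument.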
KVS core --- LDB (Log structured DB) Modules
• OM: Operation & Maintenance.
• Layout & Scheduler: provide SMR write-out-of-place allocation and the IO stack.
[Diagram modules: Object Semantic Layer; KV Record Manager; Key Index & Cache; KV Space Management (Layout); IO Scheduler; KV Background Task; OM]
SMR IO stack
[Diagram: Application (LDB) sits on top of the SMR lib (smr_read(), smr_write(), …; ioctl(*, SG_IO, *); sense code parser) and libAIO (Async IO; aio_read(), aio_write() ...), above SD, the SCSI middle level driver (including libata), and the SCSI low level driver]
• AIO is used for parallel access.
• The SMR lib issues the new SMR commands from user space.
• AIO only returns EIO on error; the SMR lib can get the sense code and parse the detailed error.
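As an illustration of what a sense code parser has to do, the sketch below extracts the sense key, ASC, and ASCQ from a raw sense buffer of the kind an SG_IO request returns. The byte positions follow the standard SCSI fixed (70h/71h) and descriptor (72h/73h) sense formats. The function names and the one-entry name table are assumptions for this example, not the SMR lib's real interface; the single entry is the ZBC "UNALIGNED WRITE COMMAND" code (ILLEGAL REQUEST, 21h/04h).

```python
# (sense key, ASC, ASCQ) -> name. Deliberately tiny; a real parser would
# carry the full table from the SCSI and ZBC standards.
KNOWN_ERRORS = {
    (0x05, 0x21, 0x04): "UNALIGNED WRITE COMMAND",
}


def parse_sense(sense: bytes):
    """Return (sense_key, asc, ascq) from raw SCSI sense data."""
    if not sense:
        raise ValueError("empty sense buffer")
    response_code = sense[0] & 0x7F
    if response_code in (0x70, 0x71):      # fixed format
        if len(sense) < 14:
            raise ValueError("fixed-format sense data too short")
        return sense[2] & 0x0F, sense[12], sense[13]
    if response_code in (0x72, 0x73):      # descriptor format
        if len(sense) < 4:
            raise ValueError("descriptor-format sense data too short")
        return sense[1] & 0x0F, sense[2], sense[3]
    raise ValueError(f"unknown sense response code {response_code:#x}")


def describe_sense(sense: bytes) -> str:
    """Render sense data as a short human-readable string."""
    key, asc, ascq = parse_sense(sense)
    name = KNOWN_ERRORS.get((key, asc, ascq), "unrecognized")
    return f"sense key {key:#x}, ASC/ASCQ {asc:02x}h/{ascq:02x}h ({name})"
```

This is the detail that the AIO path loses: an unaligned write and a media error both surface there as EIO, while the parsed triple tells the application which one occurred.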
LDB KV access Overview
[Diagram labels: KV Record Manager; Key Index & Cache table mapping key1 -> kvr_offset1, key2 -> kvr_offset2, key3 -> kvr_offset3]
1. get(pl_id, *key, *value, *props, *getopts);
2. Find kvr_offset with the key.
3. Read the KVR from the drive based on kvr_offset.
The drive is divided into zones, and the zone size aligns with media characteristics (256 MB for SMR). KVRs (Key Value Records) are stored in zones.
[Diagram: the drive as a row of zones, each holding consecutive KVRs]
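The lookup path above can be sketched with a byte array standing in for the drive. This is a simplified model under stated assumptions, not LDB itself: the class name `ZonedLog`, the per-zone write pointer list, and the choice to store bare values rather than full KVRs are all hypothetical. It shows the two properties the slide describes: every put appends at a zone's write pointer, and every get is one index lookup followed by one read at the stored offset.

```python
class ZonedLog:
    """Append-only store over a drive split into fixed-size zones (toy model)."""

    def __init__(self, zone_size, zone_count):
        self.zone_size = zone_size
        self.drive = bytearray(zone_size * zone_count)
        self.write_ptr = [0] * zone_count   # next free offset inside each zone
        self.current = 0                    # zone currently being filled
        self.index = {}                     # key -> (drive offset, length)

    def put(self, key, value):
        if len(value) > self.zone_size:
            raise ValueError("record larger than a zone")
        # A record never straddles a zone boundary: move on to the next zone.
        if self.write_ptr[self.current] + len(value) > self.zone_size:
            if self.current + 1 >= len(self.write_ptr):
                raise RuntimeError("no free zone left")
            self.current += 1
        offset = self.current * self.zone_size + self.write_ptr[self.current]
        self.drive[offset:offset + len(value)] = value
        self.write_ptr[self.current] += len(value)
        # Overwrites only repoint the index; the old bytes stay where they are.
        self.index[key] = (offset, len(value))
        return offset

    def get(self, key):
        offset, length = self.index[key]                    # memory access
        return bytes(self.drive[offset:offset + length])    # drive access
```

Because an overwrite leaves the old record in place, stale records accumulate in earlier zones until something reclaims them, which is where the garbage collection question raised in the future-work slides comes from.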
Zone based Layout
| Zone Type | Sub Zone Type | Function |
|---|---|---|
| Super Zone | Super Zone | Stores metadata information |
| Data Zone | Index Zone | Stores memory index checkpoints to make boot faster |
| Data Zone | KVR Zone | Stores generic object KVR information |
| Data Zone | Tombstone Zone | Stores deleted object KVR information |
| Reserve Zone | Reserve Zone | Reserved for adding new functions in the future |

[Diagram: drive layout as Super Zone, Data Zone, Data Zone, …, Data Zone, Reserved Zone]
Super Zone
• One super zone has many super blocks (SBs). Every SB stores the LDB metadata as of one specific time, and seq records the sequence of the SBs.
• There is more than one super zone. Super blocks are written sequentially in log style and never overwritten; after a new super zone has been written, the old super zone can be reset and reused.
[Diagram: a Super Zone holds Super Block 1, Super Block 2, …, Super Block n; each super block holds the fields seq, …, zone_size, zone_nr, …, space_info, zone_info, …, pool_info, tail_seq]
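One way to read the seq and tail_seq fields is as a pair of markers that bracket each super block, so that a torn write can be detected at boot. The sketch below follows that reading. The binary layout, the CRC32 in the tail, and the convention that seq starts at 1 (so an all-zero region means "unwritten") are assumptions for illustration; the slides do not specify the real on-disk format.

```python
import struct
import zlib

SB_HEAD = struct.Struct("<QI")   # seq, payload length (hypothetical layout)
SB_TAIL = struct.Struct("<QI")   # tail_seq, CRC32 over head + payload


def encode_sb(seq, payload):
    """Serialize one super block; seq must be >= 1."""
    head = SB_HEAD.pack(seq, len(payload))
    return head + payload + SB_TAIL.pack(seq, zlib.crc32(head + payload))


def latest_sb(zone):
    """Scan a super zone from the start; return (seq, payload) of the newest
    intact super block, or None if the zone holds none."""
    pos, best = 0, None
    while pos + SB_HEAD.size <= len(zone):
        seq, length = SB_HEAD.unpack_from(zone, pos)
        body_end = pos + SB_HEAD.size + length
        end = body_end + SB_TAIL.size
        if seq == 0 or end > len(zone):
            break                      # unwritten space or truncated block
        tail_seq, crc = SB_TAIL.unpack_from(zone, body_end)
        if tail_seq != seq or crc != zlib.crc32(bytes(zone[pos:body_end])):
            break                      # torn or corrupted write: stop here
        if best is None or seq > best[0]:
            best = (seq, bytes(zone[pos + SB_HEAD.size:body_end]))
        pos = end
    return best
```

With more than one super zone, recovery would run this scan on each and keep the result with the highest seq, which is what makes it safe to reset the older zone once the newer one holds a valid block.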
Detail Information of SB
Space Info
• Allocation statistics
• KVR statistics
• Garbage information
Zone Info
• Index zone info
• Data zone info
• Tombstone info
Pool Info
• Pool numbers
• Pool name, id, capacity
• Pool key hints, value hints, policies
KVR in Zone
[Diagram: a Data Zone holds KVR1, KVR2, KVR3, …, KVRn followed by the Disk Index Table; each KVR holds the fields generation, …, pool_id, key, value, meta, pre_kvr, checksum]
• Each object is stored as a KVR, and KVR allocation is log style because SMR requires write out-of-place.
• A KVR has a pool_id field, so KVRs from multiple pools can share one zone.
• Each KVR can store metadata for the upper-layer application to use.
• A KVR has a pre_kvr field, used when an existing key is deleted or overwritten; at that time a tombstone KVR is generated in the tombstone zone.
• At the end of each data zone, all KVR indexes are put together as the disk index table, for recovery-oriented design.
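A record with these fields could be serialized as in the following sketch. The field names come from the slide, but the field widths, their order in the header, the `NO_PRE_KVR` sentinel, and the use of CRC32 as the checksum are assumptions made for illustration; the real KVR layout is not given in the deck.

```python
import struct
import zlib

# Hypothetical header: generation, pool_id, key_len, value_len, meta_len, pre_kvr
KVR_HEAD = struct.Struct("<QIHIHQ")
NO_PRE_KVR = 0xFFFFFFFFFFFFFFFF   # assumed marker for "no previous record"


def pack_kvr(generation, pool_id, key, value, meta=b"", pre_kvr=NO_PRE_KVR):
    """Serialize one record and append a CRC32 over everything before it."""
    head = KVR_HEAD.pack(generation, pool_id, len(key), len(value),
                         len(meta), pre_kvr)
    body = head + key + value + meta
    return body + struct.pack("<I", zlib.crc32(body))


def unpack_kvr(buf, offset=0):
    """Parse the record at `offset`; return (fields, offset of next record)."""
    generation, pool_id, klen, vlen, mlen, pre_kvr = KVR_HEAD.unpack_from(buf, offset)
    start = offset + KVR_HEAD.size
    end = start + klen + vlen + mlen
    (stored_crc,) = struct.unpack_from("<I", buf, end)
    if zlib.crc32(bytes(buf[offset:end])) != stored_crc:
        raise ValueError("KVR checksum mismatch")
    fields = {
        "generation": generation,
        "pool_id": pool_id,
        "key": bytes(buf[start:start + klen]),
        "value": bytes(buf[start + klen:start + klen + vlen]),
        "meta": bytes(buf[start + klen + vlen:end]),
        "pre_kvr": pre_kvr,
    }
    return fields, end + 4
```

Because each record carries its own lengths and checksum, a recovery pass can walk a zone record by record and stop at the first checksum failure, even if the disk index table at the end of the zone was never written.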
Index Zone Definition
[Diagram: an Index Zone holds IZ Head, Cell1, …, Celln, …, IZ Tail, Pad; each cell holds seq, magic, …, Index Table Bucket]
• IZ: Index Zone.
• seq: identifies a complete index zone entry.
• magic: the index zone entry magic number.
• Bucket: the memory index is organized in buckets, each about 1 MB.
• To make boot faster, the memory index is checkpointed and stored into cells in log style.
SMR drive ZONE layout
[Figure: two zone layouts, each spanning 0 TB (OD) to x TB (ID)]
• SMR drive throughput depends on the zone layout on CHS (Cylinder, Head, Sector). There are two kinds of zone layout, and performance differs at OD (Outer Disc), MD (Middle Disc), and ID (Inner Disc).
• Test method: access zones at OD/MD/ID with random 4K accesses at the head of each zone and measure performance; also measure the latency of SMR-related commands.

| Command | Vendor1 (HA) | Vendor2 (HM) | Vendor3 (HM) |
|---|---|---|---|
| report zone | 20ms | 24ms | 388ms |
| open zone | X | X | X |
| close zone | X | X | X |
| reset write pointer | 428ms | 1353ms | 456ms |
| finish zone | X | X | X |
SMR drive IOPS
[Chart: IOPS (axis 0 to 800) at OD, MD, and ID]
Different IOPS at OD/MD/ID
SMR drive Throughput
[Chart: throughput in MB/s (axis 0 to 250) at OD, MD, and ID]
Different throughput at OD/MD/ID
LDB KV Throughput 5 threads (HA SMR vs. CMR)
Test Environment:
• ARM 7+1 GB
• Linux 2.6.34
• LDB
• KV test tool
[Chart: 5 threads write, throughput in MB/s (axis 0 to 120), HA SMR vs. CMR at 1MB, 512KB, 256KB, 64KB]
[Chart: 5 threads random read, throughput in MB/s (axis 0 to 90), HA SMR vs. CMR at 1MB, 512KB, 256KB, 64KB]
HA SMR throughput is currently half that of CMR.
LDB KV Latency 5 threads (HA SMR vs. CMR)
[Chart: average latency in ms (axis 0 to 140) at 1MB, 512KB, 256KB, 64KB, 4KB; series: HA SMR PUT, HA SMR SEQ GET, HA SMR RANDOM GET, CMR PUT, CMR SEQ GET, CMR RANDOM GET]
For now, HA SMR latency is higher than CMR (xx ms level) and shows a lot of jitter.
LDB KV Throughput & Latency over HM SMR
Test Environment:
• X86+4 GB (HM needs HBA FW & driver modification)
• Linux 3.0.76
• LDB
• KV test tool
[Chart: throughput in MB/s (axis 0 to 120) for HM PUT, HM SEQ GET, HM RANDOM GET at 1MB, 512KB, 256KB, 64KB, 4KB]
[Chart: average latency in ms (axis 0 to 25) for PUT, SEQ GET, RANDOM GET at 1MB, 512KB, 256KB, 64KB, 4KB]
• HM SMR throughput is close to CMR, and sometimes even better.
• HM SMR latency is similar to CMR, with little jitter.
Agenda
• Object Storage Market Overview
• Object Storage Design with SMR based Key-Value Store
• Summary and Future Works
Summary
• LDB is a log-structured design with an in-memory index: most operations are "1 memory access + 1 disk access", and all writes (including random ones) are sequential.
• The memory index table can be swapped to disk, depending on the memory size configuration.
• Recovery-oriented design:
  • Atomic, write out-of-place rather than write-in-place
  • Super block tracks zone allocation and pool configuration
  • Memory index table checkpoint
  • KVR(s) in one zone are packed together
Future Work 1: SMR drive related
• SMR drives have new sense codes; how should SMR errors be handled?
• Check whether libata translates sense codes correctly for ZAC.
• Standards for these sense codes and their translation…
• IOCTL is a synchronous IO model; check its performance.
• Confirm whether IOCTL works well with NCQ.
Future Work 2: Integrate with Applications
• When applications delete and update data, this causes garbage collection (GC), and GC on SMR uses reset write pointer. How to design efficient GC?
• Reset write pointer may cause FTI (Far Track Interference), and drives have new sense codes; how should SMR errors be handled?
• Applications may put KVRs with metadata; how to implement application-aware metadata processing?
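To make the GC question concrete, here is one naive baseline, not a design the deck proposes: pick the zone holding the most garbage, copy its live records into an open zone, then reset the victim's write pointer. All names here (`pick_victim`, `collect_zone`, the `(zone, offset, length)` index entries) are hypothetical, and setting the write pointer to 0 merely stands in for the drive's reset write pointer command.

```python
def live_bytes(index, zone):
    """Bytes in `zone` that the index still points at."""
    return sum(length for z, _, length in index.values() if z == zone)


def pick_victim(index, write_ptr, exclude):
    """Greedy choice: the written zone with the most garbage bytes."""
    candidates = [z for z in range(len(write_ptr))
                  if z != exclude and write_ptr[z] > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda z: write_ptr[z] - live_bytes(index, z))


def collect_zone(zones, write_ptr, index, victim, target):
    """Copy live records from `victim` to `target`, then reset `victim`."""
    for key, (zone, offset, length) in list(index.items()):
        if zone != victim:
            continue
        new_offset = write_ptr[target]
        if new_offset + length > len(zones[target]):
            raise RuntimeError("target zone is full")
        # Sequential append at the target's write pointer, as SMR requires.
        zones[target][new_offset:new_offset + length] = \
            zones[victim][offset:offset + length]
        write_ptr[target] = new_offset + length
        index[key] = (target, new_offset, length)
    write_ptr[victim] = 0   # stand-in for the reset write pointer command
```

Given the reset latencies measured earlier (hundreds of milliseconds per command), a real policy would likely need to batch or schedule resets away from foreground IO; this sketch ignores that cost entirely.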
Thank You