HBase: First Hands-On Experience
2011-05-18
Shubao (叔宝) @ Search Center
Agenda
• NoSQL at QCon: an overview
• Introduction to HBase
• HBase @ Search Center
• Problems encountered
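NoSQL@QCon
• NoSQL stores (key/value):
– Facebook, Twitter: HBase
– Sina: MySQL → MySQL + Redis
– Douban (豆瓣): BeansDB
– Taobao (淘宝): Tair
– Baidu (百度): bailingDB
– Renren (人人): Nuclear
– QQ Mail: SimpleDB
– Visual China Group (视觉中国): MongoDB
• Notes:
– In production: 25 TB of messages per month; HBase + Haystack; 50K QPS
– Records range from a few KB to a few MB
– Dynamo
– Caching in support of business applications
– Storage for hundreds of billions of web pages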
Introduction to HBase
• Background
• Data Model
• Architecture
• Features
• HBase API
Background
• BigTable
• Hadoop
Background
What is HBase
• Column-oriented semi-structured data store
• Distributed
• Layered over HDFS
• Tolerant of machine failure
• Strong consistency
From Facebook
Data Model
• Basic concept
– Table
– Row (globally sorted)
– Column = Family + Qualifier (qualifiers are not fixed)
– Timestamp (version)
– Cell
– Region
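Data Model (cont.)
Table: User-Friends (row key = Uid; column families Info and Friends)

| Row (Uid) | Info:Name | Info:Sex | Info:Age | Info:… | Friends:1 | Friends:3 | Friends:4 | Friends:… |
|---|---|---|---|---|---|---|---|---|
| 1 | John | M | 23 | | | Bf | | |
| 2 | Smith | | | | | | | |
| 3 | Lily | F | 22 | | Gf | | Sister | |
| 4 | Lucy | F | 22 | | | Sister | | |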
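The logical model from the two slides above (a sparse, sorted map of row → family:qualifier → timestamp → value) can be sketched in a few lines of Python. This is a toy, not HBase client code: `ToyTable` and its methods are hypothetical names, values are plain strings, and timestamps are small integers, whereas HBase stores raw bytes and by default stamps cells with the current time in milliseconds. The second `Age` version and the row `"10"` are made-up additions to show versioning and ordering.

```python
class ToyTable:
    """In-memory toy of HBase's logical model (hypothetical, simplified):
    row key -> "family:qualifier" -> {timestamp: value}.
    Column families are fixed when the table is created; qualifiers are not."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = {}

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cell = self.rows.setdefault(row, {}).setdefault(f"{family}:{qualifier}", {})
        cell[ts] = value  # each timestamp is a separate version of the cell

    def get(self, row, family, qualifier, max_versions=1):
        """Return up to max_versions values, newest timestamp first."""
        versions = self.rows.get(row, {}).get(f"{family}:{qualifier}", {})
        newest_first = sorted(versions, reverse=True)
        return [versions[ts] for ts in newest_first[:max_versions]]

    def row_keys(self):
        """Row keys in sorted order (HBase compares raw bytes; plain strings here)."""
        return sorted(self.rows)


users = ToyTable(["Info", "Friends"])
users.put("1", "Info", "Name", "John", ts=1)
users.put("1", "Info", "Age", "23", ts=1)
users.put("1", "Info", "Age", "24", ts=2)      # hypothetical newer version of the cell
users.put("1", "Friends", "3", "Bf", ts=1)     # qualifier = friend's Uid
users.put("3", "Info", "Name", "Lily", ts=1)
users.put("3", "Friends", "1", "Gf", ts=1)
users.put("3", "Friends", "4", "Sister", ts=1)
users.put("10", "Info", "Name", "Tom", ts=1)   # hypothetical extra row
```

Empty cells (Smith's `Sex` and `Age`) simply never exist, which is what makes wide, sparse tables cheap. `users.row_keys()` returns `['1', '10', '3']`: rows sort as byte strings, not numbers, so numeric IDs are usually zero-padded or encoded as fixed-width binary when used in row keys.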
Data Model (cont.)
• Physical Storage
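Physically, HBase does not store the User-Friends grid as rows of a table. Each region keeps one store per column family, and each store holds individual cells sorted by row key, then qualifier, then timestamp (newest first). The Python sketch below groups example cells that way. `KeyValue` and `physical_layout` are hypothetical names; the cells are a subset of the User-Friends example plus one made-up second version of John's `Age`.

```python
from collections import namedtuple

# One stored cell; real HBase KeyValues also carry a key type (Put/Delete) and use bytes.
KeyValue = namedtuple("KeyValue", "row family qualifier ts value")


def physical_layout(cells):
    """Group cells into one sorted 'store' per column family.
    Within a store: row ascending, qualifier ascending, timestamp descending."""
    stores = {}
    for kv in cells:
        stores.setdefault(kv.family, []).append(kv)
    for kvs in stores.values():
        kvs.sort(key=lambda kv: (kv.row, kv.qualifier, -kv.ts))
    return stores


cells = [
    KeyValue("3", "Friends", "4", 1, "Sister"),
    KeyValue("1", "Info", "Name", 1, "John"),
    KeyValue("3", "Friends", "1", 1, "Gf"),
    KeyValue("1", "Friends", "3", 1, "Bf"),
    KeyValue("3", "Info", "Name", 1, "Lily"),
    KeyValue("1", "Info", "Age", 1, "23"),
    KeyValue("1", "Info", "Age", 2, "24"),  # hypothetical newer version
]
stores = physical_layout(cells)
```

Because families live in separate stores, a read that only touches `Info` never has to scan `Friends` data, which is the "column-oriented" part of the design.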
Architecture
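Architecture (cont.)
• Client
– Reads/writes data (via the RegionServers)
– Schema management (via the Master)
• ZooKeeper
– Master election and recovery
– Locating the -ROOT- region
– Detecting RegionServers joining and leaving
• Master
– Region assignment (balancer)
– Metadata operations
• RegionServer
– Serving user I/O requests
– Splitting/compacting regions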
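The Master's region assignment and its reaction to a RegionServer going away (which ZooKeeper detects) can be pictured with a toy greedy balancer: each region goes to the server currently holding the fewest regions. This is a hypothetical Python sketch; `balance`, `on_server_down`, and the region/server names are made up, and HBase's real balancer and failover logic are considerably more involved.

```python
import heapq
from collections import Counter


def balance(regions, servers, current=None):
    """Greedily place each region on the server with the fewest regions.
    `current` is an existing {region: server} map whose load is counted first."""
    current = dict(current or {})
    counts = Counter(current.values())
    heap = [(counts[s], s) for s in sorted(servers)]  # (load, server), ties by name
    heapq.heapify(heap)
    for region in regions:
        load, server = heapq.heappop(heap)
        current[region] = server
        heapq.heappush(heap, (load + 1, server))
    return current


def on_server_down(assignment, dead, live_servers):
    """Reassign the dead server's regions; regions on live servers stay put."""
    orphans = [r for r, s in assignment.items() if s == dead]
    kept = {r: s for r, s in assignment.items() if s != dead}
    return balance(orphans, live_servers, kept)


a = balance(["r1", "r2", "r3", "r4", "r5"], ["rs1", "rs2"])
b = on_server_down(a, "rs1", ["rs2", "rs3"])  # rs1 lost, rs3 newly added
```

Only the orphaned regions move, which keeps failover cheap: the data itself stays in HDFS, so reassigning a region means a new server opens its files rather than copying them.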
Where is row "ipad" of table "query"?
• 3-level lookup:
ZooKeeper → -ROOT- → .META. region (.META1., .META2., …) → the region of table "query" that holds row "ipad"
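The three-level lookup can be sketched in Python with each catalog level modeled as a sorted list of region start keys: the region that holds a key is the last one whose start key is ≤ that key. Everything here is hypothetical and simplified: the server names, the `clicks` table, the start keys, and the plain `"table,row"` catalog key (real .META. rows also encode the region's start key and an id).

```python
from bisect import bisect_right


def pick(entries, key):
    """entries: (start_key, value) pairs sorted by start_key, with entries[0][0] <= key.
    Returns the value whose range [start_key, next start_key) contains key."""
    starts = [start for start, _ in entries]
    return entries[bisect_right(starts, key) - 1][1]


# Hypothetical cluster state; every name and key below is made up.
ZOOKEEPER = {"-ROOT-": "rs0"}  # ZooKeeper records which server hosts -ROOT-
CATALOG = {
    # -ROOT- maps "table,row" keys to the .META. region covering them
    "-ROOT-": [("", ".META1."), ("query,m", ".META2.")],
    # each .META. region maps "table,row" keys to user-table regions
    ".META1.": [("", "clicks region 1 @ rs1"), ("query,", "query region 1 @ rs2")],
    ".META2.": [("query,m", "query region 2 @ rs3")],
}


def locate(table, row):
    """Three hops: ZooKeeper -> -ROOT- -> .META. -> region of the user table."""
    meta_key = f"{table},{row}"
    root_server = ZOOKEEPER["-ROOT-"]                   # hop 1: where is -ROOT-?
    meta_region = pick(CATALOG["-ROOT-"], meta_key)     # hop 2: which .META. region?
    user_region = pick(CATALOG[meta_region], meta_key)  # hop 3: which user region?
    return root_server, meta_region, user_region
```

`locate("query", "ipad")` walks rs0 → .META1. → "query region 1 @ rs2". Clients cache these answers (the client-side cache mentioned under random reads), so most requests skip the lookup entirely. Later HBase releases (0.96 onward) dropped -ROOT- and find the meta table directly through ZooKeeper, making it a two-level lookup.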
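Write
• Writer → HLog (write-ahead log, one per RegionServer and shared by all its regions; entries such as "Seq#1, Table13, Region11, …" and "Seq#2, Table5, Region2, …") → MemStore
• MemStore flush → StoreFile, StoreFile, … (small files)
• Compaction → one larger StoreFile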
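The write path above (log first, then MemStore, flush to StoreFiles, compact) can be sketched for a single store in Python. `ToyStore` and its methods are hypothetical names, and the flush trigger is a simple entry count, whereas real HBase flushes by MemStore size in bytes and keeps the MemStore sorted at all times. The crash simulation shows why the WAL matters: edits that were only in memory are rebuilt by replaying the log past the last flushed sequence number.

```python
class ToyStore:
    """Toy write path for one region/column family (hypothetical, simplified)."""

    def __init__(self, flush_threshold=2):
        self.wal = []                # (seq, row, value); stands in for the HLog
        self.memstore = {}           # row -> latest value, in memory only
        self.storefiles = []         # oldest first; each a sorted list of (row, value)
        self.flush_threshold = flush_threshold  # real HBase flushes by size in bytes
        self.seq = 0
        self.flushed_seq = 0         # WAL entries up to here are already in StoreFiles

    def put(self, row, value):
        self.seq += 1
        self.wal.append((self.seq, row, value))  # 1. append to the log first
        self.memstore[row] = value               # 2. then update the in-memory buffer
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as a new immutable, sorted StoreFile."""
        if self.memstore:
            self.storefiles.append(sorted(self.memstore.items()))
            self.memstore = {}
            self.flushed_seq = self.seq

    def get(self, row):
        if row in self.memstore:
            return self.memstore[row]
        for storefile in reversed(self.storefiles):  # newest file wins
            for r, v in storefile:
                if r == row:
                    return v
        return None

    def compact(self):
        """Merge all StoreFiles into one; newer values overwrite older ones."""
        merged = {}
        for storefile in self.storefiles:
            merged.update(storefile)
        self.storefiles = [sorted(merged.items())]

    def crash_and_recover(self):
        """Lose the MemStore, then replay WAL entries that were never flushed."""
        self.memstore = {}
        for seq, row, value in self.wal:
            if seq > self.flushed_seq:
                self.memstore[row] = value


store = ToyStore(flush_threshold=2)
for row, value in [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4"), ("d", "5")]:
    store.put(row, value)
```

Writes are fast because they only touch a sequential log append and memory; the cost of sorting and merging is deferred to flushes and compactions, which is why the random-write slide lists WAL, MemStore, and Compact/Split together.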
Features
• Scalability
• High performance
• Reliability
• HBase API
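Scalability
• Capacity expansion
– Traditional approach: shard databases and tables
– HBase: simply add machines
– RegionServer
• Split
• Load balance
– HDFS
• Schema changes
– Traditional approach: downtime for maintenance
– HBase: add/remove columns (and column families) on the fly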
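Region splitting, which lets a table grow just by adding machines, can be sketched in Python: a new table starts as one region, and when a region grows too large it splits at its middle row key into two daughters that the Master can then move to other servers. `maybe_split`, `insert`, and the row-count limit are hypothetical simplifications; real RegionServers decide to split based on store file size.

```python
from bisect import bisect_right, insort


def maybe_split(region, max_rows):
    """Split an oversized region at its middle row key into two daughters.
    region: {"start": ..., "end": ..., "rows": sorted keys}; "" as end = unbounded."""
    rows = region["rows"]
    if len(rows) <= max_rows:
        return [region]
    half = len(rows) // 2
    mid = rows[half]
    return [
        {"start": region["start"], "end": mid, "rows": rows[:half]},
        {"start": mid, "end": region["end"], "rows": rows[half:]},
    ]


def insert(regions, row, max_rows):
    """Route row to the region whose [start, end) contains it, then split if needed."""
    i = bisect_right([r["start"] for r in regions], row) - 1
    insort(regions[i]["rows"], row)
    regions[i:i + 1] = maybe_split(regions[i], max_rows)


regions = [{"start": "", "end": "", "rows": []}]  # a new table starts as one region
for key in ["a", "b", "c", "d", "e"]:
    insert(regions, key, max_rows=2)
```

With steadily increasing keys, as in this example, every write lands in the last region, so one server stays hot until the next split. That is one motivation for the region pre-sharding listed under lessons learned.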
High performance
• Random reads
– Key/value access
– Caching (client-side cache + MemStore + block cache)
– Column-oriented storage
– Split/balance
– BloomFilter
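The BloomFilter item deserves a concrete picture: a Bloom filter never gives false negatives, so a read can skip every StoreFile whose filter says a row is absent. In HBase the filter is configured per column family (`BLOOMFILTER` set to `ROW` or `ROWCOL`). The Python sketch below is a generic Bloom filter, not HBase's implementation; the SHA-256-with-salt hashing, sizes, and the names `BloomFilter` and `files_to_read` are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Bit array + k hash functions. might_contain() may return false positives
    but never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, key: bytes):
        # Derive k bit positions by hashing the key with k different one-byte salts.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def files_to_read(storefiles, row: bytes):
    """storefiles: list of (name, bloom_filter). Keep only files that may hold row."""
    return [name for name, bloom in storefiles if bloom.might_contain(row)]


f1 = BloomFilter()
f1.add(b"row-1")
f1.add(b"row-2")
f2 = BloomFilter()
f2.add(b"row-3")
```

For a lookup of `b"row-3"`, `files_to_read([("sf1", f1), ("sf2", f2)], b"row-3")` keeps `sf2` and, barring a rare false positive, skips `sf1` without touching disk.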
High performance (cont.)
• Random writes
– WAL
– Cache(MemStore)
– Compact/Split/balance
High performance (cont.)
• Range queries (Scan)
– Rows are globally sorted
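Because rows are globally sorted, a range query is a seek to the start row followed by a sequential read. HBase's `Scan` treats the start row as inclusive and the stop row as exclusive, and a prefix scan uses "the prefix with its last byte incremented" as the stop row. The Python sketch below models a table as one sorted list of byte-string keys; `scan`, `prefix_stop_row`, and the example keys are hypothetical.

```python
from bisect import bisect_left


def scan(sorted_rows, start_row, stop_row=None):
    """Yield rows in [start_row, stop_row): start inclusive, stop exclusive.
    stop_row=None scans to the end of the table."""
    i = bisect_left(sorted_rows, start_row)  # seek, like positioning a scanner
    while i < len(sorted_rows) and (stop_row is None or sorted_rows[i] < stop_row):
        yield sorted_rows[i]
        i += 1


def prefix_stop_row(prefix: bytes):
    """Smallest key greater than every key starting with prefix (None = unbounded)."""
    p = bytearray(prefix)
    while p and p[-1] == 0xFF:
        p.pop()  # drop trailing 0xFF bytes, which cannot be incremented
    if not p:
        return None
    p[-1] += 1
    return bytes(p)


rows = sorted([b"kindle", b"ipod", b"ipad", b"iphone", b"ipad2"])
```

Python compares `bytes` as unsigned byte sequences, the same order HBase uses for row keys, so `list(scan(rows, b"ipad", prefix_stop_row(b"ipad")))` returns the two rows beginning with `ipad` and nothing else.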
Reliability (fault tolerance)
• Layered on HDFS
• WAL
• Automatic Failover
– Region Server
– Master
HBase API
• HBase shell (similar to the mysql and hive command-line clients)
• Java API
• Thrift
• REST
• Jython, Scala, Groovy DSL, Cascading, Pig, Hive…
Java API
• Get
• Put
• Delete
• Scan
• HBaseAdmin
• MapReduce
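HBase @ Search Center
• eTao (一淘)
– Web pages (text + B2C products from across the web)
– Images (thumbnails + price images)
– Data from external partner merchants
• Main operations:
– Web page selection
– Link extraction
– Image processing
– Index build (full + incremental)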
HBase @ Search Center (cont.)
• Data platform
– Seller data storage
• Basic information
• Daily traffic sources
• Anti-cheating
• Relevance
– Query data storage
HBase @ Search Center (cont.)
• Dump center
– Item (宝贝) data
– User data
– …
• Solves the incremental-update problem
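Problems encountered / lessons learned
• Table schema design
– Row key: look up the data for a given query under a given category ID
– Column families
– One table per day, or a single table overall?
• Compression
• Region pre-sharding
• Data migration between data centers (bulk loader)
• Evaluating the impact of the WAL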
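For the row-key and pre-sharding items, here is one possible layout as a Python sketch; the talk does not give the team's actual key format, so everything below is an assumption. The key is the UTF-8 query, a `0x00` separator (assuming queries never contain NUL), and the category ID as 4-byte big-endian, so "query X, category Y" is a single Get and "all categories of query X" is a prefix scan. The separator keeps `ipad` rows from interleaving with `ipad2` rows. For pre-sharding, split points are taken at even quantiles of a sample of real keys. `make_row_key`, `query_prefix`, and `split_keys_from_sample` are hypothetical names.

```python
import struct


def make_row_key(query: str, category_id: int) -> bytes:
    """Hypothetical layout: UTF-8 query + 0x00 + 4-byte big-endian category id.
    Fixed-width big-endian ids sort numerically under byte-wise comparison."""
    return query.encode("utf-8") + b"\x00" + struct.pack(">I", category_id)


def query_prefix(query: str) -> bytes:
    """Prefix shared by every row of one query (all of its categories)."""
    return query.encode("utf-8") + b"\x00"


def split_keys_from_sample(sample_keys, num_regions):
    """num_regions - 1 split points at even quantiles of a key sample, so each
    pre-created region starts with a similar share of the sampled rows."""
    keys = sorted(set(sample_keys))
    return [keys[len(keys) * i // num_regions] for i in range(1, num_regions)]


sample = [make_row_key(q, c) for q in ["a", "b", "c", "d"] for c in [1, 2]]
splits = split_keys_from_sample(sample, num_regions=4)
```

The resulting split keys would be supplied when the table is created (for example through the Java `HBaseAdmin.createTable` overload that accepts an array of split keys), so writes spread across several RegionServers from the start instead of waiting for organic splits.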
Further reading
• Source code
• BigTable paper
• Website, Book
• Wiki
• 菜鸟看Hbase (HBase from a beginner's view) by 毕玄
• HBase @ Hadoop Day Seattle
• Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
• QCon Beijing 2011 summary