October 2013 HUG: HBase 0.96

0.96.0, Bay Area Hadoop User Group, October 16th, 2013

Upload: yahoo-developer-network

Post on 26-Jan-2015


DESCRIPTION

The next major version of Apache HBase, 0.96, has several new features. It is nicknamed the "Singularity" because you will have to stop and restart your cluster to upgrade to 0.96. 0.96 requires at least Apache Hadoop 1.0.0 and is also supported on Hadoop 2.0.0. 0.96 uses protobufs everywhere: all of its serializations to ZooKeeper, to the filesystem, and over RPC are protobufs. It runs on JDK7. Metrics have been reworked and converted to use Hadoop Metrics2. It adds HBase Snapshots, PrefixTreeCompression, and more. This presentation gives a high-level overview of what's new in HBase 0.96.

TRANSCRIPT

Page 1: October 2013 HUG: HBase 0.96

0.96.0

Bay Area Hadoop User Group, October 16th, 2013

Page 2: October 2013 HUG: HBase 0.96

Michael Stack

• 0.96.0 Release Manager
• Chair of Apache HBase PMC*
• Apache Hadoop PMC
• Engineer at Cloudera in San Francisco

* Project Management Committee

Page 3: October 2013 HUG: HBase 0.96

HBase?

Page 4: October 2013 HUG: HBase 0.96

"...scalable, distributed datastore."

Page 5: October 2013 HUG: HBase 0.96

"...open source, distributed, scalable, consistent, low latency, random access non-relational database..."

Page 6: October 2013 HUG: HBase 0.96

Inspiration

A Google Technology described in a 2006 paper, by Chang et al.?

Page 7: October 2013 HUG: HBase 0.96

● Apache Top-level Project
  ○ hbase.apache.org
● Up out of Apache Hadoop contrib
● Project goal: “Billions of rows X millions of columns on clusters of commodity hardware”
● HBase persists all data to HDFS
● Uses Apache ZooKeeper
  ○ Cluster coordination

Page 8: October 2013 HUG: HBase 0.96

When would I use it?

Page 9: October 2013 HUG: HBase 0.96

BIG DATA

Random read/writes

Page 10: October 2013 HUG: HBase 0.96

SCALING!

Page 11: October 2013 HUG: HBase 0.96

Who uses it?

Page 12: October 2013 HUG: HBase 0.96
Page 13: October 2013 HUG: HBase 0.96

Who runs the project?

Page 14: October 2013 HUG: HBase 0.96

Diverse team*

* http://hbase.apache.org/team-list.html

COMMITTERS!

Preferably ALIVE!

Page 15: October 2013 HUG: HBase 0.96
Page 16: October 2013 HUG: HBase 0.96

• Release every month
• Each more stable
• & more performant
• Some features…
• Wire compatible between releases

• Currently at 0.94.12

Page 17: October 2013 HUG: HBase 0.96

http://www.flickr.com/photos/sysli/3026288256/sizes/o/in/photostream/

Page 18: October 2013 HUG: HBase 0.96
Page 19: October 2013 HUG: HBase 0.96
Page 20: October 2013 HUG: HBase 0.96

(Self-)Migration

Page 21: October 2013 HUG: HBase 0.96

Downstreamers
● Minimal API disturbance
  – None?
  – Last-minute feedback
● Hive, Sqoop, OpenTSDB
● Deprecations

Page 22: October 2013 HUG: HBase 0.96

Stats
● >2k issues fixed
  – >1500 in 0.96.x only
● Currently 6th Release Candidate
● Branched 7 months ago
● 18 months in the making

Page 23: October 2013 HUG: HBase 0.96

Requirements
● Hadoop 1.0.3+
● Hadoop 2.1.0-beta+
● Must choose one

Page 24: October 2013 HUG: HBase 0.96

Big Themes
● Stability
● Operability
  – Insight, tools
● Scalability
● Evolvability

Page 25: October 2013 HUG: HBase 0.96

http://www.flickr.com/photos/allspaw/5815258929/sizes/o/in/photostream/

Page 26: October 2013 HUG: HBase 0.96

http://www.flickr.com/photos/allspaw/5815258929/sizes/o/in/photostream/

Page 27: October 2013 HUG: HBase 0.96

http://www.flickr.com/photos/38595542@N02/3690830720/sizes/o/in/photostream/

Page 28: October 2013 HUG: HBase 0.96

HBase

• Dedicated meta WAL
• Don't put WAL replicas on local node
  – 33% of reads have to time out
• Lowered ZK timeout
  – 30s instead of 180s
• Watcher script kills znode
  – Detection time approaches 0
• Faster assignment

Page 29: October 2013 HUG: HBase 0.96

HDFS

• HDFS-4721 Speed up lease/block recovery when DN fails and a block goes into recovery
  – Do not recover on STALE DNs
• HDFS-3703 Decrease the datanode failure detection time
  – Avoid reading STALE DNs
• HDFS-3912 Detecting and avoiding stale datanodes for writing

Page 30: October 2013 HUG: HBase 0.96

Coming...

● Faster WAL replay/Distributed WAL Replay
  – No intermediate files
● No wait on NN
  – Committed
● Experimental
● Regions online immediately for Writes
  – Read older consistent view
● “Favored Nodes”

Page 31: October 2013 HUG: HBase 0.96
Page 32: October 2013 HUG: HBase 0.96
Page 33: October 2013 HUG: HBase 0.96

One rationale for pb: http://goo.gl/N0HO6n

Page 34: October 2013 HUG: HBase 0.96

• System tables
• Filesystem
• Up in ZooKeeper
• Over the wire

Page 35: October 2013 HUG: HBase 0.96

RPC
• Implements Protobuf Service
  ● Specification!
• Data on the side
  o Encoding
  o Compression

PB DATA

Page 36: October 2013 HUG: HBase 0.96

Scalability
• e.g. Replicating 1k to 1k & heading north
• HBASE-8778 Region assignments scan table directory making them slow for huge tables
• HBASE-9208 ReplicationLogCleaner slow at large scale
• HBASE-8877 Reentrant row locks

Page 37: October 2013 HUG: HBase 0.96
Page 38: October 2013 HUG: HBase 0.96

Snapshots
• By Table
  o Snapshot, clone, restore, export
• Inexpensive
  o Just metadata
• Good for...
  o Backups
  o Replication
  o Offline processing
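
Why a snapshot can be "just metadata" is easier to see in a toy model. The sketch below is not HBase code: it assumes a table is backed by a list of immutable files, and every class and method name in it (`SnapshotSketch`, `addFile`, and so on) is hypothetical. A snapshot only records which files existed at that moment; clone and restore hand that file list to a table without copying any data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only (hypothetical names): a table is a list of immutable file names. */
final class SnapshotSketch {
    // table name -> names of the immutable files that currently back it
    private final Map<String, List<String>> tables = new HashMap<>();
    // snapshot name -> file names referenced at snapshot time
    private final Map<String, List<String>> snapshots = new HashMap<>();

    void createTable(String table) {
        tables.put(table, new ArrayList<String>());
    }

    /** Stands in for a flush: the table gains one more immutable file. */
    void addFile(String table, String file) {
        tables.get(table).add(file);
    }

    /** Snapshot: record the current file list; no file contents are copied. */
    void snapshot(String table, String snapshotName) {
        snapshots.put(snapshotName, new ArrayList<>(tables.get(table)));
    }

    /** Clone: a new table that references the snapshot's files. */
    void cloneSnapshot(String snapshotName, String newTable) {
        tables.put(newTable, new ArrayList<>(snapshots.get(snapshotName)));
    }

    /** Restore: point an existing table back at the snapshot's files. */
    void restoreSnapshot(String snapshotName, String table) {
        tables.put(table, new ArrayList<>(snapshots.get(snapshotName)));
    }

    List<String> files(String table) {
        return tables.get(table);
    }
}
```

In this model the cost of a snapshot grows with the number of files, not with the amount of data, which is the property the slide is pointing at.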

Page 39: October 2013 HUG: HBase 0.96

Integration Tests
• Cluster test module
  o Standalone or cluster
  o Sizeable
    x data x runtime
• "Borrows" test types from all over
  o Netflix "ChaosMonkey"
  o Apache Accumulo linked-list dataloss checker

hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestBulkLoad.java

hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestImportTsv.java

hbase-it/src/test/java/org/apache/hadoop/hbase/mttr/IntegrationTestMTTR.java

hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java

hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestLoadAndVerify.java

hbase-it/src/test/java/org/apache/hadoop/hbase/trace/IntegrationTestSendTraceRequests.java

Page 40: October 2013 HUG: HBase 0.96

StochasticLoadBalancer

• Region Count

• Locality

• Movement Cost

• Table Count

• Regions/Table/RegionServer

• Read/Write Counts

• Memstore Size

• Storefile Size
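
One way to read the factor list above is as inputs to a single score per candidate layout. The sketch below is a minimal illustration under that assumption, not the balancer's actual code: it models only two of the listed factors (region count and movement cost), and the class name, method names, and weights are all hypothetical.

```java
/** Illustrative only: scores a cluster layout as a weighted sum of two costs. */
final class BalancerCostSketch {
    // Hypothetical weights, chosen only for illustration.
    static final double REGION_COUNT_WEIGHT = 500.0;
    static final double MOVE_WEIGHT = 100.0;

    /** Skew of region counts scaled to [0, 1]; 0 means perfectly even. */
    static double regionCountSkew(int[] regionsPerServer) {
        int total = 0;
        for (int n : regionsPerServer) {
            total += n;
        }
        if (total == 0) {
            return 0.0;
        }
        double mean = (double) total / regionsPerServer.length;
        double deviation = 0.0;
        for (int n : regionsPerServer) {
            deviation += Math.abs(n - mean);
        }
        // Worst case: every region sits on a single server.
        double worst = (total - mean) + (regionsPerServer.length - 1) * mean;
        return worst == 0.0 ? 0.0 : deviation / worst;
    }

    /** Fraction of regions a candidate plan would move, in [0, 1]. */
    static double moveCost(int regionsMoved, int totalRegions) {
        return totalRegions == 0 ? 0.0 : (double) regionsMoved / totalRegions;
    }

    /** Weighted sum of the individual costs; lower is better. */
    static double totalCost(int[] regionsPerServer, int regionsMoved) {
        int total = 0;
        for (int n : regionsPerServer) {
            total += n;
        }
        return REGION_COUNT_WEIGHT * regionCountSkew(regionsPerServer)
                + MOVE_WEIGHT * moveCost(regionsMoved, total);
    }
}
```

With these made-up weights, moving 5 of 10 regions to even out a two-server cluster lowers the score, so a search that keeps only cost-reducing moves would accept it.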

Page 41: October 2013 HUG: HBase 0.96

Tracing
• Review HDFS-5274 Add Tracing to HDFS!

Page 42: October 2013 HUG: HBase 0.96

Namespaces
• Grouping of tables
  – Like a database in MySQL
• System/User
  – hbase:meta
• Quota
• Coming
  – Security by ns
  – Grouping on cluster by ns
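
The `hbase:meta` name above shows the `namespace:table` form. The sketch below is a small stand-alone parser for that form, not HBase's own class: `TableNameSketch` and its members are hypothetical, and it assumes that a name without a colon falls into a namespace called `default` and that `hbase` is the system namespace.

```java
/** Illustrative parser for "namespace:qualifier" table names (hypothetical class). */
final class TableNameSketch {
    // Assumption: tables named without a namespace live in "default".
    static final String DEFAULT_NAMESPACE = "default";
    // Assumption: system tables such as hbase:meta live in "hbase".
    static final String SYSTEM_NAMESPACE = "hbase";

    final String namespace;
    final String qualifier;

    private TableNameSketch(String namespace, String qualifier) {
        this.namespace = namespace;
        this.qualifier = qualifier;
    }

    /** Splits on the first ':'; names without one go to the default namespace. */
    static TableNameSketch parse(String name) {
        int idx = name.indexOf(':');
        if (idx < 0) {
            return new TableNameSketch(DEFAULT_NAMESPACE, name);
        }
        return new TableNameSketch(name.substring(0, idx), name.substring(idx + 1));
    }

    boolean isSystemTable() {
        return SYSTEM_NAMESPACE.equals(namespace);
    }
}
```

Here `parse("hbase:meta")` lands in the system namespace, while a bare name such as `users` is treated as a user table in the default namespace.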

Page 43: October 2013 HUG: HBase 0.96

Metrics2
● Radical revamp
● Module of Interfaces
  – H1 and H2 Impls modules
● Categories/Naming/Patterns

Page 44: October 2013 HUG: HBase 0.96

API
● Client/Dev
● Hadoop Annotations
  – Stable/Evolving/Private
● Cell Interface
  – KeyValue deprecated

Page 45: October 2013 HUG: HBase 0.96
Page 46: October 2013 HUG: HBase 0.96

Miscellaneous
• X-Row (in-region) Transactions
• Hardened Assignment
• Hardened Replication
• New UI
• Online Merge
• Finer grained ACLs
• More Coprocessor hooks

Page 47: October 2013 HUG: HBase 0.96

More Misc.
• Maven modularized
• Client-side Types
• Revamped defaults
• Compactions
  o Pluggable
  o Smarter triggers
• Windows!

Page 48: October 2013 HUG: HBase 0.96

0.96.1, 0.96.2, etc.
● Bug fixes
● Performance fixes
● ONLY!
● No features!

Page 49: October 2013 HUG: HBase 0.96
Page 50: October 2013 HUG: HBase 0.96

• Right after 0.96.0
  – Month or two
• Rolling upgrade from 0.96.0
• In-line Cell-tags
• Quota/Groupings
• Reverse Scan

Page 51: October 2013 HUG: HBase 0.96

1.0.0?

Page 52: October 2013 HUG: HBase 0.96

Thank you!