hbase mhug 2015
TRANSCRIPT
Page 2 © Hortonworks Inc. 2014
What is HBase?
• HBase is a NoSQL database that stores its data in HDFS
• Inherits the characteristics of HDFS:
  – Distributed
  – Linearly scalable
  – Reliable
  – Big Data!
• Column-oriented
• Use HBase when you need random, realtime (as opposed to batch) read/write access to your Big Data
HBase is not…
• Not a relational database
• Not a standalone solution – it relies on HDFS
• Not a replacement for a traditional RDBMS
• Not optimized for classic, traditional applications
• Not ACID compliant
HBase Use Cases
• Flexible Schema
• Huge Data Volume
• High Read Rate
• High Write Rate
• Machine-Generated Data
• Distributed Messaging
• Real-Time Analytics
• Object Store
• User Profile Management
Other Example HBase Use Cases
• Facebook messaging and counts
• SalesForce Dashboards
• Time series data (OpenTSDB)
• Oil companies streaming sensor data for real-time comparisons
• Exposing Machine Learning models (like risk sets)
• Enable Storm to change models and access 1000’s without going offline
• High Performance Blob Storage
• Imagine Shack
• Geospatial indexing
• Farm Field Analysis (down to the sq ft)
• Indexing the Internet
• Graph Problems
• Ticker Plants
• Streaming stock ticker data to 10,000’s of end clients
What data semantics does HBase provide?
• GET, PUT, DELETE key-value operations
• SCAN for queries
• INCREMENT server-side atomic operations
• Row-level write atomicity
• MapReduce integration
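The operations listed above can be pictured with a toy, in-memory sketch. This is not the HBase client API: the class `ToyKeyValueStore` and its method names are hypothetical, it runs in a single process, and values are simplified to `long` so that INCREMENT is easy to show. It only illustrates why a sorted key space makes SCAN a range read and why INCREMENT is done by the store rather than by the caller.

```java
import java.util.SortedMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy stand-in for the listed operations. All names are hypothetical;
// this is NOT the HBase client API.
class ToyKeyValueStore {
    // Keys are kept in sorted order, which is what makes range scans cheap.
    private final ConcurrentSkipListMap<String, Long> cells = new ConcurrentSkipListMap<>();

    // PUT: write (or overwrite) the value stored under a key.
    void put(String key, long value) {
        cells.put(key, value);
    }

    // GET: read one value by key; returns null if the key is absent.
    Long get(String key) {
        return cells.get(key);
    }

    // DELETE: remove a key if present.
    void delete(String key) {
        cells.remove(key);
    }

    // SCAN: every entry with startKey <= key < stopKey, in key order.
    SortedMap<String, Long> scan(String startKey, String stopKey) {
        return cells.subMap(startKey, stopKey);
    }

    // INCREMENT: read-modify-write done inside the store, so two callers
    // cannot overwrite each other's update. A missing key starts at delta.
    long increment(String key, long delta) {
        return cells.merge(key, delta, Long::sum);
    }

    public static void main(String[] args) {
        ToyKeyValueStore store = new ToyKeyValueStore();
        store.put("row1", 5L);
        store.put("row2", 10L);
        store.put("row3", 15L);
        System.out.println("get row1: " + store.get("row1"));
        System.out.println("increment row1 by 2: " + store.increment("row1", 2L));
        store.delete("row2");
        System.out.println("scan [row1, row3): " + store.scan("row1", "row3"));
    }
}
```

In HBase itself these operations act on rows and cells of raw bytes rather than on a single `long` per key; the sketch keeps only the shape of the operations.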
Logical Architecture: Distributed, persistent partitions of a BigTable
[Diagram: Table A, with rowkeys a through p, is split into Region 1, Region 2, Region 3 and Region 4, which are hosted on Region Servers as follows.]
Region Server 7: Table A, Region 1; Table A, Region 2; Table G, Region 1070; Table L, Region 25
Region Server 86: Table A, Region 3; Table C, Region 30; Table F, Region 160; Table F, Region 776
Region Server 367: Table A, Region 4; Table C, Region 17; Table E, Region 52; Table P, Region 1116
Legend:
- A single table is partitioned into Regions of roughly equal size.
- Regions are assigned to Region Servers across the cluster.
- Region Servers host roughly the same number of regions.
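The partitioning described in the legend can be sketched as a sorted map from each region's start key to its location. This is an illustrative model only: the class `RegionLookupSketch`, the split points "e", "i" and "m", and the location strings are assumptions made for the example (only the region-to-server assignment of Table A comes from the diagram), and real HBase clients resolve regions through cluster metadata rather than a local map.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of range partitioning: each region owns the rowkeys from its
// start key (inclusive) up to the next region's start key (exclusive).
// Class name and split points are hypothetical.
class RegionLookupSketch {
    // start key -> "region @ server"
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String location) {
        regionsByStartKey.put(startKey, location);
    }

    // The owning region is the one with the greatest start key <= rowkey.
    String locate(String rowkey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowkey);
        if (entry == null) {
            throw new IllegalArgumentException("no region covers rowkey " + rowkey);
        }
        return entry.getValue();
    }

    // Table A from the diagram, with assumed split points e, i and m.
    static RegionLookupSketch tableA() {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "Region 1 @ Region Server 7");   // empty key = start of table
        table.addRegion("e", "Region 2 @ Region Server 7");
        table.addRegion("i", "Region 3 @ Region Server 86");
        table.addRegion("m", "Region 4 @ Region Server 367");
        return table;
    }

    public static void main(String[] args) {
        RegionLookupSketch table = tableA();
        for (String rowkey : new String[] {"a", "e", "k", "p"}) {
            System.out.println(rowkey + " -> " + table.locate(rowkey));
        }
    }
}
```

Because rows are sorted by rowkey, finding the owning region is a single floor lookup, and a scan over a contiguous key range touches only the regions that overlap it.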
Physical Architecture: Distribution and Data Path
[Diagram: several HBase clients (Java apps, the HBase Shell, a REST/Thrift gateway), three ZooKeeper nodes, an HBaseMaster, a NameNode, and a row of RegionServers, each paired with a DataNode.]
Legend:
- An HBase RegionServer is collocated with an HDFS DataNode.
- HBase clients communicate directly with Region Servers for sending and receiving data.
- HMaster manages Region assignment and handles DDL operations.
- Online configuration state is maintained in ZooKeeper.
- HMaster and ZooKeeper are NOT involved in data path.
Logical Data Model: A sparse, multi-dimensional, sorted map
Legend:
- Rows are sorted by rowkey.
- Within a row, values are located by column family and qualifier.
- Values also carry a timestamp; there can be multiple versions of a value.
- Within a column family, data is schemaless. Qualifiers and values are treated as arbitrary bytes.
Table A
rowkey  column family  column qualifier  timestamp   value
a       cf1            "bar"             1368394583  7
a       cf1            "bar"             1368394261  "hello"
a       cf1            "foo"             1368394583  22
a       cf1            "foo"             1368394925  13.6
a       cf1            "foo"             1368393847  "world"
a       cf2            1.0001            1368387684  "almost the loneliest number"
a       cf2            "2011-07-04"      1368396302  "fourth of July"
b       cf2            "thumb"           1368387247  [3.6 kb png data]
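One way to read "sparse, multi-dimensional, sorted map" is as nested sorted maps: rowkey, then column family, then qualifier, then timestamp, then value. The sketch below models that with plain Java collections. The class `SortedMapModelSketch` and its methods are hypothetical, values are simplified to `String` (HBase treats them as arbitrary bytes), and only a few of the cells from Table A are loaded.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Nested sorted maps: rowkey -> column family -> qualifier -> timestamp -> value.
// A hypothetical in-memory model, not HBase's storage format or API.
class SortedMapModelSketch {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    // Only cells that are written exist, which is what makes the map sparse.
    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            // Newest timestamp first, so the latest version is the first entry.
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // All stored versions of one cell, newest first; empty if the cell is absent.
    NavigableMap<Long, String> versions(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null) {
            return new TreeMap<>();
        }
        NavigableMap<String, NavigableMap<Long, String>> qualifiers = families.get(family);
        if (qualifiers == null) {
            return new TreeMap<>();
        }
        NavigableMap<Long, String> cell = qualifiers.get(qualifier);
        if (cell == null) {
            return new TreeMap<>();
        }
        return cell;
    }

    // The value with the highest timestamp, or null if the cell is absent.
    String getLatest(String row, String family, String qualifier) {
        NavigableMap<Long, String> cell = versions(row, family, qualifier);
        return cell.isEmpty() ? null : cell.firstEntry().getValue();
    }

    // A few cells taken from Table A.
    static SortedMapModelSketch sample() {
        SortedMapModelSketch table = new SortedMapModelSketch();
        table.put("a", "cf1", "bar", 1368394583L, "7");
        table.put("a", "cf1", "bar", 1368394261L, "hello");
        table.put("a", "cf1", "foo", 1368394583L, "22");
        table.put("a", "cf1", "foo", 1368394925L, "13.6");
        table.put("a", "cf1", "foo", 1368393847L, "world");
        table.put("a", "cf2", "2011-07-04", 1368396302L, "fourth of July");
        return table;
    }

    public static void main(String[] args) {
        SortedMapModelSketch table = sample();
        System.out.println("latest a/cf1:bar = " + table.getLatest("a", "cf1", "bar"));
        System.out.println("versions of a/cf1:foo = " + table.versions("a", "cf1", "foo"));
    }
}
```

Row "b" has no cf1 cells in this model, and nothing is stored for them; absent cells cost no space.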
Apache Phoenix: A SQL Skin for HBase
• Provides a SQL interface for managing data in HBase.
• Large subset of SQL:1999 mandatory featureset.
• Create tables, insert and update data and perform low-latency point lookups through JDBC.
• Phoenix JDBC driver easily embeddable in any app that supports JDBC.
Phoenix Makes HBase Better
• Oriented toward online / semi-transactional apps.
• If HBase is a good fit for your app, Phoenix makes it even better.
• Phoenix gets you out of the “one table per query” model many other NoSQL stores force you into.
Apache Phoenix: Current Capabilities
Feature                                 Supported?
Common SQL Datatypes                    Yes
Inserts and Updates                     Yes
SELECT, DISTINCT, GROUP BY, HAVING      Yes
NOT NULL and Primary Key constraints    Yes
Inner and Outer JOINs                   Yes
Views                                   Yes
Subqueries                              Yes
Robust Secondary Indexes                Yes
Phoenix Provides Familiar SQL Constructs
Compare: Phoenix versus Native API
Code:
    // HBase Native API.
    HBaseAdmin hbase = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("us_population");
    HColumnDescriptor state = new HColumnDescriptor("state".getBytes());
    HColumnDescriptor city = new HColumnDescriptor("city".getBytes());
    HColumnDescriptor population = new HColumnDescriptor("population".getBytes());
    desc.addFamily(state);
    desc.addFamily(city);
    desc.addFamily(population);
    hbase.createTable(desc);

    // Phoenix DDL.
    CREATE TABLE us_population (
        state CHAR(2) NOT NULL,
        city VARCHAR NOT NULL,
        population BIGINT
        CONSTRAINT my_pk PRIMARY KEY (state, city));
Notes:
• Familiar SQL syntax.
• Provides additional constraint checking.