hbase mhug 2015

© Hortonworks Inc. 2014

Online Data with HBase


What is HBase?

•  HBase is a NoSQL database that stores its data in HDFS
•  Inherits the characteristics of HDFS:
   – Distributed
   – Linearly scalable
   – Reliable
   – Big Data!
•  Column-oriented
•  Use HBase when you need random, realtime (as opposed to batch) read/write access to your Big Data


HBase is not…

•  Not a relational database
•  Not a standalone solution – it relies on HDFS
•  Not a replacement for a traditional RDBMS
•  Not optimized for classic, traditional applications
•  Not ACID compliant across rows (write atomicity is row-level)


HBase Use Cases


•  Flexible Schema
•  Huge Data Volume
•  High Read Rate
•  High Write Rate
•  Machine-Generated Data
•  Distributed Messaging
•  Real-Time Analytics
•  Object Store
•  User Profile Management


Other Example HBase Use Cases

•  Facebook messaging and counts
•  Salesforce dashboards
•  Time series data (OpenTSDB)
•  Oil companies streaming sensor data for real-time comparisons
•  Exposing machine learning models (like risk sets)
   – Enable Storm to change models and access 1000s without going offline
•  High-performance blob storage
   – ImageShack
•  Geospatial indexing
   – Farm field analysis (down to the sq ft)
•  Indexing the Internet
•  Graph problems
•  Ticker plants
   – Streaming stock ticker data to 10,000s of end clients



What data semantics does HBase provide?

•  GET, PUT, DELETE key-value operations
•  SCAN for queries
•  INCREMENT server-side atomic operations
•  Row-level write atomicity
•  MapReduce integration
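To make these operations concrete, here is a minimal in-memory sketch in plain Java. It is a toy model and not the HBase client API: the class name `ToyKeyValueStore` and all of its methods are hypothetical, values are simplified to `long` counters, and `synchronized` stands in for the server-side atomicity of INCREMENT. The one HBase behavior it mirrors is that a scan runs over sorted rowkeys from an inclusive start row to an exclusive stop row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model of GET / PUT / DELETE / SCAN / INCREMENT.
// Not HBase code: a sorted in-memory map stands in for a table.
final class ToyKeyValueStore {
    // Rowkeys are kept sorted, which is what makes range scans cheap.
    private final TreeMap<String, Long> rows = new TreeMap<>();

    // PUT: write (or overwrite) the value stored under a rowkey.
    synchronized void put(String rowkey, long value) {
        rows.put(rowkey, value);
    }

    // GET: point lookup by rowkey; returns null when the row is absent.
    synchronized Long get(String rowkey) {
        return rows.get(rowkey);
    }

    // DELETE: remove the row if it exists.
    synchronized void delete(String rowkey) {
        rows.remove(rowkey);
    }

    // SCAN: rowkeys in sorted order, start inclusive, stop exclusive.
    synchronized List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    // INCREMENT: read-modify-write done in one step under the lock,
    // so concurrent callers cannot lose each other's updates.
    synchronized long increment(String rowkey, long delta) {
        return rows.merge(rowkey, delta, Long::sum);
    }
}
```

Calling `increment("hits", 5)` on an empty store creates the counter, and a later `increment("hits", 2)` returns the running total, without the caller ever issuing a separate read.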



Logical Architecture
Distributed, persistent partitions of a BigTable

[Figure: Table A, with rows keyed a through p, is split into Region 1, Region 2, Region 3, and Region 4. The regions are spread over three Region Servers:
- Region Server 7: Table A, Region 1; Table A, Region 2; Table G, Region 1070; Table L, Region 25
- Region Server 86: Table A, Region 3; Table C, Region 30; Table F, Region 160; Table F, Region 776
- Region Server 367: Table A, Region 4; Table C, Region 17; Table E, Region 52; Table P, Region 1116]

Legend:
- A single table is partitioned into Regions of roughly equal size.
- Regions are assigned to Region Servers across the cluster.
- Region Servers host roughly the same number of regions.
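The partitioning described in the legend can be sketched as a lookup over sorted region start keys. This is an illustrative model only, not HBase code: the class `RegionLocatorSketch` and its methods are hypothetical, and the start keys ("", "e", "i", "m") are an assumption about where Table A is split. The region-to-server assignments are the ones shown for Table A in the figure.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: find the region that holds a rowkey by
// searching the sorted set of region start keys.
final class RegionLocatorSketch {
    // start key (inclusive) -> description of the region and its server
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String location) {
        regionsByStartKey.put(startKey, location);
    }

    // A rowkey belongs to the region with the greatest start key <= rowkey.
    String locate(String rowkey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowkey);
        if (entry == null) {
            throw new IllegalStateException("no region covers rowkey " + rowkey);
        }
        return entry.getValue();
    }

    // Table A from the figure; the split points are assumed, not given.
    static RegionLocatorSketch tableA() {
        RegionLocatorSketch locator = new RegionLocatorSketch();
        locator.addRegion("", "Region 1 on Region Server 7");
        locator.addRegion("e", "Region 2 on Region Server 7");
        locator.addRegion("i", "Region 3 on Region Server 86");
        locator.addRegion("m", "Region 4 on Region Server 367");
        return locator;
    }
}
```

Because each region covers one contiguous rowkey range, locating a row is a single ordered-map lookup, and a scan over adjacent rowkeys touches as few regions as possible.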



Physical Architecture
Distribution and Data Path

[Figure: HBase clients (Java apps, the HBase Shell, and a REST/Thrift gateway), a ZooKeeper ensemble, a row of RegionServers each paired with a DataNode, and the HBase Master alongside the NameNode.]

Legend:
- An HBase RegionServer is collocated with an HDFS DataNode.
- HBase clients communicate directly with Region Servers for sending and receiving data.
- HMaster manages Region assignment and handles DDL operations.
- Online configuration state is maintained in ZooKeeper.
- HMaster and ZooKeeper are NOT involved in the data path.


Logical Data Model
A sparse, multi-dimensional, sorted map

Legend:
- Rows are sorted by rowkey.
- Within a row, values are located by column family and qualifier.
- Values also carry a timestamp; there can be multiple versions of a value.
- Within a column family, data is schemaless. Qualifiers and values are treated as arbitrary bytes.

Table A

rowkey  column family  column qualifier  timestamp   value
a       cf1            "bar"             1368394583  7
a       cf1            "bar"             1368394261  "hello"
a       cf1            "foo"             1368394583  22
a       cf1            "foo"             1368394925  13.6
a       cf1            "foo"             1368393847  "world"
a       cf2            "2011-07-04"      1368396302  "fourth of July"
a       cf2            1.0001            1368387684  "almost the loneliest number"
b       cf2            "thumb"           1368387247  [3.6 kb png data]
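One way to read "sparse, multi-dimensional, sorted map" is as nested sorted maps: rowkey, then column family, then qualifier, then timestamp, then value. The sketch below models that in plain Java using a few cells from Table A. It is a hypothetical illustration, not the HBase API: the class `SparseSortedMapSketch` is invented for this example, and keys and values are `String`s for readability, whereas HBase treats them as arbitrary bytes.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical model of the logical data model as nested sorted maps:
// rowkey -> column family -> qualifier -> timestamp -> value.
final class SparseSortedMapSketch {
    private final TreeMap<String, TreeMap<String, TreeMap<String, TreeMap<Long, String>>>> rows =
            new TreeMap<>();

    // Store one version of a cell. Only cells that are written take up space,
    // which is what makes the map sparse.
    void put(String rowkey, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(rowkey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            // Newest timestamp first, so the latest version is the first entry.
            .computeIfAbsent(qualifier, k -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Return the most recent version of a cell, or null if the cell is absent.
    String getLatest(String rowkey, String family, String qualifier) {
        TreeMap<String, TreeMap<String, TreeMap<Long, String>>> families = rows.get(rowkey);
        if (families == null) return null;
        TreeMap<String, TreeMap<Long, String>> qualifiers = families.get(family);
        if (qualifiers == null) return null;
        TreeMap<Long, String> versions = qualifiers.get(qualifier);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Smallest rowkey in sort order, or null for an empty table.
    String firstRowkey() {
        return rows.isEmpty() ? null : rows.firstKey();
    }

    // A few cells from Table A, inserted out of order on purpose.
    static SparseSortedMapSketch tableA() {
        SparseSortedMapSketch table = new SparseSortedMapSketch();
        table.put("b", "cf2", "thumb", 1368387247L, "[3.6 kb png data]");
        table.put("a", "cf1", "foo", 1368393847L, "world");
        table.put("a", "cf1", "foo", 1368394925L, "13.6");
        table.put("a", "cf1", "foo", 1368394583L, "22");
        table.put("a", "cf1", "bar", 1368394261L, "hello");
        table.put("a", "cf1", "bar", 1368394583L, "7");
        return table;
    }
}
```

With this data, asking for the latest version of row a, cf1, "foo" yields the value written at timestamp 1368394925, and row a sorts ahead of row b regardless of insertion order.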


Apache Phoenix
A SQL Skin for HBase

•  Provides a SQL interface for managing data in HBase.
•  Large subset of SQL:1999 mandatory feature set.
•  Create tables, insert and update data, and perform low-latency point lookups through JDBC.
•  Phoenix JDBC driver easily embeddable in any app that supports JDBC.

Phoenix Makes HBase Better

•  Oriented toward online / semi-transactional apps.
•  If HBase is a good fit for your app, Phoenix makes it even better.
•  Phoenix gets you out of the “one table per query” model many other NoSQL stores force you into.


Apache Phoenix: Current Capabilities

Feature                                   Supported?
Common SQL Datatypes                      Yes
Inserts and Updates                       Yes
SELECT, DISTINCT, GROUP BY, HAVING        Yes
NOT NULL and Primary Key constraints      Yes
Inner and Outer JOINs                     Yes
Views                                     Yes
Subqueries                                Yes
Robust Secondary Indexes                  Yes


Phoenix Provides Familiar SQL Constructs
Compare: Phoenix versus Native API

Code:

    // HBase Native API.
    // conf is an existing HBase Configuration object.
    HBaseAdmin hbase = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("us_population");
    HColumnDescriptor state = new HColumnDescriptor("state".getBytes());
    HColumnDescriptor city = new HColumnDescriptor("city".getBytes());
    HColumnDescriptor population = new HColumnDescriptor("population".getBytes());
    desc.addFamily(state);
    desc.addFamily(city);
    desc.addFamily(population);
    hbase.createTable(desc);

    // Phoenix DDL.
    CREATE TABLE us_population (
        state CHAR(2) NOT NULL,
        city VARCHAR NOT NULL,
        population BIGINT
        CONSTRAINT my_pk PRIMARY KEY (state, city));

Notes:

•  Familiar SQL syntax.
•  Provides additional constraint checking.
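For the JDBC side, the following is a minimal sketch of writing to and reading from the us_population table defined in the DDL above, using only the standard `java.sql` interfaces. Several things here are assumptions rather than part of the original material: the class and method names are hypothetical, the ZooKeeper quorum passed to `connect` is a placeholder, and the Phoenix JDBC driver must be on the classpath with a running cluster behind it. Phoenix expresses inserts and updates with a single UPSERT statement, and its connections do not auto-commit by default, hence the explicit `commit()`.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch of Phoenix access through plain JDBC.
final class PhoenixPopulationSketch {
    static final String UPSERT_SQL =
            "UPSERT INTO us_population (state, city, population) VALUES (?, ?, ?)";
    static final String LOOKUP_SQL =
            "SELECT population FROM us_population WHERE state = ? AND city = ?";

    // zkQuorum is a placeholder such as a ZooKeeper host name (assumption).
    static Connection connect(String zkQuorum) throws SQLException {
        return DriverManager.getConnection("jdbc:phoenix:" + zkQuorum);
    }

    // Insert or update one row, then commit explicitly.
    static void upsertCity(Connection conn, String state, String city, long population)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPSERT_SQL)) {
            ps.setString(1, state);
            ps.setString(2, city);
            ps.setLong(3, population);
            ps.executeUpdate();
        }
        conn.commit();
    }

    // Point lookup on the full primary key (state, city); null if no such row.
    static Long lookupPopulation(Connection conn, String state, String city)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(LOOKUP_SQL)) {
            ps.setString(1, state);
            ps.setString(2, city);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    return rs.getLong(1);
                }
                return null;
            }
        }
    }
}
```

The lookup filters on both primary key columns, which is the kind of low-latency point lookup the slides describe.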
