hbase mhug 2015


Upload: joseph-niemiec

Post on 07-Aug-2015


Page 1 © Hortonworks Inc. 2014

Online Data with HBase

Page 2 © Hortonworks Inc. 2014

What is HBase?

•  HBase is a NoSQL database that stores its data in HDFS
•  Inherits the characteristics of HDFS:
    – Distributed
    – Linearly scalable
    – Reliable
    – Big Data!
•  Column-oriented
•  Use HBase when you need random, realtime (as opposed to batch) read/write access to your Big Data

Page 3 © Hortonworks Inc. 2014

HBase is not…

•  Not a relational database
•  Not a standalone solution – it relies on HDFS
•  Not a replacement for a traditional RDBMS
•  Not optimized for classic, traditional applications
•  Not ACID compliant

Page 4 © Hortonworks Inc. 2014

HBase Use Cases

•  Flexible Schema
•  Huge Data Volume
•  High Read Rate
•  High Write Rate
•  Machine-Generated Data
•  Distributed Messaging
•  Real-Time Analytics
•  Object Store
•  User Profile Management

Page 5 © Hortonworks Inc. 2014

Other Example HBase Use Cases

•  Facebook messaging and counts
•  Salesforce dashboards
•  Time series data (OpenTSDB)
    – Oil companies streaming sensor data for real-time comparisons
•  Exposing machine learning models (like risk sets)
    – Enable Storm to change models and access 1000’s without going offline
•  High-performance blob storage
    – ImageShack
•  Geospatial indexing
    – Farm field analysis (down to the sq ft)
•  Indexing the Internet
•  Graph problems
•  Ticker plants
    – Streaming stock ticker data to 10,000’s of end clients

Page 6 © Hortonworks Inc. 2014

What data semantics does HBase provide?

•  GET, PUT, DELETE key-value operations
•  SCAN for queries
•  INCREMENT server-side atomic operations
•  Row-level write atomicity
•  MapReduce integration
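The operations listed above can be illustrated with a small in-memory model. This is only a sketch: `ToyTable` and its methods are hypothetical names, not the HBase client API, and a single table-wide lock stands in for the per-row atomicity a real cluster provides.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy in-memory stand-in for one HBase table. This is NOT the HBase client API. */
class ToyTable {
    // Rows stay sorted by rowkey, which is what makes range scans possible.
    private final TreeMap<String, Map<String, Object>> rows = new TreeMap<>();

    /** PUT: every column in the call lands in the row together. */
    synchronized void put(String rowkey, Map<String, Object> columns) {
        rows.computeIfAbsent(rowkey, k -> new TreeMap<>()).putAll(columns);
    }

    /** GET: point lookup by rowkey; an absent row yields an empty map. */
    synchronized Map<String, Object> get(String rowkey) {
        Map<String, Object> row = rows.get(rowkey);
        if (row == null) {
            return Collections.emptyMap();
        }
        return new TreeMap<>(row); // copy, so callers cannot see half-applied writes
    }

    /** DELETE: removes the whole row. */
    synchronized void delete(String rowkey) {
        rows.remove(rowkey);
    }

    /** SCAN: rows with startRow <= rowkey < stopRow, in rowkey order. */
    synchronized SortedMap<String, Map<String, Object>> scan(String startRow, String stopRow) {
        return new TreeMap<>(rows.subMap(startRow, stopRow));
    }

    /** INCREMENT: read-modify-write performed next to the data, returning the new value. */
    synchronized long increment(String rowkey, String column, long amount) {
        Map<String, Object> row = rows.computeIfAbsent(rowkey, k -> new TreeMap<>());
        Object current = row.get(column);
        long updated = (current instanceof Long ? (Long) current : 0L) + amount;
        row.put(column, updated);
        return updated;
    }
}
```

The point of INCREMENT in this model is that the caller never reads the counter and writes it back itself, so two concurrent callers cannot overwrite each other's update.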


Page 7 © Hortonworks Inc. 2014

Logical Architecture: Distributed, persistent partitions of a BigTable

[Figure: Table A, with rowkeys a through p, is split into Regions 1 to 4. Regions from many tables are hosted on each Region Server:]

Region Server 7: Table A, Region 1; Table A, Region 2; Table G, Region 1070; Table L, Region 25
Region Server 86: Table A, Region 3; Table C, Region 30; Table F, Region 160; Table F, Region 776
Region Server 367: Table A, Region 4; Table C, Region 17; Table E, Region 52; Table P, Region 1116

Legend:
- A single table is partitioned into Regions of roughly equal size.
- Regions are assigned to Region Servers across the cluster.
- Region Servers host roughly the same number of regions.
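Because each Region covers a contiguous range of sorted rowkeys, finding the Region Server for a given row reduces to a floor lookup on region start keys. The sketch below assumes this; `RegionDirectory` is a hypothetical class, and the split points ("e", "i", "m") are made up for illustration. Only the server assignments for Table A come from the slide.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy region directory for a single table. Hypothetical; not an HBase class. */
class RegionDirectory {
    // Maps each region's start key to the server that hosts the region.
    private final TreeMap<String, String> startKeyToServer = new TreeMap<>();

    void assign(String regionStartKey, String server) {
        startKeyToServer.put(regionStartKey, server);
    }

    /** Returns the server hosting the region whose key range contains rowkey. */
    String locate(String rowkey) {
        // The owning region is the one with the greatest start key <= rowkey.
        Map.Entry<String, String> entry = startKeyToServer.floorEntry(rowkey);
        if (entry == null) {
            throw new IllegalArgumentException("no region covers rowkey " + rowkey);
        }
        return entry.getValue();
    }

    /** Table A from the slide; the split points are assumed, the servers are not. */
    static RegionDirectory exampleTableA() {
        RegionDirectory dir = new RegionDirectory();
        dir.assign("", "Region Server 7");    // Region 1: start of table up to "e"
        dir.assign("e", "Region Server 7");   // Region 2: "e" up to "i"
        dir.assign("i", "Region Server 86");  // Region 3: "i" up to "m"
        dir.assign("m", "Region Server 367"); // Region 4: "m" to end of table
        return dir;
    }
}
```

For example, `RegionDirectory.exampleTableA().locate("k")` resolves to the server holding Region 3 under these assumed split points.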


Page 8 © Hortonworks Inc. 2014

Physical Architecture: Distribution and Data Path

[Figure: components shown are a three-node ZooKeeper ensemble; HBase clients embedded in Java apps, the HBase Shell, and a REST/Thrift gateway; a row of RegionServers, each paired with a DataNode; the HBaseMaster; and the NameNode.]

Legend:
- An HBase RegionServer is collocated with an HDFS DataNode.
- HBase clients communicate directly with Region Servers for sending and receiving data.
- HMaster manages Region assignment and handles DDL operations.
- Online configuration state is maintained in ZooKeeper.
- HMaster and ZooKeeper are NOT involved in the data path.

Page 9 © Hortonworks Inc. 2014

Logical Data Model: A sparse, multi-dimensional, sorted map

Legend:
- Rows are sorted by rowkey.
- Within a row, values are located by column family and qualifier.
- Values also carry a timestamp; there can be multiple versions of a value.
- Within a column family, data is schemaless. Qualifiers and values are treated as arbitrary bytes.

Table A

rowkey   column family   column qualifier   timestamp    value
a        cf1             "bar"              1368394583   7
a        cf1             "bar"              1368394261   "hello"
a        cf1             "foo"              1368394583   22
a        cf1             "foo"              1368394925   13.6
a        cf1             "foo"              1368393847   "world"
a        cf2             "2011-07-04"       1368396302   "fourth of July"
a        cf2             1.0001             1368387684   "almost the loneliest number"
b        cf2             "thumb"            1368387247   [3.6 kb png data]
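One way to picture the "sparse, multi-dimensional, sorted map" is as nested sorted maps: rowkey, then column family, then qualifier, then timestamp. The sketch below is an illustration only. `SparseSortedMap` is a hypothetical class, and it uses strings and objects where HBase stores raw bytes. It loads a few of the cells from the slide's Table A.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the logical data model. Hypothetical; HBase itself stores raw bytes. */
class SparseSortedMap {
    // rowkey -> column family -> qualifier -> timestamp (newest first) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<String, TreeMap<Long, Object>>>> rows =
            new TreeMap<>();

    void put(String row, String family, String qualifier, long timestamp, Object value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, Object>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** All stored versions of one cell, newest first, or null if the cell does not exist. */
    private TreeMap<Long, Object> versions(String row, String family, String qualifier) {
        TreeMap<String, TreeMap<String, TreeMap<Long, Object>>> families = rows.get(row);
        if (families == null) return null;
        TreeMap<String, TreeMap<Long, Object>> qualifiers = families.get(family);
        if (qualifiers == null) return null;
        return qualifiers.get(qualifier);
    }

    /** Newest version of a cell, or null when the cell is absent (the map is sparse). */
    Object getLatest(String row, String family, String qualifier) {
        TreeMap<Long, Object> v = versions(row, family, qualifier);
        return v == null ? null : v.firstEntry().getValue();
    }

    /** Newest version written at or before the given timestamp, or null if none. */
    Object getAsOf(String row, String family, String qualifier, long timestamp) {
        TreeMap<Long, Object> v = versions(row, family, qualifier);
        if (v == null) return null;
        // With a reversed comparator, ceilingEntry yields the largest timestamp <= the argument.
        Map.Entry<Long, Object> entry = v.ceilingEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }

    /** A subset of the cells shown in the slide's Table A. */
    static SparseSortedMap slideExample() {
        SparseSortedMap table = new SparseSortedMap();
        table.put("a", "cf1", "bar", 1368394583L, 7);
        table.put("a", "cf1", "bar", 1368394261L, "hello");
        table.put("a", "cf1", "foo", 1368394583L, 22);
        table.put("a", "cf1", "foo", 1368394925L, 13.6);
        table.put("a", "cf1", "foo", 1368393847L, "world");
        table.put("b", "cf2", "thumb", 1368387247L, "[3.6 kb png data]");
        return table;
    }
}
```

Note that row b has no entries under cf1 at all: absent cells take up no space, which is what "sparse" means here.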

Page 10 © Hortonworks Inc. 2014

Apache Phoenix: A SQL Skin for HBase

•  Provides a SQL interface for managing data in HBase.
•  Large subset of the SQL:1999 mandatory feature set.
•  Create tables, insert and update data, and perform low-latency point lookups through JDBC.
•  The Phoenix JDBC driver is easily embeddable in any app that supports JDBC.

Phoenix Makes HBase Better

•  Oriented toward online / semi-transactional apps.
•  If HBase is a good fit for your app, Phoenix makes it even better.
•  Phoenix gets you out of the “one table per query” model many other NoSQL stores force you into.

Page 11 © Hortonworks Inc. 2014

Apache Phoenix: Current Capabilities

Feature                                  Supported?
Common SQL Datatypes                     Yes
Inserts and Updates                      Yes
SELECT, DISTINCT, GROUP BY, HAVING       Yes
NOT NULL and Primary Key constraints     Yes
Inner and Outer JOINs                    Yes
Views                                    Yes
Subqueries                               Yes
Robust Secondary Indexes                 Yes

Page 12 © Hortonworks Inc. 2014

Phoenix Provides Familiar SQL Constructs
Compare: Phoenix versus Native API

Code:

// HBase Native API.
HBaseAdmin hbase = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor("us_population");
HColumnDescriptor state = new HColumnDescriptor("state".getBytes());
HColumnDescriptor city = new HColumnDescriptor("city".getBytes());
HColumnDescriptor population = new HColumnDescriptor("population".getBytes());
desc.addFamily(state);
desc.addFamily(city);
desc.addFamily(population);
hbase.createTable(desc);

-- Phoenix DDL.
CREATE TABLE us_population (
    state CHAR(2) NOT NULL,
    city VARCHAR NOT NULL,
    population BIGINT
    CONSTRAINT my_pk PRIMARY KEY (state, city));

Notes:

•  Familiar SQL syntax.
•  Provides additional constraint checking.