introduction to hbase. agenda what is hbase about rdbms overview of hbase why hbase instead of...

22
Introduction to Hbase

Upload: sherman-willis

Post on 04-Jan-2016

290 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Introduction to Hbase

Page 2: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Agenda

What is Hbase

About RDBMS

Overview of Hbase

Why Hbase instead of RDBMS

Architecture of Hbase

Hbase interface

Summarize

Page 3: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

What is Hbase

Hbase is an open source, distributed sorted map modeled after Google's BigTable

Page 4: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Open Source

Apache 2.0 License Committers and contributors from diverse organizations

like Facebook, Trend Micro etc.

Page 5: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

About RDBMS

Have a lot of Limitations Both read / write throughput not high (transactional

databases)

Specialized Hardware is quite expensive

Page 6: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Background

Google releases paper on Bigtable – 2006

First usable Hbase – 2007

Hbase becomes Apache top-level project – 2010

Page 7: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Overview of Hbase

Hbase is a part of Hadoop eco-system. Apache Hadoop is an open source system to reliably

store and process data across many commodity computers

Hadoop provides: Fault tolerance Scalability

Page 8: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Hadoop's components

MapReduce (Process)

Fault tolerant distributed processing

HDFS (store) Self-healing High-bandwidth Clustered storage

Page 9: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Difference Between Hadoop/HDFS and Hbase

HDFS is a distributed file system that is well suited for the storage of large files.

Hbase is built on top of HDFS and provides fast record lookups (and updates) for large tables.

HDFS has based on GFS.

Page 10: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Hbase is

Distributed – uses HDFS for storage

Column – Oriented

Multi-Dimensional

Storage System

Page 11: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Hbase is NOT A sql Database – No Joins, no query engine, no

datatypes, no sql

No Schema

Page 12: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Storage Model

Column – oriented database (column families) Table consists of Rows, each which has a primary

key(row key)

Each Row may have any number of columns Table schema only defines Column families(column family

can have any number of columns)

Each cell value has a timestamp

Page 13: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Static Columns

int varchar int varchar int

int varchar int varchar int

int varchar int varchar int

Page 14: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Something different

Row1 → ColA = Value ColB = Value ColC = Value

Row2 → ColX = Value ColY = Value

Page 15: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

A Big Map Row Key + Column Key + timestamp => value

Row Key Column Key Timestamp Value

1 Info:name 1273516197868 Sakis

1 Info:age 1273871824184 21

1 Info:sex 1273746281432 Male

2 Info:name 1273863723227 Themis

2 Info:name 1273973134238 Andreas

Page 16: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

RDBMS vs Hbase

RDBMS Hbase

Data layout Row-oriented Column-oriented

Query language SQL Get/put/scan/etc *

Security Authentication/Authorization

Work in Progress

Max data size TBs Hundrends of PBs

Read / write throughputlimits

1000s queries/second Millions of queries persecond

Page 17: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Terms and Daemons

Region

A subset of table's rows

Region Server(slave)

Serves data for reads and writes

Master Responsible for coordinating the slaves Assigns regions, detects failures of Region Servers Control some admin function

Page 18: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Distributed coordination

To manage master election and server availability we use Zookeeper

Set up a cluster, provides distributed coordination primitives

An excellent tool for building cluster management systems

Page 19: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Hbase Architecture

Page 20: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Hbase Interface

Java

Thrift (Ruby, Php, Python, Perl, C++,..)

Hbase Shell

Page 21: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Use Hbase if

You need random write, random read or both

You need to do many thousands of operations per sec on multiple TB of data

Your access patterns are simple

Page 22: Introduction to Hbase. Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface

Thank You