Breaking with Relational DBMS and Dating with HBase


Uploaded by gaurav-kohli on 27-Jan-2015


DESCRIPTION

Session on HBase at the IndicThreads Conference on Java, December 2010 (http://j10.indicthreads.com/)

TRANSCRIPT

Page 1: Breaking with relational dbms and dating with hbase

Breaking with DBMS and Dating with HBase

Gaurav Kohli, Xebia

Page 2: About me

Gaurav Kohli ([email protected])

Consultant, Xebia IT Architects

Page 3: Why are we here?

Something about RDBMS

Limitations of RDBMS

Why HBase or any NoSQL solution

Overview of HBase

Specific use cases

Paradigm shift in schema design

Architecture of HBase

HBase interface: Java API, Thrift

Conclusion

Page 4: Databases

Page 5

Relational databases have a lot of

Page 6: Limitations of RDBMS

Data sets growing into petabytes

RDBMSs don't scale inherently; the options are scaling up, or scaling out with load balancing + replication

Hard to shard / partition

High read and high write throughput together are not possible, so transactional and analytical databases are kept separate

Specialized hardware (e.g., Oracle clustering) is very expensive

Page 7

[Diagram: master-slave replication]

Page 8

The MySQL master becomes a problem:
All slaves must have the same write capacity as the master.
Single point of failure; no easy failover.

[Diagram: writes go to the master; reads are served by the slave nodes]
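A minimal Java sketch of why the master becomes the bottleneck, using plain in-memory maps rather than MySQL: every write lands on the master and is replayed on each slave, so adding slaves spreads reads but does not reduce the write load on any node. ReplicaNode, MasterSlaveCluster, and ReplicationDemo are hypothetical names, and replication is assumed to be synchronous for simplicity (MySQL replication is typically asynchronous).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical database node: a key-value store that counts the writes it has applied.
class ReplicaNode {
    final Map<String, String> data = new HashMap<>();
    int writesApplied = 0;

    void apply(String key, String value) {
        data.put(key, value);
        writesApplied++;
    }
}

// Hypothetical master-slave cluster; replication is modeled as synchronous for simplicity.
class MasterSlaveCluster {
    final ReplicaNode master = new ReplicaNode();
    final List<ReplicaNode> slaves = new ArrayList<>();
    private int nextSlave = 0;

    MasterSlaveCluster(int slaveCount) {
        for (int i = 0; i < slaveCount; i++) {
            slaves.add(new ReplicaNode());
        }
    }

    // Every write goes to the single master and is then replayed on every slave.
    void write(String key, String value) {
        master.apply(key, value);
        for (ReplicaNode slave : slaves) {
            slave.apply(key, value);
        }
    }

    // Reads are spread round-robin over the slaves.
    String read(String key) {
        ReplicaNode slave = slaves.get(nextSlave);
        nextSlave = (nextSlave + 1) % slaves.size();
        return slave.data.get(key);
    }
}

class ReplicationDemo {
    public static void main(String[] args) {
        MasterSlaveCluster cluster = new MasterSlaveCluster(3);
        for (int i = 0; i < 1000; i++) {
            cluster.write("user" + i, "v" + i);
        }
        System.out.println(cluster.read("user42"));              // v42
        System.out.println(cluster.master.writesApplied);        // 1000
        System.out.println(cluster.slaves.get(2).writesApplied); // 1000
    }
}
```

Adding a fourth slave would add read capacity, but it too would apply all 1000 writes, and the master would still be the only node that accepts them.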

Page 9

[Diagram: master-master replication with slaves]


Page 12

2006.11: Google releases the paper on Bigtable.

2007.2: Initial HBase prototype created as a Hadoop contrib module.

2007.10: First usable HBase.

2008.1: Hadoop becomes an Apache top-level project, and HBase becomes a Hadoop subproject.

2010.5: HBase becomes an Apache top-level project.

2010.6: HBase 0.20.5 released.

2010.10: HBase 0.89.2010092, the third developer release.

Page 13: Overview of HBase

HBase is a distributed (it uses HDFS for storage), column-oriented, multi-dimensional (cells are versioned), highly available, high-performance storage system.

Page 14

HBase is not a SQL database:

No joins, no query engine, no data types, no SQL, no fixed schema

Denormalized data

Wide and sparsely populated data structure (key-value)

No DBA needed

Page 15: Why HBase or any NoSQL solution

Bigness: big data, a big number of users, a big number of computers

Massive write performance: Facebook needs 135 billion messages a month; Twitter stores 7 TB of data per day

Fast key-value access

Write availability

No single point of failure

Page 16: Specific use cases

Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, etc.

Real-time inserts, updates, and queries

Fraud detection by comparing transactions to known patterns in real time

Analytics: use MapReduce, Hive, or Pig to perform analytical queries

Page 17

Column-oriented database

Tables are sorted by row key

The table schema defines only column families; a column family can have any number of columns

Each cell value has a timestamp


Page 20

SortedMap(RowKey, List(SortedMap(Column, List(Value, Timestamp))))
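The nested map above can be mimicked with java.util.TreeMap, which keeps its keys sorted. This minimal sketch assumes string row keys, column keys written as family:qualifier, and each cell's versions held in a timestamp-to-value map ordered newest first; it illustrates the logical data model only, not how HBase lays data out on disk. SortedMapTable and SortedMapDemo are hypothetical names, and the sample rows come from the Student table example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model: SortedMap(RowKey, SortedMap(Column, SortedMap(Timestamp, Value))).
class SortedMapTable {
    // Rows sorted by key, columns sorted by "family:qualifier", versions sorted newest first.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.reverseOrder()))
            .put(timestamp, value);
    }

    // Newest value of a cell, or null if the cell does not exist.
    String getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) {
            return null;
        }
        return columns.get(column).firstEntry().getValue();
    }

    // Row keys in the order a full table scan visits them.
    List<String> rowKeys() {
        return new ArrayList<>(rows.keySet());
    }

    // Column keys of one row, in sorted order (empty if the row does not exist).
    List<String> columnKeys(String row) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        List<String> result = new ArrayList<>();
        if (columns != null) {
            result.addAll(columns.keySet());
        }
        return result;
    }
}

class SortedMapDemo {
    public static void main(String[] args) {
        SortedMapTable student = new SortedMapTable();
        student.put("1", "info:name", 1273516197868L, "Gaurav");
        student.put("1", "info:age", 1273871824184L, "28");
        student.put("1", "info:age", 1273871823022L, "34");
        student.put("1", "info:sex", 1273746281432L, "Male");
        student.put("2", "info:name", 1273863723227L, "Harsh");
        student.put("3", "info:name", 1273822456433L, "Raman");
        System.out.println(student.rowKeys());                  // [1, 2, 3]
        System.out.println(student.columnKeys("1"));            // [info:age, info:name, info:sex]
        System.out.println(student.getLatest("1", "info:age")); // 28
    }
}
```

Only the row, column, and timestamp levels are sorted maps; nothing forces every row to have the same columns, which is what makes the structure sparse.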

Page 21

A big sorted map: row key + column key + timestamp => value

Student table:

Row key  Column key  Timestamp      Value
1        info:name   1273516197868  Gaurav
1        info:age    1273871824184  28
1        info:age    1273871823022  34
1        info:sex    1273746281432  Male
2        info:name   1273863723227  Harsh
3        info:name   1273822456433  Raman

Entries are sorted by row key, then column key.
A column key is the column family (info) plus the column qualifier/name (name, age, sex).
The timestamp is a long value.
Row 1 holds two versions of info:age.
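Flattening the same data gives one sorted map keyed by row key + column key + timestamp. The sketch below assumes a composite key ordered by row, then column, then timestamp descending, so the newest version of a cell sorts first; reading a cell as of a given time then becomes a single ceiling lookup. CellKey, FlatCellStore, and FlatCellDemo are hypothetical; this is a model of the semantics, not HBase's own KeyValue class.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical composite key: row key + column key + timestamp.
final class CellKey {
    final String row;
    final String column;
    final long timestamp;

    CellKey(String row, String column, long timestamp) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
    }

    // Row ascending, then column ascending, then timestamp descending (newest version first).
    static final Comparator<CellKey> ORDER = Comparator
            .comparing((CellKey k) -> k.row)
            .thenComparing(k -> k.column)
            .thenComparing(Comparator.comparingLong((CellKey k) -> k.timestamp).reversed());
}

class FlatCellStore {
    private final TreeMap<CellKey, String> cells = new TreeMap<>(CellKey.ORDER);

    void put(String row, String column, long timestamp, String value) {
        cells.put(new CellKey(row, column, timestamp), value);
    }

    // Newest version whose timestamp is <= asOf, or null if there is none.
    String get(String row, String column, long asOf) {
        // Timestamps sort descending, so the first key at or after (row, column, asOf)
        // is the newest version that is not newer than asOf.
        CellKey found = cells.ceilingKey(new CellKey(row, column, asOf));
        if (found == null || !found.row.equals(row) || !found.column.equals(column)) {
            return null;
        }
        return cells.get(found);
    }

    // Newest version of the cell.
    String getLatest(String row, String column) {
        return get(row, column, Long.MAX_VALUE);
    }
}

class FlatCellDemo {
    public static void main(String[] args) {
        FlatCellStore student = new FlatCellStore();
        student.put("1", "info:age", 1273871824184L, "28");
        student.put("1", "info:age", 1273871823022L, "34");
        System.out.println(student.getLatest("1", "info:age"));           // 28
        System.out.println(student.get("1", "info:age", 1273871823500L)); // 34
    }
}
```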

Page 22: Example of a Student and Subject (relational schema)

Student table: id (PK), name, age, sex

Subject table: id (PK), title, introduction, teacher_id

Student-Subject table: student_id, subject_id, type

Student and Subject have an m:n relationship through the Student-Subject table.

Page 23: Example of a Student and Subject in an RDBMS

Student table:

id  name    age  sex
1   Gaurav  28   Male

Subject table:

id  title  introduction   teacher_id
1   HBase  HBase is cool  10

Student-Subject table:

student_id  subject_id  type
1           1           elective
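In the relational layout, listing a student's subjects means joining the Student-Subject table with the Subject table. A minimal sketch of that join as an in-memory hash join on subject_id; SubjectRow, StudentSubjectRow, RelationalJoin, and JoinDemo are hypothetical classes, and the comment shows roughly the SQL a database would run instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row types mirroring the Subject and Student-Subject tables.
class SubjectRow {
    final int id;
    final String title;

    SubjectRow(int id, String title) {
        this.id = id;
        this.title = title;
    }
}

class StudentSubjectRow {
    final int studentId;
    final int subjectId;
    final String type;

    StudentSubjectRow(int studentId, int subjectId, String type) {
        this.studentId = studentId;
        this.subjectId = subjectId;
        this.type = type;
    }
}

class RelationalJoin {
    // Roughly: SELECT s.title, ss.type FROM student_subject ss
    //          JOIN subject s ON s.id = ss.subject_id WHERE ss.student_id = ?
    static List<String> subjectsOf(int studentId, List<SubjectRow> subjects,
                                   List<StudentSubjectRow> links) {
        // Build side: index subjects by primary key.
        Map<Integer, SubjectRow> byId = new HashMap<>();
        for (SubjectRow s : subjects) {
            byId.put(s.id, s);
        }
        // Probe side: keep this student's link rows and look up each subject.
        List<String> result = new ArrayList<>();
        for (StudentSubjectRow link : links) {
            if (link.studentId == studentId) {
                SubjectRow s = byId.get(link.subjectId);
                if (s != null) {
                    result.add(s.title + " (" + link.type + ")");
                }
            }
        }
        return result;
    }
}

class JoinDemo {
    public static void main(String[] args) {
        List<SubjectRow> subjects = List.of(new SubjectRow(1, "HBase"));
        List<StudentSubjectRow> links = List.of(new StudentSubjectRow(1, 1, "elective"));
        System.out.println(RelationalJoin.subjectsOf(1, subjects, links)); // [HBase (elective)]
    }
}
```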

Page 24: Student-Subject schema in HBase

Student table:

Row key     Column family  Column keys
student_id  info           name, age, sex
student_id  subjects       subject ids as qualifiers (keys)

Subject table:

Row key     Column family  Column keys
subject_id  info           title, introduction, teacher_id
subject_id  students       student ids as qualifiers (keys)

Page 25: Student-Subject data in HBase

Student table, row key 1:
  info      info:name=Gaurav, info:age=28, info:sex=Male
  subjects  subjects:1="elective", subjects:2="main"

Subject table, row key 1:
  info      info:title=HBase, info:introduction="HBase is cool", info:teacher_id=10
  students  students:1, students:2
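With this layout a student's subjects come from one row, with no join, but the relationship is stored twice and the application has to write both sides itself. A minimal in-memory sketch, assuming each table is a sorted map from row key to family:qualifier columns; WideTable, Enrollment, and EnrollmentDemo are hypothetical names, not HBase API calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for an HBase table: row key -> ("family:qualifier" -> value), all sorted.
class WideTable {
    final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    void put(String row, String family, String qualifier, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(family + ":" + qualifier, value);
    }

    // Qualifiers of one family in a row, e.g. all subject ids under "subjects:".
    List<String> qualifiers(String row, String family) {
        List<String> result = new ArrayList<>();
        NavigableMap<String, String> columns = rows.get(row);
        if (columns == null) {
            return result;
        }
        String prefix = family + ":";
        // Columns are sorted, so one family's columns form a contiguous run starting at the prefix.
        for (String column : columns.tailMap(prefix, true).keySet()) {
            if (!column.startsWith(prefix)) {
                break;
            }
            result.add(column.substring(prefix.length()));
        }
        return result;
    }
}

class Enrollment {
    final WideTable students = new WideTable();
    final WideTable subjects = new WideTable();

    // The m:n link is denormalized into both tables; nothing but this method keeps them in sync.
    void enroll(String studentId, String subjectId, String type) {
        students.put(studentId, "subjects", subjectId, type);
        subjects.put(subjectId, "students", studentId, "");
    }
}

class EnrollmentDemo {
    public static void main(String[] args) {
        Enrollment school = new Enrollment();
        school.students.put("1", "info", "name", "Gaurav");
        school.enroll("1", "1", "elective");
        school.enroll("1", "2", "main");
        System.out.println(school.students.qualifiers("1", "subjects")); // [1, 2]
        System.out.println(school.subjects.qualifiers("2", "students")); // [1]
    }
}
```

If the second put in enroll failed, the two tables would disagree; that is the referential-integrity work an RDBMS would otherwise do for you.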

Page 26: Column family attributes

Attribute    Possible values          Default
COMPRESSION  NONE, GZ, LZO            NONE
VERSIONS     1+                       3
TTL          1 to 2147483647 seconds  2147483647 (about 68 years, effectively forever)
BLOCKSIZE    1 byte to 2 GB           64 KB
IN_MEMORY    true, false              false
BLOCKCACHE   true, false              true
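VERSIONS and TTL together decide which versions of a cell stay visible. A minimal sketch of those two rules, assuming one cell's version timestamps (in milliseconds) are given newest first and TTL is in seconds as in the table; CellVersionPolicy and VersionPolicyDemo are hypothetical. It models only the visibility rule: HBase hides such versions when reading and physically drops them during compactions.

```java
import java.util.ArrayList;
import java.util.List;

class CellVersionPolicy {
    // Hypothetical helper: from one cell's version timestamps (ms, newest first), keep at most
    // maxVersions versions and skip any version older than ttlSeconds relative to nowMillis.
    static List<Long> visibleVersions(List<Long> timestampsNewestFirst, int maxVersions,
                                      int ttlSeconds, long nowMillis) {
        List<Long> kept = new ArrayList<>();
        for (long ts : timestampsNewestFirst) {
            if (kept.size() >= maxVersions) {
                break;                               // VERSIONS: only the newest N are kept
            }
            if (nowMillis - ts > ttlSeconds * 1000L) {
                continue;                            // TTL: this version has expired
            }
            kept.add(ts);
        }
        return kept;
    }
}

class VersionPolicyDemo {
    public static void main(String[] args) {
        List<Long> versions = List.of(10_000L, 9_000L, 8_000L, 7_000L);
        // Defaults from the table: VERSIONS = 3, TTL = 2147483647 seconds.
        System.out.println(CellVersionPolicy.visibleVersions(versions, 3, 2147483647, 10_000L)); // [10000, 9000, 8000]
        // A 2-second TTL evaluated at t = 10500 ms also hides the 8000 ms version (2500 ms old).
        System.out.println(CellVersionPolicy.visibleVersions(versions, 3, 2, 10_500L));          // [10000, 9000]
    }
}
```

With the defaults, only the version limit matters in practice, since the default TTL is roughly 68 years.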

Page 27: Regions

Each table is partitioned into regions.

Region: a contiguous set of lexicographically sorted rows.

A region is split when it grows beyond hbase.hregion.max.filesize (default: 256 MB).

Regions are hosted by region servers.
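Because each region holds a contiguous, lexicographically sorted range of rows, finding the region for a row key is a floor lookup on the regions' start keys. A minimal sketch with hypothetical region server names (RegionLocator and RegionLocatorDemo are also hypothetical); in HBase the client resolves this through the .META. catalog table, which is not modeled here. Java's String ordering matches HBase's byte-wise ordering for ASCII keys.

```java
import java.util.Map;
import java.util.TreeMap;

class RegionLocator {
    // Region start key -> name of the region server hosting that region (hypothetical names).
    // The first region starts at the empty key, so every row key falls into some region.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // The hosting region is the one with the greatest start key <= rowKey.
    String serverFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }
}

class RegionLocatorDemo {
    public static void main(String[] args) {
        RegionLocator locator = new RegionLocator();
        locator.addRegion("", "regionserver-a");        // first region: everything before row201
        locator.addRegion("row201", "regionserver-b");  // second region: row201 onwards
        System.out.println(locator.serverFor("row150"));  // regionserver-a
        System.out.println(locator.serverFor("row300"));  // regionserver-b
        System.out.println(locator.serverFor("row1000")); // regionserver-a ("row1000" < "row201")
    }
}
```

The last lookup is the lexicographic trap: "row1000" sorts before "row201", so numeric row keys are often zero-padded to keep them in numeric order.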

Page 28

Regions and

[Diagram: regions covering row1 to row200 and row201 to row500; a new row arrives]

Page 29

Regions and

[Diagram: the region has split; row ranges labeled row1, row200, row201, row350, row351, row501]
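The split in these diagrams can be sketched as: when a region grows past its limit, it is cut at roughly its middle row key, and the upper half becomes a new region starting at that key. This minimal sketch assumes a row-count limit as a stand-in for hbase.hregion.max.filesize, which is a size in bytes; SplittingTable and SplitDemo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class SplittingTable {
    // Hypothetical stand-in for hbase.hregion.max.filesize: split once a region holds more rows than this.
    private final int maxRowsPerRegion;
    // Region start key -> sorted row keys stored in that region. The first region starts at "".
    private final TreeMap<String, TreeSet<String>> regions = new TreeMap<>();

    SplittingTable(int maxRowsPerRegion) {
        this.maxRowsPerRegion = maxRowsPerRegion;
        regions.put("", new TreeSet<>());   // a new table starts with a single region
    }

    void put(String rowKey) {
        Map.Entry<String, TreeSet<String>> region = regions.floorEntry(rowKey);
        TreeSet<String> rows = region.getValue();
        rows.add(rowKey);
        if (rows.size() > maxRowsPerRegion) {
            split(rows);
        }
    }

    // Cut at the middle row key: rows >= midKey move to a new daughter region starting at midKey.
    private void split(TreeSet<String> rows) {
        String midKey = new ArrayList<>(rows).get(rows.size() / 2);
        TreeSet<String> upper = new TreeSet<>(rows.tailSet(midKey, true));
        rows.removeAll(upper);
        regions.put(midKey, upper);
    }

    List<String> regionStartKeys() {
        return new ArrayList<>(regions.keySet());
    }

    List<String> rowsInRegion(String startKey) {
        return new ArrayList<>(regions.getOrDefault(startKey, new TreeSet<>()));
    }
}

class SplitDemo {
    public static void main(String[] args) {
        SplittingTable table = new SplittingTable(4);
        for (String row : new String[] {"row1", "row2", "row3", "row4", "row5"}) {
            table.put(row);
        }
        System.out.println(table.regionStartKeys());    // [, row3]
        System.out.println(table.rowsInRegion("row3")); // [row3, row4, row5]
    }
}
```

The two daughters cover exactly the parent's key range, so lookups by start key keep working after the split.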

Page 30: Architecture of HBase

[Diagram: Master, ZooKeeper, RegionServers, HDFS, MapReduce]


Page 32: HBase interface: Java API, Thrift...

Page 33: HBase interface

Java

Thrift (Ruby, PHP, Python, Perl, C++, ...)

REST

Groovy DSL

MapReduce

HBase shell

Page 34: HBase interface: Java

Get, Put, Delete, Scan, incrementColumnValue
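To show what each operation does without a running cluster, here is a hypothetical in-memory stand-in exposing the same five operations. It is not org.apache.hadoop.hbase.client.HTable: the real client works with byte[] keys and Get, Put, Delete, and Scan objects, and incrementColumnValue is executed atomically on the region server (synchronized methods play that role here). As in HBase, the scan's stop row is exclusive. InMemoryTable and TableOpsDemo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in mirroring the shape of HBase's basic operations.
class InMemoryTable {
    // row key -> (column "family:qualifier" -> value); both levels sorted.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Put: write one cell.
    synchronized void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Get: read one cell (null if absent).
    synchronized String get(String row, String column) {
        NavigableMap<String, String> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    // Delete: remove a whole row.
    synchronized void delete(String row) {
        rows.remove(row);
    }

    // Scan: row keys in [startRow, stopRow), in sorted order.
    synchronized List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    // incrementColumnValue: add to a counter and return the new value. HBase stores counters
    // as 8-byte longs; a decimal string is used here only for readability.
    synchronized long incrementColumnValue(String row, String column, long amount) {
        String current = get(row, column);
        long updated = (current == null ? 0L : Long.parseLong(current)) + amount;
        put(row, column, Long.toString(updated));
        return updated;
    }
}

class TableOpsDemo {
    public static void main(String[] args) {
        InMemoryTable students = new InMemoryTable();
        students.put("1", "info:name", "Gaurav");
        students.put("2", "info:name", "Harsh");
        students.put("3", "info:name", "Raman");
        System.out.println(students.get("1", "info:name")); // Gaurav
        System.out.println(students.scan("1", "3"));         // [1, 2]
        students.delete("2");
        System.out.println(students.scan("1", "4"));         // [1, 3]
    }
}
```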


Page 36: Conclusion

HBase vs. RDBMS: HBase is not a replacement; it solves only a small subset (~5%) of problems.

Page 37

Where SQL makes life easy: joining, secondary indexing, referential integrity (updates), ACID

Where HBase makes life easy: dataset scale, read/write scale, replication, batch analysis

