hbase: introduction to column oriented databases

15
HBase Introduction to column oriented databases Luís Cipriani @lfcipriani (twitter, linkedin, github, ...) 22o. GURU (2012-02-25) - Sao Paulo/Brazil sexta-feira, 24 de fevereiro de 12

Upload: luis-cipriani

Post on 20-Jun-2015

12.036 views

Category:

Technology


3 download

DESCRIPTION

Big Data is getting more attention each day, followed by new storage paradigms. This presentation shows a fast intro to HBase, a column oriented database used by Facebook and other big players to store and extract knowledge of high volume of data.

TRANSCRIPT

Page 1: Hbase: Introduction to column oriented databases

HBaseIntroduction to column oriented databases

Luís Cipriani @lfcipriani (twitter, linkedin, github, ...)22o. GURU (2012-02-25) - Sao Paulo/Brazil

sexta-feira, 24 de fevereiro de 12

Page 2: Hbase: Introduction to column oriented databases

ME

sexta-feira, 24 de fevereiro de 12

Page 3: Hbase: Introduction to column oriented databases

intro

“A BigTable HBase is a sparse, distributed, persistent

multidimensional sorted map”

http://research.google.com/archive/bigtable.html

sexta-feira, 24 de fevereiro de 12

Page 4: Hbase: Introduction to column oriented databases

intro > data model{ <-- table // ... "aaaaa" : { <-- row "A" : { <-- column family "foo" : { <-- column (qualifier)

15: "y", <-- timestamp, value 4: "m" } "bar" : {...} }, "B" : { "" : {...} } }, "aaaab" : { "A" : { "foo" : {...}, "bar" : {...}, "joe" : {...} }, "B" : { "" : {...} } }, // ...}

sexta-feira, 24 de fevereiro de 12

Page 5: Hbase: Introduction to column oriented databases

intro > data model

(Table, RowKey, Family, Column, Timestamp) → Value

sexta-feira, 24 de fevereiro de 12

Page 6: Hbase: Introduction to column oriented databases

intro > hadoop stack

• hadoop HDFS (or not)• hadoop MapReduce• hadoop ZooKeeper• hadoop HBase• hadoop Hue, Whirr, etc...

sexta-feira, 24 de fevereiro de 12

Page 7: Hbase: Introduction to column oriented databases

architecture

sexta-feira, 24 de fevereiro de 12

Page 8: Hbase: Introduction to column oriented databases

key design > read/write model

• randon reads (get)

• sequential reads (scan)• partial key scans

• writes (put = update)

sexta-feira, 24 de fevereiro de 12

Page 9: Hbase: Introduction to column oriented databases

key design > storage model

http://ofps.oreilly.com/titles/9781449396107/advanced.html

sexta-feira, 24 de fevereiro de 12

Page 10: Hbase: Introduction to column oriented databases

key design > strategies

• tall-narrow vs flat-wide

• partial key scans

• pagination

• time series• salting• field swap• randomization

• secondary indexes

sexta-feira, 24 de fevereiro de 12

Page 11: Hbase: Introduction to column oriented databases

key design > example

sexta-feira, 24 de fevereiro de 12

Page 12: Hbase: Introduction to column oriented databases

development

• installation modes• standalone, pseudo-distributed, distributed

• JRuby console

• Access• java/jruby API (more features)• entrypoints REST, Thrift, Avro, Protobuffers• there several other libs

sexta-feira, 24 de fevereiro de 12

Page 13: Hbase: Introduction to column oriented databases

cons

• complex config and maintenance• hot regions• no secondary index built-in• no transactions built-in• complex schema design

sexta-feira, 24 de fevereiro de 12

Page 14: Hbase: Introduction to column oriented databases

pros

• distributed• scalable (auto-sharding)• built on Hadoop stack• handles Big Data• high performance for write and read• no SPOF• fault tolerant, no data loss• active community

sexta-feira, 24 de fevereiro de 12

Page 15: Hbase: Introduction to column oriented databases

Reformulação Box de Login Abril ID

?

http://engineering.abril.com.br/

http://abr.io/hbase-intro

https://pinboard.in/u:lfcipriani/t:hbase/

http://hbase.apache.org/

http://shop.oreilly.com/product/0636920014348.do

sexta-feira, 24 de fevereiro de 12