HBase Client APIs (for webapps?)

HBase Client API (for webapps?), Nick Dimiduk, Seattle Scalability Meetup, 2013-03-27

Upload: nick-dimiduk

Post on 09-May-2015

DESCRIPTION

This talk examines the client options available to application developers working with HBase. The focus is on building webapps, though the material is not limited to them.

TRANSCRIPT

Page 1: HBase Client APIs (for webapps?)

HBase Client API (for webapps?)

Nick Dimiduk

Seattle Scalability Meetup

2013-03-27

Page 2: HBase Client APIs (for webapps?)

Page 3: HBase Client APIs (for webapps?)

Page 4: HBase Client APIs (for webapps?)

What are my choices?

switch (technology) {
  case ‘ ’: ...
  case ‘ ’: ...
  case ‘ ’: ...
}

Page 5: HBase Client APIs (for webapps?)

Apache HBase

Page 6: HBase Client APIs (for webapps?)

Java client Interfaces

• Configuration holds details on where to find the cluster, plus tunable settings. Roughly equivalent to a JDBC connection string.

• HConnection represents connections to the cluster.

• HBaseAdmin handles DDL operations (create, list, drop, alter, &c.).

• HTablePool is a connection pool for table handles.

• HTable (HTableInterface) is a handle on a single HBase table. Send "commands" to the table (Put, Get, Scan, Delete, Increment).

Page 7: HBase Client APIs (for webapps?)

Java client Example

public static final byte[] TABLE_NAME = Bytes.toBytes("twits");
public static final byte[] TWITS_FAM = Bytes.toBytes("twits");

public static final byte[] USER_COL = Bytes.toBytes("user");
public static final byte[] TWIT_COL = Bytes.toBytes("twit");

private HTablePool pool = new HTablePool();

https://github.com/hbaseinaction/twitbase/blob/master/src/main/java/HBaseIA/TwitBase/hbase/TwitsDAO.java#L23-L30

Page 8: HBase Client APIs (for webapps?)

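Java client Example

private static class Twit {

  // Build a Twit from a row: the row key holds an MD5 prefix followed by the timestamp bytes.
  private Twit(Result r) {
    this(
        r.getColumnLatest(TWITS_FAM, USER_COL).getValue(),
        Arrays.copyOfRange(r.getRow(), Md5Utils.MD5_LENGTH, Md5Utils.MD5_LENGTH + longLength),
        r.getColumnLatest(TWITS_FAM, TWIT_COL).getValue());
  }

  // Decode the raw bytes; the stored timestamp is negated, so multiply by -1 to restore it.
  private Twit(byte[] user, byte[] dt, byte[] text) {
    this(
        Bytes.toString(user),
        new DateTime(-1 * Bytes.toLong(dt)),
        Bytes.toString(text));
  }

  // ... the (String, DateTime, String) constructor is not shown on the slide
}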

https://github.com/hbaseinaction/twitbase/blob/master/src/main/java/HBaseIA/TwitBase/hbase/TwitsDAO.java#L129-L143
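
The constructor above implies a row-key layout: an MD5 hash of the user name followed by an 8-byte long holding the negated timestamp. What follows is a minimal Python sketch of that layout, not the TwitBase code itself. The helper names `mk_row_key` and `split_row_key` are hypothetical, and the sketch assumes that `Bytes.toBytes(long)` writes big-endian two's-complement bytes and that the key is built hash first, timestamp second, which the slide suggests but does not show.

```python
import hashlib
import struct

MD5_LENGTH = 16   # bytes in an MD5 digest
LONG_LENGTH = 8   # bytes in a Java long


def mk_row_key(user, millis):
    """Hypothetical helper: md5(user) followed by the negated timestamp."""
    user_hash = hashlib.md5(user.encode("utf-8")).digest()
    # ">q" packs a signed 64-bit integer big-endian
    # (assumed to match what Bytes.toBytes(long) produces).
    return user_hash + struct.pack(">q", -1 * millis)


def split_row_key(row_key):
    """Recover (user_hash, millis), mirroring the copyOfRange + toLong steps."""
    user_hash = row_key[:MD5_LENGTH]
    (negated,) = struct.unpack(">q", row_key[MD5_LENGTH:MD5_LENGTH + LONG_LENGTH])
    return user_hash, -1 * negated


older = mk_row_key("TheRealMT", 1338701491422)
newer = mk_row_key("TheRealMT", 1338701491423)
```

With positive timestamps, negation makes a newer twit's key compare lower than an older one for the same user, so a scan over that user's rows would return the most recent twits first.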

Page 9: HBase Client APIs (for webapps?)

Java client Example

private static Get mkGet(String user, DateTime dt) {
  Get g = new Get(mkRowKey(user, dt));
  g.addColumn(TWITS_FAM, USER_COL);
  g.addColumn(TWITS_FAM, TWIT_COL);
  return g;
}

https://github.com/hbaseinaction/twitbase/blob/master/src/main/java/HBaseIA/TwitBase/hbase/TwitsDAO.java#L60-L65

Page 10: HBase Client APIs (for webapps?)

Ruby, Python client Interface

Page 11: HBase Client APIs (for webapps?)

Ruby, Python client Interface

JRuby, Jython

: '(

Page 12: HBase Client APIs (for webapps?)

Thrift client Interface

1. Generate bindings

2. Run a “Gateway” between clients and cluster

3. ... profit? write code!

Page 13: HBase Client APIs (for webapps?)

Sidebar: Architecture Recap

[Diagram: HBase Clients -> HBase Cluster]

Page 14: HBase Client APIs (for webapps?)

Thrift Architecture

[Diagram: Thrift Clients -> Thrift Gateway -> HBase Cluster]

Page 15: HBase Client APIs (for webapps?)

Thrift client Interface

• Thrift gateway exposes a client to RegionServers

• stateless :D

• ... except for scanners :'(

Page 16: HBase Client APIs (for webapps?)

Thrift client Example

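# Imports assume the Thrift Python library plus bindings generated into an "hbase" package.
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

transport = TSocket.TSocket(host, port)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
transport.open()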

https://github.com/hbaseinaction/twitbase.py/blob/master/TwitBase.py#L17-L21

Page 17: HBase Client APIs (for webapps?)

Thrift client Example

# Body of a generator function: yields one user per scanned row.
columns = ['info:user', 'info:name', 'info:email']
scanner = client.scannerOpen('users', '', columns)
row = client.scannerGet(scanner)
while row:
    yield user_from_row(row[0])
    row = client.scannerGet(scanner)
client.scannerClose(scanner)

https://github.com/hbaseinaction/twitbase.py/blob/master/TwitBase.py#L33-L39
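
Scanners are the one piece of state the gateway holds on a client's behalf, and in the loop above `scannerClose` only runs if the caller drains the generator. Below is a minimal sketch of one way to guard against that with `try`/`finally`. `FakeClient` is a hypothetical stand-in so the example runs without a gateway; it only mimics the three scanner calls used on the slide and is not the generated `Hbase.Client`.

```python
class FakeClient:
    """Hypothetical stand-in for the Thrift client; not the real Hbase.Client."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pending = []
        self._next_id = 0
        self.open_scanners = set()

    def scannerOpen(self, table, start_row, columns):
        self._next_id += 1
        self.open_scanners.add(self._next_id)
        self._pending = list(self._rows)
        return self._next_id

    def scannerGet(self, scanner_id):
        # Mirrors the slide's usage: a one-element list per row, empty when done.
        return [self._pending.pop(0)] if self._pending else []

    def scannerClose(self, scanner_id):
        self.open_scanners.discard(scanner_id)


def scan_rows(client, table, columns):
    """Yield rows, releasing the gateway-side scanner even on early exit."""
    scanner = client.scannerOpen(table, '', columns)
    try:
        row = client.scannerGet(scanner)
        while row:
            yield row[0]
            row = client.scannerGet(scanner)
    finally:
        client.scannerClose(scanner)


client = FakeClient(["row-a", "row-b", "row-c"])
rows = scan_rows(client, 'users', ['info:user'])
first = next(rows)   # take one row and stop early
rows.close()         # closing the generator runs the finally block
```

The same `finally` also runs when the consumer raises mid-iteration and the generator is later closed or garbage-collected, which is the case a webapp request handler is most likely to hit.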

Page 18: HBase Client APIs (for webapps?)

Thrift client Example

def user_from_row(row):
    user = {}
    for col, cell in row.columns.items():
        # col looks like 'info:name'; col[5:] drops the 'info:' family prefix
        user[col[5:]] = cell.value
    return "<User: {user}, {name}, {email}>".format(**user)

https://github.com/hbaseinaction/twitbase.py/blob/master/TwitBase.py#L26-L30

Page 19: HBase Client APIs (for webapps?)

REST client Interface

1. Stand up a "REST Gateway" between your application and the cluster

2. HTTP verbs translate (roughly) into table commands

3. Decent support for basic DDL and HTable operations
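
To make the verb-to-command idea concrete, here is a small sketch that builds row URLs in the `/table/row[/family:qualifier]` shape used in the REST examples of this deck. The function name `row_url` and the `VERB_TO_COMMAND` table are illustrative assumptions; the mapping is only the rough correspondence the list above describes, so check the gateway documentation for your HBase version before relying on it.

```python
from urllib.parse import quote

# Rough correspondence only (an assumption for illustration, not an API reference).
VERB_TO_COMMAND = {
    "GET": "Get",
    "PUT": "Put",
    "POST": "Put",
    "DELETE": "Delete",
}


def row_url(base, table, row, column=None):
    """Build base/table/row[/family:qualifier], percent-encoding each segment."""
    parts = [quote(table, safe=""), quote(row, safe="")]
    if column is not None:
        # Keep the ':' between family and qualifier readable.
        parts.append(quote(column, safe=":"))
    return base.rstrip("/") + "/" + "/".join(parts)


url = row_url("http://host:8080", "users", "TheRealMT", "info:email")
```

Encoding each segment separately matters because HBase row keys are arbitrary bytes and may contain `/`, which would otherwise change the path structure.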

Page 20: HBase Client APIs (for webapps?)

REST Architecture

[Diagram: REST Clients -> REST Gateway -> HBase Cluster]

Page 21: HBase Client APIs (for webapps?)

REST client Interface

• REST gateway exposes a client to RegionServers

• stateless :D

• ... except for scanners :'(

Page 22: HBase Client APIs (for webapps?)

REST client Example

$ curl -H "Accept: application/json" http://host:port/
{
  "table": [
    { "name": "followers" },
    { "name": "twits" },
    { "name": "users" }
  ]
}

Page 23: HBase Client APIs (for webapps?)

REST client Example

$ curl -H ... http://host:port/table/row[/family:qualifier]
{
  "Row": [
    {
      "key": "VGhlUmVhbE1U",
      "Cell": [
        {
          "$": "c2FtdWVsQGNsZW1lbnMub3Jn",
          "column": "aW5mbzplbWFpbA==",
          "timestamp": 1338701491422
        },
        {
          "$": "TWFyayBUd2Fpbg==",
          "column": "aW5mbzpuYW1l",
          "timestamp": 1338701491422
        }
      ]
    }
  ]
}
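
The row key, column names, and cell values in the response above are base64-encoded, so a client has to decode them before use. A minimal sketch of that step follows; the helper names `b64` and `decode_rows` are hypothetical, and the sketch assumes the stored bytes are UTF-8 text, which is true of this sample but not of HBase values in general.

```python
import base64
import json

# The sample response from the slide, written as valid JSON.
response_body = """
{ "Row": [ { "key": "VGhlUmVhbE1U", "Cell": [
    { "$": "c2FtdWVsQGNsZW1lbnMub3Jn", "column": "aW5mbzplbWFpbA==",
      "timestamp": 1338701491422 },
    { "$": "TWFyayBUd2Fpbg==", "column": "aW5mbzpuYW1l",
      "timestamp": 1338701491422 }
] } ] }
"""


def b64(text):
    """Decode a base64 string, assuming the underlying bytes are UTF-8 text."""
    return base64.b64decode(text).decode("utf-8")


def decode_rows(body):
    """Return {row_key: {column: value}} with every field decoded."""
    rows = {}
    for row in json.loads(body)["Row"]:
        cells = {b64(cell["column"]): b64(cell["$"]) for cell in row["Cell"]}
        rows[b64(row["key"])] = cells
    return rows


decoded = decode_rows(response_body)
# decoded == {"TheRealMT": {"info:email": "samuel@clemens.org",
#                           "info:name": "Mark Twain"}}
```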

Page 24: HBase Client APIs (for webapps?)

REST client Example

<Rows>
  <Row key="VGhlUmVhbE1U">
    <Cells>
      <Cell column="aW5mbzplbWFpbA==" timestamp="1338701491422">
        c2FtdWVsQGNsZW1lbnMub3Jn
      </Cell>
      <Cell ...> ...
    </Cells>
  </Row>
</Rows>
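
The XML form carries the same base64-encoded fields as attributes and element text. Below is a minimal sketch of parsing it with the standard library, using the element names exactly as they appear on the slide; a real gateway's element names may differ by version, so treat the structure here as an assumption. The helper names are hypothetical, and the elided second cell is left out of the sample.

```python
import base64
import xml.etree.ElementTree as ET

# One complete cell from the slide; the elided "<Cell ...>" is omitted.
xml_body = """
<Rows>
  <Row key="VGhlUmVhbE1U">
    <Cells>
      <Cell column="aW5mbzplbWFpbA==" timestamp="1338701491422">
        c2FtdWVsQGNsZW1lbnMub3Jn
      </Cell>
    </Cells>
  </Row>
</Rows>
"""


def b64_text(text):
    """Decode base64 after trimming layout whitespace; assumes UTF-8 content."""
    return base64.b64decode(text.strip()).decode("utf-8")


def decode_xml_rows(body):
    """Return {row_key: {column: (value, timestamp)}}."""
    rows = {}
    for row in ET.fromstring(body.strip()).iter("Row"):
        cells = {}
        for cell in row.iter("Cell"):
            value = b64_text(cell.text)
            cells[b64_text(cell.get("column"))] = (value, int(cell.get("timestamp")))
        rows[b64_text(row.get("key"))] = cells
    return rows


xml_rows = decode_xml_rows(xml_body)
```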

Page 25: HBase Client APIs (for webapps?)

Beyond Apache

Page 26: HBase Client APIs (for webapps?)

asynchbase

• Asynchronous non-blocking interface.

• Inspired by Twisted Python.

• Partial implementation of HTableInterface.

• HBaseClient provides entry-point to data.

https://github.com/OpenTSDB/asynchbase

http://tsunanet.net/~tsuna/asynchbase/api/org/hbase/async/HBaseClient.html

Page 27: HBase Client APIs (for webapps?)

asynchbase

[Diagram: each callback is a state. input => [this state] => output to [next state], or an Exception to [error state]]

[Diagram example: Boolean (Put response) => Interpret response => UpdateResult object, or UpdateFailedException]
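
The state diagram can be sketched without any library: each stage has a callback for the normal path and an optional errback for the error path, and an exception raised in one stage moves the value onto the error path of the next. This is a simplified illustration of the idea in plain Python, not asynchbase's `Deferred` and not Twisted; `run_chain` and the stage functions are hypothetical names, and everything runs synchronously.

```python
class UpdateFailed(Exception):
    """Stand-in for the UpdateFailedException shown in the diagram."""


def run_chain(value, stages):
    """Push a value through (callback, errback) pairs, switching paths on errors."""
    for callback, errback in stages:
        handler = errback if isinstance(value, Exception) else callback
        if handler is None:
            continue  # no handler for this path; pass the value along unchanged
        try:
            value = handler(value)
        except Exception as exc:
            value = exc  # next stage sees the error path
    return value


def interpret_response(ok):
    # Boolean Put response in, result out, or raise onto the error path.
    if not ok:
        raise UpdateFailed("compare-and-set did not apply")
    return "updated"


def result_to_message(result):
    return "success: " + result


def failure_to_message(exc):
    # Returning a normal value moves the chain back to the callback path.
    return "failure: " + str(exc)


outbox = []


def send_message(message):
    outbox.append(message)
    return message


stages = [
    (interpret_response, None),
    (result_to_message, failure_to_message),
    (send_message, None),
]

good = run_chain(True, stages)
bad = run_chain(False, stages)
```

Here both outcomes end up in `outbox`, because the second stage converts either path into a plain message before the final stage runs.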

Page 28: HBase Client APIs (for webapps?)

asynchbase Example

final Scanner scanner = client.newScanner(TABLE_NAME);
scanner.setFamily(INFO_FAM);
scanner.setQualifier(PASSWORD_COL);

ArrayList<ArrayList<KeyValue>> rows = null;
ArrayList<Deferred<Boolean>> workers = new ArrayList<Deferred<Boolean>>();
while ((rows = scanner.nextRows(1).joinUninterruptibly()) != null) {
  for (ArrayList<KeyValue> row : rows) {
    KeyValue kv = row.get(0);
    byte[] expected = kv.value();
    String userId = new String(kv.key());
    PutRequest put = new PutRequest(
        TABLE_NAME, kv.key(), kv.family(),
        kv.qualifier(), mkNewPassword(expected));
    Deferred<Boolean> d = client.compareAndSet(put, expected)
        .addCallback(new InterpretResponse(userId))
        .addCallbacks(new ResultToMessage(), new FailureToMessage())
        .addCallback(new SendMessage());
    workers.add(d);
  }
}

https://github.com/hbaseinaction/twitbase-async/blob/master/src/main/java/HBaseIA/TwitBase/AsyncUsersTool.java#L151-L173

Page 29: HBase Client APIs (for webapps?)

Others

[Diagram: projects arranged between two labels, "Reduce day-to-day developer pain" and "Full-blown schema management". Projects shown: Spring-Data Hadoop, [Orderly], Phoenix, Kiji.org]

https://github.com/ndimiduk/orderly

http://www.springsource.org/spring-data/

https://github.com/forcedotcom/phoenix

http://www.kiji.org/

Page 30: HBase Client APIs (for webapps?)

Apache Futures

• Protobuf wire messages (0.96)

• C client (TBD, HBASE-1015)

• HBase Types (TBD, HBASE-8089)

Page 31: HBase Client APIs (for webapps?)

So, Webapps?

http://www.amazon.com/Back-Point-Rapiers/dp/B0000271GC

Page 32: HBase Client APIs (for webapps?)

Software Architecture

• Isolate DAO from app logic, separation of concerns, &c.

• Separate environment configs from code.

• Watch out for resource contention.
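
The first two points can be sketched together: application logic talks to a small DAO interface, and the gateway location comes from the environment rather than from code. Everything named here is hypothetical (`HBASE_GATEWAY_HOST`, `HBASE_GATEWAY_PORT`, `InMemoryUsersDAO`, `greeting`), and the default port is only a placeholder; a Thrift- or REST-backed DAO would expose the same methods.

```python
import os


def load_gateway_config(env=None):
    """Read gateway settings from the environment, with local-dev defaults."""
    env = os.environ if env is None else env
    return {
        "host": env.get("HBASE_GATEWAY_HOST", "localhost"),
        "port": int(env.get("HBASE_GATEWAY_PORT", "9090")),  # placeholder default
    }


class InMemoryUsersDAO:
    """Hypothetical DAO used for tests; an HBase-backed class would match its methods."""

    def __init__(self):
        self._users = {}

    def add_user(self, user, name, email):
        self._users[user] = {"user": user, "name": name, "email": email}

    def get_user(self, user):
        return self._users.get(user)


def greeting(dao, user):
    """App logic: depends only on the DAO's methods, never on HBase directly."""
    record = dao.get_user(user)
    return "Hello, {name}!".format(**record) if record else "Who?"


dao = InMemoryUsersDAO()
dao.add_user("TheRealMT", "Mark Twain", "samuel@clemens.org")
config = load_gateway_config({"HBASE_GATEWAY_HOST": "gw.example", "HBASE_GATEWAY_PORT": "8080"})
```

Swapping the in-memory DAO for a gateway-backed one then changes one constructor call, not the request handlers.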

Page 33: HBase Client APIs (for webapps?)

Deployment Architecture

• Cache everywhere.

• Know your component layers.

Page 34: HBase Client APIs (for webapps?)

HBase Warts

• Know thy (HBase) version: 0.{92,94,96}!

• Long-running client bug (HBASE-4805).

• Gateway APIs are only as up to date as the people before you required.

• The REST API is particularly unpleasant for “Web2.0” folk.

Page 35: HBase Client APIs (for webapps?)

Thanks!

Nick Dimiduk github.com/ndimiduk @xefyr n10k.com

[Book cover: HBase in Action (Manning), by Nick Dimiduk and Amandeep Khurana, foreword by Michael Stack]

hbaseinaction.com
