HBase CRUD Java API

Post on 18-Jul-2015


HBase CRUD

Use the Java API for Create, Read, Update, and Delete operations

Agenda

• Intro
• Create
• Insert
• Update
• Delete
• Read – Table Scan
• Read – Get Field
• Conclusions

Intro

A rowkey uniquely identifies each row in an HBase table, whereas other keys such as column family, timestamp, and so on are used to locate a piece of data within that table. The HBase API provides the following operations to support CRUD:

• Put
• Get
• Delete
• Scan
• Increment

You can find the source code for this presentation on GitHub: https://github.com/EugeneYushin/HBase-CRUD

Create

The table is created in the ‘Enabled’ state. Check that it was created in Hue (Cloudera CDH 5.1.0) and in the hbase shell.

Insert

Use HConnection.getTable() instead of HTablePool, as the latter is deprecated in 0.94 and 0.95/0.96, and removed in 0.98.

Insert

All table manipulations go through HTableInterface. An HTable instance represents a particular table in HBase.

The HTable class is not thread-safe, because concurrent modifications are not safe. Hence, each thread in an application should use its own HTable instance. Multiple HTable instances that share the same configuration reference can use the same underlying HConnection instance.

The RowKey is the main point to consider when designing the table structure. Because HBase stores data sorted by row key, use a compound RowKey with a hashed part (SHA-1 or MD5) and an additional reverse-timestamp part, so that rows written one after another do not all land next to each other.
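As a rough illustration of such a compound key, the sketch below concatenates the MD5 hash of an identifier with a reverse timestamp. The class name `RowKeys`, the `userId` parameter, and the exact layout (16 hash bytes followed by 8 timestamp bytes) are assumptions made for this example and are not taken from the presentation's source code. Only JDK classes are used; the resulting byte array is what would be passed to a `Put` or `Get` as the row.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for this example; not part of the HBase API.
final class RowKeys {
    private RowKeys() {}

    // Builds a 24-byte key: 16-byte MD5 of the id, then an 8-byte reverse timestamp.
    static byte[] compoundKey(String userId, long timestampMillis) {
        byte[] hash;
        try {
            hash = MessageDigest.getInstance("MD5")
                    .digest(userId.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
        // Newer timestamps give smaller values, so they sort first.
        long reverseTs = Long.MAX_VALUE - timestampMillis;
        return ByteBuffer.allocate(hash.length + Long.BYTES)
                .put(hash)
                .putLong(reverseTs) // big-endian, so byte order matches numeric order
                .array();
    }

    // Unsigned lexicographic comparison, the order in which HBase sorts row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

With this layout, all keys for one id share the same 16-byte prefix, and within that prefix the most recent entry sorts first. Whether that ordering is desirable depends on the read pattern of the application.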

Update

Data in HBase is versioned. By default, the last 3 values are stored per column (this was the default before HBase 0.96; later releases keep 1). Use the HColumnDescriptor.setMaxVersions(n) method to override this value.

Delete

After the delete, the value of the “user_name” qualifier reverts to its previous version.
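The versioning behaviour described under Update and Delete can be illustrated with a toy in-memory model of a single cell. This is only a sketch: `VersionedCell` and its methods are hypothetical names invented here, and real HBase handles versions differently under the hood (old versions are cleaned up during compactions and deletes are recorded as markers). In the 0.9x client API, `Delete.deleteColumn(family, qualifier)` is the call that removes only the latest version of a column.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Toy model of one HBase cell; hypothetical, not HBase code.
final class VersionedCell {
    private final int maxVersions;
    // timestamp -> value, ordered with the newest timestamp first
    private final TreeMap<Long, String> versions =
            new TreeMap<Long, String>(Comparator.<Long>reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // An "update" is simply another put with a newer timestamp.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // drop the oldest version
        }
    }

    // A read returns the newest version, or null if the cell is empty.
    String get() {
        Map.Entry<Long, String> newest = versions.firstEntry();
        return newest == null ? null : newest.getValue();
    }

    // Removes only the newest version, so the previous one becomes visible again.
    void deleteLatest() {
        versions.pollFirstEntry();
    }

    int versionCount() {
        return versions.size();
    }
}
```

For example, after putting "Mike" and then "Michael" into a cell that keeps 3 versions, `get()` returns "Michael"; after `deleteLatest()` it returns "Mike" again.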

Read – Table Scan

Table Scan...
PaulRK Paul paul01@mail.com

Read – Get Field

Get particular Field...
rowKey = MikeRK, user_name: Mike
rowKey = MikeRK, user_mail: mike@mail.com

Conclusions

• HTable is expensive

Creating HTable instances comes at a cost. It is a slow process, because the creation of each HTable instance involves scanning the .META. table to check whether the table actually exists. Hence, creating a new HTable instance for each request is not recommended when the number of concurrent requests is very high.

• Scan caching

A scan can be configured to retrieve a batch of rows in every RPC call it makes to HBase. This can be configured per scanner by using the setCaching(int) API on the scan object. It can also be set in the hbase-site.xml configuration file using the hbase.client.scanner.caching property.
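To get a feel for the effect of the caching value, the following back-of-the-envelope sketch estimates how many row-fetching RPCs a scan needs. The helper `ScanCachingMath.fetchRpcs` is a hypothetical name for this example, and the estimate deliberately ignores the calls needed to open and close the scanner.

```java
// Hypothetical helper for a rough estimate; not part of the HBase API.
final class ScanCachingMath {
    private ScanCachingMath() {}

    // Number of RPCs that carry rows when `rows` rows are fetched `caching` at a time.
    static long fetchRpcs(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching must be > 0");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }
}
```

Under these assumptions, scanning 1000 rows takes 1000 fetches with a caching value of 1 and 10 fetches with a caching value of 100. A larger value trades fewer round trips for more memory used per call on both client and server.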

• Increment

Increment Column Value (ICV) is exposed both as the Increment command object, like the other operations, and as a method on HTableInterface. This command lets you change an integral value stored in an HBase cell without reading it back first. The data manipulation happens in HBase, not in your client application, which makes it fast. It also avoids a possible race condition when some other client is interacting with the same cell.
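The race condition mentioned here is the classic lost update. The sketch below reproduces it deterministically with a toy counter cell, without any HBase dependency: `CounterCell` and `IncrementDemo` are hypothetical names, and `AtomicLong.addAndGet` merely stands in for an increment that is applied atomically on the server. In the real client, the corresponding call on HTableInterface is `incrementColumnValue(row, family, qualifier, amount)`.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy stand-in for a counter stored in one cell; hypothetical, not HBase code.
final class CounterCell {
    private final AtomicLong value;

    CounterCell(long initial) {
        this.value = new AtomicLong(initial);
    }

    long read() {
        return value.get();
    }

    void write(long newValue) {
        value.set(newValue);
    }

    // Models an increment applied where the data lives, as one atomic step.
    long increment(long amount) {
        return value.addAndGet(amount);
    }
}

final class IncrementDemo {
    private IncrementDemo() {}

    // Two clients read the same value, then both write back "value + 1".
    static long clientSideUpdate() {
        CounterCell cell = new CounterCell(5);
        long seenByA = cell.read();
        long seenByB = cell.read(); // B reads before A has written
        cell.write(seenByA + 1);
        cell.write(seenByB + 1);    // overwrites A's update
        return cell.read();         // 6: one increment was lost
    }

    // The same two updates expressed as atomic increments.
    static long atomicIncrement() {
        CounterCell cell = new CounterCell(5);
        cell.increment(1);
        cell.increment(1);
        return cell.read();         // 7: both increments are applied
    }
}
```

The interleaving in `clientSideUpdate` is fixed by hand to make the outcome reproducible; with real concurrent clients it would only happen some of the time, which is what makes the bug hard to spot.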

• Filter

A filter is a predicate that executes in HBase instead of on the client. When you specify a Filter in your Scan, HBase uses it to determine whether a record should be returned. This can avoid a lot of unnecessary data transfer. It also keeps the filtering on the server instead of placing that burden on the client. The filter applied can be anything implementing the org.apache.hadoop.hbase.filter.Filter interface. HBase provides a number of filters, but it’s easy to implement your own.
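The idea of evaluating the predicate where the data is stored can be sketched with plain Java collections. Here `FilterDemo.scan` is a hypothetical stand-in for the server side of a scan, and a `java.util.function.Predicate` plays the role of the filter; it is not the HBase Filter interface. For row-key prefixes specifically, HBase ships a ready-made `PrefixFilter` in the `org.apache.hadoop.hbase.filter` package.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of a filtered scan; hypothetical, not HBase code.
final class FilterDemo {
    private FilterDemo() {}

    // Plays the server: only rows that pass the filter are "sent" to the client.
    static List<String> scan(List<String> storedRows, Predicate<String> filter) {
        List<String> sent = new ArrayList<>();
        for (String row : storedRows) {
            if (filter.test(row)) {
                sent.add(row);
            }
        }
        return sent;
    }
}
```

With rows "MikeRK", "PaulRK" and "MaryRK" stored, a scan with the predicate `row -> row.startsWith("M")` transfers two rows, whereas a scan without a filter transfers all three and leaves the selection to the client.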

Thank you

ushin.evgenij
https://www.linkedin.com/in/yushyn
