HBase Coprocessor Introduction

DESCRIPTION: An introduction to the HBase Coprocessor framework, with some thoughts on where it can go.

HBase Coprocessor Intro.
Anty Rao, Schubert Zhang
Aug. 29, 2012
Motivation

• Distributed and parallel computation over data stored within HBase/Bigtable.
• Architecture: {HBase + MapReduce} vs. {HBase with Coprocessor}
  – {Loosely coupled} vs. {built in}.
  – For simple additive or aggregating operations such as summing and counting, pushing the computation down to the servers, where it can operate on the data directly without communication overhead, can give a dramatic performance improvement over HBase's already good scanning performance.
• To be a framework for flexible, generic extension and for distributed computation directly within the HBase server processes.
  – Arbitrary code can run at each tablet in each HBase server.
  – Provides a very flexible model for building distributed services.
  – Automatic scaling, load balancing, and request routing for applications.
Motivation (cont.)

• To be a data-driven distributed and parallel service platform.
  – Distributed parallel computation framework.
  – Distributed application service platform.
• High-level call interface for clients.
  – Calls are addressed to rows or ranges of rows, and the coprocessor client library resolves them to actual locations.
  – Calls across multiple rows are automatically split into multiple parallelized RPCs.
• Origin
  – Inspired by Google's Bigtable coprocessors.
  – Jeff Dean gave a talk at LADIS '09:
    http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf, pages 66-67.
HBase vs. Google Bigtable

• HBase Coprocessor is a framework that provides a library and runtime environment for executing user code within the HBase region server and master processes.
• Google's coprocessors, in contrast, run co-located with the tablet server but outside of its address space.
  – https://issues.apache.org/jira/browse/HBASE-4047
Google’s Bigtable Coprocessors
Overview of HBase Coprocessor

• Two scopes
  – System: loaded globally on all tables and regions.
  – Per-table: loaded on all regions of a table.
• Two types
  – Observers
    • Like triggers in conventional databases.
    • The idea behind observers is that we can insert user code by overriding upcall methods provided by the coprocessor framework. The callback functions are executed from core HBase code when certain events occur.
  – Endpoints
    • Dynamic RPC endpoints that resemble stored procedures.
    • One can invoke an endpoint at any time from the client. The endpoint implementation will then be executed remotely at the target region or regions, and results from those executions will be returned to the client.
• Difference between the two types
  – Only endpoints return results to the client.
Observers

• Currently, three observer interfaces are provided:
  – RegionObserver
    • Provides hooks for data manipulation events: Get, Put, Delete, Scan, and so on. There is an instance of a RegionObserver coprocessor for every table region, and the scope of the observations it can make is constrained to that region.
  – WALObserver
    • Provides hooks for write-ahead log (WAL) related operations. This is a way to observe or intercept WAL writing and reconstruction events. A WALObserver runs in the context of WAL processing; there is one such context per region server.
  – MasterObserver
    • Provides hooks for DDL-type operations, i.e., create, delete, and modify table. The MasterObserver runs within the context of the HBase master.
• Multiple observers are chained to execute sequentially, in order of their assigned priorities.
Observers: Example
Observers: Example Code

```java
package org.apache.hadoop.hbase.coprocessor;

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;

// Sample access-control coprocessor. It utilizes RegionObserver
// and intercepts the preXXX() methods to check user privileges for
// the given table and column family.
public class AccessControlCoprocessor extends BaseRegionObserver {
  @Override
  public void preGet(final ObserverContext<RegionCoprocessorEnvironment> c,
      final Get get, final List<KeyValue> result) throws IOException {
    // Check permissions.
    if (!permissionGranted()) {
      throw new AccessDeniedException("User is not allowed to access.");
    }
  }

  // Override prePut(), preDelete(), etc.
}
```
Endpoint

• Resembles stored procedures.
• Invoke an endpoint at any time from the client.
• The endpoint implementation will then be executed remotely at the target region or regions.
• Results from those executions will be returned to the client.
• Code implementation
  – Endpoint is an interface for dynamic RPC extension.
Endpoints: How to implement a custom Coprocessor?

• Define a new protocol interface which extends CoprocessorProtocol.
• Implement the Endpoint interface and the new protocol interface. The implementation will be loaded into and executed from the region context.
  – Extend the abstract class BaseEndpointCoprocessor. This convenience class hides some internal details that the implementer need not be concerned about, such as coprocessor class loading.
• On the client side, endpoints can be invoked via two new HBase client APIs:
  – Executing against a single region:
    • HTableInterface.coprocessorProxy(Class<T> protocol, byte[] row)
  – Executing against a range of regions:
    • HTableInterface.coprocessorExec(Class<T> protocol, byte[] startKey, byte[] endKey, Batch.Call<T,R> callable)
Endpoints: Example
• Note that the HBase client has the responsibility for dispatching parallel endpoint invocations to the target regions, and for collecting the returned results to present to the application code.
• Like a lightweight MapReduce job: The “map” is the endpoint execution performed in the region server on every target region, and the “reduce” is the final aggregation at the client.
• The distributed-systems programming details stay hidden behind a clean API.
[Diagram: the client code builds a new Batch.Call (run on all target regions) and invokes

    Map<byte[], Long> sumResults = table.coprocessorExec(ColumnAggregationProtocol.class, startRow, endRow,
        new Batch.Call<ColumnAggregationProtocol, Long>() {
          public Long call(ColumnAggregationProtocol instance) throws IOException {
            return instance.sum(FAMILY, QUALIFIER);
          }
        });

The HTable call fans out to the ColumnAggregationProtocol endpoint on every target region (e.g. tableA,, / tableA,bbbb / tableA,cccc / tableA,dddd, spread across Region Server 1 and Region Server 2), and the per-region results are batched back to the client into the Map<byte[], Long> sumResults.]
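The "lightweight MapReduce" analogy can be sketched without an HBase cluster. The following plain-Java simulation is only illustrative: the region names and partial sums are made up, and no HBase API is used. It shows the client-side "reduce" that folds the per-region results returned by coprocessorExec.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EndpointAggregationSketch {
    // "Reduce" phase: the client folds the per-region partial results,
    // just as the caller of coprocessorExec() folds the returned Map.
    static long aggregate(Map<String, Long> perRegionSums) {
        long total = 0;
        for (long partial : perRegionSums.values()) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        // "Map" phase: each region would compute a partial sum locally
        // (standing in for the endpoint executing in the region server);
        // here the per-region results are simply hard-coded.
        Map<String, Long> perRegionSums = new LinkedHashMap<>();
        perRegionSums.put("tableA,,", 10L);
        perRegionSums.put("tableA,bbbb,", 20L);
        perRegionSums.put("tableA,cccc,", 30L);

        System.out.println(aggregate(perRegionSums)); // prints 60
    }
}
```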
Step-1: Define protocol interface

```java
/**
 * A sample protocol for performing aggregation at regions.
 */
public interface ColumnAggregationProtocol extends CoprocessorProtocol {
  /**
   * Perform aggregation for a given column at the region. The aggregation
   * will include all the rows inside the region. It can be extended to allow
   * passing start and end rows for a fine-grained aggregation.
   *
   * @param family column family
   * @param qualifier column qualifier
   * @return aggregation of the column
   * @throws IOException on error
   */
  long sum(byte[] family, byte[] qualifier) throws IOException;
}
```
Step-2: Implement endpoint and the interface

```java
public class ColumnAggregationEndpoint extends BaseEndpointCoprocessor
    implements ColumnAggregationProtocol {
  @Override
  public long sum(byte[] family, byte[] qualifier) throws IOException {
    // Aggregate at each region.
    Scan scan = new Scan();
    scan.addColumn(family, qualifier);
    long sumResult = 0;
    InternalScanner scanner = ((RegionCoprocessorEnvironment) getEnvironment())
        .getRegion().getScanner(scan);
    try {
      List<KeyValue> curVals = new ArrayList<KeyValue>();
      boolean done = false;
      do {
        curVals.clear();
        done = scanner.next(curVals);
        // Guard against an empty batch (e.g. an empty region).
        if (!curVals.isEmpty()) {
          KeyValue kv = curVals.get(0);
          sumResult += Bytes.toLong(kv.getBuffer(), kv.getValueOffset());
        }
      } while (done);
    } finally {
      scanner.close();
    }
    return sumResult;
  }
}
```
Step-3: Deployment

• Two choices
  – Load from configuration (hbase-site.xml; restart HBase).
  – Load from table attribute (disable and enable the table), set from the shell.
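For the configuration route, a system coprocessor can be registered with a property in hbase-site.xml. This is a sketch: it assumes the example endpoint above has been packaged and placed on every server's classpath, and a restart of HBase is needed for it to take effect.

```xml
<!-- Loads the coprocessor globally, on all regions of all tables. -->
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.coprocessor.ColumnAggregationEndpoint</value>
</property>
```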
Step-4: Invoking

• On the client side, invoke the endpoint:

```java
HTable table = new HTable(util.getConfiguration(), TEST_TABLE);
Map<byte[], Long> results;

// Scan: across all target regions.
results = table.coprocessorExec(ColumnAggregationProtocol.class,
    ROWS[rowSeperator1 - 1], ROWS[rowSeperator2 + 1],
    new Batch.Call<ColumnAggregationProtocol, Long>() {
      public Long call(ColumnAggregationProtocol instance) throws IOException {
        return instance.sum(TEST_FAMILY, TEST_QUALIFIER);
      }
    });

// Aggregate the per-region partial sums.
long sumResult = 0;
for (Map.Entry<byte[], Long> e : results.entrySet()) {
  sumResult += e.getValue();
}
```
Server-side execution

• The region server provides the environment to execute a custom coprocessor in the region context.
• Exec encapsulates:
  – the custom protocol name
  – the method name
  – the method parameters

```java
public interface HRegionInterface extends VersionedProtocol, Stoppable, Abortable {
  ...
  ExecResult execCoprocessor(byte[] regionName, Exec call) throws IOException;
  ...
}
```
Coprocessor Management

• Build your own coprocessor
  – Write server-side coprocessor code like the example above, compiled and packaged as a jar file:
    • the CoprocessorProtocol (e.g. ColumnAggregationProtocol)
    • the Endpoint implementation (e.g. ColumnAggregationEndpoint)
• Coprocessor deployment
  – Load from configuration (hbase-site.xml; restart HBase).
    • The jar file must be in the classpath of the HBase servers.
    • Global for all regions of all tables (system coprocessors).
  – Load from table attribute (from the shell).
    • Per-table basis.
    • The jar file should first be put into HDFS or the HBase servers' classpath, and then set in the table attribute.
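The table-attribute route can be sketched from the HBase shell as follows; the table name, jar path, and priority here are made-up examples.

```
hbase> disable 'mytable'
hbase> alter 'mytable', METHOD => 'table_att', 'coprocessor' =>
    'hdfs:///user/hbase/coprocessor.jar|org.apache.hadoop.hbase.coprocessor.ColumnAggregationEndpoint|1001|'
hbase> enable 'mytable'
```

The coprocessor attribute value is pipe-delimited: jar path, class name, priority, and optional arguments.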
Future Work based on Coprocessors

• Parallel computation framework (our first goal!)
  – Higher level of abstraction.
  – E.g. MapReduce-like APIs.
  – Integrate and implement a Dremel-like computation model into HBase.
• Distributed application service platform (our second goal!?)
  – Higher level of abstraction.
  – Data-driven distributed application architecture.
  – Avoid building similar distributed architectures repeatedly.
• HBase system enhancements
  – HBase internal measurements and statistics for administration.
• Support applications like Percolator
  – Observers and notifications.
• Others
  – External coprocessor host (HBASE-4047)
    • Separate processes.
  – Code weaving (HBASE-2058)
    • Protect against malicious actions or faults accidentally introduced by a coprocessor.
  – ...
Reference
• https://blogs.apache.org/hbase/entry/coprocessor_introduction