Guide to MapR-DB
1. M7 - Native Storage for MapR Tables
  1.1 Setting Up MapR-FS to Use Tables
    1.1.1 Mapping Table Namespace Between Apache HBase Tables and MapR Tables
    1.1.2 Working With MapR Tables and Column Families
      1.1.2.1 Bulk Loading and MapR-DB Tables
      1.1.2.2 Schema Design for MapR Tables
      1.1.2.3 Supported Regular Expressions in MapR Tables
  1.2 MapR Table Support for Apache HBase API
  1.3 Using AsyncHBase with MapR Tables
    1.3.1 Using OpenTSDB with AsyncHBase and MapR Tables
  1.4 Protecting Table Data
  1.5 Displaying Table Region Information
  1.6 Integrating Hive and MapR Tables
  1.7 Migrating Between Apache HBase Tables and MapR Tables
  1.8 Language Support for MapR Tables
M7 - Native Storage for MapR Tables

Starting in version 3.0, the MapR distribution for Hadoop integrates native tables stored directly in MapR-FS.
This page contains the following topics:
- About MapR Tables
- MapR-FS Handles Structured and Unstructured Data
- Benefits of Integrated Tables in MapR-FS
- The MapR Implementation of HBase
- Effects of Decoupling API and Architecture
- The HBase Data Model
- Using MapR and Apache HBase Tables Together
- Current Limitations
- Administering MapR Tables
- Related Topics
About MapR Tables

In the 3.0 release of the MapR distribution for Hadoop, MapR-FS enables you to create and manipulate tables in many of the same ways that you create and manipulate files in a standard UNIX file system. This document discusses how to set up your MapR installation to use MapR tables. For users experienced with standard Apache HBase, this document describes the differences in capabilities and behavior between MapR tables and Apache HBase tables.
MapR-FS Handles Structured and Unstructured Data

The 3.0 release of the MapR distribution for Hadoop features a unified architecture for files and tables, providing distributed data replication for structured and unstructured data. Tables enable you to manage structured data, as opposed to the unstructured data management provided by files. The structure for structured data management is defined by a data model, a set of rules that defines the relationships in the structure.
By design, the data model for tables in MapR focuses on columns, similar to the open-source standard Apache HBase system. Like Apache HBase, MapR tables store data structured as a nested sequence of key/value pairs, where the value in one pair serves as the key for another pair. With a properly licensed MapR installation, you can use MapR tables exclusively or work in a mixed environment with Apache HBase tables, because Apache HBase works with MapR tables.
MapR tables are implemented directly within MapR-FS, yielding a familiar, open-standards API that provides a high-performance datastore for tables. MapR-FS is written in C and optimized for performance. As a result, MapR-FS runs significantly faster than JVM-based Apache HBase. The diagram below compares the application stacks for different HBase implementations.
Benefits of Integrated Tables in MapR-FS

The MapR cluster architecture provides the following benefits for table storage, providing an enterprise-grade HBase environment.
- MapR clusters with HA features recover instantly from node failures.
- MapR provides a unified namespace for tables and files, allowing users to group tables in directories by user, project, or any other useful grouping.
- Tables are stored in volumes on the cluster alongside unstructured files. Storage policy settings for volumes apply to tables as well as files.
- Volume mirrors and snapshots provide flexible, reliable read-only access.
- Table storage and MapReduce jobs can co-exist on the same nodes without degrading cluster performance.
- The use of MapR tables imposes no administrative overhead beyond administration of the MapR cluster.
- Node upgrades and other administrative tasks do not cause downtime for table storage.
The MapR Implementation of HBase

MapR's implementation supports the core HBase API. Programmers who are used to writing code for the HBase API will have immediate, intuitive access to MapR tables. MapR delivers faithfully on the original vision of Google's BigTable paper, using the open-standard HBase API.
MapR's implementation of the HBase API provides enterprise-grade high availability (HA), data protection, and disaster recovery features for tables on a distributed Hadoop cluster. MapR tables can be used as the underlying key-value store for Hive, or for any other application requiring a high-performance, high-availability key-value datastore. Because MapR uses the open-standard HBase API, many legacy HBase applications can continue to run on MapR without modification.
MapR has extended hbase shell to work with MapR tables in addition to Apache HBase tables. As with development for Apache HBase, the simplest way to create tables and column families in MapR-FS, and to put and get data from them, is to use hbase shell. MapR tables can be created from the MapR Control System (MCS) user interface or from the Linux command line, without the need to coordinate with a database administrator. You can treat a MapR table just as you would a file, specifying a path to a location in a directory, and the table appears in the same namespace as your regular files. You can also create and manage column families for your table from the MCS or directly from the command line.
During data migration or in other scenarios where you need to refer to a MapR table with the same name as an Apache HBase table in the same cluster, you can map the table namespace to enable that operation.
To summarize:
- The MapR table API works with the core HBase API.
- MapR tables implement the HBase feature set.
- You can use MapR tables as the datastore for Hive applications.
Effects of Decoupling API and Architecture

The following features of MapR tables result from decoupling the HBase API from the Apache HBase architecture:
- MapR's High Availability (HA) cluster architecture eliminates the RegionServer component of traditional Apache HBase architecture, which was a single point of failure and a bottleneck for scalability. In MapR-FS, MapR tables are HA at all levels, similar to other services on a MapR cluster.
- MapR tables can have up to 64 column families, with no limit on the number of columns.
- MapR-FS manages how data is laid out on disk without large variations in user-visible latency. MapR table regions are split as needed and deleted data is purged.
- Crash recovery is significantly faster than Apache HBase.
The HBase Data Model

Apache HBase stores structured data as a nested series of maps. Each map consists of a set of key-value pairs, where the value can be the key in another map. Keys are kept in strict lexicographical order: 1, 10, and 113 come before 2, 20, and 213.
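This ordering can surprise developers who expect numeric sorting. The following Python sketch (illustrative only, not MapR code) shows the order HBase uses for these row keys, along with the common zero-padding workaround:

```python
# HBase keeps row keys in lexicographic (byte-string) order, not numeric order,
# so "10" sorts before "2".
keys = ["2", "113", "20", "1", "213", "10"]
ordered = sorted(keys)
print(ordered)  # ['1', '10', '113', '2', '20', '213']

# A common workaround when numeric ordering is needed: zero-pad keys
# to a fixed width so lexicographic and numeric order coincide.
padded = sorted(k.zfill(3) for k in keys)
print(padded)   # ['001', '002', '010', '020', '113', '213']
```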
In descending order of granularity, the elements of an HBase entry are:
- Key: Keys define the rows in an HBase table.
- Column family: A column family is a key associated with a set of columns. Specify this association according to your individual use case, creating sets of columns. A column family can contain an arbitrary number of columns. MapR tables support up to 64 column families.
- Column: Columns are keys that are associated with a series of timestamps that define when the value in that column was updated.
- Timestamp: The timestamp in a column specifies a particular data write to that column.
- Value: The data written to that column at the specific timestamp.

The Apache HBase API exposes many low-level administrative functions that can be tuned for performance or reliability. The reliability and functionality of MapR tables renders these low-level functions moot, and these low-level calls are not supported for MapR tables. See MapR Table Support for Apache HBase API for detailed information.
This structure results in versioned values that you can access flexibly and quickly. Because Apache HBase and MapR tables are sparse, any of the column values for a given key can be null.
Example HBase Table
This example uses JSON notation for representational clarity. In this example, timestamps are arbitrarily assigned.
{
  "arbitraryFirstKey": {
    "firstColumnFamily": {
      "firstColumn": {
        10: "valueFive",
        7: "valueThree",
        4: "valueOne"
      },
      "secondColumn": {
        16: "valueEight",
        1: "valueSeven"
      }
    },
    "secondColumnFamily": {
      "firstColumn": {
        37: "valueFive",
        23: "valueThree",
        11: "valueSeven",
        4: "valueOne"
      },
      "secondColumn": {
        15: "valueEight"
      }
    }
  },
  "arbitrarySecondKey": {
    "firstColumnFamily": {
      "firstColumn": {
        10: "valueFive",
        4: "valueOne"
      },
      "secondColumn": {
        16: "valueEight",
        7: "valueThree",
        1: "valueSeven"
      }
    },
    "secondColumnFamily": {
      "firstColumn": {
        23: "valueThree",
        11: "valueSeven"
      }
    }
  }
}
HBase queries return the most recent timestamp by default. A query for the value in "arbitrarySecondKey"/"secondColumnFamily:firstColumn" returns valueThree. Specifying a timestamp with a query for "arbitrarySecondKey"/"secondColumnFamily:firstColumn"/11 returns valueSeven.
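The lookup behavior just described can be sketched in Python using the nested-map model from the example above. The get_cell helper is hypothetical, not a MapR or HBase API:

```python
# Nested-map model: row key -> column family -> column -> {timestamp: value}.
table = {
    "arbitrarySecondKey": {
        "secondColumnFamily": {
            "firstColumn": {23: "valueThree", 11: "valueSeven"},
        },
    },
}

def get_cell(table, row, family, column, timestamp=None):
    """Return the value at the given timestamp, or the most recent write."""
    versions = table[row][family][column]
    if timestamp is None:
        timestamp = max(versions)  # default: highest (most recent) timestamp
    return versions[timestamp]

print(get_cell(table, "arbitrarySecondKey", "secondColumnFamily", "firstColumn"))
# -> valueThree (timestamp 23 is the most recent write)
print(get_cell(table, "arbitrarySecondKey", "secondColumnFamily", "firstColumn", 11))
# -> valueSeven
```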
Using MapR and Apache HBase Tables Together

MapR table storage is independent of Apache HBase table storage, enabling a single MapR cluster to run both systems. Users typically run both systems concurrently, particularly during the migration phase. Alternately, you can leave Apache HBase running for existing applications and use MapR tables for new applications. You can set up namespace mappings for your cluster to run both MapR tables and Apache HBase tables concurrently, during migration or on an ongoing basis.
Current Limitations

- Custom HBase filters are not supported.
- User permissions for column families are not supported. User permissions for tables and columns are supported.
- HBase authentication is not supported.
- HBase replication is handled with Mirror Volumes.
- Bulk loads using the HFiles workaround are not supported and are not necessary.
- HBase coprocessors are not supported.
- Filters use a different regular expression library from java.util.regex.Pattern. See Supported Regular Expressions in MapR Tables for a complete list of supported regular expressions.
Administering MapR Tables

The MapR Control System and the command-line interface provide a compact set of features for adding and managing tables. In a traditional HBase environment, cluster administrators are typically involved in provisioning tables and column families because of limitations on the number of tables and column families that Apache HBase can support. MapR supports a virtually unlimited number of tables, each with up to 64 column families, reducing administrative overhead.
HBase programmers can use the API function calls to create as many tables and column families as needed for the particular application. Programmers can also use tables to store intermediate data in a multi-stage MapReduce program, then delete the tables without assistance from an administrator. See Working With MapR Tables and Column Families for more information.
Related Topics

- Setting Up MapR-FS to Use Tables
- Working With MapR Tables and Column Families
- Mapping Table Namespace Between Apache HBase Tables and MapR Tables
- Protecting Table Data
- Migrating Between Apache HBase Tables and MapR Tables
Setting Up MapR-FS to Use Tables
This page describes how to begin using tables natively with MapR-FS. This page contains the following topics:
- Installation
- Enabling Access to MapR Tables via HBase APIs, hbase shell, and MapReduce Jobs
- MapR Tables and Apache HBase Tables on the Same Cluster
- Set Up User Directories for MapR Tables
- Configuring Maximum Row Sizes for MapR Tables
- Maximum Row Sizes for HBase APIs
- Troubleshooting RPC Errors Related to Row Size
- Related Topics
Installation

As of version 3.0 of the MapR distribution, MapR-FS provides storage for structured table data. No additional installation steps are required to install table capabilities. However, you must apply an appropriate license after you've completed the installation process to enable table features.
You can also set up a client-only node to connect to your MapR cluster and access tables.
Before using MapR tables, verify that the MapR File System has at least 4GB of memory assigned. Edit the value of the service.command.mfs.heapsize.min property in the warden.conf file to at least 4GB.
Enabling Access to MapR Tables via HBase APIs, hbase shell, and MapReduce Jobs
You can use the HBase API and the hbase shell command to access your MapR tables. MapR has extended the HBase component to handle access to both MapR tables and Apache HBase tables. MapR tables do not support low-level HBase API calls that are used to manipulate the state of an Apache HBase cluster. See the MapR Table Support for Apache HBase API page for a full list of supported HBase API and shell commands.
To enable HBase API and hbase shell access, install the mapr-hbase package on every node in the cluster. The HBase component of the MapR distribution for Hadoop is typically installed under /opt/mapr/hbase. To ensure that your existing HBase applications and workflows work properly, install the mapr-hbase package that provides the same version number of HBase as your existing Apache HBase.
See Installing MapR Software for information about MapR installation procedures, including setting up the proper repositories.
MapR Tables and Apache HBase Tables on the Same Cluster
Apache HBase can run on MapR's distribution of Hadoop, and users can store table data in both Apache HBase tables and MapR tables concurrently. Apache HBase and MapR store table data separately. However, the same mechanisms (HBase APIs and hbase shell) are used to access data in both systems. On clusters that run Apache HBase on top of MapR, you can set up a namespace mapping to specify whether a given table identifier maps to a MapR table or an Apache HBase table.
Set Up User Directories for MapR Tables

Because MapR tables, like files, are created by users, MapR tracks table activity in a user's home directory on the cluster. Create a home directory at /user/<username> on your cluster for each user that will access MapR tables. After mounting the cluster on NFS, create these directories with the standard Linux mkdir command in the cluster's directory structure.
When a user foo does not have a corresponding /user/foo directory on the cluster, querying MapR for a list of tables that belong to that user generates an error reporting the missing directory.
Configuring Maximum Row Sizes for MapR Tables

MapR tables support rows up to 2GB in size. Rows in excess of 100MB may show decreased performance. The default maximum row size at installation is 16MB. You can configure this maximum by changing the mfs.db.max.rowsize.kb value with the maprcli config save command, as in the following command:

maprcli config save -values {"mfs.db.max.rowsize.kb":<value in KB>}
To view the current setting, use the following command:
maprcli config load -json | grep mfs.db.max.rowsize.kb
Maximum Row Sizes for HBase APIs
The following table lists the maximum row sizes supported by specific HBase APIs:
API            Maximum Size   Comment
put()          2GB
get()          2GB
scan()         2GB
checkAndPut()  16MB           The maximum is due to a limitation in protobuf.
append()       16MB           The maximum is due to a limitation in protobuf.
increment()    16MB           The maximum is due to a limitation in protobuf.

Note: The version of HBase provided by MapR has been modified to work with MapR tables in addition to Apache HBase. Do not download and install stock Apache HBase on a MapR cluster that uses MapR tables.

Note: If you use fat JARs to deploy your application as a single JAR including all dependencies, be aware that the fat JAR may contain versions of HBase that override the installed MapR versions, leading to problems. Check your fat JARs for the presence of stock HBase to prevent this problem.
Troubleshooting RPC Errors Related to Row Size
- Sending data in excess of maxdbrowsize, which is specified by the value of the mfs.db.max.rowsize.kb parameter, results in an E2BIG error. MapR-FS logs an error message with the largerow keyword, the current row size, and the maximum supported row size.
- Requesting data in excess of maxdbrowsize, which is specified by the value of the mfs.db.max.rowsize.kb parameter, results in an EFBIG error. MapR-FS logs an error message with the largerow keyword, the current row size, and the maximum supported row size.
- Sending or requesting data in excess of the RPC payload size (2GB) or protobuf-limited size (16MB) while the value of maxdbrowsize is larger than 2GB results in an E2BIG error. The client returns information on the current row size and the maximum supported row size.
- Sending or requesting data in excess of maxdbrowsize when the value of maxdbrowsize is larger than 16MB logs an INFO message on the server with the largerow keyword.
- When the row size for a spill caused by multiple separate insert operations for the same key exceeds the value of maxdbrowsize, MapR-FS logs a non-fatal error. The insert operations proceed.
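The first two rules above can be summarized with a small Python sketch. The function and constant names are illustrative, not MapR-FS internals:

```python
# Default limit mirrors mfs.db.max.rowsize.kb at installation (16MB), in bytes.
MAX_DB_ROW_SIZE = 16 * 2**20

def row_size_error(row_bytes, sending, max_row_size=MAX_DB_ROW_SIZE):
    """Return the errno keyword a client would see for an oversized row."""
    if row_bytes <= max_row_size:
        return None
    # MapR-FS also logs a 'largerow' message with the current and maximum sizes.
    return "E2BIG" if sending else "EFBIG"

print(row_size_error(32 * 2**20, sending=True))   # E2BIG (oversized put)
print(row_size_error(32 * 2**20, sending=False))  # EFBIG (oversized get)
print(row_size_error(1 * 2**20, sending=True))    # None (within the limit)
```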
Related Topics

- Mapping Table Namespace Between Apache HBase Tables and MapR Tables
- Protecting Table Data
Mapping Table Namespace Between Apache HBase Tables and MapR Tables

MapR's implementation of the HBase API differentiates between Apache HBase tables and MapR tables based on the table name. In certain cases, such as migrating code from Apache HBase tables to MapR tables, users need to force the API to access a MapR table even though the table name could map to an Apache HBase table. The hbase.table.namespace.mappings property allows you to map Apache HBase table names to MapR tables. This property is typically set in the configuration file /opt/mapr/hadoop/hadoop-<version>/conf/core-site.xml.
In general, if a table name includes a slash (/), the name is assumed to be a path to a MapR table, because the slash is not a valid character for Apache HBase table names. In the case of "flat" table names without a slash, a namespace conflict is possible, and you might need to use table mappings.
Table Mapping Naming Conventions
A table mapping takes the form name:map, where name is the table name to redirect and map is the modification made to the name. The value in name can be a literal string or contain the * wildcard. When mapping a name with a wildcard, the mapping is treated as a directory. Requests to tables with names that match the wildcard are sent to the directory in the mapping.
When mapping a name that is a literal string, you can choose from two different behaviors:
- End the mapping with a slash to indicate that this mapping is to a directory. For example, the mapping mytable1:/user/aaa/ sends requests for table mytable1 to the full path /user/aaa/mytable1.
- End the mapping without a slash, which creates an alias and treats the mapping as a full path. For example, the mapping mytable1:/user/aaa sends requests for table mytable1 to the full path /user/aaa.
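The mapping rules above can be sketched as a Python function. This is an illustrative model of the documented behavior, not the actual MapR implementation:

```python
def resolve_table(name, mappings):
    """Resolve a table name against an ordered list of (pattern, target) mappings.

    Names containing a slash are already MapR table paths; flat names are
    checked against each mapping in order; unmatched flat names fall through
    to Apache HBase.
    """
    if "/" in name:
        return name                                  # path to a MapR table
    for pattern, target in mappings:
        if pattern == "*":
            return target.rstrip("/") + "/" + name   # wildcard: directory mapping
        if pattern == name:
            if target.endswith("/"):
                return target + name                 # directory mapping
            return target                            # alias: full path
    return name                                      # stock Apache HBase table

mappings = [("mytable1", "/user/aaa/"), ("mytable2", "/user/bbb/"), ("*", "/tables_dir")]
print(resolve_table("mytable1", mappings))  # /user/aaa/mytable1
print(resolve_table("foo", mappings))       # /tables_dir/foo
```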
Mappings and Table Listing Behaviors
When you use the list command without specifying a directory, the command's behavior depends on two factors:
- Whether a table mapping exists
- Whether Apache HBase is installed and running
Here are three different scenarios and the resulting list command behavior for each.
- There is a table mapping for *, as in *:/tables. In this case, the list command lists the tables in the mapped directory.
- There is no mapping for *, and Apache HBase is installed and running. In this case, the list command lists the HBase tables.
- There is no mapping for *, and Apache HBase is not installed or is not running. In this case, the shell tries to connect to an HBase cluster but cannot. After a few seconds, it gives up and falls back to listing the M7 tables in the user's home directory.
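These three scenarios amount to a simple decision procedure, sketched here in Python with illustrative names (this is a model of the documented behavior, not MapR code):

```python
def list_source(has_wildcard_mapping, hbase_running):
    """Decide where the hbase shell 'list' command finds tables."""
    if has_wildcard_mapping:
        return "mapped directory"          # e.g. a *:/tables mapping
    if hbase_running:
        return "Apache HBase tables"
    # The shell times out trying to reach an HBase cluster, then falls back.
    return "M7 tables in the user's home directory"

print(list_source(True, False))   # mapped directory
print(list_source(False, True))   # Apache HBase tables
print(list_source(False, False))  # M7 tables in the user's home directory
```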
Table Mapping Examples
Example 1: Map all HBase tables to MapR tables in a directory
In this example, any flat table name foo is treated as a MapR table in the directory /tables_dir/foo.
<property>
  <name>hbase.table.namespace.mappings</name>
  <value>*:/tables_dir</value>
</property>
Example 2: Map specific Apache HBase tables to specific MapR tables
In this example, the Apache HBase table name mytable1 is treated as a MapR table at /user/aaa/mytable1. The Apache HBase table name mytable2 is treated as a MapR table at /user/bbb/mytable2. All other Apache HBase table names are treated as stock Apache HBase tables.
<property>
  <name>hbase.table.namespace.mappings</name>
  <value>mytable1:/user/aaa/,mytable2:/user/bbb/</value>
</property>
Example 3: Combination of specific table names and wildcards
Mappings are evaluated in order. In this example, the flat table name mytable1 is treated as a MapR table at /user/aaa/mytable1. The flat table name mytable2 is treated as a MapR table at /user/bbb/mytable2. Any other flat table name foo is treated as a MapR table at /tables_dir/foo.
<property>
  <name>hbase.table.namespace.mappings</name>
  <value>mytable1:/user/aaa/,mytable2:/user/bbb/,*:/tables_dir</value>
</property>
Working With MapR Tables and Column Families

This page contains the following topics:

- About MapR Tables
- Filesystem Operations
  - Setting Permissions
  - Read and Write
  - Move
  - Remove
  - Copy and Recursive/Directory Copy
- Developing For MapR Tables
About MapR Tables
The MapR Data Platform stores tables in the same namespace as files. You can move, delete, and set attributes for a table similarly to a file. All filesystem operations remain accessible with the hadoop fs command.
You can create MapR tables using the MapR Control System (MCS) and the maprcli interface, in addition to the normal HBase shell or HBase API methods.
When creating a MapR table, specify a location in the MDP directory structure in addition to the name of the table. A user can create a MapR table anywhere on the cluster where the user has write access.
Volume properties, such as replication factor or rack topology, that apply to the specified location also apply to tables stored at that location. You can move a table with the Linux mv command or the hadoop fs -mv command.
Administrators may choose to pre-create tables for a project in order to enforce a designated naming convention, or to store tables in a desiredlocation in the cluster.
Because all data stored in a column family is compressed together, encapsulating similar kinds of data within a column family can improve compression.
The number of tables that can be stored on a MapR cluster is constrained only by the number of open file handles and storage space availability.Each table can have up to 64 column families.
You can add, edit, and delete column families in a MapR table with the MapR Control System (MCS) and the maprcli interface. You can also add column families to MapR tables with the HBase shell or API.
When you use Direct Access NFS or the hadoop fs -ls command to access a MapR cluster, tables and files are listed together. Because the client's Linux commands are not table-aware, other Linux file manipulation commands, notably file read and write commands, are not available for MapR tables.
Some Apache HBase table operations are not applicable or required for MapR tables, notably manual compactions, table enables, and tabledisables. HBase API calls that perform such operations on a MapR table result in the modification being silently ignored. When appropriate, themodification request is cached in the client and returned by API calls to enable legacy HBase applications to run successfully.
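One way such calls can be absorbed is with a client-side shim that caches the requested state and echoes it back, so legacy code that disables and re-enables a table still sees the responses it expects. The class below is a hypothetical sketch of that idea, not MapR client code:

```python
class TableStateShim:
    """Caches enable/disable requests client-side; nothing changes server-side."""

    def __init__(self):
        self._enabled = {}  # table name -> cached enabled state

    def disable_table(self, name):
        self._enabled[name] = False   # the request itself is silently ignored

    def enable_table(self, name):
        self._enabled[name] = True

    def is_table_enabled(self, name):
        # Legacy HBase code often checks this before altering a table; returning
        # the cached state lets that workflow complete without a real disable.
        return self._enabled.get(name, True)

shim = TableStateShim()
shim.disable_table("/user/foo/development")
print(shim.is_table_enabled("/user/foo/development"))  # False (cached, not real)
shim.enable_table("/user/foo/development")
print(shim.is_table_enabled("/user/foo/development"))  # True
```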
In addition, the maprcli table listrecent command displays recently accessed MapR tables, rather than listing tables across the entire file system.
See MapR Table Support for Apache HBase API for a complete list of supported operations.
Filesystem Operations
This section describes the operations that you can perform on MapR tables through a Linux command line when you access the cluster through NFS or with the hadoop fs commands.
Setting Permissions
MapR tables do not support setting user permissions through the UNIX chmod command or the hadoop fs -chmod analogue. Instead, starting in version 3.1 of the MapR distribution for Hadoop, MapR table access is controlled with Access Control Expressions (ACEs).
Read and Write
You cannot perform read or write operations on a MapR table from a Linux filesystem context. Among other things, you cannot use the cat command to insert text into a table or search through a table with the grep command. The MapR software returns an error when an application attempts to read or write to a MapR table.
Move
You can move a MapR table within a volume with the mv command over NFS or with the hadoop fs -mv command. These moves are subject to the standard permissions restrictions. Moves across volumes are not currently supported.
Remove
You can remove a table with the rm command over NFS or with the hadoop fs -rm command. These commands remove the table from the namespace and asynchronously reclaim the disk space. You can remove a directory that includes both files and tables with the rm -r or hadoop fs -rmr commands.
Copy and Recursive/Directory Copy
Table copying at the filesystem level is not supported in this release. See Migrating Between Apache HBase Tables and MapR Tables for information on copying tables using the HBase shell.
Example: Creating a MapR Table
With the HBase shell
This example creates a table called development in the directory /user/foo with a column family called stage, using system defaults. In this example, we first start the HBase shell from the command line with hbase shell, and then use the create command to create the table.
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.1-SNAPSHOT, rUnknown, Mon Dec 17 09:23:31 PST 2012

hbase(main):001:0> create '/user/foo/development', 'stage'
With the MapR Control System
1. In the MCS Navigation pane under the MapR Data Platform group, click Tables. The Tables tab appears in the main window.
2. Click the New Table button.
3. Type a complete path for the new table.
4. Click OK. The MCS displays a tab for the new table.
The screen capture below demonstrates the creation of a table table01 in the location /user/analysis/tables/.
With the MapR CLI
Use the maprcli table create command at a command line. For details, type maprcli table create -help at a command line. The following example demonstrates creation of a table table02 in the cluster location /user/analysis/tables/. The cluster my.cluster.com is mounted at /mnt/mapr/.
$ maprcli table create -path /user/analysis/tables/table02
$ ls -l /mnt/mapr/my.cluster.com/user/analysis/tables
lrwxr-xr-x 1 mapr mapr 2 Oct 24 16:14 table01 -> mapr::table::2056.62.17034
lrwxr-xr-x 1 mapr mapr 2 Oct 24 16:13 table02 -> mapr::table::2056.56.17022
$ maprcli table listrecent
path
/user/analytics/tables/table01
/user/analytics/tables/table02
Example: Adding a column family
With the HBase shell
This example adds a column family called status to the table development, using system defaults. In this example, we first start the HBase shell from the command line with hbase shell, and then use the alter command to add the column family.
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.1-SNAPSHOT, rUnknown, Mon Dec 17 09:23:31 PST 2012

hbase(main):001:0> alter '/user/foo/development', {NAME => 'status'}
With the MapR Control System
1. In the MCS Navigation pane under the MapR Data Platform group, click Tables. The Tables tab appears in the main window.
2. Find the table you want to work with, using one of the following methods:
   - Scan for the table under Recently Opened Tables on the Tables tab.
   - Enter a regular expression for part of the table pathname in the Go to table field and click Go.
3. Click the desired table name. A Table tab appears in the main MCS pane, displaying information for the specific table.
4. Click the Column Families tab.
5. Click New Column Family. The Create Column Family dialog appears.
6. Enter values for the following fields:
   - Column Family Name - Required.
   - Max Versions - The maximum number of versions of a cell to keep in the table.
   - Min Versions - The minimum number of versions of a cell to keep in the table.
   - Compression - The compression algorithm used on the column family's data. Select a value from the drop-down. The default value is Inherited, which uses the same compression type as the table. Available compression methods are LZF, LZ4, and ZLib. Select OFF to disable compression.
   - Time-To-Live - The minimum time-to-live for cells in this column family. Cells older than their time-to-live stamp are purged periodically.
   - In memory - Preference for a column family to reside in memory for fast lookup.
You can change any column family properties at a later time using the MCS or the maprcli table cf edit command from the command line.
The screen capture below demonstrates adding a column family userinfo to the table at location /user/analysis/tables/table01.
With the MapR CLI
Use the maprcli table cf create command at a command line. For details, see table cf create or type maprcli table cf create -help at a command line. The following example demonstrates addition of a column family named casedata to the table /user/analysis/tables/table01, using lzf compression, and keeping a maximum of 5 versions of cells in the column family.
$ maprcli table cf create -path /user/analysis/tables/table01 \
    -cfname casedata -compression lzf -maxversions 5
$ maprcli table cf list -path /user/analysis/tables/table01
inmemory  cfname    compression  ttl  maxversions  minversions
true      userinfo  lz4          0    3            0
false     casedata  lzf          0    5            0
$
You can change any column family properties at a later time using the maprcli table cf edit command.
Developing For MapR Tables
When you use Maven to manage your application development, look for the following section in your POM file:
<dependency>
  <groupId>com.mapr.fs</groupId>
  <artifactId>mapr-hbase</artifactId>
  <version>1.0.3-mapr-3.0.2</version>
  <scope>provided</scope>
</dependency>
Add the following section to your POM file immediately following the above section:
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase</artifactId>
<version>0.94.12-mapr-1310-m7-3.0.2</version>
<scope>provided</scope>
</dependency>
This section makes the ClassLoader calls use the correct version of the implementation.
Bulk Loading and MapR-DB Tables
The most common way of loading data to a MapR table is with a put operation. At large scales, bulk loads offer a performance advantage over put operations.
Bulk loading can be performed as a full bulk load or as an incremental bulk load. Full bulk loads offer the best performance advantage for empty tables. Incremental bulk loads can add data to existing tables concurrently with other table operations, with better performance than put operations.
Bulk Load Process Flow
Once your source data is in the MapR-FS layer, bulk loading uses a MapReduce job to perform the following steps:
1. Transform the source data into the native file format used by MapR tables.
2. Notify the database of the location of the resulting files.
A full bulk load operation can only be performed to an empty table and skips the write-ahead log (WAL) typical of Apache HBase and MapR table operations, resulting in increased performance. Incremental bulk load operations do use the WAL.
Creating a MapR Table with Full Bulk Load Support
When you create a new MapR table with the maprcli table create command, specify the value of the -bulkload parameter as true.
When you create a new MapR table from the hbase shell, specify BULKLOAD as true, as in the following example:
create '/a0','f1', BULKLOAD => 'true'
When you create a new MapR table from the MapR Control System (MCS), check the Bulk Load box under Table Properties.
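For example, the maprcli form of the first method might look like the following. The table path is illustrative, and the command must be run on a node in the MapR cluster:

```shell
# Illustrative path; create a table with the bulk load attribute
# set at creation time (it cannot be set later with alter):
maprcli table create -path /user/analysis/tables/bulktable -bulkload true
```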
Performing Bulk Load Operations
Note: You can only perform a full bulk load to empty tables that have the bulk load attribute set. You can only set this attribute during table creation. The alter operation will not set this attribute to true on an existing table.

Warning: Your table is unavailable for normal client operations, including put, get, and scan operations, while a full bulk load operation is in progress. To keep your table available for client operations, use an incremental bulk load.
Note: Attempting a full bulk load to a table that does not have the bulk load attribute set will result in an incremental bulk load being performed instead.
You can use incremental bulk loads to ingest large amounts of data to an existing table. Tables remain available for standard client operations such as put, get, and scan while the bulk load is in process. A table can perform multiple incremental bulk load operations simultaneously.
Bulk loading is supported for the following tools, which can be used for both full and incremental bulk load operations:
The CopyTable tool uses a MapReduce job to copy a MapR table.
  hbase com.mapr.fs.hbase.mapreduce.CopyTable -src /table1 -dst /table2
The CopyTableTest tool copies a MapR table without using MapReduce.
  hbase com.mapr.fs.CopyTableTest -src /table1 -dst /table2
The ImportTsv tool imports a tab-separated values file into a MapR table.
  importtsv -Dimporttsv.columns=HBASE_ROW_KEY,CF-1:custkey,CF-1:orderstatus,CF-1:totalprice,CF-1:orderdate,CF-1:orderpriority -Dimporttsv.separator='|' -Dimporttsv.bulk.output=/dummy /table1 /orders
The ImportFiles tool imports HFile or Result files into a MapR table.
  hbase com.mapr.fs.hbase.mapreduce.ImportFiles -Dmapred.reduce.tasks=2 -inputDir /test/tabler.kv -table /table2 -format Result
Custom MapReduce jobs can use bulk loads with the configureIncrementalLoad() method from the HFileOutputFormat class.
  HTable table = new HTable(jobConf, tableName);
  HFileOutputFormat.configureIncrementalLoad(mrJob, table);
After completing a full bulk load operation, take the table out of bulk load mode to restore normal client operations. You can do this from the command line or the HBase shell with the following commands:

# maprcli table edit -path /user/juser/mytable -bulkload false    (command line)
hbase shell> alter '/user/juser/mytable', 'f2', BULKLOAD => 'false'    (hbase shell)
Writing Custom MapReduce Jobs Using Bulk Loads
The HFileOutputFormat class on MapR clusters distinguishes between Apache HBase tables and MapR tables, behaving appropriately for each type. Existing workflows that rely on the HFileOutputFormat class, such as the importtsv and copytable tools, support both types of tables without further configuration.
Writing a Bulk Load MapReduce Job for a Pre-split Table
The MapReduce job for bulk loading to a pre-split table has a number of reducers that is determined by the number of splits in the table. You can set up the partitions for the reducers to match the table regions with the HFileOutputFormat.configureIncrementalLoad() API, as with Apache HBase. The following sample code shows how to set up these partitions:
HTable table = new HTable(conf, tableName);
job.setReducerClass(KeyValueSortReducer.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(KeyValue.class);
HFileOutputFormat.configureIncrementalLoad(job, table);
Writing a Bulk Load MapReduce Job for an Auto-split Table
To set up reducer partitions for an automatically split table, where the number of regions is unknown, there are two approaches:
Create a partition file and define a starting key for each reducer. With this approach, an even distribution of load on the reducers depends on the qualities of the input data.
Use an input sampler to create the partition file. The input sampler randomly walks through the input data and generates an optimal split based on the samples.
Example 1: Using a Partition File
job.setPartitionerClass(TotalOrderPartitioner.class);
Path partitionFile = new Path(job.getWorkingDirectory(), "partitions");
TotalOrderPartitioner.setPartitionFile(conf, partitionFile);
// User method to write the split keys to the partition file
writePartitions(conf, partitionFile, startKeys);
job.setOutputFormatClass(HFileOutputFormat.class);
HFileOutputFormat.configureMapRTablePath(job, dstTableName);
Example 2: Using an Input Sampler to Create the Partition File
TotalOrderPartitioner.setPartitionFile(conf, partitionFile);
// Set up a partitioner with sampling.
InputSampler.Sampler<ImmutableBytesWritable, KeyValue> sampler =
    new InputSampler.SplitSampler<ImmutableBytesWritable, KeyValue>(1000);
// Let the input sampler write the data to the partition file
InputSampler.writePartitionFile(job, sampler);
job.setOutputFormatClass(HFileOutputFormat.class);
HFileOutputFormat.configureMapRTablePath(job, tableName);
Schema Design for MapR Tables
Your database schema defines the data in your tables and how they are related. The choices you make when specifying how to arrange your data in keys and columns, and how those columns are grouped in families, can have a significant effect on query performance.
Row key design
Composite keys
Column family design
Column design
Schemas for MapR tables follow the same general principles as schemas for standard Apache HBase tables, with one important difference.Because MapR tables can use up to 64 column families, you can make more extensive use of the advantages of column families:
Segregate related data into column families for more efficient queries
Optimize column-family specific parameters to tune for performance
Group related data for more efficient compression
Naming your identifiers: Because the names for the row key, column family, and column identifiers are associated with every value in a table, these identifiers are replicated potentially billions of times across your tables. Keeping these names short can have a significant effect on your tables' storage demands.
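As a back-of-the-envelope sketch of why identifier length matters, compare the per-cell cost of a long column name against a short one. The names and cell count below are hypothetical:

```python
# Rough illustration: identifier names are stored alongside every cell,
# so shortening a column name pays off at scale. Figures are made up.
def naming_overhead_bytes(name: str, cells: int) -> int:
    """Bytes consumed by one column identifier repeated across all cells."""
    return len(name.encode("utf-8")) * cells

cells = 1_000_000_000  # one billion cells in the column
long_cost = naming_overhead_bytes("customer_information", cells)  # 20 bytes/cell
short_cost = naming_overhead_bytes("ci", cells)                   # 2 bytes/cell
print((long_cost - short_cost) / 1e9, "GB saved")  # 18.0 GB saved
```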
Access times for MapR tables are fastest when a single record is looked up based on the full row key. Partial scans within a column family are more demanding on the cluster's resources. A full-table scan is the least efficient way to retrieve data from your table.
Row key design
Because records in Apache HBase tables are stored in lexicographical order, using a sequential generation method for row keys can lead to a hot spot problem. As new rows are created, the table splits. Since the new records are still being created sequentially, all the new entries are still directed to a single node until the next split, and so on. In addition to concentrating activity on a single region, all the other splits remain at half their maximum size.
With MapR tables, the cluster handles sequential keys and table splits to keep potential hotspots moving across nodes, decreasing the intensity and performance impact of the hotspot.
To spread write and insert activity across the cluster, you can randomize sequentially generated keys by hashing the keys or by inverting the byte order. Note that these strategies come with trade-offs. Hashing keys, for example, makes table scans for key subranges inefficient, since the subrange is spread across the cluster.
Instead of hashing the entire key, you can salt the key by prepending a few bytes of the hash to the actual key. For a key based on a timestamp, for instance, a timestamp value of 1364248490 has an MD5 hash that ends with ffe5. By making the key for that row ffe51364248490, you avoid hotspotting. Since the first four digits are known to be the hash salt, you can derive the original timestamp by dropping those digits.
Be aware that a row key is immutable once created, and cannot be renamed. To change a row key's name, the original row must be deleted and then re-created with the new name.
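The salting scheme above can be sketched as follows. The helper names salt_key and unsalt_key are illustrative, and the four-character salt length mirrors the timestamp example in the text:

```python
import hashlib

# Sketch of key salting: prepend the last four hex digits of the key's
# MD5 hash, and drop them again to recover the original key.
SALT_LEN = 4

def salt_key(raw_key: str) -> str:
    salt = hashlib.md5(raw_key.encode("utf-8")).hexdigest()[-SALT_LEN:]
    return salt + raw_key

def unsalt_key(salted: str) -> str:
    # The first SALT_LEN characters are known to be the salt.
    return salted[SALT_LEN:]

ts = "1364248490"
salted = salt_key(ts)
assert unsalt_key(salted) == ts  # the original key is recoverable
```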
Composite keys
Rows in a MapR table can only have a single row key. You can create composite keys to approximate multiple keys in a table. A composite key contains several individual fields joined together, for example userID and applicationID. You can then scan for the specific segments of the composite row key that represent the original, individual field.
Because rows are stored in sorted order, you can affect the results of the sort by changing the ordering of the fields that make up the composite row key. For example, if your application IDs are generated sequentially but your user IDs are not, using a composite key of userID+applicationID will store all rows with the same user ID closely together. If you know the userID for which you want to retrieve rows, you can specify the first userID row and the first userID+1 row as the start and stop rows for your scan, then retrieve the rows you're interested in without scanning the entire table.
When designing a composite key, consider how the data will be queried during production use. Place the fields that will be queried the most often towards the front of the composite key, bearing in mind that sequential keys will generate hot spotting.
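A minimal sketch of the userID+applicationID pattern and the start/stop scan range it enables. The fixed-width big-endian encoding is an illustrative assumption (it keeps lexicographic byte order aligned with numeric order), not anything prescribed by MapR:

```python
# Composite key sketch: userID followed by applicationID, each encoded
# fixed-width big-endian so byte order matches numeric order.
def composite_key(user_id: int, application_id: int) -> bytes:
    return user_id.to_bytes(8, "big") + application_id.to_bytes(8, "big")

def user_scan_range(user_id: int):
    """[start, stop) range covering every row for one userID."""
    start = user_id.to_bytes(8, "big")        # first possible row for userID
    stop = (user_id + 1).to_bytes(8, "big")   # first row of userID+1
    return start, stop

start, stop = user_scan_range(42)
assert start <= composite_key(42, 7) < stop   # row falls inside the range
```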
Column family design
Scanning an entire table for matches can be very performance-intensive. Column families enable you to group related sets of data and restrict queries to a defined subset, leading to better performance. You can also make a specified column family remain in memory to further increase the speed at which the system accesses that data. When you design a column family, think about what kinds of queries are going to be used the most often, and group your columns accordingly.
You can specify compression settings for individual column families, which lets you choose the settings that prioritize speed of access or efficient use of disk space, according to your needs.
Be aware of the approximate number of rows in your column families. This property is called the column family's cardinality. When column families in the same table have very disparate cardinalities, the sparser table's data can be spread out across multiple nodes, due to the denser table requiring more splits. Scans on the sparser column family can take longer due to this effect. For example, consider a table that lists products across a small range of model numbers, but with a row for the unique serial numbers for each individual product manufactured within a given model. Such a table will have a very large difference in cardinality between a column family that relates to the model number compared to a column family that relates to the serial number. Scans on the model-number column family will have to range across the cluster, since the frequent splits required by the comparatively large numbers of serial-number rows will spread the model-number rows out across many regions on many nodes.
Column design
MapR tables split at the row level, not the column level. For this reason, extremely wide tables with very large numbers of columns can sometimes reach the recommended size for a table split at a comparatively small number of rows. In general, design your schema to prioritize more rows and fewer columns.

Because MapR tables are sparse, you can add columns to a table at any time. Null columns for a given row don't take up any storage space.
Supported Regular Expressions in MapR Tables
MapR tables support the regular expressions provided by the Perl-Compatible Regular Expressions library, as well as a subset of the complete set of regular expressions supported in java.util.regex.Pattern. For more information on Perl-compatible regular expressions, issue the man pcrepattern command from a terminal prompt.
Applications for Apache HBase that use regular expressions not supported in MapR tables will need to be rewritten to use supported regular expressions.
The tables in the following sections define the subset of Java regular expressions supported in MapR tables.
Characters
Pattern Description
x The character x
\\ The backslash character
\0n The character with octal value 0n (0 <= n <= 7)
\0nn The character with octal value 0nn (0 <= n <= 7)
\xhh The character with hexadecimal value 0xhh
\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
\f The form-feed character ('\u000C')
\a The alert (bell) character ('\u0007')
\e The escape character ('\u001B')
\cx The control character corresponding to x
Character Classes
Pattern Description
[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
Predefined Character Classes
Pattern Description
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
Classes for Unicode Blocks and Categories
Pattern Description
\p{Lu} An uppercase letter (simple category)
\p{Sc} A currency symbol
Boundaries
Pattern Description
^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
Greedy Quantifiers
Pattern Description
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times
Reluctant Quantifiers
Pattern Description
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n but not more than m times
Possessive Quantifiers
Pattern Description
X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n but not more than m times
Logical Operators
Pattern Description
XY X followed by Y
X|Y Either X or Y
(X) X, as a capturing group
Back References
Pattern Description
\n Whatever the nth capturing group matches
Quotation
Pattern Description
\ Nothing, but quotes the following character
\Q Nothing, but quotes all characters until \E
\E Nothing, but ends quoting started by \Q
Special Constructs
Pattern Description
(?:X) X, as a non-capturing group
(?=X) X, via zero-width positive lookahead
(?!X) X, via zero-width negative lookahead
(?<=X) X, via zero-width positive lookbehind
(?<!X) X, via zero-width negative lookbehind
(?>X) X, as an independent, non-capturing group
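A few of the constructs above, exercised for illustration in Python's re module, which accepts the same syntax for this subset and so stands in for java.util.regex here:

```python
import re

# Character classes, quantifiers, boundaries, groups, and lookbehind,
# matching the rows in the tables above.
assert re.fullmatch(r"[a-zA-Z]+", "MapR")                      # range class
assert re.fullmatch(r"\d{3,5}", "1234")                        # bounded quantifier
assert re.search(r"^table", "table01")                         # line-start boundary
assert re.fullmatch(r"(?:foo|bar)+", "foobar")                 # non-capturing group
assert re.search(r"(?<=row_)\d+", "row_42").group() == "42"    # positive lookbehind
```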
MapR Table Support for Apache HBase API

This page lists the supported interfaces for accessing MapR tables. This page contains the following topics:
Compatibility with the Apache HBase API
MapR Tables and Filters
HBase Shell Commands
Compatibility with the Apache HBase API
The API for accessing MapR tables works the same way as the Apache HBase API. Code written for Apache HBase can be easily ported to use MapR tables.
MapR tables do not support low-level HBase API calls that are used to manipulate the state of an Apache HBase cluster. HBase API calls that are not supported by MapR tables report successful completion to allow legacy code written for Apache HBase to continue executing, but do not perform any actual operations.
For details on the behavior of each function, refer to the Apache HBase API documentation.
HBaseAdmin API    Available for MapR Tables?    Comments
void addColumn(String tableName,HColumnDescriptor column)
Yes
void close() Yes
void createTable(HTableDescriptor desc, byte[][] splitKeys)
Yes This call is synchronous.
void createTableAsync() (HTableDescriptordesc, byte[][] splitKeys)
Yes For MapR tables, this call is identical to createTable.
void deleteColumn (byte[] family, byte[]qualifier, long timestamp)
Yes
void deleteTable(String tableName) Yes
HTableDescriptor[] deleteTables(Patternpattern)
Yes
Configuration getConfiguration() Yes
HTableDescriptor getTableDescriptor (byte[]tableName)
Yes
HTableDescriptor[] getTableDescriptors(List<String> tableNames)
Yes
boolean isTableAvailable(String tableName) Yes
boolean isTableDisabled(String tableName) Yes
boolean isTableEnabled(String tableName) Yes
HTableDescriptor[] listTables() Yes
void modifyColumn(String tableName,HColumnDescriptor descriptor)
Yes
void modifyTable (byte[] tableName,HTableDescriptor htd)
No
boolean tableExists(String tableName) Yes
Pair<Integer, Integer> getAlterStatus (byte[] tableName)
Yes
CompactionState getCompactionState(StringtableNameOrRegionName)
Yes Returns CompactionState.NONE.
void split(byte[] tableNameOrRegionName) Yes The tableNameOrRegionName parameter has a different format when used with MapR tables than with Apache HBase tables. With MapR tables, specify both the table path and the FID as a comma-separated list.
void abort(String why, Throwable e) No
void assign (byte[] regionName) No
boolean balancer() No
boolean balanceSwitch(boolean b) No
void closeRegion(ServerName sn, HRegionInfohri)
No
void closeRegion(String regionname, StringserverName)
No
boolean closeRegionWithEncodedRegionName(String encodedRegionName, String serverName)
No
void flush(String tableNameOrRegionName) No
ClusterStatus getClusterStatus() No
HConnection getConnection() No
HMasterInterface getMaster() No
String[] getMasterCoprocessors() No
boolean isAborted() No
boolean isMasterRunning() No
void majorCompact(StringtableNameOrRegionName)
No
void move(byte[] encodedRegionName, byte[] destServerName)
No
byte[][] rollHLogWriter(String serverName) No
boolean setBalancerRunning(boolean on,boolean synchronous)
No
void shutdown() No
void stopMaster() No
void stopRegionServer(String hostnamePort) No
void unassign (byte[] regionName, booleanforce)
No
HTable API    Available for MapR Tables?    Comments
Configuration and State Management
void clearRegionCache() No Operation is silently ignored.
void close() Yes
<T extends CoprocessorProtocol, R>Map<byte[], R> coprocessorExec(Class<T>protocol, byte[] startKey, byte[] endKey,Call<T, R> callable)
No Returns null.
<T extends CoprocessorProtocol> TcoprocessorProxy(Class<T> protocol, byte[]row)
No Returns null.
Map<HRegionInfo, HServerAddress>deserializeRegionInfo(DataInput in)
Yes
void flushCommits() Yes
Configuration getConfiguration() Yes
HConnection getConnection() No Returns null
int getOperationTimeout() No Returns null
ExecutorService getPool() No Returns null
int getScannerCaching() No Returns 0
ArrayList<Put> getWriteBuffer() No Returns null
long getWriteBufferSize() No Returns 0
boolean isAutoFlush() Yes
void prewarmRegionCache(Map<HRegionInfo,HServerAddress> regionMap)
No Operation is silently ignored.
void serializeRegionInfo(DataOutput out) Yes
void setAutoFlush(boolean autoFlush,boolean clearBufferOnFail)
Same as setAutoFlush(boolean autoFlush)
void setAutoFlush(boolean autoFlush) Yes
void setFlushOnRead(boolean val) Yes
boolean shouldFlushOnRead() Yes
void setOperationTimeout(intoperationTimeout)
No Operation is silently ignored.
void setScannerCaching(int scannerCaching) No Operation is silently ignored.
void setWriteBufferSize(longwriteBufferSize)
No Operation is silently ignored.
Atomic operations
Result append(Append append) Yes
boolean checkAndDelete(byte[] row, byte[] family, byte[] qualifier, byte[] value, Delete delete)
Yes
boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier, byte[] value, Put put)
Yes
Result increment(Increment increment) Yes
long incrementColumnValue(byte[] row, byte[] family, byte[] qualifier, long amount, boolean writeToWAL)
Yes
long incrementColumnValue(byte[] row, byte[] family, byte[] qualifier, long amount)
Yes
void mutateRow(RowMutations rm) Yes
DML operations
void batch(List actions, Object[] results) Yes
Object[] batch(List<? extends Row> actions) Yes
void delete(Delete delete) Yes
void delete(List<Delete> deletes) Yes
boolean exists(Get get) Yes
Result get(Get get) Yes
Result[] get(List<Get> gets) Yes
Result getRowOrBefore(byte[] row, byte[] family)
No
ResultScanner getScanner(...) Yes
void put(Put put) Yes
void put(List<Put> puts) Yes
Table Schema Information
HRegionLocation getRegionLocation(byte[] row, boolean reload)
Yes
Map<HRegionInfo, HServerAddress> getRegionsInfo()
Yes
List<HRegionLocation> getRegionsInRange(byte[] startKey, byte[] endKey)
Yes
byte[][] getEndKeys() Yes
byte[][] getStartKeys() Yes
Pair<byte[][], byte[][]> getStartEndKeys() Yes
HTableDescriptor getTableDescriptor() Yes
byte[] getTableName() Yes Returns table path
Row Locks
RowLock lockRow(byte[] row) No
void unlockRow(RowLock rl) No
HTablePool API    Available for MapR Tables?    Comments
close() Yes
closeTablePool(byte[] tableName) Yes
closeTablePool(String tableName) Yes
protected HTableInterface createHTable(String tableName)
Yes
int getCurrentPoolSize(String tableName) Yes
HTableInterface getTable(byte[] tableName) Yes
HTableInterface getTable(String tableName) Yes
void putTable(HTableInterface table) Yes
MapR Tables and Filters

MapR tables support the following built-in filters. These filters work identically to their Apache HBase versions.
Filter Description
ColumnCountGetFilter Returns the first N columns of a row.
ColumnPaginationFilter
ColumnPrefixFilter
ColumnRangeFilter
CompareFilter
FirstKeyOnlyFilter
FuzzyRowFilter
InclusiveStopFilter
KeyOnlyFilter
MultipleColumnPrefixFilter
PageFilter
PrefixFilter
RandomRowFilter
SingleColumnValueFilter
SkipFilter
TimestampsFilter
WhileMatchFilter
FilterList
RegexStringComparator
HBase Shell Commands
The following table lists support information for HBase shell commands for managing MapR tables.
Command    Available for MapR Tables?    Comments
alter Yes
alter_async Yes
create Yes
describe Yes
disable Yes
drop Yes
enable Yes
exists Yes
is_disabled Yes
is_enabled Yes
list Yes
disable_all Yes
drop_all No Obsolete. Use the rm <table names> command from the MapR file system, or hadoop fs -rm <table names>, instead.
enable_all Yes
show_filters Yes
count Yes
get Yes
put Yes
scan Yes
delete Yes
deleteall Yes
incr Yes
truncate Yes
get_counter Yes
assign No
balance_switch No
balancer No
close_region No
major_compact No
move No
unassign No
zk_dump No
status No
version Yes
whoami Yes
Using AsyncHBase with MapR Tables

You can use the AsyncHBase libraries to provide asynchronous access to MapR tables. MapR provides a version of AsyncHBase modified to work with MapR tables. Once your cluster is ready to use MapR tables, it is also ready to use AsyncHBase with MapR tables.
After installing the mapr-asynchbase package, the AsyncHBase JAR file asynchbase-1.4.1-mapr.jar is in the directory /opt/mapr/hadoop/hadoop-0.20.2/lib. Add that directory to your Java CLASSPATH.
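For example, the classpath addition might look like the following in a shell profile. The JAR path is the default install location named above:

```shell
# Illustrative: put the MapR AsyncHBase JAR on the Java classpath.
ASYNCHBASE_JAR=/opt/mapr/hadoop/hadoop-0.20.2/lib/asynchbase-1.4.1-mapr.jar
export CLASSPATH="$CLASSPATH:$ASYNCHBASE_JAR"
```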
See also
Documentation for AsyncHBase client
Using OpenTSDB with AsyncHBase and MapR Tables

The OpenTSDB software package provides a time series database that collects user-specified data. Because OpenTSDB depends on AsyncHBase, MapR provides a customized version of OpenTSDB that works with AsyncHBase for MapR tables. Download the OpenTSDB source from the github.com/mapr/opentsdb.git repository instead of the standard github.com/OpenTSDB/opentsdb.git repository.
Nodes using OpenTSDB must have the following packages installed:
One of mapr-core or mapr-client
mapr-hbase-<version>. To ensure that your existing HBase applications and workflow work properly, install the mapr-hbase package that provides the same version number of HBase as your existing Apache HBase.
You can follow the directions at OpenTSDB's Getting Started page, changing git clone git://github.com/OpenTSDB/opentsdb.git to git clone git://github.com/mapr/opentsdb.git.
After running the build.sh script, replace the contents of the /opt/mapr/hadoop/hadoop-0.20.2/conf/core-site.xml file with the contents of the opentsdb core-site.xml.template on all nodes. Set up a mapping table to specify the full path to the tsdb and tsdb-uid tables. Create the directories in that path before running the create_table.sh script.
See also
Documentation for OpenTSDB
Protecting Table Data

This page discusses how to organize tables and files on a MapR cluster by making effective use of directories and volumes. To learn about access control and authorization for MapR tables on version 3.1 and later of the MapR distribution for Hadoop, see Enabling Table Authorization with Access Control Expressions. This page contains the following topics:

Organizing Tables and Files in Directories
Controlling Table Storage Policy with Volumes
Mirrors and Snapshots for MapR Tables
Comparison to Apache HBase Running on a MapR Cluster
Related Topics
Organizing Tables and Files in Directories

Because the 3.0 release of the MapR distribution for Hadoop mingles unstructured files with structured tables in a directory structure, you can group logically-related files and tables together. For example, tables related to a project housed in directory /user/foo can be saved in a subdirectory, such as /user/foo/tables.
Listing the contents of a directory with ls lists both tables and files stored at that path. Because table data is not structured as a simple character stream, you cannot operate on table data with common Linux commands such as cat, more, and >. See Filesystem Operations for more information on Linux file system operations with MapR tables.
Example: Creating a MapR table in a directory using the HBase shell
In this example, we create a new table table3 in directory /user/dave on a MapR cluster that already contains a mix of files and tables. In this example, the MapR cluster is mounted at /maprcluster/.
$ pwd
/maprcluster/user/dave
$ ls
file1 file2 table1 table2
$ hbase shell
hbase(main):003:0> create '/user/dave/table3', 'cf1', 'cf2', 'cf3'
0 row(s) in 0.1570 seconds
$ ls
file1 file2 table1 table2 table3
$ hadoop fs -ls /user/dave
Found 5 items
-rw-r--r--   3 mapr mapr 16 2012-09-28 08:34 /user/dave/file1
-rw-r--r--   3 mapr mapr 22 2012-09-28 08:34 /user/dave/file2
trwxr-xr-x   3 mapr mapr  2 2012-09-28 08:32 /user/dave/table1
trwxr-xr-x   3 mapr mapr  2 2012-09-28 08:33 /user/dave/table2
trwxr-xr-x   3 mapr mapr  2 2012-09-28 08:38 /user/dave/table3
Note that in the hadoop fs -ls listing, table items are denoted by a t bit.
Controlling Table Storage Policy with Volumes

MapR provides volumes as a way to organize data and manage cluster performance. A volume is a logical unit that allows you to apply policies to a set of files, directories, and tables. Volumes are used to enforce disk usage limits, set replication levels, define snapshots and mirrors, and establish ownership and accountability.

Because MapR tables are stored in volumes, these same storage policy controls apply to table data. As an example, the diagram below depicts a MapR cluster storing table and file data. The cluster has three separate volumes mounted at directories /user/john, /user/dave, and /project/ads. As shown, each directory contains both file data and table data, grouped together logically. Because each of these directories maps to a different volume, data in each directory can have a different policy. For example, /user/john has a disk-usage quota, while /user/dave is on a snapshot schedule. Furthermore, two directories, /user/john and /project/ads, are mirrored to locations outside the cluster, providing read-only access to high-traffic data, including the tables in those volumes.
Example: Restricting table storage with quotas and physical topology
This example creates a table with a disk usage quota of 100GB, restricted to certain data nodes in the cluster. First we create a volume named project-tables-vol, specifying the quota and restricting storage to nodes in the /data/rack1 topology, and mounting it in the local namespace. Next we use the HBase shell to create a new table named datastore, specifying a path inside the project-tables-vol volume.
$ pwd
/mapr/cluster1/user/project
$ ls
bin src
$ maprcli volume create -name project-tables-vol -path /user/project/tables \
    -quota 100G -topology /data/rack1
$ ls
bin src tables
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.1-SNAPSHOT, rUnknown, Thu Oct 25 09:28:51 PDT 2012
hbase(main):001:0> create '/user/project/tables/datastore', 'colfamily1'
0 row(s) in 0.5180 seconds
hbase(main):002:0> exit
$ ls -l tables
total 1
lrwxr-xr-x 1 mapr mapr 2 Oct 25 15:20 datastore -> mapr::table::2252.32.16498
Mirrors and Snapshots for MapR Tables

Because MapR tables are stored in volumes, you can take advantage of MapR Schedules, Mirror Volumes, and Snapshots (see Working with Snapshots), which operate at the volume level. Mirrors and snapshots are read-only copies of specific volumes on the cluster, which can be used to provision for disaster recovery and improved access time for high-bandwidth data. To access tables in snapshots or mirrors, HBase programs access a table path in a mirror or snapshot volume.
You can set policy for volumes using the MapR Control System or the maprcli commands. For details, see Managing Data with Volumes.
Comparison to Apache HBase Running on a MapR Cluster

Prior to MapR version 3.0, the only option for HBase users was to run Apache HBase on top of the MapR cluster. For the purposes of illustration, this section contrasts how running Apache HBase on a MapR cluster differs from the integrated tables in MapR.

As shown in the diagram below, installing Apache HBase on a MapR cluster involves storing all HBase components in a single volume mapped to directory /hbase in the cluster. Compared to the MapR implementation shown above, this method has the following differences:
Tables are stored in a flat namespace, not grouped logically with related files.
Because all Apache HBase data resides in one volume, only one set of storage policies can be applied to the entire Apache HBase datastore.
Mirrors and snapshots of the HBase volume do not provide functional replication of the datastore. Despite this limitation, mirrors can be used to back up HLogs and HFiles in order to provide a recovery point for Apache HBase data.
Related Topics

Managing Data with Volumes
Working With MapR Tables and Column Families
Enabling Table Authorization with Access Control Expressions
Displaying Table Region Information

MapR tables are split into regions on an ongoing basis. Administrators and developers do not need to manage these regions or restructure data on disk when data is added and deleted. These operations happen automatically. You can list region information for tables to get a sense of the size and location of table data on the MapR cluster.
Examining Table Region Information in the MapR Control System
1. In the MCS Navigation pane under the MapR Data Platform group, click Tables. The Tables tab appears in the main window.
2. Find the table you want to work with, using one of the following methods:
   Scan for the table under Recently Opened Tables on the Tables tab.
   Enter the table pathname in the Go to table field and click Go.
3. Click the desired table name. A Table tab appears in the main MCS pane, displaying information for the specific table.
4. Click the Regions tab. The Regions tab displays region information for the table.

Listing Table Region Information at the Command Line
Listing Table Region Information at the Command Line
Use the maprcli table region command:
$ maprcli table region list -path <path to table>
sk         sn                    ek        pn         lhb
-INFINITY  hostname1, hostname2  INFINITY  hostname3  0
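The listing can be post-processed to see how regions are distributed across the cluster. The sketch below tallies regions per primary node; note that the column meanings assumed here (sk = start key, sn = secondary nodes, ek = end key, pn = primary node, lhb = last heartbeat) are an interpretation of the abbreviated headers, not a documented contract.

```python
from collections import Counter

def regions_per_primary(output):
    """Tally regions by primary node from `maprcli table region list` output.

    Column interpretation (sk = start key, sn = secondary nodes, ek = end key,
    pn = primary node, lhb = last heartbeat) is an assumption of this sketch.
    """
    counts = Counter()
    for line in output.strip().splitlines()[1:]:   # skip the header row
        parts = line.split()
        if len(parts) < 2:
            continue
        counts[parts[-2]] += 1                     # primary node precedes lhb
    return counts

sample = """sk         sn                    ek        pn         lhb
-INFINITY  hostname1, hostname2  INFINITY  hostname3  0"""
print(regions_per_primary(sample))   # Counter({'hostname3': 1})
```

Splitting from the right-hand side avoids misparsing the sn column, whose comma-separated host list contains spaces.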
Integrating Hive and MapR Tables
You can create MapR tables from Hive that can be accessed by both Hive and MapR. With this functionality, you can run Hive queries on MapR tables. You can also convert existing MapR tables into Hive-MapR tables, running Hive queries on those tables as well.
Install and Configure Hive
Configure the hive-site.xml File
Getting Started with Hive-MapR Integration
Create a Hive table with two columns
Start the HBase shell
Zookeeper Connections
Install and Configure Hive
Install and configure Hive if it is not already installed.
Execute the jps command and ensure that all relevant Hadoop, MapR, and Zookeeper processes are running.
Example:
$ jps
21985 HRegionServer
1549 jenkins.war
15051 QuorumPeerMain
30935 Jps
15551 CommandServer
15698 HMaster
15293 JobTracker
15328 TaskTracker
15131 WardenMain
Configure the hive-site.xml File
1. Open the hive-site.xml file with your favorite editor, or create a hive-site.xml file if it doesn't already exist:
$ cd $HIVE_HOME
$ vi conf/hive-site.xml
2. Copy the following XML code and paste it into the hive-site.xml file.
Note: If you already have an existing hive-site.xml file with a configuration element block, just copy the property element block code below and paste it inside the configuration element block in the hive-site.xml file. Be sure to use the correct values for the paths to your auxiliary JARs and ZooKeeper IP numbers.
Example configuration:
<configuration>
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/mapr/hive/hive-0.10.0/lib/hive-hbase-handler-0.10.0-mapr.jar,file:///opt/mapr/hbase/hbase-0.94.5/hbase-0.94.5-mapr.jar,file:///opt/mapr/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.jar</value>
  <description>A comma separated list (with no spaces) of the jar files required for Hive-HBase integration</description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>xx.xx.x.xxx,xx.xx.x.xxx,xx.xx.x.xxx</value>
  <description>A comma separated list (with no spaces) of the IP addresses of all ZooKeeper servers in the cluster.</description>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>5181</value>
  <description>The Zookeeper client port. The MapR default clientPort is 5181.</description>
</property>
</configuration>
3. Save and close the hive-site.xml file.
If you have successfully completed all of the steps in this section, you're ready to begin the tutorial in the next section.
Getting Started with Hive-MapR Integration
In this tutorial we will:
Create a Hive table
Populate the Hive table with data from a text file
Query the Hive table
Create a Hive-MapR table
Introspect the Hive-MapR table from the HBase shell
Populate the Hive-MapR table with data from the Hive table
Query the Hive-MapR table from Hive
Convert an existing MapR table into a Hive-MapR table
Be sure that you have successfully completed all of the steps in the Install and Configure Hive and Setting Up MapR-FS to Use Tables sections before beginning this Getting Started tutorial.
This Getting Started tutorial is based on the Hive-HBase Integration section of the Apache Hive Wiki, with thanks to Samuel Guo and other contributors to that effort. If you are familiar with their approach to Hive-HBase integration, you should be immediately comfortable with this material.
However, please note that there are some significant differences in this Getting Started section, especially with regard to configuration and command parameters or the lack thereof. Follow the instructions in this Getting Started tutorial to the letter so you can have an enjoyable and successful experience.
Create a Hive table with two columns:
Change to your Hive installation directory if you're not already there and start Hive:
$ cd $HIVE_HOME
$ bin/hive
Execute the CREATE TABLE command to create the Hive pokes table:
hive> CREATE TABLE pokes (foo INT, bar STRING);
To see if the pokes table has been created successfully, execute the SHOW TABLES command:
hive> SHOW TABLES;
OK
pokes
Time taken: 0.74 seconds
The pokes table appears in the list of tables.
Populate the Hive pokes table with data
The kv1.txt file is provided in the $HIVE_HOME/examples/files directory. Execute the LOAD DATA LOCAL INPATH command to populate the pokes Hive table with data from the kv1.txt file.
hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
A message appears confirming that the data was loaded successfully, and the Hive prompt reappears:
Copying data from file:...
OK
Time taken: 0.278 seconds
hive>
Execute a SELECT query on the Hive pokes table:
hive> SELECT * FROM pokes WHERE foo = 98;
The SELECT statement executes, runs a MapReduce job, and prints the job output:
OK
98 val_98
98 val_98
Time taken: 18.059 seconds
The output of the SELECT command displays two identical rows because there are two identical rows in the Hive pokes table with a key of 98.
To create a Hive-MapR table, enter these four lines of code at the Hive prompt:
hive> CREATE TABLE mapr_table_1(key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    > TBLPROPERTIES ("hbase.table.name" = "/user/mapr/xyz");
After a brief delay, a message appears confirming that the table was created successfully:
OK
Time taken: 5.195 seconds
Note: The TBLPROPERTIES command is not required, but those new to Hive-MapR integration may find it easier to understand what's going on if Hive and MapR use different names for the same table.
In this example, Hive will recognize this table as "mapr_table_1" and MapR will recognize this table as "xyz".
Start the HBase shell:
Keeping the Hive terminal session open, start a new terminal session for HBase, then start the HBase shell:
$ cd $HBASE_HOME
$ bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.4, rUnknown, Wed Nov 9 17:35:00 PST 2011
hbase(main):001:0>
Execute the list command to see a list of HBase tables:
hbase(main):001:0> list
TABLE
/user/mapr/xyz
1 row(s) in 0.8260 seconds
HBase recognizes the Hive-MapR table named xyz in directory /user/mapr. This is the same table known to Hive as mapr_table_1.
Display the description of the /user/mapr/xyz table in the HBase shell:
Hive tables can have multiple identical keys. As we will see shortly, MapR tables cannot have multiple identical keys, only unique keys.
hbase(main):004:0> describe "/user/mapr/xyz"
DESCRIPTION                                                        ENABLED
 {NAME => '/user/mapr/xyz', FAMILIES => [{NAME => 'cf1',           true
 DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', VERSIONS => '3', MIN_VERSIONS => '0',
 TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0240 seconds
From the Hive prompt, insert data from the Hive table pokes into the Hive-MapR table mapr_table_1:
hive> INSERT OVERWRITE TABLE mapr_table_1 SELECT * FROM pokes WHERE foo=98;
...
2 Rows loaded to mapr_table_1
OK
Time taken: 13.384 seconds
Query mapr_table_1 to see the data we have inserted into the Hive-MapR table:
hive> SELECT * FROM mapr_table_1;
OK
98 val_98
Time taken: 0.56 seconds
Even though we loaded two rows from the Hive pokes table that had the same key of 98, only one row was actually inserted into mapr_table_1. This is because mapr_table_1 is a MapR table, and although Hive tables support duplicate keys, MapR tables only support unique keys. MapR tables arbitrarily retain only one key, and silently discard all of the data associated with duplicate keys.
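This unique-key behavior is analogous to loading rows into a map keyed by row key, where duplicates collapse to a single entry. A minimal illustration (plain Python, not a MapR API):

```python
# Hive rows may contain duplicate keys; a MapR table keeps one row per key,
# silently discarding the rest (which duplicate survives is not guaranteed).
hive_rows = [(98, "val_98"), (98, "val_98")]   # two identical rows, as in pokes

mapr_table = {}
for key, value in hive_rows:
    mapr_table[key] = value                    # later duplicates overwrite earlier

print(len(mapr_table))   # 1 -- only one row survives for key 98
```

If your Hive data contains duplicate keys that all carry meaningful data, remodel the row key (for example, append a qualifier) before loading into a MapR table.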
Convert a pre-existing MapR table to a Hive-MapR table
To convert a pre-existing MapR table to a Hive-MapR table, enter the following four commands at the Hive prompt.
Note that in this example the existing MapR table mapr_table_2 is in directory /user/mapr.
hive> CREATE EXTERNAL TABLE mapr_table_2(key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")
    > TBLPROPERTIES("hbase.table.name" = "/user/mapr/my_mapr_table");
Now we can run a Hive query against the pre-existing MapR table /user/mapr/my_mapr_table that Hive sees as mapr_table_2:
hive> SELECT * FROM mapr_table_2 WHERE key > 400 AND key < 410;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
...
OK
401 val_401
402 val_402
403 val_403
404 val_404
406 val_406
407 val_407
409 val_409
Time taken: 9.452 seconds
Zookeeper Connections
If you see an error message similar to the following, ensure that hbase.zookeeper.quorum and hbase.zookeeper.property.clientPort are properly defined in the $HIVE_HOME/conf/hive-site.xml file.
Failed with exception java.io.IOException:
org.apache.hadoop.hbase.ZooKeeperConnectionException:
HBase is able to connect to ZooKeeper but the connection closes immediately. This
could be a sign that the server has too many connections (30 is the default).
Consider inspecting your ZK server logs for that error and then make sure you are
reusing HBaseConfiguration as often as you can. See HTable's javadoc for more
information.
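Before digging into ZooKeeper server logs, it can help to confirm that the client port is reachable at all from the Hive node. A minimal connectivity probe (a sketch; it checks TCP reachability only, not ZooKeeper health):

```python
import socket

def zk_reachable(host, port=5181, timeout=3.0):
    """Return True if a TCP connection to the ZooKeeper client port succeeds.

    5181 is the MapR default clientPort; stock Apache ZooKeeper uses 2181.
    A successful connect shows only network reachability, not ZK health.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run against each host listed in hbase.zookeeper.quorum, for example:
# unreachable = [h for h in quorum_hosts if not zk_reachable(h)]
```

If a host is unreachable, check firewalls and the clientPort value before investigating connection-limit errors.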
Migrating Between Apache HBase Tables and MapR Tables
MapR tables can be parsed by the Apache CopyTable tool (org.apache.hadoop.hbase.mapreduce.CopyTable). You can use the CopyTable tool to migrate data from an Apache HBase table to a MapR table or from a MapR table to an Apache HBase table.
Before You Start
Before migrating your tables to another platform, consider the following points:
Schema Changes: Apache HBase and MapR tables have different limits on the number of column families. If you are migrating to MapR, you may be interested in changing your table's schema to take advantage of the increased availability of column families. Conversely, if you're migrating from MapR tables to Apache, you may need to adjust your schema to reflect the reduced availability of column families.
API Mappings: If you are migrating from Apache HBase to MapR tables, examine your current HBase applications to verify the APIs and HBase shell commands used are fully supported.
Namespace Mapping: If the migration will take place over a period of time, be sure to plan your table namespace mappings in advance to ease the transition.
Implementation Limitations: MapR tables do not support HBase coprocessors. If your existing Apache HBase installation uses coprocessors, plan any necessary modifications in advance. MapR tables support a subset of the regular expressions supported in Apache HBase. Check your existing workflow and HBase applications to verify you are not using unsupported regular expressions.
If you are migrating to MapR tables, be sure to change your Apache HBase client to the MapR client by installing the version of the mapr-hbase package that matches the version of Apache HBase on your source cluster.
See Installing MapR Software for information about MapR installation procedures, including setting up the proper repositories.
Compression Mappings
MapR tables support the LZ4, LZF, and ZLIB compression algorithms.
When you create a MapR table with the Apache HBase API or the HBase shell and specify the LZ4, LZO, or SNAPPY compression algorithms, the resulting MapR table uses the LZ4 compression algorithm.

When you describe a MapR table schema through the HBase API, the LZ4 and OLDLZF compression algorithms map to the LZ4 compression algorithm.
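These two mappings can be summarized as simple lookup tables. The dictionaries below are illustrative only, not a MapR API; the pass-through entries for LZF and ZLIB are an assumption based on the list of natively supported algorithms above:

```python
# Compression requested at table creation -> compression actually used.
# LZ4, LZO, and SNAPPY all become LZ4; LZF/ZLIB pass-through is assumed.
CREATE_MAP = {"LZ4": "LZ4", "LZO": "LZ4", "SNAPPY": "LZ4",
              "LZF": "LZF", "ZLIB": "ZLIB"}

# Stored compression -> value reported when describing via the HBase API.
DESCRIBE_MAP = {"LZ4": "LZ4", "OLDLZF": "LZ4"}

def compression_on_create(requested):
    """Algorithm a MapR table actually uses when created with `requested`."""
    return CREATE_MAP.get(requested, requested)
```

For example, a table created through the HBase shell with SNAPPY compression will actually be stored with LZ4.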
Copying Data
Launch the CopyTable tool with the following command, specifying the full destination path of the table with the --new.name parameter:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
-Dhbase.zookeeper.quorum=<ZooKeeper IP Address> \
-Dhbase.zookeeper.property.clientPort=2181 \
--new.name=/user/john/foo/mytable01
Example: Migrating an Apache HBase table to a MapR table
This example migrates the existing Apache HBase table mytable01 to the MapR table /user/john/foo/mytable01.
On the node in the MapR cluster where you will launch the CopyTable tool, modify the value of the hbase.zookeeper.quorum property in the hbase-site.xml file to point at a ZooKeeper node in the source cluster. Alternately, you can specify the value for the hbase.zookeeper.quorum property from the command line. This example specifies the value in the command line.
Create the destination table. This example uses the HBase shell. The CLI and MapR Control System (MCS) are also viable methods.
[user@host] hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.3-SNAPSHOT, rUnknown, Thu Mar 7 10:15:47 PST 2013
hbase(main):001:0> create '/user/john/foo/mytable01', 'usernames', 'userpath'
0 row(s) in 0.2040 seconds
Exit the HBase shell.
hbase(main):002:0> exit
[user@host]
From the HBase command line, use the CopyTable tool to migrate data.
[user@host] hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
-Dhbase.zookeeper.quorum=zknode1,zknode2,zknode3 \
--new.name=/user/john/foo/mytable01 mytable01
Verifying Migration
After copying data to the new tables, verify that the migration is complete and successful. In increasing order of complexity:
Verify that the destination table exists. From the HBase shell, use the list command, or use the ls /user/john/foo command from a Linux prompt:
The CopyTable tool launches a MapReduce job. The nodes on your cluster must have the correct version of the mapr-hbase package installed. To ensure that your existing HBase applications and workflow work properly, install the mapr-hbase package that provides the same version number of HBase as your existing Apache HBase.
hbase(main):006:0> list '/user/john/foo'
TABLE
/user/john/foo/mytable01
1 row(s) in 0.0770 seconds
Check the number of rows in the source table against the destination table with the count command:
hbase(main):005:0> count '/user/john/foo/mytable01'
30 row(s) in 0.1240 seconds
Hash each table, then compare the hashes.
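For the hash comparison, one workable approach is an order-independent digest over all rows, so that differing scan order between source and destination does not change the result. A sketch, assuming rows are available as (row key, value) pairs:

```python
import hashlib

def table_fingerprint(rows):
    """Order-independent fingerprint of (row_key, value) pairs.

    XOR-combining per-row SHA-256 digests makes the result independent of
    scan order, so source and destination scans need not return rows in
    the same sequence.
    """
    acc = 0
    for key, value in rows:
        h = hashlib.sha256(f"{key}\x00{value}".encode()).hexdigest()
        acc ^= int(h, 16)
    return f"{acc:064x}"

src = [("row1", "a"), ("row2", "b")]
dst = [("row2", "b"), ("row1", "a")]   # same rows, different scan order
assert table_fingerprint(src) == table_fingerprint(dst)
```

Feed each function scans of the source and destination tables (for example, via an HBase client scan) and compare the two fingerprints; any differing row changes the digest.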
Decommissioning the Source
After verifying a successful migration, you can decommission the source nodes where the tables were originally stored.
Decommissioning a MapR Node
Before you start, drain the node of data by moving the node to the /decommissioned physical topology. All the data on a node in the /decommissioned topology is migrated to volumes and nodes in the /data topology.
Run the following command to check if a given volume is present on the node:
maprcli dump volumenodes -volumename <volume> -json | grep <ip:port>
Run this command for each non-local volume in your cluster to verify that the node being decommissioned is not storing any volume data.
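This per-volume check can be scripted. Because the exact JSON field names in the maprcli output vary by MapR version, the sketch below simply searches the serialized output for the node's ip:port, mirroring the grep in the command above; the sample structure shown is hypothetical:

```python
import json

def node_hosts_volume(dump_json, node):
    """Return True if `node` (an "ip:port" string) appears anywhere in the
    parsed `maprcli dump volumenodes -volumename <vol> -json` output.

    Searching the re-serialized JSON mirrors `| grep <ip:port>` from the
    docs and avoids depending on exact field names, which vary by version.
    """
    return node in json.dumps(dump_json)

# Hypothetical output shape, for illustration only:
sample = {"data": [{"Servers": ["10.10.1.1:5660", "10.10.1.2:5660"]}]}
```

Loop this over every non-local volume; the node is safe to decommission only when no volume reports it.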
Change to the root user (or use sudo for the following commands).
Stop the Warden:
service mapr-warden stop
If ZooKeeper is installed on the node, stop it:
service mapr-zookeeper stop
Determine which MapR packages are installed on the node:
dpkg --list | grep mapr   (Ubuntu)
rpm -qa | grep mapr       (Red Hat or CentOS)
Remove the packages by issuing the appropriate command for the operating system, followed by the list of services. Examples:
apt-get purge mapr-core mapr-cldb mapr-fileserver   (Ubuntu)
yum erase mapr-core mapr-cldb mapr-fileserver       (Red Hat or CentOS)
Remove the /opt/mapr directory to remove any instances of hostid, hostname, zkdata, and zookeeper left behind by the package manager.
Remove any MapR cores in the /opt/cores directory.
If the node you have decommissioned is a CLDB node or a ZooKeeper node, then run configure.sh on all other nodes in the cluster (see Configuring the Node).
Decommissioning Apache HBase Nodes
To decommission nodes running Apache HBase, follow these steps for each node:
From the HBase shell, disable the Region Load Balancer by setting the value of balance_switch to false:
hbase(main):001:0> balance_switch false
Leave the HBase shell by typing exit.
Run the graceful stop script to stop the HBase RegionServer:
[user@host] ./bin/graceful_stop.sh <hostname>
Language Support for MapR Tables
MapR tables can store, retrieve, and process data in the following languages:
A
Abaza, Abkhazian, Achinese, Acoli, Adangme, Adyghe, Afar, Afrikaans, Aghem, Ainu, Akan, Akkadian, Akoose, Albanian, Aleut, Amharic, Amo, Ancient Egyptian, Ancient Greek, Angika, Arabic, Aragonese, Aramaic, Arapaho, Arawak, Armenian, Aromanian, Assamese, Assyrian Neo-Aramaic, Asturian, Asu, Atikamekw, Atsam, Avaric, Avestan, Awadhi, Aymara, Azerbaijani
B
Badaga, Bafia, Bafut, Bagheli, Balinese, Balkan Gagauz Turkish, Balti, Baluchi, Bambara, Bamun, Bantawa, Basaa, Bashkir, Basque
The graceful_stop.sh script does not look up the hostname for an IP number. Do not pass an IP number to the script. Check the list of RegionServers in the Apache HBase Master UI to determine the hostname for the node being decommissioned.
Batak, Batak Toba, Bateri, Beja, Belarusian, Bemba, Bena, Bengali, Bhili, Bhojpuri, Bikol, Bini, Bislama, Blin, Bodo, Bomu, Bosnian, Braj, Breton, Bube, Buginese, Buhid, Bulgarian, Bulu, Buriat, Burmese, Bushi
C
Caddo, Cantonese, Carian, Carib, Catalan, Cayuga, Cebaara Senoufo, Cebuano, Central Atlas Tamazight, Central Huasteca Nahuatl, Central Mazahua, Central Okinawan, Chadian Arabic, Chakma, Chamorro, Chechen, Cherokee, Cheyenne, Chhattisgarhi, Chiga, Chinese, Chinook Jargon, Chipewyan, Choctaw, Chukot, Church Slavic, Chuukese, Chuvash, Classical Mandaic, Colognian, Comorian, Congo Swahili, Coptic, Cornish, Corsican, Cree, Creek, Crimean Turkish, Croatian, Czech
D
Dakota, Dan, Dangaura Tharu, Danish, Dargwa, Dari, Dazaga, Delaware, Dinka, Divehi, Dogri, Dogrib, Domari, Duala, Dungan, Dutch, Dyula, Dzongkha
E
Eastern Cham, Eastern Frisian, Eastern Gurung, Eastern Huasteca Nahuatl, Eastern Kayah, Eastern Lawa, Eastern Magar, Eastern Tamang, Efik, Ekajuk, Embu, English, Erzya, Esperanto, Estonian, Etruscan, Evenki, Ewe, Ewondo
F
Fang, Fanti, Faroese, Fijian, Filipino, Finnish, Fon, French, Friulian, Fulah
G
Ga, Gagauz, Galician, Ganda, Garhwali, Garo, Gayo, Gbaya, Geez, Georgian, German, Ghomala, Gilbertese, Gondi, Gorontalo, Gothic, Grebo, Greek, Gronings, Guajajára, Guarani, Guianese Creole French, Gujarati, Gujari, Gusii, Gwichin
H
Hadothi, Haida, Haitian, Hanunoo, Hausa, Hawaiian, Hebrew, Herero, Hiligaynon, Hindi, Hiri Motu, Hittite, Hmong, Ho, Hopi, Hungarian, Hupa
I
Iban, Ibibio, Icelandic, Igbo, Iloko, Inari Sami, Indonesian, Indus Kohistani, Ingush, Interlingua, Inuktitut, Inupiaq, Irish, Italian
J
Japanese, Javanese, Jenaama Bozo, Jju, Jola-Fonyi, Judeo-Arabic, Judeo-Persian, Jumli
K
Kabardian, Kabuverdianu, Kabyle, Kachchi, Kachi Koli, Kachin, Kaingang, Kako, Kalaallisut, Kalanga, Kalenjin, Kalmyk, Kalo Finnish Romani, Kamba, Kanauji, Kanembu, Kannada, Kanuri, Kara-Kalpak, Karachay-Balkar, Karelian, Kashmiri, Kashubian, Kathoriya Tharu, Kazakh, Kerinci, Klngaxo Bozo, Khakas, Khamti, Khanty, Khasi, Khmer, Khmu, Khowar, Kikuyu, Kimbundu, Kinyarwanda, Kita Maninkakan, Kochila Tharu, Kom, Komering, Komi, Komi-Permyak, Kongo, Konkani, Korean, Koro, Koro Wachi, Koryak, Kosraean, Koyra Chiini, Koyraboro Senni, Kpelle, Krio, Kuanyama, Kumyk, Kurdish, Kurukh, Kutenai, Kuy, Kwasio, Kyrgyz
L
Ladino, Lahnda, Lak, Laki, Lakota, Lamba, Lambadi, Langi, Lao, Large Flowery Miao, Latin, Latvian, Lepcha, Lezghian, Limbu, Limburgish, Lingala, Lisu, Literary Chinese, Lithuanian, Lombard, Low German, Lower Sorbian, Lozi, Lü, Luba-Katanga, Luba-Lulua, Luiseno, Lule Sami, Lunda, Luo, Lushootseed, Luxembourgish, Luyia, Lycian, Lydian
M
Maba, Macedonian, Machame, Madurese, Mafa, Magahi, Maguindanaon, Maithili, Makasar, Makhuwa-Meetto, Makonde, Malagasy, Malay, Malayalam, Maltese, Manchu, Mandar, Mandingo, Manipuri, Mansi, Manx, Manyika, Maori, Mapuche, Marathi, Mari, Marshallese, Marwari, Masai, Mbere, Mbunga, Medumba, Mende, Meroitic, Meru, Meta', Micmac, Minangkabau, Mirandese, Mizo, Mohawk, Moksha, Mon, Mongo, Mongolian, Montagnais, Moose Cree, Morisyen, Mossi, Munda, Mundang, Mundari, Myene
N
N’Ko, Nama, Nanai, Naskapi, Nauru, Navajo, Naxi, Ndonga, Neapolitan, Negeri Sembilan Malay, Nenets, Nepali, Newari, Ngaju, Ngambay, Ngiemboon, Ngomba, Nias, Nigerian Pidgin, Niuean, Nogai, North Ndebele, North Slavey, Northeastern Thai, Northern East Cree, Northern Frisian, Northern Sami, Northern Sotho, Northern Thai, Norwegian, Norwegian Bokmål, Norwegian Nynorsk, Nuer, Nyamwezi, Nyanja, Nyankole, Nyasa Tonga, Nyoro, Nzima
O
Occitan, Ojibwa, Old Irish, Old Norse, Old Persian, Old Turkish, Oriya, Oromo, Osage, Oscan, Ossetic
P
Pahlavi, Palauan, Pali, Pampanga, Pangasinan, Papiamento, Parkari Koli, Parsi-Dari, Parthian, Pashto, Persian, Phoenician, Plains Cree, Pohnpeian, Pökoot, Polish, Portuguese, Prussian, Punjabi, Punu
Q
Quechua
R
Rajasthani, Rajbanshi, Rana Tharu, Rangpuri, Rapanui, Rarotongan, Rejang, Réunion Creole French, Riang (India), Rinconada Bikol, Romanian, Romansh, Romany, Rombo, Ronga, Rundi, Russian, Rusyn, Rwa
S
Sabaean, Safaliba, Saho, Sakha, Samaritan, Samaritan Aramaic, Samburu, Samoan, Sandawe, Sangir, Sango, Sangu, Sanskrit, Santali, Sardinian, Sasak, Saurashtra, Scots, Scottish Gaelic, Seki, Selkup, Sena, Seneca, Serbian, Serbo-Croatian, Serer, Shambala, Shan, Sherpa, Shona, Shor, Sichuan Yi, Sicilian, Sidamo, Siksika, Sindhi, Sinhala, Sinte Romani, Sirmauri, Skolt Sami, Slave, Slovak, Slovenian, Soga, Somali, Soninke, Sora, Sorani Kurdish, South Ndebele, Southern Altai, Southern East Cree, Southern Hindko, Southern Kurdish, Southern Luri, Southern Sami, Southern Sotho, Southwestern Tamang, Spanish, Sranan Tongo, Standard Moroccan Tamazight, Sukuma, Sundanese, Susu, Swahili, Swampy Cree, Swati, Swedish, Swiss German, Sylheti, Syriac
T
Tabassaran, Tachelhit, Tae', Tagalog, Tagbanwa, Tahitian, Tai Dam, Tai Nüa, Taita, Tajik, Tamashek, Tamil, Taroko, Tasawaq, Tatar, Tausug, Tavringer Romani, Telugu, Tereno, Teso, Tetum, Thai, Thulung, Tibetan, Tigre, Tigrinya, Timne, Tiv, Tlingit, Tok Pisin, Tokelau, Tolaki, Tomo Kan Dogon, Tongan, Tooro, Tornedalen Finnish, Tshangla, Tsimshian, Tsonga, Tswana, Tulu, Tumbuka, Turkish, Turkmen, Turoyo, Tuvalu, Tuvinian, Twi, Tyap
U
Uab Meto, Udihe, Udmurt, Ugaritic, Ukrainian, Ulithian, Umbrian, Umbundu, Unknown Language, Upper Sorbian, Urdu, Uyghur, Uzbek

V

Vai, Venda, Vietnamese, Virgin Islands Creole English, Volapük, Votic, Vunjo
W
Wadiyara Koli, Walloon, Walser, Waray, Washo, Welsh, Western Cham, Western Frisian, Western Gurung, Western Huasteca Nahuatl, Western Kayah, Western Lawa, Western Magar, Western Mari, Western Tamang, Wolaytta, Wolof
X
Xaasongaxango