Guide to MapR-DB
1. M7 - Native Storage for MapR Tables
  1.1 Setting Up MapR-FS to Use Tables
    1.1.1 Mapping Table Namespace Between Apache HBase Tables and MapR Tables
    1.1.2 Working With MapR Tables and Column Families
      1.1.2.1 Bulk Loading and MapR-DB Tables
      1.1.2.2 Schema Design for MapR Tables
      1.1.2.3 Supported Regular Expressions in MapR Tables
  1.2 MapR Table Support for Apache HBase API
  1.3 Using AsyncHBase with MapR Tables
    1.3.1 Using OpenTSDB with AsyncHBase and MapR Tables
  1.4 Protecting Table Data
  1.5 Displaying Table Region Information
  1.6 Integrating Hive and MapR Tables
  1.7 Migrating Between Apache HBase Tables and MapR Tables
  1.8 Language Support for MapR Tables
M7 - Native Storage for MapR Tables

Starting in version 3.0, the MapR distribution for Hadoop integrates native tables stored directly in MapR-FS.
This page contains the following topics:
- About MapR Tables
- MapR-FS Handles Structured and Unstructured Data
- Benefits of Integrated Tables in MapR-FS
- The MapR Implementation of HBase
- Effects of Decoupling API and Architecture
- The HBase Data Model
- Using MapR and Apache HBase Tables Together
- Current Limitations
- Administering MapR Tables
- Related Topics
About MapR Tables

In the 3.0 release of the MapR distribution for Hadoop, MapR-FS enables you to create and manipulate tables in many of the same ways that you create and manipulate files in a standard UNIX file system. This document discusses how to set up your MapR installation to use MapR tables. For users experienced with standard Apache HBase, this document describes the differences in capabilities and behavior between MapR tables and Apache HBase tables.
MapR-FS Handles Structured and Unstructured Data

The 3.0 release of the MapR distribution for Hadoop features a unified architecture for files and tables, providing distributed data replication for structured and unstructured data. Tables enable you to manage structured data, as opposed to the unstructured data management provided by files. The structure for structured data management is defined by a data model, a set of rules that defines the relationships in the structure.
By design, the data model for tables in MapR focuses on columns, similar to the open-source standard Apache HBase system. Like Apache HBase, MapR tables store data structured as a nested sequence of key/value pairs, where the value in one pair serves as the key for another pair. With a properly licensed MapR installation, you can use MapR tables exclusively or work in a mixed environment with Apache HBase tables, because Apache HBase works with MapR tables.
MapR tables are implemented directly within MapR-FS, yielding a familiar, open-standards API that provides a high-performance datastore for tables. MapR-FS is written in C and optimized for performance. As a result, MapR-FS runs significantly faster than JVM-based Apache HBase. The diagram below compares the application stacks for different HBase implementations.
Benefits of Integrated Tables in MapR-FS

The MapR cluster architecture provides the following benefits for table storage, providing an enterprise-grade HBase environment.
- MapR clusters with HA features recover instantly from node failures.
- MapR provides a unified namespace for tables and files, allowing users to group tables in directories by user, project, or any other useful grouping.
- Tables are stored in volumes on the cluster alongside unstructured files. Storage policy settings for volumes apply to tables as well as files.
- Volume mirrors and snapshots provide flexible, reliable read-only access.
- Table storage and MapReduce jobs can co-exist on the same nodes without degrading cluster performance.
- The use of MapR tables imposes no administrative overhead beyond administration of the MapR cluster.
- Node upgrades and other administrative tasks do not cause downtime for table storage.
The MapR Implementation of HBase

MapR's implementation supports the core HBase API. Programmers who are used to writing code for the HBase API will have immediate, intuitive access to MapR tables. MapR delivers faithfully on the original vision of Google's BigTable paper, using the open-standard HBase API.
MapR's implementation of the HBase API provides enterprise-grade high availability (HA), data protection, and disaster recovery features for tables on a distributed Hadoop cluster. MapR tables can be used as the underlying key-value store for Hive, or for any other application requiring a high-performance, high-availability key-value datastore. Because MapR uses the open-standard HBase API, many legacy HBase applications can continue to run on MapR without modification.
MapR has extended hbase shell to work with MapR tables in addition to Apache HBase tables. As with development for Apache HBase, the simplest way to create tables and column families in MapR-FS, and to put and get data from them, is to use hbase shell. MapR tables can be created from the MapR Control System (MCS) user interface or from the Linux command line, without the need to coordinate with a database administrator. You can treat a MapR table just as you would a file, specifying a path to a location in a directory, and the table appears in the same namespace as your regular files. You can also create and manage column families for your table from the MCS or directly from the command line.
During data migration or in other scenarios where you need to refer to a MapR table with the same name as an Apache HBase table in the same cluster, you can map the table namespace to enable that operation.
To summarize:
- The MapR table API works with the core HBase API.
- MapR tables implement the HBase feature set.
- You can use MapR tables as the datastore for Hive applications.
Effects of Decoupling API and Architecture

The following features of MapR tables result from decoupling the HBase API from the Apache HBase architecture:
- MapR's High Availability (HA) cluster architecture eliminates the RegionServer component of traditional Apache HBase architecture, which was a single point of failure and a bottleneck for scalability. In MapR-FS, MapR tables are HA at all levels, similar to other services on a MapR cluster.
- MapR tables can have up to 64 column families, with no limit on the number of columns.
- MapR-FS manages how data is laid out on disk without large variations in user-visible latency. MapR table regions are split as needed and deleted data is purged.
- Crash recovery is significantly faster than Apache HBase.
The HBase Data Model

Apache HBase stores structured data as a nested series of maps. Each map consists of a set of key-value pairs, where the value can be the key in another map. Keys are kept in strict lexicographical order: 1, 10, and 113 come before 2, 20, and 213.
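This ordering can surprise developers who expect numeric sorting. The following Python sketch (illustrative only, not MapR code) shows the order HBase uses for these row keys, along with the common zero-padding workaround:

```python
# HBase keeps row keys in lexicographic (byte-string) order, not numeric order,
# so "10" sorts before "2".
keys = ["2", "113", "20", "1", "213", "10"]
ordered = sorted(keys)
print(ordered)  # ['1', '10', '113', '2', '20', '213']

# A common workaround when numeric ordering is needed: zero-pad keys
# to a fixed width so lexicographic and numeric order coincide.
padded = sorted(k.zfill(3) for k in keys)
print(padded)   # ['001', '002', '010', '020', '113', '213']
```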
In descending order of granularity, the elements of an HBase entry are:
- Key: Keys define the rows in an HBase table.
- Column family: A column family is a key associated with a set of columns. Specify this association according to your individual use case, creating sets of columns. A column family can contain an arbitrary number of columns. MapR tables support up to 64 column families.
- Column: Columns are keys that are associated with a series of timestamps that define when the value in that column was updated.
- Timestamp: The timestamp in a column specifies a particular data write to that column.
- Value: The data written to that column at the specific timestamp.

The Apache HBase API exposes many low-level administrative functions that can be tuned for performance or reliability. The reliability and functionality of MapR tables renders these low-level functions moot, and these low-level calls are not supported for MapR tables. See MapR Table Support for Apache HBase API for detailed information.
This structure results in versioned values that you can access flexibly and quickly. Because Apache HBase and MapR tables are sparse, any of the column values for a given key can be null.
Example HBase Table
This example uses JSON notation for representational clarity. In this example, timestamps are arbitrarily assigned.
{
  "arbitraryFirstKey": {
    "firstColumnFamily": {
      "firstColumn": {
        10: "valueFive",
        7: "valueThree",
        4: "valueOne"
      },
      "secondColumn": {
        16: "valueEight",
        1: "valueSeven"
      }
    },
    "secondColumnFamily": {
      "firstColumn": {
        37: "valueFive",
        23: "valueThree",
        11: "valueSeven",
        4: "valueOne"
      },
      "secondColumn": {
        15: "valueEight"
      }
    }
  },
  "arbitrarySecondKey": {
    "firstColumnFamily": {
      "firstColumn": {
        10: "valueFive",
        4: "valueOne"
      },
      "secondColumn": {
        16: "valueEight",
        7: "valueThree",
        1: "valueSeven"
      }
    },
    "secondColumnFamily": {
      "firstColumn": {
        23: "valueThree",
        11: "valueSeven"
      }
    }
  }
}
HBase queries return the most recent timestamp by default. A query for the value in "arbitrarySecondKey"/"secondColumnFamily:firstColumn" returns valueThree. Specifying a timestamp with a query for "arbitrarySecondKey"/"secondColumnFamily:firstColumn"/11 returns valueSeven.
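The lookup behavior just described can be sketched in Python using the nested-map model from the example above. The get_cell helper is hypothetical, not a MapR or HBase API:

```python
# Nested-map model: row key -> column family -> column -> {timestamp: value}.
table = {
    "arbitrarySecondKey": {
        "secondColumnFamily": {
            "firstColumn": {23: "valueThree", 11: "valueSeven"},
        },
    },
}

def get_cell(table, row, family, column, timestamp=None):
    """Return the value at the given timestamp, or the most recent write."""
    versions = table[row][family][column]
    if timestamp is None:
        timestamp = max(versions)  # default: highest (most recent) timestamp
    return versions[timestamp]

print(get_cell(table, "arbitrarySecondKey", "secondColumnFamily", "firstColumn"))
# -> valueThree (timestamp 23 is the most recent write)
print(get_cell(table, "arbitrarySecondKey", "secondColumnFamily", "firstColumn", 11))
# -> valueSeven
```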
Using MapR and Apache HBase Tables Together

MapR table storage is independent of Apache HBase table storage, enabling a single MapR cluster to run both systems. Users typically run both systems concurrently, particularly during the migration phase. Alternately, you can leave Apache HBase running for existing applications and use MapR tables for new applications. You can set up namespace mappings for your cluster to run both MapR tables and Apache HBase tables concurrently, during migration or on an ongoing basis.
Current Limitations

- Custom HBase filters are not supported.
- User permissions for column families are not supported. User permissions for tables and columns are supported.
- HBase authentication is not supported.
- HBase replication is handled with Mirror Volumes.
- Bulk loads using the HFiles workaround are not supported and are not necessary.
- HBase coprocessors are not supported.
- Filters use a different regular expression library from java.util.regex.Pattern. See Supported Regular Expressions in MapR Tables for a complete list of supported regular expressions.
Administering MapR Tables

The MapR Control System and the command-line interface provide a compact set of features for adding and managing tables. In a traditional HBase environment, cluster administrators are typically involved in provisioning tables and column families because of limitations on the number of tables and column families that Apache HBase can support. MapR supports a virtually unlimited number of tables, each with up to 64 column families, reducing administrative overhead.
HBase programmers can use the API function calls to create as many tables and column families as needed for the particular application. Programmers can also use tables to store intermediate data in a multi-stage MapReduce program, then delete the tables without assistance from an administrator. See Working With MapR Tables and Column Families for more information.
Related Topics

- Setting Up MapR-FS to Use Tables
- Working With MapR Tables and Column Families
- Mapping Table Namespace Between Apache HBase Tables and MapR Tables
- Protecting Table Data
- Migrating Between Apache HBase Tables and MapR Tables
Setting Up MapR-FS to Use Tables
This page describes how to begin using tables natively with MapR-FS. This page contains the following topics:
- Installation
- Enabling Access to MapR Tables via HBase APIs, hbase shell, and MapReduce Jobs
- MapR Tables and Apache HBase Tables on the Same Cluster
- Set Up User Directories for MapR Tables
- Configuring Maximum Row Sizes for MapR Tables
- Maximum Row Sizes for HBase APIs
- Troubleshooting RPC Errors Related to Row Size
- Related Topics
Installation

As of version 3.0 of the MapR distribution, MapR-FS provides storage for structured table data. No additional installation steps are required to install table capabilities. However, you must apply an appropriate license after you've completed the installation process to enable table features.
You can also set up a client-only node to connect to your MapR cluster and access tables.
Before using MapR tables, verify that the MapR File System has at least 4GB of memory assigned. Edit the value of the service.command.mfs.heapsize.min property in the warden.conf file to at least 4GB.
Enabling Access to MapR Tables via HBase APIs, hbase shell, and MapReduce Jobs
You can use the HBase API and the hbase shell command to access your MapR tables. MapR has extended the HBase component to handle access to both MapR tables and Apache HBase tables. MapR tables do not support low-level HBase API calls that are used to manipulate the state of an Apache HBase cluster. See the MapR Table Support for Apache HBase API page for a full list of supported HBase API and shell commands.
To enable HBase API and hbase shell access, install the mapr-hbase package on every node in the cluster. The HBase component of the MapR distribution for Hadoop is typically installed under /opt/mapr/hbase. To ensure that your existing HBase applications and workflows work properly, install the mapr-hbase package that provides the same version number of HBase as your existing Apache HBase.
See Installing MapR Software for information about MapR installation procedures, including setting up the proper repositories.
MapR Tables and Apache HBase Tables on the Same Cluster
Apache HBase can run on MapR's distribution of Hadoop, and users can store table data in both Apache HBase tables and MapR tables concurrently. Apache HBase and MapR store table data separately. However, the same mechanisms (HBase APIs and hbase shell) are used to access data in both systems. On clusters that run Apache HBase on top of MapR, you can set up a namespace mapping to specify whether a given table identifier maps to a MapR table or an Apache HBase table.
Set Up User Directories for MapR Tables

Because MapR tables, like files, are created by users, MapR tracks table activity in a user's home directory on the cluster. Create a home directory at /user/<username> on your cluster for each user that will access MapR tables. After mounting the cluster on NFS, create these directories with the standard Linux mkdir command in the cluster's directory structure.
When a user foo does not have a corresponding /user/foo directory on the cluster, querying MapR for a list of tables that belong to that user generates an error reporting the missing directory.
Configuring Maximum Row Sizes for MapR Tables

MapR tables support rows up to 2GB in size. Rows in excess of 100MB may show decreased performance. The default maximum row size at installation is 16MB. You can configure this maximum by changing the mfs.db.max.rowsize.kb value with the maprcli config save command, as in the following command:

maprcli config save -values {"mfs.db.max.rowsize.kb":<value in KB>}
To view the current setting, use the following command:
maprcli config load -json | grep mfs.db.max.rowsize.kb
Maximum Row Sizes for HBase APIs
The following table lists the maximum row sizes supported by specific HBase APIs:
API            Maximum Size   Comment
put()          2GB
get()          2GB
scan()         2GB
checkAndPut()  16MB           The maximum is due to a limitation in protobuf.
append()       16MB           The maximum is due to a limitation in protobuf.
increment()    16MB           The maximum is due to a limitation in protobuf.

Note: The version of HBase provided by MapR has been modified to work with MapR tables in addition to Apache HBase. Do not download and install stock Apache HBase on a MapR cluster that uses MapR tables.

Note: If you use fat JARs to deploy your application as a single JAR including all dependencies, be aware that the fat JAR may contain versions of HBase that override the installed MapR versions, leading to problems. Check your fat JARs for the presence of stock HBase to prevent this problem.
Troubleshooting RPC Errors Related to Row Size
- Sending data in excess of maxdbrowsize, which is specified by the value of the mfs.db.max.rowsize.kb parameter, results in an E2BIG error. MapR-FS logs an error message with the largerow keyword, the current row size, and the maximum supported row size.
- Requesting data in excess of maxdbrowsize, which is specified by the value of the mfs.db.max.rowsize.kb parameter, results in an EFBIG error. MapR-FS logs an error message with the largerow keyword, the current row size, and the maximum supported row size.
- Sending or requesting data in excess of the RPC payload size (2GB) or protobuf-limited size (16MB) while the value of maxdbrowsize is larger than 2GB results in an E2BIG error. The client returns information on the current row size and the maximum supported row size.
- Sending or requesting data in excess of maxdbrowsize when the value of maxdbrowsize is larger than 16MB logs an INFO message on the server with the largerow keyword.
- When the row size for a spill caused by multiple separate insert operations for the same key exceeds the value of maxdbrowsize, MapR-FS logs a non-fatal error. The insert operations proceed.
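The first two rules above can be summarized with a small Python sketch. The function and constant names are illustrative, not MapR-FS internals:

```python
# Default limit mirrors mfs.db.max.rowsize.kb at installation (16MB), in bytes.
MAX_DB_ROW_SIZE = 16 * 2**20

def row_size_error(row_bytes, sending, max_row_size=MAX_DB_ROW_SIZE):
    """Return the errno keyword a client would see for an oversized row."""
    if row_bytes <= max_row_size:
        return None
    # MapR-FS also logs a 'largerow' message with the current and maximum sizes.
    return "E2BIG" if sending else "EFBIG"

print(row_size_error(32 * 2**20, sending=True))   # E2BIG (oversized put)
print(row_size_error(32 * 2**20, sending=False))  # EFBIG (oversized get)
print(row_size_error(1 * 2**20, sending=True))    # None (within the limit)
```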
Related Topics

- Mapping Table Namespace Between Apache HBase Tables and MapR Tables
- Protecting Table Data
Mapping Table Namespace Between Apache HBase Tables and MapR Tables

MapR's implementation of the HBase API differentiates between Apache HBase tables and MapR tables based on the table name. In certain cases, such as migrating code from Apache HBase tables to MapR tables, users need to force the API to access a MapR table even though the table name could map to an Apache HBase table. The hbase.table.namespace.mappings property allows you to map Apache HBase table names to MapR tables. This property is typically set in the configuration file /opt/mapr/hadoop/hadoop-<version>/conf/core-site.xml.
In general, if a table name includes a slash (/), the name is assumed to be a path to a MapR table, because the slash is not a valid character for Apache HBase table names. In the case of "flat" table names without a slash, a namespace conflict is possible, and you might need to use table mappings.
Table Mapping Naming Conventions
A table mapping takes the form name:map, where name is the table name to redirect and map is the modification made to the name. The value in name can be a literal string or contain the * wildcard. When mapping a name with a wildcard, the mapping is treated as a directory. Requests to tables with names that match the wildcard are sent to the directory in the mapping.
When mapping a name that is a literal string, you can choose from two different behaviors:
- End the mapping with a slash to indicate that this mapping is to a directory. For example, the mapping mytable1:/user/aaa/ sends requests for table mytable1 to the full path /user/aaa/mytable1.
- End the mapping without a slash, which creates an alias and treats the mapping as a full path. For example, the mapping mytable1:/user/aaa sends requests for table mytable1 to the full path /user/aaa.
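The mapping rules above can be sketched as a Python function. This is an illustrative model of the documented behavior, not the actual MapR implementation:

```python
def resolve_table(name, mappings):
    """Resolve a table name against an ordered list of (pattern, target) mappings.

    Names containing a slash are already MapR table paths; flat names are
    checked against each mapping in order; unmatched flat names fall through
    to Apache HBase.
    """
    if "/" in name:
        return name                                  # path to a MapR table
    for pattern, target in mappings:
        if pattern == "*":
            return target.rstrip("/") + "/" + name   # wildcard: directory mapping
        if pattern == name:
            if target.endswith("/"):
                return target + name                 # directory mapping
            return target                            # alias: full path
    return name                                      # stock Apache HBase table

mappings = [("mytable1", "/user/aaa/"), ("mytable2", "/user/bbb/"), ("*", "/tables_dir")]
print(resolve_table("mytable1", mappings))  # /user/aaa/mytable1
print(resolve_table("foo", mappings))       # /tables_dir/foo
```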
Mappings and Table Listing Behaviors
When you use the list command without specifying a directory, the command's behavior depends on two factors:
- Whether a table mapping exists
- Whether Apache HBase is installed and running
Here are three different scenarios and the resulting list command behavior for each.
- There is a table mapping for *, as in *:/tables. In this case, the list command lists the tables in the mapped directory.
- There is no mapping for *, and Apache HBase is installed and running. In this case, the list command lists the HBase tables.
- There is no mapping for *, and Apache HBase is not installed or is not running. In this case, the shell tries to connect to an HBase cluster but cannot. After a few seconds, it gives up and falls back to listing the M7 tables in the user's home directory.
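These three scenarios amount to a simple decision procedure, sketched here in Python with illustrative names (this is a model of the documented behavior, not MapR code):

```python
def list_source(has_wildcard_mapping, hbase_running):
    """Decide where the hbase shell 'list' command finds tables."""
    if has_wildcard_mapping:
        return "mapped directory"          # e.g. a *:/tables mapping
    if hbase_running:
        return "Apache HBase tables"
    # The shell times out trying to reach an HBase cluster, then falls back.
    return "M7 tables in the user's home directory"

print(list_source(True, False))   # mapped directory
print(list_source(False, True))   # Apache HBase tables
print(list_source(False, False))  # M7 tables in the user's home directory
```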
Table Mapping Examples
Example 1: Map all HBase tables to MapR tables in a directory
In this example, any flat table name foo is treated as a MapR table in the directory /tables_dir/foo.
<property>
  <name>hbase.table.namespace.mappings</name>
  <value>*:/tables_dir</value>
</property>
Example 2: Map specific Apache HBase tables to specific MapR tables
In this example, the Apache HBase table name mytable1 is treated as a MapR table at /user/aaa/mytable1. The Apache HBase table name mytable2 is treated as a MapR table at /user/bbb/mytable2. All other Apache HBase table names are treated as stock Apache HBase tables.
<property>
  <name>hbase.table.namespace.mappings</name>
  <value>mytable1:/user/aaa/,mytable2:/user/bbb/</value>
</property>
Example 3: Combination of specific table names and wildcards
Mappings are evaluated in order. In this example, the flat table name mytable1 is treated as a MapR table at /user/aaa/mytable1. The flat table name mytable2 is treated as a MapR table at /user/bbb/mytable2. Any other flat table name foo is treated as a MapR table at /tables_dir/foo.
<property>
  <name>hbase.table.namespace.mappings</name>
  <value>mytable1:/user/aaa/,mytable2:/user/bbb/,*:/tables_dir</value>
</property>
Working With MapR Tables and Column Families

This page contains the following topics:

- About MapR Tables
- Filesystem Operations
  - Setting Permissions
  - Read and Write
  - Move
  - Remove
  - Copy and Recursive/Directory Copy
- Developing For MapR Tables
About MapR Tables
The MapR Data Platform stores tables in the same namespace as files. You can move, delete, and set attributes for a table similarly to a file. All filesystem operations remain accessible with the hadoop fs command.
You can create MapR tables using the MapR Control System (MCS) and the maprcli interface, in addition to the normal HBase shell or HBase API methods.
When creating a MapR table, specify a location in the MDP directory structure in addition to the name of the table. A user can create a MapR table anywhere on the cluster where the user has write access.
Volume properties, such as replication factor or rack topology, that apply to the specified location also apply to tables stored at that location. You can move a table with the Linux mv command or the hadoop fs -mv command.
Administrators may choose to pre-create tables for a project in order to enforce a designated naming convention, or to store tables in a desiredlocation in the cluster.
Because all data stored in a column family is compressed together, encapsulating similar kinds of data within a column family can improve compression.
The number of tables that can be stored on a MapR cluster is constrained only by the number of open file handles and storage space availability.Each table can have up to 64 column families.
You can add, edit, and delete column families in a MapR table with the MapR Control System (MCS) and the maprcli interface. You can also add column families to MapR tables with the HBase shell or API.
When you use Direct Access NFS or the hadoop fs -ls command to access a MapR cluster, tables and files are listed together. Because the client's Linux commands are not table-aware, other Linux file manipulation commands, notably file read and write commands, are not available for MapR tables.
Some Apache HBase table operations are not applicable or required for MapR tables, notably manual compactions, table enables, and tabledisables. HBase API calls that perform such operations on a MapR table result in the modification being silently ignored. When appropriate, themodification request is cached in the client and returned by API calls to enable legacy HBase applications to run successfully.
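One way such calls can be absorbed is with a client-side shim that caches the requested state and echoes it back, so legacy code that disables and re-enables a table still sees the responses it expects. The class below is a hypothetical sketch of that idea, not MapR client code:

```python
class TableStateShim:
    """Caches enable/disable requests client-side; nothing changes server-side."""

    def __init__(self):
        self._enabled = {}  # table name -> cached enabled state

    def disable_table(self, name):
        self._enabled[name] = False   # the request itself is silently ignored

    def enable_table(self, name):
        self._enabled[name] = True

    def is_table_enabled(self, name):
        # Legacy HBase code often checks this before altering a table; returning
        # the cached state lets that workflow complete without a real disable.
        return self._enabled.get(name, True)

shim = TableStateShim()
shim.disable_table("/user/foo/development")
print(shim.is_table_enabled("/user/foo/development"))  # False (cached, not real)
shim.enable_table("/user/foo/development")
print(shim.is_table_enabled("/user/foo/development"))  # True
```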
In addition, the maprcli table listrecent command displays recently accessed MapR tables, rather than listing tables across the entire file system.
See MapR Table Support for Apache HBase API for a complete list of supported operations.
Filesystem Operations
This section describes the operations that you can perform on MapR tables through a Linux command line when you access the cluster through NFS or with the hadoop fs commands.
Setting Permissions
MapR tables do not support setting user permissions through the UNIX chmod command or the hadoop fs -chmod analogue. Instead, starting in version 3.1 of the MapR distribution for Hadoop, MapR table access is controlled with Access Control Expressions (ACEs).
Read and Write
You cannot perform read or write operations on a MapR table from a Linux filesystem context. Among other things, you cannot use the cat command to insert text into a table or search through a table with the grep command. The MapR software returns an error when an application attempts to read or write to a MapR table.
Move
You can move a MapR table within a volume with the mv command over NFS or with the hadoop fs -mv command. These moves are subject to the standard permissions restrictions. Moves across volumes are not currently supported.
Remove
You can remove a table with the rm command over NFS or with the hadoop fs -rm command. These commands remove the table from the namespace and asynchronously reclaim the disk space. You can remove a directory that includes both files and tables with the rm -r or hadoop fs -rmr commands.
Copy and Recursive/Directory Copy
Table copying at the filesystem level is not supported in this release. See Migrating Between Apache HBase Tables and MapR Tables for information on copying tables using the HBase shell.
Example: Creating a MapR Table
With the HBase shell
This example creates a table called development in the directory /user/foo with a column family called stage, using system defaults. In this example, we first start the HBase shell from the command line with hbase shell, and then use the create command to create the table.
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.1-SNAPSHOT, rUnknown, Mon Dec 17 09:23:31 PST 2012

hbase(main):001:0> create '/user/foo/development', 'stage'
With the MapR Control System
1. In the MCS Navigation pane under the MapR Data Platform group, click Tables. The Tables tab appears in the main window.
2. Click the New Table button.
3. Type a complete path for the new table.
4. Click OK. The MCS displays a tab for the new table.
The screen capture below demonstrates the creation of a table table01 in the location /user/analysis/tables/.
With the MapR CLI
Use the maprcli table create command at a command line. For details, type maprcli table create -help at a command line. The following example demonstrates creation of a table table02 in the cluster location /user/analysis/tables/. The cluster my.cluster.com is mounted at /mnt/mapr/.
$ maprcli table create -path /user/analysis/tables/table02
$ ls -l /mnt/mapr/my.cluster.com/user/analysis/tables
lrwxr-xr-x 1 mapr mapr 2 Oct 24 16:14 table01 -> mapr::table::2056.62.17034
lrwxr-xr-x 1 mapr mapr 2 Oct 24 16:13 table02 -> mapr::table::2056.56.17022
$ maprcli table listrecent
path
/user/analytics/tables/table01
/user/analytics/tables/table02
Example: Adding a column family
With the HBase shell
This example adds a column family called status to the table development, using system defaults. In this example, we first start the HBase shell from the command line with hbase shell, and then use the alter command to add the column family.
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.1-SNAPSHOT, rUnknown, Mon Dec 17 09:23:31 PST 2012

hbase(main):001:0> alter '/user/foo/development', {NAME => 'status'}
With the MapR Control System
1. In the MCS Navigation pane under the MapR Data Platform group, click Tables. The Tables tab appears in the main window.
2. Find the table you want to work with, using one of the following methods:
   - Scan for the table under Recently Opened Tables on the Tables tab.
   - Enter a regular expression for part of the table pathname in the Go to table field and click Go.
3. Click the desired table name. A Table tab appears in the main MCS pane, displaying information for the specific table.
4. Click the Column Families tab.
5. Click New Column Family. The Create Column Family dialog appears.
6. Enter values for the following fields:
   - Column Family Name - Required.
   - Max Versions - The maximum number of versions of a cell to keep in the table.
   - Min Versions - The minimum number of versions of a cell to keep in the table.
   - Compression - The compression algorithm used on the column family's data. Select a value from the drop-down. The default value is Inherited, which uses the same compression type as the table. Available compression methods are LZF, LZ4, and ZLib. Select OFF to disable compression.
   - Time-To-Live - The minimum time-to-live for cells in this column family. Cells older than their time-to-live stamp are purged periodically.
   - In memory - Preference for a column family to reside in memory for fast lookup.
You can change any column family properties at a later time using the MCS or the maprcli table cf edit command from the command line.
The screen capture below demonstrates adding a column family userinfo to the table at location /user/analysis/tables/table01.
With the MapR CLI
Use the maprcli table cf create command at a command line. For details, see table cf create or type maprcli table cf create -help at a command line. The following example demonstrates addition of a column family named casedata to the table /user/analysis/tables/table01, using lzf compression, and keeping a maximum of 5 versions of cells in the column family.
$ maprcli table cf create -path /user/analysis/tables/table01 \
    -cfname casedata -compression lzf -maxversions 5
$ maprcli table cf list -path /user/analysis/tables/table01
inmemory  cfname    compression  ttl  maxversions  minversions
true      userinfo  lz4          0    3            0
false     casedata  lzf          0    5            0
$
You can change any column family properties at a later time using the maprcli table cf edit command.
Developing For MapR Tables
When you use Maven to manage your application development, look for the following section in your POM file:
<dependency>
  <groupId>com.mapr.fs</groupId>
  <artifactId>mapr-hbase</artifactId>
  <version>1.0.3-mapr-3.0.2</version>
  <scope>provided</scope>
</dependency>
Add the following section to your POM file immediately following the above section:
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase</artifactId>
<version>0.94.12-mapr-1310-m7-3.0.2</version>
<scope>provided</scope>
</dependency>
This section makes the ClassLoader calls use the correct version of the implementation.
Bulk Loading and MapR-DB Tables
The most common way of loading data to a MapR table is with a put operation. At large scales, bulk loads offer a performance advantage over put operations.
Bulk loading can be performed as a full bulk load or as an incremental bulk load. Full bulk loads offer the best performance advantage for empty tables. Incremental bulk loads can add data to existing tables concurrently with other table operations, with better performance than put operations.
Bulk Load Process Flow
Once your source data is in the MapR-FS layer, bulk loading uses a MapReduce job to perform the following steps:
1. Transform the source data into the native file format used by MapR tables.
2. Notify the database of the location of the resulting files.
A full bulk load operation can only be performed to an empty table and skips the write-ahead log (WAL) typical of Apache HBase and MapR table operations, resulting in increased performance. Incremental bulk load operations do use the WAL.
Creating a MapR Table with Full Bulk Load Support
When you create a new MapR table with the maprcli table create command, specify the value of the -bulkload parameter as true.
When you create a new MapR table from the hbase shell, specify BULKLOAD as true, as in the following example:
create '/a0','f1', BULKLOAD => 'true'
When you create a new MapR table from the MapR Control System (MCS), check the Bulk Load box under Table Properties.
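For example, the maprcli form of the first method might look like the following. The table path is illustrative, and the command must be run on a node in the MapR cluster:

```shell
# Illustrative path; create a table with the bulk load attribute
# set at creation time (it cannot be set later with alter):
maprcli table create -path /user/analysis/tables/bulktable -bulkload true
```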
Performing Bulk Load Operations
Note: You can only perform a full bulk load to empty tables that have the bulk load attribute set. You can only set this attribute during table creation. The alter operation will not set this attribute to true on an existing table.

Warning: Your table is unavailable for normal client operations, including put, get, and scan operations, while a full bulk load operation is in progress. To keep your table available for client operations, use an incremental bulk load.
Note: Attempting a full bulk load to a table that does not have the bulk load attribute set will result in an incremental bulk load being performed instead.
You can use incremental bulk loads to ingest large amounts of data to an existing table. Tables remain available for standard client operations such as put, get, and scan while the bulk load is in process. A table can perform multiple incremental bulk load operations simultaneously.
Bulk loading is supported for the following tools, which can be used for both full and incremental bulk load operations:
The CopyTable tool uses a MapReduce job to copy a MapR table.
  hbase com.mapr.fs.hbase.mapreduce.CopyTable -src /table1 -dst /table2
The CopyTableTest tool copies a MapR table without using MapReduce.
  hbase com.mapr.fs.CopyTableTest -src /table1 -dst /table2
The ImportTsv tool imports a tab-separated values file into a MapR table.
  importtsv -Dimporttsv.columns=HBASE_ROW_KEY,CF-1:custkey,CF-1:orderstatus,CF-1:totalprice,CF-1:orderdate,CF-1:orderpriority -Dimporttsv.separator='|' -Dimporttsv.bulk.output=/dummy /table1 /orders
The ImportFiles tool imports HFile or Result files into a MapR table.
  hbase com.mapr.fs.hbase.mapreduce.ImportFiles -Dmapred.reduce.tasks=2 -inputDir /test/tabler.kv -table /table2 -format Result
Custom MapReduce jobs can use bulk loads with the configureIncrementalLoad() method from the HFileOutputFormat class.
  HTable table = new HTable(jobConf, tableName);
  HFileOutputFormat.configureIncrementalLoad(mrJob, table);
After completing a full bulk load operation, take the table out of bulk load mode to restore normal client operations. You can do this from the command line or the HBase shell with the following commands:

# maprcli table edit -path /user/juser/mytable -bulkload false    (command line)
hbase shell> alter '/user/juser/mytable', 'f2', BULKLOAD => 'false'    (hbase shell)
Writing Custom MapReduce Jobs Using Bulk Loads
The HFileOutputFormat class on MapR clusters distinguishes between Apache HBase tables and MapR tables, behaving appropriately for each type. Existing workflows that rely on the HFileOutputFormat class, such as the importtsv and copytable tools, support both types of tables without further configuration.
Writing a Bulk Load MapReduce Job for a Pre-split Table
The MapReduce job for bulk loading to a pre-split table has a number of reducers that is determined by the number of splits in the table. You can set up the partitions for the reducers to match the table regions with the HFileOutputFormat.configureIncrementalLoad() API, as with Apache HBase. The following sample code shows how to set up these partitions:
HTable table = new HTable(conf, tableName);
job.setReducerClass(KeyValueSortReducer.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(KeyValue.class);
HFileOutputFormat.configureIncrementalLoad(job, table);
Writing a Bulk Load MapReduce Job for an Auto-split Table
To set up reducer partitions for an automatically split table, where the number of regions is unknown, there are two approaches:
Create a partition file and define a starting key for each reducer. With this approach, an even distribution of load on the reducers depends on the qualities of the input data.
Use an input sampler to create the partition file. The input sampler randomly walks through the input data and generates an optimal split based on the samples.
Example 1: Using a Partition File
job.setPartitionerClass(TotalOrderPartitioner.class);
Path partitionFile = new Path(job.getWorkingDirectory(), "partitions");
TotalOrderPartitioner.setPartitionFile(conf, partitionFile);
// User method to write the split keys to the partition file
writePartitions(conf, partitionFile, startKeys);
job.setOutputFormatClass(HFileOutputFormat.class);
HFileOutputFormat.configureMapRTablePath(job, dstTableName);
Example 2: Using an Input Sampler to Create the Partition File
TotalOrderPartitioner.setPartitionFile(conf, partitionFile);
// Set up a partitioner with sampling.
InputSampler.Sampler<ImmutableBytesWritable, KeyValue> sampler =
    new InputSampler.SplitSampler<ImmutableBytesWritable, KeyValue>(1000);
// Let the input sampler write the data to the partition file
InputSampler.writePartitionFile(job, sampler);
job.setOutputFormatClass(HFileOutputFormat.class);
HFileOutputFormat.configureMapRTablePath(job, tableName);
Schema Design for MapR Tables
Your database schema defines the data in your tables and how they are related. The choices you make when specifying how to arrange your data in keys and columns, and how those columns are grouped in families, can have a significant effect on query performance.
Row key design
Composite keys
Column family design
Column design
Schemas for MapR tables follow the same general principles as schemas for standard Apache HBase tables, with one important difference.Because MapR tables can use up to 64 column families, you can make more extensive use of the advantages of column families:
Segregate related data into column families for more efficient queries
Optimize column-family specific parameters to tune for performance
Group related data for more efficient compression
Naming your identifiers: Because the names for the row key, column family, and column identifiers are associated with every value in a table, these identifiers are replicated potentially billions of times across your tables. Keeping these names short can have a significant effect on your tables' storage demands.
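As a back-of-the-envelope sketch of why identifier length matters, compare the per-cell cost of a long column name against a short one. The names and cell count below are hypothetical:

```python
# Rough illustration: identifier names are stored alongside every cell,
# so shortening a column name pays off at scale. Figures are made up.
def naming_overhead_bytes(name: str, cells: int) -> int:
    """Bytes consumed by one column identifier repeated across all cells."""
    return len(name.encode("utf-8")) * cells

cells = 1_000_000_000  # one billion cells in the column
long_cost = naming_overhead_bytes("customer_information", cells)  # 20 bytes/cell
short_cost = naming_overhead_bytes("ci", cells)                   # 2 bytes/cell
print((long_cost - short_cost) / 1e9, "GB saved")  # 18.0 GB saved
```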
Access times for MapR tables are fastest when a single record is looked up based on the full row key. Partial scans within a column family are more demanding on the cluster's resources. A full-table scan is the least efficient way to retrieve data from your table.
Row key design
Because records in Apache HBase tables are stored in lexicographical order, using a sequential generation method for row keys can lead to a hot spot problem. As new rows are created, the table splits. Since the new records are still being created sequentially, all the new entries are still directed to a single node until the next split, and so on. In addition to concentrating activity on a single region, all the other splits remain at half their maximum size.
With MapR tables, the cluster handles sequential keys and table splits to keep potential hotspots moving across nodes, decreasing the intensity and performance impact of the hotspot.
To spread write and insert activity across the cluster, you can randomize sequentially generated keys by hashing the keys or by inverting the byte order. Note that these strategies come with trade-offs. Hashing keys, for example, makes table scans for key subranges inefficient, since the subrange is spread across the cluster.
Instead of hashing the entire key, you can salt the key by prepending a few bytes of the hash to the actual key. For a key based on a timestamp, for instance, a timestamp value of 1364248490 has an MD5 hash that ends with ffe5. By making the key for that row ffe51364248490, you avoid hotspotting. Since the first four digits are known to be the hash salt, you can derive the original timestamp by dropping those digits.
Be aware that a row key is immutable once created, and cannot be renamed. To change a row key's name, the original row must be deleted and then re-created with the new name.
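The salting scheme above can be sketched as follows. The helper names salt_key and unsalt_key are illustrative, and the four-character salt length mirrors the timestamp example in the text:

```python
import hashlib

# Sketch of key salting: prepend the last four hex digits of the key's
# MD5 hash, and drop them again to recover the original key.
SALT_LEN = 4

def salt_key(raw_key: str) -> str:
    salt = hashlib.md5(raw_key.encode("utf-8")).hexdigest()[-SALT_LEN:]
    return salt + raw_key

def unsalt_key(salted: str) -> str:
    # The first SALT_LEN characters are known to be the salt.
    return salted[SALT_LEN:]

ts = "1364248490"
salted = salt_key(ts)
assert unsalt_key(salted) == ts  # the original key is recoverable
```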
Composite keys
Rows in a MapR table can only have a single row key. You can create composite keys to approximate multiple keys in a table. A composite key contains several individual fields joined together, for example userID and applicationID. You can then scan for the specific segments of the composite row key that represent the original, individual field.
Because rows are stored in sorted order, you can affect the results of the sort by changing the ordering of the fields that make up the composite row key. For example, if your application IDs are generated sequentially but your user IDs are not, using a composite key of userID+applicationID will store all rows with the same user ID closely together. If you know the userID for which you want to retrieve rows, you can specify the first userID row and the first userID+1 row as the start and stop rows for your scan, then retrieve the rows you're interested in without scanning the entire table.
When designing a composite key, consider how the data will be queried during production use. Place the fields that will be queried the most often towards the front of the composite key, bearing in mind that sequential keys will generate hot spotting.
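A minimal sketch of the userID+applicationID pattern and the start/stop scan range it enables. The fixed-width big-endian encoding is an illustrative assumption (it keeps lexicographic byte order aligned with numeric order), not anything prescribed by MapR:

```python
# Composite key sketch: userID followed by applicationID, each encoded
# fixed-width big-endian so byte order matches numeric order.
def composite_key(user_id: int, application_id: int) -> bytes:
    return user_id.to_bytes(8, "big") + application_id.to_bytes(8, "big")

def user_scan_range(user_id: int):
    """[start, stop) range covering every row for one userID."""
    start = user_id.to_bytes(8, "big")        # first possible row for userID
    stop = (user_id + 1).to_bytes(8, "big")   # first row of userID+1
    return start, stop

start, stop = user_scan_range(42)
assert start <= composite_key(42, 7) < stop   # row falls inside the range
```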
Column family design
Scanning an entire table for matches can be very performance-intensive. Column families enable you to group related sets of data and restrict queries to a defined subset, leading to better performance. You can also make a specified column family remain in memory to further increase the speed at which the system accesses that data. When you design a column family, think about what kinds of queries are going to be used the most often, and group your columns accordingly.
You can specify compression settings for individual column families, which lets you choose the settings that prioritize speed of access or efficient use of disk space, according to your needs.
Be aware of the approximate number of rows in your column families. This property is called the column family's cardinality. When column families in the same table have very disparate cardinalities, the sparser table's data can be spread out across multiple nodes, due to the denser table requiring more splits. Scans on the sparser column family can take longer due to this effect. For example, consider a table that lists products across a small range of model numbers, but with a row for the unique serial numbers for each individual product manufactured within a given model. Such a table will have a very large difference in cardinality between a column family that relates to the model number compared to a column family that relates to the serial number. Scans on the model-number column family will have to range across the cluster, since the frequent splits required by the comparatively large numbers of serial-number rows will spread the model-number rows out across many regions on many nodes.
Column design
MapR tables split at the row level, not the column level. For this reason, extremely wide tables with very large numbers of columns can sometimes reach the recommended size for a table split at a comparatively small number of rows. In general, design your schema to prioritize more rows and fewer columns.

Because MapR tables are sparse, you can add columns to a table at any time. Null columns for a given row don't take up any storage space.
Supported Regular Expressions in MapR Tables
MapR tables support the regular expressions provided by the Perl-Compatible Regular Expressions library, as well as a subset of the complete set of regular expressions supported in java.util.regex.Pattern. For more information on Perl-compatible regular expressions, issue the man pcrepattern command from a terminal prompt.
Applications for Apache HBase that use regular expressions not supported in MapR tables will need to be rewritten to use supported regular expressions.
The tables in the following sections define the subset of Java regular expressions supported in MapR tables.
Characters
Pattern Description
x The character x
\\ The backslash character
\0n The character with octal value 0n (0 <= n <= 7)
\0nn The character with octal value 0nn (0 <= n <= 7)
\xhh The character with hexadecimal value 0xhh
\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
\f The form-feed character ('\u000C')
\a The alert (bell) character ('\u0007')
\e The escape character ('\u001B')
\cx The control character corresponding to x
Character Classes
Pattern Description
[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
Predefined Character Classes
Pattern Description
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
Classes for Unicode Blocks and Categories
Pattern Description
\p{Lu} An uppercase letter (simple category)
\p{Sc} A currency symbol
Boundaries
Pattern Description
^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
Greedy Quantifiers
Pattern Description
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times
Reluctant Quantifiers
Pattern Description
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n but not more than m times
Possessive Quantifiers
Pattern Description
X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n but not more than m times
Logical Operators
Pattern Description
XY X followed by Y
X|Y Either X or Y
(X) X, as a capturing group
Back References
Pattern Description
\n Whatever the nth capturing group matches
Quotation
Pattern Description
\ Nothing, but quotes the following character
\Q Nothing, but quotes all characters until \E
\E Nothing, but ends quoting started by \Q
Special Constructs
Pattern Description
(?:X) X, as a non-capturing group
(?=X) X, via zero-width positive lookahead
(?!X) X, via zero-width negative lookahead
(?<=X) X, via zero-width positive lookbehind
(?<!X) X, via zero-width negative lookbehind
(?>X) X, as an independent, non-capturing group
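A few of the constructs above, exercised for illustration in Python's re module, which accepts the same syntax for this subset and so stands in for java.util.regex here:

```python
import re

# Character classes, quantifiers, boundaries, groups, and lookbehind,
# matching the rows in the tables above.
assert re.fullmatch(r"[a-zA-Z]+", "MapR")                      # range class
assert re.fullmatch(r"\d{3,5}", "1234")                        # bounded quantifier
assert re.search(r"^table", "table01")                         # line-start boundary
assert re.fullmatch(r"(?:foo|bar)+", "foobar")                 # non-capturing group
assert re.search(r"(?<=row_)\d+", "row_42").group() == "42"    # positive lookbehind
```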
MapR Table Support for Apache HBase API

This page lists the supported interfaces for accessing MapR tables. This page contains the following topics:
Compatibility with the Apache HBase API
MapR Tables and Filters
HBase Shell Commands
Compatibility with the Apache HBase API
The API for accessing MapR tables works the same way as the Apache HBase API. Code written for Apache HBase can be easily ported to use MapR tables.
MapR tables do not support low-level HBase API calls that are used to manipulate the state of an Apache HBase cluster. HBase API calls that are not supported by MapR tables report successful completion to allow legacy code written for Apache HBase to continue executing, but do not perform any actual operations.
For details on the behavior of each function, refer to the Apache HBase API documentation.
HBaseAdmin API    Available for MapR Tables?    Comments
void addColumn(String tableName,HColumnDescriptor column)
Yes
void close() Yes
void createTable(HTableDescriptor desc, byte[][] splitKeys)
Yes This call is synchronous.
void createTableAsync() (HTableDescriptordesc, byte[][] splitKeys)
Yes For MapR tables, this call is identical to createTable.
void deleteColumn (byte[] family, byte[]qualifier, long timestamp)
Yes
void deleteTable(String tableName) Yes
HTableDescriptor[] deleteTables(Patternpattern)
Yes
Configuration getConfiguration() Yes
HTableDescriptor getTableDescriptor (byte[]tableName)
Yes
HTableDescriptor[] getTableDescriptors(List<String> tableNames)
Yes
boolean isTableAvailable(String tableName) Yes
boolean isTableDisabled(String tableName) Yes
boolean isTableEnabled(String tableName) Yes
HTableDescriptor[] listTables() Yes
void modifyColumn(String tableName,HColumnDescriptor descriptor)
Yes
void modifyTable (byte[] tableName,HTableDescriptor htd)
No
boolean tableExists(String tableName) Yes
Pair<Integer, Integer> getAlterStatus (byte[] tableName)
Yes
CompactionState getCompactionState(StringtableNameOrRegionName)
Yes Returns CompactionState.NONE.
void split(byte[] tableNameOrRegionName) Yes The tableNameOrRegionName parameter has a different format when used with MapR tables than with Apache HBase tables. With MapR tables, specify both the table path and the FID as a comma-separated list.
void abort(String why, Throwable e) No
void assign (byte[] regionName) No
boolean balancer() No
boolean balanceSwitch(boolean b) No
void closeRegion(ServerName sn, HRegionInfohri)
No
void closeRegion(String regionname, StringserverName)
No
boolean closeRegionWithEncodedRegionName(String encodedRegionName, String serverName)
No
void flush(String tableNameOrRegionName) No
ClusterStatus getClusterStatus() No
HConnection getConnection() No
HMasterInterface getMaster() No
String[] getMasterCoprocessors() No
boolean isAborted() No
boolean isMasterRunning() No
void majorCompact(StringtableNameOrRegionName)
No
void move(byte[] encodedRegionName, byte[] destServerName)
No
byte[][] rollHLogWriter(String serverName) No
boolean setBalancerRunning(boolean on,boolean synchronous)
No
void shutdown() No
void stopMaster() No
void stopRegionServer(String hostnamePort) No
void unassign (byte[] regionName, booleanforce)
No
HTable API    Available for MapR Tables?    Comments
Configuration and State Management
void clearRegionCache() No Operation is silently ignored.
void close() Yes
<T extends CoprocessorProtocol, R>Map<byte[], R> coprocessorExec(Class<T>protocol, byte[] startKey, byte[] endKey,Call<T, R> callable)
No Returns null.
<T extends CoprocessorProtocol> TcoprocessorProxy(Class<T> protocol, byte[]row)
No Returns null.
Map<HRegionInfo, HServerAddress>deserializeRegionInfo(DataInput in)
Yes
void flushCommits() Yes
Configuration getConfiguration() Yes
HConnection getConnection() No Returns null
int getOperationTimeout() No Returns null
ExecutorService getPool() No Returns null
int getScannerCaching() No Returns 0
ArrayList<Put> getWriteBuffer() No Returns null
long getWriteBufferSize() No Returns 0
boolean isAutoFlush() Yes
void prewarmRegionCache(Map<HRegionInfo,HServerAddress> regionMap)
No Operation is silently ignored.
void serializeRegionInfo(DataOutput out) Yes
void setAutoFlush(boolean autoFlush,boolean clearBufferOnFail)
Same as setAutoFlush(boolean autoFlush)
void setAutoFlush(boolean autoFlush) Yes
void setFlushOnRead(boolean val) Yes
boolean shouldFlushOnRead() Yes
void setOperationTimeout(intoperationTimeout)
No Operation is silently ignored.
void setScannerCaching(int scannerCaching) No Operation is silently ignored.
void setWriteBufferSize(longwriteBufferSize)
No Operation is silently ignored.
Atomic operations
Result append(Append append) Yes
boolean checkAndDelete(byte[] row, byte[] family, byte[] qualifier, byte[] value, Delete delete)
Yes
boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier, byte[] value, Put put)
Yes
Result increment(Increment increment) Yes
long incrementColumnValue(byte[] row, byte[] family, byte[] qualifier, long amount, boolean writeToWAL)
Yes
long incrementColumnValue(byte[] row, byte[] family, byte[] qualifier, long amount)
Yes
void mutateRow(RowMutations rm) Yes
DML operations
void batch(List actions, Object[] results) Yes
Object[] batch(List<? extends Row> actions) Yes
void delete(Delete delete) Yes
void delete(List<Delete> deletes) Yes
boolean exists(Get get) Yes
Result get(Get get) Yes
Result[] get(List<Get> gets) Yes
Result getRowOrBefore(byte[] row, byte[] family)
No
ResultScanner getScanner(...) Yes
void put(Put put) Yes
void put(List<Put> puts) Yes
Table Schema Information
HRegionLocation getRegionLocation(byte[] row, boolean reload)
Yes
Map<HRegionInfo, HServerAddress> getRegionsInfo()
Yes
List<HRegionLocation> getRegionsInRange(byte[] startKey, byte[] endKey)
Yes
byte[][] getEndKeys() Yes
byte[][] getStartKeys() Yes
Pair<byte[][], byte[][]> getStartEndKeys() Yes
HTableDescriptor getTableDescriptor() Yes
byte[] getTableName() Yes Returns table path
Row Locks
RowLock lockRow(byte[] row) No
void unlockRow(RowLock rl) No
HTablePool API    Available for MapR Tables?    Comments
close() Yes
closeTablePool(byte[] tableName) Yes
closeTablePool(String tableName) Yes
protected HTableInterface createHTable(String tableName)
Yes
int getCurrentPoolSize(String tableName) Yes
HTableInterface getTable(byte[] tableName) Yes
HTableInterface getTable(String tableName) Yes
void putTable(HTableInterface table) Yes
MapR Tables and Filters

MapR tables support the following built-in filters. These filters work identically to their Apache HBase versions.
Filter Description
ColumnCountGetFilter Returns the first N columns of a row.
ColumnPaginationFilter
ColumnPrefixFilter
ColumnRangeFilter
CompareFilter
FirstKeyOnlyFilter
FuzzyRowFilter
InclusiveStopFilter
KeyOnlyFilter
MultipleColumnPrefixFilter
PageFilter
PrefixFilter
RandomRowFilter
SingleColumnValueFilter
SkipFilter
TimestampsFilter
WhileMatchFilter
FilterList
RegexStringComparator
HBase Shell Commands
The following table lists support information for HBase shell commands for managing MapR tables.
Command    Available for MapR Tables?    Comments
alter Yes
alter_async Yes
create Yes
describe Yes
disable Yes
drop Yes
enable Yes
exists Yes
is_disabled Yes
is_enabled Yes
list Yes
disable_all Yes
drop_all No Obsolete. Use the rm <table names> command from the MapR file system, or hadoop fs -rm <table names>, instead.
enable_all Yes
show_filters Yes
count Yes
get Yes
put Yes
scan Yes
delete Yes
deleteall Yes
incr Yes
truncate Yes
get_counter Yes
assign No
balance_switch No
balancer No
close_region No
major_compact No
move No
unassign No
zk_dump No
status No
version Yes
whoami Yes
Using AsyncHBase with MapR Tables

You can use the AsyncHBase libraries to provide asynchronous access to MapR tables. MapR provides a version of AsyncHBase modified to work with MapR tables. Once your cluster is ready to use MapR tables, it is also ready to use AsyncHBase with MapR tables.
After installing the mapr-asynchbase package, the AsyncHBase JAR file asynchbase-1.4.1-mapr.jar is in the directory /opt/mapr/hadoop/hadoop-0.20.2/lib. Add that directory to your Java CLASSPATH.
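For example, the classpath addition might look like the following in a shell profile. The JAR path is the default install location named above:

```shell
# Illustrative: put the MapR AsyncHBase JAR on the Java classpath.
ASYNCHBASE_JAR=/opt/mapr/hadoop/hadoop-0.20.2/lib/asynchbase-1.4.1-mapr.jar
export CLASSPATH="$CLASSPATH:$ASYNCHBASE_JAR"
```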
See also
Documentation for AsyncHBase client
Using OpenTSDB with AsyncHBase and MapR Tables

The OpenTSDB software package provides a time series database that collects user-specified data. Because OpenTSDB depends on AsyncHBase, MapR provides a customized version of OpenTSDB that works with AsyncHBase for MapR tables. Download the OpenTSDB source from the github.com/mapr/opentsdb.git repository instead of the standard github.com/OpenTSDB/opentsdb.git repository.
Nodes using OpenTSDB must have the following packages installed:
One of mapr-core or mapr-client
mapr-hbase-<version>. To ensure that your existing HBase applications and workflow work properly, install the mapr-hbase package that provides the same version number of HBase as your existing Apache HBase.
You can follow the directions at OpenTSDB's Getting Started page, changing git clone git://github.com/OpenTSDB/opentsdb.git to git clone git://github.com/mapr/opentsdb.git.
After running the build.sh script, replace the contents of the /opt/mapr/hadoop/hadoop-0.20.2/conf/core-site.xml file with the contents of the opentsdb core-site.xml.template on all nodes. Set up a mapping table to specify the full path to the tsdb and tsdb-uid tables. Create the directories in that path before running the create_table.sh script.
See also
Documentation for OpenTSDB
Protecting Table Data

This page discusses how to organize tables and files on a MapR cluster by making effective use of directories and volumes. To learn about access control and authorization for MapR tables on version 3.1 and later of the MapR distribution for Hadoop, see Enabling Table Authorization with Access Control Expressions. This page contains the following topics:

Organizing Tables and Files in Directories
Controlling Table Storage Policy with Volumes
Mirrors and Snapshots for MapR Tables
Comparison to Apache HBase Running on a MapR Cluster
Related Topics
Organizing Tables and Files in Directories

Because the 3.0 release of the MapR distribution for Hadoop mingles unstructured files with structured tables in a directory structure, you can group logically-related files and tables together. For example, tables related to a project housed in directory /user/foo can be saved in a subdirectory, such as /user/foo/tables.
Listing the contents of a directory with ls lists both tables and files stored at that path. Because table data is not structured as a simple character stream, you cannot operate on table data with common Linux commands such as cat, more, and >. See Filesystem Operations for more information on Linux file system operations with MapR tables.
Example: Creating a MapR table in a directory using the HBase shell
In this example, we create a new table table3 in directory /user/dave on a MapR cluster that already contains a mix of files and tables. In this example, the MapR cluster is mounted at /maprcluster/.
$ pwd
/maprcluster/user/dave
$ ls
file1 file2 table1 table2
$ hbase shell
hbase(main):003:0> create '/user/dave/table3', 'cf1', 'cf2', 'cf3'
0 row(s) in 0.1570 seconds
$ ls
file1 file2 table1 table2 table3
$ hadoop fs -ls /user/dave
Found 5 items
-rw-r--r--   3 mapr mapr 16 2012-09-28 08:34 /user/dave/file1
-rw-r--r--   3 mapr mapr 22 2012-09-28 08:34 /user/dave/file2
trwxr-xr-x   3 mapr mapr  2 2012-09-28 08:32 /user/dave/table1
trwxr-xr-x   3 mapr mapr  2 2012-09-28 08:33 /user/dave/table2
trwxr-xr-x   3 mapr mapr  2 2012-09-28 08:38 /user/dave/table3
Note that in the hadoop fs -ls listing, table items are denoted by a t bit.
Controlling Table Storage Policy with Volumes

MapR provides volumes as a way to organize data and manage cluster performance. A volume is a logical unit that allows you to apply policies to a set of files, directories, and tables. Volumes are used to enforce disk usage limits, set replication levels, define snapshots and mirrors, and establish ownership and accountability.

Because MapR tables are stored in volumes, these same storage policy controls apply to table data. As an example, the diagram below depicts a MapR cluster storing table and file data. The cluster has three separate volumes mounted at directories /user/john, /user/dave, and /project/ads. As shown, each directory contains both file data and table data, grouped together logically. Because each of these directories maps to a different volume, data in each directory can have a different policy. For example, /user/john has a disk-usage quota, while /user/dave is on a snapshot schedule. Furthermore, two directories, /user/john and /project/ads, are mirrored to locations outside the cluster, providing read-only access to high-traffic data, including the tables in those volumes.
Example: Restricting table storage with quotas and physical topology
This example creates a table with a disk usage quota of 100GB, restricted to certain data nodes in the cluster. First we create a volume named project-tables-vol, specifying the quota and restricting storage to nodes in the /data/rack1 topology, and mounting it in the local namespace. Next we use the HBase shell to create a new table named datastore, specifying a path inside the project-tables-vol volume.
$ pwd
/mapr/cluster1/user/project
$ ls
bin src
$ maprcli volume create -name project-tables-vol -path /user/project/tables \
    -quota 100G -topology /data/rack1
$ ls
bin src tables
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.1-SNAPSHOT, rUnknown, Thu Oct 25 09:28:51 PDT 2012
hbase(main):001:0> create '/user/project/tables/datastore', 'colfamily1'
0 row(s) in 0.5180 seconds
hbase(main):002:0> exit
$ ls -l tables
total 1
lrwxr-xr-x 1 mapr mapr 2 Oct 25 15:20 datastore -> mapr::table::2252.32.16498
Mirrors and Snapshots for MapR Tables

Because MapR tables are stored in volumes, you can take advantage of MapR Schedules, Mirror Volumes, and Snapshots (see Working with Snapshots), which operate at the volume level. Mirrors and snapshots are read-only copies of specific volumes on the cluster, which can be used to provision for disaster recovery and improved access time for high-bandwidth data. To access tables in snapshots or mirrors, HBase programs access a table path in a mirror or snapshot volume.
You can set policy for volumes using the MapR Control System or the maprcli commands. For details, see Managing Data with Volumes.
Comparison to Apache HBase Running on a MapR Cluster

Prior to MapR version 3.0, the only option for HBase users was to run Apache HBase on top of the MapR cluster. For the purposes of illustration, this section contrasts how running Apache HBase on a MapR cluster differs from the integrated tables in MapR.

As shown in the diagram below, installing Apache HBase on a MapR cluster involves storing all HBase components in a single volume mapped to directory /hbase in the cluster. Compared to the MapR implementation shown above, this method has the following differences:
Tables are stored in a flat namespace, not grouped logically with related files.
Because all Apache HBase data resides in one volume, only one set of storage policies can be applied to the entire Apache HBase datastore.
Mirrors and snapshots of the HBase volume do not provide functional replication of the datastore. Despite this limitation, mirrors can be used to back up HLogs and HFiles in order to provide a recovery point for Apache HBase data.
Related Topics

Managing Data with Volumes
Working With MapR Tables and Column Families
Enabling Table Authorization with Access Control Expressions
Displaying Table Region Information

MapR tables are split into regions on an ongoing basis. Administrators and developers do not need to manage these regions or restructure data on disk when data is added and deleted. These operations happen automatically. You can list region information for tables to get a sense of the size and location of table data on the MapR cluster.
Examining Table Region Information in the MapR Control System
1. In the MCS Navigation pane under the MapR Data Platform group, click Tables. The Tables tab appears in the main window.
2. Find the table you want to work with, using one of the following methods:
   Scan for the table under Recently Opened Tables on the Tables tab.
   Enter the table pathname in the Go to table field and click Go.
3. Click the desired table name. A Table tab appears in the main MCS pane, displaying information for the specific table.
4. Click the Regions tab. The Regions tab displays region information for the table.

Listing Table Region Information at the Command Line
Listing Table Region Information at the Command Line
Use the maprcli table region command:
$ maprcli table region list -path <path to table>
sk         sn                    ek        pn         lhb
-INFINITY  hostname1, hostname2  INFINITY  hostname3  0
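The listing can be post-processed to see how regions are distributed across the cluster. The sketch below tallies regions per primary node; note that the column meanings assumed here (sk = start key, sn = secondary nodes, ek = end key, pn = primary node, lhb = last heartbeat) are an interpretation of the abbreviated headers, not a documented contract.

```python
from collections import Counter

def regions_per_primary(output):
    """Tally regions by primary node from `maprcli table region list` output.

    Column interpretation (sk = start key, sn = secondary nodes, ek = end key,
    pn = primary node, lhb = last heartbeat) is an assumption of this sketch.
    """
    counts = Counter()
    for line in output.strip().splitlines()[1:]:   # skip the header row
        parts = line.split()
        if len(parts) < 2:
            continue
        counts[parts[-2]] += 1                     # primary node precedes lhb
    return counts

sample = """sk         sn                    ek        pn         lhb
-INFINITY  hostname1, hostname2  INFINITY  hostname3  0"""
print(regions_per_primary(sample))   # Counter({'hostname3': 1})
```

Splitting from the right-hand side avoids misparsing the sn column, whose comma-separated host list contains spaces.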
Integrating Hive and MapR Tables
You can create MapR tables from Hive that can be accessed by both Hive and MapR. With this functionality, you can run Hive queries on MapR tables. You can also convert existing MapR tables into Hive-MapR tables, running Hive queries on those tables as well.
Install and Configure Hive
Configure the hive-site.xml File
Getting Started with Hive-MapR Integration
Create a Hive table with two columns
Start the HBase shell
Zookeeper Connections
Install and Configure Hive
Install and configure Hive if it is not already installed.
Execute the jps command and ensure that all relevant Hadoop, MapR, and Zookeeper processes are running.
Example:
$ jps
21985 HRegionServer
1549 jenkins.war
15051 QuorumPeerMain
30935 Jps
15551 CommandServer
15698 HMaster
15293 JobTracker
15328 TaskTracker
15131 WardenMain
Configure the hive-site.xml File
1. Open the hive-site.xml file with your favorite editor, or create a hive-site.xml file if it doesn't already exist:
$ cd $HIVE_HOME
$ vi conf/hive-site.xml
2. Copy the following XML code and paste it into the hive-site.xml file.
Note: If you already have an existing hive-site.xml file with a configuration element block, just copy the property element block code below and paste it inside the configuration element block in the hive-site.xml file. Be sure to use the correct values for the paths to your auxiliary JARs and ZooKeeper IP numbers.
Example configuration:
<configuration>
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/mapr/hive/hive-0.10.0/lib/hive-hbase-handler-0.10.0-mapr.jar,file:///opt/mapr/hbase/hbase-0.94.5/hbase-0.94.5-mapr.jar,file:///opt/mapr/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.jar</value>
  <description>A comma separated list (with no spaces) of the jar files required for Hive-HBase integration</description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>xx.xx.x.xxx,xx.xx.x.xxx,xx.xx.x.xxx</value>
  <description>A comma separated list (with no spaces) of the IP addresses of all ZooKeeper servers in the cluster.</description>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>5181</value>
  <description>The Zookeeper client port. The MapR default clientPort is 5181.</description>
</property>
</configuration>
3. Save and close the hive-site.xml file.
If you have successfully completed all of the steps in this section, you're ready to begin the tutorial in the next section.
Getting Started with Hive-MapR Integration
In this tutorial we will:
Create a Hive table
Populate the Hive table with data from a text file
Query the Hive table
Create a Hive-MapR table
Introspect the Hive-MapR table from the HBase shell
Populate the Hive-MapR table with data from the Hive table
Query the Hive-MapR table from Hive
Convert an existing MapR table into a Hive-MapR table
Be sure that you have successfully completed all of the steps in the Install and Configure Hive and Setting Up MapR-FS to Use Tables sections before beginning this Getting Started tutorial.
This Getting Started tutorial is based on the Hive-HBase Integration section of the Apache Hive Wiki, with thanks to Samuel Guo and other contributors to that effort. If you are familiar with their approach to Hive-HBase integration, you should be immediately comfortable with this material.
However, please note that there are some significant differences in this Getting Started section, especially with regard to configuration and command parameters or the lack thereof. Follow the instructions in this Getting Started tutorial to the letter so you can have an enjoyable and successful experience.
Create a Hive table with two columns:
Change to your Hive installation directory if you're not already there and start Hive:
$ cd $HIVE_HOME
$ bin/hive
Execute the CREATE TABLE command to create the Hive pokes table:
hive> CREATE TABLE pokes (foo INT, bar STRING);
To see if the pokes table has been created successfully, execute the SHOW TABLES command:
hive> SHOW TABLES;
OK
pokes
Time taken: 0.74 seconds
The pokes table appears in the list of tables.
Populate the Hive pokes table with data
The kv1.txt file is provided in the $HIVE_HOME/examples/files directory. Execute the LOAD DATA LOCAL INPATH command to populate the pokes Hive table with data from the kv1.txt file.
hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
A message appears confirming that the data was loaded successfully, and the Hive prompt reappears:
Copying data from file:...
OK
Time taken: 0.278 seconds
hive>
Execute a SELECT query on the Hive pokes table:
hive> SELECT * FROM pokes WHERE foo = 98;
The SELECT statement executes, runs a MapReduce job, and prints the job output:
OK
98 val_98
98 val_98
Time taken: 18.059 seconds
The output of the SELECT command displays two identical rows because there are two identical rows in the Hive pokes table with a key of 98.
To create a Hive-MapR table, enter these four lines of code at the Hive prompt:
hive> CREATE TABLE mapr_table_1(key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    > TBLPROPERTIES ("hbase.table.name" = "/user/mapr/xyz");
After a brief delay, a message appears confirming that the table was created successfully:
OK
Time taken: 5.195 seconds
Note: The TBLPROPERTIES command is not required, but those new to Hive-MapR integration may find it easier to understand what's going on if Hive and MapR use different names for the same table.
In this example, Hive will recognize this table as "mapr_table_1" and MapR will recognize this table as "xyz".
Start the HBase shell:
Keeping the Hive terminal session open, start a new terminal session for HBase, then start the HBase shell:
$ cd $HBASE_HOME
$ bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.4, rUnknown, Wed Nov 9 17:35:00 PST 2011
hbase(main):001:0>
Execute the list command to see a list of HBase tables:
hbase(main):001:0> list
TABLE
/user/mapr/xyz
1 row(s) in 0.8260 seconds
HBase recognizes the Hive-MapR table named xyz in directory /user/mapr. This is the same table known to Hive as mapr_table_1.
Display the description of the /user/mapr/xyz table in the HBase shell:
Hive tables can have multiple identical keys. As we will see shortly, MapR tables cannot have multiple identical keys, only unique keys.
hbase(main):004:0> describe "/user/mapr/xyz"
DESCRIPTION                                                        ENABLED
 {NAME => '/user/mapr/xyz', FAMILIES => [{NAME => 'cf1',           true
 DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', VERSIONS => '3', MIN_VERSIONS => '0',
 TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0240 seconds
From the Hive prompt, insert data from the Hive table pokes into the Hive-MapR table mapr_table_1:
hive> INSERT OVERWRITE TABLE mapr_table_1 SELECT * FROM pokes WHERE foo=98;
...
2 Rows loaded to mapr_table_1
OK
Time taken: 13.384 seconds
Query mapr_table_1 to see the data we have inserted into the Hive-MapR table:
hive> SELECT * FROM mapr_table_1;
OK
98 val_98
Time taken: 0.56 seconds
Even though we loaded two rows from the Hive pokes table that had the same key of 98, only one row was actually inserted into mapr_table_1. This is because mapr_table_1 is a MapR table, and although Hive tables support duplicate keys, MapR tables only support unique keys. MapR tables arbitrarily retain only one key, and silently discard all of the data associated with duplicate keys.
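This unique-key behavior is analogous to loading rows into a map keyed by row key, where duplicates collapse to a single entry. A minimal illustration (plain Python, not a MapR API):

```python
# Hive rows may contain duplicate keys; a MapR table keeps one row per key,
# silently discarding the rest (which duplicate survives is not guaranteed).
hive_rows = [(98, "val_98"), (98, "val_98")]   # two identical rows, as in pokes

mapr_table = {}
for key, value in hive_rows:
    mapr_table[key] = value                    # later duplicates overwrite earlier

print(len(mapr_table))   # 1 -- only one row survives for key 98
```

If your Hive data contains duplicate keys that all carry meaningful data, remodel the row key (for example, append a qualifier) before loading into a MapR table.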
Convert a pre-existing MapR table to a Hive-MapR table
To convert a pre-existing MapR table to a Hive-MapR table, enter the following four commands at the Hive prompt.
Note that in this example the existing MapR table mapr_table_2 is in directory /user/mapr.
hive> CREATE EXTERNAL TABLE mapr_table_2(key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")
    > TBLPROPERTIES("hbase.table.name" = "/user/mapr/my_mapr_table");
Now we can run a Hive query against the pre-existing MapR table /user/mapr/my_mapr_table that Hive sees as mapr_table_2:
hive> SELECT * FROM mapr_table_2 WHERE key > 400 AND key < 410;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
...
OK
401 val_401
402 val_402
403 val_403
404 val_404
406 val_406
407 val_407
409 val_409
Time taken: 9.452 seconds
Zookeeper Connections
If you see an error message similar to the following, ensure that hbase.zookeeper.quorum and hbase.zookeeper.property.clientPort are properly defined in the $HIVE_HOME/conf/hive-site.xml file.
Failed with exception java.io.IOException:
org.apache.hadoop.hbase.ZooKeeperConnectionException:
HBase is able to connect to ZooKeeper but the connection closes immediately. This
could be a sign that the server has too many connections (30 is the default).
Consider inspecting your ZK server logs for that error and then make sure you are
reusing HBaseConfiguration as often as you can. See HTable's javadoc for more
information.
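Before digging into ZooKeeper server logs, it can help to confirm that the client port is reachable at all from the Hive node. A minimal connectivity probe (a sketch; it checks TCP reachability only, not ZooKeeper health):

```python
import socket

def zk_reachable(host, port=5181, timeout=3.0):
    """Return True if a TCP connection to the ZooKeeper client port succeeds.

    5181 is the MapR default clientPort; stock Apache ZooKeeper uses 2181.
    A successful connect shows only network reachability, not ZK health.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run against each host listed in hbase.zookeeper.quorum, for example:
# unreachable = [h for h in quorum_hosts if not zk_reachable(h)]
```

If a host is unreachable, check firewalls and the clientPort value before investigating connection-limit errors.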
Migrating Between Apache HBase Tables and MapR Tables
MapR tables can be parsed by the Apache CopyTable tool (org.apache.hadoop.hbase.mapreduce.CopyTable). You can use the CopyTable tool to migrate data from an Apache HBase table to a MapR table or from a MapR table to an Apache HBase table.
Before You Start
Before migrating your tables to another platform, consider the following points:
Schema Changes: Apache HBase and MapR tables have different limits on the number of column families. If you are migrating to MapR, you may be interested in changing your table's schema to take advantage of the increased availability of column families. Conversely, if you're migrating from MapR tables to Apache, you may need to adjust your schema to reflect the reduced availability of column families.
API Mappings: If you are migrating from Apache HBase to MapR tables, examine your current HBase applications to verify the APIs and HBase shell commands used are fully supported.
Namespace Mapping: If the migration will take place over a period of time, be sure to plan your table namespace mappings in advance to ease the transition.
Implementation Limitations: MapR tables do not support HBase coprocessors. If your existing Apache HBase installation uses coprocessors, plan any necessary modifications in advance. MapR tables support a subset of the regular expressions supported in Apache HBase. Check your existing workflow and HBase applications to verify you are not using unsupported regular expressions.
If you are migrating to MapR tables, be sure to change your Apache HBase client to the MapR client by installing the version of the mapr-hbase package that matches the version of Apache HBase on your source cluster.
See Installing MapR Software for information about MapR installation procedures, including setting up the proper repositories.
Compression Mappings
MapR tables support the LZ4, LZF, and ZLIB compression algorithms.
When you create a MapR table with the Apache HBase API or the HBase shell and specify the LZ4, LZO, or SNAPPY compression algorithms, the resulting MapR table uses the LZ4 compression algorithm.

When you describe a MapR table schema through the HBase API, the LZ4 and OLDLZF compression algorithms map to the LZ4 compression algorithm.
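These two mappings can be summarized as simple lookup tables. The dictionaries below are illustrative only, not a MapR API; the pass-through entries for LZF and ZLIB are an assumption based on the list of natively supported algorithms above:

```python
# Compression requested at table creation -> compression actually used.
# LZ4, LZO, and SNAPPY all become LZ4; LZF/ZLIB pass-through is assumed.
CREATE_MAP = {"LZ4": "LZ4", "LZO": "LZ4", "SNAPPY": "LZ4",
              "LZF": "LZF", "ZLIB": "ZLIB"}

# Stored compression -> value reported when describing via the HBase API.
DESCRIBE_MAP = {"LZ4": "LZ4", "OLDLZF": "LZ4"}

def compression_on_create(requested):
    """Algorithm a MapR table actually uses when created with `requested`."""
    return CREATE_MAP.get(requested, requested)
```

For example, a table created through the HBase shell with SNAPPY compression will actually be stored with LZ4.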
Copying Data
Launch the CopyTable tool with the following command, specifying the full destination path of the table with the --new.name parameter:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
-Dhbase.zookeeper.quorum=<ZooKeeper IP Address> \
-Dhbase.zookeeper.property.clientPort=2181 \
--new.name=/user/john/foo/mytable01
Example: Migrating an Apache HBase table to a MapR table
This example migrates the existing Apache HBase table mytable01 to the MapR table /user/john/foo/mytable01.
On the node in the MapR cluster where you will launch the CopyTable tool, modify the value of the hbase.zookeeper.quorum property in the hbase-site.xml file to point at a ZooKeeper node in the source cluster. Alternately, you can specify the value for the hbase.zookeeper.quorum property from the command line. This example specifies the value in the command line.
Create the destination table. This example uses the HBase shell. The CLI and MapR Control System (MCS) are also viable methods.
[user@host] hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.3-SNAPSHOT, rUnknown, Thu Mar 7 10:15:47 PST 2013
hbase(main):001:0> create '/user/john/foo/mytable01', 'usernames', 'userpath'
0 row(s) in 0.2040 seconds
Exit the HBase shell.
hbase(main):002:0> exit
[user@host]
From the HBase command line, use the CopyTable tool to migrate data.
[user@host] hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
-Dhbase.zookeeper.quorum=zknode1,zknode2,zknode3 \
--new.name=/user/john/foo/mytable01 mytable01
Verifying Migration
After copying data to the new tables, verify that the migration is complete and successful. In increasing order of complexity:
Verify that the destination table exists. From the HBase shell, use the list command, or use the ls /user/john/foo command from a Linux prompt:
The CopyTable tool launches a MapReduce job. The nodes on your cluster must have the correct version of the mapr-hbase package installed. To ensure that your existing HBase applications and workflow work properly, install the mapr-hbase package that provides the same version number of HBase as your existing Apache HBase.
hbase(main):006:0> list '/user/john/foo'
TABLE
/user/john/foo/mytable01
1 row(s) in 0.0770 seconds
Check the number of rows in the source table against the destination table with the count command:
hbase(main):005:0> count '/user/john/foo/mytable01'
30 row(s) in 0.1240 seconds
Hash each table, then compare the hashes.
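For the hash comparison, one workable approach is an order-independent digest over all rows, so that differing scan order between source and destination does not change the result. A sketch, assuming rows are available as (row key, value) pairs:

```python
import hashlib

def table_fingerprint(rows):
    """Order-independent fingerprint of (row_key, value) pairs.

    XOR-combining per-row SHA-256 digests makes the result independent of
    scan order, so source and destination scans need not return rows in
    the same sequence.
    """
    acc = 0
    for key, value in rows:
        h = hashlib.sha256(f"{key}\x00{value}".encode()).hexdigest()
        acc ^= int(h, 16)
    return f"{acc:064x}"

src = [("row1", "a"), ("row2", "b")]
dst = [("row2", "b"), ("row1", "a")]   # same rows, different scan order
assert table_fingerprint(src) == table_fingerprint(dst)
```

Feed each function scans of the source and destination tables (for example, via an HBase client scan) and compare the two fingerprints; any differing row changes the digest.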
Decommissioning the Source
After verifying a successful migration, you can decommission the source nodes where the tables were originally stored.
Decommissioning a MapR Node
Before you start, drain the node of data by moving the node to the /decommissioned physical topology. All the data on a node in the /decommissioned topology is migrated to volumes and nodes in the /data topology.
Run the following command to check if a given volume is present on the node:
maprcli dump volumenodes -volumename <volume> -json | grep <ip:port>
Run this command for each non-local volume in your cluster to verify that the node being decommissioned is not storing any volume data.
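This per-volume check can be scripted. Because the exact JSON field names in the maprcli output vary by MapR version, the sketch below simply searches the serialized output for the node's ip:port, mirroring the grep in the command above; the sample structure shown is hypothetical:

```python
import json

def node_hosts_volume(dump_json, node):
    """Return True if `node` (an "ip:port" string) appears anywhere in the
    parsed `maprcli dump volumenodes -volumename <vol> -json` output.

    Searching the re-serialized JSON mirrors `| grep <ip:port>` from the
    docs and avoids depending on exact field names, which vary by version.
    """
    return node in json.dumps(dump_json)

# Hypothetical output shape, for illustration only:
sample = {"data": [{"Servers": ["10.10.1.1:5660", "10.10.1.2:5660"]}]}
```

Loop this over every non-local volume; the node is safe to decommission only when no volume reports it.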
Change to the root user (or use sudo for the following commands).
Stop the Warden:
service mapr-warden stop
If ZooKeeper is installed on the node, stop it:
service mapr-zookeeper stop
Determine which MapR packages are installed on the node:
dpkg --list | grep mapr   (Ubuntu)
rpm -qa | grep mapr       (Red Hat or CentOS)
Remove the packages by issuing the appropriate command for the operating system, followed by the list of services. Examples:
apt-get purge mapr-core mapr-cldb mapr-fileserver   (Ubuntu)
yum erase mapr-core mapr-cldb mapr-fileserver       (Red Hat or CentOS)
Remove the /opt/mapr directory to remove any instances of hostid, hostname, zkdata, and zookeeper left behind by the package manager.
Remove any MapR cores in the /opt/cores directory.
If the node you have decommissioned is a CLDB node or a ZooKeeper node, then run configure.sh on all other nodes in the cluster (see Configuring the Node).
Decommissioning Apache HBase Nodes
To decommission nodes running Apache HBase, follow these steps for each node:
From the HBase shell, disable the Region Load Balancer by setting the value of balance_switch to false:
hbase(main):001:0> balance_switch false
Leave the HBase shell by typing exit.
Run the graceful stop script to stop the HBase RegionServer:
[user@host] ./bin/graceful_stop.sh <hostname>
Language Support for MapR Tables
MapR tables can store, retrieve, and process data in the following languages:
A
Abaza, Abkhazian, Achinese, Acoli, Adangme, Adyghe, Afar, Afrikaans, Aghem, Ainu, Akan, Akkadian, Akoose, Albanian, Aleut, Amharic, Amo, Ancient Egyptian, Ancient Greek, Angika, Arabic, Aragonese, Aramaic, Arapaho, Arawak, Armenian, Aromanian, Assamese, Assyrian Neo-Aramaic, Asturian, Asu, Atikamekw, Atsam, Avaric, Avestan, Awadhi, Aymara, Azerbaijani
B
Badaga, Bafia, Bafut, Bagheli, Balinese, Balkan Gagauz Turkish, Balti, Baluchi, Bambara, Bamun, Bantawa, Basaa, Bashkir, Basque
The graceful_stop.sh script does not look up the hostname for an IP number. Do not pass an IP number to the script. Check the list of RegionServers in the Apache HBase Master UI to determine the hostname for the node being decommissioned.
Batak, Batak Toba, Bateri, Beja, Belarusian, Bemba, Bena, Bengali, Bhili, Bhojpuri, Bikol, Bini, Bislama, Blin, Bodo, Bomu, Bosnian, Braj, Breton, Bube, Buginese, Buhid, Bulgarian, Bulu, Buriat, Burmese, Bushi
C
Caddo, Cantonese, Carian, Carib, Catalan, Cayuga, Cebaara Senoufo, Cebuano, Central Atlas Tamazight, Central Huasteca Nahuatl, Central Mazahua, Central Okinawan, Chadian Arabic, Chakma, Chamorro, Chechen, Cherokee, Cheyenne, Chhattisgarhi, Chiga, Chinese, Chinook Jargon, Chipewyan, Choctaw, Chukot, Church Slavic, Chuukese, Chuvash, Classical Mandaic, Colognian, Comorian, Congo Swahili, Coptic, Cornish, Corsican, Cree, Creek, Crimean Turkish, Croatian, Czech
D
Dakota, Dan, Dangaura Tharu, Danish, Dargwa, Dari, Dazaga, Delaware, Dinka, Divehi, Dogri, Dogrib, Domari, Duala, Dungan, Dutch, Dyula, Dzongkha
E
Eastern Cham, Eastern Frisian, Eastern Gurung, Eastern Huasteca Nahuatl, Eastern Kayah, Eastern Lawa, Eastern Magar, Eastern Tamang, Efik, Ekajuk, Embu, English, Erzya, Esperanto, Estonian, Etruscan, Evenki, Ewe, Ewondo
F
Fang, Fanti, Faroese, Fijian, Filipino, Finnish, Fon, French, Friulian, Fulah
G
Ga, Gagauz, Galician, Ganda, Garhwali, Garo, Gayo, Gbaya, Geez, Georgian, German, Ghomala, Gilbertese, Gondi, Gorontalo, Gothic, Grebo, Greek, Gronings, Guajajára, Guarani, Guianese Creole French, Gujarati, Gujari, Gusii, Gwichin
H
Hadothi, Haida, Haitian, Hanunoo, Hausa, Hawaiian, Hebrew, Herero, Hiligaynon, Hindi, Hiri Motu, Hittite, Hmong, Ho, Hopi, Hungarian, Hupa
I
Iban, Ibibio, Icelandic, Igbo, Iloko, Inari Sami, Indonesian, Indus Kohistani, Ingush, Interlingua, Inuktitut, Inupiaq, Irish, Italian
J
Japanese, Javanese, Jenaama Bozo, Jju, Jola-Fonyi, Judeo-Arabic, Judeo-Persian, Jumli
K
Kabardian, Kabuverdianu, Kabyle, Kachchi, Kachi Koli, Kachin, Kaingang, Kako, Kalaallisut, Kalanga, Kalenjin, Kalmyk, Kalo Finnish Romani, Kamba, Kanauji, Kanembu, Kannada, Kanuri, Kara-Kalpak, Karachay-Balkar, Karelian, Kashmiri, Kashubian, Kathoriya Tharu, Kazakh, Kerinci, Klngaxo Bozo, Khakas, Khamti, Khanty, Khasi, Khmer, Khmu, Khowar, Kikuyu, Kimbundu, Kinyarwanda, Kita Maninkakan, Kochila Tharu, Kom, Komering, Komi, Komi-Permyak, Kongo, Konkani, Korean, Koro, Koro Wachi, Koryak, Kosraean, Koyra Chiini, Koyraboro Senni, Kpelle, Krio, Kuanyama, Kumyk, Kurdish, Kurukh, Kutenai, Kuy, Kwasio, Kyrgyz
L
Ladino, Lahnda, Lak, Laki, Lakota, Lamba, Lambadi, Langi, Lao, Large Flowery Miao, Latin, Latvian, Lepcha, Lezghian, Limbu, Limburgish, Lingala, Lisu, Literary Chinese, Lithuanian, Lombard, Low German, Lower Sorbian, Lozi, Lü, Luba-Katanga, Luba-Lulua, Luiseno, Lule Sami, Lunda, Luo, Lushootseed, Luxembourgish, Luyia, Lycian, Lydian
M
Maba, Macedonian, Machame, Madurese, Mafa, Magahi, Maguindanaon, Maithili, Makasar, Makhuwa-Meetto, Makonde, Malagasy, Malay, Malayalam, Maltese, Manchu, Mandar, Mandingo, Manipuri, Mansi, Manx, Manyika, Maori, Mapuche, Marathi, Mari, Marshallese, Marwari, Masai, Mbere, Mbunga, Medumba, Mende, Meroitic, Meru, Meta', Micmac, Minangkabau, Mirandese, Mizo, Mohawk, Moksha, Mon, Mongo, Mongolian, Montagnais, Moose Cree, Morisyen, Mossi, Munda, Mundang, Mundari, Myene
N
N’Ko, Nama, Nanai, Naskapi, Nauru, Navajo, Naxi, Ndonga, Neapolitan, Negeri Sembilan Malay, Nenets, Nepali, Newari, Ngaju, Ngambay, Ngiemboon, Ngomba, Nias, Nigerian Pidgin, Niuean, Nogai, North Ndebele, North Slavey, Northeastern Thai, Northern East Cree, Northern Frisian, Northern Sami, Northern Sotho, Northern Thai, Norwegian, Norwegian Bokmål, Norwegian Nynorsk, Nuer, Nyamwezi, Nyanja, Nyankole, Nyasa Tonga, Nyoro, Nzima
O
Occitan, Ojibwa, Old Irish, Old Norse, Old Persian, Old Turkish, Oriya, Oromo, Osage, Oscan, Ossetic
P
Pahlavi, Palauan, Pali, Pampanga, Pangasinan, Papiamento, Parkari Koli, Parsi-Dari, Parthian, Pashto, Persian, Phoenician, Plains Cree, Pohnpeian, Pökoot, Polish, Portuguese, Prussian, Punjabi, Punu
Q
Quechua
R
Rajasthani, Rajbanshi, Rana Tharu, Rangpuri, Rapanui, Rarotongan, Rejang, Réunion Creole French, Riang (India), Rinconada Bikol, Romanian, Romansh, Romany, Rombo, Ronga, Rundi, Russian, Rusyn, Rwa
S
Sabaean, Safaliba, Saho, Sakha, Samaritan, Samaritan Aramaic, Samburu, Samoan, Sandawe, Sangir, Sango, Sangu, Sanskrit, Santali, Sardinian, Sasak, Saurashtra, Scots, Scottish Gaelic, Seki, Selkup, Sena, Seneca, Serbian, Serbo-Croatian, Serer, Shambala, Shan, Sherpa, Shona, Shor, Sichuan Yi, Sicilian, Sidamo, Siksika, Sindhi, Sinhala, Sinte Romani, Sirmauri, Skolt Sami, Slave, Slovak, Slovenian, Soga, Somali, Soninke, Sora, Sorani Kurdish, South Ndebele, Southern Altai, Southern East Cree, Southern Hindko, Southern Kurdish, Southern Luri, Southern Sami, Southern Sotho, Southwestern Tamang, Spanish, Sranan Tongo, Standard Moroccan Tamazight, Sukuma, Sundanese, Susu, Swahili, Swampy Cree, Swati, Swedish, Swiss German, Sylheti, Syriac
T
Tabassaran, Tachelhit, Tae', Tagalog, Tagbanwa, Tahitian, Tai Dam, Tai Nüa, Taita, Tajik, Tamashek, Tamil, Taroko, Tasawaq, Tatar, Tausug, Tavringer Romani, Telugu, Tereno, Teso, Tetum, Thai, Thulung, Tibetan, Tigre, Tigrinya, Timne, Tiv, Tlingit, Tok Pisin, Tokelau, Tolaki, Tomo Kan Dogon, Tongan, Tooro, Tornedalen Finnish, Tshangla, Tsimshian, Tsonga, Tswana, Tulu, Tumbuka, Turkish, Turkmen, Turoyo, Tuvalu, Tuvinian, Twi, Tyap
U
Uab Meto, Udihe, Udmurt, Ugaritic, Ukrainian, Ulithian, Umbrian, Umbundu, Unknown Language, Upper Sorbian, Urdu, Uyghur, Uzbek

V

Vai, Venda, Vietnamese, Virgin Islands Creole English, Volapük, Votic, Vunjo
W
Wadiyara Koli, Walloon, Walser, Waray, Washo, Welsh, Western Cham, Western Frisian, Western Gurung, Western Huasteca Nahuatl, Western Kayah, Western Lawa, Western Magar, Western Mari, Western Tamang, Wolaytta, Wolof
X
Xaasongaxango