mysql to hbase in 5 steps

MySql to HBase in 5 Steps Converting MySql or Oracle databases to Apache HBase with on-line examples using the popular Wordnet ® dictionary Scott Cinnamond – TerraMeta Software Inc. http://cloudgraph.org CloudGraph ®

Upload: scott-cinnamond

Post on 28-Aug-2014



TRANSCRIPT

Page 1: MySql to HBase in 5 Steps

MySql to HBase in 5 Steps

Converting MySql or Oracle databases to Apache HBase™ with on-line examples using the popular Wordnet® dictionary

Scott Cinnamond – TerraMeta Software Inc.

http://cloudgraph.org

CloudGraph ®

Page 2: MySql to HBase in 5 Steps

What is Wordnet®?

• Large, complex lexical (MySql) database of English.

• Nouns, verbs, adjectives and adverbs grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.

• Synsets are interlinked by means of conceptual-semantic and lexical relations.

Page 3: MySql to HBase in 5 Steps

HBase Conversion Steps

http://wordnet.cloudgraph.org

1) Model Creation: reverse engineer Wordnet DB into UML®

2) Code Generation: provision persistence and query-DSL Java code

3) HBase™ Table Mapping: map data graphs and row keys to table(s)

4) Data Migration: MySql to HBase

5) Services / App Creation: build services, web app

Page 4: MySql to HBase in 5 Steps

1.) Model Creation

Reverse engineer Wordnet DB into PlasmaSDO™ UML® Model

• Capture entities, properties, data types, associations, enumerations, comments as UML

• Why UML? Popular standards-based format. Editable, viewable using standard tools. Supports enterprise governance processes

• How? Maven build with plasma-maven-plugin RDB tool (goal:RDB, action:reverse, dialect:mysql)

• Download working example at https://github.com/cloudgraph/wordnet

Page 5: MySql to HBase in 5 Steps

Generated Wordnet Model (core subset of 30 total entities and enumerations)

Page 6: MySql to HBase in 5 Steps

2.) Code Generation

Provision SDO persistence and query DSL Java code

• Generate Java API based on Wordnet UML Model

• Why? The same API can be used across RDB, HBase and other CloudGraph services. Compile-time checking for queries and all persistence logic

• How? Maven build with plasma-maven-plugin SDO and DSL tools

• See generated API Javadocs on-line at http://wordnet.cloudgraph.org

Page 7: MySql to HBase in 5 Steps

3.) HBase™ Table Mapping

Map data graphs and row keys to HBase™ table(s)

• Configure delimited, hashed, salted, formatted, composite row keys with (xpath) paths into target data graphs

• Map data graph roots to HBase tables

• Why? Automates row-key creation via data extraction processing from anywhere in your data graphs

• How? CloudGraph Configuration XML. See https://github.com/cloudgraph/wordnet
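To make the row-key options above concrete, here is a minimal plain-Java sketch of a delimited, salted, formatted and hashed composite key. It is not the CloudGraph configuration or API: the class `RowKeySketch`, the `|` delimiter, the 16 salt buckets, the truncated MD5 hash and the choice of fields (lemma, part of speech, synset id) are all assumptions made for illustration. In CloudGraph the field values would be extracted from the data graph by configured paths rather than passed in by hand.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of CloudGraph. */
final class RowKeySketch {
    private static final char DELIM = '|';       // assumed field delimiter
    private static final int SALT_BUCKETS = 16;  // assumed number of salt buckets

    /** Salted field: a two-digit bucket derived from the value, to spread writes. */
    static String salt(String value) {
        int bucket = Math.floorMod(value.hashCode(), SALT_BUCKETS);
        return String.format("%02d", bucket);
    }

    /** Hashed field: the first 4 bytes of an MD5 digest as 8 hex characters. */
    static String hash(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Composite key: salt | formatted lemma | part of speech | hashed synset id. */
    static String compose(String lemma, String partOfSpeech, long synsetId) {
        String formatted = lemma.toLowerCase(Locale.ROOT); // formatted field
        return salt(formatted) + DELIM + formatted + DELIM + partOfSpeech
                + DELIM + hash(Long.toString(synsetId));
    }

    public static void main(String[] args) {
        System.out.println(compose("Dog", "n", 1234L));
    }
}
```

The salt is derived from the lemma itself, so the same word always lands in the same bucket and the key can be recomputed at read time.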

Page 8: MySql to HBase in 5 Steps

4.) Data Migration

MySql to HBase

• Create a standalone RDB-to-HBase migration app that uses the generated persistence and DSL query API to incrementally call the CloudGraph HBase and RDB services

• Why? Wordnet data is large and highly connected, so must be incrementally extracted/inserted and linked
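One way to read "incrementally extracted/inserted and linked" is as a two-pass copy: first insert the rows in bounded batches, then add the links once every target row exists. The sketch below shows that idea with in-memory stand-ins. It does not use the CloudGraph or HBase APIs, and the names `IncrementalMigrationSketch`, `Target`, `insertBatch` and `link` are hypothetical. The source graph is modeled as a simple adjacency map, which is an assumption about shape, not the Wordnet schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical two-pass migration sketch; the stores are in-memory stand-ins. */
final class IncrementalMigrationSketch {

    /** Stand-in for the target store: row id mapped to the ids it links to. */
    static final class Target {
        final Map<Integer, Set<Integer>> rows = new TreeMap<>();
        int batchesWritten = 0;

        void insertBatch(List<Integer> ids) {
            for (int id : ids) {
                rows.putIfAbsent(id, new TreeSet<>());
            }
            batchesWritten++;
        }

        void link(int from, int to) {
            if (!rows.containsKey(from) || !rows.containsKey(to)) {
                throw new IllegalStateException("link before insert: " + from + " -> " + to);
            }
            rows.get(from).add(to);
        }
    }

    /** Copies a source graph (id to linked ids) in batches of at most batchSize. */
    static Target migrate(Map<Integer, List<Integer>> source, int batchSize) {
        Target target = new Target();
        List<Integer> ids = new ArrayList<>(source.keySet());

        // Pass 1: insert rows batch by batch, without any links yet.
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            target.insertBatch(ids.subList(start, end));
        }

        // Pass 2: every row now exists, so cyclic links can be added safely.
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            for (int from : ids.subList(start, end)) {
                for (int to : source.get(from)) {
                    target.link(from, to);
                }
            }
        }
        return target;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> source =
                Map.of(1, List.of(2, 3), 2, List.of(3), 3, List.of(1));
        System.out.println(migrate(source, 2).rows);
    }
}
```

Splitting insert from link means a cycle such as 1 -> 3 -> 1 never refers to a row that has not been written yet, and no batch needs the whole graph in memory.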

Page 9: MySql to HBase in 5 Steps

5.) Services / App Creation

Build services, web app

• Build simple pojo services using persistence and DSL query API

• Encapsulate Wordnet business logic

• Add adapter/wrapper structures

• Call services from the web app
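The layering described above can be sketched as a plain Java service that hides the persistence layer behind an interface and hands the web app a small adapter object. Everything here is hypothetical: `WordStore` stands in for the generated persistence and query API, `WordNode` for a generated data-graph node, and `WordView` for an adapter/wrapper structure. None of these names come from CloudGraph.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical service layering sketch; all types are illustrative stand-ins. */
final class WordServiceSketch {

    /** Stand-in for a generated data-graph node. */
    static final class WordNode {
        final String lemma;
        final List<String> relations;

        WordNode(String lemma, List<String> relations) {
            this.lemma = lemma;
            this.relations = relations;
        }
    }

    /** Stand-in for the generated persistence and query layer. */
    interface WordStore {
        Optional<WordNode> find(String lemma);
    }

    /** Adapter exposing only what the web app needs. */
    static final class WordView {
        final String lemma;
        final int relationCount;

        WordView(String lemma, int relationCount) {
            this.lemma = lemma;
            this.relationCount = relationCount;
        }
    }

    /** POJO service: normalizes input, queries the store, wraps the result. */
    static final class WordService {
        private final WordStore store;

        WordService(WordStore store) {
            this.store = store;
        }

        WordView lookup(String rawInput) {
            String lemma = rawInput.trim().toLowerCase(Locale.ROOT);
            return store.find(lemma)
                    .map(node -> new WordView(node.lemma, node.relations.size()))
                    .orElseGet(() -> new WordView(lemma, 0));
        }
    }

    public static void main(String[] args) {
        WordStore store = lemma -> "dog".equals(lemma)
                ? Optional.of(new WordNode("dog", List.of("hypernym: canine", "hyponym: puppy")))
                : Optional.empty();
        WordView view = new WordService(store).lookup("  Dog ");
        System.out.println(view.lemma + " has " + view.relationCount + " relations");
    }
}
```

Because the service depends only on the `WordStore` interface, the same code could sit on top of either an RDB-backed or an HBase-backed implementation.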

Page 10: MySql to HBase in 5 Steps

Web App

http://wordnet.cloudgraph.org

• Auto-complete field triggers CloudGraph HBase to use the HBase fuzzy row filter API

• Find button returns all semantic and lexical relations for the selected word, including descriptions and example sentences

• Resulting relation graphs typically contain more than 100 nodes and return in less than 200 milliseconds
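A fuzzy row filter matches row keys against a byte pattern plus a mask that marks each position as fixed or "any byte". The plain-Java sketch below imitates that matching rule so the auto-complete idea can be followed without an HBase cluster; it does not call the HBase API. The mask convention (0 = fixed, 1 = wildcard) follows the HBase `FuzzyRowFilter` documentation as I understand it. The key layout (a two-character salt, a `|` delimiter, then the lemma), the handling of rows longer than the pattern, and the names `FuzzyRowSketch`, `matches` and `complete` are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical imitation of fuzzy row matching; does not use the HBase API. */
final class FuzzyRowSketch {

    /**
     * Returns true when every fixed position (mask byte 0) of the pattern
     * equals the row byte at that position. Mask byte 1 accepts any byte.
     * Row bytes beyond the pattern length are ignored (assumed behavior).
     */
    static boolean matches(byte[] row, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have equal length");
        }
        if (row.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && row[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    /** Finds keys of the assumed form "SS|lemma..." whose lemma starts with typed. */
    static List<String> complete(List<String> rowKeys, String typed) {
        byte[] pattern = ("??|" + typed).getBytes(StandardCharsets.US_ASCII);
        byte[] mask = new byte[pattern.length]; // all zeros: every position fixed
        mask[0] = 1; // first salt character: any byte
        mask[1] = 1; // second salt character: any byte

        List<String> hits = new ArrayList<>();
        for (String key : rowKeys) {
            if (matches(key.getBytes(StandardCharsets.US_ASCII), pattern, mask)) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("03|dog|n", "11|door|n", "07|cat|n");
        System.out.println(complete(keys, "do"));
    }
}
```

The wildcard positions are what make this useful with salted or hashed key prefixes: a plain prefix scan would need the salt, while the fuzzy pattern can leave it open and fix only the typed characters.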

Page 11: MySql to HBase in 5 Steps

Conclusions

• Complex, highly recursive RDB models can be easily converted and leveraged in HBase and future CloudGraph services

• Large lexical data graphs can be returned in a single query

• Data migration is difficult given a complex recursive model

Page 12: MySql to HBase in 5 Steps

Resources

• Download the complete CloudGraph Wordnet example: https://github.com/cloudgraph/wordnet

• Run the example online: http://wordnet.cloudgraph.org

• Project details, contact information: http://cloudgraph.org

• Beta Source Repo: https://github.com/terrameta/cloudgraph

• Production Source Repo (under construction): https://github.com/cloudgraph

Page 13: MySql to HBase in 5 Steps

Status / Legal

• Project Status

– CloudGraph ® is currently under private beta testing

• Licensing

– CloudGraph ® 0.5.5 Community Edition (CE) is open source licensed under version 2 of the GNU General Public License

• Trademarks

– WordNet ® is a registered trademark of Princeton University

– Apache HBase™ is a trademark of Apache Software Foundation

– CloudGraph ® is a trademark of TerraMeta Software LLC, TerraMeta Software Inc.