HBase Meetup @ Cask HQ 09/25

Upload: cask-data-inc

Post on 01-Jul-2015

DESCRIPTION

Cask OSS http://cdap.io, http://coopr.io, http://tigon.io, http://tephra.io

TRANSCRIPT

Page 1: HBase Meetup @ Cask HQ 09/25

HBase Meetup @ Cask HQ September 25, 2014

CASK DATA APP PLATFORM

tigon.io cdap.io coopr.io

Page 2: HBase Meetup @ Cask HQ 09/25

PROPRIETARY & CONFIDENTIAL

HBase Meetup Agenda

• Cask Open Source Project Announcements by Jonathan Gray

• Project CDAP: Cask Data App Platform by Jonathan Gray

• Project Coopr: Cluster Provisioning by Albert Shau

• Project Tigon: RT Streaming on YARN + HBase by Gokul Gunasekaran

• HBase at Flipboard by Sang Chi

• Master Topologies post HBase 1.0 by Mikhail Antonov of WANDisco

Page 3: HBase Meetup @ Cask HQ 09/25

Cask Open Source Project Announcements

100% Apache 2.0 Licensed Software

Page 4: HBase Meetup @ Cask HQ 09/25

Simple access to powerful technology

• Continuuity is now Cask

• Same Mission. Same Team. Same Technology. Now Open Source!

• We are an open source software company focused on developers and applications on Hadoop

• We have been building our platform and technologies for 3 years; major projects are released today, with everything else to follow by the end of the year

• We are committed to building vibrant communities around these projects and will drive towards a true community-driven process

Page 5: HBase Meetup @ Cask HQ 09/25

Cask Newly Launched Projects

• CDAP, the Cask Data App Platform (cdap.io): Virtualization for Hadoop Data and Apps

• Tigon (tigon.io): Real-time streaming for the real world

• Coopr (coopr.io): Clusters with a click

Page 6: HBase Meetup @ Cask HQ 09/25

Cask Launch Day Thursday, September 25, 2014

• Website launch at cask.co with lots of documentation and technical content

• Project sites launch at cdap.io / coopr.io / tigon.io / tephra.io

• Reactor released as CDAP v2.5 under ASL2 on GitHub

• Loom released as Coopr v0.9.8 under ASL2 on GitHub

• Tigon dev release as Tigon v0.1.0 (Cask + AT&T) under ASL2 on GitHub

• Hosting HBase Meetup @ Cask HQ w/ sessions on CDAP, Coopr, Tigon

Page 7: HBase Meetup @ Cask HQ 09/25

Cask Data App Platform

cdap.io

Page 8: HBase Meetup @ Cask HQ 09/25

Why virtualize?

Runtime languages
• Simpler programming
• Portability

Virtual machines and containers
• More efficient resource utilization
• Reuse

Software-defined networks
• Adaptability
• New applications

Bringing the concepts of virtualization to Hadoop and HBase data and applications augments existing Hadoop-ecosystem open source technologies to enable more use cases to be built by more developers in less time.

• Broader use cases
• Faster development
• Accelerated disruption of proprietary incumbents

Page 9: HBase Meetup @ Cask HQ 09/25

Cask Data Application Platform

Application innovation
• Enable a new class of applications to drive greater business value, including those requiring real-time and batch processing

Simplified development
• Simplify big data app development: more apps faster with less dependence on Hadoop expertise

Production-ready applications
• Avoid compromising operational transparency and control: security, logging, metrics, lineage, and more

Data Virtualization: Logical representations of data

App Virtualization: Standardized containers for apps

Page 10: HBase Meetup @ Cask HQ 09/25

App Virtualization

What is App Virtualization?
• Applications deployed in CDAP containers with runtime services

Features
• Framework level guarantees
  • Applications aren't required to be idempotent
• Support for development life cycle and operational deployment
  • Portable from laptop to cluster
  • Logging, metrics, security with no developer overhead
• Standardization of containers across programming paradigms
  • Take advantage of Spark, Cascading, Hive, etc. using programmatic APIs without worrying about system implementation

Benefits
• Developers can build a broader range of apps focusing on business logic, not core system services
• Speed up time from coding to testing to operational deployment
• Take advantage of new technology with less need for training and expertise

Page 11: HBase Meetup @ Cask HQ 09/25

Data Virtualization

What is Data Virtualization?
• Logical representations of underlying data in CDAP datasets

Features
• Streams for data ingestion
  • Supports Kafka, Flume, REST, user-defined protocols
  • Time-stamped and ordered
  • Horizontally scalable
• Logical representations in commonly used access patterns
  • Time series, key value, objects, geospatial index, OLAP cube and more
• Data available to multiple applications
  • MapReduce, Hive, Spark, Flows and more
  • REST APIs
  • Unified batch and real-time processing

Benefits
• Simplify data ingestion and Extract Transform Load (ETL) to accelerate time to value
• Maximize value of data by making it easy to find and easy to explore through multiple query methods
• Protect the data through security, audit, lineage, and reporting
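To make the idea of a "logical representation in a commonly used access pattern" concrete, here is a minimal sketch of a time-series view layered over a plain sorted key-value store. It is an illustration only and does not use CDAP's dataset API: `KeyValueStore`, `TimeSeriesView`, and all method names are hypothetical, and the in-memory store merely stands in for an underlying table.

```python
from bisect import bisect_left


class KeyValueStore:
    """Hypothetical stand-in for an underlying sorted key-value table."""

    def __init__(self):
        self._keys = []    # kept sorted so range scans are cheap
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._values[key] = value

    def scan(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


class TimeSeriesView:
    """Hypothetical time-series access pattern layered on the store."""

    def __init__(self, store):
        self._store = store

    def write(self, metric, timestamp, value):
        # Row key is (metric, timestamp), so one metric's points sort by time.
        self._store.put((metric, timestamp), value)

    def read(self, metric, start_ts, end_ts):
        """Return [(timestamp, value)] for start_ts <= timestamp < end_ts."""
        rows = self._store.scan((metric, start_ts), (metric, end_ts))
        return [(key[1], value) for key, value in rows]
```

The application code only sees `write` and `read` in terms of metrics and timestamps; the row-key layout stays an implementation detail of the view, which is the kind of separation the slide describes.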

Page 12: HBase Meetup @ Cask HQ 09/25

Thank You!

Questions?

Page 13: HBase Meetup @ Cask HQ 09/25

TIGON

Real-time Streaming for the Real World

Gokul, Software Engineer, Cask Data

HBase Meetup, September 2014

Page 14: HBase Meetup @ Cask HQ 09/25

Meet Tigon

• Open-source distributed real-time stream processing framework

• Exactly-once processing guarantees

• Provides both an imperative Java API and a SQL-like declarative language for building powerful apps

• Built on top of Hadoop YARN™ and Apache HBase™

• Leverages Twill, an Apache incubator project, and Tephra, Cask's open source transaction engine

Page 15: HBase Meetup @ Cask HQ 09/25

Tigon Stack

Evolution of Tigon

[Diagram: Events, a TigonSQLFlowlet, and four Flowlets]

Tigon Architecture

STANDALONE: Threads, In Memory Queues

DISTRIBUTED: YARN Containers, HBase Tables
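One plausible reading of the STANDALONE labels above (threads and in-memory queues) is a pipeline in which each flowlet runs on its own thread and hands events to the next flowlet through a queue. The sketch below illustrates that pattern with Python's standard `threading` and `queue` modules. It is not Tigon's API: `run_flowlet`, `run_pipeline`, and the two sample stages are hypothetical names, and the sketch makes no attempt at the exactly-once guarantees Tigon advertises.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a stage to shut down


def run_flowlet(process, inbox, outbox):
    """Run one hypothetical flowlet: read events, transform, emit downstream."""
    while True:
        event = inbox.get()
        if event is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        for result in process(event):
            outbox.put(result)


def run_pipeline(events, stages):
    """Chain stages with in-memory queues, one thread per stage."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_flowlet, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for event in events:
        queues[0].put(event)
    queues[0].put(_STOP)

    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


def parse(line):
    """Sample stage: turn a text event into an integer."""
    yield int(line)


def keep_even(n):
    """Sample stage: emit only even numbers (emits nothing otherwise)."""
    if n % 2 == 0:
        yield n
```

For example, `run_pipeline(["1", "2", "3", "4"], [parse, keep_even])` pushes four events through two stages. In the DISTRIBUTED column of the slide, the same roles are labelled YARN containers and HBase tables instead of threads and in-memory queues.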

Page 16: HBase Meetup @ Cask HQ 09/25

Tigon in Action

Sample Case: Real-time filter and join of data streams

[Diagram: two input streams, Events(<id>, <name>) and Events(<id>, <age>), a TigonSQL Join & Filter stage, and three Flowlets; the data labels are <id, name, age> and [name, count]]
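The sample case can be sketched in plain Python to show the data shapes involved: two event streams keyed by id are joined into (id, name, age) rows, filtered, and then counted by name. This is not TigonSQL or the Tigon Java API. The function names are hypothetical, the interleaved tagged-event input is a simplification of two separate streams, and the filter predicate is an assumption because the slide does not state one.

```python
from collections import Counter


def join_and_filter(events, keep):
    """Join ("name", id, value) and ("age", id, value) events on id.

    Yields (id, name, age) rows that satisfy keep(). Unmatched events are
    buffered indefinitely here; a real streaming join would bound this state.
    """
    names, ages = {}, {}
    for kind, event_id, value in events:
        if kind == "name":
            names[event_id] = value
        else:
            ages[event_id] = value
        if event_id in names and event_id in ages:
            row = (event_id, names.pop(event_id), ages.pop(event_id))
            if keep(row):
                yield row


def count_by_name(rows):
    """Aggregate joined rows into sorted (name, count) pairs."""
    counts = Counter(name for _id, name, _age in rows)
    return sorted(counts.items())


# Hypothetical input: the two streams interleaved in arrival order.
sample_events = [
    ("name", 1, "ann"),
    ("age", 2, 17),
    ("age", 1, 34),
    ("name", 2, "bob"),
    ("name", 3, "ann"),
    ("age", 3, 51),
]
adult_rows = list(join_and_filter(sample_events, lambda row: row[2] >= 18))
name_counts = count_by_name(adult_rows)
```

The join emits a row as soon as both halves for an id have arrived, regardless of which stream delivered first, which is why the two lookup tables are kept symmetric.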

Page 17: HBase Meetup @ Cask HQ 09/25

Help Tigon Grow

• Developer Preview Release available for download on www.tigon.io

• Tigon source available on GitHub (www.github.com/caskdata/tigon)

Download, Develop, Launch, Fork, Contribute!

Page 18: HBase Meetup @ Cask HQ 09/25

Coopr clusters with a click


Page 19: HBase Meetup @ Cask HQ 09/25
