HBase Meetup @ Cask HQ 09/25
Cask OSS: http://cdap.io, http://coopr.io, http://tigon.io, http://tephra.io
HBase Meetup @ Cask HQ September 25, 2014
CASK DATA APP PLATFORM
tigon.io cdap.io coopr.io
PROPRIETARY & CONFIDENTIAL
HBase Meetup Agenda
• Cask Open Source Project Announcements by Jonathan Gray
• Project CDAP: Cask Data App Platform by Jonathan Gray
• Project Coopr: Cluster Provisioning by Albert Shau
• Project Tigon: RT Streaming on YARN + HBase by Gokul Gunasekaran
• HBase at Flipboard by Sang Chi
• Master Topologies post HBase 1.0 by Mikhail Antonov of WANDisco
Cask Open Source Project Announcements
100% Apache 2.0 Licensed Software
Simple access to powerful technology
• Continuuity is now Cask
• Same Mission. Same Team. Same Technology. Now Open Source!
• We are an open source software company focused on developers and applications on Hadoop
• We have been building our platform and technologies for 3 years; today we released our major projects, and we will release everything by the end of the year
• We are committed to building vibrant communities around these projects and will drive towards a true community-driven process
Cask Newly Launched Projects
• CDAP (cdap.io): Virtualization for Hadoop Data and Apps
• Tigon (tigon.io): Real-time streaming for the real world
• Coopr (coopr.io): Clusters with a click
Cask Launch Day Thursday, September 25, 2014
• Website launch at cask.co with lots of documentation and technical content
• Project sites launch at cdap.io / coopr.io / tigon.io / tephra.io
• Reactor released as CDAP v2.5 under ASL2 on GitHub
• Loom released as Coopr v0.9.8 under ASL2 on GitHub
• Tigon dev release as Tigon v0.1.0 (Cask + AT&T) under ASL2 on GitHub
• Hosting HBase Meetup @ Cask HQ w/ sessions on CDAP, Coopr, Tigon
Cask Data App Platform
cdap.io
Why virtualize?
• Runtime languages: simpler programming, portability
• Virtual machines and containers: more efficient resource utilization, reuse
• Software-defined networks: adaptability, new applications
Bringing the concepts of virtualization to Hadoop and HBase data and applications augments existing Hadoop-ecosystem open source technologies to enable more use cases to be built by more developers in less time.
• Broader use cases
• Faster development
• Accelerated disruption of proprietary incumbents
Cask Data Application Platform
Application innovation
• Enable a new class of applications to drive greater business value, including those requiring real-time and batch processing
Simplified development
• Simplify big data app development: more apps, faster, with less dependence on Hadoop expertise
Production-ready applications
• Avoid compromising operational transparency and control: security, logging, metrics, lineage, and more
Data Virtualization: logical representations of data
App Virtualization: standardized containers for apps
App Virtualization
What is App Virtualization?
• Applications deployed in CDAP containers with runtime services
Features
• Framework-level guarantees, so applications aren't required to be idempotent
• Support for the development life cycle and operational deployment
• Portable from laptop to cluster
• Logging, metrics, and security with no developer overhead
• Standardization of containers across programming paradigms
• Take advantage of Spark, Cascading, Hive, etc. through programmatic APIs without worrying about system implementation
Benefits
• Developers can build a broader range of apps, focusing on business logic rather than core system services
• Speed up the path from coding to testing to operational deployment
• Take advantage of new technology with less need for training and expertise
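One way a framework can relieve applications of idempotence is to stage each attempt's writes and apply them only when the attempt succeeds, so a retry after a crash never double-applies a side effect. The sketch below illustrates that idea under our own assumptions; it is not CDAP's implementation, and `BufferedStore`, `run_with_retries`, and `count_event` are hypothetical names, with an in-memory dict standing in for real storage.

```python
# Hypothetical sketch (not CDAP's API): a "container" that lets a non-idempotent
# handler be retried safely by staging its writes and applying them only on success.
from typing import Callable, Dict


class BufferedStore:
    """In-memory key-value store whose writes are staged until commit()."""

    def __init__(self) -> None:
        self.committed: Dict[str, int] = {}
        self.staged: Dict[str, int] = {}

    def get(self, key: str) -> int:
        # Staged values shadow committed ones, so a handler reads its own writes.
        if key in self.staged:
            return self.staged[key]
        return self.committed.get(key, 0)

    def put(self, key: str, value: int) -> None:
        self.staged[key] = value

    def commit(self) -> None:
        self.committed.update(self.staged)
        self.staged.clear()

    def rollback(self) -> None:
        self.staged.clear()


def run_with_retries(handler: Callable[[BufferedStore, str], None],
                     store: BufferedStore, event: str, max_attempts: int = 3) -> bool:
    """Run handler on one event; discard staged writes after each failed attempt."""
    for _ in range(max_attempts):
        try:
            handler(store, event)
        except RuntimeError:
            store.rollback()  # the failed attempt leaves no trace
            continue
        store.commit()
        return True
    return False


# Demo: a counter increment (not idempotent) that crashes on its first attempt.
attempts = {"n": 0}


def count_event(store: BufferedStore, event: str) -> None:
    store.put(event, store.get(event) + 1)
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("simulated crash after the write")


store = BufferedStore()
ok = run_with_retries(count_event, store, "click")
print(ok, store.committed)  # True {'click': 1}
```

The handler ran twice but the counter moved once. Catching only `RuntimeError` is a simplification; a real runtime would also need the commit to be atomic with acknowledging the input event.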
Data Virtualization
What is Data Virtualization?
• Logical representations of underlying data in CDAP datasets
Features
• Streams for data ingestion
  • Support for Kafka, Flume, REST, and user-defined protocols
  • Time-stamped and ordered
  • Horizontally scalable
• Logical representations in commonly used access patterns
  • Time series, key-value, objects, geospatial index, OLAP cube, and more
• Data available to multiple applications
  • MapReduce, Hive, Spark, Flows, and more
  • REST APIs
  • Unified batch and real-time processing
Benefits
• Simplify data ingestion and Extract, Transform, Load (ETL) to accelerate time to value
• Maximize the value of data by making it easy to find and explore through multiple query methods
• Protect the data through security, audit, lineage, and reporting
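To make "logical representations in commonly used access patterns" concrete, here is a minimal sketch of a time-series view layered on a sorted key-value table, the general row-key technique used on HBase-style stores. It is an illustration under stated assumptions, not CDAP's dataset API: `SortedKVTable` and `TimeSeriesDataset` are hypothetical, and the row-key layout (metric name, a zero byte, then an 8-byte big-endian timestamp) is our own choice.

```python
# Hypothetical sketch: a time-series "dataset" over a sorted key-value table.
import bisect
import struct
from typing import Dict, List, Tuple


class SortedKVTable:
    """Stand-in for an HBase-like table: byte keys kept in sorted order."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._values: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start: bytes, stop: bytes) -> List[Tuple[bytes, bytes]]:
        """Rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


class TimeSeriesDataset:
    """Logical view (metric, timestamp) -> value over a SortedKVTable."""

    def __init__(self, table: SortedKVTable) -> None:
        self._table = table

    @staticmethod
    def _row_key(metric: str, ts: int) -> bytes:
        # Big-endian timestamps sort bytewise in time order within one metric.
        # Assumes metric names never contain a zero byte.
        return metric.encode() + b"\x00" + struct.pack(">Q", ts)

    def write(self, metric: str, ts: int, value: float) -> None:
        self._table.put(self._row_key(metric, ts), struct.pack(">d", value))

    def read(self, metric: str, start_ts: int, end_ts: int) -> List[Tuple[int, float]]:
        """Points for metric with start_ts <= ts < end_ts, oldest first."""
        rows = self._table.scan(self._row_key(metric, start_ts),
                                self._row_key(metric, end_ts))
        prefix_len = len(metric.encode()) + 1
        return [(struct.unpack(">Q", k[prefix_len:])[0], struct.unpack(">d", v)[0])
                for k, v in rows]


series = TimeSeriesDataset(SortedKVTable())
series.write("cpu", 100, 0.5)
series.write("cpu", 300, 0.9)
series.write("cpu", 200, 0.7)   # out-of-order write still reads back in time order
series.write("mem", 150, 42.0)
points = series.read("cpu", 100, 300)
print(points)  # [(100, 0.5), (200, 0.7)]
```

The application only sees `write` and `read` by metric and time range; how rows are keyed and scanned stays behind the dataset, which is the point of a logical representation.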
Thank You!
Questions?
TIGON
Real-time Streaming for the Real World
Gokul Gunasekaran, Software Engineer, Cask Data; HBase Meetup, September 2014
Meet Tigon
• Open-source, distributed, real-time stream processing framework
• Exactly-once processing guarantees
• Provides both an imperative Java API and a SQL-like declarative language for building powerful apps
• Built on top of Hadoop YARN™ and Apache HBase™
• Leverages Twill, an Apache Incubator project, and Tephra, Cask's open source transaction engine
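Tephra provides transactions on HBase using optimistic concurrency control: work proceeds without locks, and conflicts are detected at commit time. The heavily simplified sketch below shows only that conflict check, in which a transaction fails to commit if another transaction that committed after it started wrote any of the same keys. `Tx` and `TxManager` are hypothetical names rather than Tephra's API, and the unbounded commit history is a simplification.

```python
# Hypothetical, simplified sketch of optimistic concurrency control with
# write-conflict detection at commit time (in the style of Tephra, not its API).
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple


@dataclass
class Tx:
    start_seq: int                                   # commits visible at start
    write_set: Set[str] = field(default_factory=set)  # keys this tx wrote


class TxManager:
    def __init__(self) -> None:
        self._commit_seq = 0
        # (commit sequence number, keys written by the tx that committed then)
        self._history: List[Tuple[int, FrozenSet[str]]] = []

    def start(self) -> Tx:
        return Tx(start_seq=self._commit_seq)

    def commit(self, tx: Tx) -> bool:
        """Commit unless a tx that committed after tx started wrote the same keys."""
        for seq, keys in self._history:
            if seq > tx.start_seq and keys & tx.write_set:
                return False  # conflict: caller rolls back and may retry
        self._commit_seq += 1
        self._history.append((self._commit_seq, frozenset(tx.write_set)))
        return True


mgr = TxManager()
t1, t2 = mgr.start(), mgr.start()   # two concurrent transactions
t1.write_set.add("row-a")
t2.write_set.add("row-a")
first = mgr.commit(t1)              # no earlier commits, so it succeeds
second = mgr.commit(t2)             # t1 committed after t2 started: conflict
t3 = mgr.start()                    # starts after t1's commit
t3.write_set.add("row-a")
third = mgr.commit(t3)
print(first, second, third)  # True False True
```

In a stream processor, the retried `t2` would re-read its input and recompute, which is how a framework can turn at-least-once execution into exactly-once effects.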
Tigon Stack
Evolution of Tigon
[Figure: Events flow through a graph of Flowlets, one of which is a TigonSQL Flowlet]
Tigon Architecture
• Standalone: Flowlets run as threads connected by in-memory queues
• Distributed: Flowlets run in YARN containers connected by queues stored in HBase tables
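The standalone mode described above (flowlets as threads, in-memory queues between them) can be sketched with the Python standard library. This is an illustration of the execution model only, assuming a simple linear flow; `run_flowlet`, `run_flow`, and the sentinel are hypothetical names, not Tigon's Java API, and error handling is omitted.

```python
# Hypothetical sketch of a standalone flow: one thread per flowlet,
# in-memory queues between consecutive flowlets.
import queue
import threading
from typing import Callable, List, Optional

_END = object()  # sentinel marking the end of the input stream


def run_flowlet(process: Callable[[object], Optional[object]],
                inbox: "queue.Queue[object]", outbox: "queue.Queue[object]") -> None:
    """Consume events from inbox and emit non-None results to outbox."""
    while True:
        event = inbox.get()
        if event is _END:
            outbox.put(_END)  # propagate end-of-stream downstream
            return
        result = process(event)
        if result is not None:  # returning None drops the event (a filter)
            outbox.put(result)


def run_flow(stages: List[Callable[[object], Optional[object]]],
             events: List[object]) -> List[object]:
    queues: List["queue.Queue[object]"] = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=run_flowlet, args=(stage, queues[i], queues[i + 1]))
               for i, stage in enumerate(stages)]
    for t in threads:
        t.start()
    for e in events:
        queues[0].put(e)
    queues[0].put(_END)
    outputs: List[object] = []
    while True:
        item = queues[-1].get()
        if item is _END:
            break
        outputs.append(item)
    for t in threads:
        t.join()
    return outputs


result = run_flow(
    [lambda line: int(line),                 # parse
     lambda n: n if n % 2 == 0 else None,    # keep even numbers
     lambda n: n * n],                       # square
    ["1", "2", "3", "4"])
print(result)  # [4, 16]
```

In distributed mode the same graph would run each flowlet in YARN containers, with the queues persisted in HBase tables instead of process memory.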
Tigon in Action
Sample case: real-time filter and join of data streams
[Figure: Flowlets connected to a TigonSQL Join & Filter stage; inputs Events(<id>, <name>) and Events(<id>, <age>), joined records <id, name, age>, output [name, count]]
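The sample case joins two streams on `<id>`, filters the joined `<id, name, age>` records, and produces `[name, count]`. The sketch below expresses that logic as a symmetric hash join in plain Python; it is not TigonSQL syntax. The slide does not state the filter predicate, so `min_age` is a hypothetical stand-in, and the stream labels `"names"` and `"ages"` are assumed.

```python
# Hypothetical sketch of the sample case: join two event streams on id,
# filter the joined records, and keep a running count per name.
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int, object]  # (stream label, id, payload)


def join_filter_count(events: Iterable[Event],
                      min_age: int) -> Tuple[List[Tuple[int, str, int]], Counter]:
    names: Dict[int, str] = {}  # ids seen on the name stream, awaiting an age
    ages: Dict[int, int] = {}   # ids seen on the age stream, awaiting a name
    joined: List[Tuple[int, str, int]] = []
    counts: Counter = Counter()
    for stream, event_id, payload in events:
        if stream == "names":
            names[event_id] = str(payload)
        else:
            ages[event_id] = int(payload)
        # Symmetric hash join: emit as soon as both halves for this id arrive.
        if event_id in names and event_id in ages:
            record = (event_id, names.pop(event_id), ages.pop(event_id))
            if record[2] >= min_age:       # filter step (hypothetical predicate)
                joined.append(record)
                counts[record[1]] += 1     # [name, count] step
    return joined, counts


events = [("names", 1, "ann"), ("ages", 2, 17), ("ages", 1, 30),
          ("names", 2, "bob"), ("names", 3, "ann"), ("ages", 3, 25)]
joined, counts = join_filter_count(events, min_age=18)
print(joined)        # [(1, 'ann', 30), (3, 'ann', 25)]
print(dict(counts))  # {'ann': 2}
```

Either stream may arrive first for a given id. Unmatched ids are held indefinitely here; a production stream join would bound that state with a time window.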
Help Tigon Grow
• Developer Preview Release available for download at www.tigon.io
• Tigon source available on GitHub (www.github.com/caskdata/tigon)
• Download, Develop, Launch, Fork, Contribute!
Coopr: Clusters with a click