Hadoop meets Cloud with Multi-Tenancy

Treasure Data: Hadoop meets Cloud with Multi-Tenancy. Kazuki Ohta, Founder and CTO at Treasure Data, Inc. Hadoop User Group Japan (Hadoopユーザー会) [email protected] @kzk_mover Friday, April 5, 13

Upload: treasure-data-inc

Post on 10-May-2015


DESCRIPTION

CTO Kaz's talk at Hadoop Conference Japan 2013 Winter.

TRANSCRIPT

Page 1: Hadoop meets Cloud with Multi-Tenancy

Treasure Data
Hadoop meets Cloud with Multi-Tenancy

Kazuki Ohta
Founder and CTO at Treasure Data, Inc.

Hadoop User Group Japan (Hadoopユーザー会) [email protected]

@kzk_mover

Friday, April 5, 13

Page 2: Hadoop meets Cloud with Multi-Tenancy

Who are you?

Kazuki Ohta (太田一樹)
• @kzk_mover, [email protected]

Treasure Data, Inc.
• Chief Technology Officer, Founded July 2011

Hadoop User Group Japan
• One of the Founders
• “Hadoop徹底入門” (“A Thorough Introduction to Hadoop”)

Open-Source Enthusiast
• Hadoop, memcached, jemalloc, MongoDB, uim, etc.

Page 3: Hadoop meets Cloud with Multi-Tenancy

[Figure: database market map. © 2012 Forrester Research, Inc. Reproduction Prohibited. Axes: Data Volume (threshold: 1 billion entries or 10 TB) and On-Premise vs. Cloud. Segments: Lightweight RDBMS, Enterprise RDBMS (DB2), Traditional Data Warehouse, Database-as-a-Service, Big Data-as-a-Service. Market-size labels: $10B market, $34B market.]

Treasure Data = Cloud + Big Data

Page 4: Hadoop meets Cloud with Multi-Tenancy

What is the Problem?

Page 5: Hadoop meets Cloud with Multi-Tenancy

Big Data? NoSQL?


Page 6: Hadoop meets Cloud with Multi-Tenancy

Too Many Solutions

Page 7: Hadoop meets Cloud with Multi-Tenancy

Hadoop Versions

Too Many Variations (+ Ecosystem)

from http://marblejenka.blogspot.jp/2013/01/hadoop.html

Page 8: Hadoop meets Cloud with Multi-Tenancy

Current Big Data Solutions: ‘Feature Creep’

http://en.wikipedia.org/wiki/Feature_creep

Page 9: Hadoop meets Cloud with Multi-Tenancy

We need Machete :)

Machete Design by James Lindenbaum, Heroku Co-Founder
http://www.youtube.com/watch?v=3BhDLm9jo5Y

EVERYTHING with ONE interface

Simple & Discoverable

Page 10: Hadoop meets Cloud with Multi-Tenancy

‘Simplicity’ itself is a feature :)

by Anand Babu Periasamy, GlusterFS Co-Founder

Page 11: Hadoop meets Cloud with Multi-Tenancy

Next Topic: Cloud?


Page 12: Hadoop meets Cloud with Multi-Tenancy

http://www.saasblogs.com/saas/demystifying-the-cloud-where-do-saas-paas-and-other-acronyms-fit-in/

Page 13: Hadoop meets Cloud with Multi-Tenancy

Battle Field of IaaS Vendors: SCM

[Chart: HW Performance / Price over Time. Labels: On-Premise; Decrease with Moore’s Law; IaaS Vendors; Battle Field: Supply Chain Management.]

In the near future, most HW buyers will be cloud vendors, not individual companies.

Page 14: Hadoop meets Cloud with Multi-Tenancy

PaaS, SaaS: IT is all about Operation

With PaaS, you offload your development operations function and have the PaaS provider handle the tools and components required to deploy and manage applications reliably. - EngineYard

More Sleep, More Value

Page 15: Hadoop meets Cloud with Multi-Tenancy

PaaS/SaaS Battle Field: ‘Time’ is Money

[Chart: Customer Value over Time, starting at Sign-up or PO. Labels: Ideal Expectation; Reality (On-Premise); HW/SW Selection, PoC, Deploy ... Upgrade; Obsolete over time.]

Page 16: Hadoop meets Cloud with Multi-Tenancy

Introduction to Treasure Data

Page 17: Hadoop meets Cloud with Multi-Tenancy

Company Overview

US team as of July 2012

Page 18: Hadoop meets Cloud with Multi-Tenancy

Company Overview

Silicon Valley-based Company
• All Founders are Japanese
• Hironobu Yoshikawa
• Kazuki Ohta
• Sadayuki Furuhashi

OSS Enthusiasts
• MessagePack, Fluentd, etc.
• Cloud native

Page 19: Hadoop meets Cloud with Multi-Tenancy


Our 50+ Customers – Fortune Global 500 leaders and start-ups including:

250 billion records / month in Feb 2013

2 million jobs executed


Page 20: Hadoop meets Cloud with Multi-Tenancy

Vision: Single Analytics Platform for the World

Page 21: Hadoop meets Cloud with Multi-Tenancy

Investors
• Bill Tai
• Naren Gupta - Nexus Ventures, Director of Red Hat, TIBCO
• Othman Laraki - Former VP Growth at Twitter
• James Lindenbaum, Adam Wiggins, Orion Henry - Heroku Founders
• Anand Babu Periasamy, Hitesh Chellani - Gluster Founders
• Yukihiro “Matz” Matsumoto - Creator of Ruby
• Dan Scheinman - Director of Arista Networks
• + 10 more people
• and...

Jerry Yang, Founder of Yahoo!, where Hadoop was invented :)

Check out today’s (2013/01/21) morning edition of the Nikkei newspaper (日経新聞)!

Page 22: Hadoop meets Cloud with Multi-Tenancy

Treasure Data’s Philosophy and Architecture

Page 23: Hadoop meets Cloud with Multi-Tenancy

Big Data Adoption Stages (in order of increasing intelligence sophistication)

• Standard Reports: What happened?
• Ad-hoc Reports: Where?
• Drill Down Query: Where exactly?
• Alerts: Error?
• Statistical Analysis: Why?
• Predictive Analysis: What’s a trend?
• Optimization: What’s the best?

The chart groups the stages into Reporting and Analytics. Treasure Data’s FOCUS: Reporting (80% of needs).

Page 24: Hadoop meets Cloud with Multi-Tenancy


Full Stack Support for Big Data Reporting

Our best-in-class architecture and operations team ensure the integrity and availability of your data.

Data from almost any source can be securely and reliably uploaded using td-agent in streaming or batch mode.

Our SQL, REST, JDBC, ODBC and command-line interfaces support all major query tools and approaches.

You can store gigabytes to petabytes of data efficiently and securely in our cloud-based columnar datastore.


Page 25: Hadoop meets Cloud with Multi-Tenancy

Treasure Data = Collect + Store + Query

Page 26: Hadoop meets Cloud with Multi-Tenancy

Example in AdTech: MobFox

1. Europe’s largest independent mobile ad exchange.

2. 20 billion impressions/month (circa Jan. 2013)

3. Serving ads for 15,000+ mobile apps (circa Jan. 2013)

4. Needed Big Data Analytics infrastructure ASAP.

Page 27: Hadoop meets Cloud with Multi-Tenancy

Two Weeks From Start to Finish!

Page 28: Hadoop meets Cloud with Multi-Tenancy

Our Value was Proven :)

[Chart: Customer Value over Time, starting at Sign-up or PO. Labels: Our Value: Save Time!; Simple Interface; Reality (On-Premise); HW/SW Selection, PoC, Deploy ... Upgrade; Obsolete over time.]

Page 29: Hadoop meets Cloud with Multi-Tenancy

Architecture Breakdown

Data Collection
• Increasing variety of data sources
• No single data schema
• Lack of streaming data collection method
• 60% of Big Data project resource consumed

Data Store/Analytics
• Remaining complexity in both traditional DWH and Hadoop (very slow time to market)
• Challenges in scaling data volume and expanding cost.

Connectivity
• Required to ensure connectivity with existing BI/visualization/apps by JDBC, REST and ODBC.

Page 30: Hadoop meets Cloud with Multi-Tenancy

1) Data Collection

60% of BI project resource is consumed here
Most ‘underestimated’ and ‘unsexy’ but MOST important
Fluentd: OSS lightweight but robust Log Collector
• http://fluentd.org/

15:40~ Log analysis system with Hadoop in livedoor 2013 by Satoshi Tagomori @ NHN Japan

16:30~ How to Collect Data into Hadoop (いかにしてHadoopにデータを集めるか) by Sadayuki Furuhashi @ Treasure Data, Inc.

These talks will cover Fluentd :)
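
To make the collection step a little more concrete, here is a minimal sketch of what a lightweight log collector does: accept schemaless events, buffer them, and upload in batches, keeping a batch for retry if the upload fails. This is not Fluentd's implementation or API. The class name `BufferedCollector`, the `upload` callback, and the tags are all hypothetical, chosen only for illustration.

```python
import json
import time
from typing import Callable, Dict, List


class BufferedCollector:
    """Toy log collector: buffers schemaless events and uploads them in batches."""

    def __init__(self, upload: Callable[[List[str]], None], batch_size: int = 100):
        self.upload = upload          # hypothetical uploader, e.g. an HTTP POST
        self.batch_size = batch_size
        self.buffer: List[str] = []

    def emit(self, tag: str, record: Dict) -> None:
        # No fixed schema: any JSON-serializable record is accepted.
        event = {"tag": tag, "time": int(time.time()), "record": record}
        self.buffer.append(json.dumps(event))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        batch, self.buffer = self.buffer, []
        try:
            self.upload(batch)
        except Exception:
            # Upload failed: put the batch back so a later flush can retry.
            self.buffer = batch + self.buffer


uploaded: List[List[str]] = []
collector = BufferedCollector(uploaded.append, batch_size=2)
collector.emit("web.access", {"path": "/", "status": 200})
collector.emit("web.access", {"path": "/buy", "status": 302})   # triggers a flush
collector.emit("app.purchase", {"user": "u1", "amount": 980})
collector.flush()                                                # drain the rest
print([len(batch) for batch in uploaded])                        # prints [2, 1]
```

Batching keeps the per-event cost low, and re-queuing on failure is the simplest way to avoid dropping data when the destination is temporarily unreachable.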


Page 31: Hadoop meets Cloud with Multi-Tenancy


2) Data Store / Analytics - Columnar Storage
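
The slide only names columnar storage, so the following is a generic sketch of the idea rather than Treasure Data's actual format: row records are pivoted into one list per column, so an aggregate over one field reads only that column. The function `to_columns` and the sample records are hypothetical.

```python
from typing import Any, Dict, List


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot schemaless row records into one list per column."""
    columns: Dict[str, List[Any]] = {}
    for i, row in enumerate(rows):
        for key in row:
            if key not in columns:
                columns[key] = [None] * i   # back-fill rows seen before this key
        for key, values in columns.items():
            values.append(row.get(key))     # None when the row lacks the key
    return columns


rows = [
    {"user": "a", "path": "/", "ms": 120},
    {"user": "b", "path": "/buy", "ms": 340},
    {"user": "a", "path": "/", "ms": 95, "ref": "ad"},
]
columns = to_columns(rows)

# Equivalent of SELECT AVG(ms): only the "ms" column is scanned.
avg_ms = sum(columns["ms"]) / len(columns["ms"])
print(avg_ms)             # 185.0
print(columns["ref"])     # [None, None, 'ad']
```

Because reporting queries usually touch a few fields out of many, reading one column at a time avoids scanning the rest of each record, and columns of similar values also tend to compress well.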


Page 32: Hadoop meets Cloud with Multi-Tenancy

3) Connectivity

[Diagram. Components: Web App, BI apps, td-command; Query API (REST API; JDBC, ODBC Driver); Query Processing Cluster; Treasure Data Columnar Storage; MySQL, Postgres. Arrow labels: Query, Result.]

Page 33: Hadoop meets Cloud with Multi-Tenancy

Most Difficult Challenge: Multi-Tenancy

All customers share the Hadoop clusters (4 Data Centers)
Resource Sharing (Burst Cores), Rapid Improvement, Ease of Upgrade

[Diagram: a Global Scheduler and datacenters A, B, C and D, each with a Local FairScheduler. Labels: On-Demand Resource Allocation; Job Submission + Plan Change.]
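
The slide names the pieces (a global scheduler, per-datacenter fair schedulers, burst cores) but not the policy, so the sketch below is one plausible reading under stated assumptions, not Treasure Data's scheduler. It assumes each tenant's plan guarantees a number of cores, that idle cores are lent round-robin to tenants that want more ("burst"), and that the global scheduler routes a job to the datacenter with the most free cores. The names `allocate` and `pick_datacenter` are hypothetical.

```python
from typing import Dict


def allocate(capacity: int, guaranteed: Dict[str, int],
             demand: Dict[str, int]) -> Dict[str, int]:
    """Local fair sharing: guaranteed cores first, then lend idle cores.

    Assumes sum(guaranteed.values()) <= capacity.
    """
    # Step 1: every tenant gets what it asks for, up to its guaranteed share.
    alloc = {t: min(demand.get(t, 0), g) for t, g in guaranteed.items()}
    spare = capacity - sum(alloc.values())
    # Step 2: hand out spare ("burst") cores one at a time, round-robin,
    # to tenants whose demand is not yet met.
    while spare > 0:
        hungry = [t for t in sorted(alloc) if alloc[t] < demand.get(t, 0)]
        if not hungry:
            break
        for t in hungry:
            if spare == 0:
                break
            alloc[t] += 1
            spare -= 1
    return alloc


def pick_datacenter(free_cores: Dict[str, int]) -> str:
    """Global routing (assumed policy): send the job where most cores are free."""
    return max(free_cores, key=lambda dc: free_cores[dc])


plan = {"a": 4, "b": 4, "c": 2}          # guaranteed cores per tenant
wants = {"a": 9, "b": 1, "c": 2}         # cores each tenant could use right now
print(allocate(10, plan, wants))         # {'a': 7, 'b': 1, 'c': 2}
print(pick_datacenter({"A": 3, "B": 12, "C": 0, "D": 7}))   # B
```

In the example, tenant "b" uses only one of its four guaranteed cores, so tenant "a" bursts from four to seven; when "b" needs its cores again, the next allocation gives them back.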


Page 34: Hadoop meets Cloud with Multi-Tenancy

Conclusion

Big Data is too complex
• Needs Simplicity
• Machete vs. Swiss Army Knife (Feature Creep)

IT is changing
• The value of Software itself is decreasing
• Operation is the key

Treasure Data = Cloud + Big Data
• Currently Focusing on Big Data Reporting
• Instant Value with Simple Interface

Page 35: Hadoop meets Cloud with Multi-Tenancy

We’re hiring top talent. Please contact me :)

Page 36: Hadoop meets Cloud with Multi-Tenancy

Appendix

Page 37: Hadoop meets Cloud with Multi-Tenancy

Big Data Market Growth

Big Data Revenue Breakdown (average of IDC, Gartner and Wikibon stats)

CAGR 38%

“More than half a billion dollars in venture capital has been invested in new big data technology.”

- Dan Vesset, IDC

“In 2012…BI and Analytics are rated #1 priorities.” — Ravi Kalakota, Gartner

“Big Data is the new definitive source of competitive advantage across all industries.”

— Jeff Kelly, Wikibon


Page 38: Hadoop meets Cloud with Multi-Tenancy

Big Data Situation

[Chart: Customer Value over Time, starting at Sign-up or PO. Labels: Treasure Data; AWS (EMR, RedShift); On-premise solutions (Software A, Software B); Obsolescence over time.]

Page 39: Hadoop meets Cloud with Multi-Tenancy

Treasure Data Service Architecture

[Diagram. Data sources: Apache, App, App, RDBMS, other data sources. Storage: Treasure Data columnar data warehouse. Processing: Query Processing Cluster (MapReduce jobs). Access: Query API; Hive, Pig (to be supported); JDBC, REST. Clients: User, td-command, BI apps.]

Page 40: Hadoop meets Cloud with Multi-Tenancy

Our Own Open Source technologies

We are open source natives and proud of our heritage. We’ve contributed to Hibernate, Hadoop, Cassandra, Memcached, KDE, MongoDB among others. Our product reflects our deep commitment to the open-source community and is built on top of open source software we’ve authored and open sourced.
• Fluentd - a popular data collector daemon written in Ruby. www.fluentd.org (leading users: SlideShare/LinkedIn, One Kings Lane)
• MessagePack - a fast, compact serializer. www.msgpack.org (leading users: Pinterest, Redis)

Substantial commitment (Code, Packaging, Documentation, Sponsorship)

Tech marketing, Possible lead gen

Page 41: Hadoop meets Cloud with Multi-Tenancy

Example in Web Industry

Page 42: Hadoop meets Cloud with Multi-Tenancy


Example Use Case – MySQL to TD


Page 43: Hadoop meets Cloud with Multi-Tenancy


Example Use Case – MySQL to TD


Page 44: Hadoop meets Cloud with Multi-Tenancy

Big Data for the Rest of Us

www.treasure-data.com | @TreasureData
