Strata + Hadoop World 2012 Keynote: Beyond Batch - Doug Cutting


Upload: cloudera-inc

Post on 11-May-2015


DESCRIPTION

Hadoop started as an offline, batch-processing system. It made it practical to store and process much larger datasets than before. Subsequently, more interactive, online systems emerged, integrating with Hadoop. First among these was HBase, the key/value store. Now scalable interactive query engines are beginning to join the Hadoop ecosystem. Realtime is gradually becoming a viable peer to batch in big data.

TRANSCRIPT

Page 1: Strata + Hadoop World 2012 Keynote: Beyond Batch - Doug Cutting

Beyond Batch

Doug Cutting, October 2012

Page 2: Strata + Hadoop World 2012 Keynote: Beyond Batch - Doug Cutting

Hadoop Started As Batch

MapReduce
• Simple, powerful
• Kills a lot of birds
• Efficient, scalable
• Compute at storage
• Shared platform
• Used by Pig, Hive, etc.
• Incredibly useful!
• But not sufficient
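To make the batch model on this slide concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word count as the job. It is only an illustration of the programming model: it is not the Hadoop API, it does not distribute work or move compute to storage, and every function name in it is hypothetical.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the full list of values for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical user-supplied functions for a word-count job.
def word_count_mapper(line):
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    return sum(counts)


def run_word_count(lines):
    """Run the whole batch job over an in-memory list of lines."""
    pairs = map_phase(lines, word_count_mapper)
    return reduce_phase(shuffle(pairs), word_count_reducer)
```

The job only produces output once the entire input has been processed, which is the sense in which the model is batch rather than interactive.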

Page 3: Strata + Hadoop World 2012 Keynote: Beyond Batch - Doug Cutting

Big Data Is Not (Just) Batch

Its true themes are:
• Scalability
• Affordability
• Commodity hardware
• Open-source software
• Distributed & reliable
• Schema on read
• Data beats algorithms

Page 4: Strata + Hadoop World 2012 Keynote: Beyond Batch - Doug Cutting

HBase: First Non-Batch Component

Online key/value store
• Complement to batch
• Online put/get
• Batch load & analyze
• Best of both
• Popular combination
• A step towards the future…
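The combination of online put/get with batch analysis can be sketched with a toy in-memory store that keeps row keys sorted. This is an assumption-laden illustration, not the HBase client API: the class `ToyKeyValueStore`, its methods, and the helper `total_by_prefix` are all hypothetical, and nothing here is distributed or persistent.

```python
import bisect


class ToyKeyValueStore:
    """In-memory stand-in for an online key/value store with sorted row keys."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> value

    def put(self, key, value):
        """Online write: insert or overwrite a single row."""
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        """Online read: fetch a single row, or None if it is absent."""
        return self._rows.get(key)

    def scan(self, start=None, stop=None):
        """Batch-style read: iterate rows with start <= key < stop, in key order."""
        lo = 0 if start is None else bisect.bisect_left(self._keys, start)
        hi = len(self._keys) if stop is None else bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


def total_by_prefix(store, prefix):
    """Batch analysis: sum the values of every row whose key starts with prefix."""
    stop = prefix + "\uffff"  # upper bound just past all keys sharing the prefix
    return sum(value for _, value in store.scan(prefix, stop))
```

Single-row `put` and `get` calls serve the online path, while `scan` feeds the kind of bulk analysis that a batch job would run over the same data.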

Page 5: Strata + Hadoop World 2012 Keynote: Beyond Batch - Doug Cutting

Holy Grail Of Big Data

• Open source, commodity HW, etc.
• Linear scaling
• To scale, just buy more hardware
• On many axes
• Storage capacity
• Throughput & latency
• of batch & query
• Transactions, Joins, Indexes
• and batch!

Page 6: Strata + Hadoop World 2012 Keynote: Beyond Batch - Doug Cutting

Google Gives Us A Map

Google publication         Open source project
2004  GFS & MapReduce      2006  Hadoop          batch programs
2005  Sawzall              2008  Pig & Hive      batch queries
2006  BigTable             2008  HBase           online key/value
...   ...                  ...   ...             ...
2012  Spanner              ?     ?               holy grail? (transactions, etc.)

5 years – 26 authors!

Page 7: Strata + Hadoop World 2012 Keynote: Beyond Batch - Doug Cutting

Impala Is Latest Step

Google publication         Open source project
2004  GFS & MapReduce      2006  Hadoop          batch programs
2005  Sawzall              2008  Pig & Hive      batch queries
2006  BigTable             2008  HBase           online key/value
2010  Dremel/F1            2012  Impala          online queries
2012  Spanner              ?     ?               holy grail? (transactions, etc.)

Page 8: Strata + Hadoop World 2012 Keynote: Beyond Batch - Doug Cutting

@cutting #bigquestions