big data is a big scam most of the time! (mysql connect keynote 2012)

16
MySQL Connect Conference Keynote Address September 30, 2012 v1.2 Big Data is a Big Scam (Most of the Time) Daniel Austin, PayPal Technical Staff

Upload: daniel-austin

Post on 08-May-2015

752 views

Category:

Technology


1 download

DESCRIPTION

Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

TRANSCRIPT

Page 1: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

MySQL Connect Conference Keynote AddressSeptember 30, 2012 v1.2

Big Data is a Big Scam (Most of the Time)

Daniel Austin, PayPal Technical Staff

Page 2: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Confidential and Proprietary 2Global In-memory MySQL

Big Myths About Big Data

Preview - YESQL: A Counterexample

Today’s Agenda

Page 3: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Confidential and Proprietary

THE FUNDAMENTAL PROBLEM IN DISTRIBUTED DATA SYSTEMS

“How Do We Manage Reliable Distribution of Data Across Geographical Distances?”

Page 4: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Confidential and Proprietary

Big Data Myth #1: Big Data = NoSQL

• ‘Big Data’ Refers to a Common Set of Problems– Large Volumes– High Rates of Change

• Of Data• Of Data Models• Of Data Presentation and Output

– Often Require ‘Fast Data’ as well as ‘Big’• Near-real Time Analytics• Mapping Complex Structures

Takeaway: Big Data is the problem, NoSQL is one (proposed) solution

Page 5: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Confidential and Proprietary

D oYou Need A Big Data System?

Well, Maybe….But Before You Go There…There are essentially two ‘Big Data Problems’:“I have too much data and it’s coming in too fast to handle with any RDBMS.”

“I have a lot of data distributed geographically and need to be able to read and write from anywhere in near real-time.”

Takeaway: if you have one of these Big Data problems, a NoSQL solution might work for you.But there are also other alternatives…

Page 6: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Confidential and Proprietary

The NoSQL Solution

• NoSQL Systems provide a solution that relaxes many of the common constraints of typical RDBMS systems– Slow - RDBMS has not scaled with CPUs– Often require complex data management (SOX,

SOR)– Costly to build and maintain, slow to change and

adapt– Intolerant of CAP models (more on this later)

• Non-relational models, usually key-value• May be batched or streaming• Not necessarily distributed geographically

Page 7: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Confidential and Proprietary

Big Data Myth #2: The CAP Theorem Doesn’t Say What You Think It Does

• Consistency, Availability, (Network) Partition• The Real Story: These are not Independent

Variables• AP =CP (Um, what? But…A != C )• Variations:

– PACELC (adds latency tolerance)

Takeaway: the real story here is about the tradeoffs made by designers of different systems, and the main tradeoff is between consistency and availability, usually in favor of the latter.

Page 8: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Confidential and Proprietary

Big Data Hype Cycle: Where Are We Now?

There are currently more than 120+ NoSQL databases listed at nosql-databases.com!

You Are Here ?

As the pace of new technology solutions has slowed, some clear winners have emerged.

Page 9: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Confidential and Proprietary

BIG DATA MYTH #3: BIG DATA AND NOSQL ARE NEW IDEAS

• The first and most successful such system is DNS, created in 1983.

• Began with flat files• Currently serves the entire

Internet (!)• DNS is an AP system, availability

is #1• Many extensions complicate a

simple design• Suggests a new term for CAP-

like ideas: variability• DNS variability is very high,

often 2-3x the mean

Page 10: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Confidential and Proprietary 10Global In-memory MySQL

Big Myths About Big Data

Preview : YESQL: A Counterexample

Q&A

Today’s Agenda

Page 11: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Confidential and Proprietary

“Develop a globally distributed DB For user-related data.”• Must Not Fail (99.999%)• Must Not Lose Data. Period.• Must Support Transactions• Must Support (some) SQL• Must WriteRead 32-bit integer globally in

1000ms• Maximum Data Volume: 100 TB• Must Scale Linearly with Costs

Mission YESQL

Page 12: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Confidential and Proprietary

What about “High Performance”?

•Maximum lightspeed distance on Earth’s Surface: ~67 ms

•Target: data available worldwide in < 1000 ms

Sound Easy?

Think Again!

Page 13: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Confidential and Proprietary

Architecture Stack

A B A B A B

A B A B

5 AWS Data Centers:US-E, US-W, TK, EU, AS

A BA B

Scale by Tiling

Page 14: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Confidential and Proprietary

In The Full Session….

• More Big Data Myths• YeSQL Architecture• Failover• Conservation of Timestamps!• Join me today at 103o AM for the details!

Page 15: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Confidential and Proprietary

Summing Up: The Big Picture on Big Data

• Only use Big Data solutions when you have a real Big Data problem.– Don’t be a Dedicated Follower of Tech Fashion!

• Not all Big Data solutions are created equal– What tradeoffs are most important to you?– Consistency, Fault Tolerance, Availability,

Performance, Variability • Is your data model a fit for NoSQL?

– You don’t have to give up the relational model in most cases, so don’t!

• You can achieve high performance and availability without giving up relational models and read consistency! Just say YESQL!

Page 16: Big Data is a Big Scam Most of the Time! (MySQL Connect Keynote 2012)

Twitter: @daniel_b_austinEmai: [email protected]

“In the long run, we are all dead eventually consistent.”Maynard Keynes on NoSQL Databases

With apologies and thanks to the real DB experts, Andrew Goodman, Yves Trudeau, Clement Frazer, Daniel Abadi, Kent Beck, and everyone else who contributed. It really works!