What Does ‘Big Data’ Mean
and Who Will Win?
Michael Stonebraker
The Meaning of Big Data - 3 V’s
• Big Volume
— With simple (SQL) analytics
— With complex (non-SQL) analytics
• Big Velocity
— Drink from the fire hose
• Big Variety
— Large number of diverse data sources to integrate
Big Volume - Little Analytics
• Well addressed by data warehouse crowd
• Who are pretty good at SQL analytics on
— Hundreds of nodes
— Petabytes of data
The Participants
• Row storage and row executor
— Microsoft Madison, DB2, Netezza, Oracle(!)
• Column store grafted onto a row executor (wannabes)
— Teradata/Aster Data, EMC/Greenplum
• Column store and column executor
— HP/Vertica, Sybase/IQ, ParAccel
Oracle Exadata is not:
— a column store
— a scalable shared-nothing architecture
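The difference between the two storage layouts can be sketched in a few lines of Python. This is only an illustration of the idea, assuming a hypothetical three-attribute trade table; it is not how any of the products named above are implemented, and it is not a benchmark.

```python
# Hypothetical trade records; the attribute names are made up for illustration.
records = [
    {"trade_id": 1, "symbol": "A", "price": 10.0},
    {"trade_id": 2, "symbol": "B", "price": 12.0},
    {"trade_id": 3, "symbol": "A", "price": 14.0},
]

# Row layout: all attributes of one record are stored together.
row_store = [(r["trade_id"], r["symbol"], r["price"]) for r in records]

# Column layout: all values of one attribute are stored together.
column_store = {
    "trade_id": [r["trade_id"] for r in records],
    "symbol": [r["symbol"] for r in records],
    "price": [r["price"] for r in records],
}


def avg_price_row_store(store):
    """Walks every record and unpacks every attribute just to reach the price."""
    total = 0.0
    for _trade_id, _symbol, price in store:
        total += price
    return total / len(store)


def avg_price_column_store(store):
    """Reads only the one column the query needs."""
    prices = store["price"]
    return sum(prices) / len(prices)


row_answer = avg_price_row_store(row_store)
column_answer = avg_price_column_store(column_store)
```

Both functions return the same average. The point is that an analytic query touching one attribute out of many reads far less data in the column layout.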
Performance
• Row stores -- x1
• Column stores -- x50
• Wannabes -- x5 (???)
Big Data - Big Analytics
• Complex math operations (machine learning, clustering, trend detection, …)
— In your market, the world of the “quants”
— Mostly specified as linear algebra on array data
• A dozen or so common ‘inner loops’
— Matrix multiply
— QR decomposition
— Singular value decomposition (SVD)
— Linear regression
Big Data - Big Analytics: An Example
• Consider closing price on all trading days for the last 5 years for two stocks A and B
• What is the covariance between the two time-series?
(1/N) * sum over i of [ (A_i - mean(A)) * (B_i - mean(B)) ]
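A minimal sketch of that formula in Python, using only the standard library. The function name and the price lists are made up for illustration.

```python
from statistics import fmean


def covariance(a, b):
    """Population covariance: (1/N) * sum((a_i - mean(a)) * (b_i - mean(b)))."""
    if len(a) != len(b) or not a:
        raise ValueError("series must be non-empty and of equal length")
    mean_a, mean_b = fmean(a), fmean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


# Hypothetical closing prices for stocks A and B on four trading days.
prices_a = [10.0, 11.0, 12.0, 13.0]
prices_b = [20.0, 22.0, 21.0, 25.0]
cov_ab = covariance(prices_a, prices_b)  # 1.75 for this toy data
```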
Now Make It Interesting …
• Do this for all pairs of 4000 stocks
— The data is the following 4000 x 1000 matrix
Stock t1 t2 t3 t4 t5 t6 t7 …. t1000
S1
S2
…
S4000
Hourly data? All securities?
Array Answer
• Ignoring the (1/N) and subtracting off the means …
Stock * Stock^T
• Now try it for companies headquartered in Charlotte!
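The array formulation can be sketched in plain Python on a toy matrix. This version keeps the mean subtraction and the (1/N) factor so the result is the actual covariance matrix. The headquarters list is a hypothetical attribute added to show the "Charlotte" step, where a data-management filter has to run before the array operation.

```python
def center_rows(matrix):
    """Subtract each row's mean from that row (one row per stock)."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([value - mean for value in row])
    return centered


def all_pairs_covariance(stock):
    """Return (1/N) * C * C^T, where C is the row-centered input matrix."""
    c = center_rows(stock)
    n = len(stock[0])
    return [
        [sum(x * y for x, y in zip(row_i, row_j)) / n for row_j in c]
        for row_i in c
    ]


# Toy stand-in for the 4000 x 1000 matrix: 3 stocks, 4 time points.
stock = [
    [10.0, 11.0, 12.0, 13.0],
    [20.0, 22.0, 21.0, 25.0],
    [5.0, 5.0, 5.0, 5.0],
]
cov = all_pairs_covariance(stock)

# Hypothetical headquarters attribute, one entry per stock.
headquarters = ["Charlotte", "New York", "Charlotte"]
charlotte_rows = [
    row for row, city in zip(stock, headquarters) if city == "Charlotte"
]
cov_charlotte = all_pairs_covariance(charlotte_rows)
```

The filter is a relational selection and the covariance is a matrix product, which is why the slides argue for a system that handles both in one place.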
Goal
• Good data management
• Integrated with complex analytics
— Specified as arrays, not tables
Solution Options
• SAS et al.
— Weak or non-existent data management
• SAS plus RDBMS
— No integration
• RDBMS plus user-defined functions
— Slowwwww (X10 to X100)
• Array DBMS
— Check out SciDB.org
Hadoop…..
• Simple analytics
— 100x slower than a parallel DBMS
• Complex analytics (Mahout or roll-your-own)
— 100x slower than ScaLAPACK
• Parallel programming
— Parallel grep (great)
— Everything else (awful)
• Hadoop lacks
— Stateful computations
— Point-to-point communication
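Parallel grep fits this model because each chunk can be scanned with no shared state and no communication between workers. The sketch below imitates that shape with Python's standard thread pool. It is not Hadoop code, and the function names and sample lines are made up.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def grep_chunk(pattern, lines):
    """Map step: scan one chunk on its own; needs nothing from other chunks."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def parallel_grep(pattern, chunks, workers=4):
    """Run the map step on every chunk, then concatenate the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda chunk: grep_chunk(pattern, chunk), chunks)
        return [line for matches in partial for line in matches]


# Hypothetical log lines, already split into chunks.
chunks = [
    ["buy IBM 100", "sell MSFT 50"],
    ["buy ORCL 10", "hold IBM"],
    ["sell IBM 25"],
]
matches = parallel_grep(r"\bIBM\b", chunks)
```

A computation that must carry state across records, or exchange intermediate values between workers, has no natural place in this structure.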
Big Velocity
• Trading volume on Wall Street going through the roof
• Breaking all their infrastructure
• And it will just get worse
Big Velocity
• Sensor tagging everything of value sends velocity through the roof
— E.g. car insurance
• Smart phones as a mobile platform send velocity through the roof
• State of multi-player internet games must be recorded, which sends velocity through the roof
Two Different Solutions
• Big pattern - little state (electronic trading)
— Find me a ‘strawberry’ followed within 100 msec by a ‘banana’
• Complex event processing (CEP) is focused on this problem
— Patterns in a firehose
P.S. I started StreamBase but I have no current relationship with the company
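The ‘strawberry’ followed within 100 msec by a ‘banana’ pattern can be sketched as a small stream matcher. This is an illustration in plain Python, not the API of StreamBase or any other CEP engine; the function name and the event tuples are assumptions.

```python
from collections import deque


def find_followed_by(events, first, second, window_ms):
    """Yield (t_first, t_second) when `second` arrives within window_ms of `first`.

    `events` is an iterable of (timestamp_ms, name) pairs in timestamp order.
    """
    pending = deque()  # timestamps of `first` events still inside the window
    for timestamp, name in events:
        # Drop `first` events that are now too old to match anything.
        while pending and timestamp - pending[0] > window_ms:
            pending.popleft()
        if name == first:
            pending.append(timestamp)
        elif name == second:
            for start in pending:
                yield (start, timestamp)


# Hypothetical event stream.
stream = [
    (0, "strawberry"),
    (40, "apple"),
    (90, "banana"),
    (300, "strawberry"),
    (450, "banana"),
]
matches = list(find_followed_by(stream, "strawberry", "banana", window_ms=100))
```

The only state kept is the short queue of recent timestamps, which is what "little state" means here: the work is in matching the pattern against a very fast stream.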
Two Different Solutions
• Big state - little pattern
— For every security, assemble my real-time global position
— And alert me if my exposure is greater than X
• Looks like high performance OLTP
— Want to update a database at very high speed
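A rough sketch of the big-state case: every trade updates a stored position, and the check against X is trivial. The class, the exposure definition (absolute signed notional per security), and the numbers are all assumptions made for illustration.

```python
from collections import defaultdict


class PositionBook:
    """Keeps a running position per security and flags exposure above a limit."""

    def __init__(self, exposure_limit):
        self.exposure_limit = exposure_limit
        # security -> signed notional (sum of quantity * price over all trades)
        self.positions = defaultdict(float)

    def apply_trade(self, security, quantity, price):
        """Update state for one trade; return an alert string if over the limit."""
        self.positions[security] += quantity * price
        exposure = abs(self.positions[security])
        if exposure > self.exposure_limit:
            return (
                f"ALERT {security}: exposure {exposure:.2f} "
                f"exceeds {self.exposure_limit:.2f}"
            )
        return None


book = PositionBook(exposure_limit=1000.0)
alerts = [
    book.apply_trade("IBM", 5, 100.0),   # position 500, under the limit
    book.apply_trade("IBM", 6, 100.0),   # position 1100, over the limit
    book.apply_trade("IBM", -3, 100.0),  # position 800, back under
]
```

Each call is a small read-modify-write on shared state, which is why this workload looks like OLTP rather than pattern matching.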
My Suspicion
• You have 3-4 Big state - little pattern problems for every one Big pattern - little state problem
New OLTP
• You need to ingest a fire hose in real-time
• You need to perform high volume OLTP
• You often need real-time analytics
Solution Choices
• Old SQL
— The elephants
— Slowwww (X 50)
— Non-starter
• No SQL
— 75 or so vendors giving up both SQL and ACID
• New SQL
— Retain SQL and ACID but go fast with a new architecture
No SQL
• Give up SQL
— Interesting to note that Cassandra and Mongo are moving to (yup) SQL
• Give up ACID
— If you need ACID, this is a decision to tear your hair out by doing it in user code
— Can you guarantee you won’t need ACID tomorrow?
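To show what is being given up, here is a small sketch of an atomic transfer using Python's built-in `sqlite3` module. SQLite stands in for any ACID engine; the table, names, and amounts are made up. Without transactions, the rollback logic in the failing case would have to be written by hand in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice; rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the second call the credit to bob has already executed when the debit fails, and the engine undoes it. That undo is the part that is painful to reproduce in user code.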
VoltDB: an example of New SQL
• A main memory SQL engine
• Open source
• Shared nothing, Linux, TCP/IP on jelly beans
• Light-weight transactions
— Run-to-completion with no locking
• Single-threaded
— Multi-core by splitting main memory
• About 100x RDBMS on TPC-C
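The run-to-completion idea can be sketched as follows: data is split into partitions, each partition is owned by exactly one thread, and that thread executes transactions one at a time, so the data itself needs no locks. This is a structural illustration only, with made-up class names. It is not VoltDB's API or implementation, and Python threads will not show the multi-core speedup.

```python
import queue
import threading


class Partition(threading.Thread):
    """Owns one slice of the data and runs transactions strictly one at a time."""

    def __init__(self):
        super().__init__(daemon=True)
        self.data = {}
        self.inbox = queue.Queue()

    def run(self):
        while True:
            txn, reply = self.inbox.get()
            if txn is None:  # shutdown sentinel
                return
            # Run to completion before taking the next transaction.
            reply.put(txn(self.data))


class PartitionedStore:
    def __init__(self, n_partitions):
        self.partitions = [Partition() for _ in range(n_partitions)]
        for partition in self.partitions:
            partition.start()

    def execute(self, key, txn):
        """Route a single-partition transaction to the partition owning `key`."""
        owner = self.partitions[hash(key) % len(self.partitions)]
        reply = queue.Queue(maxsize=1)
        owner.inbox.put((txn, reply))
        return reply.get()

    def close(self):
        for partition in self.partitions:
            partition.inbox.put((None, None))
        for partition in self.partitions:
            partition.join()


def deposit(key, amount):
    """Build a transaction that adds `amount` to `key` and returns the new value."""
    def txn(data):
        data[key] = data.get(key, 0) + amount
        return data[key]
    return txn


store = PartitionedStore(n_partitions=4)
store.execute("acct-1", deposit("acct-1", 100))
balance = store.execute("acct-1", deposit("acct-1", 25))
store.close()
```

The sketch handles only transactions that touch a single partition and does no error handling; transactions spanning partitions need extra coordination that is not shown here.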
Big Variety
• Typical enterprise has 5000 operational systems
— Only a few get into the data warehouse
— What about the rest?
• And what about all the rest of your data?
— Spreadsheets
— Access databases
— Web pages
• And public data from the web?
The World of Data Integration
[Diagram labels: enterprise data warehouse; text; the rest of your data]
Summary
• The rest of your data (public and private)
— Is a treasure trove of incredibly valuable information
— Largely untapped
Data Tamer
• Integrate the rest of your data
• Has to
— Be scalable to 1000s of sites
— Deal with incomplete, conflicting, and incorrect data
— Be incremental
• Task is never done
Data Tamer in a Nutshell
• Apply machine learning and statistics to perform automatic:
— Discovery of structure
— Entity resolution
— Transformation
• With a human assist if necessary
— WYSIWYG tool (Wrangler)
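As a toy illustration of what entity resolution means, the sketch below groups names that refer to the same entity using string similarity from Python's standard `difflib`. The slides do not describe Data Tamer's actual algorithms, so the normalization, the greedy clustering, and the 0.85 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop commas and periods, and collapse whitespace."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve_entities(names, threshold=0.85):
    """Greedy clustering: join the first cluster whose first member is similar."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical names drawn from different source systems.
names = ["IBM Corp.", "ibm corp", "Acme Widgets"]
clusters = resolve_entities(names)
```

Pairs that score near the threshold are the natural place for the human assist mentioned above: the system proposes, and a person confirms or rejects.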
Data Tamer
• MIT research project
• Looking for more integration problems
— Wanna partner?
Takeaway
• One size does not fit all
• Plan on (say) 6 DBMS architectures
— Use the right tool for the job
• Elephants are not competitive
— At anything
— Have a bad ‘innovator’s dilemma’ problem