How Big is Big Data? And NoSQL Databases
2013.11.21- SLIDE 1
IS 257 – Fall 2013
How Big is Big Data? And NoSQL Databases
University of California, Berkeley
School of Information
IS 257: Database Management
2013.11.21- SLIDE 2
Announcement
• Change to presentations
– Presentation is now OPTIONAL for extra credit
– Now will be during finals week
  • Monday Dec. 16th from 1-4
– Final report also due then (or before)
2013.11.21- SLIDE 3
Lecture Outline
• Review
– Big Data (introduction)
• More on Big Data and what it means
• RDBMS vs NoSQL databases
2013.11.21- SLIDE 4
Big Data and Databases
• “640K ought to be enough for anybody.”
– Attributed to Bill Gates, 1981
2013.11.21- SLIDE 5
The Grid: On-Demand Access to Electricity
[Figure: chart with axis labels "Time" and "Quality, economies of scale"]
Source: Ian Foster
2013.11.21- SLIDE 6
Big Data and Databases
• We have already mentioned some Big Data
– The Walmart Data Warehouse
– Information collected by Amazon on users and sales and used to make recommendations
• Most modern web-based companies capture EVERYTHING that their customers do
– Does that go into a Warehouse or someplace else?
2013.11.21- SLIDE 7
Why the Grid? (1) Revolution in Science
• Pre-Internet
– Theorize &/or experiment, alone or in small teams; publish paper
• Post-Internet
– Construct and mine large databases of observational or simulation data
– Develop simulations & analyses
– Access specialized devices remotely
– Exchange information within distributed multidisciplinary teams
Source: Ian Foster
2013.11.21- SLIDE 8
Why the Grid? (2) Revolution in Business
• Pre-Internet
– Central data processing facility
• Post-Internet
– Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B)
– Business processes increasingly computing- & data-rich
– Outsourcing becomes feasible => service providers of various sorts
Source: Ian Foster
2013.11.21- SLIDE 9
How Big is Big Data
• How big is big?

Unit           Bytes
1 kilobyte     1,000
1 megabyte     1,000,000
1 gigabyte     1,000,000,000
1 terabyte     1,000,000,000,000
1 petabyte     1,000,000,000,000,000
1 exabyte      1,000,000,000,000,000,000
1 zettabyte    1,000,000,000,000,000,000,000
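
The unit list above steps up by a factor of 1,000 each time. A minimal sketch of that arithmetic in Python follows, assuming decimal (SI) units as on the slide rather than binary units of 1,024. The names `UNITS`, `unit_size`, and `humanize` are hypothetical helpers introduced only for this illustration.

```python
# Decimal (SI) byte units: each step is a factor of 1,000 (an assumption
# matching the slide; binary units would use 1,024 instead).
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes",
         "terabytes", "petabytes", "exabytes", "zettabytes"]


def unit_size(name: str) -> int:
    """Return the number of bytes in one unit, e.g. 'petabytes' -> 10**15."""
    return 1000 ** UNITS.index(name)


def humanize(n_bytes: int) -> str:
    """Express a byte count in the largest listed unit that keeps the value >= 1."""
    value = float(n_bytes)
    index = 0
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:.1f} {UNITS[index]}"


if __name__ == "__main__":
    print(unit_size("zettabytes"))
    print(humanize(2_500_000_000_000))
```

For example, `humanize(2_500_000_000_000)` reports the value in terabytes, because 2.5 × 10^12 bytes falls between one terabyte and one petabyte.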
2013.11.21- SLIDE 10
What is Big Data?
• Ran across some interesting slides from a decade ago that already framed the problem and did a fair job of predicting where we are today
– Slides by Jim Gray and Tony Hey: “In Search of Petabyte Databases” ca. 2001
2013.11.21- SLIDE 11
Summary from Gray & Hey
• DBs own the sweet-spot:
– 1GB to 100TB
• Big data is not in databases
• HPTS crowd is not really high performance storage (BIG DATA)
• Cost of storage is people:
– Performance goal: 1 Admin per PB
From Jim Gray and Tony Hey: “In Search of Petabyte Databases” ca. 2001
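
The two figures in this summary (a database sweet spot of 1GB to 100TB, and a goal of one administrator per petabyte) can be turned into a rough back-of-the-envelope check. The sketch below is only an illustration: it assumes decimal units, treats the sweet spot as an inclusive range, and rounds the staffing estimate up with a minimum of one person. None of those choices come from the Gray and Hey slides, and the function names are hypothetical.

```python
GB = 1000 ** 3
TB = 1000 ** 4
PB = 1000 ** 5

# "DBs own the sweet-spot: 1GB to 100TB" (treated here as an inclusive range).
SWEET_SPOT = (1 * GB, 100 * TB)


def in_db_sweet_spot(n_bytes: int) -> bool:
    """True if the data size falls inside the quoted database sweet spot."""
    low, high = SWEET_SPOT
    return low <= n_bytes <= high


def admins_needed(n_bytes: int) -> int:
    """Staffing at the stated goal of 1 admin per PB.

    Rounding up and the minimum of one admin are assumptions of this sketch.
    """
    return max(1, -(-n_bytes // PB))  # ceiling division without floats


if __name__ == "__main__":
    for size in (50 * TB, 3 * PB):
        print(size, in_db_sweet_spot(size), admins_needed(size))
```

Under these assumptions a 50TB data set sits inside the sweet spot and needs one administrator, while a 3PB store falls outside it and needs three.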
2013.11.21- SLIDE 12
Why People?
One row of one of Google’s data centers
2013.11.21- SLIDE 13
What counts as Big Data
• More OPS (other people’s slides)
– Taming the Big Data Fire Hose
  • by John Hugg, VoltDB
2013.11.21- SLIDE 14
RDBMS vs NoSQL
• From a course at Dalhousie Univ. in Canada (including slides from Keith W. Hare “A Comparison of SQL and NoSQL Databases”)
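
The referenced comparison slides are not reproduced in this transcript. As a general illustration of the RDBMS vs NoSQL distinction, and not a summary of that material, the sketch below stores the same records two ways: in a relational table with a declared schema (using Python's standard `sqlite3` module) and in a key-value store where each value is an opaque JSON document. The plain `dict` is a stand-in for a real key-value database, and the `put`/`get` helpers and the sample records are hypothetical.

```python
import json
import sqlite3

# Relational side: the schema is declared up front and data is queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "New York")],
)
sql_rows = conn.execute(
    "SELECT name FROM users WHERE city = ? ORDER BY name", ("London",)
).fetchall()

# Key-value side: a dict stands in for the store (an assumption of this sketch).
# Values are JSON documents, so records need not share the same fields.
kv_store = {}


def put(key, doc):
    """Store a document under a key as serialized JSON."""
    kv_store[key] = json.dumps(doc)


def get(key):
    """Fetch and deserialize the document stored under a key."""
    return json.loads(kv_store[key])


put("user:1", {"name": "Ada", "city": "London"})
put("user:2", {"name": "Grace", "languages": ["COBOL"]})  # no schema change needed

# Lookup by key is direct; filtering on a field means scanning every value.
kv_matches = sorted(
    doc["name"]
    for doc in map(json.loads, kv_store.values())
    if doc.get("city") == "London"
)

if __name__ == "__main__":
    print(sql_rows)
    print(get("user:2"))
    print(kv_matches)
```

The trade-off visible even at this scale is that the relational side answers ad hoc questions through the query language, while the key-value side accepts differently shaped records without a schema change but leaves field-based filtering to the application.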
2013.11.21- SLIDE 15
You can buy Big Data…
• Oracle will be happy to sell you systems (hardware and software) to manage your exabytes…
Oracle Exadata Database Machine X3-8
2013.11.21- SLIDE 16
And NoSQL too…
• Oracle Big Data Appliance
• With Oracle NoSQL Database (BerkeleyDB)