Introduction to Big Data: An Analogy between Sugar Cane & Big Data
![Page 1: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/1.jpg)
Introduction to Big Data
An analogy between Sugar Cane & Big Data
Jean-Marc Desvaux – March 2012
Image Source: MicFarris.com
Image Source: alternative-energy-fuels.com
![Page 2: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/2.jpg)
Session Abstract:
What is Big Data? Where does it apply?
What are the technologies behind it?
Is it going to replace your RDBMS? …
![Page 3: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/3.jpg)
Big Data: it’s all Silicon Valley is talking about. It’s the new buzzword after ‘cloud.’
“Everybody is speaking of it and many are convinced it is the only way forward. As always, such dramatic statements are not only dangerous but serve to put some people off the concept.”
![Page 4: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/4.jpg)
Source: Tom Kyte’s “Big Data Are you ready?” presentation
![Page 5: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/5.jpg)
What is Big Data?
![Page 6: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/6.jpg)
Big Data is data that exceeds the processing capacity of conventional database systems.
It’s too big, moves too fast, or does not fit the structures of conventional database architectures.
To gain value from this type of data, you need an alternative way to process it.
Why is this happening?
Data is growing faster than computers are getting bigger.
![Page 7: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/7.jpg)
A catch-all term.
Includes social network data, Web logs, MP3s, unstructured Web page content, XML, GPS tracking data, vehicle telemetry, financial market data and many more…
Can be characterized by the 3 Vs:
Image Source: Tom Kyte’s “Big Data Are you ready?” presentation
![Page 8: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/8.jpg)
Volume: Data is growing faster than machines are getting bigger, and data sources keep adding up.
Velocity: Rate of acquisition and desired rate of consumption.
Variety: Extends beyond structured data to include unstructured data of all varieties.
Image Source: Tom Kyte’s “Big Data Are you ready?” presentation
![Page 9: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/9.jpg)
Where does Big Data apply?
![Page 10: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/10.jpg)
The value of Big Data to an organisation falls into two main categories:
Analytical Use
Enabling new products and services
![Page 11: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/11.jpg)
Analytical Use
It reveals insights that were previously hidden because the data was hard to record and exploit.
It gives an edge over classic analytics, which rely on sampling and on more “static”, predetermined reports.
It promotes an investigative approach to data and puts the data scientist and analyst in the spotlight.
Hal Varian, chief economist at Google: “I keep saying that the sexy job in the next 10 years will be statisticians”
![Page 12: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/12.jpg)
Some terms linked to the Analytical Use of Big Data
Sentiment Analysis: Mining the Web in real time to get a quick read of what people are thinking.
Named-entity recognition (NER), also known as entity identification and entity extraction, is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. (Example: does “Big B” in a tweet stand for Big Brother or Amitabh Bachchan?)
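The two terms above can be illustrated with a very small Python sketch. It is not how production systems work (they typically rely on trained statistical models); the word lists, the gazetteer entries and the category labels below are all hypothetical and hand-made for illustration only.

```python
import re

# Hypothetical, hand-made word lists; a real system would use a trained model.
POSITIVE = {"love", "great", "good", "happy"}
NEGATIVE = {"hate", "bad", "awful", "sad"}

# Hypothetical gazetteer: surface form -> candidate (entity, category) pairs.
GAZETTEER = {
    "big b": [("Big Brother", "TV_SHOW"), ("Amitabh Bachchan", "PERSON")],
    "google": [("Google", "ORGANIZATION")],
}


def sentiment_score(text):
    """Return (number of positive words) - (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def find_entities(text):
    """Return the gazetteer entries whose surface form occurs in the text."""
    lowered = text.lower()
    found = {}
    for surface, candidates in GAZETTEER.items():
        # Word boundaries stop "big b" from matching inside "big bang".
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found[surface] = candidates
    return found
```

For a tweet such as "I love Big B, great show", `find_entities` returns both candidates for "Big B": a dictionary lookup alone cannot decide between them, which is exactly the ambiguity NER has to resolve from context.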
![Page 13: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/13.jpg)
Product/Service Enabler
Some products and services cannot exist unless they are backed by Big Data technologies:
- Need to scale
- Need a fast feedback loop on complex analytics
Highly successful Web startups that pioneered Big Data technologies through R&D to enable new types of products are a good example: Google, Yahoo, Amazon, Facebook.
![Page 14: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/14.jpg)
Sectors with Fast Adoption and High Potential
Financial Sector
Telecommunications
Government
Health
Retail
![Page 15: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/15.jpg)
Big Data Sources: Internal & Data Marketplaces.
![Page 16: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/16.jpg)
Internal sources
Time Attendance logs
RFID sensor logs
Security logs
Vehicle GPS tracking
Machinery/Telemetry logs
Pictures & videos
Enterprise Social Networks
Service Forums/Discussions
….
Mostly anything unstructured or simply structured
![Page 17: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/17.jpg)
Source: DataSift.com
External Sources (feeders/data marketplaces)
Examples: Infochimps.com, DataSift.com, datamarket.azure.com
![Page 18: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/18.jpg)
An Enterprise Architecture for Big Data
An analogy with a Sugar Cane Factory
![Page 19: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/19.jpg)
ACQUIRE (HARVEST)
EXTRACT/SHRED
EVAPORATE/DISTILL/BOIL
DRY/STORE/SUGAR
A Sugar Factory
= VALUE / BOTTOM LINE
SUGAR CANE FIELDS
![Page 20: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/20.jpg)
An Enterprise Big Data Factory
ACQUIRE (HARVEST)
ORGANIZE (EXTRACT)
ANALYSE (SHRED/DISTILL/BOIL)
BUSINESS INTELLIGENCE (DECIDE)
= VALUE / BOTTOM LINE
DATA SOURCES (RDBMS & Data Marketplaces)
HDFS (Hadoop Distributed FS)
NoSQL Database (Hadoop Distributed FS)
RDBMS Enterprise Applications
Map Reduce (Hadoop)
Big Data Connectors
RDBMS Connectors
Data Warehousing / RDBMS stores
Analytic Applications: the sweet part (sugar/rum)
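To make the Map Reduce block of the factory more concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps, using the usual word-count example. It is plain Python and does not use Hadoop; the function names are illustrative, and a real Hadoop job would distribute each phase across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each call to `map_phase` looks at one line only and each call to `reduce_phase` looks at one key only, both phases can be spread over many machines, which is what lets the approach scale.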
![Page 21: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/21.jpg)
Some Factories & architectures from vendors
![Page 22: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/22.jpg)
Greenplum (EMC2)
An Example of a Turnkey Factory Solution
![Page 23: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/23.jpg)
Another “Turnkey Factory” Example from Oracle
Targeting high-end Analytics
ACQUIRE (HARVEST)
ORGANIZE (EXTRACT)
ANALYSE (SHRED/DISTILL/BOIL)
BUSINESS INTELLIGENCE (DECIDE)
Image Source: Tom Kyte’s “Big Data Are you ready?” presentation
![Page 24: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/24.jpg)
+ Of course, you can build your own factory using widely available open-source software, on which most turnkey factories are built.
The Microsoft way
![Page 25: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/25.jpg)
Technologies behind Big Data
![Page 26: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/26.jpg)
Factory blocks & screws used for engineering solutions
![Page 27: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/27.jpg)
Will NoSQL kill SQL?!
![Page 28: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/28.jpg)
Turning the RDBMS into a legacy data store?
Not at all.
We need the RDBMS to store high-value data and for its feature-rich approach (feature first).
NoSQL (scale first) is not a superset of RDBMS technologies (a bit like Einstein’s relativity and Newtonian physics).
Remember: NoSQL is not “No SQL” but “Not Only SQL”.
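A rough way to picture "feature first" versus "scale first" is sketched below in Python. The relational side uses the standard `sqlite3` module, where the database itself enforces a rule on the data. The `ShardedStore` class is a hypothetical toy, not a real NoSQL product: it offers only put/get, but its keys spread over several shards that could in principle live on different machines. The `account` table and its columns are invented for the example.

```python
import sqlite3
import zlib

# Feature first: a relational store enforces a schema and constraints.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance REAL NOT NULL CHECK (balance >= 0))"
)
db.execute("INSERT INTO account VALUES (1, 100.0)")


def try_overdraw(conn):
    """Return True if the database rejected a negative balance."""
    try:
        conn.execute("UPDATE account SET balance = -50 WHERE id = 1")
        return False
    except sqlite3.IntegrityError:
        return True


class ShardedStore:
    """Scale first: a toy key-value store spread over several shards."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # crc32 is a stable hash, so a key always maps to the same shard.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)
```

The relational table refuses the invalid update on its own, while the sharded store will accept any value under any key. That trade-off is why the two are presented here as complementary rather than as replacements for each other.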
![Page 29: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/29.jpg)
Big Data future
![Page 30: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/30.jpg)
Rise of Data Marketplaces
Data Science tools development: more powerful & expressive toolsets for analysis
Emerging streaming data processing tools (Twitter Storm, Yahoo S4, StreamBase): real-time enablement / live BI
Further cloud-enablement
Ease of integration to Enterprise Sources
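The streaming item in the list above can be pictured with a small sliding-window counter, sketched here in plain Python. It does not use Storm, S4 or StreamBase; the class name and its methods are illustrative, and the sketch assumes events arrive with non-decreasing timestamps.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count events per key over the last `window_seconds` of a stream."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, key) pairs, oldest first
        self.counts = Counter()

    def add(self, timestamp, key):
        """Record one event, then drop events that left the window."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Assumes timestamps never go backwards, so the oldest event is first.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n):
        """Return the n most frequent keys currently in the window."""
        return self.counts.most_common(n)
```

Each event is processed once on arrival and results are available immediately, instead of waiting for a batch job to finish. That is the idea behind "live BI".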
![Page 31: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/31.jpg)
Conclusion
![Page 32: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/32.jpg)
To leverage Big Data you need something like a Sugar Factory.
It can be a very entry-level factory (Excel – Azure Source) or a more complex one. The more complex and complete the factory, the more value at the end of the processing chain.
To turn Big Data technologies from developer-centric solutions into enterprise solutions, they must be combined with SQL solutions into a single proven infrastructure that meets the manageability and security requirements of enterprises.
![Page 33: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/33.jpg)
The challenge for Enterprises is to simplify Big Data integration/engineering and leverage it where possible to improve their processes at tactical and strategic levels.
Architects & DBAs will be able to choose between datastore technologies and will need to understand where one is better than the other.
Big Data has to be part of the Enterprise Applications ecosystem, where it will be turned into value.
![Page 34: Introduction to Big Data An analogy between Sugar Cane & Big Data](https://reader035.vdocuments.net/reader035/viewer/2022081413/5495968db47959b3088b4681/html5/thumbnails/34.jpg)
Thank you.