Essential Tools For Your Big Data Arsenal
DESCRIPTION
For some, Hadoop is synonymous with “Big Data,” but Hadoop is just one component of a successful Big Data architecture. Depending on the application, it may not even be the most important part. NoSQL solutions like MongoDB also play a major role in storage and real-time data processing, helping companies keep pace with the scale of their data requirements. NoSQL figures even more prominently in helping enterprises consume a wide variety of data sources at speeds not currently possible in Hadoop. NoSQL, then, is a useful complement both to Hadoop and to the transaction-based data of traditional RDBMSs. Tackling Big Data is not a one-tool job, so orchestrating the appropriate NoSQL database with Hadoop and an RDBMS is essential. In this session, we’ll dig into the different types of NoSQL, identifying how they differ and the types of Big Data workloads for which each is best suited. We’ll also explore the trade-offs in choosing a NoSQL database like MongoDB or Neo4j over an RDBMS like MySQL, when it makes sense to use Hadoop and NoSQL together, and when NoSQL alone is more appropriate.
TRANSCRIPT
![Page 1: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/1.jpg)
Matt Asay (@mjasay)
VP, Business Development & Strategy, MongoDB
Essential Tools For Your Big Data Arsenal
![Page 2: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/2.jpg)
The Big Data Unknown
![Page 3: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/3.jpg)
Top Big Data Challenges?
Translation? Most organizations struggle to know what Big Data is, how to manage it, and who can manage it.
Source: Gartner
![Page 4: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/4.jpg)
Understanding Big Data – It’s Not Very “Big”
From the Big Data Executive Summary: 50+ top executives from government and Fortune 500 firms
64% - Ingest diverse, new data in real-time
15% - More than 100TB of data
20% - Less than 100TB (average of all? <20TB)
![Page 5: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/5.jpg)
Innovation As Iteration
![Page 6: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/6.jpg)
“I have not failed. I've just found 10,000 ways that won't work.” ― Thomas A. Edison
![Page 7: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/7.jpg)
Back in 1970…Cars Were Great!
![Page 8: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/8.jpg)
So Were Computers!
![Page 9: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/9.jpg)
Lots of Great Innovations Since 1970
![Page 10: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/10.jpg)
Including the Relational Database
![Page 11: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/11.jpg)
RDBMS Makes Development Hard
Application → Object Relational Mapping → Relational Database
Code, XML Config, DB Schema
![Page 12: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/12.jpg)
And Even Harder To Iterate
Columns: Name, Pet, Phone, Email
Changes needed: New Column, New Column, New Table, New Table
3 months later…
![Page 13: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/13.jpg)
From Complexity to Simplicity
RDBMS → MongoDB
{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name : "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up : "Neray, Graham",
  pay_band : "C",
  benefits : [
    { type : "Health",
      plan : "PPO Plus" },
    { type : "Dental",
      plan : "Standard" }
  ]
}
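To make the iteration point concrete, here is a minimal Python sketch in which a plain dictionary stands in for the MongoDB document above. No database driver is involved, the `_id` is kept as a plain string, and the `email` value and the `plans_of_type` helper are hypothetical names added only for illustration. It shows a field being added to one record without first creating a table or column.

```python
# A plain dict stands in for the MongoDB document; this is an
# illustration only, not a call to any MongoDB driver.
employee = {
    "_id": "4c4ba5e5e8aabf3",  # shown as a plain string for simplicity
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}


def plans_of_type(doc, benefit_type):
    """Return the plan names of one benefit type from the embedded list."""
    return [
        b["plan"]
        for b in doc.get("benefits", [])
        if b["type"] == benefit_type
    ]


# Iterating on the model: a new field is attached to this one document.
# The address is a hypothetical placeholder value.
employee["email"] = "justin.dunham@example.com"

health_plans = plans_of_type(employee, "Health")
print(health_plans)
```

Because the benefits are embedded in the record, reading them needs no join; in a relational design they would typically live in a separate table keyed by employee.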
![Page 14: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/14.jpg)
So…Use Open Source
![Page 15: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/15.jpg)
Big Data != Big Upfront Payment
![Page 16: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/16.jpg)
RDBMS Is Expensive To Scale
“Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.”
IBM press release, August 28, 2012
![Page 17: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/17.jpg)
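Spoiled for Choice

| Rank | DBMS | Database Model | Score | Change |
|------|------|----------------|-------|--------|
| 1 | Oracle | Relational DBMS | 1583.84 | 54.23 |
| 2 | MySQL | Relational DBMS | 1331.34 | 25.58 |
| 3 | Microsoft SQL Server | Relational DBMS | 1207 | -106.78 |
| 4 | PostgreSQL | Relational DBMS | 177.01 | -5.22 |
| 5 | DB2 | Relational DBMS | 175.83 | 3.58 |
| 6 | MongoDB | NoSQL Document Store | 149.48 | -2.71 |
| 7 | Microsoft Access | Relational DBMS | 142.49 | -4.21 |
| 8 | SQLite | Relational DBMS | 77.88 | -4.9 |
| 9 | Sybase | Relational DBMS | 73.66 | -1.68 |
| 10 | Teradata | Relational DBMS | 54.41 | 3.32 |

Source: DB-Engines.com Database Ranking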
![Page 18: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/18.jpg)
Remember the Long Tail?
![Page 19: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/19.jpg)
It Didn’t Work Out So Well
![Page 20: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/20.jpg)
Use Popular, Well-Known Technologies
Source: Silicon Angle, 2012
![Page 21: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/21.jpg)
Ask the Right Questions…
“Organizations already have people who know their own data better than mystical data scientists…. Learning Hadoop [or MongoDB] is easier than learning the company’s business.”
(Gartner, 2012)
![Page 22: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/22.jpg)
Leverage Existing Skills
![Page 23: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/23.jpg)
Search as a Sign?
![Page 24: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/24.jpg)
When To Use Hadoop, NoSQL
![Page 25: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/25.jpg)
Enterprise Big Data Stack
• Applications: CRM, ERP, Collaboration, Mobile, BI
• Data Management
– Online Data: RDBMS
– Offline Data: EDW, Hadoop
• Infrastructure: OS & Virtualization, Compute, Storage, Network
• Management & Monitoring
• Security & Auditing
![Page 26: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/26.jpg)
Consideration – Online vs. Offline
Online
• Real-time
• Low-latency
• High availability
Offline
• Long-running
• High-latency
• Availability is lower priority
![Page 27: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/27.jpg)
Consideration – Online vs. Offline
![Page 28: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/28.jpg)
Hadoop Is Good for…
• Risk Modeling
• Churn Analysis
• Recommendation Engine
• Ad Targeting
• Transaction Analysis
• Trade Surveillance
• Network Failure Prediction
• Search Quality
• Data Lake
![Page 29: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/29.jpg)
MongoDB/NoSQL Is Good for…
• 360° View of the Customer
• Mobile & Social Apps
• Fraud Detection
• User Data Management
• Content Management & Delivery
• Reference Data
• Product Catalogs
• Machine to Machine Apps
• Data Hub
![Page 30: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/30.jpg)
How To Use The Two Together?
![Page 31: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/31.jpg)
Finding Waldo
![Page 32: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/32.jpg)
Customer Example: Online Travel
Travel
• Flights, hotels and cars
• Real-time offers
• User profiles, reviews
• User metadata (previous purchases, clicks, views)
Algorithms
• User segmentation
• Offer recommendation engine
• Ad serving engine
• Bundling engine
MongoDB Connector for Hadoop
![Page 33: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/33.jpg)
Predictive Analytics
Government
• Predictive analytics system for crime, health issues
• Diverse, unstructured (incl. geospatial) data from 30+ agencies
• Correlate data in real time
Algorithms
• Long-form trend analysis
• MongoDB data dumped into Hadoop, analyzed, re-inserted into MongoDB for better real-time response
MongoDB + Hadoop
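The round trip described on this slide (operational data exported, analyzed in batch, and written back for fast lookup) can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not the connector's API: the `incidents` records, their field names, and the `map_phase` / `reduce_phase` helpers are all hypothetical, and a list and a dict stand in for MongoDB collections.

```python
from collections import defaultdict

# Hypothetical incident documents standing in for an operational collection.
incidents = [
    {"agency": "health", "district": "north", "type": "flu"},
    {"agency": "police", "district": "north", "type": "theft"},
    {"agency": "police", "district": "south", "type": "theft"},
    {"agency": "police", "district": "north", "type": "theft"},
]


def map_phase(doc):
    """Emit one ((district, type), 1) pair per document."""
    yield (doc["district"], doc["type"]), 1


def reduce_phase(pairs):
    """Sum the emitted counts for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


# 1. "Dump" the operational data into the batch job.
pairs = [pair for doc in incidents for pair in map_phase(doc)]

# 2. Analyze offline.
totals = reduce_phase(pairs)

# 3. "Re-insert" the results as small summary documents keyed for lookup.
summaries = {
    f"{district}:{kind}": {"district": district, "type": kind, "count": n}
    for (district, kind), n in totals.items()
}

north_theft = summaries["north:theft"]["count"]
print(north_theft)
```

The design point is the split of responsibilities: the long-running aggregation happens away from the serving path, and the online store only answers cheap key lookups against precomputed summaries.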
![Page 34: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/34.jpg)
Data Hub
Insurance
• Insurance policies
• Demographic data
• Customer web data
• Call center data
• Real-time churn detection
Churn Analysis
• Customer action analysis
• Churn prediction algorithms
MongoDB Connector for Hadoop
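As a rough illustration of the data hub idea on this slide, the Python sketch below folds records from several source systems into one document per customer. Everything here is assumed for the example: the source names, the field names, the sample records, and the `build_hub` and `churn_signals` helpers. The churn rule is deliberately naive and is not a real prediction model.

```python
from collections import defaultdict

# Hypothetical records from separate source systems, keyed by customer id.
policies = [
    {"customer_id": 1, "policy": "auto"},
    {"customer_id": 2, "policy": "home"},
]
demographics = [{"customer_id": 1, "age_band": "30-39"}]
web_events = [
    {"customer_id": 1, "page": "/cancel"},
    {"customer_id": 2, "page": "/quote"},
]
calls = [{"customer_id": 1, "reason": "complaint"}]


def build_hub(**sources):
    """Group every source record under its customer, one document each."""
    hub = defaultdict(lambda: {name: [] for name in sources})
    for name, records in sources.items():
        for record in records:
            fields = {k: v for k, v in record.items() if k != "customer_id"}
            hub[record["customer_id"]][name].append(fields)
    return dict(hub)


def churn_signals(doc):
    """Count two naive warning signs; a placeholder, not a real model."""
    visited_cancel = any(e["page"] == "/cancel" for e in doc["web_events"])
    complained = any(c["reason"] == "complaint" for c in doc["calls"])
    return int(visited_cancel) + int(complained)


hub = build_hub(
    policies=policies,
    demographics=demographics,
    web_events=web_events,
    calls=calls,
)
print(churn_signals(hub[1]), churn_signals(hub[2]))
```

With all of a customer's data in one document, a real-time check needs only a single lookup, while heavier churn-prediction work can run in batch over the same documents.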
![Page 35: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/35.jpg)
Machine Learning
Ad-Serving
• Catalogs and products
• User profiles
• Clicks
• Views
• Transactions
Algorithms
• User segmentation
• Recommendation engine
• Prediction engine
MongoDB Connector for Hadoop
![Page 36: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/36.jpg)
MongoDB + Hadoop Connector
• Makes MongoDB a Hadoop-enabled file system
• Read and write to live data, in-place
• Copy data between Hadoop and MongoDB
• Full support for data processing
– Hive
– MapReduce
– Pig
– Streaming
– EMR
MongoDB Connector for Hadoop
![Page 37: Essential Tools For Your Big Data Arsenal](https://reader035.vdocuments.net/reader035/viewer/2022062513/554bf8a3b4c9053f078b461e/html5/thumbnails/37.jpg)
@mjasay