big data use cases
DESCRIPTION
Everyone is awash in the new buzzword, Big Data, and it seems as if you can’t escape it wherever you go. But there are real companies with real use cases creating real value for their businesses by using big data. This talk will discuss some of the more compelling current or recent projects, their architecture and the systems they use, and their successful outcomes.
TRANSCRIPT
1
Big Data Use Cases*
DevNexus Conference, 2/18/2013
*Fully buzzword-compliant title
2
whoami
• Brad Anderson
• Solutions Architect at MapR (Atlanta)
• ATLHUG co-chair
• NoSQL East Conference 2009
• “boorad” most places (twitter, github)
3
Service Bureau
Client/Server
Application Service Provider
Cloud
B2B
Software-as-a-Service
Virtualization
Social Media
Mobile
Web 2.0
4
BIG DATA
5
6
Business Value
7
Business Value
8
Big Data is not new! But the tools are.
9
Ship the Function to the Data
[Diagram] Traditional Architecture: data held on SAN/NAS storage is moved to a single function running against the RDBMS
[Diagram] Distributed Computing: every node stores a slice of the data and runs the function on it locally
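The slide's point can be illustrated in a few lines. The sketch below is plain Python, not the Hadoop API: the hypothetical `partitions` list stands in for data blocks stored on different nodes, `map_partition` is the function shipped to each block, and only the small per-block summaries are combined centrally.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions: on a real cluster each list would live on a different node.
partitions = [
    ["error disk full", "ok", "error timeout"],
    ["ok", "ok", "error disk full"],
    ["error timeout", "ok"],
]

def map_partition(lines):
    """Runs where the data lives: shrink a whole partition to a small summary."""
    counts = Counter()
    for line in lines:
        counts[line.split()[0]] += 1   # count records by their first token
    return counts

def combine(a, b):
    """Only these small per-partition summaries would cross the network."""
    return a + b

local_results = [map_partition(p) for p in partitions]
total = reduce(combine, local_results, Counter())
print(dict(total))
```

In the traditional architecture all of the raw lines would travel to one place before any work started; here only one small counter per partition moves.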
10
Variation: Multiple MapReduces
Example: Fraud Detection in User Transactions
[Diagram] Transaction data (HBase / MapR M7 Edition) is processed by a chain of MapReduce jobs: LDA training, LDA scoring, 95th-percentile LDA anomaly, G2 score, producing Candidate events for analyst review
http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
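The pipeline ends with a G2 score. The talk does not say exactly how it is applied; one common reading, assumed here, is Dunning's log-likelihood ratio (G²) computed on a 2x2 table that asks whether some attribute (for example a merchant) is over-represented among events above the 95th-percentile LDA anomaly score. The helper names and the nearest-rank percentile definition are illustrative choices, not taken from the talk.

```python
import math

def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    """Unnormalized Shannon entropy: N*ln(N) - sum(k*ln(k))."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def g2(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 table of counts.

    k11: anomalous events with the attribute   k12: anomalous events without it
    k21: normal events with the attribute      k22: normal events without it
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))   # clamp tiny negative rounding error

def percentile_threshold(scores, pct=95):
    """Nearest-rank percentile of a list of anomaly scores."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

if __name__ == "__main__":
    scores = list(range(1, 101))           # hypothetical LDA anomaly scores
    print(percentile_threshold(scores))    # events at or above this are "anomalous"
    # Hypothetical: a merchant appears in 4 of 6 anomalous events but 20 of 994 normal ones.
    print(round(g2(4, 2, 20, 974), 2))
```

A large G² says the attribute's share among anomalous events differs strongly from its share elsewhere; note that the test is two-sided, so under-representation also scores high unless the ratios are compared as well.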
11
MapR Distribution for Apache Hadoop
Complete Hadoop distribution
Comprehensive management suite
Industry-standard interfaces
Enterprise-grade dependability
Higher performance
Components: Pig, Hive, HBase, Mahout, Oozie, Whirr, MapReduce, Cascading, Nagios, Ganglia, Flume, Sqoop, HCatalog, ZooKeeper, Avro
MapR Control System
MapR Data Platform
12
Big Data Ecosystem
13
Use Case | Company | Data Source(s) | Technique(s) | Business Value
14
Proactive Monitoring
15
Server Telemetry, Monitoring Logs, Network Flow
Data Sources
16
Pattern Recognition, Proactive Monitoring, Early Alert Delivery
Techniques
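The talk names pattern recognition and early alerting but not a specific algorithm. A minimal sketch, assuming a single numeric telemetry metric and a rolling z-score rule (the window size, the 3-sigma threshold and the class name are hypothetical):

```python
import math
from collections import deque

class RollingZScoreAlert:
    """Flag a metric sample that deviates strongly from its recent history."""

    def __init__(self, window=60, threshold=3.0, min_samples=10):
        self.window = deque(maxlen=window)   # most recent samples only
        self.threshold = threshold           # assumed: alert beyond 3 standard deviations
        self.min_samples = min_samples       # don't judge until some history exists

    def observe(self, value):
        """Return True if this sample should raise an early alert."""
        alert = False
        if len(self.window) >= self.min_samples:
            mean = sum(self.window) / len(self.window)
            var = sum((x - mean) ** 2 for x in self.window) / len(self.window)
            std = math.sqrt(var)
            if std > 0:
                alert = abs(value - mean) / std > self.threshold
            else:
                alert = value != mean        # flat history: any change is unusual
        self.window.append(value)
        return alert
```

Because the check runs per sample, an alert can fire as soon as a metric drifts, before a hard failure threshold is reached.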
17
Business Value
18
Telecommunications Giant
ETL Offload
19
Customer Records, Contract Data, Purchase Orders, Call Center
Data Sources
Telecommunications
20
Techniques
Analytics, ETL
Telecommunications
21
Techniques
ETL (Hadoop) + Analytics (Teradata)
Telecommunications
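The ETL offload pattern moves the transform work into Hadoop and leaves Teradata with the analytics. The sketch below mimics a MapReduce-style join in plain Python; the record layouts, the pipe-delimited output and the in-memory sort standing in for the shuffle phase are all assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter

def mapper(source, line):
    """Key every raw record by customer id so both feeds meet in the reducer."""
    fields = [f.strip() for f in line.split(",")]
    if source == "customers":            # hypothetical layout: id, name, plan
        cust_id, name, plan = fields
        return cust_id, ("C", name, plan.upper())
    cust_id, minutes = fields            # hypothetical call-center layout: id, minutes
    return cust_id, ("L", int(minutes))

def reducer(cust_id, values):
    """Join one customer's records and emit a clean, load-ready row."""
    name = plan = None
    total_minutes = 0
    for v in values:
        if v[0] == "C":
            _, name, plan = v
        else:
            total_minutes += v[1]
    if name is None:                     # drop call records with no matching customer
        return None
    return "|".join([cust_id, name, plan, str(total_minutes)])

def run(feeds):
    mapped = [mapper(src, line) for src, lines in feeds.items() for line in lines]
    mapped.sort(key=itemgetter(0))       # stands in for Hadoop's shuffle and sort
    rows = []
    for cust_id, group in groupby(mapped, key=itemgetter(0)):
        row = reducer(cust_id, (v for _, v in group))
        if row is not None:
            rows.append(row)
    return rows
```

The heavy cleansing and joining happen on cheap Hadoop nodes; the warehouse only receives the finished rows.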
22
Business Value
Telecommunications
23
Customer Purchase History, Merchant Designations, Merchant Special Offers
Data Sources
Credit Card Issuer
24
Techniques
[Diagram] Purchase History, Merchant Information and Merchant Offers feed a Recommendation Engine (Mahout) on Hadoop; the results take a 4-hour export and a 4-hour import into a Presentation Data Store (DB2), which serves the Apps
Credit Card Issuer
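Mahout's recommenders are commonly item-based, built from item co-occurrence. As a sketch under that assumption (plain Python, not Mahout's API; merchant names are made up), counting how often merchants co-occur in customer purchase histories already yields usable recommendations:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(histories):
    """Count how often each pair of merchants appears in the same customer's history."""
    counts = defaultdict(lambda: defaultdict(int))
    for merchants in histories.values():
        for a, b in combinations(sorted(set(merchants)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(customer, histories, counts, top_n=3):
    """Score unseen merchants by their co-occurrence with merchants the customer used."""
    seen = set(histories[customer])
    scores = defaultdict(int)
    for m in seen:
        for other, c in counts[m].items():
            if other not in seen:
                scores[other] += c
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))  # ties: alphabetical
    return [m for m, _ in ranked[:top_n]]
```

Production systems usually down-weight pairs that co-occur merely because both merchants are popular, for instance with a log-likelihood ratio test, instead of using raw counts.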
25
Techniques
[Diagram] Purchase History, Merchant Information and Merchant Offers feed a Recommendation Engine (Mahout) on Hadoop; the results go into a Recommendation Search Index (Solr) through a 2-minute index update, and the index serves the Apps
Credit Card Issuer
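The revised architecture replaces the 4-hour export and 4-hour import with a search index updated in about 2 minutes. The toy inverted index below suggests why this works, assuming each offer is stored as a small document keyed by "indicator" merchants: changing one offer touches only its own postings, and a customer's recent purchases become the query. This is a plain-Python model of the idea, not Solr's API, and counting matching indicators is a simplification of Solr's relevance ranking.

```python
from collections import defaultdict

def build_index(offer_indicators):
    """offer_indicators: offer id -> merchants whose purchase signals interest (hypothetical)."""
    index = defaultdict(set)
    for offer, indicators in offer_indicators.items():
        for merchant in indicators:
            index[merchant].add(offer)
    return index

def update_offer(index, offer, indicators):
    """Incremental update: re-index one offer without rebuilding everything."""
    for postings in index.values():
        postings.discard(offer)
    for merchant in indicators:
        index[merchant].add(offer)

def query(index, recent_purchases, top_n=2):
    """Rank offers by how many of the customer's recent merchants they match."""
    scores = defaultdict(int)
    for merchant in recent_purchases:
        for offer in index.get(merchant, ()):
            scores[offer] += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [offer for offer, _ in ranked[:top_n]]
```

A real search engine avoids the full scan in `update_offer` by deleting and re-adding the document by id; the scan just keeps the toy short.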
26
Business Value
Credit Card Issuer
27
Idle Alerts
Waste & Recycling Leader
28
Truck Geolocation Data
– 20,000 trucks
– 5 sec interval
Landfill Geographic Boundaries
Data Sources
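At 20,000 trucks reporting every 5 seconds, the stream carries 20,000 / 5 = 4,000 position fixes per second, or about 345.6 million per day, so a per-event idle check has to be cheap. The sketch assumes a truck counts as idle when it stays within a small tolerance of one position for 5 minutes, and that idling inside a landfill boundary is expected and suppressed; the thresholds, the class name and that use of the boundaries are hypothetical, since the talk does not spell them out.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count edge crossings of a ray running east from the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


class IdleTracker:
    """Hypothetical per-truck idle detector fed one geolocation fix at a time."""

    def __init__(self, landfills, idle_seconds=300, tolerance=0.0001):
        self.landfills = landfills          # list of polygons, each a list of (lon, lat)
        self.idle_seconds = idle_seconds    # assumed: 5 minutes without moving
        self.tolerance = tolerance          # assumed: degrees of GPS jitter to ignore
        self.state = {}                     # truck_id -> (lon, lat, idle_since, alerted)

    def update(self, truck_id, ts, lon, lat):
        """Return True once per idle episode that deserves an alert."""
        prev = self.state.get(truck_id)
        moved = (
            prev is None
            or abs(lon - prev[0]) > self.tolerance
            or abs(lat - prev[1]) > self.tolerance
        )
        if moved:
            self.state[truck_id] = (lon, lat, ts, False)
            return False
        anchor_lon, anchor_lat, since, alerted = prev
        if alerted or ts - since < self.idle_seconds:
            return False
        if any(point_in_polygon(lon, lat, poly) for poly in self.landfills):
            return False                    # assumed: idling at a landfill is expected
        self.state[truck_id] = (anchor_lon, anchor_lat, since, True)
        return True
```

The tracker keeps only one small tuple per truck, which is the kind of state a stream processor can hold in memory at thousands of events per second.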
29
Techniques
[Diagram] Truck Geolocation Data → Realtime Stream Computation (Storm) → Immediate Alerts
[Diagram] Truck Geolocation Data → Hadoop Storage → Batch Computation (MapReduce) → Tax Reduction Reporting
[Diagram] Shortest Path Graph Algorithm → Route Optimization
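For the route optimization branch, the slide names a shortest-path graph algorithm without saying which one. Dijkstra's algorithm is the textbook choice for road graphs with non-negative travel costs; the graph in the usage example, with travel times in minutes, is hypothetical.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm. graph: node -> list of (neighbor, cost), costs >= 0."""
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue                      # stale heap entry
        visited.add(node)
        if node == goal:
            break
        for nbr, cost in graph.get(node, ()):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if goal not in dist:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:              # walk predecessors back to the start
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]
```

For example, with `{"depot": [("a", 4), ("b", 1)], "b": [("a", 2), ("landfill", 7)], "a": [("landfill", 1)]}`, the cheapest route from `depot` to `landfill` goes through `b` and `a` for a total of 4.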
30
Business Value
31
Fraud Detection
Data Lake
32
Anti-Money Laundering, Consumer Transactions
Data Sources
33
Techniques
Anti-Money Laundering System, Consumer Transactions System
34
Techniques
[Diagram] AML and Consumer Transactions data → Data Lake (Hadoop) → Latent Dirichlet Allocation, Bayesian Learning, Neural Network, Peer Group Analysis → Suspicious Events → Analyst
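Of the listed techniques, peer group analysis is the simplest to sketch: compare each account with accounts that normally behave like it, rather than with the whole population. The grouping key, the one-sided 3-sigma threshold and the data are assumptions for illustration; the talk does not describe its actual model.

```python
from collections import defaultdict
from statistics import mean, pstdev

def peer_group_outliers(amounts, peer_group, threshold=3.0):
    """Flag accounts whose period total sits far above their peers' totals.

    amounts:    account -> transaction total for the period
    peer_group: account -> peer-group label (hypothetical, e.g. business type)
    """
    members = defaultdict(list)
    for account, group in peer_group.items():
        members[group].append(account)

    flagged = []
    for account, amount in amounts.items():
        # Compare against peers only, excluding the account itself.
        peers = [amounts[a] for a in members[peer_group[account]]
                 if a != account and a in amounts]
        if len(peers) < 2:
            continue                      # not enough peers to judge
        mu, sigma = mean(peers), pstdev(peers)
        if sigma == 0:
            continue
        z = (amount - mu) / sigma
        if z > threshold:                 # one-sided: unusually high activity
            flagged.append((account, z))
    return sorted(flagged, key=lambda pair: -pair[1])
```

A flagged account becomes a suspicious event for an analyst to review, not an automatic verdict.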
35
Business Value
36
Machine Learning
Search Relevance
DNA Matching
37
Birth, Death, Census, Military, Immigration records
Search Behavior, Activity, DNA SNPs (“snips”)
Data Sources
38
Techniques
Record Linking, Search Relevance, Clickstream Behavior, Security Forensics, DNA Matching
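Record linking means deciding when two records (say a census entry and a military record) describe the same person. A minimal sketch, assuming a hypothetical schema with `id`, `name` and `birth_year`: candidates are blocked by birth year (allowing off-by-one errors), then names are scored with the standard library's `difflib.SequenceMatcher`. Real systems use many more features, and the 0.85 threshold is arbitrary.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase, drop periods and collapse whitespace."""
    return " ".join(name.lower().replace(".", "").split())

def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def link_records(records_a, records_b, threshold=0.85):
    """Return (id_a, id_b, score) for likely matches between two record sets."""
    blocks = defaultdict(list)
    for rec in records_b:
        blocks[rec["birth_year"]].append(rec)   # blocking avoids comparing every pair
    links = []
    for rec in records_a:
        year = rec["birth_year"]
        for y in (year - 1, year, year + 1):    # tolerate off-by-one birth years
            for cand in blocks.get(y, []):
                score = similarity(rec["name"], cand["name"])
                if score >= threshold:
                    links.append((rec["id"], cand["id"], round(score, 3)))
    return links
```

Blocking matters at scale: comparing every birth record with every census record is quadratic, while comparing only within birth-year blocks is far cheaper.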
39
Business Value
40
Traffic Analytics
41
Inrix Road Segment Data
– Avg Speed / minute / segment
– Reference Speeds
Road Segment Geolocation Data
Data Sources
42
Techniques
Bottleneck Detection Algorithm
Time Offset Correlations
– Alternate Routes
Predictive Congestion Analysis
– Growth & Term Assumptions
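Two of the listed techniques can be sketched on per-minute segment speeds. The bottleneck rule below (speed under 60% of the reference speed for at least 5 consecutive minutes) and the lag search by Pearson correlation are assumptions chosen for illustration, not the algorithms from the talk.

```python
from statistics import mean

def bottleneck_starts(speeds, reference, ratio=0.6, min_run=5):
    """Start minutes of runs where speed < ratio * reference for >= min_run minutes."""
    starts, run = [], 0
    for i, s in enumerate(speeds):
        if s < ratio * reference:
            run += 1
            if run == min_run:
                starts.append(i - min_run + 1)
        else:
            run = 0
    return starts

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

def best_lag(seg_a, seg_b, max_lag=10):
    """Lag (minutes) at which segment B's speeds best track segment A's earlier speeds."""
    best = (0, -2.0)
    for lag in range(max_lag + 1):
        r = pearson(seg_a[: len(seg_a) - lag], seg_b[lag:])
        if r > best[1]:
            best = (lag, r)
    return best
```

A strong correlation at a consistent lag suggests that a slowdown on one segment predicts a slowdown on the other a few minutes later, which is the kind of signal that could drive alternate-route suggestions.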
43
44
45
Business Value
46
Similar Characteristics
Lots of Data
Structured, Semi-Structured, Unstructured
Varied Systems Interoperating
– Hadoop, Storm, Solr, MPP, Visualizations
Increase Revenue
Decrease Costs
47
Thank You