complement your existing data warehouse with big data & hadoop
Post on 06-May-2015
© 2013 Datameer, Inc. All rights reserved.
Complement Your Existing Data Warehouse with Big Data & Hadoop
View Recording
▪ You can view the recording of this webinar at:
▪ http://info.datameer.com/Slideshare-Complement-Your-Existing-EDW-with-Hadoop-OnDemand.html
About our Speakers
Karen Hsu
– Karen is Senior Director, Product Marketing at Datameer. With over 15 years of experience in enterprise software, she has co-authored 4 patents and worked in a variety of engineering, marketing, and sales roles.
– Most recently she came from Informatica, where she worked with the start-ups Informatica purchased to bring data quality, master data management, B2B, and data security solutions to market.
– Karen has a Bachelor of Science degree in Management Science and Engineering from Stanford University.
About our Speakers
Jeff Bean
– Jeff Bean has been at Cloudera since 2010. He has helped several of Cloudera's most important customers and partners through their adoptions of Hadoop and HBase, including cluster sizing, deployment, operations, application design, and optimization.
– Jeff has also spent time on Cloudera's training team, where he focused on partner enablement, training hundreds of field personnel in Hadoop, its usage, and its position in the market. Jeff currently does partner engineering at Cloudera, where he handles field support, certifications, and joint engagements with partners such as Datameer.
How Big Data Analytics and Hadoop Complement Your Existing Data Warehouse
Jeff Bean, Cloudera
Karen Hsu, Datameer
Agenda
• Why optimize?
• What to optimize?
• How to optimize?
• Who has optimized already?
• Conclusion
Data Has Changed in the Last 30 Years
[Chart: data growth from 1980 to 2013, driven by end-user applications, the internet, mobile devices, and sophisticated machines. Structured data: 10%. Unstructured data: 90%.]
EDW Expansion: A Vicious Cycle
▪ Increasing numbers of users
▪ Growing volumes of data
▪ Additional data sources
▪ New use cases
▪ Degraded quality of service and inability to meet SLAs
▪ Constant pressure to purchase additional capacity
[Diagram: Enterprise Data Warehouse]
Hadoop vs. Data Warehouse: Freeing up Capacity for High Value Workloads
Today: All growth accommodated by incremental investment in DW
▪ Data Warehouse: 100 TB at $20,000 - $100,000 / TB
▪ 100% data growth: 100 TB more capacity in the Data Warehouse
▪ Incremental spend: $2 to $10 million
Hadoop vs. Data Warehouse: Freeing up Capacity for High Value Workloads
Future: Hadoop offloads data and workloads to defer/avoid incremental spend and reduce data management TCO
Keep the Right Data in the Data Warehouse System (High Value Data; diagram labels: 100 TB, 50 TB)
• Operational Analytics
• Reporting
• Business Analytics
Use Hadoop for Everything Else (Lower Value Data; diagram labels: 50 TB, 100 TB)
• Historical Data
• Data Processing
• Ad Hoc Exploratory
• Transformation / Batch
• Data Hub
Cloudera / Datameer (Total Cost of Cluster): $1,000 - $2,000 / TB
Incremental Spend: $240,000 - $300,000 ACV
Savings: $1.85 to 9.8 MM
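The per-terabyte prices on the two slides above can be turned into a quick back-of-the-envelope comparison. The sketch below is illustrative only: it assumes the growth to be absorbed is 100 TB (100% growth on a 100 TB warehouse), and the function and variable names are made up for this example. It multiplies capacity by the quoted per-TB price ranges. The deck does not show how its "$240,000 - $300,000 ACV" and "Savings: $1.85 to 9.8 MM" figures were derived, so the sketch does not attempt to reproduce them.

```python
def spend_range(tb, low_per_tb, high_per_tb):
    """Return (low, high) total cost in dollars for `tb` terabytes."""
    return tb * low_per_tb, tb * high_per_tb


# Assumption for illustration: 100 TB of new data (100% growth on 100 TB).
GROWTH_TB = 100

# Per-TB price ranges as quoted on the slides.
dw_low, dw_high = spend_range(GROWTH_TB, 20_000, 100_000)
hadoop_low, hadoop_high = spend_range(GROWTH_TB, 1_000, 2_000)

print(f"Data warehouse: ${dw_low:,} to ${dw_high:,}")
print(f"Hadoop cluster: ${hadoop_low:,} to ${hadoop_high:,}")
```

Under these assumptions the data warehouse range comes out at $2,000,000 to $10,000,000, which matches the "$2 to $10 Million" incremental spend on the "Today" slide. The raw per-TB cluster cost comes out at $100,000 to $200,000, which is lower than the ACV range quoted on the "Future" slide.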
Agenda
• Why optimize?
• What to optimize?
• How to optimize?
• Who has optimized already?
• Conclusion
Data Warehouse
[Diagram]
Workloads: Operational Business Intelligence, Analytics, Self-Service BI, Data Processing (ELT)
Data: Staged Data, Operational Data, Archival Data
Assessing Workloads and Data
▪ Data Processing (ELT)
– Staged data, to be processed
– Temp tables, BLOB/CLOB types, …
▪ Analytics / Machine Learning
– Deep and broad data sets, within and beyond the warehouse
▪ Self-Service BI (Ad-Hoc Query)
– Operational data, actively used for BI
– Archival data, inactively used for BI
Offload Data Processing (ELT)
What?
▪ High-scale batch data processing
Key Capabilities
▪ Integrate any type of data with pre-built connectors
▪ High availability, disaster recovery, downtime-less upgrades
▪ Low-latency SQL processing
Benefits of Cloudera and Datameer
▪ Over 2X the performance at 1/10th the cost
▪ 96% reduction in ETL time
Offload Analytics / Machine Learning
What?
▪ Training & scoring predictive models
▪ Deep and broad data sets
Key Capabilities
▪ Drag-and-drop Data Mining and Machine Learning for a business analyst
▪ Automated support for Clustering, Recommendations, Decision Tree, and Column Dependencies
▪ Ability to run SAS, R natively on the same cluster
Benefits of Cloudera and Datameer
▪ Greater flexibility at 1/10th the cost
▪ Expand data mining and machine learning to analysts
Offload Self-Service Business Intelligence
Workload
▪ Self-Service BI, Exploratory BI, Data Discovery
▪ Unknown Questions
Key Capabilities
▪ 250+ prebuilt analytics functions
▪ Transparency and governance
▪ Open source interactive SQL
Benefits of Cloudera and Datameer
▪ Better flexibility at 1/10th the cost
▪ Reduce analysis time from 4 weeks to 3 days
Complementing the Data Warehouse
[Diagram labels: OLTP; Enterprise Applications; Business Intelligence; Data Warehouse; Query (High $/Byte); Cloudera / Datameer; ETL; Load; Archive; Operational BI; Archival Data, Exploration, Analytics; Batch Process; Storage; Search; Analyze; Integrate; Vis]
Agenda
• Why optimize?
• What to optimize?
• How to optimize?
• Who has optimized already?
• Conclusion
Process
[Diagram steps: Define, Integrate, Prepare and Analyze, Visualize and Validate, Deploy]
[Diagram labels: Ad Hoc, Production]
Define
Profile and Assess
– Workloads in EDW
– Ability to migrate
– Size of data set
Prioritize
– Constraints
– Portability
– Disruption
Identify
– Use cases
– Return on investment
Integrate
Codeless Integration
– ELT, not ETL
– 50+ Datameer connectors, plug-in API
Migration
– Data ingest paths
– Map EDW workload to Cloudera
Prepare and Analyze
Interactive Data Preparation
– Ensure data quality
– Enrich data
Interactive + Smart Analytics
– 250+ built-in functions
– Automated machine learning
Transparency + Governance
– Visual data lineage
– Complete audit trail
– Metadata catalog
Visualize and Validate
Validate
– Verify results
– Tune
Visualization Anywhere
– Infographic or dashboard
– Run on tablets and smartphones
Deploy
Scheduling
– Dependency triggers
– Data synchronization
– External scheduling integration
Monitoring
– Monitoring system, jobs, performance, throughput
– Error handling
– Log management
Security
– LDAP / Active Directory
– Role-based access control
– Support for Kerberos
Role: Responsibilities
– Admin: Set up and maintain environment
– Business Analyst: Work with partners to define requirements and define goals
– Deployment Team: Set up monitoring and scheduling
– ETL Architect: Prepare and cleanse data
Roles Mapped to Process
– Define (BA): Define goals, results, sources, requirements
– Integrate (Admin): Source data, secure for ad hoc
– Prepare & Analyze (BA / Arch.): Cleanse, combine, enrich data; create analysis
– Visualize (BA): Create infographics, dashboards
– Deploy (Admin / Deploy. Team): Business: validate with end users. Technical: secure, monitor, schedule
Use Cases
Operational, Customer, Fraud and Compliance
Customer
Reduce customer acquisition costs by 30%
Location Data, Transactions, Authorizations, POS Reports
Identify $2B in fraudulent transactions
Structured Logs
Network Data
Unstructured Logs
Doubling in size every 15 months
Improve customer service, development, sales
Calculating ROI is a process
Apply ROI to Multiple Projects
Calculating Return
Business Benefits
– Funnel Optimization: Increase customer conversion by 3x
– Behavioral Analytics: Increase revenue by 2x
– Fraud Prevention: Identify $2B in potential fraud
– Customer Segmentation: Lower customer acquisition costs by 30%
EDW Optimization
Enterprise Data Warehouse
Discover fraud in less time – from 2 days to 2 hours, save $30M on DR
Avoid tens of millions in expansion purchases
Offload 90% of all data
Shrank EDW footprint by 4PB, 20x performance boost
Call to Action
▪ ROI and Solution Development Consultation
▪ Join us at Hadoop World
▪ Contacts
– Jeff Bean, jwfbean@cloudera.com
– Karen Hsu, khsu@datameer.com