May 23, 2016 | Confidential
Tech Primer: Big Data In the Cloud
Hannah Smalltree, Cazena
Big Data & Cloud Expo
New York, June 2016
Slide #2 | Confidential
Agenda
• Why Manage and Analyze Big Data in the Cloud?
• Categories – Cloud and Emerging Data Categories
• Criteria – Picking the Best Solution for Your Needs
• Use Cases – How Techs Are Being Used
Slide #3 | Confidential
Hannah Smalltree, Director, Cazena
Former Editorial Director/Reporter, TechTarget
Slide #4 | Confidential
Level Set!
Why we’re keeping things higher-level for the next 30 minutes…
Data Platforms Map – June 2015. (C) 2015 by 451 Research LLC. All rights reserved.
Get the full map free: https://451research.com/blog/13-have-you-seen-our-data-platforms-map
Slide #5 | Confidential
Why (or Why Not) Cloud for Big Data?
On-Prem
• Existing architecture
• Data sources (on-prem)
• Existing processes
• Security perceptions
• Cost
• Status quo
Cloud
• Elasticity (volume, compute)
• Data sources (cloud)
• Automation
• Sharing (resources, data)
• Cost
• New capabilities
Hybrid
• Best (or worst!) of both worlds
Slide #6 | Confidential
Shifting Data Gravity
Slide #7 | Confidential
How Companies Use the Cloud
• Offload compute- or storage-intensive workloads
• Create flexible sandboxes and self-serve analytics environments
• Improve data access and performance for employees and stakeholders
• Reduce costs for disaster recovery, testing/dev and other functions
• Collect, store and analyze data generated in the cloud
• Share and monetize data with partners/customers
Slide #8 | Confidential
Big Data Services Cross Categories
• Software as a Service
  – Apps: Salesforce, Workday, etc.
  – Data: BI, Analytics, Analytic Applications
• Platform as a Service (Middleware)
  – 16 categories of xPaaS offerings: Application, Database, Integration, Communication, Data…
• Infrastructure as a Service
  – Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform
  – Hosted private clouds
Big Data Services
Slide #9 | Confidential
Cloud Databases
• Transactional: Power sites, apps, etc.
• Analytical: Data Warehouses, Data Lakes, Big Data Platforms, etc.
• SQL, Hadoop, NoSQL, in-memory, etc.
• Solutions often include storage, processing, integration, visualization
What is a Data Lake?
Slide #10 | Confidential
As a Service Trend…
• Big Data as a Service
• Data Warehouse as a Service
• Hadoop as a Service
• Data Lake as a Service
• Spark as a Service
• Managed Services
• Cloud Service
• Database Platform as a Service
• Data Management as a Service
• Cloud Application Services
Slide #11 | Confidential
Definitions…
• Gartner, Market Guide dbPaaS (June 2015):
A database platform as a service (dbPaaS) is a database management system (DBMS) or data store engineered as a scalable, elastic, multitenant service, with a degree of self-service and sold and supported by a cloud service provider (CSP), or a third-party software vendor on CSP infrastructure.
• Gartner, Cool Vendors in DBMS (April 2016):
“Enter the concept of ‘big data as a service,’ where vendors are combining components of analytic platforms in the cloud with multiple processing engines, hybrid on-premises integration, and secure data movement. The use of such services can speed up the adoption of analytics in the cloud, address skills shortages within the enterprise, and make it easier to transition from, and integrate with, existing on-premises investments.”
• Forrester, Big Data Tech Radar (January 2016):
Big-data-as-a-service technology provides capture management and operations capability delivered as-a-service in the public or hybrid cloud. Uses generally include SQL analytics (data warehouse or data mart), data lake, machine learning, and operational analytics application support.
Common Attributes
☑ Data processing
☑ Automated provisioning
☑ Faster implementation
☑ Support, service
☑ Subscription
☑ Maintenance
? Data movement
? Integration
? Security
? Ease of use
Slide #12 | Confidential
Best Advice: Focus on Requirements!
Best fit for workloads, provisioning
Security, encryption and governance
Integration with existing data flow
Data movement, connectors, etc.
Operations, support, maintenance, etc.
Contracts, pricing model
Futures, growth, lock-in
Slide #13 | Confidential
Sample Cloud Use Cases
• Consolidate data
• Collect cloud, SaaS, purchased data
• Share and monetize data
• Analytics/data science sandbox
• Offload EDW jobs
• Disaster recovery
• Data pipeline
• Log, sensor and IoT data
Slide #14 | Confidential
What to Consider During Evaluations…
Slide #15 | Confidential
Evaluation Considerations: Workloads
CRITERIA
• Data and Analytic Workload: Data type, volume, velocity, source, format, frequency…
• Analytic Requirements: Functions, tools, applications, API/dev requirements…
• Processing Engine(s): Price-performance, fit for purpose, maintenance, stability…
• Scalability and Growth: Likely growth in workload or analytic functions…
• Security and Governance: Compliance, Encryption, Tenancy, Access, Logs, Mgmt…
Slide #16 | Confidential
Evaluation Considerations: Integration
CRITERIA
• Data Collection, Movement & Pipeline: Ingest, structure, storage, frequency, movement…
• Data Quality, Prep, Integration: Format, integration, identifiers, MDM, quality…
• Existing Infrastructure: Systems, processes, standards, integration, firewall…
• Access and Delivery: User locations, tools, applications, APIs, futures…
Slide #17 | Confidential
Evaluation Considerations: Operational
CRITERIA
• Implementation (“Time to Analytics”): Provisioning, project timeline, risk points, infra vs. analytics
• Skills: Available?, training, learning curve, culture…
• Agility: Implementation, value, change, fast fail…
• Pricing, Budget: Models, sourcing, lock-in, contingencies…
• Service, Support: Level, method, boundaries, components…
• Vendor: Stability, heritage, culture, agility…
• Success Metrics: Hard, soft, business, incremental, agility…
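One way to put the criteria from the three evaluation slides to work is a simple weighted scoring sheet. The sketch below is an illustration only and is not taken from the talk: the criterion names are condensed from the slides, while the weights, the 1 to 5 scoring scale, and the candidate names are hypothetical assumptions to be replaced with your own requirements.

```python
from dataclasses import dataclass

# Hypothetical weights (they sum to 1.0). The criterion names are condensed
# from the evaluation slides; the numbers are illustrative assumptions.
WEIGHTS = {
    "workload_fit": 0.30,
    "security_governance": 0.25,
    "integration": 0.20,
    "operations_support": 0.15,
    "pricing_lock_in": 0.10,
}


@dataclass
class Candidate:
    name: str
    scores: dict  # criterion name -> score on an assumed 1-5 scale


def weighted_score(candidate, weights=WEIGHTS):
    """Return the weighted sum of a candidate's scores."""
    missing = set(weights) - set(candidate.scores)
    if missing:
        raise ValueError(f"{candidate.name} is missing scores for: {sorted(missing)}")
    return sum(weights[c] * candidate.scores[c] for c in weights)


def rank(candidates, weights=WEIGHTS):
    """Return candidates sorted from highest to lowest weighted score."""
    return sorted(candidates, key=lambda c: weighted_score(c, weights), reverse=True)


if __name__ == "__main__":
    # Hypothetical candidates and scores, for illustration only.
    options = [
        Candidate("Option A", {"workload_fit": 4, "security_governance": 3,
                               "integration": 5, "operations_support": 3,
                               "pricing_lock_in": 2}),
        Candidate("Option B", {"workload_fit": 3, "security_governance": 5,
                               "integration": 3, "operations_support": 4,
                               "pricing_lock_in": 4}),
    ]
    for option in rank(options):
        print(f"{option.name}: {weighted_score(option):.2f}")
```

The weights are the part worth arguing about: agreeing on them before scoring any vendor keeps the focus on requirements rather than on features.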
Slide #18 | Confidential
Recommended Reading
• Forrester
– Big Data Tech Radar, Q1 2016
– Big Data Options in the Cloud, Gualtieri & Staten, Dec 2014
• Gartner
– Cool Vendors in DBMS and Big Data, April 2016
– Market Guide for Database Platform as a Service, June 2015
– Answering Big Data's 10 Biggest Planning & Implementation Questions, January 2015
– Toolkit: Big Data Business Opportunities From Over 100 Use Cases, July 2013
• Eckerson Group
– Selecting a Big Data Platform: Building a Data Foundation for the Future, Dec 2015
– Big Data Analytics Benchmark Report, May 2015
• Others by request! ([email protected])
Slide #19 | Confidential
Q&A and Thank You!
Hannah Smalltree
Cazena
Big Data as a Service
Cazena makes it easy for enterprises to process big data in the cloud, offering data marts, data warehouses and data lakes as a service, securely connected into existing enterprise infrastructure.
Slide #20 | Confidential
Additional Cloud Big Data Use Cases (appendix for discussion and sharing)
Slide #21 | Confidential
Consolidate Data for Agility, Access
• Consolidate data from multiple cloud and on-premises systems in one place for analytics
• Ensure data is easily accessible
Diagram labels: Data Sources; Cloud Data Sources (three); Data Mart; BI/Analytics Tools
Slide #22 | Confidential
Data Warehouse Offload to the Cloud
• Offload data- or compute-intensive workloads from existing data warehouse to cloud
• Free capacity in on-premises systems
Diagram labels: Enterprise Data Warehouse; ETL; Data Mart or Data Lake; BI/Analytics Tools
Slide #23 | Confidential
Data Sharing and Monetization
• Provide separate, secure environment for external users/partners; enable new analytic capabilities
• Monetize data by selling to customers or creating/enhancing data products
Diagram labels: Enterprise Data Warehouse; Data Marts; Customer, Partner, Colleague
Slide #24 | Confidential
Data Lake or Mart for External Data
• Leverage new data sources: web, mobile, social, etc.
• Store, manage and analyze cloud data in the cloud; reduce the cost of managing it on-premises
• Or use cloud to collect and pre-process data before bringing it back to on-premises systems
Diagram labels: Cloud Data Sources; SaaS or Mobile Apps; Purchased Datasets; Data Mart or Data Lake; Enterprise Data Warehouse; BI/Analytics Tools
Slide #25 | Confidential
Data Science Sandbox
• Self-service environment for analysts, data scientists
• Track utilization and costs separately from production systems
Diagram labels: On-premises Datasets; Cloud Data Sources; New Datasets; Data Mart or Data Lake; Analytical Tools; Statistical Tools (R, RStudio, etc.)
Slide #26 | Confidential
Data Warehouse Disaster Recovery
• Build a Disaster Recovery environment that scales as DW grows
• No need to buy upfront capacity
• Replaces expensive traditional method of duplicating data warehouse environment
Diagram labels: Enterprise Data Warehouse; Data Mart or Data Lake; BI/Analytics Tools; second Enterprise Data Warehouse (old way, crossed out)