The Time Has Come for Big-Data-as-a-Service

#HadoopSummit

Kris Applegate – Cloud and Big Data Solution Architect, Dell

Tom Phelan – Co-Founder and Chief Architect, BlueData

Uploaded by bluedata-inc on 09-Jan-2017


Page 1: The Time Has Come for Big-Data-as-a-Service

#HadoopSummit

The Time Has Come for Big-Data-as-a-Service

Kris Applegate – Cloud and Big Data Solution Architect, Dell

Tom Phelan – Co-Founder and Chief Architect, BlueData

Page 2

Agenda

• A Brief History of Hadoop
• Data Storage and Networking Evolution
• The Virtualization Revolution
• Rise of Big-Data-as-a-Service
• Big-Data-as-a-Service (BDaaS) Defined
• BDaaS – Public Cloud or On-Premises?
• Q & A

Page 3

A Brief History of Hadoop

Page 4

In the Beginning (circa 2003) …

• Networks were slow (1 Gigabit per second maximum)

• Siloed storage was expensive (proprietary and often required special hardware)

• Local HDDs were cheap and fast enough for big data needs

Source: http://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf

Page 5

Bringing the Compute to the Data

Co-locate compute and storage

Hadoop and HDFS are Born

Page 6

Network Improvements

Page 7

Data Compression Options in HDFS

Source: www.slideshare.net/Hadoop_Summit/singh-kamat-june27425pmroom210c

Page 8

Result: Is Disk-Locality Irrelevant?

Source: https://amplab.cs.berkeley.edu/wp-content/uploads/2011/06/disk-irrelevant_hotos2011.pdf

“Less relevant” may be more accurate:

• Faster data center networks

• Distributed/non-distributed caching platforms
  • Example: Alluxio (Tachyon)

• Compute and storage separation
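Whether disk locality still matters comes down to a bandwidth comparison: if the network can feed a node at least as fast as its own disks, reading data remotely costs little. A back-of-envelope sketch, assuming a hypothetical node with 12 HDDs at roughly 150 MB/s each (these figures are illustrative, not taken from the slides or the cited paper); the link speeds are the 1, 10, and 40 Gbit/s generations mentioned in this deck. Compression is left out because it shrinks the bytes read from local disk and over the network equally, and latency, contention, and switch oversubscription are ignored. The helper names are hypothetical.

```python
def link_gbytes_per_s(link_gbits_per_s: float) -> float:
    """Convert a network link speed from Gbit/s to GB/s (8 bits per byte)."""
    return link_gbits_per_s / 8


def local_disk_gbytes_per_s(num_disks: int, mbytes_per_s_per_disk: float) -> float:
    """Aggregate sequential read rate of a node's local disks in GB/s (1 GB = 1000 MB)."""
    return num_disks * mbytes_per_s_per_disk / 1000


def remote_read_penalty(link_gbits_per_s: float, num_disks: int,
                        mbytes_per_s_per_disk: float) -> float:
    """Ratio of local-disk read rate to network read rate.

    A value <= 1.0 means the network can deliver data at least as fast as the
    node's own disks, so giving up disk locality costs no raw bandwidth.
    """
    disk = local_disk_gbytes_per_s(num_disks, mbytes_per_s_per_disk)
    return disk / link_gbytes_per_s(link_gbits_per_s)


if __name__ == "__main__":
    # Hypothetical node: 12 HDDs at ~150 MB/s each (1.8 GB/s aggregate).
    for gbits in (1, 10, 40):
        penalty = remote_read_penalty(gbits, 12, 150)
        verdict = "network keeps up" if penalty <= 1.0 else "local disks are faster"
        print(f"{gbits:>2} Gbit/s: disk/network ratio {penalty:.2f} ({verdict})")
```

Under these assumptions a 1 Gbit/s link is an order of magnitude slower than local disks, while a 40 Gbit/s link outruns them, which is the shift the slide describes.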

Page 9

BDaaS and Cloud

• Virtualization / “cloud” technology is not absolutely required

• But realistically … the flexibility and elasticity of BDaaS cannot be economically provided without these underlying technologies

Page 10

The Virtualization Revolution

VMware

KVM

Docker

Hyper-V

LXC

Page 11

The Virtualization Revolution

Virtualization enabled several key benefits, including:

• Automation, flexibility, elasticity
• Cost reduction and consolidation
• Higher utilization, less hardware overprovisioning
• Multi-tenancy
  • Security
  • VXLAN
• Fault isolation

Page 12

The Virtualization Revolution

But … the overhead of virtualizing storage and networking within a hypervisor makes it difficult to meet the performance needs of Big Data workloads (SLAs, QoS)

Page 13

The Virtualization Revolution

• Linux containers
  • OS-level virtualization reduces CPU, memory, network, and storage virtualization overhead

• Docker file format makes containers easy to use and share

Page 14

Rise of Big-Data-as-a-Service

Page 15

Big Data New Realities

Big Data Traditional Assumptions → Big Data New Realities

• Bare-metal → Containers
• Disk-locality → Compute and storage separation
• HDFS on local disks → In-place access on remote data stores

New Benefits and Value

• Big-Data-as-a-Service
• Agility and cost savings
• Faster time-to-insights

Page 16

Journey to BDaaS

• 2002: Initial release of VMware ESX
• 2003: Google paper
• 2004: Big Data era begins
• 2007: Hadoop release 0.14.1
• 2008: Initial release of Linux containers
• 2009: Dell DCS delivers first Big Data server
• 2009: Amazon launches EMR
• 2011: Dell first to launch optimized Apache Hadoop solution
• 2012: Hadoop 1.0.2, Snappy compression
• 2012: 10 Gbit networking in the data center
• 2013: Dell Hadoop Performance Analysis
• 2013: Initial release of Docker
• 2014: VXLANs available
• 2014: BlueData wins Strata + Hadoop World Showcase
• 2015: BlueData EPIC 2.0 with Docker
• 2015: 40 Gbit networking in the data center
• 2016: BDaaS available on-prem or cloud

Page 17

BDaaS – The Time Has Come

All the pieces are now available:

• Fast network hardware and good data compression
• Compute and storage separation
• Low-overhead virtualization (containers)
• Ability to run network- and storage-intensive workloads
• No sacrifice in performance
• Demand from end users for agility, flexibility, and speed

Page 18

Big-Data-as-a-Service Defined

“A mechanism for the delivery of statistical analysis tools and information that helps organizations understand and use insights gained from large information sets in order to gain a competitive advantage.”

On-Demand, Self-Service, Elastic Big Data Infrastructure, Applications, Analytics

Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification

Page 19

Four Types of BDaaS

• Core BDaaS

• Performance BDaaS

• Feature BDaaS

• Integrated BDaaS

Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification

Page 20

Four Types of BDaaS

Core BDaaS
• Minimal platform, such as Hadoop with YARN

Performance BDaaS
• “Downwards” vertical integration
• Includes optimized infrastructure
• Tight integration with Core BDaaS

Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification

Page 21

Four Types of BDaaS

Feature BDaaS
• “Upwards” vertical integration
• Includes features beyond Hadoop
• Support for multiple Core BDaaS providers

Integrated BDaaS
• Full vertical integration and optimization
• Includes both Performance BDaaS and Feature BDaaS

Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification

Page 22

BDaaS – Public Cloud or On-Prem?

Page 23

BDaaS – Public Cloud or On-Prem

Public Cloud
• Low Capex, high Opex
• “Infinite” expandability
• Less secure?
• Less control: software, SLAs, configs, etc.

On-Premises (Private Cloud)
• High Capex, low Opex
• Eventually reach resource limits
• More secure?
• More control: software, SLAs, configs, etc.

Page 24

BDaaS – Workload Portability

Challenge: Public cloud services can be proprietary

Goal: Deliver API-compatible on-prem + public cloud
• BDaaS layer (e.g., BlueData)
• PaaS layer (e.g., CloudForms, Cloud Foundry)
• API-compatible private cloud (e.g., Microsoft Azure Pack/Stack, OpenStack, VMware)

Page 25

BDaaS – Public Cloud

Example Public Cloud Use Cases

• Workloads with a lifespan shorter than 16 months* (e.g., dev/test)

• When the data is also in the cloud

• Public-facing services

* www.dell.com/learn/us/en/555/business~solutions~whitepapers~en/documents~microsoft-private-cloud-tco-0914.pdf
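The 16-month threshold is a total-cost-of-ownership crossover: public cloud trades low capex for high opex, on-premises the reverse, so cumulative costs cross at some point. A minimal sketch of that crossover, assuming straight-line costs and purely hypothetical dollar figures chosen so the crossover lands at 16 months (they are not taken from the Dell whitepaper); `break_even_months` and `cheaper_option` are illustrative names, not part of any product.

```python
def break_even_months(capex: float, onprem_monthly: float, cloud_monthly: float) -> float:
    """Months after which owning hardware becomes cheaper than renting cloud capacity.

    Cumulative costs are modeled as straight lines:
        on_prem(t) = capex + onprem_monthly * t
        cloud(t)   = cloud_monthly * t
    Setting them equal gives t* = capex / (cloud_monthly - onprem_monthly).
    """
    if cloud_monthly <= onprem_monthly:
        raise ValueError("cloud never costs more per month, so there is no break-even point")
    return capex / (cloud_monthly - onprem_monthly)


def cheaper_option(lifespan_months: float, capex: float,
                   onprem_monthly: float, cloud_monthly: float) -> str:
    """Pick the lower-cost deployment for a workload with a known lifespan."""
    if cloud_monthly <= onprem_monthly:
        return "public cloud"  # renting is never more expensive per month
    if lifespan_months < break_even_months(capex, onprem_monthly, cloud_monthly):
        return "public cloud"
    return "on-premises"


if __name__ == "__main__":
    # Hypothetical figures, chosen only so the break-even lands at 16 months.
    capex, onprem, cloud = 48_000, 1_000, 4_000
    print(break_even_months(capex, onprem, cloud))   # 16.0
    print(cheaper_option(6, capex, onprem, cloud))   # public cloud (e.g., dev/test)
    print(cheaper_option(24, capex, onprem, cloud))  # on-premises (persistent cluster)
```

Real TCO models add depreciation, staffing, and discounts, which bend these lines, but the shape of the argument (short-lived workloads favor the cloud, long-lived ones favor owned hardware) stays the same.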

Page 26

BDaaS – On-Premises / Private Cloud

Example On-Prem Use Cases

• High-performance clusters
• Data security
• Data compliance
• Persistent clusters with a lifespan longer than 16 months*
• High-capacity clusters
• When SLAs are needed

* The BlueData EPIC software platform addresses this potential limitation

Page 27

BlueData EPIC – Integrated BDaaS

• BDaaS software platform, using Docker containers
• Self-service, on-demand Hadoop / Spark clusters
• Bring your own application / distribution / version
• Compute and storage separation
  • Scale resources independently
  • Clusters with < 16 month lifespan well supported (e.g., transient)
  • No HDFS data ingestion penalty
• Secure multi-tenancy, Quality of Service (QoS)

Page 28

Big Data On-Premises

Traditional Big Data On-Prem
IT, Manufacturing, Sales, R&D, Services
• < 30% utilization
• Duplication of data
• Management complexity
• Weeks to build each cluster
• Complex, painful upgrades

BDaaS On-Prem with BlueData
BlueData EPIC Software Platform: Manufacturing, Sales, R&D, Services, BI/Analytics Tools
• > 90% utilization
• No duplication of data
• Simplified management
• Multi-tenant
• Simple, instant upgrades
• Self-service, on-demand clusters
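The utilization figures imply a hardware saving that can be estimated with simple arithmetic: if the work that keeps siloed clusters busy at under 30% is packed onto a shared pool run at up to 90%, fewer nodes are needed. A rough sketch, assuming five hypothetical 20-node departmental clusters and that their peaks do not coincide (the gain shrinks if every tenant peaks at once); `nodes_needed` is an illustrative helper and the numbers are not from the slides.

```python
def nodes_needed(clusters: list[tuple[int, int]], target_utilization_pct: int) -> int:
    """Smallest shared pool that carries the same average load at a target utilization.

    clusters: (node_count, average_utilization_pct) for each siloed cluster.
    Busy capacity is summed in node-percent units (integers avoid float rounding),
    then divided by the target utilization and rounded up to whole nodes.
    """
    if not 0 < target_utilization_pct <= 100:
        raise ValueError("target utilization must be in (0, 100]")
    busy_node_pct = sum(nodes * util_pct for nodes, util_pct in clusters)
    return -(-busy_node_pct // target_utilization_pct)  # ceiling division


if __name__ == "__main__":
    # Hypothetical: five departmental clusters of 20 nodes, each ~30% utilized.
    silos = [(20, 30)] * 5
    before = sum(nodes for nodes, _ in silos)
    after = nodes_needed(silos, 90)
    print(f"{before} siloed nodes -> {after} pooled nodes at 90% utilization")
```

With these made-up inputs, 100 siloed nodes carry the same average load as a 34-node shared pool.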

Page 29

NEW – BDaaS On-Prem and Cloud

• BlueData announced an AWS and multi-cloud strategy
  • Extending the user experience and value of BlueData to public cloud
  • Single pane of glass for on-prem and off-prem Big Data workloads
  • Initial AWS support; then MS Azure, Google Cloud Platform, others

• Support for data on-prem and compute in the cloud
  • Leverage cloud compute elasticity while keeping data on-premises
  • Eliminate the challenge of data movement from on-prem to cloud

Page 30

BlueData and Dell Partnership

• Joint solution for Big-Data-as-a-Service

• BlueData = Certified Dell Technology Partner

• Installed, tested, validated on Dell hardware

• Featured in Dell’s Global Customer Solution Centers

Page 31

Kris [email protected]/bigdata

Tom [email protected]

Q & A