Post on 09-Jan-2017

#HadoopSummit

The Time Has Come for Big-Data-as-a-Service

Kris Applegate – Cloud and Big Data Solution Architect, Dell

Tom Phelan – Co-Founder and Chief Architect, BlueData

Agenda

• A Brief History of Hadoop
• Data Storage and Networking Evolution
• The Virtualization Revolution
• Rise of Big-Data-as-a-Service
• Big-Data-as-a-Service (BDaaS) Defined
• BDaaS – Public Cloud or On-Premises?
• Q & A

A Brief History of Hadoop

In the Beginning (circa 2003) …

• Networks were slow (1 Gigabit per second maximum)
• Siloed storage was expensive (proprietary and often required special hardware)
• Local HDDs were cheap and fast enough for big data needs

Source: http://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf

Bringing the Compute to the Data

Compute Storage

Co-Locate Compute & Storage

Hadoop and HDFS are Born

Network Improvements

Data Compression Options in HDFS

Source: www.slideshare.net/Hadoop_Summit/singh-kamat-june27425pmroom210c

Result: Is Disk-Locality Irrelevant?

Source: https://amplab.cs.berkeley.edu/wp-content/uploads/2011/06/disk-irrelevant_hotos2011.pdf

“Less relevant” may be more accurate:

• Faster data center networks
• Distributed/non-distributed caching platforms
  • Example: Alluxio (Tachyon)
• Compute and storage separation
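A rough way to see why faster networks weaken the disk-locality argument is to compare ideal transfer times at the link speeds mentioned in this deck (1, 10, and 40 Gbit/s). The sketch below is only back-of-the-envelope arithmetic: the 1 TB dataset size and the function name `transfer_seconds` are assumptions made for illustration, and the calculation ignores protocol overhead, contention, and disk throughput.

```python
# Back-of-the-envelope sketch: ideal time to move a dataset over one link.
# DATASET_TB is a hypothetical size chosen for illustration, not a figure
# from the slides. The link speeds are the ones the deck mentions.
DATASET_TB = 1.0
LINK_SPEEDS_GBIT = [1, 10, 40]


def transfer_seconds(size_tb: float, link_gbit_per_s: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and contention."""
    bits = size_tb * 1e12 * 8  # decimal terabytes converted to bits
    return bits / (link_gbit_per_s * 1e9)


for speed in LINK_SPEEDS_GBIT:
    minutes = transfer_seconds(DATASET_TB, speed) / 60
    print(f"{speed} Gbit/s: {minutes:.1f} minutes")
```

Under these assumptions the same terabyte that needs more than two hours at 1 Gbit/s moves in a few minutes at 40 Gbit/s, which is the intuition behind "less relevant" rather than "irrelevant".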

BDaaS and Cloud

• Virtualization / “cloud” technology is not absolutely required
• But realistically … the flexibility and elasticity of BDaaS cannot be economically provided without these underlying technologies

The Virtualization Revolution

VMware

KVM

Docker

Hyper-V

LXC

The Virtualization Revolution

Virtualization enabled several key benefits including:

• Automation, flexibility, elasticity
• Cost reduction and consolidation
• Higher utilization, less hardware overprovisioning
• Multi-tenancy
• Security
• VxLAN
• Fault isolation

The Virtualization Revolution

But … the overhead involved in virtualizing storage and networking within a hypervisor makes it difficult to meet the performance needs of Big Data workloads (SLAs, QoS)

The Virtualization Revolution

• Linux Containers
• OS virtualization reduces CPU, memory, network, and storage virtualization overhead
• Docker file format makes containers easy to use and share

Rise of Big-Data-as-a-Service

Big Data New Realities

Big Data Traditional Assumptions
• Bare-metal
• Disk-locality
• HDFS on local disks

Big Data New Realities
• Containers
• Compute and storage separation
• In-place access on remote data stores

New Benefits and Value
• Big-Data-as-a-Service
• Agility and cost savings
• Faster time-to-insights

Journey to BDaaS

2002 Initial release of VMware ESX
2003 Google paper
2004 Big Data era begins
2007 Hadoop release 0.14.1
2008 Initial release of Linux containers
2009 Dell DCS delivers first Big Data server
2009 Amazon launches EMR
2011 Dell first to launch optimized Apache Hadoop solution
2012 Hadoop 1.0.2 Snappy Compression
2012 10 Gbit networking in data center
2013 Dell Hadoop Performance Analysis
2013 Initial release of Docker
2014 VxLANs available
2014 BlueData wins Strata + Hadoop World Showcase
2015 BlueData EPIC 2.0 with Docker
2015 40 Gb networking in data center
2016 BDaaS available on-prem or cloud

BDaaS – The Time Has Come

All the pieces are now available:

• Fast network hardware and good data compression
• Compute and storage separation
• Low overhead virtualization (containers)
• Ability to run network and storage-intensive workloads
• No sacrifice in performance
• Demand from end users for agility, flexibility, & speed

Big-Data-as-a-Service Defined

“A mechanism for the delivery of statistical analysis tools and information that helps organizations understand and use insights gained from large information sets in order to gain a competitive advantage.”

On-Demand, Self-Service, Elastic

Big Data Infrastructure, Applications, Analytics

Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification

Four Types of BDaaS

• Core BDaaS
• Performance BDaaS
• Feature BDaaS
• Integrated BDaaS

Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification

Four Types of BDaaS

Core BDaaS
• Minimal platform, such as Hadoop with YARN

Performance BDaaS
• “Downwards” vertical integration
• Includes optimized infrastructure
• Tight integration with Core BDaaS

Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification

Four Types of BDaaS

Feature BDaaS
• “Upwards” vertical integration
• Includes features beyond Hadoop
• Support for multiple Core BDaaS providers

Integrated BDaaS
• Full vertical integration and optimization
• Includes both Performance BDaaS & Feature BDaaS

Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification

BDaaS – Public Cloud or On-Prem?

BDaaS – Public Cloud or On-Prem

Public Cloud
• Low Capex, high Opex
• “Infinite” expandability
• Less secure?
• Less control: software, SLAs, configs, etc.

On-Premises (Private Cloud)
• High Capex, low Opex
• Eventually reach resource limit
• More secure?
• More control: software, SLAs, configs, etc.

BDaaS – Workload Portability

Challenge: Public cloud services can be proprietary

Goal: Deliver API-compatible on-prem + public cloud

• BDaaS layer (e.g. BlueData)
• PaaS layer (e.g. Cloudforms, Cloud Foundry)
• API-compatible private cloud (e.g. Microsoft Azure Pack/Stack, OpenStack, VMware)

BDaaS – Public Cloud

Example Public Cloud Use Cases

• Workloads with a shorter life than 16 months* (e.g. Dev/Test)
• When data is in the cloud too
• Public-facing services

* www.dell.com/learn/us/en/555/business~solutions~whitepapers~en/documents~microsoft-private-cloud-tco-0914.pdf
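The 16-month threshold can be read as a break-even point between the two cost profiles described earlier: high Capex with low Opex on-premises versus low Capex with high Opex in the public cloud. The sketch below shows that arithmetic only. All dollar figures are hypothetical and were picked so the result lands on 16 months; they are not taken from the cited Dell whitepaper, and the function name `break_even_month` is likewise an assumption.

```python
def break_even_month(capex: float,
                     onprem_opex_per_month: float,
                     cloud_cost_per_month: float) -> float:
    """Months until cumulative on-prem cost equals cumulative cloud cost.

    Simplified model: on-prem pays `capex` up front plus a monthly Opex,
    while the public cloud pays only a (higher) monthly charge.
    """
    monthly_saving = cloud_cost_per_month - onprem_opex_per_month
    if monthly_saving <= 0:
        raise ValueError("on-prem never breaks even unless its monthly cost is lower")
    return capex / monthly_saving


# Hypothetical figures, chosen only to illustrate a 16-month break-even.
months = break_even_month(capex=160_000,
                          onprem_opex_per_month=5_000,
                          cloud_cost_per_month=15_000)
print(months)  # prints 16.0
```

In this simplified model, a workload that lives for less than the break-even period is cheaper in the public cloud, and a longer-lived one is cheaper on-premises.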

BDaaS – On-Premises / Private Cloud

Example On-Prem Use Cases

• High performance clusters
• Data security
• Data compliance
• Persistent clusters with > 16 month lifespan*
• High capacity clusters
• When SLAs are needed

* The BlueData EPIC software platform addresses this potential limitation

BlueData EPIC – Integrated BDaaS

• BDaaS software platform, using Docker containers
• Self-service, on-demand Hadoop / Spark clusters
• Bring your own application / distribution / version
• Compute and storage separation
  • Scale resources independently
  • Clusters with < 16 month lifespan well supported (e.g. transient)
  • No HDFS data ingestion penalty
• Secure multi-tenancy, Quality of Service (QoS)

Big Data On-Premises

Traditional Big Data On-Prem (diagram labels: IT; Manufacturing, Sales, R&D, Services)
• < 30% Utilization
• Duplication of data
• Management complexity
• Weeks to build each cluster
• Complex, painful upgrades

BDaaS On-Prem with BlueData (diagram labels: BlueData EPIC Software Platform; Manufacturing, Sales, R&D, Services; BI/Analytics Tools)
• > 90% Utilization
• No Duplication of Data
• Simplified Management
• Multi-Tenant
• Simple, instant upgrades
• Self-service, on-demand clusters

NEW – BDaaS On-Prem and Cloud

• BlueData announced AWS and multi-cloud strategy
  • Extending the user experience and value of BlueData to public cloud
  • Single pane of glass for on-prem and off-prem Big Data workloads
  • Initial AWS support; then MS Azure, Google Cloud Platform, others
• Support for data on-prem and compute in the cloud
  • Leverage cloud compute elasticity while keeping data on-premises
  • Eliminate challenge of data movement from on-prem to cloud

BlueData and Dell Partnership

• Joint solution for Big-Data-as-a-Service

• BlueData = Certified Dell Technology Partner

• Installed, tested, validated on Dell hardware

• Featured in Dell’s Global Customer Solution Centers

Q & A

Kris Applegate
kris_applegate@dell.com
www.dell.com/bigdata

Tom Phelan
tap@bluedata.com
www.bluedata.com
