20150314 sahara intro and the future plan for open stack meetup

Big Data Technologies Sahara Intro & Future Plan Weiting Chen [email protected]

Upload: wei-ting-chen

Post on 17-Aug-2015

SSG / STO / BDT

Legal Disclaimers

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

© 2015 Intel Corporation.

WHO WE ARE

Bring the Cloudera CDH 5.3 Plugin into OpenStack Sahara

Add all the services in Cloudera CDH 5.3 and integrate them into the Sahara CDH plugin.

Provide Complete Integration Tests for a Better User Experience

Complete integration testing in OpenStack Sahara helps deliver a good user experience with the Sahara CDH plugin.

Rank #3 Company by Commits in Sahara Contribution

Ranked after #1 Mirantis and #2 Red Hat.

OPENSTACK HISTORY

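Releases by year:

• 2010: Austin

• 2011: Bexar, Cactus, Diablo

• 2012: Essex, Folsom

• 2013: Grizzly, Havana

• 2014: Icehouse, Juno

• 2015: Kilo

Projects, in the order shown on the timeline: Nova, Swift, Glance, Horizon, Keystone, Quantum, Cinder, Ceilometer, Trove, Sahara, Ironic

Incubation:

• Zaqar

• Manila

• Designate

• Barbican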

Move Focus from IaaS to PaaS and SaaS

More and more applications (xxx-as-a-service) are built on OpenStack infrastructure.

FROM THE MARKET

Big Data: ~25.9% CAGR

The Big Data market is expected to grow from 16.5 billion (2014) to 41.5 billion (2018). This includes the cloud infrastructure segment, which grows from 1.2 billion (2014) to 4.7 billion (2018).

Cloud Market: 200 Billion

The cloud market will hit 118 billion in 2015 and 200 billion by 2018, up from the 95.8 billion reached in 2014.

X-as-a-Service: Trend

Cloud-based solutions will shape IT spending for years. IDC estimates that cloud services spending will continue to grow at double-digit rates for the next few years.

Source: IDC, 2014
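The ~25.9% figure can be checked from the two Big Data market endpoints quoted on this slide. This is a minimal sketch, assuming the standard compound annual growth rate formula and four compounding years between 2014 and 2018; the function name is illustrative and does not come from the slides.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Big Data market figures from the slide: 16.5 billion (2014) to 41.5 billion (2018).
big_data_cagr = cagr(16.5, 41.5, years=2018 - 2014)
print(f"Big Data market CAGR: {big_data_cagr:.1%}")  # prints 25.9%
```

The result agrees with the growth rate shown on the slide, so the two endpoint figures and the CAGR are mutually consistent.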

THE VISION

Internet of Things

Different data sources will come from a diversity of devices.

Big Data

Use data processing models to process the data and turn it into high-value information.

Cloud Computing

A shared-resource infrastructure that supports a flexible IT environment and fulfills requirements on demand.

OpenStack vs Hadoop

Most companies that run an OpenStack cluster in their IT environment are also preparing a separate Hadoop cluster for Big Data analytics.

Sahara is a solution to bring Hadoop and OpenStack together.

SAHARA BACKGROUND

Basic Idea comes from Amazon Elastic MapReduce (EMR)

Lets users easily provision Hadoop clusters by specifying a few parameters

Analytics as a Service for data scientists and analysts
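To make "specifying a few parameters" concrete, the sketch below builds the kind of description a user might hand to Sahara when asking for a cluster. It is an illustration only: the field names are modeled on my recollection of the Sahara v1.1 cluster-create request and should be checked against the API reference, the IDs are placeholders, and `build_cluster_request` is a hypothetical helper. Nothing is sent to any service.

```python
import json


def build_cluster_request(name, plugin_name, hadoop_version,
                          cluster_template_id, default_image_id,
                          user_keypair_id=None):
    """Build a dict describing the cluster to provision (nothing is sent)."""
    required = {
        "name": name,
        "plugin_name": plugin_name,
        "hadoop_version": hadoop_version,
        "cluster_template_id": cluster_template_id,
        "default_image_id": default_image_id,
    }
    # Reject empty or missing values before anything reaches the service.
    missing = [key for key, value in required.items() if not value]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(sorted(missing)))
    body = dict(required)
    if user_keypair_id:
        body["user_keypair_id"] = user_keypair_id
    return body


# All values below are placeholders chosen for illustration.
request_body = build_cluster_request(
    name="demo-cluster",
    plugin_name="vanilla",
    hadoop_version="2.6.0",
    cluster_template_id="TEMPLATE-ID-PLACEHOLDER",
    default_image_id="IMAGE-ID-PLACEHOLDER",
    user_keypair_id="demo-keypair",
)
print(json.dumps(request_body, indent=2, sort_keys=True))
```

The point of the sketch is the size of the input: a plugin, a Hadoop version, a template, and an image are enough to describe a whole cluster.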

ARCHITECTURE

Sahara Key Features - Provision Cluster

Create/Terminate Cluster

• Heat API/Nova Direct API

• Neutron/Nova Network

• Floating IP Management

• Anti-affinity

Cluster Scaling

• Add Node/Remove Node

Supported Plugins

• Vanilla/Hortonworks Data Platform/Cloudera/Spark/MapR
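The anti-affinity feature listed above keeps instances that run the same process off the same compute host. The sketch below illustrates the idea with a greedy first-fit placement. It is not how Sahara or Nova schedules instances; `place_instances`, the host names, and the process names are all hypothetical.

```python
from collections import defaultdict


def place_instances(instances, hosts, anti_affinity):
    """Place instances on hosts; processes in `anti_affinity` never share a host.

    `instances` maps an instance name to the set of processes it runs.
    Returns a dict of instance name -> host name.
    """
    used = defaultdict(set)  # host -> anti-affinity processes already placed there
    placement = {}
    for name, processes in sorted(instances.items()):
        guarded = set(processes) & set(anti_affinity)
        for host in hosts:
            # A host is acceptable if none of the guarded processes run there yet.
            if not (used[host] & guarded):
                placement[name] = host
                used[host] |= guarded
                break
        else:
            raise RuntimeError(
                f"no host left for {name} without breaking anti-affinity")
    return placement


cluster = {
    "worker-1": {"datanode", "nodemanager"},
    "worker-2": {"datanode", "nodemanager"},
    "master-1": {"namenode", "resourcemanager"},
}
placement = place_instances(
    cluster, ["host-a", "host-b"], anti_affinity={"datanode"})
print(placement)
```

With two hosts, the two workers end up on different hosts because both run the guarded process. A third worker would raise an error instead of sharing a host.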

Sahara Key Features - Elastic Data Processing

Supported Job Types

• Hive/Pig/MapReduce/MapReduce Streaming/Java/Spark/Shell/HBase

Support Data Locality

• Rack/Hypervisor/Swift

Data Source

• Internal: Ephemeral Disk/Cinder

• External: Swift

Run Job in Transient Cluster

*Different plugins provide different capabilities
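The features on this slide combine into a job description: a job type, an input and an output data source, and a flag for running in a transient cluster. The sketch below validates such a description. It assumes that Swift locations use a `swift://` URL and HDFS locations an `hdfs://` URL. The dictionary layout and the `describe_job` helper are hypothetical and are not Sahara's API.

```python
from urllib.parse import urlparse

# Job types as listed on the slide.
JOB_TYPES = {"Hive", "Pig", "MapReduce", "MapReduce Streaming",
             "Java", "Spark", "Shell", "HBase"}

# Assumed mapping from URL scheme to the slide's internal/external split.
DATA_SOURCE_SCHEMES = {"swift": "external", "hdfs": "internal"}


def describe_job(job_type, input_url, output_url, transient=False):
    """Validate a job description and return it as a plain dict."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type}")
    sources = {}
    for role, url in (("input", input_url), ("output", output_url)):
        scheme = urlparse(url).scheme
        if scheme not in DATA_SOURCE_SCHEMES:
            raise ValueError(f"unsupported {role} data source: {url}")
        sources[role] = {"url": url, "location": DATA_SOURCE_SCHEMES[scheme]}
    return {"type": job_type, "data_sources": sources,
            "transient_cluster": transient}


job = describe_job(
    "Pig",
    input_url="swift://demo-container/input",
    output_url="hdfs://namenode/user/demo/output",
    transient=True,
)
print(job)
```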

WORKING FLOW

Fast Cluster Provisioning

Select Hadoop Version → Select Base Image w/ Hadoop → Define Cluster Configuration → Provision Cluster → Operate Cluster → Terminate Cluster

• Provide the detailed Hadoop configuration, such as size, topology, and others

• Sahara will provision VMs, then install and configure Hadoop

• Supports scaling the cluster by adding/removing nodes

Analytics as a Service using Elastic Data Processing

Select Hadoop Version → Configure Jobs → Set Limit for Cluster → Execute Jobs → Get the Result

• Choose the type of the job: pig, hive, jar-file, etc.

• Select input and output data location (Swift support)

• The cluster will be removed automatically after job completion
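The Elastic Data Processing flow ends with the cluster being removed automatically once the jobs finish. The sketch below models that lifecycle with a context manager, so termination happens even when a job fails. It is a simulation with stand-in functions: `transient_cluster` and `run_jobs` are hypothetical names, and no Sahara call is made.

```python
from contextlib import contextmanager


@contextmanager
def transient_cluster(name, log):
    """Simulate a cluster that exists only while the jobs run."""
    log.append(f"provision {name}")
    try:
        yield name
    finally:
        # Runs on success and on failure, mirroring automatic removal.
        log.append(f"terminate {name}")


def run_jobs(cluster_name, jobs, log):
    """Run each job (a plain callable here) and collect its result."""
    results = {}
    for job_name, job in jobs.items():
        log.append(f"execute {job_name} on {cluster_name}")
        results[job_name] = job()
    return results


events = []
with transient_cluster("edp-demo", events) as cluster:
    results = run_jobs(cluster, {"wordcount": lambda: {"hello": 2}}, events)

print(events)
print(results)
```

The event log shows the three phases in order: provision, execute, terminate. The user only supplies the jobs and reads the results.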

CLOUDERA CDH PLUGIN

[Diagram: CDH plugin deployment. Labels recovered from the figure:]

• Nodes: Controller, Computing Node1

• Sahara components: Horizon (OpenStack Dashboard), Sahara Service, CDH Plugin, Cloudera Manager API Python Client (migrated from CM-API Client)

• VM1 - Master: Cloudera Manager (Cloudera Express v5.1.3, CDH v5.0.0 & CM API v7), Job History, Resource Manager, Oozie Server, Name Node, Secondary Name Node

• VM2 - Slave: Data Node, Node Manager

• Other labels: End Customer, CDH Cluster

Step 1: Create VMs via Heat by using a Cluster Template. CM must be included in one master machine.

Step 2: Use the CM API Client to connect to CM and provision the other services in the cluster.
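The two steps imply a check before deployment: the cluster template must place Cloudera Manager on exactly one machine, and every other service is then provisioned through it. The sketch below plans a deployment from a template under that reading. The template layout, the process names (taken from the diagram labels), and `plan_cdh_deployment` are illustrative assumptions. They are not the plugin's actual data model, and no Heat or Cloudera Manager call is made.

```python
MANAGER = "Cloudera Manager"


def plan_cdh_deployment(node_groups):
    """Return (manager_group, services_to_provision) for the two steps.

    `node_groups` maps a node group name to a dict with "count" and "processes".
    """
    manager_groups = [
        name for name, group in node_groups.items()
        if MANAGER in group["processes"]
    ]
    # Step 1 precondition: CM lives on exactly one machine.
    if len(manager_groups) != 1 or node_groups[manager_groups[0]]["count"] != 1:
        raise ValueError("exactly one instance must run " + MANAGER)
    # Step 2 input: every other service is provisioned through CM.
    services = sorted(
        {process
         for group in node_groups.values()
         for process in group["processes"]
         if process != MANAGER}
    )
    return manager_groups[0], services


template = {
    "master": {"count": 1,
               "processes": [MANAGER, "Name Node", "Secondary Name Node",
                             "Resource Manager", "Job History", "Oozie Server"]},
    "slave": {"count": 1, "processes": ["Data Node", "Node Manager"]},
}
manager_group, services = plan_cdh_deployment(template)
print("Step 1: create VMs; Cloudera Manager runs on", manager_group)
print("Step 2: provision via the CM API:", ", ".join(services))
```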

DATA PROCESSING MODEL

[Diagrams: in both patterns a Collector Agent collects data, and MapReduce runs on OpenStack virtual clusters. Other labels: Data Stream, HDFS, Cinder, Ephemeral Disk, Swift.]

Pattern 1: Internal - HDFS Only

OpenStack supports creating HDFS on Cinder or on Ephemeral Disk. Ephemeral Disk gives better data processing performance; Cinder persists the data, at lower performance.

Pattern 2: External - Swift

OpenStack uses Swift as a data source to store input and output data. The benefit is that the data can be processed directly and persisted via Swift.

Current Issue

~30% Performance Loss

We used Sahara with KVM to create a Hadoop cluster (HDFS on Ephemeral Disk) and compared it with a bare-metal Hadoop cluster on the same servers.

Different workloads (HiBench) may show different results.
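The slide does not say how the ~30% loss was measured. One common reading is the share of bare-metal throughput lost when the same job runs in VMs. The sketch below computes that from two runtimes, and both the definition and the numbers are assumptions: the runtimes are made-up placeholders, not measurements from this talk.

```python
def performance_loss(bare_metal_seconds: float, virtual_seconds: float) -> float:
    """Fraction of bare-metal throughput lost when the same job runs virtualized."""
    if bare_metal_seconds <= 0 or virtual_seconds <= 0:
        raise ValueError("runtimes must be positive")
    # Throughput is proportional to 1 / runtime for a fixed workload.
    return 1.0 - bare_metal_seconds / virtual_seconds


# Hypothetical runtimes for one workload, chosen only to illustrate the formula.
loss = performance_loss(bare_metal_seconds=700.0, virtual_seconds=1000.0)
print(f"Performance loss: {loss:.0%}")
```

Running the same calculation per workload shows how the loss can differ from one benchmark to the next.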

Beyond The Performance…

Performance may always be an issue when comparing a hypervisor with bare metal.

IT Integration

Sahara must provide an elastic platform that fulfills customers' requests and adopts big data infrastructure. Supporting more technologies helps Sahara integrate seamlessly into the customer's IT environment.

Workload-based EDP

EDP should provide a simple interface so that data scientists can focus on their own expertise without worrying about how to deploy clusters. Analytics-as-a-Service is a future trend.

MORE …

Bare Metal Support

• OpenStack Ironic

Docker Support

• Nova-docker driver, OpenStack Magnum

Support More Storage Backend

• OpenStack Manila, External HDFS

Support More Data Processing Models

• Hadoop, Spark, etc.

WHAT’S NEW IN KILO

• Vanilla plugin supports Hadoop v1.2.1 and Hadoop v2.6

• Spark Plugin

• Cloudera CDH Plugin

• MapR Plugin

• Storm Plugin

• New Horizon UI with New Guide Panel

• Default Template Support