hadoop everywhere

© Copyright 2015 EMC Corporation. All rights reserved.

Upload: dataworks-summithadoop-summit

Post on 16-Apr-2017

TRANSCRIPT

Hadoop Everywhere

Geo-Distributed Storage for Big Data

Good afternoon everyone, and welcome to the session on geo-distributed storage for big data. I hope you have enjoyed the conference so far.

I would like to start our session today with a couple of show-of-hands questions:
How many of you have clusters in more than one region/geo?
How many of you use distcp today?
Do you use other tools apart from distcp?

About us

Nikhil Joshi

Consultant Product [email protected] @nikhilj0shi

Vishrut Shah

Director of [email protected]


Cloud apps and their data are global

Paris

Denver

Beijing

But analytics is confined to a single datacenter

1. Today's applications generate and consume data from all across the globe, but Hadoop has mostly remained confined to a single datacenter.
2. Hadoop has grown and matured around HDFS as its filesystem. HDFS did a great job of providing a petabyte-scale, reliable storage system built on commodity hardware.
3. Its storage layer has yet to evolve into a global filesystem with characteristics like geo-access, active-active access, and selective replication.
4. Most vendor tools have relied on distcp-based mechanisms.

New-age applications such as mobile, social, and IoT apps create and consume data across the globe. This gives rise to multiple apps creating silos of data that are accessed via different protocols such as S3, NFS, and Swift, depending on the application. This data is then copied into other silos specifically for analytics purposes. This leads to overall operational complexity as well as higher storage costs.
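To make the distcp-based copying mentioned in these notes concrete, here is a minimal sketch that assembles a one-way `hadoop distcp` invocation between two clusters. The NameNode host names, port, and paths are hypothetical placeholders, and the helper `build_distcp_command` is an illustration only, not part of any tool discussed in the talk.

```python
def build_distcp_command(source_uri, target_uri, update=True):
    """Return the argument list for a one-way `hadoop distcp` copy."""
    argv = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ at the target
        argv.append("-update")
    argv += [source_uri, target_uri]
    return argv


# Hypothetical cluster addresses and paths, for illustration only
cmd = build_distcp_command(
    "hdfs://nn-paris.example.com:8020/data/events",
    "hdfs://nn-denver.example.com:8020/data/events",
)
print(" ".join(cmd))
```

Each such job produces a second full copy of the data at the target, which is the silo-and-copy pattern the notes describe.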


Beyond your first Hadoop cluster

What we heard from a large European bank

Workflow
Real-time analytics on transactional system
Push to risk-analytics infra via distcp
Push to a cold archive for compliance

Scale
3 datacenters, 7 tenants
200 GB a day. Expect 3x increase in 2 years
27 Hadoop clusters (test, dev, prod, DR)

Regain control from the shadow IT
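The scale figures on this slide can be turned into a rough capacity estimate. The sketch below assumes a replication factor of 3 (the HDFS default), decimal units (1 TB = 1000 GB), and a flat daily ingest rate over a year; none of these assumptions come from the bank itself, and the function names are made up for illustration.

```python
def projected_daily_ingest_gb(current_gb_per_day, growth_factor):
    """Daily ingest after the expected growth has taken place."""
    return current_gb_per_day * growth_factor


def yearly_raw_capacity_tb(gb_per_day, replication_factor=3, days=365):
    """Raw storage consumed by one year of ingest, in TB (1 TB = 1000 GB)."""
    return gb_per_day * days * replication_factor / 1000


today_gb = 200                                     # from the slide
future_gb = projected_daily_ingest_gb(today_gb, 3)  # 3x increase in 2 years

print(yearly_raw_capacity_tb(today_gb))   # 219.0
print(yearly_raw_capacity_tb(future_gb))  # 657.0
```

These numbers cover a single cluster. Every additional copy pushed to another cluster via distcp adds the same amount again.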

Geo-distributed analytics infrastructure

Polyglot
Hadoop-ready storage
3x replication
Exabyte scale
Multi-tenant
Large/small files
Geo-accessible
WAN-efficient
Active-active
Strong consistency

Needs to address silos, underutilization, runaway replication

[Animated]

Spend a lot of time on this slide. It frames the desirable characteristics of a general geo-Hadoop platform. Stress active-active access and strong consistency.

Not ECS-specific yet.

ECS: A Thoroughbred geo-platform

Global Namespace

Denver
Beijing
Paris

Storage efficiency
Geo caching
Strong consistency
Active-active w/ failover

Geo-distributed data
Apps

Reiterate the concept from the previous slide. Mention WAN distances.

ECS as a Hadoop storage backend

HCFS*
Vendor agnostic
Primary filesystem
Geo-distributed
Kerberos enabled
Apache Hadoop 2.7
Ambari compatible
Multi-tenant

Spend time here. Hopefully, HCFS has already been explained before.
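In case HCFS (Hadoop Compatible File System) needs a quick illustration: Hadoop picks a filesystem implementation from the URI scheme, using configuration entries of the form `fs.<scheme>.impl`. The sketch below is a simplified conceptual model of that lookup, not Hadoop's actual code. The `geofs` scheme and its class name are hypothetical and do not correspond to any real product's settings.

```python
from urllib.parse import urlparse

# Simplified stand-in for Hadoop configuration entries (fs.<scheme>.impl)
CONFIG = {
    "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem",
    # Hypothetical scheme and class name, for illustration only
    "fs.geofs.impl": "com.example.geofs.GeoFileSystem",
}


def resolve_filesystem_class(uri, config):
    """Return the implementation class name configured for the URI's scheme."""
    scheme = urlparse(uri).scheme
    if not scheme:
        raise ValueError(f"URI has no scheme: {uri!r}")
    key = f"fs.{scheme}.impl"
    if key not in config:
        raise LookupError(f"no filesystem registered for scheme {scheme!r}")
    return config[key]


print(resolve_filesystem_class("hdfs://nn.example.com:8020/data", CONFIG))
print(resolve_filesystem_class("geofs://bucket.example/data", CONFIG))
```

Because jobs address storage only through the URI, swapping the storage backend is a configuration change and leaves the compute layer untouched.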

Shared storage

Hadoop compute cluster
Shared storage backend

HCFS client library

Client library

HaaS

Tenant 1
Tenant 2
Tenant 3

Common configurations

Hadoop as a Service
Analysts
Site
Compute grid
Independently scalable primary storage to run Hadoop analytics

Disaster recovery
Geo-distribution
Site 1
Site 2
Compute grid
Archive/failover tier to Hadoop

Multi-site analytics
Geo-distribution
Site 1
Site 2
Compute grid
Compute grid
Distributed storage to run in-place Hadoop analytics across sites

Hortonworks certified

HDP 2.3 ECS 2.2
Certified and validated
Ambari 2.2
Joint support
Collateral
Training

Enough talk. Show me the demo.

Vishrut can take over from here.

I will try to do a humorous handoff, something like: "Now my technical colleague will show you that this is not all smoke and mirrors. He has also graciously agreed to field all the hard questions."

More about the partnership

[Animated] Point the audience to material available for further reading about this partnership.

Ambari-ECS walkthrough

Yet to decide whether we want to move this inside our video demo or do it via a screenshot-based walkthrough.

The use case

Weave a story (connected car, network intrusion detection in an enterprise datacenter, oil and gas) to explain who would need an infrastructure like the one we are about to describe.

Architecture

Pretty architecture diagram explaining everything that will happen as part of the demo.

Global infrastructure for Hadoop

ECS analytics use cases

Hadoop as a Service
Analysts
Site
Compute grid
Independently scalable primary storage to run Hadoop analytics

Disaster recovery
Geo-distribution
Site 1
Site 2
Compute grid
Archive/failover tier to Hadoop

Multi-site analytics
Geo-distribution
Site 1
Site 2
Compute grid
Compute grid
Distributed storage to run in-place Hadoop analytics across sites

Low resource utilization and silos

Test/Dev: MR/Hive/Pig, YARN, HDFS
Production: MR/Hive/Pig, YARN, HDFS
DR/Backup: MR/Hive/Pig, YARN, HDFS

Global Hadoop storage

[Animated] Simple animation to ensure that the audience understands that global Hadoop applies only to storage; apps (MR/HBase) are not geo-accessible. It also demonstrates that this solution replaces only the storage layer (HDFS).