cwin17 frankfurt / cloudera

Connected Services Stefan Lipp/Jochen Faltermeier CWIN 2017 - Frankfurt

Upload: capgemini

Post on 21-Jan-2018


TRANSCRIPT

1© Cloudera, Inc. All rights reserved.

Connected Services

Stefan Lipp / Jochen Faltermeier
CWIN 2017 - Frankfurt

Cloudera at-a-glance

Customer success: large enterprises fueling growth
• 48% customer growth (last 4 years)
• 140%+ net expansion (Global 8000 customers)
• Expansion driven by data and new use cases

Open partner network: best of breed solutions
• 3000+ partners
• Vast ecosystem of solution & service providers

First to market: open source innovation
• Founded 2008
• 1600+ Clouderans
• Global team doing business in 28 countries
• Big data innovators from Google, Yahoo and Oracle

The data-driven enterprise

Explosion of data and devices (IoT)
• 30B connected devices
• 440x more data

Transformation of IT infrastructure
• Open source
• Cloud
• Machine learning

$200B total market (1)

(1) IDC Worldwide Big Data and Business Analytics Market Through 2020

We believe data can make what is impossible today, possible tomorrow

We empower people to transform complex data into clear and actionable insights

• DRIVE CUSTOMER INSIGHTS
• CONNECT PRODUCTS & SERVICES (IoT)
• PROTECT BUSINESS

We deliver the modern platform for machine learning and analytics optimized for the cloud

• RUNS ANYWHERE: Cloud, Multi-cloud, On-premises
• SCALABLE: Elastic, Cost-effective, Lower TCO
• ENTERPRISE GRADE: Secure, Performant, Compliant

Cloudera powering data-driven customers

• DRIVE CUSTOMER INSIGHTS: Delivering greater value through improved customer understanding
• CONNECT PRODUCTS & SERVICES (IoT): Powering predictive analytics to increase performance and reduce fleet downtime
• PROTECT BUSINESS: Creating new revenue streams with an advanced anti-fraud solution

Introduction

Navistar is a leading manufacturer of commercial trucks, buses, defense vehicles and engines. Since 1831, our history has been interwoven with some of the most defining moments in world history. Whether it was America's westward expansion or WWII, we were there, pushing the limits of what's possible and driving history forward. But that doesn't mean we're stuck in the past. We're determined to keep delivering smart, sustainable technologies - because we believe that innovation defines America's future, too.

The Data Challenge & Pre-Hadoop Challenge

In late 2013, Navistar launched OnCommand™ Connection, part of the OnCommand™ family of fleet management services from Navistar.

OnCommand™ Connection takes data feeds from telematics service providers and combines them with meteorological, geographical, engineering, vehicle usage, traffic, historical warranty, service and part inventory data to provide:

• Real-time vehicle performance data streamlined within a single portal
• Service advisories and scheduling before problems occur
• Optimized service plans and part delivery to the nearest dealer when problems do occur

We now actively monitor more than 300,000 vehicles and are adding to that total daily.

Using Predictive Maintenance to Improve Performance and Reduce Fleet Downtime

• OnCommand Connection is collecting telematics and geolocation data across the fleet

• Reduced maintenance costs to $0.03 per mile from $0.12-$0.15 per mile

• Centralizing data from 13 systems with varying frequency and semantic definitions

• Real-time visibility of ca. 300,000 trucks in order to improve uptime and vehicle performance

MANUFACTURING » SERVICE IMPROVEMENT » PREDICTIVE ANALYTICS » PROCESS IMPROVEMENT
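
As a quick check of the maintenance-cost figures on this slide, the following Python sketch computes the relative saving implied by moving from $0.12-$0.15 to $0.03 per mile. The per-mile costs come from the slide; the annual mileage per truck is a hypothetical value chosen only for illustration and does not appear in the presentation.

```python
def relative_saving(before: float, after: float) -> float:
    """Fraction of the original per-mile cost that is saved."""
    return (before - after) / before


COST_AFTER = 0.03                  # USD per mile, from the slide
COST_BEFORE_RANGE = (0.12, 0.15)   # USD per mile, from the slide

# Hypothetical assumption, NOT from the presentation: miles driven per truck per year.
ASSUMED_MILES_PER_TRUCK_PER_YEAR = 100_000


def annual_saving_per_truck(before: float, after: float,
                            miles: int = ASSUMED_MILES_PER_TRUCK_PER_YEAR) -> float:
    """Absolute yearly saving in USD for one truck under the assumed mileage."""
    return (before - after) * miles


# Relative saving at the low and high end of the original cost range.
savings = [relative_saving(before, COST_AFTER) for before in COST_BEFORE_RANGE]

for before, saving in zip(COST_BEFORE_RANGE, savings):
    print(f"from ${before:.2f} to ${COST_AFTER:.2f} per mile: {saving:.0%} lower")
```

Under these figures the per-mile maintenance cost drops by roughly three quarters to four fifths; the absolute saving per truck depends entirely on the assumed mileage.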

Benefits & Impact

Quantifying Hadoop’s impact:
• By having literally all of our data in one place, we can perform analytics on an ad-hoc basis. Historically, simple questions required months to answer as we built out subject areas and transformed data.
• Our “Publish” Cluster brings the data to the consumer, and it is certified.
• We have reduced not only hard-dollar spending on proprietary hardware and expensive disk solutions, but also soft-dollar costs through faster delivery of answers.
• We can evaluate “what if” scenarios without the risk of impacting production processes.
• We can evaluate billions of rows of data and deliver answers in hours, not weeks.

Data/Software > Analytics > Automation > AI is eating the world

“the innovation foodchain” Marc Andreessen

Navistar IR Deck – H1 2017

− Connected services to reduce maintenance cost and improve vehicle uptime
− Advanced driver assistance systems and platooning to improve fuel efficiency and safety
− Automated record-keeping to enhance driver productivity

CASE STUDY
Connected Car Telematics for Insurance

#1 Telematics provider with 130 billion miles of driving data collected from black boxes in connected cars

Challenge:
• Drive analytics on 12 million miles of driving data collected every hour

Solution:
• Telematics solution based on Cloudera to process data from black boxes
• Analytics around driving behavior, risks, location, braking patterns, contextual elements and crash information
• Provide Usage Based Insurance services

TELEMATICS » CONNECTED VEHICLES » INSURANCE TELEMATICS » PREDICTIVE ANALYTICS

DATA-DRIVEN PROCESS
IOT & Connected Products

The IoT Ecosystem & Architecture

Connected Things
• Analytics at the edge
• For immediate response

IoT Gateway
• Edge-Processing
• Edge-Analytics

IoT Data Storage, Processing & Analytics (Data Center, Cloud)

Centralized IoT Analytics
• Time Series Data, Trends
• Machine Learning
• Context Enrichment
• Deeper business insights

Distributed Data Processing & Analytics
• Cloud & On-Premise

Enterprise Data Sources

Combining sensor data with contextual data is the key to value creation from IoT
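
The split between edge processing and central context enrichment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Reading` type, the coolant-temperature field, the alert threshold and the `VEHICLE_CONTEXT` lookup are all hypothetical names and values, not part of any Cloudera product or of the presentation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Reading:
    vehicle_id: str
    coolant_temp_c: float


# Hypothetical enterprise context keyed by vehicle ID (e.g. from service systems).
VEHICLE_CONTEXT: Dict[str, Dict[str, str]] = {
    "truck-1": {"model": "Model A", "nearest_dealer": "Dealer North"},
    "truck-2": {"model": "Model B", "nearest_dealer": "Dealer South"},
}

ALERT_THRESHOLD_C = 110.0  # assumed threshold, for illustration only


def edge_summarize(readings: List[Reading]) -> List[dict]:
    """Gateway-side step: reduce raw readings to one summary per vehicle
    and flag values that need an immediate response."""
    by_vehicle: Dict[str, List[float]] = {}
    for reading in readings:
        by_vehicle.setdefault(reading.vehicle_id, []).append(reading.coolant_temp_c)
    return [
        {
            "vehicle_id": vehicle_id,
            "mean_temp_c": mean(temps),
            "max_temp_c": max(temps),
            "alert": max(temps) > ALERT_THRESHOLD_C,
        }
        for vehicle_id, temps in sorted(by_vehicle.items())
    ]


def enrich(summary: dict, context: Dict[str, Dict[str, str]] = VEHICLE_CONTEXT) -> dict:
    """Central step: join a sensor summary with contextual enterprise data.
    Vehicles without context keep their sensor fields only."""
    return {**summary, **context.get(summary["vehicle_id"], {})}


readings = [
    Reading("truck-1", 92.0), Reading("truck-1", 96.0),
    Reading("truck-2", 105.0), Reading("truck-2", 115.0),
]
enriched = [enrich(summary) for summary in edge_summarize(readings)]
for row in enriched:
    print(row)
```

The gateway forwards compact summaries and alerts instead of every raw reading, while the join with dealer and model data happens centrally, where the enterprise data sources are available.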

The Cloudera Platform for IoT – Data Mgmt. Value Chain

Data Sources > Data Ingest > Data Storage & Processing > Serving, Analytics & Machine Learning

Data sources
• Connected Things / Data Sources
• Structured Data Sources

ENTERPRISE DATA HUB
• Apache Kafka: Stream or batch ingestion of IoT data
• Apache Sqoop: Ingestion of data from relational sources
• Apache Hadoop: Storage (HDFS) & deep batch processing
• Apache Kudu: Storage & serving for fast changing data
• Apache HBase: NoSQL data store for real time applications
• Apache Impala: MPP SQL for fast analytics
• Cloudera Search: Real time search
• Apache Spark: Stream & iterative processing, ML

Security, Scalability & Easy Management

Deployment Flexibility: Datacenter, Cloud

Cloudera for IoT – Key Innovations / Differentiators

Kudu: Real-Time Analytics
Ideal for real-time analytics on IoT and time series data. Simplifies Lambda architectures for running real-time analytics on streaming data

Shared Data Experience (SDX)
Preserve business flexibility and data portability and minimize cloud lock-in by running in any one of the three major public cloud providers or in private cloud

Data Science Workbench
Collaborative hub for enterprise data science and an integrated development environment for running Python, R, & Scala with support for Spark

Kudu – Fast Analytics on Fast Data
Real-time use cases that fall between HDFS and HBase were difficult to manage

[Figure: storage options plotted by pace of data (unchanging to fast changing, frequent updates) against pace of analysis (append-only to real-time). Boxes: HDFS: fast scans, analytics and processing of stored data. HBase: fast on-line updates & data serving. Arbitrary storage (active archive). Fast analytics (on fast-changing or frequently-updated data). Further labels: Analytic Gap, Complex Hybrid Architectures.]
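
To make the two access patterns on this slide concrete, here is a toy, in-memory Python sketch of a table that accepts keyed upserts (the fast on-line update side) and also supports range scans for analytics. It is explicitly not the Kudu client API; `ToyMutableTable` and its methods are hypothetical names used only to illustrate why having both operations on one store removes the need to copy data between two systems.

```python
from typing import Dict, Iterator, Tuple


class ToyMutableTable:
    """In-memory stand-in for a store that supports keyed upserts AND scans.
    Illustration only; this is not how Kudu is implemented or accessed."""

    def __init__(self) -> None:
        # Primary key is (device_id, timestamp); the value is a single measurement.
        self._rows: Dict[Tuple[str, int], float] = {}

    def upsert(self, device_id: str, ts: int, value: float) -> None:
        """Insert a new row or overwrite the existing row with the same key."""
        self._rows[(device_id, ts)] = value

    def scan(self, device_id: str, start_ts: int, end_ts: int) -> Iterator[Tuple[int, float]]:
        """Yield (timestamp, value) for one device in [start_ts, end_ts), in key order."""
        for (dev, ts), value in sorted(self._rows.items()):
            if dev == device_id and start_ts <= ts < end_ts:
                yield ts, value


table = ToyMutableTable()
table.upsert("sensor-1", 1, 20.0)
table.upsert("sensor-1", 2, 21.0)
table.upsert("sensor-2", 1, 50.0)
table.upsert("sensor-1", 2, 22.0)  # a late correction overwrites the row in place

# The analytic scan immediately sees the corrected value.
values = [value for _, value in table.scan("sensor-1", 0, 10)]
average = sum(values) / len(values)
print(values, average)
```

In an append-only store the correction would arrive as a second record that the analytic query has to reconcile; in an update-oriented store the scan would typically be the slow part. The sketch shows only the combined behavior, not how a real engine achieves it efficiently.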

Cloudera Enterprise
The modern platform for machine learning and analytics optimized for the cloud

• CORE SERVICES: DATA ENGINEERING, OPERATIONAL DATABASE, ANALYTIC DATABASE, DATA SCIENCE
• EXTENSIBLE SERVICES
• SHARED DATA EXPERIENCE: DATA CATALOG, INGEST & REPLICATION, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT
• SHARED STORAGE: S3 | ADLS | HDFS | KUDU

Open platform services
Built for multi-function analytics | Optimized for cloud

SHARED DATA EXPERIENCE

• Unified security – protects sensitive data with consistent controls, even for transient and recurring workloads

• Consistent governance – enables secure self-service access to all relevant data and increases compliance

• Easy workload management – increases user productivity and boosts job predictability

• Flexible ingest and replication – aggregates a single copy of all data, provides disaster recovery, and eases migration

• Shared catalog – defines and preserves structure and business context of data for new applications and partner solutions

Support the complete data science workflow
From data to exploration to action

Phases: Data Engineering > Data Science > Deployment

Steps shown in the diagram: Acquisition, Curation, Processing, Data Governance, Data Wrangling, Visualization and Analysis, Model Training & Testing, Batch Scoring, Online Scoring, Serving, Reports, Dashboards

Shared: Data, Operations, Governance, Security, Metadata

Dev: Collaboration, Version Control
Ops: Deployment, Scheduling, Orchestration

Cloudera Data Science Workbench
Self-service data science for the enterprise

Accelerates data science from development to production with:

● Secure self-service data access
● On-demand compute
● Support for Python, R, and Scala
● Project dependency isolation for multiple library versions
● Workflow automation, version control, collaboration and sharing

A modern data science architecture

● Built on Docker and Kubernetes
● Runs on dedicated gateway nodes
● User sessions run in isolated “engine” containers which:
  ○ Host Kerberos-authenticated Python/R/Scala runtimes
  ○ Interact with Spark via YARN client mode (Driver runs in container, workers on CDH)
● Single-cluster only (for now)

[Diagram labels: Cloudera Manager; gateway nodes running CDH and CDSW (one Master, multiple Engine containers); CDH nodes running CDH (Hive, HDFS, ...)]

Accelerated deep learning on-demand with GPUs
Multi-tenant GPU support on-premises or cloud

“Our data scientists want GPUs, but we can’t find a way to deliver multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”

● Extend existing CDSW benefits to GPU-optimized deep learning tools
● Schedule & share GPU resources
● Train on GPUs, deploy on CPUs
● Works on-premises or cloud

[Diagram labels: Data Science Workbench (GPU, CPU): single-node training; CDH (CPU): distributed training, scoring]


Open Ecosystem Black Box

An open ecosystem for agility and innovation

Run anywhere. Deploy any way.

• Simple: Simplifies operations, Works with your tools
• Unified: Hybrid or multi cloud, Platform-as-a-Service
• Enterprise: Proven at scale, Trusted security

Real-time Analytics or Operational Analytics?

my definition

“apply logic and mathematics real-time on data to improve operations”

Model, Analyze, Repeat

# Aggregate relational, NoSQL, structured & unstructured data
# Accelerate data science from exploration to production using R, Python, Spark and more
# Deploy pipelines and models on-premise or in the cloud

Seeking Abnormal Behavior

# Serve real-time data at scale for real-time decision making
# Stream processing & analytics on changing operational data
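
One simple way to read "apply logic and mathematics real-time on data" when seeking abnormal behavior is a rolling z-score over a stream of operational readings. The Python sketch below is a minimal illustration of that idea, not the method used in the presentation; the window size, the threshold and the sample readings are assumptions chosen for the example.

```python
from collections import deque
from math import sqrt
from typing import Deque, Iterable, List, Tuple


def detect_anomalies(stream: Iterable[float], window: int = 5,
                     threshold: float = 3.0) -> List[Tuple[int, float]]:
    """Flag values that deviate from the mean of the previous `window` values
    by more than `threshold` standard deviations.

    Returns a list of (index, value) pairs. Values are only scored once a
    full window of history is available.
    """
    history: Deque[float] = deque(maxlen=window)
    anomalies: List[Tuple[int, float]] = []
    for index, value in enumerate(stream):
        if len(history) == window:
            mu = sum(history) / window
            sigma = sqrt(sum((x - mu) ** 2 for x in history) / window)
            # Skip scoring when the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append((index, value))
        history.append(value)  # the deque drops the oldest value automatically
    return anomalies


# Hypothetical sensor readings with one obvious spike.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0, 9.8]
print(detect_anomalies(readings))
```

Because the detector keeps only a fixed-size window, it processes each reading in constant memory, which is what makes this style of check suitable for stream processing. Note that a flagged spike enters the history afterwards and temporarily inflates the standard deviation, so a production detector would need a policy for handling that.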

Is it even worth it?

HW > Data/Software > Analytics > Automation > AI/ML
Technology food chain, from “Digital or Dead”