
DICE: a Model-Driven DevOps Framework for Big Data
Giuliano Casale, Imperial College London

DICE Horizon 2020 Project, Grant Agreement no. 644869, http://www.dice-h2020.eu
Funded by the Horizon 2020 Framework Programme of the European Union

DevOps & Big data

• DevOps for Big data
  • DataOps: DevOps & data engineering
  • DevOps for data-intensive applications
• DevOps with Big data
  • Mining monitoring data for feedback/RCA
  • Novel uses of data science to address software engineering problems


DICE: DevOps for DIAs

• Horizon 2020 research project (4M€, 2015-18)
• 9 partners (academia & SMEs), 7 EU countries


Challenges

Mission: support SMEs in developing high-quality cloud-based data-intensive applications (DIAs)

1. Deliver a DevOps toolchain for DIAs
   • From requirements to CI/CD
   • Shared view of the application across roles
2. Increase automation of quality analysis during DIA development and operation
   • Analysis of efficiency, reliability, correctness
   • Architectural optimisation for DIAs


What do we mean by Quality?

o Reliability: availability, fault-tolerance
o Efficiency: performance, costs
o Correctness: privacy & security, temporal metrics

DIA dimensions


• Data and its properties: location, velocity, volume, variety, privacy
• Platforms: NoSQL, Hadoop, Spark, Storm
• Methods & tools: MDE, DevOps, delivery, cloud, integration, QA

Our case for model-driven DevOps

1. Model-driven orchestration
   o Big Data technologies are constantly evolving
2. QA through M2M transformation of requirements/SLAs/architectural information
   o Test (and stress test) generation
   o Simulation
   o Verification
   o Optimization (costs, SLAs, config)
   o Architectural anti-pattern detection


Eases QA skill shortage

DICE: Quality-Aware DevOps for Big Data

[Lifecycle: Design → Prototype → Deploy → Monitor → Enhance]


DICE Framework

[Architecture overview: the DICE IDE (UML profile, plugins), a UML-based MDE methodology, quality analysis tools, continuous delivery & testing, and feedback & iterative enhancement, supporting a data-intensive application (DIA) built on Big Data technologies and deployed on private/public cloud]

DICE Workflow - Dev


[Workflow: Requirements & Design → Transform to Formal Models → Simulate & Verify, Optimize & Anti-patterns → Enhance Architecture, within the DICE Eclipse IDE]

Why simulating DIAs?

o Early-stage scalability analysis
o Effective what-if analysis for in-memory workloads


Example: SAP HANA, simulation vs. response surfaces
[Figure: reported analysis times of 4 days / 2 days / 1 day vs. <1h]
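As a purely illustrative sketch of what such an early-stage what-if study might look like (a textbook M/M/c queueing approximation standing in for the actual DICE simulation tooling; the arrival rates, service rate, and 0.5 s target are made-up numbers):

    from math import factorial

    def mmc_response_time(lam, mu, c):
        # Mean response time of an M/M/c queue (Erlang C):
        # lam = arrival rate, mu = service rate per worker, c = number of workers.
        a = lam / mu                              # offered load
        rho = a / c                               # per-worker utilisation
        if rho >= 1:
            return float("inf")                   # unstable: demand exceeds capacity
        erlang = a ** c / factorial(c)
        p_wait = erlang / ((1 - rho) * sum(a ** k / factorial(k) for k in range(c)) + erlang)
        return p_wait / (c * mu - lam) + 1 / mu   # queueing delay + service time

    # What-if sweep: smallest number of workers keeping mean response time under
    # a 0.5 s target as the arrival rate grows (mu = 10 requests/s per worker).
    for lam in (50, 100, 200, 400):
        workers = next(c for c in range(1, 100) if mmc_response_time(lam, 10.0, c) <= 0.5)
        print(f"{lam} req/s -> {workers} workers meet the 0.5 s target")

A sweep like this answers "how many workers do I need if load doubles?" in seconds, which is the kind of question the slide contrasts with days of measurement-based response-surface fitting.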

Why verification in DIAs?


o Privacy-by-design (GDPR)
o M2M transformation propagates changes to runtime log verification
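To make runtime log verification concrete, here is a minimal, hypothetical sketch: it checks a privacy-style property over an event log (every item for which personal data is stored must later be anonymised). The event names and the property are invented for illustration and are not the actual DICE verification tool chain:

    # Hypothetical privacy property: every item whose personal data is stored
    # ("store_pii") must later be anonymised ("anonymise").
    def unanonymised_items(log):
        pending = set()
        for event, item_id in log:
            if event == "store_pii":
                pending.add(item_id)
            elif event == "anonymise":
                pending.discard(item_id)
        return pending                 # items that violate the property

    log = [("store_pii", "u1"), ("store_pii", "u2"), ("anonymise", "u1")]
    print(unanonymised_items(log))     # -> {'u2'}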

DICE Workflow - Ops


• Deploy (TOSCA)
• Quality testing
• Fault injection
• Configuration optimization
• Monitoring feedback

TOSCA technology library for DIA


    StormCluster:
      type: dice.components.storm.Nimbus
      relationships:
        - {type: dice.relationships.ContainedIn, target: StormMasterVM}
        - {type: dice.relationships.storm.ConnectedToZookeeperQuorum, target: ZookeeperCluster}
      properties:
        configuration: {taskTimeout: '30', supervisorTimeout: '60', monitorFrequency: '10',
                        queueSize: '100000', retryTimes: '5', retryInterval: '2000'}
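Reading the node template: the relationships place the Storm Nimbus component inside the StormMasterVM and connect it to the ZooKeeper quorum, while the configuration block carries what appear to be Storm tuning parameters (timeouts, monitoring frequency, queue size, retry policy) to be applied when the cluster is deployed.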


DevOps for BD: Lessons learned


1. Choosing the right Big Data technology upfront remains an open problem.

2. Model-based orchestration is an easier sell than model-based design.

3. We are still far from convergence of DevOps with DataOps.

4. The number of tools can be a problem; methodology is key.


DevOps with BD: Configuration

• Code changes mean that workloads change
• Wrong configurations can have high performance costs


Configuring a DIA architecture


o Example: a Storm-based application


How to evolve configuration?

o Batch exploration of alternative DIA configurations
o Algorithmic use of automated deployment


Configuration optimization (CO)


o Formally, a sequential blackbox optimization
o The objective function (e.g., response time) is unknown before the experiments
o Fixed budget of experiments
o User inputs the ranges of the parameters
o Several techniques available: design of experiments, Bayesian optimization, … (a minimal sketch follows below)
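To make the loop concrete, the sketch below shows the fixed-budget blackbox setting described above. The parameter names and ranges are illustrative (borrowed from the earlier TOSCA snippet), random sampling stands in for the acquisition step that Bayesian optimization or a design of experiments would use, and the measurement is a synthetic stand-in for deploying and load-testing the application:

    import random

    PARAM_RANGES = {                   # user-supplied parameter ranges (illustrative)
        "taskTimeout":   (10, 60),
        "queueSize":     (1024, 262144),
        "retryInterval": (500, 5000),
    }
    BUDGET = 30                        # fixed budget of experiments

    def measure_response_time(config):
        # Unknown objective: in reality, deploy the DIA with `config` and run a
        # load test; a synthetic surrogate stands in here so the sketch runs.
        return (abs(config["taskTimeout"] - 30) * 2
                + abs(config["queueSize"] - 100000) / 1000
                + random.gauss(0, 5))

    def propose(history):
        # Acquisition step: a Bayesian optimiser would fit a surrogate model to
        # `history` and pick the most promising point; here we simply sample.
        return {name: random.randint(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

    def optimise():
        history, best = [], None
        for _ in range(BUDGET):
            config = propose(history)              # choose the next configuration
            cost = measure_response_time(config)   # run one experiment
            history.append((config, cost))
            if best is None or cost < best[1]:
                best = (config, cost)
        return best, history

    best, history = optimise()
    print(best)

In a real run, measure_response_time would trigger an automated TOSCA deployment and a benchmark, which is where the algorithmic use of automated deployment mentioned above comes in.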


CO case study: Storm-based DIA


• 2.5 months to exhaustively test 4000 alternatives
• Bayesian optimization can find good configurations in just a few tens of tests

[Plot: distance to optimal configuration vs. number of tests (5 minutes each), Bayesian optimization (BO)]

Cassandra example


• Transfer learning methods help reuse performance data from prior DIA releases
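Building on the (illustrative) CO sketch earlier, one simple way to picture the reuse: evaluate the best configurations from the previous release first, so the search starts from a good region. This warm-start caricature is only a sketch of the idea, not the actual transfer learning method used:

    def optimise_with_transfer(old_history, budget=BUDGET):
        # Warm start: try the 5 best configurations of the previous release first,
        # then fall back to the usual proposal strategy for the remaining budget.
        seeds = [cfg for cfg, _ in sorted(old_history, key=lambda h: h[1])[:5]]
        history, best = [], None
        for i in range(budget):
            config = seeds[i] if i < len(seeds) else propose(history)
            cost = measure_response_time(config)
            history.append((config, cost))
            if best is None or cost < best[1]:
                best = (config, cost)
        return best, history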


DevOps with BD: Lessons learned


1. Works out of the box, little expertise needed

2. Risk needs to be a tunable parameter

3. People still want to understand how it works

4. Scalability to tens, not hundreds of parameters


Questions?


Thanks!