OpenStack - getting it all up magically - and when the magic fails


Upload: anshu-prateek

Post on 28-Jan-2018


Page 1: Openstack - getting it all up magically - and when the magic fails

Magic of OpenStack..

Anshu Prateek

Page 2

23/01/16 Footnote 2

OpenStack..

Getting it all up magically..And..

When the magic fails :!

Page 3

Me..

https://about.me/anshuprateek

large scale environments for the last seven years. Starting with Yahoo! search, where I spent the first four years - moment of truth - I have “touched” each and every machine in the search inventory!

Next stopover: Aerospike. A million TPS with sub-millisecond latency!

Reliance Jio – Cloud efforts

Page 4

OpenStack..

Millions of AT&T wireless subscribers are connected to virtualized network services ... based on OpenStack.

- John Donovan, Senior Executive Vice President of Technology and Network Operations, AT&T

Page 5

Deployment

Dev setup – devstack

Prod –

Puppet

Fuel

Ansible

Page 6

Deployment

It's like piloting a plane

Despite so much technology and automation it still needs “pilots”

Pilots are expensive and niche!

What we need is driverless cars!

A system which can set itself up

Fix the problems itself

Page 7

Experience

Are we ready to trust driverless cars?

We would still need active monitoring and on-call engineers

A breakdown can only be “self-healed” so much

Google cars have done 1.2M miles

10K miles each week

Page 8

Experience

Completely automated OpenStack setup

40-node labs

With known failure points

CI-CD enabled

1 build every 40 minutes

+ Manual builds by ~50 engineers

Page 9

How?

A puppet module

Which manages other puppet modules

And uses itself.. as a module of itself..!

++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.

Page 10

Brainfuck

https://en.wikipedia.org/wiki/Brainfuck

(Nothing to do with this talk – I just wanted to say that out loud without having to bleep myself :D )

Page 11

Puppet module

https://github.com/JioCloud/puppet-rjil

Puppet – R – J – I – L

or

Puppet - Rijil

Page 12

How it works

Masterless – standalone setup

Each node runs the same copy of puppet code

Depending on the node name, the appropriate classes are included

Hiera for further segregation
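The idea of "same code everywhere, node name decides what gets included" can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual puppet-rjil logic: the hostname prefixes, class names, data keys and the two-layer hierarchy below are all hypothetical.

```python
import re

# Hypothetical role table: hostname prefix -> classes to include.
ROLE_CLASSES = {
    "ctrl": ["keystone", "nova_api"],
    "cp": ["nova_compute"],
    "st": ["ceph_osd"],
}

# Hypothetical layered data, looked up most specific first (Hiera-style).
HIERARCHY = {
    "role/st": {"ceph_replicas": 3},
    "common": {"ceph_replicas": 2, "ntp_server": "ntp.example.org"},
}


def role_of(hostname):
    """Strip the trailing digits from a hostname such as 'cp12' to get its role."""
    match = re.fullmatch(r"([a-z]+)\d+", hostname)
    if match is None:
        raise ValueError(f"cannot derive a role from {hostname!r}")
    return match.group(1)


def classes_for(hostname):
    """Return the classes a node should include, based only on its name."""
    return ROLE_CLASSES.get(role_of(hostname), [])


def lookup(key, hostname):
    """Return the first value found, walking from role-specific to common data."""
    for layer in (f"role/{role_of(hostname)}", "common"):
        data = HIERARCHY.get(layer, {})
        if key in data:
            return data[key]
    raise KeyError(key)
```

Every node can run this same code; a storage node named st1 resolves to its own class list and its own overrides, while a compute node falls back to the common layer.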

Page 13

How it works

Entry point – site.pp

From a CI-CD perspective – a minimum of 13 nodes

The “nodes” can be VMs on top of OpenStack

Or Vagrant

Or Docker (WIP)

Page 14

How it works

Consul for service discovery

Local build and packaging system for running our own code

Everything is packaged – no git clones!

Page 15

Development model

No branches in central repo

Fork and branch and develop and test

PR

Review, test and merge

Similar to the OpenStack model

Page 16

Substitutions

Storage – Ceph

Boot from volume only, to support migration

Network – OpenContrail

Consul for local DNS and service discovery and rudimentary alerts console

Page 17

The magic

Please test this

Let me go for a coffee and a chat

Do some more chatting

~40 minutes – result

If pass → code merged

Acceptance test

Jorc trigger_update `date +%s`

Deployed!

Page 18

The beauty

Local check scripts

If a check fails → try to self heal

Works most of the time

Except you can't self-heal a bleeding wound :!

At times it may aggravate the problem
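One way to keep self-healing from aggravating a problem is to bound it: retry the fix a small number of times, then hand over to a human. The sketch below shows that pattern only; the function names, the default attempt limit and the FakeService stand-in are assumptions for illustration, not the check scripts used in the labs.

```python
def check_and_heal(check, heal, max_attempts=2):
    """Run check(); on failure call heal() and re-check, at most max_attempts times.

    Returns "ok" if nothing was wrong, "healed" if a heal attempt worked,
    and "escalate" if the check still fails after the last attempt.
    """
    if check():
        return "ok"
    for _ in range(max_attempts):
        heal()
        if check():
            return "healed"
    return "escalate"


class FakeService:
    """Stand-in for a real service: comes up after a given number of restarts."""

    def __init__(self, heals_after):
        self.heals_after = heals_after  # restarts needed; None means never
        self.heal_calls = 0

    def is_up(self):
        return self.heals_after is not None and self.heal_calls >= self.heals_after

    def restart(self):
        self.heal_calls += 1
```

A service that needs one restart comes back as "healed"; one that never recovers is restarted only max_attempts times and then reported as "escalate", so the loop cannot keep hammering a bleeding wound.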

Page 19

The beast

Added a comma

Please test this

40 minutes

It failed!

But not my code

Can't merge yet :(

Page 20

Wolves

Labs is broken

I know the fix :D

Oops, “please test this” fails (see point 1)

Can't merge

Argh – ergo labs is still broken :(

Page 21

Yet another wolf..

Labs is broken due to network

I am a storage developer

Labs is broken

Can't merge code :(

:(

:(

Page 22

Why did we do so?

Developer inertia for known issues

Prioritising features over fixes

This system ensures nothing gets merged till it's all fixed

Page 23

Problems

Too many wasted man-hours till it's fixed

Too few people who could actually fix :(

Page 24

New attempt

Make it decentralised

Microservices

Everyone is a customer of everyone else

Outages become component-wise rather than lab-wide
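The difference between the two gating models can be made concrete with a small sketch. It assumes a hypothetical dependency map between components; the slides do not describe how dependencies were actually tracked, so the component names and the transitive-dependency rule here are illustrative only.

```python
def blocked_lab_wide(components, broken):
    """Lab-wide gate: any broken component blocks merges for every team."""
    return set(components) if broken else set()


def blocked_component_wise(dependencies, broken):
    """Per-component gate: a component is blocked only if it, or something it
    depends on (directly or transitively), is broken."""
    blocked = set()
    for component in dependencies:
        stack, seen = [component], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in broken:
                blocked.add(component)
                break
            stack.extend(dependencies.get(current, ()))
    return blocked


# Hypothetical components and what each one depends on.
DEPENDENCIES = {
    "network": [],
    "storage": [],
    "identity": [],
    "compute": ["network", "storage"],
}
```

With the network broken, the lab-wide gate blocks all four components, while the component-wise gate blocks only network and compute, so the storage developer from the earlier slide can still merge.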

Page 25

Little Red Riding Hood

https://github.com/JioCloud/puppet-rjil-keystone

Some teams are going the Ansible route, as well as other routes

Page 26

Learnings

Big automated systems are fun

Scalable

But they need coordination between teams (and components)

Avoid feedback-loop-blockers

Monolithic systems are a pain to work on

Kernel developers – we feel you

Page 27

Learnings

CI-CD helps (and works!)

Self-healing needs to be done with a surgical knife (or a laser) rather than a butcher's knife

Band-aid at best – but could hold the dam!

Micro-services are better

Nimble and agile

Though it's tough to get the big picture

Move fast – break often