baidu cloudfoundry english


Page 1

A private cloud platform based on CloudFoundry (TRANSLATED VERSION)

@Weiyu Wang (王炜煜), Operations Department @Baidu, weibo.com/wwy1640, 2013-7-19

Page 2

Outline

1. Background and Objectives

2. Practice and Reform (Parts 1 and 2)

3. Processes and Standards

4. Reform Operations

5. Future plans

Page 3

1. Background and Objectives

Page 4

Operation and PaaS

[Figure: the layers Applications, Data, Runtime, Middleware, O/S, Virtualization, Servers, Storage and Networking, mapped to OP (SRE) operations and to PaaS (and IaaS)]

Page 5

Objectives

Automation
•  Business life-cycle management, for example changes, monitoring and fault handling
•  Elastic resource utilization

Standardization
•  Workflow
•  Instance standards: system environment, runtime, framework

Unification
•  Integration of third-party services, for example DB, cache, logging, file systems
•  Linkage with other system platforms

Page 6

Why CloudFoundry?

[Figure: Automation and Standard Unification, shown for the platform and, in parallel, for Machine Management (the downstream department)]

Page 7

Why CF?

Automation

Unification

Standard

Page 8

2. Practice and Reform (Part 1): Java, based on CF 1.0

Page 9

Java Apps

•  Number of product categories > 100

•  Apps > 200

•  Instances > 2000

•  Average memory per instance: 10 GB

•  Average daily total PV > 1 billion

•  Developers and testers working on these apps > 700

•  Tomcat 5/6/7, JDK 1.5/1.6, standalone

Page 10

Implementation and Preparation

•  Modifications for CentOS
  ✓ Deploy each CF component independently
    - Analyzed BOSH and Chef; deployment is implemented directly on physical machines
  ✓ OS environment initialization
    - apt-get is replaced with yum
  ✓ Ubuntu commands ported to CentOS
    - DEA (v1.0): agent.rb, secure.rb

yum install -y make gcc gcc-c++ kernel-devel.x86_64 openssl-devel.x86_64 libxml2.x86_64 libxml2-devel.x86_64 libxslt.x86_64 libxslt-devel.x86_64 git.x86_64 sqlite.x86_64 ruby-sqlite3.x86_64 sqlite-devel.x86_64 unzip.x86_64 zip.x86_64 ruby-devel.x86_64 ruby-mysql.x86_64 mysql-devel.x86_64 curl-devel.x86_64 postgresql-libs.x86_64 postgresql-devel.x86_64 zlib-devel.x86_64 readline-devel.x86_64 ImageMagick.x86_64 ImageMagick-devel.x86_64 php-magickwand.x86_64

Page 11

Cluster capacity assessment

•  Number of instances and NATS capacity assessment
  ✓ The number of instances hosted by a single DEA (< 100) has little effect on the load of the NATS server
  ✓ By a conservative estimate, a single NATS server can support 330 DEAs, with 5 to 30 instances per DEA
  ✓ Multiple NATS servers can be deployed to scale out

[Chart: delay (ms) vs. number of DEAs (10 ~ 340), for 5 ~ 30 instances per DEA; critical line at 330 DEAs]
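The capacity figures can be turned into a quick sizing rule. The Python sketch below is only back-of-the-envelope arithmetic under stated assumptions: load grows linearly with the number of DEAs, DEAs are spread evenly over the NATS servers, and the 330-DEA critical line holds for every server. The function names are hypothetical.

```python
import math
from typing import Tuple

# Conservative figure from the capacity test: about 330 DEAs per NATS server.
DEAS_PER_NATS = 330


def nats_servers_needed(num_deas: int, deas_per_server: int = DEAS_PER_NATS) -> int:
    """Smallest number of NATS servers that keeps each one at or below the critical line."""
    if num_deas < 0:
        raise ValueError("num_deas must be non-negative")
    return max(1, math.ceil(num_deas / deas_per_server))


def instances_covered(num_deas: int, low: int = 5, high: int = 30) -> Tuple[int, int]:
    """Instance range for num_deas DEAs at the tested 5 ~ 30 instances per DEA."""
    return num_deas * low, num_deas * high
```

For example, 1000 DEAs would need four NATS servers under these assumptions, and one fully loaded server (330 DEAs) covers 1650 to 9900 instances.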

Page 12

Component redundancy and load balancing within a cluster

•  NATS
  ✓ A cluster of multiple NATS servers with synchronized heartbeats
  ✓ The client caches messages; if the network is cut, it keeps trying to reconnect
  ✓ Multiple NATS servers share the load (requires NATS client > 0.5.beta.6)

[Figure: a NATS client (caching messages) connected to NATS-Server1 and NATS-Server2, picked from a randomized server list]
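The client-side behaviour listed above (randomized server list, message cache, reconnecting after a network cut) can be sketched as follows. This is not the API of the real NATS client library. `CachingClient`, `connect` and `send` are hypothetical names, and the transport is passed in as two callables so the logic can be tried without a NATS server.

```python
import random
from collections import deque
from typing import Callable, Iterable, Optional


class CachingClient:
    """Hypothetical NATS-style client: randomized server list, messages cached until delivered."""

    def __init__(self, servers: Iterable[str],
                 connect: Callable[[str], bool],
                 send: Callable[[str, str], None],
                 seed: Optional[int] = None) -> None:
        self._servers = list(servers)
        random.Random(seed).shuffle(self._servers)  # spread clients across servers
        self._connect = connect                     # returns True if the server accepted us
        self._send = send                           # raises ConnectionError on a broken link
        self._pending: deque = deque()
        self.current: Optional[str] = None

    def _ensure_connected(self) -> bool:
        if self.current is not None:
            return True
        for server in self._servers:
            if self._connect(server):
                self.current = server
                return True
        return False

    def publish(self, msg: str) -> None:
        self._pending.append(msg)  # cache first so nothing is lost while disconnected
        self.flush()

    def flush(self) -> None:
        """Deliver cached messages in order; call periodically to keep reconnecting."""
        while self._pending and self._ensure_connected():
            try:
                self._send(self.current, self._pending[0])
            except ConnectionError:
                self.current = None  # drop the dead link; the next flush() reconnects
                return
            self._pending.popleft()

    @property
    def pending(self) -> int:
        return len(self._pending)
```

Calling `flush()` from a timer gives the "keep reconnecting" behaviour. Messages stay in the cache until a send succeeds.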

Page 13

Multiple-cluster redundancy design

•  Multiple independent clusters, logically independent of each other
  ✓ First-layer switch: modify the DNS A record. Multiple domain names are CNAMEs of this A record, so they all switch to the other cluster together
  ✓ Second-layer switch: modify the “interface layer” (in terms of its application-layer function, it can be understood simply as an Nginx reverse proxy)
  ✓ Ensure enough capacity for the (stateless) apps, or be able to expand capacity quickly, to prevent overload when traffic is switched back

[Figure: two clusters, each with a Baidu GateWay / Front End, a Router and app1; the formal domain names are CNAMEs of an A record that selects the cluster]

www.baidu.com CNAME www.a.shifen.com.
www.baidu.cn CNAME www.a.shifen.com.
www.a.shifen.com. A 119.75.218.77
www.a.shifen.com. A 119.75.217.56
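The first-layer switch works because every formal domain name is a CNAME of one shared A record. Below is a minimal in-memory model in Python that uses the records above. The `Zone` layout and the function names are assumptions, and 192.0.2.10 is a documentation address standing in for the other cluster.

```python
from typing import Dict, List, Tuple

Zone = Dict[str, List[Tuple[str, str]]]  # owner name -> [(record type, value)]


def example_zone() -> Zone:
    """Records from the slide; both CNAMEs share one A record."""
    return {
        "www.baidu.com": [("CNAME", "www.a.shifen.com.")],
        "www.baidu.cn": [("CNAME", "www.a.shifen.com.")],
        "www.a.shifen.com.": [("A", "119.75.218.77"), ("A", "119.75.217.56")],
    }


def resolve(zone: Zone, name: str, max_depth: int = 8) -> List[str]:
    """Follow CNAMEs until A records are reached and return their addresses."""
    for _ in range(max_depth):
        records = zone.get(name, [])
        cnames = [value for rtype, value in records if rtype == "CNAME"]
        if not cnames:
            return [value for rtype, value in records if rtype == "A"]
        name = cnames[0]
    raise RuntimeError("CNAME chain too long")


def switch_cluster(zone: Zone, a_name: str, new_ips: List[str]) -> None:
    """First-layer switch: rewrite only the shared A record; every CNAME follows it."""
    zone[a_name] = [("A", ip) for ip in new_ips]
```

With a real DNS server, the change only reaches clients as their cached answers expire. The A record's TTL therefore bounds how fast the switch takes effect.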

Page 14

Core components, distributed

[Figure: distributed core components: Router (Router_1), NATS (NATS_1), CC, HM, Stager, DEA, PG_DB and Redis]

Page 15

Framework (CF 1.0)

[Figure: CF 1.0 architecture: Baidu GateWay / Front End, Router (plus a Router for Cluster 02), CC, UAA, API Bridge, HM, Stager, NATS and DB; DEAs running app instances in JVMs; peripheral Logging, Name Service, Monitoring and File Persistence]

Page 16

New features

•  Support for RPC, single instance with multiple ports
  ✓ One instance can open multiple ports, and an API returns its IP and ports in real time
  ✓ Linkage with the “name service”: the dynamic IP/port-to-name mapping is kept in sync
  ✓ The RPC caller connects to the instance directly, by name

Page 17

Support RPC, single instance with multiple ports

[Figure: on the DEA server, Instance01:port and Instance02:port are published through the API Bridge to the NS server as a TXT record (ip:port ip:port); the RPC caller's NS client resolves the domain to ip:port ip:port. ip_local_port_range: 10000 ~ 60000; port pool (with a freeze period after allocation): 61000 ~ 65000]
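The sketch below illustrates the name-service linkage. It assumes, as the figure suggests, that the NS server stores each name's endpoints as a space-separated "ip:port ip:port" TXT value. `NameService`, `publish`, `lookup` and `pick_endpoint` are hypothetical; the real NS protocol is not described in the slides.

```python
import random
from typing import Dict, List, Tuple

Endpoint = Tuple[str, int]


def parse_txt_record(txt: str) -> List[Endpoint]:
    """Parse a space-separated "ip:port ip:port" value into (ip, port) pairs."""
    endpoints = []
    for token in txt.split():
        ip, _, port = token.rpartition(":")
        endpoints.append((ip, int(port)))
    return endpoints


class NameService:
    """In-memory stand-in for the NS server: name -> current endpoints."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def publish(self, name: str, endpoints: List[Endpoint]) -> None:
        # API Bridge side: called whenever instances start, stop or drift.
        self._records[name] = " ".join(f"{ip}:{port}" for ip, port in endpoints)

    def lookup(self, name: str) -> List[Endpoint]:
        return parse_txt_record(self._records.get(name, ""))


def pick_endpoint(ns: NameService, name: str, rng=random) -> Endpoint:
    """NS-client side: resolve by name, then connect directly to one instance."""
    endpoints = ns.lookup(name)
    if not endpoints:
        raise LookupError(f"no live instance for {name}")
    return rng.choice(endpoints)
```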

Page 18

New features

•  Support for JMX
  ✓ An API returns the IP and JConsole port of each instance in real time, so JMX data can be collected in real time

Page 19

Support JMX

[Figure: a DEA running Instance01 and Instance02, each exposing its own JConsole port]

Example API response:

{
  "instances": [
    { "index": 0, "state": "RUNNING", "since": 438249600, "jconsole_ip": "10.1.1.1", "jconsole_port": 61111 },
    { "index": 1, "state": "RUNNING", "since": 438249600, "jconsole_ip": "10.1.1.1", "jconsole_port": 62222 }
  ]
}

Monitoring metrics:

CpuUseRate, DaemonThreadCount, MemPool_OldGen_UseRate, NonHeapMemoryUsage_used, TotalCompilationTime, TotalPeakThreadCount, TotalStartedThreadCount, UnloadedClassCount, GC_Major_Frequency, GC_Major_Time, … …

Stager:

java \
  -Dcom.sun.management.jmxremote.port={VCAP_JCONSOLE_PORT} \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false
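Given the API response above, a collector can derive one JMX connection URL per running instance. The `service:jmx:rmi:///jndi/rmi://host:port/jmxrmi` form is the standard RMI connector URL that JConsole uses when a JVM is started with `-Dcom.sun.management.jmxremote.port`. The function name and the filtering on `"RUNNING"` are assumptions of this sketch.

```python
import json
from typing import List


def jmx_urls(api_response: str) -> List[str]:
    """Build JMX service URLs for every RUNNING instance in the API response."""
    data = json.loads(api_response)
    urls = []
    for inst in data.get("instances", []):
        if inst.get("state") != "RUNNING":
            continue  # skip instances that are starting, crashed or stopped
        host, port = inst["jconsole_ip"], inst["jconsole_port"]
        # Standard RMI connector URL for -Dcom.sun.management.jmxremote.port
        urls.append(f"service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi")
    return urls
```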

Page 20

New features

•  Enhancements to the health monitor
  ✓ Layer-7 (application-layer) detection
  ✓ Detection of the number of open file handles

Page 21

Enhancement to health monitor

[Figure: on the DEA server, DEA agent.rb checks each instance's HTTP availability, its CPU, MEM, DISK, …, and its file handles, and reports the results to the Health Manager]
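The two new checks can be sketched in Python. These assumptions are not taken from the slides: an instance counts as available at layer 7 if it answers HTTP with a status below 500, file handles are counted through Linux `/proc/<pid>/fd`, and the `max_handles` threshold is a made-up default. DEA agent.rb itself is Ruby; this only illustrates the logic.

```python
import os
import urllib.error
import urllib.request


def http_available(url: str, timeout: float = 2.0) -> bool:
    """Layer-7 check: the instance must answer an HTTP request with a non-5xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:   # the app answered, e.g. with 404
        return err.code < 500
    except (urllib.error.URLError, OSError):  # refused, timed out, unreachable
        return False


def open_file_handles(pid: int) -> int:
    """Count open file descriptors of a process (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def check_instance(url: str, pid: int, max_handles: int = 10000) -> dict:
    handles = open_file_handles(pid)
    return {
        "http_ok": http_available(url),
        "handles": handles,
        "handles_ok": handles < max_handles,
    }
```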

Page 22

DEA (v1.0), logical enhancement

•  Port management
  ✓ Problem
    - With multiple instances on a single DEA, ports are assigned and started in parallel. There is no hard limit, but instances compete for the same ports
  ✓ Solution
    - Follow the logic of DEA (v2.0) (note: this is DEA_NG, which is not compatible with CF 1.0)
    - Set ip_local_port_range to 10000 ~ 61000 as the range for dynamic ports
    - Reserve 61001 ~ 65000 for ports assigned by DEA scheduling
    - For each assigned port, keep a “[release time, port number]” record
    - Resolve port contention by delaying the release of a port
  ✓ Note
    - CF 2.0 solved this problem in the same way
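The port-pool logic can be sketched as follows. `PortPool` and the 60-second freeze are hypothetical. What it shares with the slide is the 61001 ~ 65000 range and a "[release time, port]" record per released port, so a port is only handed out again after its freeze period. Time is passed in explicitly to keep the example deterministic.

```python
import heapq
from typing import List, Set, Tuple


class PortPool:
    """Hypothetical DEA-side allocator for the 61001 ~ 65000 scheduling range."""

    def __init__(self, low: int = 61001, high: int = 65000, freeze_seconds: float = 60.0) -> None:
        self._next = low                                # next never-used port
        self._high = high
        self._freeze = freeze_seconds
        self._released: List[Tuple[float, int]] = []    # min-heap of (release time, port)
        self._in_use: Set[int] = set()

    def acquire(self, now: float) -> int:
        # Reuse the oldest released port only once its freeze period is over.
        if self._released and self._released[0][0] + self._freeze <= now:
            _, port = heapq.heappop(self._released)
        elif self._next <= self._high:
            port = self._next
            self._next += 1
        else:
            raise RuntimeError("no free port (all ports in use or frozen)")
        self._in_use.add(port)
        return port

    def release(self, port: int, now: float) -> None:
        if port not in self._in_use:
            raise ValueError(f"port {port} is not allocated")
        self._in_use.remove(port)
        heapq.heappush(self._released, (now, port))
```

Reusing thawed ports before fresh ones is only one possible policy. On Linux, the dynamic range itself is set through the `net.ipv4.ip_local_port_range` sysctl.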

Page 23

DEA (v1.0), logical enhancement

•  Instance resource information management
  ✓ Problem
    - The du command takes a long time to calculate disk usage, so the readings of the commands that run after it are not taken at a consistent point in time
    - The CPU utilization calculation does not take the number of cores into account
  ✓ Solution
    - Reorder the related commands
    - Divide the CPU utilization by the number of cores
  ✓ Note
    - CF 2.0 has solved this problem
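Both fixes are small. In this sketch, CPU utilization is CPU time divided by wall time and by the number of cores. The fast readings are taken before the slow, du-style disk walk. The function names and the sampling callables are hypothetical.

```python
import os
from typing import Callable, Dict, Optional


def cpu_utilization(cpu_seconds: float, wall_seconds: float, cores: Optional[int] = None) -> float:
    """Percent of the whole machine used, normalized by the number of cores."""
    cores = cores or os.cpu_count() or 1
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return 100.0 * cpu_seconds / (wall_seconds * cores)


def collect(sample_cpu: Callable[[], float],
            sample_mem: Callable[[], int],
            measure_disk: Callable[[], int]) -> Dict[str, float]:
    """Take the fast samples first so that a slow du does not skew them."""
    cpu = sample_cpu()
    mem = sample_mem()
    disk = measure_disk()  # du-style walk; may take seconds
    return {"cpu": cpu, "mem": mem, "disk": disk}
```

For example, 4 CPU-seconds in 1 second on an 8-core host is 50%, not 400%.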

Page 24

New features (linkage with peripheral systems)

•  File persistence
  ✓ Use MFS (Moose File System)
  ✓ Each DEA deploys the MFS client and mounts /mfs/path for instances to use
  ✓ The MFS service provides an HTTP interface for retrieving the data

•  URL-based routing to distinguish apps
  ✓ foo.baidu.com/app1 → app1.foo.baidu.com
  ✓ foo.baidu.com/app2 → app2.foo.baidu.com

•  Monitoring linkage
  ✓ Across the app's life cycle, the platform calls the API of the external monitoring system, so monitored items are updated automatically

•  The SDK
  ✓ Automatic release (wrapping vmc)
  ✓ File viewing
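The URL-based routing rule can be written as a small rewrite function. The slide does not say whether the router strips the `/app1` prefix before forwarding. This sketch assumes it does, and `route_by_path` is a hypothetical name.

```python
from typing import Tuple


def route_by_path(host: str, path: str) -> Tuple[str, str]:
    """Map foo.baidu.com/app1/rest to (app1.foo.baidu.com, /rest)."""
    segments = path.lstrip("/").split("/", 1)
    app = segments[0]
    if not app:
        return host, path  # no app segment: leave the request unchanged
    rest = "/" + segments[1] if len(segments) > 1 else "/"
    return f"{app}.{host}", rest
```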

Page 25

Summary of key reform points (CF v1.0)

•  Modifications for CentOS

•  NATS cluster usage; NATS client retry and caching

•  Support for RPC, single instance with multiple ports

•  Support for dynamic JMX and JConsole

•  Enhanced health monitor

•  Port management

•  Instance resource information management

•  Peripheral components: file persistence, monitoring linkage, URI routing, the SDK

Page 26

2. Practice and Reform (Part 2): C/C++, based on CF 2.0

Page 27

Several key problems of C/C++ apps

•  Container runtime and resource isolation
  ✓ Kernel/GNU
  ✓ Resource isolation
  ✓ Snapshots, core dumps

•  Single instance, multiple processes
  ✓ Health monitoring
  ✓ Startup order of the processes
  ✓ Communication within the instance and between processes
  ✓ Multiple ports
  ✓ Keeping multiple instances identical

Page 28

Several key problems of C/C++ apps

•  Big instances
  ✓ Large number of instances (100,000)
  ✓ Large amount of data (2 TB per instance)
  ✓ High memory usage (100 GB per instance)
  ✓ Long startup time (30 minutes)
  ✓ Heavy traffic (200 million PV per day per instance)
  ✓ When an instance drifts, insufficient resources on the target must be prevented

•  App communication
  ✓ Network-layer communication, authorization, flow control
  ✓ Output files must be fetched from outside
  ✓ Input files must be pushed from outside
  ✓ RPC uses a non-HTTP protocol with no PATH information, so it cannot be routed
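One way to prevent insufficient resources when an instance drifts is to filter candidate hosts by free memory and disk before moving it. The `Host` and `InstanceNeeds` types and the best-fit choice are assumptions of this sketch, not the platform's actual scheduler.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Host:
    name: str
    free_mem_gb: float
    free_disk_gb: float


@dataclass
class InstanceNeeds:
    mem_gb: float
    disk_gb: float


def pick_drift_target(hosts: Iterable[Host], needs: InstanceNeeds,
                      exclude: str = "") -> Optional[Host]:
    """Choose a host that can actually hold the instance; None means do not drift."""
    candidates = [h for h in hosts
                  if h.name != exclude
                  and h.free_mem_gb >= needs.mem_gb
                  and h.free_disk_gb >= needs.disk_gb]
    # Best fit on memory keeps large free blocks for other big instances.
    return min(candidates, key=lambda h: h.free_mem_gb - needs.mem_gb, default=None)
```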

Page 29

Instance OS-level environment preparation

•  Container runtime environment
  ✓ The kernel is the same as the host machine's
  ✓ Build the container's file-system environment

warden/warden/root/linux/rootfs/setup.sh

if grep -q -i centos /etc/issue
then
  exec "$(dirname "$0")/centos.sh" "$@"
fi

Page 30

Relationship between container and host machine

[Figure: Warden on the host handles networking (bridge / NAT / firewall / flow control) for the DEA. Each container has its own namespace with a process tree (init─┬─xxx ├─xxx─xxx ├─xxx), read-only mounts of usr/, lib/ and etc/, a read-write mount of xxx/, its own network interface (subnet), and a cgroup for CPU / MEM]

Page 31

Package management

•  Buildpack API
  ✓ detect: check whether the buildpack applies
  ✓ compile: prepare the environment
    - Directory structure
    - Program files and related supporting programs
    - Startup script, which also ensures the startup order of the processes …
    - Monitoring script, executed periodically to check the health of the whole instance
  ✓ release: the information to publish
  ✓ Procfile: parameter passing (e.g. port)
  ✓ .profile.d: environment variables
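The startup script's job of enforcing process order can be illustrated with a small launcher. This is a hypothetical Python helper, not part of the Buildpack API. Each process must pass a caller-supplied readiness check before the next one starts, and everything already started is terminated if a step fails.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def start_in_order(commands: Sequence[Sequence[str]],
                   ready: Callable[[int, subprocess.Popen], bool],
                   timeout: float = 30.0,
                   poll_interval: float = 0.1) -> List[subprocess.Popen]:
    """Start each process only after the previous one reports ready."""
    started: List[subprocess.Popen] = []
    try:
        for i, cmd in enumerate(commands):
            proc = subprocess.Popen(list(cmd))
            started.append(proc)
            deadline = time.monotonic() + timeout
            while not ready(i, proc):
                if proc.poll() is not None:
                    raise RuntimeError(f"process {i} exited early with code {proc.returncode}")
                if time.monotonic() > deadline:
                    raise TimeoutError(f"process {i} not ready after {timeout}s")
                time.sleep(poll_interval)
    except Exception:
        # Do not leave a half-started instance behind.
        for p in started:
            p.terminate()
            p.wait()
        raise
    return started
```

In practice, `ready` would check something observable, such as a port that the process listens on.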

Page 32

Health monitor enhancement

•  Self-defined monitoring scripts
  ✓ A self-defined monitoring script is published together with the instance and periodically updates the content of stat_file
  ✓ The DEA checks stat_file periodically

[Figure: inside the instance, monitor.sh watches process-1 and process-2 and writes stat_file; the DEA reads stat_file and reports to the HM]
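The sketch below illustrates the stat_file contract, with assumptions that are not in the slides. The script writes a line starting with `OK` or `FAIL`, and the file is replaced atomically. The DEA treats a missing or stale file (older than `max_age` seconds) as unhealthy, which also catches a monitoring script that has stopped. The function names are hypothetical.

```python
import os
import time
from typing import Optional


def write_status(stat_file: str, healthy: bool, detail: str = "") -> None:
    """Monitor-script side: replace stat_file atomically so the DEA never reads half a line."""
    tmp = stat_file + ".tmp"
    with open(tmp, "w") as f:
        f.write(("OK" if healthy else "FAIL") + (" " + detail if detail else "") + "\n")
    os.replace(tmp, stat_file)


def read_status(stat_file: str, max_age: float = 60.0, now: Optional[float] = None) -> bool:
    """DEA side: healthy only if the file says OK and was refreshed recently."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(stat_file)
        with open(stat_file) as f:
            first = f.readline().split()
    except OSError:
        return False  # missing file counts as unhealthy
    if age > max_age:
        return False  # the monitoring script itself has stopped
    return bool(first) and first[0] == "OK"
```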

Page 33

Reforms to the app

•  For RPC, support the NS client
  ✓ A dynamically updated configuration file replaces routing
  ✓ Port management with a freeze time

•  Input/output files
  ✓ Input files must be fetched actively from outside
  ✓ Output files are pushed to a transit point (e.g. cloud storage) or served through an NS-based service

•  Multiple-process management, startup scripts
  ✓ Control the startup order of the processes
  ✓ Process control

•  File persistence
  ✓ Remote logging
  ✓ Use cloud storage

Page 34

Framework (CF 2.0)

[Figure: CF 2.0 architecture: Baidu GateWay / Front End, gorouter (not applicable to RPC), CC, UAA, API Bridge, HM, NATS (with Cluster 02) and DB; DEAs running Warden containers, each holding process-1 and process-2, with an NS Client; peripheral Logging, Name Service, Monitoring and File Persistence]

Page 35

Reform summary (CF v2.0)

•  Modifications for CentOS

•  Container environment preparation

•  Buildpack-based package management

•  Support for RPC, single instance with multiple ports

•  Enhanced health monitor

•  Peripheral: file persistence, monitoring linkage, URI routing, SDK

Page 36

3. Processes and Standards

Page 37

Working process description

•  Review: standards, capacity, SLA
•  Access: org relationship, name info, operation info
•  Process approval: authorization application, name application, release operation
•  Release and update: pre-release, gray-scale release, rollback
•  Failure handling: availability, security, issue management

Page 38

Standard and capacity example

•  Standard information collection
  ✓ App name and contacts (R&D, QA, operations, responsible manager, and so on)
  ✓ The runtime is isolated from the container version
  ✓ Stateless, RPC, URI routing
  ✓ Dynamic and static files are separated
  ✓ File persistence

•  Capacity information collection
  ✓ PV, QPS
  ✓ Per-instance CPU, memory, disk, bandwidth, restart time
  ✓ Number of instances
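The collected PV and per-instance figures feed into the instance count. The simple estimate below assumes that one PV is one request. It also uses a hypothetical peak-to-average factor of 2 and an optional headroom multiplier.

```python
import math

SECONDS_PER_DAY = 86_400


def required_instances(daily_pv: int, per_instance_qps: float,
                       peak_factor: float = 2.0, headroom: float = 1.0) -> int:
    """Instances needed so that peak QPS fits, given the measured per-instance QPS."""
    avg_qps = daily_pv / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor
    return max(1, math.ceil(peak_qps / per_instance_qps * headroom))
```

For example, 864 million PV per day averages 10,000 QPS. At a 2× peak and 1,000 QPS per instance, that is 20 instances, or 25 with 25% headroom.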

Page 39

SLA examples

•  Service scope
  ✓ Java applications (“APP” for short below)
  ✓ APPs that conform to the standard

•  Service hours
  ✓ 24×365, all year round

•  Communication channels
  ✓ Mail, phone, contact information

•  Stability indicators
  ✓ Core components: availability > 99.99% (monthly), MTTR < 20 mins, MTBF > 5 days
  ✓ Control services: availability > 99.95% (yearly)
  ✓ APP's own SLA: the platform itself must not adversely affect it
  ✓ Note: problems of the APP itself are outside the scope of the SLA, for example bugs, capacity forecast errors, and failures of external systems (e.g. DB, cache)
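The availability targets translate into downtime budgets. The calculation below assumes a 30-day month and a 365-day year.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Maximum downtime allowed in a period for a given availability target."""
    return (1.0 - availability) * period_days * MINUTES_PER_DAY
```

99.99% per month allows about 4.3 minutes of downtime a month. 99.95% per year allows about 262.8 minutes (roughly 4.4 hours) a year.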

Page 40

Organization layers

•  Product line (Org)
  •  Module (Space)
    •  Group (APP)
      •  Version (APP-*)

[Figure: Product line-1 (Org) and Product line-2; Module-1 (Space) and Module-2; Group-1 (A) and Group-2 (B); each group holds instances of version 1 and version 2: A-1 (APP-1-1), A-2 (APP-1-2), B-1 (APP-2-1), B-2 (APP-2-2). Each dashed frame is one APP with multiple instances.]

Page 41

Further encapsulation of CC

Product line (Org): OrgName

Module (Space): OrgName_SpaceName

Module group: OrgName_SpaceName_GroupTag

Module version: OrgName_SpaceName_GroupTag_VersionTag

Instance (unique id): OrgName_SpaceName_GroupTag_VersionTag_Index

Page 42

GroupTag, VersionTag

•  GroupTag
  •  Distinguishes instances along different dimensions: configuration number, data center (machine room), rack, …

•  VersionTag
  •  Distinguishes program, data, configuration files, and so on
  •  Contains a four-part version number and a timestamp

•  Full instance names, for example
  •  Org_Space_GroupA_1-1-1-1-438249600_1
  •  Org_Space_GroupB_1-1-1-1-438249600_1
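The naming scheme can be generated and parsed mechanically. This sketch assumes that `_` separates the name levels and never appears inside a level. It also assumes that the VersionTag is four `-`-separated numbers followed by the timestamp, as in the examples. `InstanceName` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class InstanceName:
    org: str
    space: str
    group_tag: str
    version: Tuple[int, int, int, int]  # four-part version number
    timestamp: int
    index: int

    @property
    def version_tag(self) -> str:
        return "-".join(str(v) for v in self.version) + f"-{self.timestamp}"

    def __str__(self) -> str:
        return f"{self.org}_{self.space}_{self.group_tag}_{self.version_tag}_{self.index}"

    @classmethod
    def parse(cls, full_name: str) -> "InstanceName":
        # Assumes "_" appears only as the separator between levels.
        org, space, group_tag, version_tag, index = full_name.split("_")
        *parts, timestamp = version_tag.split("-")
        if len(parts) != 4:
            raise ValueError(f"expected a four-part version in {version_tag!r}")
        return cls(org, space, group_tag, tuple(int(p) for p in parts), int(timestamp), int(index))
```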

Page 43

Examination, approval and release

•  Submit the form and get approval
  ✓ APP information (program version, capacity information, related instructions, and so on)
  ✓ Approval (responsible manager and the people who need to be informed)
  ✓ Operator, operating time
  ✓ Monitoring information (monitoring and control strategy, contacts, and so on)

•  Carry out the release and add monitoring
  ✓ Before the release, the related approval processes must be passed
  ✓ Operator, program version, MD5, time, and so on must be consistent with the approval
  ✓ Only when everything is consistent and the processes have been passed can the release proceed
  ✓ After a successful release, add the monitoring

Distribute form → Approval → Release APP → Add Monitor
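The consistency rule before release (operator, program version and MD5 must match the approval) can be expressed as a gate. The following is a hypothetical Python sketch; it leaves out the time check mentioned on the slide.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Approval:
    operator: str
    version: str
    md5: str
    approved: bool


@dataclass
class ReleaseRequest:
    operator: str
    version: str
    package: bytes


def release_blockers(approval: Approval, request: ReleaseRequest) -> List[str]:
    """Return the reasons the release must not proceed; an empty list means go."""
    problems = []
    if not approval.approved:
        problems.append("approval process not passed")
    if request.operator != approval.operator:
        problems.append("operator differs from the approved form")
    if request.version != approval.version:
        problems.append("program version differs from the approved form")
    if hashlib.md5(request.package).hexdigest() != approval.md5.lower():
        problems.append("package MD5 differs from the approved form")
    return problems
```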

Page 44

Pre-release, release, rollback

[Figure: app_v1 (instance01, instance02) at app_v1.paas.baidu.com, app_v2 (instance01, instance02), and app_v3 (instance01, instance02) at app_v3.paas.baidu.com; the generic domain name app.baidu.com is mapped/unmapped among the versions]

•  Generic domain name, map/unmap, multiple versions of an app
•  Moving forward: release
•  Moving back: rollback
•  Pre-release: observe offline on the internal network
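Release and rollback are just map/unmap operations on the generic domain name. The hypothetical in-memory route table below illustrates this. Mapping the new version before unmapping the old one avoids a moment with no backend.

```python
from collections import defaultdict
from typing import Dict, Set


class DomainRouter:
    """Hypothetical route table: domain name -> set of app versions it points at."""

    def __init__(self) -> None:
        self.routes: Dict[str, Set[str]] = defaultdict(set)

    def map(self, domain: str, app: str) -> None:
        self.routes[domain].add(app)

    def unmap(self, domain: str, app: str) -> None:
        self.routes[domain].discard(app)


def switch(router: DomainRouter, domain: str, from_app: str, to_app: str) -> None:
    """Release (forward) or rollback (back): map the target first, then unmap the old one."""
    router.map(domain, to_app)
    router.unmap(domain, from_app)
```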

Page 45

Basic gray-scale release

[Figure: app.baidu.com pointing at several app versions at once (app_v1, app_v2, app_v3, each with instance01 and instance02; app_v1 also at app_v1.paas.baidu.com); adding app_v2 instance03 changes the traffic split]

1. Point one formal domain name at multiple apps at the same time
2. Adjust the ratio between the apps' instance counts; this adjusts the ratio of gray-scale traffic
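This sketch assumes that the router spreads requests evenly over all instances behind a domain name; the slides do not describe the balancing policy. Under that assumption, each version's traffic share equals its share of the instances.

```python
from fractions import Fraction
from typing import Dict


def traffic_shares(instances: Dict[str, int]) -> Dict[str, Fraction]:
    """Share of traffic each version receives when every instance gets an equal share."""
    total = sum(instances.values())
    if total == 0:
        raise ValueError("no instances mapped")
    return {app: Fraction(n, total) for app, n in instances.items()}
```

Adding app_v2 instance03 next to two app_v1 instances moves app_v2 from 1/2 to 3/5 of the traffic.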

Page 46

“The road of evangelism”: promoting the platform

•  The medal: who owns the other half?
  ✓ Supporting apps
    - New services need to follow the PaaS standards and way of thinking
    - Old services need R&D to reform them and QA to run regression tests
  ✓ Peripheral support
    - DB, cache, storage, interfaces, security, monitoring, and so on

•  Make the benefits clear and build a win-win ecosystem
  ✓ Deliver faster, save more resources, and make things simpler
  ✓ One-stop, all-in-one service; promote the platform hand in hand

Page 47

Some solutions

•  Treat users (APP developers) like royalty
  ✓ For important APPs, provide dedicated service
  ✓ For important managers, keep a complete and timely communication channel, such as reports
  ✓ The principle is “capitalism” rather than “socialism”

•  Event “marketing”
  ✓ E.g. the “struts2 0day”
    - Actively work with R&D and QA to identify the issues, fix them and roll out the fix
    - Actively report progress and manage the event
    - Afterwards, actively drive and join the discussion and decision-making, for example with the security and architecture groups
    - The principle is “win-win” rather than shirking responsibility

Page 48

4. Reform Operations

Page 49

Reform operations

“NoOps”: the overall functionality of PaaS (and IaaS) >= traditional operations work

[Figure: the layers Applications, Data, Runtime, Middleware, O/S, Virtualization, Servers, Storage and Networking, mapped to OP (SRE) operations and to PaaS (and IaaS)]

Page 50

How to reform, example

•  Automatic fault recovery
  ✓ Add a health-monitoring mechanism on top of traditional monitoring
  ✓ Instances restart and “drift” automatically
  ✓ Reduce traditional alarms and manpower
    - Alarm only when automatic recovery fails

[Figure: the monitor looks up the full instance name (name_1) through the health monitor API, which returns the ip:port of the real instance (instance_1) or of the instance after drifting]

•  “Drift” is normal and does not trigger an alarm
•  An alarm is needed only when a drift fails
•  Monitoring works at the instance level: on every check, the instance is looked up by name, and its current ip:port is detected and returned
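The "alarm only when recovery fails" rule can be sketched as one monitoring pass. The callables and return values are hypothetical. `resolve` looks up the instance by name on every pass, `probe` checks the endpoint, and `recover` stands for the platform's restart or drift.

```python
from typing import Callable, Optional, Tuple

Endpoint = Tuple[str, int]


def check_and_recover(name: str,
                      resolve: Callable[[str], Optional[Endpoint]],
                      probe: Callable[[Endpoint], bool],
                      recover: Callable[[str], bool],
                      alarm: Callable[[str], None]) -> str:
    """One monitoring pass: resolve by name each time, let the platform recover, alarm last."""
    endpoint = resolve(name)  # the instance may have drifted since the last pass
    if endpoint is not None and probe(endpoint):
        return "healthy"
    if recover(name):         # restart or drift handled by the platform: no alarm
        return "recovered"
    alarm(f"{name}: automatic recovery failed")
    return "alarmed"
```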

Page 51

How to reform, example

•  More agile
  ✓ Developers can forget about servers instead of thinking in terms of resources
  ✓ Complete configuration management and automatic deployment
  ✓ Release, pre-release and rollback are extremely simple and need no extra complex deployment tool
  ✓ Elastic scaling is extremely simple
  ✓ Buildpacks compile in the cloud and run directly

•  All-in-one, one-stop experience
  ✓ From submitting the form through release to updating the monitoring, the workflow is fully automatic
  ✓ Third-party services are integrated behind a unified management entry point

Page 52

5. Future plans

Page 53

Future plans

•  Feedback to the community
  •  For private-cloud functions, wrap the native components (based on CF 2.0) as far as possible, then open-source the new components
  •  Where changes affect the native components, try to merge them into the master branch
  •  Write more documentation and tips, and actively take part in the community

•  Development directions
  •  Large applications (big instances)
  •  Intelligent scheduling
  •  Information security
  •  Further continuous integration
  •  UI

Page 54

We are hiring !

@Weiyu Wang(王炜煜) weibo.com/wwy1640

Thanks
