How to Write a Mesos Scheduler (So Easy Even a Monkey Can Understand)


How to Write a Mesos Scheduler (So Easy Even a Monkey Can Understand)

Mesos Tokyo Meetup #1
Feb 2015, @wallyqs

About me

Name: Wally (ワリ)
Twitter: https://twitter.com/wallyqs
Github: https://github.com/wallyqs
From Mexico :)

Interests: infrastructure, distributed systems, next-generation PaaS

I like fast deploys with high scalability & availability

Literate programming; Org mode heavy user

Communities: Google Summer of Code

Communities: HackerSchool alumni

A programmers' retreat based in New York. Great community!

Org mode activity: Org mode Ruby parser

Used at GitHub for rendering .org files! Added syntax highlighting support and many other improvements/bug fixes.

Agenda
- Why Mesos? Why implement our own Scheduler?
- What does a Mesos Scheduler do?
- Communication flow between components
- Basic scheduler implementation styles
- Examples

Why Mesos?

Background: experience operating a PaaS for 3 years now

Short-term benefits, but...

A full-stack PaaS means a fork is almost unavoidable

Hard to keep following the community; priorities mismatch

Platform too tightly coupled: can only deploy web workloads

Conway's Law, etc...

How does Mesos help? By adding another level of indirection!

A set of APIs instead of a full-stack approach:
- We can implement a scheduler with the logic we need
- No vendor lock-in
- No losing the ability to follow the OSS community because of a fork
- No roadmap mismatch issue

But how does Mesos work???

Basic components of a Mesos cluster:
- One or a few Mesos Masters
- Many Mesos Slaves
- Schedulers (also called frameworks)
- Executors

All communication is done via HTTP.

Communication flow between components

Example:
- Master running at 192.168.0.7:5050
- Slave running at 192.168.0.7:5051
- Scheduler running at 192.168.0.7:59508
- Executor running at 192.168.0.7:58006

Discovery between Master and Slaves: Slaves announce themselves to the Master.

Master pings the slave:

POST /slave(1)/PING HTTP/1.0
User-Agent: libprocess/slave-observer(1)@192.168.0.7:5050
Connection: Keep-Alive
Transfer-Encoding: chunked

Slave pongs back:

POST /slave-observer(1)/PONG HTTP/1.0
User-Agent: libprocess/slave(1)@192.168.0.7:5051
Connection: Keep-Alive

Scheduler starts and registers with the Master

POST /master/mesos.internal.RegisterFrameworkMessage HTTP/1.1
Host: 192.168.0.7:5050
User-Agent: Go 1.1 package http
Content-Length: 44
Connection: Keep-Alive
Content-Type: application/x-protobuf
Libprocess-From: scheduler(1)@192.168.0.7:59508
Accept-Encoding: gzip

Master ACKs the registration to the Scheduler

POST /scheduler(1)/mesos.internal.FrameworkRegisteredMessage HTTP/1.0
User-Agent: libprocess/master@192.168.0.7:5050
Connection: Keep-Alive
Transfer-Encoding: chunked

Then the Master starts offering resources to the Scheduler

POST /scheduler(1)/mesos.internal.ResourceOffersMessage HTTP/1.0
User-Agent: libprocess/master@192.168.0.7:5050
Connection: Keep-Alive
Transfer-Encoding: chunked

(offer: cpus=2 from slave(1)@192.168.0.7:5051)

Scheduler accumulates offers and launches tasks through the Master

The Master will give a Slave's resources to run the job.

POST /master/mesos.internal.LaunchTasksMessage HTTP/1.1
Host: 192.168.0.7:5050
User-Agent: Go 1.1 package http
Content-Length: 260
Connection: Keep-Alive
Content-Type: application/x-protobuf
Libprocess-From: scheduler(1)@192.168.0.7:59508
Accept-Encoding: gzip

Master submits the job from the scheduler to the Slave

POST /slave(1)/mesos.internal.RunTaskMessage HTTP/1.0
User-Agent: libprocess/master@192.168.0.7:5050
Connection: Keep-Alive
Transfer-Encoding: chunked

Executor is started and registers back to the Slave

POST /slave(1)/mesos.internal.RegisterExecutorMessage HTTP/1.0
User-Agent: libprocess/executor(1)@192.168.0.7:58006
Connection: Keep-Alive
Transfer-Encoding: chunked

Slave ACKs to the Executor that it is aware of it

POST /executor(1)/mesos.internal.ExecutorRegisteredMessage HTTP/1.0
User-Agent: libprocess/slave(1)@192.168.0.7:5051
Connection: Keep-Alive
Transfer-Encoding: chunked

Then the Slave submits a job to the Executor:

POST /executor(1)/mesos.internal.RunTaskMessage HTTP/1.0
User-Agent: libprocess/slave(1)@192.168.0.7:5051
Connection: Keep-Alive
Transfer-Encoding: chunked

Executor will constantly be sharing status with the Slave

POST /slave(1)/mesos.internal.StatusUpdateMessage HTTP/1.0
User-Agent: libprocess/executor(1)@192.168.0.7:58006
Connection: Keep-Alive
Transfer-Encoding: chunked

Then the Slave will escalate the status to the Master

POST /master/mesos.internal.StatusUpdateMessage HTTP/1.0
User-Agent: libprocess/slave(1)@192.168.0.7:5051
Connection: Keep-Alive
Transfer-Encoding: chunked

And so on, and so on…

Responsibilities of the Scheduler and Executor

Scheduler:
- Receives resource offers and launches tasks
- Processes status updates about the tasks

Executor:
- Runs tasks
- Updates the status of the tasks

Basic Example: CommandScheduler

Since even a monkey has to be able to follow this, we'll use Go instead of Scala. :P

A super simple CommandScheduler (like mesos-exec in C++, but in Go). The default Mesos Executor's functionality is all we need:

https://github.com/mesos/mesos-go

/usr/local/libexec/mesos/mesos-executor

Usage:

go run command_scheduler.go -address=192.168.0.7:5050 -task-count=2 -cmd="while true; do echo hello world; done"

Imports

package main

import (
    "flag"
    "fmt"
    "net"
    "strconv"

    "github.com/gogo/protobuf/proto"

    mesos "github.com/mesos/mesos-go/mesosproto"
    util "github.com/mesos/mesos-go/mesosutil"
    sched "github.com/mesos/mesos-go/scheduler"
)

The CommandScheduler type implements the Scheduler interface:

type CommandScheduler struct {
    tasksLaunched int
    tasksFinished int
    totalTasks    int
}

The Scheduler interface: implementing ResourceOffers and StatusUpdate is pretty much enough; the other methods can be left pending for now.

- ResourceOffers
- StatusUpdate
- Registered
- Reregistered
- Disconnected
- OfferRescinded
- FrameworkMessage
- SlaveLost
- ExecutorLost
- Error
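The pending methods can simply be declared as no-ops so that CommandScheduler satisfies the interface. A sketch, using the signatures mesos-go had around this time (check the scheduler package for the exact interface):

// Called once the framework is registered; logging is enough here.
func (sched *CommandScheduler) Registered(driver sched.SchedulerDriver, frameworkId *mesos.FrameworkID, masterInfo *mesos.MasterInfo) {
    fmt.Println("Framework Registered with Master", masterInfo)
}

// The remaining callbacks stay as no-op stubs for now.
func (sched *CommandScheduler) Reregistered(driver sched.SchedulerDriver, masterInfo *mesos.MasterInfo) {}
func (sched *CommandScheduler) Disconnected(sched.SchedulerDriver)                                      {}
func (sched *CommandScheduler) OfferRescinded(sched.SchedulerDriver, *mesos.OfferID)                    {}
func (sched *CommandScheduler) FrameworkMessage(sched.SchedulerDriver, *mesos.ExecutorID, *mesos.SlaveID, string) {}
func (sched *CommandScheduler) SlaveLost(sched.SchedulerDriver, *mesos.SlaveID)                         {}
func (sched *CommandScheduler) ExecutorLost(sched.SchedulerDriver, *mesos.ExecutorID, *mesos.SlaveID, int) {}

func (sched *CommandScheduler) Error(driver sched.SchedulerDriver, err string) {
    fmt.Println("Scheduler received error:", err)
}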

Implementing ResourceOffers
The Master will keep giving the scheduler offers. Some of the important information contained within an offer:

- Resources available: disk, cpu, mem
- Id of the slave that holds those resources

Code:

func (sched *CommandScheduler) ResourceOffers(driver sched.SchedulerDriver, offers []*mesos.Offer) {
    for _, offer := range offers {
        cpuResources := util.FilterResources(offer.Resources, func(res *mesos.Resource) bool {
            return res.GetName() == "cpus"
        })
        cpus := 0.0
        for _, res := range cpuResources {
            cpus += res.GetScalar().GetValue()
        }

        memResources := util.FilterResources(offer.Resources, func(res *mesos.Resource) bool {
            return res.GetName() == "mem"
        })
        mems := 0.0
        for _, res := range memResources {
            mems += res.GetScalar().GetValue()
        }

        fmt.Println("Received Offer <", offer.Id.GetValue(), "> with cpus=", cpus, " mem=", mems)

        remainingCpus := cpus
        remainingMems := mems

Code (continued)

Point #0: the Scheduler is responsible for using the resources correctly.
Point #1: the TaskId needs to be unique somehow.
Point #2: to run, a task needs the SlaveId that is contained in the offer.

        var tasks []*mesos.TaskInfo
        for sched.tasksLaunched < sched.totalTasks &&
            CPUS_PER_TASK <= remainingCpus && // Point #0
            MEM_PER_TASK <= remainingMems {

            sched.tasksLaunched++

            // Point #1
            taskId := &mesos.TaskID{
                Value: proto.String(strconv.Itoa(sched.tasksLaunched)),
            }

            task := &mesos.TaskInfo{
                Name:    proto.String("go-cmd-task-" + taskId.GetValue()),
                TaskId:  taskId,
                SlaveId: offer.SlaveId, // Point #2
                Resources: []*mesos.Resource{
                    util.NewScalarResource("cpus", CPUS_PER_TASK),
                    util.NewScalarResource("mem", MEM_PER_TASK),
                },
                Command: &mesos.CommandInfo{
                    Value: proto.String(*jobCmd),
                },
            }
            fmt.Printf("Prepared task: %s with offer %s for launch\n", task.GetName(), offer.Id.GetValue())
            tasks = append(tasks, task)

            remainingCpus -= CPUS_PER_TASK
            remainingMems -= MEM_PER_TASK
        }
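The slide stops here, before the actual launch call; judging by the sample output below ("Launching 4 tasks for offer ...") and the fault-tolerant example later on, the accumulated tasks are then handed to the driver roughly like this:

        fmt.Println("Launching", len(tasks), "tasks for offer", offer.Id.GetValue())
        driver.LaunchTasks([]*mesos.OfferID{offer.Id}, tasks,
            &mesos.Filters{RefuseSeconds: proto.Float64(1)})
    } // closes the range over offers
}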

Implementing StatusUpdate

Code

Point #0: use the taskId (status.TaskId.GetValue()) to decide what to do.
Point #1: in this example, the scheduler stops if one task dies.

func (sched *CommandScheduler) StatusUpdate(driver sched.SchedulerDriver, status *mesos.TaskStatus) {
    // Point #0: status.TaskId.GetValue()
    fmt.Println("Status update: task", status.TaskId.GetValue(), " is in state ", status.State.Enum().String())

    if status.GetState() == mesos.TaskState_TASK_FINISHED {
        sched.tasksFinished++
    }

    if sched.tasksFinished >= sched.totalTasks {
        fmt.Println("Total tasks completed, stopping framework.")
        driver.Stop(false)
    }

    if status.GetState() == mesos.TaskState_TASK_LOST ||
        status.GetState() == mesos.TaskState_TASK_KILLED || // Point #1
        status.GetState() == mesos.TaskState_TASK_FAILED {
        fmt.Println(
            "Aborting because task", status.TaskId.GetValue(),
            "is in unexpected state", status.State.String(),
            "with message", status.GetMessage(),
        )
        driver.Abort()
    }
}

Finally, main
The only thing we need to do is pass the scheduler to the driver configuration.

func main() {
    fwinfo := &mesos.FrameworkInfo{
        User: proto.String(""),
        Name: proto.String("Go Command Scheduler"),
    }

    bindingAddress := parseIP(*address)

    config := sched.DriverConfig{
        Scheduler: &CommandScheduler{
            tasksLaunched: 0,
            tasksFinished: 0,
            totalTasks:    *taskCount,
        },
        Framework:      fwinfo,
        Master:         *master,
        BindingAddress: bindingAddress,
    }

    driver, err := sched.NewMesosSchedulerDriver(config)
    // The slide ends at the line above; the driver still has to be
    // started, e.g.:
    if err != nil {
        fmt.Println("Unable to create a SchedulerDriver:", err)
        return
    }
    if stat, err := driver.Run(); err != nil {
        fmt.Println("Framework stopped with status", stat.String(), "and error:", err)
    }
}
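parseIP is a small helper that isn't shown on the slides; assuming DriverConfig's BindingAddress takes a net.IP (as in mesos-go at the time), it might look like this:

// parseIP resolves a hostname or dotted-quad string to a net.IP.
// Hypothetical helper, reconstructed from how it is called above.
func parseIP(address string) net.IP {
    addrs, err := net.LookupIP(address)
    if err != nil || len(addrs) < 1 {
        fmt.Println("Unable to resolve binding address:", address, err)
        return nil
    }
    return addrs[0]
}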

Done!

go run examples/command_scheduler.go -address="192.168.0.7" -master="192.168.0.7:5050" -logtostderr=true -task-count=4 -cmd="ruby -e '10.times { puts :hellooooooo; sleep 1 }'"

Initializing the Command Scheduler...
Framework Registered with Master &MasterInfo{Id:*20150225-174751-117483712-5050-13334,Ip:*117483712,Port:*5050,Pid:*master@192.168.0.7:5050,Hostname:*192.168.0.7,XXX_unrecognized:[],}
Received Offer < 20150225-174751-117483712-5050-13334-O0 > with cpus= 4 mem= 2812
Prepared task: go-cmd-task-1 with offer 20150225-174751-117483712-5050-13334-O0 for launch
Prepared task: go-cmd-task-2 with offer 20150225-174751-117483712-5050-13334-O0 for launch
Prepared task: go-cmd-task-3 with offer 20150225-174751-117483712-5050-13334-O0 for launch
Prepared task: go-cmd-task-4 with offer 20150225-174751-117483712-5050-13334-O0 for launch
Launching 4 tasks for offer 20150225-174751-117483712-5050-13334-O0
Status update: task 1 is in state TASK_RUNNING
Status update: task 3 is in state TASK_RUNNING
Status update: task 2 is in state TASK_RUNNING
Status update: task 4 is in state TASK_RUNNING

What about containers? Starting with Mesos 0.20, ContainerInfo can be used too. Example:

task := &mesos.TaskInfo{
    Name:    proto.String("go-cmd-task-" + taskId.GetValue()),
    TaskId:  taskId,
    SlaveId: offer.SlaveId,
    // Executor: sched.executor,
    Resources: []*mesos.Resource{
        util.NewScalarResource("cpus", CPUS_PER_TASK),
        util.NewScalarResource("mem", MEM_PER_TASK),
    },
    Command: &mesos.CommandInfo{
        Value: proto.String(*jobCmd),
    },
    Container: &mesos.ContainerInfo{ // the new part
        Type: mesos.ContainerInfo_DOCKER.Enum(),
        Docker: &mesos.ContainerInfo_DockerInfo{
            Image: proto.String(*dockerImage),
            // Network: mesos.ContainerInfo_DockerInfo_BRIDGE.Enum(),
            // PortMappings: []*ContainerInfo_DockerInfo_PortMapping{},
        },
    },
}
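The *dockerImage flag used above is not defined on the slides either; a hypothetical definition follows. Note also that for this TaskInfo to work the slave has to be started with the Docker containerizer enabled (--containerizers=docker,mesos):

// Hypothetical flag for the Docker image; matches the redis:latest
// container shown in the docker ps output below.
var dockerImage = flag.String("image", "redis:latest", "Docker image to run the task in")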

Example: sudo docker ps

CONTAINER ID   IMAGE          COMMAND                CREATED          STATUS          PORTS   NAMES
1a8b3c964c3e   redis:latest   "\"/bin/sh -c redis-   17 minutes ago   Up 17 minutes           mesos-88de0870-b613-4bda-9ed4-30995834ccab

What about fault tolerance?

First, the Scheduler needs to be aware of all the tasks' info too:

type FaultTolerantCommandScheduler struct {
    tasksLaunched int
    tasksFinished int
    totalTasks    int
    tasksList     []*mesos.TaskInfo
}
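The slides don't show how tasksList gets filled in; presumably the tasks are all created up front with no SlaveId, so that the ResourceOffers handler below can match them to offers one by one. A rough sketch of that setup (an assumption, not from the slides; totalTasks would come from the -task-count flag):

tasksList := []*mesos.TaskInfo{}
for i := 1; i <= totalTasks; i++ {
    taskId := &mesos.TaskID{Value: proto.String(strconv.Itoa(i))}
    tasksList = append(tasksList, &mesos.TaskInfo{
        Name:   proto.String("go-cmd-task-" + taskId.GetValue()),
        TaskId: taskId,
        // SlaveId stays nil until the task is matched to an offer
        Resources: []*mesos.Resource{
            util.NewScalarResource("cpus", CPUS_PER_TASK),
            util.NewScalarResource("mem", MEM_PER_TASK),
        },
        Command: &mesos.CommandInfo{Value: proto.String(*jobCmd)},
    })
}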

ResourceOffers handler
For a task to be valid, it needs a SlaveID. In ResourceOffers we only launch the ones without a SlaveID (see the setup sketch above).

var tasksToLaunch []*mesos.TaskInfo

for _, task := range sched.tasksList {
    // Only pick tasks that are not already running (i.e. have no SlaveID)
    // and for which the offer still has room.
    if task.SlaveId == nil && CPUS_PER_TASK <= remainingCpus && MEM_PER_TASK <= remainingMems {
        fmt.Println("[OFFER ] ", offer.SlaveId, "will be used for task:", task)
        task.SlaveId = offer.SlaveId
        remainingCpus -= CPUS_PER_TASK
        remainingMems -= MEM_PER_TASK
        tasksToLaunch = append(tasksToLaunch, task)
    }
}

if len(tasksToLaunch) > 0 {
    fmt.Println("[OFFER] Launching ", len(tasksToLaunch), "tasks for offer", offer.Id.GetValue())
    driver.LaunchTasks([]*mesos.OfferID{offer.Id}, tasksToLaunch,
        &mesos.Filters{RefuseSeconds: proto.Float64(1)})
}

StatusUpdate handler
When a StatusUpdate comes in, we can handle it there; the task then gets rescheduled the next time ResourceOffers is called.

if status.GetState() == mesos.TaskState_TASK_KILLED {
    taskId, _ := strconv.Atoi(*status.GetTaskId().Value)
    fmt.Println("[STATUS] TASK_KILLED: ", taskId)
    sched.tasksList[taskId-1].SlaveId = nil
}

if status.GetState() == mesos.TaskState_TASK_FAILED {
    taskId, _ := strconv.Atoi(*status.GetTaskId().Value)
    fmt.Println("[STATUS] TASK_FAILED: ", taskId)
    sched.tasksList[taskId-1].SlaveId = nil
}
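Both branches do the same cleanup (clearing the SlaveId marks the task for relaunch on the next offer), so they could also be folded together; a compact variant (my consolidation, not on the slides) that treats TASK_LOST the same way:

switch status.GetState() {
case mesos.TaskState_TASK_KILLED, mesos.TaskState_TASK_FAILED, mesos.TaskState_TASK_LOST:
    if taskId, err := strconv.Atoi(status.GetTaskId().GetValue()); err == nil {
        fmt.Println("[STATUS] rescheduling task:", taskId)
        sched.tasksList[taskId-1].SlaveId = nil
    }
}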

Conclusions
- Not so complicated to create your own custom schedulers
- Easy to extend and to wrap HTTP APIs around, to build the logic you need
- A good pluggable solution!

Thank you for listening.