introduction to apache mesos

59
Tomas Barton (@barton_tomas)

Upload: tomasbart

Post on 08-Sep-2014

4.370 views

Category:

Technology


5 download

DESCRIPTION

What is Apache Mesos and how to use it. A short introduction to distributed fault-tolerant systems with using ZooKeeper and Mesos. #installfest Prague 2014

TRANSCRIPT

Page 1: Introduction to Apache Mesos

Tomas Barton (@barton_tomas)

Page 2: Introduction to Apache Mesos

Load balancing

• one server is not enough

Page 3: Introduction to Apache Mesos

Time to launch new instance• new server up and runing in a few seconds

Page 4: Introduction to Apache Mesos

Job allocation problem

• run important jobs first

• how many servers do we need?

Page 5: Introduction to Apache Mesos

Load trends• various load patterns

Page 6: Introduction to Apache Mesos

Goals

• effective usage of resources

Page 7: Introduction to Apache Mesos

Goals• scalable

Page 8: Introduction to Apache Mesos

Goals• fault tolerant

Page 9: Introduction to Apache Mesos

Infrastructure

1 scalable

2 fault-tolerant

3 load balancing

4 high utilization

Page 10: Introduction to Apache Mesos

Someone musthave done it before . . .

Page 11: Introduction to Apache Mesos

Someone musthave done it before . . .

Yes, it was Google

Page 12: Introduction to Apache Mesos

Google Research

2004 - MapReduce paper• MapReduce: Simplified Data Processing on Large Clusters

by Jeffrey Dean and Sanjay Ghemawat from Google Lab

⇒ 2005 - Hadoop• by Doug Cutting and Mike Cafarella

Page 13: Introduction to Apache Mesos

Google’s secret weapon

“I prefer to call it thesystem that will not benamed.”

John Wilkes

Page 14: Introduction to Apache Mesos

NOT THIS ONENOT THIS ONE

Page 15: Introduction to Apache Mesos

“Borg”unofficial name

X distributes jobs betweencomputers

X saved cost of building at leastone entire data center

− centralized, possiblebottleneck

− hard to adjust for different jobtypes

Page 16: Introduction to Apache Mesos

“Borg”

200x – no Borg paper at all

2011 – Mesos: a platform for fine-grained resource sharingin the data center.

• Benjamin Hindman, Andy Konwinski, Matei Zaharia,

Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott

Shenker, and Ion Stoica. Berkeley, CA, USA

Page 17: Introduction to Apache Mesos

The future?

2013 – Omega: flexible, scalable schedulers for largecompute clusters

• Malte Schwarzkopf, Andy Konwinski, MichaelAbd-El-Malek, John Wilkes (SIGOPS EuropeanConference on Computer Systems (EuroSys), ACM,Prague, Czech Republic (2013), pp. 351-364)

• compares different schedulers• monolithic (Borg)• Mesos (first public release, version 0.9)• Omega (next generation of Borg)

Page 18: Introduction to Apache Mesos

Benjamin Hindman

• PhD. at UC Berkley• research on multi-core

processors

“Sixty-four cores or 128 cores on a single chip looks alot like 64 machines or 128 machines in a data center”

Page 19: Introduction to Apache Mesos

“We wanted people to be able to program for the data“We wanted people to be able to program for the data

center just like they program for their laptop.”center just like they program for their laptop.”

Page 20: Introduction to Apache Mesos

Evolution of computing

Datacenter

• low utilization of nodes• long time to start new node (30 min – 2 h)

Page 21: Introduction to Apache Mesos

Evolution of computing

Datacenter Virtual Machines

• even more machnines to manage• high virtualization costs• VM licensing

Page 22: Introduction to Apache Mesos

Evolution of computing

Datacenter Virtual Machines Mesos

• sharing resources• fault-tolerant

Page 23: Introduction to Apache Mesos

Supercomputers

• supercomputers aren’t affordable• single point of failure?

Page 24: Introduction to Apache Mesos

IaaC

Infrastructure as a computer

• build on commodity hardware

Page 25: Introduction to Apache Mesos

Scalablity

Page 26: Introduction to Apache Mesos

First Rule of Distributed Computing

“Everything fails all the time.”

— Werner Vogels, CTO of Amazon

Page 27: Introduction to Apache Mesos
Page 28: Introduction to Apache Mesos

Slave failure• no problem

Page 29: Introduction to Apache Mesos

Master failure• big problem• single point of failure

Page 30: Introduction to Apache Mesos

Master failure – solution• ok, let’s add secondary master

Page 31: Introduction to Apache Mesos
Page 32: Introduction to Apache Mesos

Secondary master failure

× not resistant to secondary master failure× masters IPs are stored in slave’s config• will survive just 1 server failure

Page 33: Introduction to Apache Mesos

Leader election

X in case of master failure new one is elected

Page 34: Introduction to Apache Mesos

Leader election

X in case of master failure new one is elected

Page 35: Introduction to Apache Mesos

Leader election• you might think of the system like this:

Page 36: Introduction to Apache Mesos

Mesos scheme• usually 4-6 standby masters are enough• tolerant to |m| failures, |s| � |m||m| . . . number of standby masters|s| . . .number of slaves

Page 37: Introduction to Apache Mesos

Mesos scheme• each slave obtain master’s IP from Zookeeper• e.g. zk://192.168.1.1:2181/mesos

Page 38: Introduction to Apache Mesos

Common delusion

1 Network is reliable2 Latency is zero3 Transport cost is zero4 The Network is homogeneous

Page 39: Introduction to Apache Mesos

Cloud Computing

design you application for failure

• split application into multiple components• every component must have redundancy• no common points of failure

Page 40: Introduction to Apache Mesos

“If I seen further thanothers it is by standingupon the shoulders ofgiants.”

Sir Isaac Newton

Page 41: Introduction to Apache Mesos

ZooKeeper

Page 42: Introduction to Apache Mesos

ZooKeeper

Page 43: Introduction to Apache Mesos

ZooKeeper

Page 44: Introduction to Apache Mesos

ZooKeeper

• not a key-value storage• not a filesystem• 1 MB item limit

Page 45: Introduction to Apache Mesos

Frameworks

• framework – application running on Mesos

two components:1 scheduler2 executor

Page 46: Introduction to Apache Mesos

Scheduling

• two levels scheduler

• Dominant Resource Fairness

Page 47: Introduction to Apache Mesos

Mesos architecture

• fault tolerant• scalable

Page 48: Introduction to Apache Mesos

Isolation

Page 49: Introduction to Apache Mesos

• using cgroups• isolates:

• CPU• memory• I/O• network

Page 50: Introduction to Apache Mesos

Example usage – GitLab CI

Page 51: Introduction to Apache Mesos
Page 52: Introduction to Apache Mesos

AirBnB

Page 53: Introduction to Apache Mesos

Any web application can run on Mesos

Page 54: Introduction to Apache Mesos

Distributed cron

Page 55: Introduction to Apache Mesos

YARNAlternative to Mesos

• Yet Another Resource Negotiator

Page 56: Introduction to Apache Mesos

Where to get Mesos?• tarball – http://mesos.apache.org• deb, rpm – http://mesosphere.io/downloads/• custom package – https://github.com/deric/mesos-deb-packaging

• AWS instance in few seconds –https://elastic.mesosphere.io/

Page 57: Introduction to Apache Mesos

Configuration management

• automatic server configuration• portable• should work on most Linux distributions (currently

Debian, Ubuntu)• https://github.com/deric/puppet-mesos

Page 58: Introduction to Apache Mesos

Thank you for attention!

[email protected]