Introduction to the Cluster Infrastructure and the Systems Provisioning Engineering teams

Upload: angelo-failla

Post on 21-Jan-2017


Page 1: Introduction to the Cluster Infrastructure and the Systems Provisioning Engineering teams
Page 2:

Cluster Infrastructure & System Provisioning Engineering

Angelo Failla

Production Engineer – ClusterInfra Dublin

supporting rapid infrastructure and user growth

Page 3:

What do we do?

Efficiently bring up new capacity and manage the health of core services required to operate our infra.

Page 4:

Cluster Infrastructure

Team Responsibilities

• DNS infrastructure
• NTP infrastructure
• Provisioning infrastructure (DHCP, TFTP, Grub2, etc.)
• Cluster/DC level automation

Page 5:

System Provisioning Engineering

Team Responsibilities

• Cyborg
• Built on top of provisioning infra
• Orchestrates server / TOR provisioning
• Image parameters tool
• Repair ticketing system
• Hardware checking systems

Page 6:

(some of the) challenges

Page 7:

The number of machines

Page 8:

PROVISIONING: IT’S HANDS FREE

Page 9:

The number of variables is too high

https://www.flickr.com/photos/curveto/2698598542/ - CC-BY-2.0

Page 10:

Let’s talk about TFTP…

TFTP: D.O.B. 1981

Angelo: D.O.B. 1981

Page 11:

POP TFTP: Asia -> Oregon

Latency: 150ms

Page 12:

POP TFTP: Asia -> Oregon

RRQ: 150ms

ACK: 150ms

GET DATA BLOCK 0: 150ms

DATA BLOCK 0 PAYLOAD: 150ms

GET DATA BLOCK N: 150ms

DATA BLOCK N PAYLOAD: 150ms

Page 13:

POP TFTP: Asia -> Oregon

File size   Block size       Latency   Time to download
80 MB       512 B            150ms     12.5 hours
80 MB       1400 B           150ms     4.5 hours
80 MB       512 B / 1400 B   1ms       <1 minute
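A back-of-the-envelope model of why the high-latency rows take hours. It assumes each block costs one full round trip (150 ms for the request or ACK plus 150 ms for the payload, as on the previous slide), that "80 MB" means 80 * 10^6 bytes, and that bandwidth and retransmissions are negligible. The function name is hypothetical. Under these assumptions it gives roughly 13 hours for 512 B blocks and 4.8 hours for 1400 B blocks, which is the same ballpark as the slide's figures rather than an exact reproduction.

```python
import math


def tftp_transfer_seconds(file_bytes: int, block_bytes: int,
                          one_way_latency_s: float) -> float:
    """Estimate lock-step TFTP transfer time (hypothetical helper).

    Ignores bandwidth, retransmissions and the initial RRQ handshake.
    """
    blocks = math.ceil(file_bytes / block_bytes)
    # Lock-step: the next block is only requested after the previous one
    # arrived, so every block pays one full round trip (2 x one-way latency).
    return blocks * 2 * one_way_latency_s


FILE_BYTES = 80 * 10**6  # assumption: "80 MB" read as 80 * 10^6 bytes

for block in (512, 1400):
    hours = tftp_transfer_seconds(FILE_BYTES, block, 0.150) / 3600
    print(f"{block} B blocks at 150 ms one-way latency: {hours:.1f} h")
```

The total is dominated by the number of round trips, not by the file size as such, which is why a larger block size helps and why a protocol that keeps many packets in flight helps far more.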

Page 14:

Solutions

Solution 1: let’s use iPXE as it talks TCP/HTTP!
- It had a 10-minute watchdog (which we had to patch)
- after the patch it was still taking > 10 minutes

Solution 2: put an fbtftp server in every POP
- our own home-made TFTP server
- have it stream files from HTTP
- cache files locally
- a couple of minutes to download initrd/kernel

Solution 3 (currently investigating): use Grub2 and download initrd/kernel via HTTP
- configurable TCP window size, patch sent upstream
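The "stream files from HTTP, cache files locally" idea in Solution 2 can be sketched as a small read-through cache. This is a minimal illustration using only the Python standard library, not fbtftp's actual API or the team's implementation. The class name, the `fetch` parameter, and the example URL are all hypothetical.

```python
import hashlib
import os
import tempfile
import urllib.request
from typing import Callable


def http_fetch(url: str) -> bytes:
    """Download a whole object over HTTP using the standard library."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


class CachingFileSource:
    """Hypothetical helper: serve files from a local cache directory,
    filling the cache from an HTTP origin on the first request."""

    def __init__(self, origin_url: str, cache_dir: str,
                 fetch: Callable[[str], bytes] = http_fetch):
        self.origin_url = origin_url.rstrip("/")
        self.cache_dir = cache_dir
        self.fetch = fetch  # injectable so the logic can run without a network

    def get(self, name: str) -> bytes:
        # Hash the requested name so it can never escape the cache directory.
        key = hashlib.sha256(name.encode("utf-8")).hexdigest()
        cached = os.path.join(self.cache_dir, key)
        if os.path.exists(cached):
            with open(cached, "rb") as f:
                return f.read()
        data = self.fetch(f"{self.origin_url}/{name}")
        # Write to a temporary file and rename it into place, so a
        # concurrent reader never sees a partially written file.
        fd, tmp = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, cached)
        return data
```

A TFTP server in the POP could call something like `CachingFileSource("http://origin.example/images", "/var/cache/tftp").get("vmlinuz")` for each read request, so only the first request for a file crosses the high-latency link and later clients are served at local latency.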

Page 15:

Vendors tell you they are IPv6 compliant, but are they really?

Page 16:

Bring up/down clusters as fast as possible

Page 17:

Come talk to us at our poster sessions!
