stupid boot tricks: using ipxe and chef to get to boot management bliss
DESCRIPTION
In this talk I will cover how I built a boot system using iPXE and Chef's API to create a lightweight tool for managing installs and firmware updates of hosts and network gear.
TRANSCRIPT
Me
• Co-founder and Principal Engineer at Fastly
• Former Operations Engineer at Wikia
• Lots of Sysadmin and Linux consulting
A little history
First Racks (the bad old days)
• 2-6 machines per location
• Installs over IPMI
• Organic growth
• No local management infrastructure
Scaling Up and Out
Fastly “Mega” Design
• Single platform for caching clusters
• Deployed as a unit, limited to no incremental growth
• Same components for 4 to 32 machine clusters
• Able to justify management infrastructure
• Able to lean on convention
The “oob” machine
• Private link to internet
• Provides local provisioning
• DHCP
• Squid
• Donner
Existing Tools
• Cobbler
• Razor
• Foreman
Why not existing?
• 20+ "datacenters"
• No backbone/internal network
• Too many moving pieces
• Host network complexity
Donner
• Sinatra app and cookbook for booting things over http
• iPXE
• Chef as datastore
• Open Source soon (stupid heartbleed)
iPXE
• Open source implementation of PXE
• Formerly known as both gPXE and Etherboot
• ROM image that can be burned into firmware
• Can boot off floppy/USB/hard disk/other PXE as well
Why iPXE?
• Boot off more than just TFTP targets
• HTTP, iSCSI, ATAoE, Fibre Channel
• Scriptable
• Minimal hardware and network inventory data
Why Chef for the datastore?
• Already available as a common service
• Multiple sources of truth suck
• Databags as integration point
Why databags?
• Hardware lifecycle is independent from the node object
• Searchable
• Easy to consume from other tools
Partial Search?
• Fast
• Somewhat convenient API
• I’m too lazy to deal with the databag api for reads
The Workflow
• Shipment Manifest
• Racking/Cabling
• Map Serial to Real World Location
• Power on machines and wait
Vendor Data
• For each shipment vendor provides a spreadsheet
• Serial number
• mac addresses
• Converted to data bag entries
Inventory Data Bag
{
  "environment": "production",
  "datacenter": "LCY",
  "id": "cache-lcy1122",
  "mac": "00:25:90:86:91:d8",
  "hostname": "cache-lcy1122",
  "publicip": "185.31.18.22",
  "mgmtip": "172.16.6.22",
  "profile": "mega16"
}
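A rough sketch of how a vendor manifest row might become an item like the one above. The column names, the serial value, and the extra "serial" key are assumptions made for illustration; the real spreadsheet layout is not shown in the talk.

```ruby
require "json"

# Hypothetical vendor export. Column names and the serial are made up.
MANIFEST = <<~CSV
  serial,mac,hostname,datacenter,publicip,mgmtip,profile
  S12345,00:25:90:86:91:D8,cache-lcy1122,LCY,185.31.18.22,172.16.6.22,mega16
CSV

# Naive comma split; a real spreadsheet with quoted fields would want a CSV parser.
def parse_manifest(text)
  header, *rows = text.lines.map(&:strip).reject(&:empty?)
  keys = header.split(",")
  rows.map { |line| keys.zip(line.split(",")).to_h }
end

# Shape one row like the inventory item, lower-casing the MAC for consistent lookups.
def row_to_item(row, environment: "production")
  {
    "id"          => row["hostname"],
    "environment" => environment,
    "datacenter"  => row["datacenter"],
    "serial"      => row["serial"], # assumed key, not in the item shown above
    "mac"         => row["mac"].downcase,
    "hostname"    => row["hostname"],
    "publicip"    => row["publicip"],
    "mgmtip"      => row["mgmtip"],
    "profile"     => row["profile"]
  }
end

def manifest_to_items(text)
  parse_manifest(text).map { |row| row_to_item(row) }
end

manifest_to_items(MANIFEST).each { |item| puts JSON.pretty_generate(item) }
```

Each printed object could be saved to a file and loaded with `knife data bag from file`, assuming a data bag has been created to hold the inventory.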
Site Details
• Racking/Cabling done by remote hands
• Labels applied to physical position
• Labels mapped to serial numbers in data bags
From Bare Metal to Chef
1. Get address
2. Assign boot image
3. Build installer config
4. Build post-install config
5. Install
6. Run chef on first boot
Getting iPXE in your pxe
• ISC dhcpd can do conditional responses
subnet 172.16.16.0 netmask 255.255.255.0 {
  range 172.16.16.225 172.16.16.254;
  # Already running iPXE: hand it a script over http
  if exists user-class and option user-class = "iPXE" {
    filename "http://172.16.16.7/images/dhcpd.ipxe";
  # Arista switch (OUI 00:1C:73): point it at the ZTP endpoint
  } elsif substring(hardware, 1, 3) = 00:1C:73 {
    option bootfile-name "http://172.16.16.7:1080/ztp";
  # Plain PXE ROM: chainload iPXE over tftp
  } else {
    filename "undionly.kpxe";
  }
  option routers 172.16.16.7;
  option domain-name-servers 172.16.16.7;
}
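The three branches of that config can be read as a small decision function. This is only a restatement in Ruby for clarity, not part of Donner; it assumes the MAC-prefix branch is meant to catch Arista switches, whose registered OUI is 00:1c:73.

```ruby
ARISTA_OUI = "00:1c:73" # assumption: the prefix branch targets Arista gear

# Returns the boot file a DHCP client would be handed.
def boot_file(mac:, user_class: nil)
  if user_class == "iPXE"
    # Second pass: iPXE is running and can fetch a script over http.
    "http://172.16.16.7/images/dhcpd.ipxe"
  elsif mac.downcase.start_with?(ARISTA_OUI)
    # Switches skip iPXE and go straight to zero touch provisioning.
    "http://172.16.16.7:1080/ztp"
  else
    # First pass: the stock PXE ROM chainloads iPXE over tftp.
    "undionly.kpxe"
  end
end

puts boot_file(mac: "00:25:90:86:91:d8")
puts boot_file(mac: "00:25:90:86:91:d8", user_class: "iPXE")
```

A server therefore hits DHCP twice: once from the firmware PXE ROM, and again from iPXE, which identifies itself through the user-class option.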
Scripting the boot image
#!ipxe

:net
isset ${net0/mac} && dhcp net0 || goto target
set dhcp_mac ${net0/mac:hexhyp}

:target
chain http://172.16.16.7:1180/pxe/${dhcp_mac} || goto error

:error
sleep 15
goto net
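On the server side, the MAC arrives in the URL path in iPXE's hyphenated form and has to be matched against the colon-separated value in the inventory. A minimal sketch of that step, with an in-memory array standing in for the Chef search result (the array and both function names are illustrative):

```ruby
# Stand-in for data bag items returned by a Chef search.
INVENTORY = [
  { "id" => "cache-lcy1122", "mac" => "00:25:90:86:91:d8", "profile" => "mega16" }
].freeze

# Accepts "00-25-90-86-91-d8" or "00:25:90:86:91:d8" and returns the colon form.
def normalize_mac(raw)
  unless raw.match?(/\A\h{2}(?:[-:]\h{2}){5}\z/)
    raise ArgumentError, "not a MAC address: #{raw.inspect}"
  end
  raw.downcase.tr("-", ":")
end

# Returns the matching inventory item, or nil for an unknown machine.
def find_machine(raw_mac, inventory = INVENTORY)
  mac = normalize_mac(raw_mac)
  inventory.find { |item| item["mac"].downcase == mac }
end

machine = find_machine("00-25-90-86-91-d8")
puts machine ? machine["id"] : "unknown machine"
```

Rejecting malformed input before it reaches a search query is a cheap safeguard, since the value comes straight from an unauthenticated http request.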
Booting the installer
#!ipxe
echo Installation node: <%= @machine['hostname'] %>

sleep 3
kernel http://<%= @serverip %>/images/<%= @image['kernel'] %> <%= @bootargs %> || goto error
initrd http://<%= @serverip %>/images/<%= @image['initrd'] %> || goto error
boot

:error
echo Something went wrong, dropping to a shell…
shell
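To see what a machine actually receives, a trimmed copy of the template can be rendered with Ruby's standard ERB library. The BootScript class, the image file names, and the boot arguments below are placeholders, not values from the real setup.

```ruby
require "erb"

# Trimmed version of the installer template from the slide.
TEMPLATE = <<~'IPXE'
  #!ipxe
  echo Installation node: <%= @machine['hostname'] %>
  kernel http://<%= @serverip %>/images/<%= @image['kernel'] %> <%= @bootargs %> || goto error
  initrd http://<%= @serverip %>/images/<%= @image['initrd'] %> || goto error
  boot
  :error
  shell
IPXE

# Holds the instance variables the template expects (hypothetical helper).
class BootScript
  def initialize(machine:, image:, serverip:, bootargs:)
    @machine = machine
    @image = image
    @serverip = serverip
    @bootargs = bootargs
  end

  def render(template)
    ERB.new(template).result(binding)
  end
end

def example_script
  BootScript.new(
    machine:  { "hostname" => "cache-lcy1122" },
    image:    { "kernel" => "linux", "initrd" => "initrd.gz" }, # placeholder file names
    serverip: "172.16.16.7",
    bootargs: "auto=true priority=critical"                      # placeholder arguments
  ).render(TEMPLATE)
end

puts example_script
```

In a Sinatra app the same instance variables would be set inside the route and rendered with `erb`, so no wrapper class is needed there.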
The Install
• Ubuntu with preseed in our case
• Another erb template
• Nothing special here
The post-install
• Annoying amount of our magic happens here
• Lots of netconfig the installer can’t handle
• Install internal apt keys and repos
• Install our chef package and kernels
• Configure chef for first boot
• Generated from a template with access to chef objects
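As one illustration of the "configure chef for first boot" step, a template could write a client.rb from the inventory item. This is a guess at the shape, not the actual post-install script; the server URL is a placeholder.

```ruby
require "erb"

# Minimal client.rb template; node_name, environment and chef_server_url
# are standard Chef client settings.
CLIENT_RB = <<~'CONF'
  node_name        "<%= item['hostname'] %>"
  environment      "<%= item['environment'] %>"
  chef_server_url  "<%= chef_server_url %>"
CONF

# Renders the config for one inventory item.
def render_client_rb(item, chef_server_url:)
  ERB.new(CLIENT_RB).result_with_hash(item: item, chef_server_url: chef_server_url)
end

item = { "hostname" => "cache-lcy1122", "environment" => "production" }
puts render_client_rb(item, chef_server_url: "https://chef.example.com/organizations/example")
```

Because the template sees the same objects as the boot script, the hostname and environment only need to live in one place.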
For more than just installers
• BIOS/Firmware Update ISOs
• Boot a live debug image
• Network Gear
Boot an ISO
FreeDOS ISO + vendor firmware
#!ipxe
echo Installing Supermicro Firmware for: <%= @machine['hostname'] %>

sleep 3
initrd http://<%= @serverip %>/images/current_firmware.iso || goto error
kernel http://<%= @serverip %>/images/memdisk.iso || goto error
boot

:error
echo Something went wrong, dropping to a shell…
shell
Network Gear
• Arista supports DHCP + HTTP
get '/ztp' do
  # Rack exposes request headers through env with an HTTP_ prefix
  mac = request.env['HTTP_X_ARISTA_SYSTEMMAC']
  @device = lookup_device(mac)
  erb :ztp
end
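The switch identifies itself with the X-Arista-SystemMAC request header. Rack stores request headers in its env hash under a transformed key, so a sketch of the lookup outside Sinatra might look like this. The device list and the body of lookup_device are invented for the example, and the header value is assumed to be colon-separated.

```ruby
# Hypothetical device inventory; in Donner this would come from Chef.
DEVICES = [
  { "id" => "switch-lcy01", "mac" => "00:1c:73:aa:bb:cc" }
].freeze

# Rack's naming rule for request headers: upper-case, dashes to
# underscores, HTTP_ prefix.
def rack_env_key(header_name)
  "HTTP_" + header_name.upcase.tr("-", "_")
end

# Case-insensitive match on the MAC; returns nil when nothing matches.
def lookup_device(mac, devices = DEVICES)
  devices.find { |device| device["mac"].casecmp?(mac.to_s) }
end

# Pulls the switch MAC out of a Rack env hash and resolves the device.
def device_for(env)
  lookup_device(env[rack_env_key("X-Arista-SystemMAC")])
end

env = { "HTTP_X_ARISTA_SYSTEMMAC" => "00:1C:73:AA:BB:CC" }
puts device_for(env)["id"]
```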