session e: prs - cis.temple.edugiorgio/old/cis307s09/readings/cloud... · 6 open cirrus stack prs...

Post on 23-May-2018


Session E: PRS

2

PRS

Richard Gass

Intel

3

Agenda

Sessions:
(A) Introduction 8.30-9.00
(B) Hadoop 9.00-10.00
Break 10.00-10.15
Hadoop 10.15-11.30
Lunch 11.30-12.30
(C) Pig 12.30-1.30
Break 1.30-1.45
(D) Tashi 1.45-3.30
Break 3.30-3.45
(E) PRS 3.45-5.00

I. Overview
II. Plans/Status
III. User View
IV. Administration
V. Installation
VI. Summary

4

Overview

5

Open Cirrus Stack

Compute + network + storage resources

Power + cooling

Management and control subsystem

Physical Resource set (PRS) service

Credit: John Wilkes (HP)

6

Open Cirrus Stack

PRS service

Eucalyptus Tashi/HDFS NFS storage service

Experiment

PRS clients, each with their own “physical data center”

7

Open Cirrus Stack

PRS service

Eucalyptus Tashi/HDFS NFS storage service

Experiment

Virtual cluster Virtual cluster

Virtual clusters

8

Open Cirrus Stack

PRS service

Eucalyptus Tashi/HDFS NFS storage service

Experiment

Virtual cluster Virtual cluster

BigData App

Hadoop

1. Application running
2. On Hadoop
3. On Tashi virtual cluster
4. On a PRS
5. On real hardware

Web Service

9

Open Cirrus stack - PRS

• PRS service goals
– Provide mini-datacenters to users
– Isolate mini-datacenters from each other
• PRS service approach
– Allocate sets of physical co-located nodes, isolated inside VLANs.
• Initial PRS implementation from HP
• Re-write from Intel (in collaboration with HP) contributed to Apache Software Foundation

PRS service

Further Motivation

• Enable innovation in virtualization

• Allow running without virtualization overhead
– Necessary for predictable QoS
• e.g. cache interference

11

Goals

• Reduce complexity in allocating physical resources
• Gain User Confidence
– Show users that we can efficiently allocate/deallocate resources
• Stop the squatting
– Incentives
• HP’s Tycoon (economic model)
• Simple points scheme for good behavior
• Early return

12

Responsibilities of PRS

• Isolate domains (VLAN)
• Provision system software (PXE)
• Provide platform control (IPMI)
– On/Off
• Provide boot debug (IPMI)

13

VLAN

• Virtual LAN technology allows a single physical network to appear as several isolated networks
– Ethernet packets are tagged with a VLAN id
– Switches and NICs enforce the policies associated with each VLAN
• By associating PRS domains with different VLANs, they can be isolated from each other
• The PRS system provides the interfaces necessary to abstract switch configuration programming across multiple switch vendors
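To make the tagging step concrete, here is a minimal sketch of IEEE 802.1Q tagging on a raw Ethernet frame. It is an illustration only and is not taken from the PRS code; the function names add_vlan_tag and get_vlan_id are hypothetical. The tag is four bytes (the 0x8100 tag protocol identifier followed by priority, DEI, and the 12-bit VLAN id) inserted after the two MAC addresses.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def add_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MAC addresses."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN id must be in 1..4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in 0..7")
    # Tag control information: 3-bit priority, 1-bit DEI (left 0), 12-bit VLAN id
    tci = (priority << 13) | vlan_id
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:12] + tag + frame[12:]


def get_vlan_id(frame: bytes):
    """Return the VLAN id of a tagged frame, or None if the frame is untagged."""
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    return tci & 0x0FFF if tpid == TPID_8021Q else None
```

A switch port configured for VLAN 300 would only forward frames for which get_vlan_id returns 300, which is what keeps one domain's traffic out of another's.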

14

PXE

• Enables provisioning of OS image over the network

• On machine boot, the NIC firmware contacts a PXE server via the DHCP process for the appropriate kernel and initrd to load

• Once loaded, the init scripts in the initrd can pull the filesystem to the machine

• In our environment, we download the desired filesystem to a ramdisk from a NFS server
– enabling a very rapid provisioning (30 seconds or less) while leaving the host filesystem undisturbed

Preboot eXecution Environment
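As a sketch of what "the appropriate kernel and initrd" can look like on the server side, the following assumes a PXELINUX-style boot loader, which the slides do not confirm. PXELINUX looks for a per-host configuration file named after the ARP hardware type (01) and the MAC address in lowercase, dash-separated form. The helper names, the label "prs", and the kernel and initrd file names are hypothetical.

```python
def pxelinux_config_name(mac: str) -> str:
    """Per-host config file name: '01-' plus the MAC, lowercase and dash-separated."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError("expected a MAC address like 00:1a:2b:3c:4d:5e")
    for octet in octets:
        int(octet, 16)  # raises ValueError on non-hex input
    return "01-" + "-".join(octets)


def pxelinux_config(kernel: str, initrd: str, extra_args: str = "") -> str:
    """Render a one-entry config that boots the given kernel and initrd."""
    lines = [
        "DEFAULT prs",
        "LABEL prs",
        f"  KERNEL {kernel}",
        f"  APPEND initrd={initrd} {extra_args}".rstrip(),
    ]
    return "\n".join(lines) + "\n"
```

Switching a node to a different image then amounts to rewriting one small file on the PXE server and rebooting the node.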

15

IPMI

• Defines a standardized, abstracted, message-based interface to intelligent platform management hardware

• Defines standardized records for describing platform management devices and their characteristics

• Enables cross-platform management software

Intelligent Platform Management Interface

16

Status/Plans

17

PRS Roadmap

• Stage 1
– Manages all cluster hardware
– Handles resource provisioning
– Provides interfaces for VLAN definition/programming
– Administrator is still in the allocation decision-making loop
• Stage 2
– Introduces a request queue and primitive scheduler
– Admin may still be in loop, definitely for special cases
– Enables provisioning of OS to local disk
– Enables virtual disk conversion to physical
• Stage 3
– Incentives module added (Tycoon)
– Tashi integration

18

Some History

• Previous prototype developed at HP Labs
• Focus on economic model

• Nice web interface which will be available upon reconvergence of code

23

User View

24

PRS Roles

• Admin: root of all authority
– Controls the physical resources
• User: requests domains
– Controls the domain, once allocated

25

Domains

• A Domain is the unit of PRS isolation
• A simple domain is a set of compute nodes gathered into a single VLAN

• Nodes are allocated from pools of available resources
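The domain and pool relationship described above can be sketched as a small data structure. This is an illustration under assumptions, not the PRS schema: the class names Domain and NodePool, their fields, and the "take the first free nodes by name" policy are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """A simple domain: a named set of nodes gathered into one VLAN."""
    name: str
    vlan_id: int
    nodes: set = field(default_factory=set)


class NodePool:
    """Pool of free nodes from which domains are populated."""

    def __init__(self, nodes):
        self.free = set(nodes)

    def allocate(self, domain: Domain, count: int) -> list:
        """Move `count` free nodes into the domain (assumed policy: name order)."""
        if count > len(self.free):
            raise RuntimeError("not enough free nodes in the pool")
        chosen = sorted(self.free)[:count]
        self.free.difference_update(chosen)
        domain.nodes.update(chosen)
        return chosen

    def release(self, domain: Domain) -> None:
        """Return all of the domain's nodes to the pool."""
        self.free.update(domain.nodes)
        domain.nodes.clear()
```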

26

The PRS Interface

• Users and Admins currently interact with the PRS system through a command line interface
• This interface both:
– Queries and updates records in the PRS database
– Wraps the various commands that must be issued to effect changes in the cluster
• PRS is currently a centralized system; users log into the PRS manager to issue commands
– An RPC interface is planned for the near future

27

PRS Usage

Usage: prs <options>

Standard options:
--help [show this help message and exit]
--version [show program's version number and exit]
--verbose [be verbose]

Common options:
--nodeName <name> [Specify node]
--switchPort <port> [Specify switchport switchname:portnum]

Common admin options:
--userName <name> [Specify user name]
--uid <UID> [Specify user id]
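For readers who want to experiment with a command line shaped like the usage text above, here is a minimal sketch using Python's argparse. It only mirrors the option names shown on the slide; it is not the PRS parser, and build_parser and split_switch_port are hypothetical helpers. The switchname:portnum split follows the format given in the help text, with the port treated as optional.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the option groups in the usage text."""
    parser = argparse.ArgumentParser(prog="prs")
    parser.add_argument("--version", action="version", version="prs (sketch)")
    parser.add_argument("--verbose", action="store_true", help="be verbose")

    common = parser.add_argument_group("Common options")
    common.add_argument("--nodeName", metavar="<name>", help="Specify node")
    common.add_argument("--switchPort", metavar="<port>",
                        help="Specify switchport switchname:portnum")

    admin = parser.add_argument_group("Common admin options")
    admin.add_argument("--userName", metavar="<name>", help="Specify user name")
    admin.add_argument("--uid", metavar="<UID>", help="Specify user id")
    return parser


def split_switch_port(value: str):
    """Split 'switchname:portnum' into (name, port); port is None if absent."""
    name, sep, port = value.partition(":")
    return name, int(port) if sep else None
```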

28

Image Management Interface

--addImage <img> [Add image to PRS]
--delImage <img> [Delete image]

29

User Allocation Interface

--createDomain <name>
– May fail if name already exists
--submitDomainRequest <name>
--destroyDomain --domain <name>

--requestNodes --domain <name> [--count <N>] [--nodeName <name>] [--cores <n> …]
– Add the requested nodes to the domain
--assignImage <kernel> <image>
– Assign image to resource

--associateNewVlan --domain <name>
– Allocate an unused VLAN number to domain

--createReservation <YYYYMMDD> <YYYYMMDD>
– Specify duration of node reservation where start time may be "ASAP"

--reservationNotes "notes"
--updateReservation
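The two --createReservation arguments can be validated with a few lines of Python. This is a sketch under assumptions: the function name parse_reservation is hypothetical, and treating "ASAP" as "starting today" is a guess at the intended meaning rather than documented PRS behavior.

```python
from datetime import date, datetime


def parse_reservation(start: str, end: str, today: date = None):
    """Parse <YYYYMMDD> <YYYYMMDD>; the start may be the literal 'ASAP'."""
    today = today or date.today()
    if start.upper() == "ASAP":
        start_date = today  # assumption: ASAP means the reservation starts today
    else:
        start_date = datetime.strptime(start, "%Y%m%d").date()
    end_date = datetime.strptime(end, "%Y%m%d").date()
    if end_date < start_date:
        raise ValueError("reservation ends before it starts")
    return start_date, end_date
```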

30

Admin Allocation Interface

--allocateNode [Assign node to a user]
--releaseNode [Release node allocation]

--vlanIsolate <vlanid> [Specify vlan for isolation]

31

Hardware Control

--hardware [Make hardware call]
--powerStatus [Get power status]
--rebootNode [Reboot node (Soft)]
--powerCycle [Power Cycle (Hard)]
--powerOff [Power off node]
--powerOn [Power on node]
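One plausible way to back these power options is the ipmitool command line utility, whose "chassis power" subcommands include status, cycle, off, and on. The sketch below only builds the argument list; whether PRS actually shells out to ipmitool is an assumption, and the mapping table, function name, host name, and password file path are hypothetical. --rebootNode is left out because the slide does not say which IPMI action implements a soft reboot.

```python
# Assumed mapping from PRS option names to `ipmitool chassis power` actions
IPMI_POWER_ACTIONS = {
    "powerStatus": "status",
    "powerCycle": "cycle",
    "powerOff": "off",
    "powerOn": "on",
}


def ipmitool_command(action: str, host: str, user: str, password_file: str) -> list:
    """Build the ipmitool argument list for a PRS-style power action."""
    if action not in IPMI_POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return [
        "ipmitool",
        "-I", "lanplus",      # IPMI v2.0 over LAN
        "-H", host,           # address of the node's management controller
        "-U", user,
        "-f", password_file,  # read the password from a file, not the command line
        "chassis", "power", IPMI_POWER_ACTIONS[action],
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine that can reach the management network.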

32

Query Interface

--showReservations [Show current node reservations]
--showResources [Show available resources to choose from]
  --procs <N> [Filter by number of processors]
  --clock <N> [Filter by processor clock]
  --memory <N> [Filter by amount of memory (Bytes)]
  --cpuflags "flags" [Filter by CPU flags]
  --cores <N> [Filter by number of cores]

--showPxeImages [Show available PXE images to choose from]
--showPxeImageMap [Show PXE images host mapping]
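The --showResources filters can be pictured as a simple predicate over node records. The sketch below is illustrative only: the record layout and the function name filter_resources are hypothetical, and treating the numeric filters as "at least this much" is an assumption, since the slide does not say whether the match is exact.

```python
def filter_resources(nodes, procs=None, clock=None, memory=None,
                     cores=None, cpuflags=()):
    """Return the nodes that meet every filter that was supplied."""
    wanted_flags = set(cpuflags)

    def matches(node):
        return (
            (procs is None or node["procs"] >= procs)
            and (clock is None or node["clock"] >= clock)
            and (memory is None or node["memory"] >= memory)  # bytes
            and (cores is None or node["cores"] >= cores)
            and wanted_flags <= node["cpuflags"]  # all requested flags present
        )

    return [node for node in nodes if matches(node)]
```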

33

Administration Interface

--admin [Enter Admin mode]

--addPxeImage [Add PXE image to database]

--enableHostPort [Enable a switch port]
--disableHostPort [Disable a switch port]

--removeVlan <vlanId> [Remove vlan from all switches]
--createVlan <vlanId> [Create a vlan on all switches]
--addNodeToVlan <vlanId> [Add node to a vlan]
--removeNodeFromVlan <vlanId> [Remove node from a vlan]
--setNativeVlan <vlanId> [Configure native vlan]
--restoreNativeVlan [Restore native vlan]
--removeAllVlans [Removes all vlans from a switchport]

--sendSwitchCommand "<command>" [Send Raw Switch Command, BE CAREFUL]
--interactiveSwitchConfig "<switchname>" [Interactively configure a switch]

--showSwitchConfig <nodename> [Show switch config for node]

34

Administration

35

Typical Workflow

1. Admin queries available systems
2. Admin requests systems with desired user configuration
   1. i.e., cores, memory, image, duration, etc.
3. Request goes in queue
4. PRS locates resources and provides a list to admin/Tashi.
5. Admin/Tashi moves VMs to free resources
   1. Add node to blacklist and tell Hadoop to reload
6. PRS allocates resources
   1. Provides estimated time to get resources
   2. User can query
   3. PRS sends notification when allocated
7. PRS reclaims resources and adds them back into respective pools
   1. User may extend time period before expiration
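Steps 3, 6, and 7 of the workflow (queue, allocate, reclaim) can be sketched as a first-come, first-served queue. This is a toy model under assumptions, not the PRS scheduler: the class name RequestQueue and its methods are hypothetical, requests ask only for a node count, and the queue stops at the first request it cannot satisfy.

```python
from collections import deque


class RequestQueue:
    """First-come, first-served allocation of nodes from a free list."""

    def __init__(self, free_nodes):
        self.free = list(free_nodes)
        self.pending = deque()   # (user, node_count) in arrival order
        self.allocations = {}    # user -> list of allocated nodes

    def submit(self, user, count):
        """Step 3: the request goes in the queue."""
        self.pending.append((user, count))

    def schedule(self):
        """Step 6: serve requests in order until one cannot be satisfied."""
        granted = []
        while self.pending and self.pending[0][1] <= len(self.free):
            user, count = self.pending.popleft()
            nodes = [self.free.pop(0) for _ in range(count)]
            self.allocations.setdefault(user, []).extend(nodes)
            granted.append((user, nodes))
        return granted

    def reclaim(self, user):
        """Step 7: return a user's nodes to the free list."""
        nodes = self.allocations.pop(user, [])
        self.free.extend(nodes)
        return nodes
```

A strict first-come, first-served policy is the simplest choice; it can leave nodes idle while a large request waits at the head of the queue.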

36

After allocation

• A returned PRS node is typically untrusted
– update the system to default settings
• Clean physical node by PXE booting a reset image
• Restore all settings to defaults (address, IPMI passwords)
• Repartition and format disks
• (Option) Trust images from some users
– No re-format needed
• Clean network configuration (VLAN)

37

Example: Minicluster

./prs --addimage hardy-rgass-testing:hardy:8.03
./prs --assignimage hardy-rgass-testing --nodename r1r1u25
./prs --allocatenode --nodename r1r1u25 --username rgass --reservationDuration 30 --vlanisolate 300 --notes "Practice allocation"
./prs --addnodetovlan 300 --nodename r1r1u25
./prs --hardware --rebootnode --nodename r1r1u25

Example: CloudConnect 1

• Network isolate a rack of machines and PXE boot them with a user’s kernel and initrd

• Create a VM that acts as a SSH gateway and a NAT for the private cluster

• Dynamically configure switches to support the networking experiment

[Figure: network topology of Racks A-D. Labels: 1Gb/s Switch, 100Mb/s Switch, 4Gb/s Switch, 4x1Gb trunk link, Rack A/B/C/D regions, VLAN #1: Electrical, VLAN #2: Optical. Legend: server, switch, M - manager]

39

Example: CloudConnect 2

for i in r1r1u12 r1r1u13 r1r1u14 r1r1u15; do
  ./prs --admin --setnativevlan 300 -n ${i}
  ./prs --admin --addnodetovlan 800 -n ${i}
  ./prs --admin --addnodetovlan 801 -n ${i}
  ./prs --admin --addnodetovlan 802 -n ${i}
done

./prs --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface range ethernet g(25-28); spanning-tree disable"
./prs --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g25;switchport mode trunk;exit"
./prs --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g26;switchport mode trunk;exit"
./prs --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g27;switchport mode trunk;exit"
./prs --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g28;switchport mode trunk;exit"
./prs --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g28;switchport mode trunk;exit"

./prs --admin --switchport sw0-r1r1:25 --setnativevlan 802 -v
./prs --admin --switchport sw0-r1r1:26 --setnativevlan 804 -v
./prs --admin --switchport sw0-r1r1:27 --setnativevlan 806 -v
./prs --admin --switchport sw0-r1r1:28 --setnativevlan 808 -v

for i in $(seq 12 16); do
  ./prs --hardware --rebootnode -n r1r1u${i}
done

40

Example cluster: Networking Testbed

• Network isolate a rack of machines and PXE boot them with a user’s kernel and initrd

• Create a VM that acts as a SSH gateway and a NAT for the private cluster

• Dynamically configure switches to support the networking experiment

41

Future Work

• Integration with Tashi…
– Would enable free exchange of resources between the Tashi pool and the free pool

42

[Figure: cluster diagram showing System Servers and Management Servers hosting VMs, the PRS server with its DB, the Tashi Cluster Manager, the PXE server, and a PRS client (Administrator or Cluster Manager)]

• PRS client queries PRS server for available resources
• PRS queries DB to locate available resources

Node 1 : 8 Core, 16G memory, 6TB disk, 30 day
Node 2 : 8 Core, 16G memory, 6TB disk, 30 day
Node 3 : 8 Core, 16G memory, 6TB disk, 90 day
Node 4 : 8 Core, 16G memory, 6TB disk, 1 day
Node 5 : 8 Core, 8G memory, 2TB disk, 90 day
Node 6 : 8 Core, 8G memory, 2TB disk, 90 day
Node 7 : 8 Core, 8G memory, 2TB disk, 90 day
Node 8 : 8 Core, 8G memory, 2TB disk, 90 day
Node 9 : 8 Core, 8G memory, 2TB disk, 90 day
Node 10: 8 Core, 8G memory, 2TB disk, 30 day
…

• Results are sent back to the client
• User chooses machine attributes and submits a request for the resources for some time period

43

[Figure: cluster diagram (System Servers, Management Servers, PRS server and DB, Tashi Cluster Manager, PXE server, PRS client); request R1 is placed in the Request Queue]

44

[Figure: cluster diagram (System Servers, Management Servers, PRS server and DB, Tashi Cluster Manager, PXE server, PRS client)]

• PRS processes request and identifies physical machines that satisfy the user request

45

[Figure: cluster diagram (System Servers, Management Servers, PRS server and DB, Tashi Cluster Manager, PXE server, PRS client)]

• PRS sends request to Tashi to free selected nodes
• Tashi moves virtual machines off of selected nodes

46

[Figure: cluster diagram (System Servers, Management Servers, PRS server and DB, Tashi Cluster Manager, PXE server, PRS client); selected nodes marked PXE]

• PRS allocates the physical machines to the requested user and isolates them from the network using VLANs
• Tashi notifies PRS that migration of virtual machines has completed
• PRS reboots the physical machine and sets PXE image to user's VM
• Virtual disk image is converted to PXE image
• Physical machines boot up with PXE image

47

[Figure: cluster diagram (System Servers, Management Servers, PRS server and DB, Tashi Cluster Manager, PXE server, PRS client); allocated nodes marked PXE]

• PRS updates reservation database
• PRS client queries server for allocation
• User connects to the machines and starts running experiments

50

Installation

51

Necessary Components

• DHCP Server
• PXE Server
• NFS Server
• DNS Server (optional)
• Configurable switches
– New switch types may require new PRS modules
• Hardware access method
– E.g. IPMI
– IP-addressable PDUs enable rescue if IPMI becomes compromised
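The note that new switch types may require new PRS modules suggests a per-vendor driver behind a common interface. The following is a minimal sketch of that pattern, not the PRS module layout: the class names, the registry, and the "slide-syntax" key are hypothetical. The one concrete driver simply reproduces the command string used in the CloudConnect 2 example; other vendors would need their own syntax.

```python
from abc import ABC, abstractmethod


class SwitchDriver(ABC):
    """Common interface; one subclass per switch type."""

    @abstractmethod
    def trunk_port_command(self, port: int) -> str:
        """Return the raw command that puts `port` into trunk mode."""


class SlideSyntaxDriver(SwitchDriver):
    """Emits the command syntax seen in the CloudConnect 2 example."""

    def trunk_port_command(self, port: int) -> str:
        return f"config;interface ethernet g{port};switchport mode trunk;exit"


# Hypothetical registry: supporting a new switch type means adding an entry here
DRIVERS = {"slide-syntax": SlideSyntaxDriver}


def driver_for(switch_type: str) -> SwitchDriver:
    """Look up and instantiate the driver for a switch type."""
    try:
        return DRIVERS[switch_type]()
    except KeyError:
        raise ValueError(f"no driver for switch type: {switch_type}") from None
```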

52

Internals

53

Notes on Current Software

• PRS client code is Python 2.5
• PRS database implemented in MySQL
– Reachable through python-MySQLdb interface
• pExpect used for switch configuration
• User information currently obtained through LDAP

54

Summary

55

PRS

• PRS lays the foundation of the Open Cirrus software stack
– easing management of multiple projects in a single cluster
• PRS enables partitioning clusters into isolated domains of physical resources
• Current implementation allows rapid provisioning of system software
• PRS code base is open source software available through Tashi project in Apache Incubator
– Contributions welcome
