ldom network toi

TSC Systems Tech Talk, Feb 2009. Lui, Hoe Keong, TSC APAC. Logical Domains Networking: An Introduction to Logical Domain Channels


Page 1: Ldom Network Toi

TSC Systems Tech Talk, Feb 2009

● Lui, Hoe Keong – TSC APAC

Logical Domains Networking : An Introduction to Logical Domain Channels

Page 2: Ldom Network Toi

Sun Confidential: Need to Know Only 2

Agenda

• Logical Domains : Virtual IO Model
• What's the Logical Domain Channel
• LDC : An Overview
• LDC Comms : Packet Based
• LDC Comms : Shared Memory
• LDom Networks & LDC

Page 3: Ldom Network Toi

LDOMs : Networking

• Virtual network (vnet) device implements a virtual Ethernet device and communicates with other vnet devices in the system using the virtual network switch.

• Virtual network switch (vsw) is a layer-2 network switch that connects the virtual network devices to the external network and also switches packets between them.

Page 4: Ldom Network Toi

LDOM : Virtual IO Model

• Virtualized I/O Model, showing devices shared from an I/O Service Domain through a Logical Domain Channel (LDC) to a Guest Domain

• The concept of virtual devices is based upon at least one service domain owning a device through the direct I/O model, and establishing a path to the other domains by a logical domain channel. The operating system in the guest domains then sees a virtual device driver with which it can interact as if it were a local, physical device.

Page 5: Ldom Network Toi

LDOM : Background

• LDoms allows you to allocate a system's various resources, such as memory, CPUs, and devices, into logical groupings and create multiple, discrete systems, each with its own operating system, resources, and identity, within a single computer system. This is done by abstracting the underlying compute and IO resources.

• The LDoms Virtual IO (VIO) infrastructure provides device access to domains via virtualized devices that communicate with a 'service' domain, which completely owns a device along with its driver and functions as a proxy to the device.

• This is implemented via a client-server model in which client virtual devices communicate with their service counterparts over a general-purpose channel infrastructure for inter-domain and domain-Hypervisor communications: the Logical Domain Channels (LDCs).

Page 6: Ldom Network Toi

LDOM Virtual IO Model : LDC

• Virtual device drivers interact with their underlying hardware via the Hypervisor

• There are two primary reasons to virtualize a device driver: sharing and security

• LDOM virtual I/O functionality includes support for virtual networking, disk and console (along with their corresponding service backends)

• Underlying the infrastructure of service backends and service consumers is a general-purpose channel infrastructure for inter-domain and domain-Hypervisor communications, the Logical Domain Channels (LDCs):

    41 131f648 d558 - 1 ldc (sun4v LDC module v1.9)
    165 7bfd0000 30d8 130 1 vldc (sun4v Virtual LDC Driver 1.6)

• The vldc driver extends the LDC functionality to user-level clients via standard driver interfaces

Page 7: Ldom Network Toi

Logical Domains : What's LDC ?

• When you first set up an LDoms system, you boot into the "factory-default" machine description, which basically gives all of the hardware resources to the control node. Then, from the control node, you take things away and set up the virtualization server services so that guests can be created. You update the machine description for the control node, giving it a new name, then reboot into it. You are then ready to create guests. Copies of these machine descriptions sit on the System Controller of the machine.

• Along with the machine description facility are Logical Domain Channels (LDCs). Each virtualized service between guests, control nodes, service nodes, and the system controller communicates over point-to-point links. Each link is configured within the hypervisor, with a transmit queue and a receive queue at each end of the channel. Each entry in a queue holds a fixed-size 64-byte LDC packet. You can size your queues however you like, with some minor restrictions.
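The queue arrangement described above can be pictured with a small model. This is only a sketch: the class name LdcQueue, the head/tail convention, and the rule of keeping one slot free are assumptions made for illustration, not the hypervisor's actual interface.

```python
PACKET_SIZE = 64  # fixed LDC packet size, as stated in the slide


class LdcQueue:
    """Toy model of one LDC queue: a ring of 64-byte entries with head/tail indices."""

    def __init__(self, entries):
        # Assumption: one slot always stays empty so head == tail means "empty".
        if entries < 2:
            raise ValueError("need at least two entries")
        self.entries = entries
        self.slots = [bytes(PACKET_SIZE) for _ in range(entries)]
        self.head = 0  # next entry the consumer reads
        self.tail = 0  # next entry the producer writes

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        return (self.tail + 1) % self.entries == self.head

    def put(self, packet):
        """Copy one packet into the queue and advance the tail; False if full."""
        if len(packet) != PACKET_SIZE:
            raise ValueError("LDC packets are exactly 64 bytes")
        if self.is_full():
            return False
        self.slots[self.tail] = bytes(packet)
        self.tail = (self.tail + 1) % self.entries
        return True

    def get(self):
        """Return the packet at the head and advance it; None if empty."""
        if self.is_empty():
            return None
        packet = self.slots[self.head]
        self.head = (self.head + 1) % self.entries
        return packet
```

Each end of a channel would hold two such queues, one for transmit and one for receive.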

Page 8: Ldom Network Toi

Logical Domains : What's LDC ?

Page 9: Ldom Network Toi

Logical Domains : What's LDC ?

• The LDC link layer defines a handshake and three modes of operation: reliable, unreliable, and raw. The handshake is used to negotiate an LDC protocol version that both sides can understand, and to initialize the sequence numbers so real work can be done on the link. Raw mode elides the handshake entirely, has no packet headers, and just sends raw 64-byte packets over the link.

• The hypervisor also provides memory sharing facilities for LDC channels. There is a page table where exported pages are defined, and exported memory is expressed to the remote consumer using "cookies", which define which export page table entry holds the translation, the offset into that page, the page size of the translation, and the size of the area being described. Essentially, these cookies are DMA descriptors.
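To make the cookie idea concrete, here is a minimal sketch of resolving a cookie against an export table. The field names, the ExportEntry type, and the restriction that a cookie stays within a single page are assumptions for illustration; the real hypervisor encoding is not shown in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportEntry:
    base_addr: int  # hypothetical: start address of the exported page
    page_size: int  # page size of this translation


@dataclass(frozen=True)
class Cookie:
    table_index: int  # which export page table entry holds the translation
    page_offset: int  # offset into that page
    page_size: int    # page size of the translation
    length: int       # size of the area being described


def resolve(cookie, export_table):
    """Return (address, length) for the cookie, or raise ValueError if invalid."""
    if not 0 <= cookie.table_index < len(export_table):
        raise ValueError("cookie refers to a missing export table entry")
    entry = export_table[cookie.table_index]
    if entry is None:
        raise ValueError("export table entry is not populated")
    if cookie.page_size != entry.page_size:
        raise ValueError("page size does not match the translation")
    if cookie.page_offset < 0 or cookie.length <= 0:
        raise ValueError("bad offset or length")
    # Assumption: the described area must fit inside the one exported page.
    if cookie.page_offset + cookie.length > entry.page_size:
        raise ValueError("area runs past the end of the page")
    return entry.base_addr + cookie.page_offset, cookie.length
```

For example, resolve(Cookie(0, 0x100, 8192, 512), [ExportEntry(0x10000, 8192)]) yields the address 0x10100 and a length of 512, much like a DMA descriptor would.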

Page 10: Ldom Network Toi

Logical Domains : What's LDC ?

• Imported memory can be revoked, so the safest thing to do, and what every existing use of LDC channels does, is to use the hypervisor copy operation to access it. In this case you only need to handle error return values from the LDC hypervisor call, rather than complicated faults all over the place when revoked memory is accessed.

• On top of the LDC protocol sits the VIO layer, which has its own handshake mechanism. It handles versioning and sequence number initialization just like the LDC handshake does, but it also handles the transfer of device-specific attributes such as exported disk size, network device MTU, etc.

• The VIO handshake also handles the registration of descriptor rings. These rings are how VIO devices set up I/O operations. Each ring entry starts with a generic VIO tag, containing an entry state value and an ACK field which says whether the receiver should ACK the ring entry right after it is processed or defer the ACK until its current run over the ring is complete. After the tag is the device-type-specific area, where virtual disk devices describe the block I/O and virtual network devices describe the size of the packet, etc. Finally, there is an array of cookie entries describing the I/O buffer.
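The three-part ring entry described above (tag, device-specific area, cookie array) might be modelled as follows for a virtual network device. All names here are hypothetical, and only the READY state is named in this deck; the other state names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Entry states. READY appears in the slides; FREE and DONE are assumed names.
FREE, READY, DONE = "FREE", "READY", "DONE"


@dataclass
class VioTag:
    state: str = FREE      # entry state value
    ack_now: bool = False  # True: ACK right after this entry; False: defer the ACK


@dataclass
class VnetRingEntry:
    tag: VioTag = field(default_factory=VioTag)
    packet_size: int = 0  # device-specific area for a network device
    # Each cookie is simplified to (table_index, offset, length).
    cookies: List[Tuple[int, int, int]] = field(default_factory=list)

    def buffer_bytes(self):
        """Total bytes of I/O buffer described by the cookie array."""
        return sum(length for _, _, length in self.cookies)

    def is_consistent(self):
        """The cookies must describe enough buffer to hold the packet."""
        return 0 < self.packet_size <= self.buffer_bytes()
```

A virtual disk entry would carry block I/O details in place of packet_size, while keeping the same tag and cookie array.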

Page 11: Ldom Network Toi

LDCs : An Overview

• A point-to-point duplex link for:
> domain-to-domain
> hypervisor-domain
> and HV/domain-SP communication.

• Two methods for transferring data:
> A simple 64-byte datagram
> Shared memory DMA
> Link layer protocol provides drivers with the ability to choose either mechanism for data transfer

• Multi-protocol sun4v transport layer

Page 12: Ldom Network Toi

LDCs : An Overview

• LDom Manager (ldmd) specifies LDCs as a node in the Hypervisor Machine Descriptor (MD)

• The virtual devices in the LDOM VIO infrastructure see two classes of nodes:
> Devices which do not use LDCs : virtual-devices
> Devices which use LDCs : channel-devices

• Virtual-device nodes under channel-devices can point to channel-endpoint node(s), each of which represents an instance of a channel endpoint available to this guest domain

Page 13: Ldom Network Toi

LDC : Packet based Comms

• A simple packet-based transfer mechanism where data is sent in 64-byte packets

• Each logical domain's LDC transport registers Tx and Rx message queues with the Hypervisor on behalf of the virtual device client(s), along with a target virtual CPU for each LDC endpoint. Each entry on a queue holds 64 bytes of data

• A transfer is initiated by copying data into the Tx queue and invoking the Hypervisor API to set the tail of the Tx queue. The Hypervisor triggers a dev_mondo interrupt to the target vCPU for the Rx queue. Data is then transferred from the Tx queue to the Rx queue at the request of the receiver when it reads the head and tail pointers

• The link layer protocol is responsible for fragmenting (and reassembling) messages being transferred. It inserts additional header information into each packet to denote the start/end of the fragmented data transfer

• The packet-based transfer approach is recommended for short messages
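The fragmentation step can be sketched as below. The 2-byte header (one flag byte, one length byte) and the flag values are assumptions chosen to keep the example small; the actual LDC packet header is larger and is not specified in these slides.

```python
PACKET_SIZE = 64
HEADER_SIZE = 2                       # assumption: 1 flag byte + 1 length byte
PAYLOAD_SIZE = PACKET_SIZE - HEADER_SIZE
FLAG_START = 0x1                      # hypothetical flag: first fragment
FLAG_END = 0x2                        # hypothetical flag: last fragment


def fragment(message):
    """Split a message into 64-byte packets marked with start/end flags."""
    if not message:
        raise ValueError("nothing to send")
    chunks = [message[i:i + PAYLOAD_SIZE]
              for i in range(0, len(message), PAYLOAD_SIZE)]
    packets = []
    for n, chunk in enumerate(chunks):
        flags = 0
        if n == 0:
            flags |= FLAG_START
        if n == len(chunks) - 1:
            flags |= FLAG_END
        header = bytes([flags, len(chunk)])
        # Pad the final fragment so every packet is exactly 64 bytes.
        packets.append(header + chunk.ljust(PAYLOAD_SIZE, b"\0"))
    return packets


def reassemble(packets):
    """Rebuild the message, checking that start/end flags are where expected."""
    message = bytearray()
    for n, pkt in enumerate(packets):
        if len(pkt) != PACKET_SIZE:
            raise ValueError("LDC packets are exactly 64 bytes")
        flags, length = pkt[0], pkt[1]
        if (n == 0) != bool(flags & FLAG_START):
            raise ValueError("START flag in unexpected position")
        message += pkt[HEADER_SIZE:HEADER_SIZE + length]
        if flags & FLAG_END:
            if n != len(packets) - 1:
                raise ValueError("END flag before last packet")
            return bytes(message)
    raise ValueError("END flag missing")
```

With this layout a 150-byte message needs three packets, which illustrates why the packet path suits short messages and larger transfers are better served by shared memory.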

Page 14: Ldom Network Toi

LDC : Shared Memory framework

• The LDC Shared Memory framework allows one logical domain to export a number of its own memory pages across an LDC for access and use by the logical domain at the other end of the channel

• This approach allows clients to share regions of their memory address space with clients at the other end of the LDC connection. The importing client can access the remote memory region by mapping it into its address space, by using the Hypervisor API to copy data to/from the exported memory, or by programming an IOMMU to directly read/write the memory region

• A logical domain uses export and import map tables (allocated and defined within its own memory) that are registered with the hypervisor. A map table entry consists of two 64-bit words which correspond to the location of the shared memory pages

Page 15: Ldom Network Toi

LDOM Networking & LDCs

• Virtual Network support within LDOM is realized by two components :

> virtual network device (vnet), which emulates an Ethernet device and communicates with other vnet devices or the virtual switch device (vsw) over a point-to-point connection

> All communications between the virtual network components occur via LDC connections. Establishing and tearing down the channel connections is facilitated by the virtual channel nexus (cnex)

> The vnet device driver uses a multi-protocol transport infrastructure so that it can use different types of transport to send/receive data

> GLDv3 framework (see OpenSolaris' Project Nemo) compliant driver implementation, i.e., it plumbs and configures into the IP stack as a regular driver, supports standard Ethernet / jumbo MTUs, permits creation of logical interfaces on top of it, supports snoop, etc. The only material difference is that changing the MAC address via the ifconfig command is not supported (by default, the LDom manager assigns a unique MAC address to each vnet interface)

Page 16: Ldom Network Toi

LDOM Networking & LDCs

• Virtual Network support within LDOM is realized by two components :

> virtual switch device (vsw), which functions as a mux/demux for ingress/egress packets for all network traffic (to and from host systems outside the system)

> May be bound to one physical network interface and/or network group

> The virtual switch device comprises two components : the vnet proxy server (which functions as a switch that interacts with all vnet devices, the LDom manager and the packet multiplexer on top of it) and the packet multiplexer (which sends packets it receives from vnet devices to their destination via the physical network interface, and distributes packets it receives from the physical interface to the appropriate vnet devices)

> The virtual switch device can be plumbed as a network device with IP routing enabled, to allow the vsw device to exchange packets with the outside world on behalf of its network clients, for example :

# dladm show-link | grep -i vsw
vsw0 type: non-vlan mtu: 1500 device: vsw0

# ifconfig vsw0 <IP-addr> netmask <netmask-addr> broadcast + up
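The packet multiplexer's role described above can be sketched as a lookup on destination MAC address. The class name PacketMux, the use of Python lists as stand-ins for LDC connections and the physical interface, and the broadcast handling are all assumptions for illustration rather than the vsw driver's actual logic.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class PacketMux:
    """Toy vsw packet multiplexer keyed on destination MAC address."""

    def __init__(self):
        self.vnets = {}        # MAC address -> list of frames delivered to that vnet
        self.physical_tx = []  # frames handed to the physical network interface

    def attach_vnet(self, mac):
        self.vnets[mac] = []

    def from_vnet(self, src_mac, dst_mac, frame):
        """Frame sent by a vnet: deliver locally if possible, else go outside."""
        if dst_mac == BROADCAST:
            for mac, rx in self.vnets.items():
                if mac != src_mac:
                    rx.append(frame)
            self.physical_tx.append(frame)
        elif dst_mac in self.vnets:
            self.vnets[dst_mac].append(frame)
        else:
            self.physical_tx.append(frame)

    def from_physical(self, dst_mac, frame):
        """Frame received on the physical interface: hand to the matching vnet."""
        if dst_mac == BROADCAST:
            for rx in self.vnets.values():
                rx.append(frame)
        elif dst_mac in self.vnets:
            self.vnets[dst_mac].append(frame)
        # Assumption: frames for unknown MAC addresses are simply dropped.
```

Because the LDom manager assigns each vnet its MAC address, a static table like this is enough for the sketch; no address learning is modelled.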

Page 17: Ldom Network Toi

LDOM Networking & LDCs

[Figure: block diagram of the LDom network stack. Recoverable labels: Logical Domain 0 (Service Domain) with LDoms Manager, Solaris TCP/IP and Nemo, Nemo MAC driver (bge / e1000g), and vSwitch (vnet proxy server, packet multiplexer); Logical Domain A with Solaris TCP/IP and Nemo and a Vnet (leaf) device (generic layer, RX/TX control, vnet proxy client); Hypervisor; Dumb NIC. Legend: control path, data path, error path.]

Page 18: Ldom Network Toi

LDOM Networking & LDCs

• For the network device there is a single TX descriptor ring created at each end. Each ring is populated locally with transmit packets for receipt at the other end, and is imported into the peer using the hypervisor export mechanism.

• Descriptor ring entries on the importer side are accessed with the aforementioned LDC copy mechanism.

• I/O is triggered using DRING_DATA packets over the LDC channel, which tell the receiver which entries in the descriptor ring to process. Writes into the local peer's descriptor entries just use local CPU loads and stores, so ordering is important.

• The DRING_DATA packets give a start and end descriptor index for the peer to process. The end index can be specified as "-1", which means to keep processing until you see a descriptor which is not in READY state.

• Thus it is important for the sending peer to update the state field as the last possible operation, with a memory barrier, so that the receiver does not accidentally see a half-initialized descriptor in READY state.
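The DRING_DATA handling and the "state field last" rule can be sketched as follows. The dictionary-based ring, the function names, and the FREE/DONE state names are assumptions; Python also has no memory barrier, so the comment only marks where a real driver would place one.

```python
# READY is named in the slides; FREE and DONE are assumed state names.
FREE, READY, DONE = "FREE", "READY", "DONE"


def new_ring(size):
    return [{"state": FREE, "payload": None} for _ in range(size)]


def post(ring, idx, payload):
    """Sender side: fill in the descriptor, then publish it."""
    entry = ring[idx]
    entry["payload"] = payload  # 1. write every other field first
    # 2. a real driver would issue a memory barrier here
    entry["state"] = READY      # 3. update the state field last


def process(ring, start, end, handle):
    """Receiver side: handle entries from start to end (inclusive).

    end == -1 means: keep going until an entry that is not READY.
    Returns the list of indices that were processed.
    """
    size = len(ring)
    idx = start % size
    processed = []
    for _ in range(size):  # never loop more than once around the ring
        entry = ring[idx]
        if entry["state"] != READY:
            break
        handle(entry["payload"])
        entry["state"] = DONE
        processed.append(idx)
        if end != -1 and idx == end % size:
            break
        idx = (idx + 1) % size
    return processed
```

If post() set the state before the payload, a receiver running process() with end set to -1 could pick up an entry whose payload had not been written yet, which is the hazard the last bullet warns about.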

Page 19: Ldom Network Toi

Questions ?

● Lui, Hoe Keong – APAC TSC

Logical Domains Networking : An Introduction to Logical Domain Channels