
Front cover

Lenovo and Midokura OpenStack PoC
Software-Defined Everything

Describes a validated, scale-out Proof of Concept (PoC) implementation of OpenStack

Provides business and technical reasons for Software Defined Environments

Explains advantages of Lenovo Hardware and MidoNet’s OpenStack Neutron plugin

Describes the configurations for building an agile cloud with a distributed architecture

Krzysztof (Chris) Janiszewski

Michael Lea

Cynthia Thomas

Susan Wu


Abstract

This document outlines a Software Defined Everything infrastructure that virtualizes compute, network, and storage resources and delivers them as a service. The management and control of the compute, network, and storage infrastructure are automated by intelligent software that runs on the Lenovo x86 platform, rather than by the hardware components of the infrastructure.

This Proof of Concept (PoC) focuses on achieving high availability and scale-out capabilities for an OpenStack cloud that uses Lenovo hardware and Midokura's OpenStack Neutron plugin while helping reduce operating expense and capital expense. Midokura provides an enterprise version of the open source MidoNet software. MidoNet achieves network virtualization through GRE or VXLAN overlays with a completely distributed architecture.

This document shows how to integrate OpenStack, MidoNet, and Ceph, while using Lenovo ThinkServer systems with Lenovo Networking switches managed by xCAT.

This paper is intended for customers and partners looking for a reference implementation of OpenStack.

At Lenovo Press, we bring together experts to produce technical publications around topics of importance to you, providing information and best practices for using Lenovo products and solutions to solve IT challenges.

For more information about our most recent publications, see this website:

http://lenovopress.com

Contents

Executive summary
Business objectives
Lenovo server environment
   Cloud installation
   Hardware
   Network switch solution
   Virtual machines
Lenovo physical networking design (leaf/spine)
OpenStack environment
Automating OpenStack deployment and hardware management
Software-Defined Storage: Ceph
Software-Defined Networking: MidoNet
   Configurations
   Operational tools
Professional services
About the authors
Notices
Trademarks

Executive summary

Businesses are being challenged to react faster to growing customer technology demands by creating Infrastructure as a Service (IaaS). With the use of Software-Defined Everything, data centers can easily scale to swiftly deploy and grow to meet user demands.

To achieve these business goals, companies are using OpenStack to integrate so-called "Software-Defined Everything" resources and environments. OpenStack is open source software that enables deployment and management of cloud infrastructure. The OpenStack project is not a single piece of software but an umbrella that covers multiple software projects for managing processing, storage, and networking resources. The key benefit of OpenStack is orchestration of the data center, allowing "Software-Defined Everything" to reduce cost and complexity, increase the speed of application deployment, and help with security assurance.

One of the key challenges in OpenStack deployments is networking. Because OpenStack is a highly virtualized environment, it requires a virtualized approach to networking to provide agility and seamless end-to-end connectivity. To solve the networking issues within OpenStack, the presented solution uses Midokura Enterprise MidoNet (MEM). MEM offers a distributed architecture that is built to scale.

By using a network virtualization overlay approach, MidoNet provides Layer 2-4 network services, including routing, load-balancing, and NAT at the local hypervisor. MidoNet also ties into OpenStack provisioning tools to provide seamless networking integrations.

Although OpenStack is a software-based solution, it still requires physical hardware to operate while providing compute, storage, and networking infrastructure. In fact, the use of proper hardware is critical to achieving a successful OpenStack deployment. Proper selection of hardware helps ensure that reliability and performance metrics are met and reduces capital expense (CapEx) and operating expense (OpEx) around the solution.

The Lenovo® server, storage, and networking offerings have many clear advantages when it comes to the following key areas of physical infrastructure:

- Servers

Lenovo offers high-performance, Intel-based servers to power the virtual machines.

- Storage

Lenovo can mix solid-state drives (SSDs) with spinning disks. Lenovo's AnyBay technology offers support for mixing 2.5-inch and 3.5-inch drives so that customers can achieve the right balance between cost and performance.

- Networking

Lenovo offers high-performance, low-cost 10 Gb Ethernet and 40 Gb Ethernet networking solutions to provide connectivity for storage and servers.


Business objectives

The idea of an ephemeral virtual machine (VM) is gaining traction in the enterprise for application provisioning and decommissioning. Unlike stand-alone offerings that are provided by public cloud providers, this type of compute service can be all-inclusive: compute, network, and storage services are abstracted into pooled services and users are presented with a la carte choices. Moving to this model provides organizations with an elegantly metered, monitored, and managed style of computing while offering complete isolation and automated application-level load balancing.

Lenovo and Midokura helped an organization implement this model. The old workflow required three teams working simultaneously, with processes being ping-ponged across the three teams. The post-OpenStack workflow provides cleaner hand-offs and removes the redundant tasks and rework. The streamlined workflow that was provided by implementing OpenStack was the key to providing the operational efficiency and agility this organization was looking for.

Figure 1 shows the old workflow that the organization used and the new workflow that uses OpenStack.

Figure 1 Comparing the current workflow with the post on-premise OpenStack cloud workflow

The Lenovo-Midokura design offers the following technical and business benefits:

- Rapid deployments and scale-up of new applications
- Reduced management cost and reduced complexity
- Enterprise-class machines that are ideal for cloud-based environments
- Management tools that can support the management of thousands of physical servers
- Ability to scale to thousands of VMs per cloud administrator
- Reduced cost per VM
- Advanced and agile networking that uses Network Virtualization Overlays
- Tenant isolation over shared infrastructure


- Reduced networking hardware costs with the use of Lenovo high-performance Ethernet switches
- Simplified underlying network infrastructure that uses open-standard L3 routing protocols
- Improved IT productivity with reduced time to deploy resources

Lenovo server environment

A Proof-of-Concept Software-Defined Environment was created to measure and validate the capabilities of a highly available and highly scalable OpenStack deployment with Software-Defined Networking and Software-Defined Storage. The hardware management was accomplished through open source xCAT and Confluent Software.

The software and hardware configuration that was used in this paper is described next.

Cloud installation

The Cloud installation included the following components:

- Operating system: Red Hat Enterprise Linux 7.1
- OpenStack: Red Hat Enterprise Linux OpenStack Platform 6.0 (Juno)
- SDN: Midokura Enterprise MidoNet 1.8.5
- SDS: Ceph (Giant) 0.87.1
- Hardware management: xCAT 2.9.1 with Confluent 1.0

Hardware

The following hardware was used:

- Four Lenovo ThinkServer® RD550 controller nodes:

– CPU: 2x Intel Xeon E5-2620 v3

– Memory: 4x 16 GB 2Rx4 PC4-17000R (64 GB)

– Media:

• 4x 4 TB HDD, 7200 RPM (RAID-10, virtualization)
• 2x 32 GB SD cards (OS)

– RAID: ThinkServer RAID 720IX Adapter with 2 GB supercapacitor upgrade

– Network:

• 2x Emulex CNA OCe14102-UX 10 Gb dual port (four ports total) (data)
• Mezzanine quad RJ45 1 Gb port (management)

- Eight Lenovo ThinkServer RD650 Ceph OSD nodes:

– CPU: Intel Xeon E5-2620 v3

– Memory: 4x 16 GB 2Rx4 PC4-17000R (64 GB)

– Media:

• 2x 200 GB 12 Gb SAS SSD, 3.5-inch (journal)
• 8x 6 TB HDD, 7200 RPM, 3.5-inch, 6 Gb SAS, hot swap (OSD)
• 2x 32 GB SD cards (OS)

– RAID: ThinkServer RAID 720IX Adapter with 2 GB supercapacitor upgrade


– Network:

• Emulex CNA OCe14102-UX 10 Gb dual port (data)
• Mezzanine quad RJ45 1 Gb port (management)

- 16 Lenovo ThinkServer RD550 compute nodes:

– CPU: 2x Intel Xeon E5-2650 v3

– Memory: 24x 16 GB 2Rx4 PC4-17000R (384 GB)

– Media: 2x 32 GB SD Cards (OS)

– Network:

• 2x Emulex CNA OCe14102-UX 10 Gb dual port (data)
• Mezzanine quad RJ45 1 Gb port (management)

Network switch solution

The following network solution was used:

- 10 GbE: 4x Lenovo RackSwitch™ G8264
- 1 GbE: 1x Lenovo RackSwitch G8052

One of the goals for this environment was to separate management services for better manageability and easy migration to alternative hosts. With this configuration, the environment was highly available, and a potential disaster recovery process can be handled in a much more efficient fashion.

Also, capacity utilization metering is easier to accomplish. To achieve sufficient isolation, management services were contained in VMs running under a KVM hypervisor and managed by xCAT. These management VMs were customized for minimum resource overhead. The selected software platform was RHEL 7.1 with the latest OpenStack Juno and Ceph Giant enhancements. All redundant components ran in active/active mode.

Virtual machines

The following VMs were used:

- Four OpenStack controllers
- Four OpenStack databases (MariaDB with Galera) (3x active / 1x passive)
- Four Network State Databases (MidoNet) (3x active / 1x passive)
- Four HAProxy instances
- Three Ceph monitor nodes
- xCAT hardware and service VM manager


Figure 2 shows the Proof of Concept environment.

Figure 2 OpenStack PoC environment

Lenovo physical networking design (leaf/spine)

OpenStack deployments depend on a solid physical network infrastructure that can provide consistent low-latency switching and delivery of data and storage traffic. To meet this need, a leaf/spine (Clos Network) design was used. By using a leaf/spine design, the infrastructure can provide massive scale to support over 15,872 servers.

Lenovo 10 GbE and 40 GbE switches were selected for the design because they provide a reliable, scalable, cost-effective, easy-to-configure, and flexible solution. When a leaf/spine design is used, there is no need for expensive proprietary switching infrastructure because the switches need to provide only layer 2 and layer 3 network services.


There is also no need for a large chassis switch because the Clos network can scale out to thousands of servers that use fixed-form, one or two rack unit switches.

In this design, all servers connect to the leaf nodes and the spine nodes provide interconnects between all of the leaf nodes. Such a design is fully redundant and can survive the loss of multiple Spine nodes. To facilitate connectivity across the fabric, a Layer 3 routing protocol was used that offered the benefit of load balancing traffic, redundancy, and increased bandwidth within the OpenStack environment. For the routing protocol, Open Shortest Path First (OSPF) was selected because it is an open standard and supported by most switching equipment.

The use of Virtual Link Aggregation Groups (vLAG), which is a Lenovo Switch feature that allows for multi-chassis link aggregation, facilitates active-active uplinks of access switches for server connections. Servers are connected to the vLAG switch pair with the use of Link Aggregation Control Protocol (LACP). The use of vLAG allows for increased bandwidth to each server and more network resiliency.

Because MidoNet provides a network overlay that uses VXLAN, there is no need for large Layer 2 networks. Removing large Layer 2 networks removes the need for large core switches and the inherent issues of large broadcast domains on physical networks. Also, compute nodes need only IP connectivity between each other.

MidoNet handles tenant isolation by using VXLAN headers. To further enhance network performance, the design uses Emulex network adapters with hardware VXLAN offload capabilities.

The Lenovo-Midokura solution is cost-effective, provides a high-speed interconnect solution, and can be modified depending on the customer's bandwidth requirements. The fabric that connects the leaf and spine can use 10 GbE or 40 GbE, which allows for cost savings. It is also possible to use only two spine nodes, but the four-post design increases reliability and provides more bandwidth.


Figure 3 shows the Leaf/Spine vLAG network topology.

Figure 3 OpenStack network topology that uses Lenovo Leaf/Spine vLAG

OpenStack environment

Red Hat Enterprise Linux OpenStack Platform 6 was selected for this project because of the enterprise-class support that the vendor provides to meet customer demands. However, instead of using the Red Hat OpenStack installation mechanism (Foreman), the solution was implemented by using a manual process, automated with the xCAT tool, for better customization. In doing so, the solution benefits from all the premium features of Red Hat OpenStack solutions but retains more control over each component that is installed and handled by the system.

To prove the scalability factor of OpenStack, four redundant and active VMs were created to handle the following OpenStack Management Services:

- Keystone
- Glance
- Cinder
- Nova
- Horizon
- Neutron with the MidoNet plugin
- Heat
- Ceilometer

For the database engine and message broker, four instances of MariaDB with Galera were clustered, and RabbitMQ was selected to meet scalability and redundancy needs.
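The exact database configuration is not listed in this paper. As a minimal sketch (the host names and cluster name are assumptions, not values from this PoC), a Galera cluster of this shape is typically declared identically on each MariaDB VM, for example in /etc/my.cnf.d/galera.cnf:

[mysqld]
# Replicate writes through the Galera wsrep provider
wsrep_provider = /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name = "openstack_db"
# All four database VMs join the same cluster
wsrep_cluster_address = "gcomm://mariadb1,mariadb2,mariadb3,mariadb4"
wsrep_sst_method = rsync
# Galera requirements
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2

A common companion step is to cluster RabbitMQ with mirrored queues so that the message broker offers similar redundancy.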


Each Service VM was placed on separate hardware with the ability to migrate to another KVM host if there was a hardware failure. Load balancing between management services was accomplished with the help of four redundant HAProxy VMs, with a Keepalived virtual IP implemented to create a single point of entry for the user.
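The paper does not reproduce the load-balancer configuration; the following minimal sketch (the VIP address, host names, and the Keystone example service are assumptions) illustrates the pattern of a Keepalived virtual IP fronting four balanced API back ends:

# /etc/keepalived/keepalived.conf on an HAProxy VM
vrrp_instance VI_1 {
    state MASTER              # BACKUP with a lower priority on the other VMs
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        192.168.0.10          # the single point of entry for users
    }
}

# /etc/haproxy/haproxy.cfg (excerpt)
listen keystone_public
    bind 192.168.0.10:5000
    balance roundrobin
    server ctrl1 192.168.0.11:5000 check
    server ctrl2 192.168.0.12:5000 check
    server ctrl3 192.168.0.13:5000 check
    server ctrl4 192.168.0.14:5000 check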

For redundancy of the MidoNet Network State Database (NSDB), and to maintain consistency with the rest of the environment, four instances of the Apache ZooKeeper/Cassandra databases were created. For reference, it is recommended to use an odd number of ZooKeeper/Cassandra nodes in the environment for quorum.

To avoid a split-brain issue, the database systems were placed with an odd number of active nodes and the remaining node in a passive state.
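As a minimal sketch of the quorum side of this design (host names are assumptions), a three-member ZooKeeper ensemble is declared identically in /etc/zookeeper/zoo.cfg on every NSDB node; a majority (two of three) must stay up for writes to proceed:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# the voting members of the ensemble
server.1=nsdb1:2888:3888
server.2=nsdb2:2888:3888
server.3=nsdb3:2888:3888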

The memcached daemon was used to address the known nova-consoleauth service scaling limitation and to handle tokens when multiple users attempt to access VNC services through Horizon.
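In Juno-era Nova this is a one-line setting; a minimal sketch (host names assumed) in /etc/nova/nova.conf on each controller Service VM:

[DEFAULT]
# share console-auth tokens across all nova-consoleauth instances
memcached_servers = ctrl1:11211,ctrl2:11211,ctrl3:11211,ctrl4:11211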

Extensive tests of multiple failing components were performed; entire nodes and even the entire cluster were brought down to verify high availability, which confirmed that disaster recovery can be accomplished in a relatively quick fashion.

This OpenStack solution was built fully redundant with no single point of failure and is ready to manage large amounts of compute resources. The manual installation approach with xCAT automation allows for rapid and nondisruptive scaling of the environment.

The separation of the management services in the VMs provides the ability to better monitor and capacity-plan the infrastructure and easily move resources on demand to dedicated hardware.

This OpenStack environment meets all the demands of production-grade, highly scalable, swiftly deployable private clouds.

Automating OpenStack deployment and hardware management

To better manage the cloud infrastructure from a hardware and software perspective, the open source project xCAT was used with the addition of Confluent.

xCAT offers complete management for HPC clusters, render farms, Grids, web farms, online gaming infrastructure, clouds, data centers, and complex infrastructure setups. It is agile, extensible, and based on years of system administration best practices and experience. It is a perfect fit for custom OpenStack deployments, including this reference architecture. It also allows for bare-metal deployment and handles post-operating system installation, automation, and hardware and VM monitoring mechanisms.

xCAT manages infrastructure by setting up and interacting with the IPMI/BMC components at the hardware level. It also uses Serial over LAN (SOL) for each machine to access consoles without the need for a functional network layer.

For more information about xCAT, see this website: http://sourceforge.net/p/xcat/wiki/Main_Page


For the Service VM layer of the solution, xCAT connects to the virsh interface and SOL, so that managing VM infrastructure is as easy as managing hardware. Moreover, xCAT can read sensor information and gather inventories directly from the hardware, which allows identifying hardware failures quickly and easily.

The ability to push firmware updates in an automated fashion by using a built-in update tool helps maintain hardware features and fixes. These features make the management of hardware and software much simpler by creating a one-stop shop for any management tasks.

Ultimately, xCAT was set up to manage the following tasks:

- Customize and deploy operating system images to all required types of nodes (Ceph, Service VM controller, compute, OpenStack, MariaDB, Cassandra/ZooKeeper, and HAProxy)
- Customize and deploy post-installation scripts that define the software infrastructure
- Identify hardware issues with hardware monitoring
- Identify software issues with a parallel shell mechanism
- Update firmware for all hardware components
- Provide Serial over LAN connectivity for bare-metal operating systems and VM operating systems
- Automate expansion of the cloud or node replacement
- Provide DHCP/DNS/NAT/FTP/HTTP services to the infrastructure nodes
- Provide a local package repository (rpm) for all required software, including RHEL 7.1, Ceph, MidoNet, and EPEL
- Provide a simple web user interface (Confluent) for a quick overview of hardware health and SOL console access
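As a minimal sketch of how these tasks look in day-to-day operation (the node, group, and image names are assumptions, not values from this PoC), the tasks in the preceding list map onto standard xCAT commands:

# stage an operating system image and network-boot a new compute node
nodeset cpt17 osimage=rhels7.1-x86_64-install-compute
rpower cpt17 boot

# open a Serial-over-LAN console; read inventory and sensor data
rcons cpt17
rinv cpt17 all
rvitals cpt17 all

# check a service on every compute node with the parallel shell
xdsh compute 'systemctl is-active openstack-nova-compute'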


Software-Defined Storage: Ceph

Cloud and enterprise organizations' data needs grow exponentially, and classic enterprise storage solutions do not suffice to meet the demand in a cost-effective manner. Moreover, the refresh cycle of legacy storage hardware lags behind x86 commodity hardware. The viable answer to this problem is the emerging Software-Defined Storage (SDS) approach. Ceph, one of the leading SDS solutions, provides scale-out software that runs on commodity hardware and can handle exabytes of storage. Ceph is highly reliable, self-healing, easy to manage, and open source.

The PoC environment uses eight dedicated Ceph nodes with over 300 TB of raw storage. Each storage node is populated with 8x 6 TB HDDs for OSDs and 2x 200 GB supporting SSDs for journaling. Ceph can be configured with spindle drives only; however, because journaling devices perform random reads and writes, it is recommended to use SSDs to decrease access time and read latency while accelerating throughput. Performance tests on configurations with SSD journaling enabled and disabled showed an increase in IOPS of more than 50% with SSDs enabled.

To save disk cycles from operating system activities, RHEL 7.1 was loaded onto dual on-board SD cards (a Lenovo ThinkServer option) on all nodes, including the Ceph OSD nodes. A dual-card, USB 3-based reader with class 10 SD cards allowed for enough local storage and speed to load the operating system and all necessary components without sacrificing performance. Software RAID level 1 (mirroring) was used for local redundancy.
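A minimal sketch of such a mirror (device names are assumptions; the same result can also be achieved at install time):

# mirror the two SD cards into a single redundant OS device
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb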

Ceph storage availability to compute hosts depends on the Ethernet network; therefore, maximum throughput and minimum latency must be established on the internal, underlying network infrastructure. For best results, dual 10 GbE Emulex links were aggregated by using OVS LACP with the balance-tcp hashing algorithm. Lenovo's vLAG functionality on the top-of-rack (TOR) switches allows for full 20 Gb connectivity between the Ceph nodes for storage rebalancing.
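A minimal sketch of such an OVS bond on a Ceph node (the bridge and port names are assumptions):

# create the bridge and an LACP bond across the two 10 GbE ports
ovs-vsctl add-br br-data
ovs-vsctl add-bond br-data bond0 p1p1 p1p2 lacp=active
ovs-vsctl set port bond0 bond_mode=balance-tcp

# verify that both links negotiated LACP
ovs-appctl bond/show bond0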

All compute nodes and OpenStack controller nodes used Linux bond mode 4 (LACP). The aggregated links were VLAN-trunked for client and cluster network access. Quick performance tests showed Ceph's ability to use the aggregated links to their full extent, especially with read operations to multiple hosts.
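A minimal sketch of a mode 4 bond with a VLAN trunk on RHEL 7 (interface names, the VLAN ID, and addressing are assumptions):

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer3+4"
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-bond0.100 (one trunked VLAN)
DEVICE=bond0.100
VLAN=yes
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.0.31
PREFIX=24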


Figure 4 shows the optimal Journal-to-OSD disk ratio for a Ceph deployment.

Figure 4 Ceph reference deployment

To protect data from potential hardware failures, a replication factor of 3 (three copies of each object) was configured. A significant number of placement groups were created to safely spread the load between all the nodes in different failure domains.
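The placement group count can be derived from the commonly used sizing rule of roughly 100 placement groups per OSD, divided by the replica count and rounded up to a power of two. With the 64 OSDs in this PoC (8 nodes x 8 OSDs), that gives (64 x 100) / 3 ≈ 2133, which rounds up to 4096 and matches osd_pool_default_pg_num in the global configuration shown later in Figure 5. A pool for Cinder volumes could then be created as follows (the pool name is an assumption):

ceph osd pool create volumes 4096 4096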


The Ceph global configuration is shown in Figure 5.

Figure 5 Ceph global configuration

[global]
fsid = cce8c4ea-2efd-408f-845e-87707d26b99a
mon_initial_members = cephmon1, cephmon2, cephmon3
mon_host = 192.168.0.20,192.168.0.21,192.168.0.22
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 3
osd_pool_default_pg_num = 4096
osd_pool_default_pgp_num = 4096
public_network = 192.168.0.0/24
cluster_network = 192.168.1.0/24

[client]
rbd cache = true

Ceph storage was used by multiple OpenStack services, including Glance for storing images, Cinder for block storage and volume creation, and Nova for creating VMs directly on Ceph volumes.
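The paper does not list the service-side settings; the following minimal Juno-era sketch (pool and user names follow upstream conventions and are assumptions) shows where each service points at the Ceph cluster:

# /etc/glance/glance-api.conf
[glance_store]
default_store = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf

# /etc/cinder/cinder.conf
[DEFAULT]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_user = cinder

# /etc/nova/nova.conf
[libvirt]
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf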

Software-Defined Networking: MidoNet

MidoNet is an open source software solution that enables agile cloud networking via Network Virtualization Overlays (NVO). As a software play, MidoNet enables the DevOps and CI movement by providing network agility through its distributed architecture. When paired with OpenStack as a Neutron plugin, MidoNet allows tenants to create logical topologies via virtual routers, networks, security groups, NAT, and load balancing, all of which are created dynamically and implemented with tenant isolation over shared infrastructure.

MidoNet provides the following networking functions:

- Fully distributed architecture with no single points of failure
- Virtual L2 distributed isolation and switching with none of the limitations of conventional VLANs
- Virtual L3 distributed routing
- Distributed load balancing and firewall services
- Stateful and stateless NAT
- Access Control Lists (ACLs)
- RESTful API
- Full tenant isolation
- Monitoring of networking services
- VXLAN and GRE support: tunnel zones and gateways
- Zero-delay NAT connection tracking



MidoNet features a Neutron plugin for OpenStack. MidoNet agents run at the edge of the network on compute and gateway hosts. These datapath hosts (where the MidoNet agents are installed) require only IP connectivity between them and must permit VXLAN or GRE tunnels to pass VM data traffic, which raises maximum transmission unit (MTU) considerations.
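For reference, VXLAN encapsulation adds roughly 50 bytes per packet (outer Ethernet, IP, UDP, and VXLAN headers), so with a standard 1500-byte underlay MTU, guest interfaces are commonly capped at 1450 bytes. Alternatively, the underlay MTU can be raised so guests keep a full 1500 bytes; a minimal sketch (the interface name is an assumption):

# enable jumbo frames on the datapath interface of each host
ip link set dev p1p1 mtu 9000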

Configuration management is provided via a RESTful API server. The API server can typically be co-located with the neutron-server on OpenStack controllers. The API is stateless and can be accessed via the MidoNet CLI client or the MidoNet Manager GUI.

Logical topologies and virtual networks devices that are created via the API are stored in the Network State Database (NSDB). The NSDB consists of ZooKeeper and Cassandra for logical topology storage. These services can be co-located and deployed in quorum for resiliency.

For more information about the MidoNet Network Models, see the “Overview” blogs that are available at this website:

http://blog.midonet.org

Figure 6 shows the MidoNet Reference Architecture.

Figure 6 MidoNet Reference Architecture

MidoNet achieves L2-L4 network services in a single virtual hop at the edge, as traffic enters the OpenStack cloud via the gateway nodes or VMs on compute hosts. There is no reliance on a particular service appliance or service node for a particular network function, which removes bottlenecks in the network and allows the ability to scale. This architecture is a great advantage for production-ready clouds over alternative solutions.


Table 1 shows a comparison between the MidoNet and Open vSwitch (OVS) Neutron plugins.

Table 1 MidoNet and OVS Neutron plugin comparison

Features | MidoNet | OVS
Open source | Yes | Yes
Hypervisors supported | KVM, ESXi, Xen, Hyper-V (planned) | KVM, Xen
Containers | Docker | Docker
Orchestration tools | OpenStack, oVirt, RHEV, Docker, Custom, vSphere, Mesos (planned) | OpenStack, oVirt, openQRM, OpenNebula
L2 BUM traffic | Yes | Default: sends broadcast to every host even if it does not use the corresponding network; partial-mesh over unicast tunnels requires enabling the extra l2population mechanism driver
Distributed Layer 3 gateway | Scales to 100s, no limitations when enabled | Default deployment: intrinsic architectural issue with a SPOF Neutron network node for routing and higher-layer network services; does not scale well. Early-stage DVR requires installing an extra agent (L3 agent) on compute hosts and still relies on the network node for non-distributed SNAT. Currently, DVR cannot be combined with L3 HA/VRRP
SNAT | Yes | Not distributed, requires iptables (poor scale)
VLAN gateway | Yes | Yes
VXLAN gateway | Yes | L3 HA: requires keepalived, which uses VRRP internally (active-standby implications); DVR: requires external connectivity on each host (security implications)
HW VTEP L2 gateway | Yes | Yes
Distributed Layer 4 load balancer | Yes | Relies on another driver (HAProxy)
Supports spanning multiple environments | Yes | No
GUI-based configuration | Yes | No
GUI-based monitoring | Yes | No
GUI-based flow tracing | Yes | No
Pricing | OSS: free. MEM: $1899 USD per host (any number of sockets), including 24x7 support standard | Free: no support option


In the proof-of-concept lab, highly capable servers were used, and thus some MidoNet components were co-located. On the four OpenStack controller nodes, the MidoNet agents were installed on the bare-metal operating systems to provide gateway node functionality by terminating VXLAN tunnels from the OpenStack environment for external access via the Border Gateway Protocol (BGP) and Equal-Cost Multi-Path routing (ECMP).

Next, a Service VM on each of the four controller nodes was created for the Network State Database (consisting of ZooKeeper and Cassandra). These projects require deployment in quorum (3, 5, 7, …) to sustain N failures. In this PoC, the failure acceptance is equivalent to that achieved by three ZooKeeper/Cassandra nodes in a cluster.

Each of the OpenStack Controller Service VMs was created to serve the main OpenStack controller functions. Within these Service VMs, the MidoNet API (stateless) server and MidoNet Manager files (web files to serve up the client-side application) were installed. MidoNet Manager is part of the Midokura Enterprise MidoNet (MEM) subscription (bundled with support) and provides a GUI for configuring and maintaining virtual networks in an OpenStack + MidoNet environment.

Other network-related packages that were installed on the OpenStack Controller Service VMs include the neutron-server and metadata-agent. Because of the metadata-agent proxy’s dependency on DHCP namespaces, the dhcp-agent was also installed on the OpenStack Controller Service VM despite MidoNet's distributed DHCP service. These specific services were load-balanced by using HAProxy.

Finally, the MidoNet agent is installed on the compute nodes to provide VMs with virtual networking services. Because the MidoNet agent uses local compute power to make all L2-L4 networking decisions, MidoNet provides the ability to scale: as the number of compute nodes grows, so does the networking compute power.

Configurations

The Gateway nodes provide external connectivity for the OpenStack + MidoNet cloud. BGP was implemented between the Gateway nodes and the Lenovo Top of Rack switches for its dynamic routing capabilities.

To exhibit fast failover, the BGP timers were shortened. These settings can easily be adjusted based on the needs of the users. In this lab, the parameters that are shown in Figure 7 were modified in the /etc/midolman/midolman.conf file.

Figure 7 BGP parameters in midolman.conf on MidoNet Gateway Nodes

# bgpd
bgp_connect_retry=10
bgp_holdtime=15
bgp_keepalive=5

These parameters provide a maximum of 15 seconds for failover if the BGP peering session goes down on a gateway node.


The gateways must have Large Receive Offload (LRO) turned off to ensure MidoNet delivers packets that are not larger than the MTU of the destination VM. For example, the command that is shown in Figure 8 turns off LRO for an uplink interface of a gateway.

Figure 8 Disabling LRO on MidoNet Gateway Nodes

# ethtool -K p2p1 lro off

Also, to share state, port groups were created for gateway uplinks. Stateful port-groups allow the state of a connection to be shared such that gateways can track connections with asymmetric traffic flows. Figure 9 shows the commands that are used to configure stateful port-groups.

Figure 9 Configuring stateful port-groups

midonet-cli> port-group create name SPG stateful true
pgroup0
midonet> port-group pgroup0 add member port router0:port0
port-group pgroup0 port router0:port0
midonet> port-group pgroup0 add member port router0:port1
port-group pgroup0 port router0:port1

The default number of client connections for ZooKeeper was changed on the NSDB nodes. This change is made in the /etc/zookeeper/zoo.cfg file by using the line that is shown in Figure 10.

Figure 10 Increasing the number of client connections for ZooKeeper instances

maxClientCnxns=500

This configuration change allows the number of MidoNet agents that are connecting to ZooKeeper to go beyond the default limit.

Logical routers, rules, and chains were also created to provide multi-VRF functionality for upstream isolation of traffic.

Operational tools

MidoNet Manager is a network management GUI that provides an interface for operating networks in an OpenStack + MidoNet cloud. It allows the configuration of BGP for gateway functionality and monitoring of all virtual devices through traffic flow graphs.

When VXLAN overlays are used with OpenStack, operating and monitoring tools become increasingly relevant as you move from proof of concept into production. Previous monitoring and troubleshooting methods (such as RSPAN) capture packets on physical switches but give no context for a traffic flow.

MidoNet Manager presents flow tracing tools in a GUI to give OpenStack + MidoNet cloud operators the ability to identify specific tenant traffic and trace their flow through a logical topology. The flow tracing gives insight into each virtual network device that is traversed, every security group policy that is applied, and the final fate of the packet. MidoNet Manager provides insights for NetOps and DevOps for the Operations and Monitoring of OpenStack + MidoNet environments that are built for enterprise private clouds.



An example of the initial stage of flow tracing in MidoNet Manager is highlighted in the red box in Figure 11.

Figure 11 MidoNet Manager flow tracing


Professional services

The Lenovo Enterprise Solution Services team helps clients worldwide with the deployment of Lenovo System x® and ThinkServer solutions and technologies. The Enterprise Solution Services team can design and deliver the OpenStack cloud solution that is described in this document, and new designs in Software-Defined Everything, big data and analytics, HPC, virtualization, or converged infrastructure. Lenovo Enterprise Solution Services also provides on-site training to get staff up to speed and offers health check services for existing environments.

We feature the following offerings:

- Cloud: Our cloud experts help design complex IaaS, PaaS, or SaaS cloud solutions with our Cloud Design Workshop. We specialize in OpenStack and VMware-based private and hybrid cloud design and implementation services.
- Software-Defined Storage: We provide expertise with design and implementation services for software-defined storage environments. Our consultants can assist with implementing Ceph, Quobyte, or General Parallel File System (GPFS), with GPFS storage server installation and configuration of key operating system and software components, or with other software-defined storage technologies.
- Virtualization: Get assistance with VMware vSphere or Linux KVM through our design, implementation, and health check services.
- Converged Infrastructure: Learn about Flex System™ virtualization, blade server to Flex System migration assessment, VMware-based private cloud, and Flex System Manager quickstart.
- High-Performance Computing (HPC): Our team helps you get the most out of your System x or ThinkServer hardware with HPC intelligent cluster implementation services, health check services, and state-of-the-art cloud services for HPC.

For more information, contact Lenovo Enterprise Solution Services at: [email protected]

The Midokura team provides professional services and training to enable customers with OpenStack and MidoNet. Midokura’s expertise is in distributed systems. The Midokura team has real-world experience building distributed systems for large e-commerce sites, such as Amazon and Google.

Midokura Professional Services helps customers move from architectural design to production implementation, including MidoNet training, in only a couple of weeks. Midokura Professional Services are not only academic; the solutions are practical and come from hands-on deployments, operational experience, and direct contributions to the OpenStack Neutron project.

For more information, contact Midokura at: [email protected]


About the authors

Krzysztof (Chris) Janiszewski is a member of Enterprise Solution Services team at Lenovo. His main background is in designing, developing, and administering multiplatform, clustered, software-defined, and cloud environments. Chris previously led System Test efforts for the IBM OpenStack cloud-based solution for x86, IBM System z, and IBM Power platforms.

Michael Lea, CCIE #11662, CISSP, MBA, is an Enterprise System Engineer for Lenovo's Enterprise Business Group. He has over 18 years of experience in the field designing customer networks and data centers. Over that time, Michael has worked with service providers, managed service providers (MSPs), and large enterprises, delivering cost-effective solutions that include networking, data center, and security assurance. By always looking at both technical and business requirements, Michael makes certain that the proper technologies are used to help clients meet their business objectives. Previous roles held by Michael include Consulting Systems Engineer with Cisco Systems and IBM.

Cynthia Thomas is a Systems Engineer at Midokura. Her background in networking spans data center, telecommunications, and campus/enterprise solutions. Cynthia has earned a number of professional certifications, including the Alcatel-Lucent Network Routing Specialist II (NRS II) written certification exams, Brocade Certified Ethernet Fabric Professional (BCEFP), Brocade Certified IP Network Professional (BCNP), and VMware Technical Sales Professional (VTSP) 5 certifications.

Susan Wu is the Director of Technical Marketing at Midokura. Susan previously led product positions for Oracle/Sun, Citrix, AMD, and Docker. She is a frequent speaker for industry conferences, such as Interzone, Cloudcon/Data360, and Data Storage Innovation.

Thanks to the following people for their contribution to this project:

- Srihari Angaluri, Lenovo
- Michael Ford, Midokura
- Adam Johnson, Midokura
- David Watts, Lenovo Press


Notices

Lenovo may not offer the products, services, or features discussed in this document in all countries. Consult your local Lenovo representative for information on the products and services currently available in your area. Any reference to a Lenovo product, program, or service is not intended to state or imply that only that Lenovo product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any Lenovo intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any other product, program, or service.

Lenovo may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:

Lenovo (United States), Inc.
1009 Think Place - Building One
Morrisville, NC 27560
U.S.A.
Attention: Lenovo Director of Licensing

LENOVO PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. Lenovo may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

The products described in this document are not intended for use in implantation or other life support applications where malfunction may result in injury or death to persons. The information contained in this document does not affect or change Lenovo product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of Lenovo or third parties. All information contained in this document was obtained in specific environments and is presented as an illustration. The result obtained in other operating environments may vary.

Lenovo may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Any references in this publication to non-Lenovo Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this Lenovo product, and use of those Web sites is at your own risk.

Any performance data contained herein was determined in a controlled environment. Therefore, the result obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

© Copyright Lenovo 2015. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication, or disclosure restricted by General Services Administration (GSA) ADP Schedule Contract

This document REDP-5233-00 was created or updated on June 25, 2015.

Send us your comments in one of the following ways:
- Use the online Contact us review Redbooks form found at:
  ibm.com/redbooks
- Send your comments in an email to:
  [email protected]

Trademarks

Lenovo, the Lenovo logo, and For Those Who Do are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. These and other Lenovo trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by Lenovo at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of Lenovo trademarks is available on the Web at http://www.lenovo.com/legal/copytrade.html.

The following terms are trademarks of Lenovo in the United States, other countries, or both:

Flex System™
Lenovo®
RackSwitch™
Lenovo(logo)®
System x®
ThinkServer®

The following terms are trademarks of other companies:

Intel, Intel Xeon, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.
