Webinar: OpenStack Best Practices for Production
TRANSCRIPT
7 OpenStack Best Practices
Private Clouds Made Easy
Sirish Raghuram, Co-founder, CEO, Platform9
Roopak Parikh, Co-founder, VP Engineering, Platform9
© 2015 Platform9 Systems, Inc. Webinar: Best Practices for OpenStack in Production
Speaker Bio
Sirish Raghuram
• Co-founder, CEO at Platform9
• Previously: Staff Engineer at VMware (12 years)
• Technical and management responsibility for multiple VMware products
Roopak Parikh
• Co-founder, VP Engineering at Platform9
• Previously: Staff Engineer at VMware (7 years)
• Architect for multiple VMware products
Preamble
• Best practices from managing 50+ active OpenStack deployments
• Recommended for a technical audience looking to run OpenStack in production
• Assumes working knowledge of OpenStack
OpenStack Architecture
[Diagram: OpenStack components. Clarity UI; CLI / tools / scripts; Heat (Orchestration); Keystone (Identity); Nova (Compute) and its Scheduler; Cinder (Block Storage); Neutron (Network Controller); Glance (Images); basic storage, compute and network.]
Platform9 Managed OpenStack
• Your servers host your data
• Platform9 hosts the OpenStack controller as a service, with an SLA
• No need to install, monitor, troubleshoot or upgrade OpenStack
#1 — Instrument & Monitor
• Controller API logs
  • Nginx or Apache
• Controller services
  • /var/log/nova/*, /var/log/glance/*, /var/log/keystone…
• RabbitMQ
  • /var/log/rabbitmq
• Controller system health
  • CPU, memory, disk, network
  • File descriptors
  • Sockets
• Compute node logs (occasionally)
  • nova, glance, other services
  • Rarely, libvirt
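File descriptor headroom is one of the simpler controller health checks to automate. A minimal sketch, assuming a Linux host with /proc; the function names and the 80% threshold are hypothetical choices for illustration, not Platform9 tooling.

```python
import os


def parse_max_open_files(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the limit is 'unlimited' or the line is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # columns: Max, open, files, <soft>, <hard>, files
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid="self"):
    """Count a process's open file descriptors and read its soft limit (Linux only)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft_limit = parse_max_open_files(f.read())
    return open_fds, soft_limit


def fd_alert(open_fds, soft_limit, threshold=0.8):
    """Hypothetical alert rule: fire once usage reaches `threshold` of the soft limit."""
    if soft_limit is None:
        return False
    return open_fds >= threshold * soft_limit


if __name__ == "__main__" and os.path.isdir("/proc/self/fd"):
    # Checks this process; pass another pid (with sufficient permissions) to
    # watch a controller service process instead.
    used, limit = fd_usage()
    print(used, limit, fd_alert(used, limit))
```

The same pattern extends to socket counts or any other resource with a hard ceiling: read current usage, read the limit, alert well before the two meet.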
Platform9 Log Telemetry
[Diagram: multiple raw log streams pass through a pre-processing filter into log storage, archival and search; alert filters feed an alert mechanism that raises alerts.]
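The telemetry flow (pre-filter raw logs, store what remains, run alert filters over it) can be sketched in a few lines. This is an illustrative sketch, not Platform9's implementation: the noise patterns and alert rules are hypothetical examples, and the Python list stands in for a real log storage and search service.

```python
import re

# Hypothetical pre-processing filter: drop noisy lines before storage.
NOISE_PATTERNS = [re.compile(r"GET /healthcheck"), re.compile(r"\bDEBUG\b")]

# Hypothetical alert rules: (name, pattern), checked in order against stored lines.
ALERT_RULES = [
    ("amqp_connection_lost", re.compile(r"AMQP server .* is unreachable")),
    ("service_error", re.compile(r"\b(ERROR|CRITICAL)\b")),
]


def preprocess(raw_lines):
    """Yield only the lines that match no noise pattern."""
    for line in raw_lines:
        if not any(p.search(line) for p in NOISE_PATTERNS):
            yield line


def run_pipeline(raw_lines, storage):
    """Store filtered lines in `storage` and return a list of (rule_name, line) alerts."""
    alerts = []
    for line in preprocess(raw_lines):
        storage.append(line)  # stand-in for log storage, archival and search
        for name, pattern in ALERT_RULES:
            if pattern.search(line):
                alerts.append((name, line))
                break  # first matching rule wins, so one line raises at most one alert
    return alerts
```

Ordering the rules from most to least specific means a RabbitMQ disconnect is reported as such rather than as a generic error.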
Takeaways
• 100% automation is key
• Alerts can be very noisy
• Future:
  • Sentry / Rollbar, to easily discern problem areas by severity and priority
  • Migrate from Papertrail to ELK (Elasticsearch, Logstash, Kibana)?
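Because alerts can be very noisy, one common mitigation (not necessarily what Platform9 does) is to suppress repeats of the same alert key within a time window, so a flapping service notifies once instead of hundreds of times. A minimal sketch; the class name and the 300-second window are hypothetical.

```python
import time


class AlertDeduplicator:
    """Suppress repeated alerts with the same key inside a time window."""

    def __init__(self, window_seconds=300.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock  # injectable so behaviour can be tested deterministically
        self._last_sent = {}  # alert key -> time it was last emitted
        self.suppressed = {}  # alert key -> number of suppressed repeats

    def should_send(self, key):
        """Return True if an alert for `key` should go out now."""
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            self.suppressed[key] = self.suppressed.get(key, 0) + 1
            return False
        self._last_sent[key] = now
        return True
```

The suppressed counts are worth surfacing in the next alert that does go out, so operators can still see how often a problem recurred.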
#2 — High Availability Configuration
• Common points of failure
  • OpenStack controller
    • Database
    • Python applications (Keystone, Nova, Glance, et al.)
    • RabbitMQ
  • Compute nodes
    • Agent software uptime
Platform9 HA Architecture
[Diagram: compute nodes reach the controllers over the Internet through a virtual IP and load balancer in front of three OpenStack controllers and the UI; the controllers share a replicated DB on the intranet.]
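A load balancer in front of several controllers only helps if it stops sending traffic to a controller that has failed. A minimal round-robin sketch that skips controllers failing a health check; the `is_healthy` callable and controller names are hypothetical stand-ins, and in practice this job is usually handled by a dedicated load balancer such as HAProxy with its own health checks.

```python
import itertools


class HealthAwareBalancer:
    """Round-robin over controllers, skipping any that fail a health check."""

    def __init__(self, controllers, is_healthy):
        self.controllers = list(controllers)
        self.is_healthy = is_healthy  # hypothetical callable: controller -> bool
        self._cycle = itertools.cycle(self.controllers)

    def pick(self):
        """Return the next healthy controller, or raise if none are healthy."""
        for _ in range(len(self.controllers)):
            candidate = next(self._cycle)
            if self.is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy OpenStack controller available")
```

Each call tries at most one full pass over the pool, so a request either lands on a healthy controller or fails fast instead of looping forever.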
#3 — Backup / Restore
• SLA: must recover quickly from losing a controller
• Back up the controller DB
• Back up controller state
• Automated recipe to restore from backup
• Test the restore recipe
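An automated backup recipe with a tested restore can be sketched as: take an online copy of the database, restore it into a scratch instance, and verify it before trusting it. The sketch uses SQLite through Python's standard `sqlite3` backup API purely as a self-contained stand-in; an OpenStack controller database is typically MySQL/MariaDB and would use that engine's own backup tooling. The function names and the `instances` table are hypothetical.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def backup_db(src, dest_path):
    """Take an online backup of the open connection `src` into a file."""
    with closing(sqlite3.connect(dest_path)) as dest:
        src.backup(dest)


def restore_and_verify(backup_path, expected_counts):
    """Restore a backup into a scratch in-memory DB and verify it.

    `expected_counts` maps table name -> expected row count. Returns True
    only if the integrity check passes and every count matches.
    """
    with closing(sqlite3.connect(":memory:")) as scratch:
        with closing(sqlite3.connect(backup_path)) as backup:
            backup.backup(scratch)  # "restore" = copy the backup into the scratch DB
        if scratch.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
            return False
        for table, expected in expected_counts.items():
            # Table names come from our own configuration, not user input.
            (count,) = scratch.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
            if count != expected:
                return False
    return True


if __name__ == "__main__":
    # Demo: back up a toy "controller" DB, then prove the backup restores.
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT)")
    live.executemany("INSERT INTO instances (name) VALUES (?)", [("vm1",), ("vm2",)])
    live.commit()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "controller-db.backup")
        backup_db(live, path)
        print("restore verified:", restore_and_verify(path, {"instances": 2}))
```

Running the verification step on every backup, not just during incidents, is what turns "we have backups" into "we can restore".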
#4 — Upgrade / Patch Rollout
• Automated rollout mechanism for
  • Controller upgrades
  • Compute node agent upgrades
• Plan for testing the upgrade before committing
• Roll back if required
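One way to combine automated rollout, testing before committing, and rollback is a staged (canary-style) rollout: upgrade a small batch, run health checks, and continue only if they pass; otherwise roll back every node touched so far. A minimal sketch; `upgrade`, `rollback` and `healthy` are hypothetical callables standing in for whatever actually installs packages or images on a controller or compute node.

```python
def staged_rollout(nodes, upgrade, rollback, healthy, batch_size=1):
    """Upgrade `nodes` batch by batch, gating each batch on health checks.

    Returns True if every batch upgraded and passed its checks. On the first
    failure (an exception or a failed check), every node touched so far is
    rolled back in reverse order and False is returned.
    """
    touched = []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        try:
            for node in batch:
                touched.append(node)  # record first so a half-finished upgrade is rolled back too
                upgrade(node)
            batch_ok = all(healthy(node) for node in batch)
        except Exception:
            batch_ok = False
        if not batch_ok:
            for node in reversed(touched):
                rollback(node)
            return False
    return True
```

A batch size of 1 for the first batch keeps the blast radius of a bad patch to a single node.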
Platform9 Orchestration
[Diagram: a fresh install combines Vanilla OS Template Image V1 with customer state to produce Customer Server V1; an upgrade combines Vanilla OS Template Image V2 with the same customer state to produce Customer Server V2.]
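The idea of keeping customer state separate from a versioned vanilla OS template, so that an upgrade is a rebuild from the new template plus the same state, can be modeled in a few lines. This is a hypothetical data model for illustration only; the class and field names are not Platform9's.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class TemplateImage:
    """A vanilla OS template image, identified by version."""
    version: str


@dataclass(frozen=True)
class CustomerServer:
    """A server = template image + customer state kept outside the image."""
    image: TemplateImage
    state: dict = field(default_factory=dict)  # e.g. config and credentials


def fresh_install(image, state):
    """Build a new server from a template image and initial customer state."""
    return CustomerServer(image=image, state=dict(state))


def upgrade(server, new_image):
    """Rebuild on a new template image, carrying a copy of the customer state across."""
    return replace(server, image=new_image, state=dict(server.state))
```

Because the old server object is never mutated, rolling back is simply rebuilding from the previous template image with the same state.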
Platform9: Havana to Juno Upgrade
#5 — Workload Tiering
• Segregate the underlying infrastructure for different classes of workloads (or users!)
  • By workload, hardware type, geography or organization
• Illustrations:
  • Test/Dev vs. Production
  • Tier 1 vs. Tier 2
  • SSD vs. HDD
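Tiering ultimately comes down to placement: each host carries tier labels, each workload requests a tier, and only matching hosts are considered. In Nova this is commonly done with host aggregates and scheduler filters; the sketch below is a standalone illustration of the filtering idea, not Nova's filter API, and the host names and label keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    labels: dict  # e.g. {"tier": "1", "disk": "ssd", "env": "prod"}
    free_ram_mb: int


def candidate_hosts(hosts, required_labels, ram_mb):
    """Return hosts that match every required label and have enough free RAM.

    Results are sorted by free RAM (most first), a simple stand-in for weighing.
    """
    matches = [
        h for h in hosts
        if all(h.labels.get(k) == v for k, v in required_labels.items())
        and h.free_ram_mb >= ram_mb
    ]
    return sorted(matches, key=lambda h: h.free_ram_mb, reverse=True)
```

Filtering on labels first and weighing second keeps a Tier-1 request from ever landing on Tier-2 hardware, even when Tier-2 has more free capacity.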
Intelligent Placement
[Diagram: DevOps users deploying into a private cloud whose infrastructure is split into Tier-1 and Tier-2 pools, with workloads placed on the matching tier.]
#6 — Hardened Messaging Libs
• OpenStack controller and compute node software communicate over message queues
• Reliable message delivery is critical to OpenStack
• Issue
  • Roughly once every 2,000 to 5,000 API requests, a compute node or controller node can lose its connection to the queue
  • Result: messages get stuck in the queue and are never delivered
  • Result: operations can stall, seemingly at random
• Resolution
  • oslo.messaging heartbeating applied in Jan 2015
    • Ref: https://github.com/openstack/oslo.messaging/commit/b9e134d7e955b9180482d2f7c8844501c750adf6
  • Disabled in April: https://github.com/openstack/oslo.messaging/commit/287a4f56f45ed9cd40116a9e7b6e529f3382a925
  • Platform9 has its own heartbeat mechanism, which leverages the Platform9 WebSocket architecture
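The failure mode above, a connection that silently dies while messages sit undelivered, is what heartbeating addresses: each side expects to hear from its peer at a regular interval and treats prolonged silence as a dead connection to re-establish. A minimal, transport-agnostic sketch of the detection logic; it is neither oslo.messaging's nor Platform9's implementation, and the class name and default timeout are hypothetical.

```python
import time


class HeartbeatMonitor:
    """Declare a connection dead if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout=60.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable for deterministic tests
        self.last_seen = clock()

    def beat(self):
        """Record that a heartbeat (or any traffic) was received from the peer."""
        self.last_seen = self.clock()

    def is_alive(self):
        return self.clock() - self.last_seen < self.timeout

    def check(self, reconnect):
        """Call `reconnect()` if the connection looks dead; return True if it did."""
        if self.is_alive():
            return False
        reconnect()
        self.beat()  # reset the timer after reconnecting
        return True
```

In a real client, `check()` would run periodically on a background thread or event loop, and the peer would send heartbeats at an interval comfortably shorter than the timeout.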
#7 — Troubleshooting / Debugging
• Issue #6 is an example of the kind of issue you will run into
• Be prepared to
  • Debug / diagnose
    • It took us ~7 person-days to debug issue #6 (a worst-case example)
  • Roll out a patch
• Techniques
  • Separate webinar topic!
Recap
• Reviewed 7 best practices for running OpenStack successfully
• Share your own tips via the GTM chat panel!
Summary
• Production-grade OpenStack without the hard work
  • Request your own Platform9 account
• Related resources
  • OpenStack benefits for KVM / VMware: recorded webinars
  • Upcoming webinar: Jun 7, 2015
• Have questions?
  • Ask away!
• Get in touch:
  • @Platform9Sys