Troubleshooting at Cox Communications with VMware vCenter Log Insight and vCenter Operations Management Suite Chris Nakagaki, Cox Communications Jason Davis, Cox Communications Himanshu Kumar Singh, VMware VCM5034 #VCM5034

Upload: vmworld

Post on 15-Jul-2015

TRANSCRIPT

Page 1: VMworld 2013: Troubleshooting at Cox Communications with VMware vCenter Log Insight and vCenter Operations Management Suite

Troubleshooting at Cox Communications with VMware vCenter Log Insight and vCenter Operations Management Suite

Chris Nakagaki, Cox Communications

Jason Davis, Cox Communications

Himanshu Kumar Singh, VMware

VCM5034

#VCM5034

Page 2

Troubleshooting at Cox Communications with VMware vCenter Log Insight and vCenter Operations Management Suite

Press Start

Page 3

Player 1!

x3

World vCOPs

Page 4

Agenda

Background

Why vCOPs and Log Insight?

vCOPs

Capacity Planning Demo

Custom Dashboarding Demo

HeatMaps Demo

Log Insight – What is it? How did Cox use it?

Storage Deeper Dive Demo

VM Backup Failures Demo

Q&A

How to Play

Page 5

Background

Cox Communications, Inc. (Atlanta)

100+ Hosts, 3000+ VMs

2800+ GHz Compute Capacity

13.5 TB Memory Capacity

200 TB SAN Storage

Chris Nakagaki

vExpert 2011, 2012, 2013

10 years @ Cox Communications

Started w/ ESX 2.5

@zsoldier

Jason Davis

15 years Windows Experience

12 years @ Cox Communications

Started w/ ESX 2.0

Credits?

Page 6

Why vCOPs and Log Insight?

!?

Dynamic Thresholds (vCOPs)

Easy Deployment (vCOPs/Log Insight)

Capacity Planning (vCOPs)

Cloud Suite Cost Savings (vCOPs)

Log Aggregation (Log Insight)

Pretty Pictures (vCOPs)

Because we like to have a strong upper and lower body.

Page 7

vCOPs – Is there capacity?

1UP!

Network switch maintenance

Multiple hosted production VMs potentially affected

Can we place affected hosts in maintenance mode and maintain uptime?

Page 8

vCOPs – Is there capacity?

1UP!

Page 9

vCOPs – Is there capacity?

1UP!

Conclusion:

Yes, there is capacity

Network maintenance can proceed

Demonstrated:

vCOPs Capacity Planning Tool

The bottleneck is disk space, nothing else

VMs can continue to run
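
The question on this slide (can the affected hosts go into maintenance mode without losing uptime?) can be sketched as a simple aggregate check. This is only an illustration, not how the vCOPs Capacity Planning Tool works internally: the `Host` record, the `can_evacuate` helper, the 20% headroom, and all host names and figures are hypothetical. It also ignores per-VM placement and the disk-space bottleneck found in the demo.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical per-host capacity and usage figures."""
    name: str
    cpu_ghz: float
    mem_gb: float
    cpu_used_ghz: float
    mem_used_gb: float


def can_evacuate(hosts, to_drain, headroom=0.2):
    """True if the hosts left running can carry the whole cluster load
    while keeping a `headroom` fraction of their capacity free."""
    keep = [h for h in hosts if h.name not in to_drain]
    if not keep:
        return False
    # The total demand does not change; only the available capacity shrinks.
    cpu_need = sum(h.cpu_used_ghz for h in hosts)
    mem_need = sum(h.mem_used_gb for h in hosts)
    cpu_avail = sum(h.cpu_ghz for h in keep) * (1 - headroom)
    mem_avail = sum(h.mem_gb for h in keep) * (1 - headroom)
    return cpu_need <= cpu_avail and mem_need <= mem_avail


# Made-up four-host cluster, each host half loaded.
hosts = [Host(f"esx0{i}", 28.0, 128.0, 14.0, 64.0) for i in range(1, 5)]

one_host_ok = can_evacuate(hosts, {"esx01"})
two_hosts_ok = can_evacuate(hosts, {"esx01", "esx02"})
```

With these made-up numbers, draining one host fits within the headroom, while draining two hosts at once does not.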

Page 10

vCOPs – How do we monitor streaming servers?

Sim Infrastructure

Live streaming event w/ CEO and CTO

Monitor VMs associated w/ streaming service live!

Key Metrics?

CPU

Memory

Network

Page 11

vCOPs – How do we monitor streaming servers?

Sim Infrastructure

Page 12

vCOPs – How do we monitor streaming servers?

Sim Infrastructure

Conclusion:

vCOPs custom dashboarding is useful!

We demonstrated:

Grouping all streaming VMs as an application object

Creating a custom dashboard

Focused on 3 Key Metrics

Health Tree to show who’s being lazy
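
The idea behind grouping the streaming VMs and watching three key metrics can be sketched outside of vCOPs. This is a rough illustration only: the VM names, the metric names (`cpu_pct`, `mem_pct`, `net_mbps`), the sample values, and the `lazy_members` helper are all hypothetical, and the "less than a quarter of the group median" rule is an assumption, not the Health Tree's actual logic.

```python
from statistics import median

# Hypothetical "application object": the streaming VMs with their latest
# CPU, memory, and network readings.
streaming_app = {
    "stream-vm-01": {"cpu_pct": 62.0, "mem_pct": 71.0, "net_mbps": 410.0},
    "stream-vm-02": {"cpu_pct": 58.0, "mem_pct": 69.0, "net_mbps": 395.0},
    "stream-vm-03": {"cpu_pct": 4.0, "mem_pct": 22.0, "net_mbps": 3.0},
}


def lazy_members(group, metric, fraction=0.25):
    """Names of VMs whose `metric` is below `fraction` of the group median."""
    mid = median(vm[metric] for vm in group.values())
    return sorted(name for name, vm in group.items() if vm[metric] < fraction * mid)


lazy_cpu = lazy_members(streaming_app, "cpu_pct")
lazy_net = lazy_members(streaming_app, "net_mbps")
```

Comparing each member against its own group, rather than against a fixed limit, is what makes an idle VM stand out during a live event.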

Page 13

vCOPs – Why are VMs slow?

POW!

Receiving reports that VMs are performing slowly.

No immediately discernible pattern

vCOPs to the rescue!

Page 14

vCOPs – Why are VMs slow?

POW!

Page 15

vCOPs – Why are VMs slow?

POW!

Determined that one array had severe latency.

Now questions arise around VMware NMP

To Log Insight for deeper analysis…
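
The pattern the heat map exposed (slow VMs clustering on one array) can be sketched as a group-by on latency. This is an assumption-laden illustration, not the vCOPs heat map itself: the VM and array names, the latency samples, the 20 ms threshold, and the `slow_arrays` helper are all made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (VM name, backing array, disk latency in ms).
samples = [
    ("vm-a", "array-1", 4.0),
    ("vm-b", "array-1", 6.0),
    ("vm-c", "array-2", 48.0),
    ("vm-d", "array-2", 61.0),
    ("vm-e", "array-3", 5.0),
]


def slow_arrays(samples, threshold_ms=20.0):
    """Mean latency per array, keeping only arrays above `threshold_ms`."""
    by_array = defaultdict(list)
    for _vm, array, latency_ms in samples:
        by_array[array].append(latency_ms)
    averages = {array: mean(values) for array, values in by_array.items()}
    return {array: avg for array, avg in averages.items() if avg > threshold_ms}


suspects = slow_arrays(samples)
```

Grouping by the shared component turns a set of unrelated "slow VM" reports into a single suspect.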

Page 16

Why vCOPs AND Log Insight?

Chicken Legs

Page 17

Log Insight – What is it?

Continue?

Continue?

5 4 3 2 1 0

Page 18

We Interrupt This Program To Bring You An Important Message…

Page 19

Introducing: VMware vCenter Log Insight

Himanshu Singh

Senior Product Marketing Manager

Enterprise and Cloud Management, VMware

Page 20

Problem: Operate and Troubleshoot a Complex System

VMware Logs

OS and App Logs

Physical Infrastructure Logs

Page 21

VMware’s Approach to Log Management

1. Extend Analytics to Log Data

• With vC Ops, VMware introduced an analytics-based operations management solution for structured data (metrics, KPIs, events, alerts)

• Log Insight extends our analytics-based approach to logs and unstructured, machine-generated data

2. Easy to Use and Accessible

• Existing solutions are highly specialized and often too expensive

• Log Insight has an intuitive, easy-to-use interface

• A predictable pricing model with an unlimited amount of log data makes it accessible to all

3. Optimized for VMware Environments

• Log Insight comes with built-in knowledge and native support for vSphere

• Integration with vC Ops maximizes ROI and value, providing a complete cloud operations management solution

Page 22

VMware Cloud Ops Mgmt = Log Insight and vCenter Operations

Cloud Operations Management

• vCenter Log Insight and vCenter Operations complement each other

• Delivers best-of-breed capabilities for performance, capacity, and configuration management

• Tight integration enables seamless transition from monitoring to troubleshooting

• Log Insight and vC Ops together provide a complete solution for Cloud Operations Management

Page 23

Key vCenter Log Insight Use Cases

IT Operations

• Troubleshooting and Root Cause Analysis

I observe a problem (e.g. slowness), troubleshoot it, and identify the part of the stack that is responsible (e.g. network delay vs. storage)

Follow the trail from vC Ops to logs to get to the root cause of an observed problem

• Monitoring Using Logs

Monitor metrics and events (performance & change) that are visible only in logs

Collect all the data in one place without the need for custom parsing or transformation of data

Security and Compliance

• Security Forensics

• Comprehensive Audit (who, when) / Compliance Reporting

Business Transaction Monitoring

• Collect and correlate transaction logs with infrastructure performance

Page 24

Integration with vCenter Operations

Automated correlation of performance and log data

Page 25

Announcing: Log Insight Content Pack Marketplace

Extend vCenter Log Insight with Content Packs from: [partner logos] And more…

https://solutionexchange.vmware.com/store/loginsight

Page 26

And Now…

Back To Regularly Scheduled Programming…

Page 27

Player 2!

x3

World Log Insight

Page 28

Log Insight – Was Round Robin causing issues?

Were paths being marked dead?

Were the paths remaining dead?

Did the paths come back when expected?

LET’S SEE ….

Leeroy Jenkins!

Page 29

Log Insight – Was Round Robin causing issues?

Leeroy Jenkins!

Page 30

Log Insight – Was Round Robin causing issues?

Conclusion:

No, round robin was not causing issues!

We Demonstrated:

Paths were marked DEAD.

Paths remained DEAD.

Paths came back ON when expected.

Leeroy Jenkins!
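
The three questions asked of the logs (were paths marked dead, did they stay dead, did they come back when expected) can be sketched as a small state tracker. Everything here is an assumption for illustration: the log line layout (`timestamp host path=... state=...`), the timestamps, the host and path names, and the `path_outages` helper are invented and do not reproduce real ESXi log messages or Log Insight queries.

```python
from datetime import datetime

# Hypothetical, simplified log lines: "<timestamp> <host> path=<id> state=<dead|on>".
LOG = """\
2013-08-01T10:00:05 esx01 path=vmhba1:C0:T0:L1 state=dead
2013-08-01T10:00:07 esx01 path=vmhba2:C0:T0:L1 state=dead
2013-08-01T10:30:05 esx01 path=vmhba1:C0:T0:L1 state=on
"""


def path_outages(log_text):
    """Return (completed outages, paths still dead).

    Completed outages are (host, path, seconds_dead) tuples."""
    down_since = {}
    outages = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        stamp, host, path_field, state_field = line.split()
        when = datetime.fromisoformat(stamp)
        key = (host, path_field.split("=", 1)[1])
        state = state_field.split("=", 1)[1]
        if state == "dead":
            # Remember only the first time the path went down.
            down_since.setdefault(key, when)
        elif state == "on" and key in down_since:
            seconds = (when - down_since.pop(key)).total_seconds()
            outages.append((key[0], key[1], seconds))
    return outages, sorted(down_since)


recovered, still_dead = path_outages(LOG)
```

Comparing each outage length against the planned maintenance window is what answers "did the paths come back when expected?".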

Page 31

Log Insight – What’s causing VM backup failures?

NetBackup has snapshot errors (status code 156).

Symantec HOWTO70949 article states there are multiple possible causes.

Which is the most probable cause?

Does VMware have correlating logs?

LET’S SEE …

Paku-Man?

Page 32

Log Insight – What’s causing VM backup failures?

Paku-Man?

Page 33

Log Insight – What’s causing VM backup failures?

Conclusion:

The most probable cause is an inability to create VM snapshots due to timeouts.

We Demonstrated:

Correlating errors in VMware logs stating: “The guest OS has reported an error during quiescing.”

VMware KB 1018194 provides additional troubleshooting steps:

Reboot the VM

Reduce I/O

Etc.

Paku-Man?
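
The correlation shown in this demo (a backup failure with status code 156 lining up with a quiescing error for the same VM) can be sketched as a time-window join. This is only an illustration under stated assumptions: the VM names, timestamps, the second event message, the 10-minute window, and the `correlate` helper are hypothetical. Only the quiescing error text and the status code come from the slides.

```python
from datetime import datetime, timedelta

QUIESCE_ERROR = "The guest OS has reported an error during quiescing."

# Hypothetical backup failures: (VM name, failure time, status code).
backup_failures = [
    ("app-vm-01", datetime(2013, 8, 1, 1, 5), 156),
    ("app-vm-02", datetime(2013, 8, 1, 1, 20), 156),
]

# Hypothetical VMware-side log events: (VM name, event time, message).
vmware_events = [
    ("app-vm-01", datetime(2013, 8, 1, 1, 3), QUIESCE_ERROR),
    ("app-vm-02", datetime(2013, 8, 1, 0, 10), "Create virtual machine snapshot"),
]


def correlate(failures, events, window=timedelta(minutes=10)):
    """Map each failed VM to the quiescing errors logged within `window` of the failure."""
    matched = {}
    for vm, failed_at, _status in failures:
        matched[vm] = [
            message
            for event_vm, event_at, message in events
            if event_vm == vm
            and abs(failed_at - event_at) <= window
            and QUIESCE_ERROR in message
        ]
    return matched


matches = correlate(backup_failures, vmware_events)
```

A VM with an empty list has no matching VMware-side evidence, which points back to the other causes listed in the Symantec article.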

Page 34

Q&A

42

Page 35

Other VMware Activities Related to This Session

HOL:

HOL-SDC-1301

Applied Cloud Operations

Group Discussions:

VCM1002-GD, VCM1004-GD

Cloud Operations with Hicham Mourad or Sam McBride

Breakout Session – repeated by demand:

VCM4528 – Thursday, 2 pm Moscone West, room 3001

Tips and Tricks with vCenter Log Insight

Follow us:

@VMLogInsight and get 5 free licenses

Hang with us:

Booth 2020 – Cloud Management Lounge

VCM5034

Page 36

THANK YOU
