Inside Azure Diagnostics, Pittsburgh Tech Fest, June 7th, 2014

Uploaded by michael-collier, posted on 02-Jun-2015


DESCRIPTION

** Session from Pittsburgh Tech Fest - June 2014 **

TRANSCRIPT

Page 1: Inside Azure Diagnostics

Inside Azure Diagnostics

Pittsburgh Tech Fest, June 7th, 2014

Page 2

Michael S. Collier, Principal Cloud Architect

[email protected], @MichaelCollier, www.MichaelSCollier.com

Page 3

CloudDevelop.org, Columbus, OH, October 17, 2014

Call for Speakers: June 27th

Page 4: Today’s Agenda

1 / The need for diagnostic data in cloud applications

2 / Data we can monitor

3 / Using the Microsoft Azure Diagnostic Agent

4 / Real-world guidance for troubleshooting Microsoft Azure apps

Page 5: Success vs. Failure

Successful projects share at least one common trait . . .

node.js   C#   Java

Agile vs. Waterfall

Page 6: Success vs. Failure

Successful projects share at least one common trait . . .

Diagnostics Data / Telemetry

Page 7: A True Story

Scenario: 1 week before the production launch date. “Am I ready?”

Well, we eventually log any fatal errors, but that’s all.

OH . . .

Logs? Yeah . . . we really don’t have logs.

Let’s run some tests and look at your logs.

I guess that’s better than nothing.

We looked at Azure diagnostic logging but didn’t see much value in it.

Page 8: A True Story

You’re kidding? Right?

Page 9: A True Story

Scenario
o Determine if solution is production ready
o Deployed as an Azure Cloud Service
o No load tests
o No performance tests
o No unit tests
o Very little instrumentation

We have a problem

Page 10: A True Story

Resolution
o Step 0 – Enable Azure diagnostics
  • Set key performance counters
o Step 1 – Add logging statements around key functionality
  • Especially external services
o Step 2 – Test, test, test
o Step 3 – Analyze
o Step 4 – Fix it

Page 11: Instrumentation More Important in “the Cloud”

o Need to have good instrumentation for on-premises applications
o Cloud – it matters more!
  o Distributed environments and services
  o Composite applications
  o Reliance on 3rd party vendors . . . such as Microsoft for Azure
  o Highly automated environments
  o Scale-out model
  o Massive amounts of data

Page 12: The Cloud Scales

(Diagram: web roles and worker roles)

Page 13: The Cloud Scales . . . You Do Not

(Diagram: web roles and worker roles)

Diagnostic Data – 4x

Page 14: Diagnostic Data

What data do you gather today?

o Performance Counters
o Custom Logs (NLog, log4net, etc.)
o IIS Logs
o Windows Event Logs
o Crash Dumps


Page 16: Diagnostic Data – Azure Not So Different

o Performance Counters
o Custom Logs (NLog, log4net, etc.)
o IIS Logs
o Windows Event Logs
o Crash Dumps

Azure Storage

Page 17: Diagnostic Data Storage

Diagnostic Item                       Table Name                            Blob Container Name
Windows Event Logs                    WADWindowsEventLogsTable
Performance Counters                  WADPerformanceCountersTable
Trace Log Statements                  WADLogsTable
Azure Diagnostic Infrastructure Logs  WADDiagnosticInfrastructureLogsTable
Custom Logs (log4net, NLog, etc.)                                           <custom>
IIS Logs                              WADDirectoriesTable*                  wad-iis-logfiles
IIS Failed Request Logs               WADDirectoriesTable*                  wad-iis-failedreqlogfiles
Crash Dumps                           WADDirectoriesTable*

* The Container field gives the blob container that holds the log file, and the RelativePath field gives the blob name. The AbsolutePath field contains the name of the file as it existed on the role instance.

Page 18: Diagnostic Monitor Agent

1. Role starts
2. Diagnostic monitor agent starts
3. Diagnostics configured
4. Data buffered locally
5. Data transferred to storage

wad-control-container
o Container in Azure blob storage

Page 19: Diagnostic Monitor Agent

Page 20: Configuration Options

Default Configuration
o Trace logs, IIS logs, infrastructure logs
o No transfer

Imperative Configuration
o OnStart()
o Overrides default

Declarative Configuration
o diagnostics.wadcfg
o Overrides imperative

Page 21: Imperative

    using System;
    using Microsoft.WindowsAzure.Diagnostics;
    using Microsoft.WindowsAzure.ServiceRuntime;

    // The role's entry point class; the diagnostics setup runs when the role starts.
    public class WorkerRole : RoleEntryPoint
    {
        public override bool OnStart()
        {
            // Create the DiagnosticMonitorConfiguration object to use for configuring the monitoring agent.
            DiagnosticMonitorConfiguration config = DiagnosticMonitor.GetDefaultInitialConfiguration();

            // Performance Counter configuration: sample total CPU every 30 seconds, transfer every minute
            config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
            {
                CounterSpecifier = @"\Processor(_Total)\% Processor Time",
                SampleRate = TimeSpan.FromSeconds(30)
            });
            config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

            // Log configuration: transfer trace logs at Information level and more severe
            config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Information;
            config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

            // Event Log configuration: collect Application and System logs, transfer Warning and more severe
            config.WindowsEventLog.DataSources.Add("Application!*");
            config.WindowsEventLog.DataSources.Add("System!*");
            config.WindowsEventLog.ScheduledTransferLogLevelFilter = LogLevel.Warning;
            config.WindowsEventLog.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

            // Start the diagnostic monitor with the new configuration
            DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);

            return base.OnStart();
        }
    }

Impacts local agent only!
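The transfer settings in OnStart() combine three ideas: buffer locally, transfer on a schedule, and ship only entries at or above a level filter. The small simulation below makes that combination concrete. It is an illustration only and not how the diagnostics agent is implemented. `Level` is a local enum that, as far as we know, mirrors the ordering of the SDK's LogLevel (Critical = 1 through Verbose = 5, where a lower value is more severe). `LocalBuffer` is a hypothetical class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local stand-in for the SDK's LogLevel ordering (lower value = more severe).
public enum Level { Critical = 1, Error = 2, Warning = 3, Information = 4, Verbose = 5 }

// Hypothetical simulation: buffer entries locally, transfer on a fixed period,
// and only ship entries at or above the configured filter level.
public sealed class LocalBuffer
{
    private readonly List<(DateTime When, Level Severity, string Message)> _pending =
        new List<(DateTime When, Level Severity, string Message)>();
    private readonly Level _transferFilter;
    private readonly TimeSpan _transferPeriod;
    private DateTime _lastTransfer;

    public LocalBuffer(Level transferFilter, TimeSpan transferPeriod, DateTime startUtc)
    {
        _transferFilter = transferFilter;
        _transferPeriod = transferPeriod;
        _lastTransfer = startUtc;
    }

    // Entries are held locally until the next scheduled transfer.
    public void Write(DateTime whenUtc, Level level, string message) =>
        _pending.Add((whenUtc, level, message));

    // Returns the batch to ship once a full transfer period has elapsed,
    // or an empty list when no transfer is due yet.
    public List<string> TransferIfDue(DateTime nowUtc)
    {
        if (nowUtc - _lastTransfer < _transferPeriod)
            return new List<string>();

        var batch = _pending
            .Where(e => e.Severity <= _transferFilter)
            .Select(e => $"{e.When:o} {e.Severity}: {e.Message}")
            .ToList();

        // Simplification: entries below the filter are dropped from this buffer.
        _pending.Clear();
        _lastTransfer = nowUtc;
        return batch;
    }
}
```

With the filter at Information, a Verbose entry never reaches storage but an Error entry does. The filter is the knob that trades storage volume against detail.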

Page 22: Imperative

Deployment ID

Page 23: Declarative Configuration Using Visual Studio

demo

Page 24: There’s a Precedence

1. wad-control-container
   a. Created for each role instance
2. Imperative code
   a. RoleInstanceDiagnosticManager.SetCurrentConfiguration() – updates the instance’s diagnostics.wadcfg only
   b. DiagnosticMonitor.Start() – impacts the current instance only; will not update diagnostics.wadcfg
3. Declarative configuration
   a. Root of worker role or bin of web role
4. Default configuration
   a. Last resort
   b. Collects, but doesn’t transfer to Azure storage
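The precedence list reads as "the first configuration source that exists wins." The C# sketch below models that rule in the order shown on this slide. `ConfigSource` and `DiagnosticsPrecedence` are hypothetical names used for illustration. They are not Azure SDK types, and the agent's real lookup is internal to it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the precedence order; a lower value means higher precedence.
public enum ConfigSource
{
    WadControlContainer = 1, // configuration stored in wad-control-container
    ImperativeCode = 2,      // SetCurrentConfiguration() or DiagnosticMonitor.Start() in code
    DeclarativeFile = 3,     // diagnostics.wadcfg packaged with the role
    Default = 4              // built-in defaults: collects, but does not transfer
}

public static class DiagnosticsPrecedence
{
    // Picks the highest-precedence source among those present.
    // Default is always available, so it is the last resort.
    public static ConfigSource ResolveEffectiveSource(IEnumerable<ConfigSource> available)
    {
        if (available == null) throw new ArgumentNullException(nameof(available));
        return available
            .Concat(new[] { ConfigSource.Default })
            .OrderBy(source => (int)source)
            .First();
    }
}
```

Consider a role that ships diagnostics.wadcfg and also has a configuration in wad-control-container. Under this ordering it resolves to WadControlContainer, so editing only the .wadcfg file may appear to change nothing.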

Page 25: Update Diagnostic Configuration

o Deployment update
  o Change configuration and redeploy package
o Remotely
  o Visual Studio
  o API
  o Cerebrata Azure Management Studio

Page 26: On-Demand Transfer

Instruct WAD to transfer specific data sources to storage
o Specify which data sources
o Specify time range to transfer
o Specify a notification queue
o Code or API (or tool)

Overwrites current diagnostic configuration. Use sparingly . . . with caution.

More info: http://msdn.microsoft.com/en-us/library/gg433075.aspx

Page 27: Bonus: Verbose Logging

Additional host-level data – not DiagnosticAgent.exe

o WAD*deploymentID*PT*aggregation_interval*[R|RI]Table
o Aggregation at 5-minute, 1-hour, and 12-hour intervals
o 10-day retention period
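To find the verbose-logging tables for a deployment, you can build their names from the pattern above. The sketch below rests on two assumptions, so check both against the table names in your own storage account. First, the aggregation interval follows the literal PT prefix as an ISO 8601 duration fragment (5M, 1H, 12H). Second, R and RI stand for per-role and per-role-instance aggregates. `VerboseMetricsTables` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

public static class VerboseMetricsTables
{
    // Assumption: 5-minute, 1-hour and 12-hour aggregations as ISO 8601 duration fragments.
    public static readonly IReadOnlyList<string> Intervals = new[] { "5M", "1H", "12H" };

    // Builds WAD<deploymentId>PT<interval><R|RI>Table.
    // Assumption: "R" = per-role aggregate, "RI" = per-role-instance aggregate.
    public static string TableName(string deploymentId, string interval, bool perInstance)
    {
        if (string.IsNullOrWhiteSpace(deploymentId))
            throw new ArgumentException("A deployment ID is required.", nameof(deploymentId));
        return "WAD" + deploymentId + "PT" + interval + (perInstance ? "RI" : "R") + "Table";
    }

    // Every name the pattern can produce for one deployment (3 intervals x 2 scopes).
    public static List<string> AllTableNames(string deploymentId)
    {
        var names = new List<string>();
        foreach (var interval in Intervals)
        {
            names.Add(TableName(deploymentId, interval, perInstance: false));
            names.Add(TableName(deploymentId, interval, perInstance: true));
        }
        return names;
    }
}
```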

Page 28: Let’s Get Real

o Sample every 1 minute*
o Transfer every 5 minutes*
o Transfer only what is needed
o Azure Diagnostics writes data in 60-second-wide partitions
o Too much data could overwhelm the partition

* Don’t take my word for it. You don’t know me. Test and validate for your situation.
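The 60-second partitions also shape how you query the WAD tables. A commonly used convention is that PartitionKey is the UTC DateTime.Ticks of the row's minute, zero-padded to 19 digits (which, for current dates, looks like "0" followed by the ticks). Treat that as an assumption and verify it against your own tables. Under that convention, a time window becomes a PartitionKey range. The sketch only builds the filter string and estimates rows per partition. It does not call the storage service, and `WadPartitionKeys` is a hypothetical helper.

```csharp
using System;
using System.Globalization;

public static class WadPartitionKeys
{
    // Assumption: PartitionKey = UTC ticks truncated to the minute, padded to 19 digits.
    public static string ForMinute(DateTime utc)
    {
        if (utc.Kind != DateTimeKind.Utc)
            throw new ArgumentException("Expected a UTC timestamp.", nameof(utc));
        long minuteTicks = utc.Ticks - (utc.Ticks % TimeSpan.TicksPerMinute);
        return minuteTicks.ToString("D19", CultureInfo.InvariantCulture);
    }

    // Table query filter covering whole minutes from fromUtc's minute
    // up to, but not including, toUtc's minute.
    public static string RangeFilter(DateTime fromUtc, DateTime toUtc) =>
        $"PartitionKey ge '{ForMinute(fromUtc)}' and PartitionKey lt '{ForMinute(toUtc)}'";

    // Rough number of rows landing in one 60-second partition.
    public static double RowsPerPartition(int instances, int counters, TimeSpan sampleRate)
    {
        if (sampleRate <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(sampleRate));
        return instances * counters * (60.0 / sampleRate.TotalSeconds);
    }
}
```

As a worked example, 4 instances with 10 counters each, sampled every 30 seconds, give 4 × 10 × 2 = 80 rows per minute partition. A 5-minute transfer therefore carries about 400 rows. Scaling out multiplies this number, which is why sampling less often and transferring only what is needed matter.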

Page 29: Query Azure Diagnostic Data

demo

Page 30: Set Priorities

o Two separate channels for telemetry data
  o Vital information
    o Application or service failures. Higher level of alerting.
    o Fix and return to “normal” as soon as possible
    o Alert now – email, SMS, dashboard, ninjas from ceiling, etc.
  o Day-to-day operational data
    o Root cause analysis
    o How to prevent in the future
    o Azure diagnostics
o Fine-tune the alerts – reduce false alarms and noise
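The two channels can be sketched as a router. Every event is kept as day-to-day operational data, and only events at or above a threshold also raise an alert right away. This is a minimal illustration, assuming a local `Severity` enum and an alert callback that stands in for email, SMS or a dashboard. `TelemetryEvent` and `TelemetryRouter` are hypothetical types, not part of Azure diagnostics.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical severity scale for this sketch (higher = more severe).
public enum Severity { Verbose, Information, Warning, Error, Critical }

public sealed class TelemetryEvent
{
    public TelemetryEvent(string source, Severity severity, string message)
    {
        Source = source;
        Severity = severity;
        Message = message;
    }

    public string Source { get; }
    public Severity Severity { get; }
    public string Message { get; }
}

public sealed class TelemetryRouter
{
    private readonly Severity _alertThreshold;
    private readonly Action<TelemetryEvent> _alert; // hypothetical hook: email, SMS, dashboard...

    // Operational channel: everything, kept for root cause analysis.
    public List<TelemetryEvent> OperationalLog { get; } = new List<TelemetryEvent>();

    public TelemetryRouter(Severity alertThreshold, Action<TelemetryEvent> alert)
    {
        _alertThreshold = alertThreshold;
        _alert = alert ?? throw new ArgumentNullException(nameof(alert));
    }

    public void Record(TelemetryEvent e)
    {
        OperationalLog.Add(e);
        // Vital channel: alert now when the event is severe enough.
        if (e.Severity >= _alertThreshold)
            _alert(e);
    }
}
```

Raising the threshold is one way to fine-tune the alerts and cut noise, because events below it still land in the operational log.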

Page 31: Define Key Metrics

o Compute node resource usage
o Windows Event logs
o Database query response times
o Application-specific exceptions
o Database connection & command failures
o Microsoft Azure Storage Analytics

The process for Azure-hosted solutions is not that different from the process for traditional, on-premises solutions.

Page 32: Considerations

o Log all calls to external services. You may need the data to challenge an SLA.
o Log details of transient faults
o Partition telemetry data by date (or hour) to reduce the impact of data aggregation or reporting
o Use a different storage account for diagnostic data!
o Remove old / non-relevant telemetry data
o Apply to development, test, and QA versions to validate performance and ensure telemetry systems are operating correctly
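For telemetry stores you manage yourself, partitioning by hour and removing old data fit together naturally. If partition keys sort chronologically, retention becomes a simple comparison against a cutoff key. The sketch below assumes a "yyyyMMddHH" key format, chosen only so that ordinal string order matches time order. `TelemetryPartitioning` is a hypothetical helper, and the WAD tables use their own keys.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class TelemetryPartitioning
{
    // Hour-wide partition key; "yyyyMMddHH" sorts chronologically as a string.
    public static string HourPartition(DateTime utc)
    {
        if (utc.Kind != DateTimeKind.Utc)
            throw new ArgumentException("Expected a UTC timestamp.", nameof(utc));
        return utc.ToString("yyyyMMddHH", CultureInfo.InvariantCulture);
    }

    // Partitions entirely older than the retention window: candidates for deletion.
    public static List<string> ExpiredPartitions(IEnumerable<string> partitions, DateTime nowUtc, TimeSpan retention)
    {
        string cutoff = HourPartition(nowUtc - retention);
        return partitions
            .Where(p => string.CompareOrdinal(p, cutoff) < 0)
            .OrderBy(p => p, StringComparer.Ordinal)
            .ToList();
    }
}
```

Deleting a whole partition at a time is usually cheaper than filtering individual rows, which is one reason to pick the partition size with retention in mind.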

Page 33: Considerations (cont.)

o Bring Azure diagnostic data into a relational database
  o Easier reporting
  o Periodically fetch from the Azure table and insert into a SQL Database table. Use the PK and keep the most recent.
  o Custom code
o Supplement Azure diagnostics with other tools
  o New Relic or AppDynamics
  o Cerebrata Azure Management Studio
  o AzureWatch (Paraleap)
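The "use the PK and keep the most recent" rule is an upsert keyed on (PartitionKey, RowKey) that only accepts a newer Timestamp. In SQL Database this would typically be a MERGE statement. The sketch keeps the rule in memory so it runs on its own. `DiagnosticRow` and `ReportingStore` are hypothetical types standing in for the fetched table rows and the SQL table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of a row copied out of a WAD table.
public sealed class DiagnosticRow
{
    public DiagnosticRow(string partitionKey, string rowKey, DateTimeOffset timestamp, string payload)
    {
        PartitionKey = partitionKey;
        RowKey = rowKey;
        Timestamp = timestamp;
        Payload = payload;
    }

    public string PartitionKey { get; }
    public string RowKey { get; }
    public DateTimeOffset Timestamp { get; }
    public string Payload { get; }
}

// In-memory stand-in for the reporting table, keyed by (PartitionKey, RowKey).
public sealed class ReportingStore
{
    private readonly Dictionary<(string, string), DiagnosticRow> _rows =
        new Dictionary<(string, string), DiagnosticRow>();

    public int Count => _rows.Count;

    // Inserts new keys; replaces an existing row only if the incoming one is more recent.
    public void Upsert(IEnumerable<DiagnosticRow> fetched)
    {
        foreach (var row in fetched)
        {
            var key = (row.PartitionKey, row.RowKey);
            if (!_rows.TryGetValue(key, out var existing) || row.Timestamp > existing.Timestamp)
                _rows[key] = row;
        }
    }

    public DiagnosticRow Get(string partitionKey, string rowKey) =>
        _rows.TryGetValue((partitionKey, rowKey), out var row) ? row : null;
}
```

Because a stale copy never overwrites a newer one, the periodic fetch can safely overlap the previous window.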

Page 34: Summary

o Instrumentation and telemetry are key to successful projects
o Cloud metrics are similar to metrics for traditional applications
o Be realistic and set priorities
o 3rd party tools can be essential for troubleshooting

Page 35: Resources

o Diagnostics Configuration Order of Precedence – http://bit.ly/1eomek9
o Use the Azure Diagnostic Configuration File – http://bit.ly/1mVHN3u
o Cloud Service Fundamentals (wiki) – http://bit.ly/1k1YkjI
o Failsafe: Guidance for Resilient Cloud Architectures – http://bit.ly/Q33mkU
o Best Practices for the Design of Large-Scale Services on Windows Azure Cloud Services – http://bit.ly/1qp4omC

Page 36: Just Azure

o Multi-part series on Azure diagnostics
o Many other fantastic articles:
  o Azure storage queues
  o Cloud Services
  o Automated testing in Azure

www.JustAzure.com

Page 37: Questions?

Page 38: Thank You!