dmi314 determine storage performance limit validation of storage design hardware burn-in build-out...

Upload: ashlynn-bradley

Post on 01-Jan-2016


TRANSCRIPT

Page 1: DMI314 Determine storage performance limit Validation of storage design Hardware burn-in Build-out tests

Neil Johnson
Senior Consultant
Microsoft Services, UK

Jetstress Notes From the Field

DMI314


Determine storage performance limit
Validation of storage design
Hardware burn-in
Build-out tests

What is Jetstress For?


When Should I Use Jetstress?

ENVISION PLAN BUILD STABILISE DEPLOY

Use Jetstress during solution design to understand precisely how storage will behave

Use Jetstress during build out to check for build issues and hardware defects


Jetstress uses ESE.DLL to generate an Exchange workload

How Does it Work?

Extensible Storage Engine (ESE)

Storage Subsystem

Background Database Maintenance

Transactional I/O

Windows I/O Manager

Device Drivers

Jetstress Application

Auto tuning

Thread Dispatcher

Background Log Checksummer

Offline Log & Database Checksummer

Windows Operating System Hardware

Windows Performance Counters

Reporting and Verification

Performance Data


Test a disk subsystem throughput (Recommended)
  Easy to configure
  Database configuration manually set
  Workload controlled by thread count

Test an Exchange mailbox profile
  Uses Profile for configuration
  Database configuration manually set
  Workload controlled by thread count

Test Types


Test Modes

Performance Test “Strict” mode (<= 6 hour test)
  Average Database Read Latency: 20ms
  Average Log File Write Latency: 10ms
  Max Database Read Latency: 100ms (6 x Spikes)
  Max Log File Write Latency: 100ms (6 x Spikes)

Stress Test “Lenient” mode (> 6 hour test)
  Same Average Read/Write Latency as “Strict” mode
  Max Database Read Latency: 200ms (6 x Spikes)
  Max Log File Write Latency: 200ms (6 x Spikes)
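The thresholds on this slide can be written down as a small lookup. The following is a minimal sketch, not Jetstress's own logic: the function and field names are hypothetical, the comparison is assumed to be "fail when the measured value is above the limit", and the "6 x Spikes" qualifier from the slide is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyLimits:
    """Latency limits in milliseconds, taken from the Test Modes slide."""
    avg_db_read_ms: float
    avg_log_write_ms: float
    max_db_read_ms: float
    max_log_write_ms: float


LIMITS = {
    "strict": LatencyLimits(20, 10, 100, 100),   # tests of 6 hours or less
    "lenient": LatencyLimits(20, 10, 200, 200),  # tests longer than 6 hours
}


def mode_for_duration(hours):
    """Pick the mode from the test duration, as described on the slide."""
    return "strict" if hours <= 6 else "lenient"


def latency_failures(hours, avg_db_read_ms, avg_log_write_ms,
                     max_db_read_ms, max_log_write_ms):
    """Return the names of the limits that the measured values exceed.

    Assumption: a value fails only when it is strictly above its limit.
    """
    limits = LIMITS[mode_for_duration(hours)]
    measured = {
        "avg_db_read_ms": avg_db_read_ms,
        "avg_log_write_ms": avg_log_write_ms,
        "max_db_read_ms": max_db_read_ms,
        "max_log_write_ms": max_log_write_ms,
    }
    return [name for name, value in measured.items()
            if value > getattr(limits, name)]
```

With these assumptions, a 150ms database read spike fails a 2 hour run but not a 24 hour run, which is the practical difference between the two modes.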


What is it?
  Proving that the storage platform will perform adequately, even if a common failure scenario is experienced

What type of failures?
  Single spindle failure (RAID)
  Multi-Path failures
  Dual controller

What should I expect?
  The test should still pass*

Failure Mode Testing


Test Process

1 Installation
  Copy ESE Files
  Install Jetstress
  Configure XML

2 Initialisation
  Create Databases

3 Testing
  Set Thread count
  Run 2hr test
  Run 24hr test
  Run degraded
  Evaluate results

4 Cleanup
  Remove Jetstress
  Remove data
  Reboot


Installation / Configuration

Use latest version
  Copy ESE files (ESE.DLL, ESEPERF.HXX, ESEPERF.XML, ESEPERF.INI) into installation directory.

Jetstress treats everything as an Active database
  The test must account for every Active, Standby or Lagged database

Use “Test disk subsystem throughput”
  Easier to configure
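A quick pre-flight check can confirm that the ESE files listed on this slide were actually copied. This is a minimal sketch: the file names come from the slide, while the function name and the directory you pass in are hypothetical, and file names are compared case-insensitively.

```python
from pathlib import Path

# File names exactly as listed on the Installation / Configuration slide.
REQUIRED_ESE_FILES = ("ESE.DLL", "ESEPERF.HXX", "ESEPERF.XML", "ESEPERF.INI")


def missing_ese_files(install_dir):
    """Return the required ESE files that are not present in install_dir."""
    present = {p.name.upper() for p in Path(install_dir).iterdir() if p.is_file()}
    return [name for name in REQUIRED_ESE_FILES if name not in present]
```

An empty list means every file named on the slide is in place. It says nothing about whether the files are the right version for the Exchange build under test.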


Initialisation

Initialisation takes roughly 24hrs per 10TB of Data on SAN*
Try to arrange your testing schedule to kick this off over a weekend
Jetstress generates one database and then copies the rest in parallel
With JBOD, the more disk spindles you have, the faster the copy process will be
Copy throughput can be very high on DAS (950MB/sec; ~70TB in 24Hrs)

DATA (TB)        1     2     5    10     50    100
TIME (Hours)   2.4   4.8  12.0  24.1  120.3  240.6
TIME (Days)    0.1   0.2   0.5   1.0    5.0   10.0
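The table above scales linearly, so the same estimate can be produced for any data size. A minimal sketch, assuming the SAN rate implied by the table (240.6 hours for 100TB); the function name is hypothetical and real hardware will differ.

```python
# Rate implied by the slide's table: 240.6 hours for 100 TB on SAN.
HOURS_PER_TB = 240.6 / 100


def estimate_init_hours(data_tb, hours_per_tb=HOURS_PER_TB):
    """Estimate database initialisation time in hours for data_tb terabytes."""
    if data_tb < 0:
        raise ValueError("data_tb must not be negative")
    return data_tb * hours_per_tb


if __name__ == "__main__":
    for tb in (1, 2, 5, 10, 50, 100):
        hours = estimate_init_hours(tb)
        print(f"{tb:>4} TB  {hours:6.1f} hours  {hours / 24:5.1f} days")
```

Pass a different `hours_per_tb` for faster storage such as the DAS case mentioned on the slide.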


Testing

Set thread count
  Start low and work up until the test fails
  Use short test duration (0.5 = 30 minutes) to set thread count
  Jetstress generates roughly 30 Random IOPS/thread

Perform 2hr test
Perform 24hr test
Perform degraded mode 2hr test (If appropriate)
  RAID array rebuilding
  Degraded IO paths
  Degraded storage controllers
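The thread count procedure on this slide can be sketched as a rough estimate plus a ramp-up loop. This is an illustration only: `threads_for_target_iops` and `find_max_passing_threads` are hypothetical helpers, the 30 IOPS/thread figure is the slide's rule of thumb, and `run_short_test` stands in for whatever you use to launch a 30 minute Jetstress run and read back pass or fail.

```python
import math

IOPS_PER_THREAD = 30  # rough figure quoted on the slide


def threads_for_target_iops(target_iops, iops_per_thread=IOPS_PER_THREAD):
    """Rough number of threads needed to generate target_iops."""
    return max(1, math.ceil(target_iops / iops_per_thread))


def find_max_passing_threads(run_short_test, start=1, step=1, limit=64):
    """Start low and work up until the short test fails.

    run_short_test(thread_count) must return True on pass, False on fail.
    Returns the highest thread count that passed, or None if none passed.
    """
    last_pass = None
    for threads in range(start, limit + 1, step):
        if not run_short_test(threads):
            break
        last_pass = threads
    return last_pass
```

The estimate is useful for judging whether the highest passing thread count is in the range the design needs before committing to the 2hr and 24hr runs.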


Report Walkthrough
The following data is from a real customer test
Thanks Boris

Walkthrough of a test report


2HR 24HR DEGRADED

Success Criteria

Meet or Exceed IOPS

Meet or Exceed Latency Recommendations

Complete Test Run Without Error or Corruption
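The three criteria can be applied to each run (2hr, 24hr, degraded) in the same way. A minimal sketch with hypothetical names and inputs; how you obtain the achieved IOPS, the latency verdict and the error count from a Jetstress report is left out.

```python
def run_passes(achieved_iops, target_iops, latency_ok, error_count):
    """Apply the three success criteria from the slide to one test run."""
    meets_iops = achieved_iops >= target_iops  # meet or exceed IOPS
    clean_run = error_count == 0               # no error or corruption
    return meets_iops and latency_ok and clean_run


def failed_runs(results, target_iops):
    """Return the names of the runs that miss any criterion.

    results maps a run name to (achieved_iops, latency_ok, error_count).
    """
    return [name for name, (iops, latency_ok, errors) in results.items()
            if not run_passes(iops, target_iops, latency_ok, errors)]
```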


Clean-up

Copy test data somewhere “safe”
Uninstall Jetstress
Remove test databases
  There are some scripts hidden in the field guide that can help
  Create-JetstressDataFolders.ps1
  Remove-JetstressDataFolders.ps1
  Both require Jetstress.XML file for parsing
Remove Jetstress installation folder
Reboot


Changes in Jetstress 2013

The Event log is captured and logged.
Errors are logged against the volume.
A single IO error anywhere will fail the test.
Detects -1018, -1019, -1021, -1022, -1119, hung IO, DbtimeTooNew, DbtimeTooOld.
Threads that generate IO are now controlled at a global level.
  This means Auto-Tuning should work again*

Cannot use Jetstress 2013 with Exchange 2010

Basically things are the same as Jetstress 2010 with some bugs fixed and better error handling.
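When reviewing logs by hand it helps to map the numeric codes above to their ESE names. The sketch below is an assumption-laden illustration: the symbolic names are the ESE error constants as best understood here and should be checked against the ESE headers, the helper name is hypothetical, and the log text it scans is free-form rather than a documented Jetstress format.

```python
import re

# Numeric codes from the slide; names are assumed from the ESE error list.
ESE_ERRORS = {
    -1018: "JET_errReadVerifyFailure",
    -1019: "JET_errPageNotInitialized",
    -1021: "JET_errDiskReadVerificationFailure",
    -1022: "JET_errDiskIO",
    -1119: "JET_errReadLostFlushVerifyFailure",
}

# Match one of the known codes when it is not part of a longer number.
_CODE_PATTERN = re.compile(r"(?<!\d)-(1018|1019|1021|1022|1119)(?!\d)")


def find_ese_errors(text):
    """Return (code, name) pairs for each known ESE error code found in text."""
    codes = [-int(match.group(1)) for match in _CODE_PATTERN.finditer(text)]
    return [(code, ESE_ERRORS[code]) for code in codes]
```

Since a single IO error fails the test in Jetstress 2013, any hit from a scan like this is worth investigating at the volume it was logged against.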


Jetstress – Known Issues
CU1 DLLs and more than 38 DBs

Server stack trace:
   at Microsoft.Exchange.Jetstress.Performance.PerfLog.AddCounterWildcard(String wildcardPath)
   at System.Collections.Generic.List`1.ForEach(Action`1 action)
   at Microsoft.Exchange.Jetstress.Performance.PerfLog..ctor(String fileName, Boolean binaryLog, Boolean includeJetDatabase, Int32 millisecInterval, TextWriter outWriter, TextWriter errorWriter)
   at Microsoft.Exchange.Jetstress.Core.StressEngine.ExecuteTest()
   at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Object[]& outArgs)
   at System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(IMessage msg, IMessageSink replySink)
Exception rethrown at [0]:
   at System.Runtime.Remoting.Proxies.RealProxy.EndInvokeHelper(Message reqMsg, Boolean bProxyCase)
   at System.Runtime.Remoting.Proxies.RemotingProxy.Invoke(Object NotUsed, MessageData& msgData)
   at System.Threading.ThreadStart.EndInvoke(IAsyncResult result)
   at Microsoft.Exchange.Jetstress.Core.StressEngine.EndExecuteTest()
   at Microsoft.Exchange.Jetstress.MainConsole.Main(String[] args)

Use the SP1 ESE.DLL to work around this issue.


Jetstress – Known Issues
Faulty Logical Disk Performance Counters…

22.10.2013 17:18:00 -- Microsoft Exchange Jetstress 2013 Core Engine (version: 15.00.0775.000) detected.
22.10.2013 17:18:00 -- Windows Server 2012 Standard  (6.2.9200.0) detected.
22.10.2013 17:18:00 -- Microsoft Exchange Server Database Storage Engine (version: 15.00.0712.008) was detected.
22.10.2013 17:18:00 -- Microsoft Exchange Server Database Storage Engine Performance Library (version: 15.00.0712.008) was detected.
22.10.2013 17:18:58 -- Jetstress testing begins ...
22.10.2013 17:18:58 -- Preparing for testing ...
22.10.2013 17:18:59 -- Attaching databases ...
22.10.2013 17:18:59 -- Preparations for testing are complete.
22.10.2013 17:18:59 -- Jetstress testing failed. Error: Jetstress found the following faulty logical disk performance counters: C:\ExchangeDatabases\DAG01-MDB-1. Ensure that all logical disk performance counters are working correctly with System Monitor.

Error: Instance 'C:\ExchangeDatabases\DAG01-MDB-1' does not exist in the specified Category.


Related Sessions

How to uncover the secrets of Disk Latency
Session: MNG.302
Date: Wednesday
Time: 4:45 PM - 6:00 PM
Room: MR 19ab


Q&A


© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.