Page 1: Capturing comprehensive storage workload traces in windows

Capturing Comprehensive Storage Workload Traces in Windows

Dr. Bruce Worthington

Windows Server Performance

Microsoft Corporation

Page 2: Capturing comprehensive storage workload traces in windows

My Motivation
• I’m tired of seeing storage research and performance analysis limited by real-world trace availability.
  – It’s not all that much better now than it was when I was in grad school in the early ’90s…
• I’ve been saying I would help supply researchers with long-term real-world traces and post-processing tools for almost a decade.
  – I’m finally following through on the promise to supply production traces.
  – More importantly, Microsoft has made it easy for anyone to capture detailed storage workload traces on Windows systems (along with many other types of traces and profiles).

Page 3: Capturing comprehensive storage workload traces in windows


• The tools
  – Event Tracing for Windows (ETW)
  – New: xperf, xperfinfo
  – Old: logman, trace*
• The traces
  – Benchmark (steady-state) workloads
  – Production workloads
• The challenge

Page 4: Capturing comprehensive storage workload traces in windows

Event Tracing for Windows (ETW)
• ETW has been the core Windows tracing component since Windows 2000 and is continually improved
• Many Windows components, including the kernel, produce events describing their behavior
• Events from user-mode applications and kernel-mode drivers can be logged
• High performance, low overhead, highly scalable
  – Efficient buffering and non-blocking logging mechanisms using per-processor buffers written to disk by a separate thread
• Tracing can be enabled/disabled dynamically without requiring system reboots or application restarts

Page 5: Capturing comprehensive storage workload traces in windows

Event Tracing for Windows (ETW)
• Events can be sample-based, but most are single-instance (event A occurred at time T)
• Support for real-time consumption and file-based tracing
• Configurable logging mode, buffer size, buffer count
  – Sequential traces
  – Circular traces
  – Circular traces in memory (flight black-box) [Vista]
• Adding custom events enables better correlation of application activity with low-level resource usage
• On a standard Vista computer, the logging API (EventWrite) takes about 5,000 cycles, mostly spent acquiring the timestamp via QueryPerformanceCounter (QPC)
  – About 2.5% processor overhead for a sustained rate of 10,000 events/second on a 2GHz processor (not including the cost of flushing trace buffers to disk)
• Postprocessing the binary disk log correlates events with context and domain-specific knowledge
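The 2.5% overhead figure follows directly from the cycle count, event rate, and clock speed quoted on this slide. A minimal Python sketch of that arithmetic (the function name is illustrative and is not part of any ETW API):

```python
def logging_cpu_overhead(cycles_per_event: float,
                         events_per_second: float,
                         cpu_hz: float) -> float:
    """Fraction of one processor spent inside the logging call."""
    return cycles_per_event * events_per_second / cpu_hz


# Numbers from the slide: ~5,000 cycles per EventWrite, 10,000 events/s, 2 GHz.
overhead = logging_cpu_overhead(5_000, 10_000, 2e9)
print(f"{overhead:.1%}")  # 2.5%
```

The estimate scales linearly, so doubling the event rate or halving the clock speed doubles the overhead. Like the slide, it ignores the cost of flushing buffers to disk.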

Page 6: Capturing comprehensive storage workload traces in windows

ETW Architecture
• Provider
  – Provides event traces. Can be a user-mode app, a kernel-mode driver, or the kernel.
  – Providers use ETW APIs to register with the ETW framework to send event traces from various points in the code.
  – When enabled, the provider sends event traces to a specific trace session designated by the controller.
• Controller
  – Assists in starting, stopping, or updating trace sessions in the kernel as well as enabling or disabling providers
  – Sets trace session properties such as sequential or circular file logging or direct delivery to consumers
• Consumer
  – Reads trace files or listens to active trace sessions and processes logged events
  – Not aware of the providers
  – Only receives event traces from the trace sessions or log files
• Event Trace Session infrastructure
  – Brokers the event traces from provider(s) to consumer(s) and adds data to each event (e.g., TimeStamp, Thread, Process, CPU)
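The division of labor among provider, controller, session, and consumer can be illustrated with a toy model. The sketch below is plain Python with made-up class names; it does not call or imitate the real ETW APIs, and it stamps events with only a timestamp where ETW adds thread, process, and CPU as well.

```python
import time


class Session:
    """Toy trace session: brokers events and stamps each one."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def log(self, payload):
        # The session, not the provider, adds per-event data (only a timestamp here).
        self.events.append({"timestamp": time.perf_counter(), "payload": payload})


class Provider:
    def __init__(self, name):
        self.name = name
        self.session = None  # no session until a controller enables this provider

    def write(self, payload):
        if self.session is not None:  # a disabled provider drops its events
            self.session.log(payload)


class Controller:
    def enable(self, provider, session):
        provider.session = session

    def disable(self, provider):
        provider.session = None


def consume(session):
    # A consumer sees only what the session holds; it never touches a Provider.
    return [event["payload"] for event in session.events]


session = Session("demo")
provider = Provider("toy-disk-driver")
controller = Controller()

provider.write("dropped: provider not enabled yet")
controller.enable(provider, session)
provider.write("disk read")
controller.disable(provider)
provider.write("dropped: provider disabled")

print(consume(session))  # ['disk read']
```

Only the event written while the provider was enabled reaches the consumer, mirroring the dynamic enable/disable behavior described earlier.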

Page 7: Capturing comprehensive storage workload traces in windows

“NT Kernel Provider”
• Events
  – Process, thread and image
  – Sampled Profile
  – Context Switch
  – Dispatcher (Ready Thread)
  – DPC (Deferred Procedure Call)
  – ISR (Interrupt Service Routine)
  – Disk I/O
  – File I/O
  – Registry
  – Hardfault
  – Pagefault
  – Driver delay
  – TCP/UDP
  – Power
  – ALPC
  – Virtual Allocation
  – Heap
  – Memory
  – …
• Related providers
  – Thread Pool
  – Power Transition
  – Winlogon
  – Services
  – Prefetch
• Other providers
  – Shell
  – Internet Explorer
  – Media Foundation
  – Media Center
  – …

Page 8: Capturing comprehensive storage workload traces in windows

System Config: ETW Instrumentation
• Automatically added to kernel traces
• Rundown of system configuration at trace start/stop time
  – CPU (number of logical and physical processors, frequency)
  – Memory (memory size, page size, allocation granularity)
  – Disk (physical disks, partitions, volumes)
  – Video adapters
  – Network adapters (IPv4, IPv6)
  – Services (including service tag)
  – Plug-and-Play Information
  – IRQ Assignment
  – Power capabilities (S1 - S5)
  – Network Identity (computer name, domain name)
  – Group Masks (what kernel flags are enabled)

Page 9: Capturing comprehensive storage workload traces in windows

Storage-Related Instrumentation
• Disk Events:
  – Read, Write, Flush Initiation/Completion
• File Events:
  – Filename Create, Delete, Rundown (when trace stopped)
  – File I/O Initiation, Hard Fault
    • Create, Cleanup, Close, Flush, Read, Write, Set Info, Query Info, FSCTL, Delete, Rename, Directory Enumeration, Directory Notification
  – File I/O Completion
• Driver Events:
  – Driver Call, Return (Major Function)
  – Driver Complete Request, Complete Request Return
  – Driver Completion Routine
• Binary storage-related event sizes range from ~30-80 bytes (not counting events that dump unique filenames)

Page 10: Capturing comprehensive storage workload traces in windows

The new tools: xperf & xperfinfo
• Extensible performance analysis toolset
• High-level control and decoding of ETW traces
  – Emphasis on kernel events and system-wide resource usage
  – Support for 3rd-party events, primarily in conjunction with kernel events
• Cross-platform
  – Windows XP SP1+, Vista
  – Windows Server 2003, Windows Server 2008
• Cross-architecture (x86, x64, ia64)
• Capture-anywhere, process-anywhere

Page 11: Capturing comprehensive storage workload traces in windows

xperf
• Detailed interactive analysis of performance traces
• High-level resource usage graphs on common trace timeline with zoom capability
• Low-level discrete graphs for resource state transitions
  – Individual context switch and disk I/O events
• Powerful interactive summary tables with dynamic grouping, sorting and aggregation capability

Page 12: Capturing comprehensive storage workload traces in windows

Currently Available Graphs and Summary Tables

– Disk I/O Counts
– Disk I/O Detail
– Disk Utilization
– File I/O
– DPC (Deferred Procedure Call)
– ISR (Interrupt Service Routine)
– Hardfault
– Pagefault
– Driver Delay
– Sample Profile
– CPU Availability
– CPU Scheduling
– Process Lifetime
– Registry counts
– Services
– Plug ’n’ Play
– Marks
– Generic
– …

Overview of xperf

Page 13: Capturing comprehensive storage workload traces in windows

Storage Activity Notes
• Disk Reads, Writes and Flushes
  – Vista introduced low-priority I/Os, which are deferred in a special queue to allow current and near-term future normal-priority I/Os to complete
  – Flushes may be “completed” by low-level storage drivers under certain conditions
• Hard Faults
  – Synchronous I/Os that block execution of the issuing thread
  – Page-ins from disk of pages not currently present in memory
• Communication aspect: System read-ahead and write-back
  – Asynchronous just-in-time prefetch for sequentially read buffered files
  – Asynchronous buffered writes
  – Issued from “System (4)” process
• xperf & xperfinfo infer disk “queue” parameters (wait time, service time, queue depths, skip behavior) assuming a single serialized spindle
  – Not a valid assumption for disk arrays
  – Queue depth can be thought of as “number of requests in flight”
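The “number of requests in flight” reading of queue depth can be computed from initiation/completion timestamp pairs. A minimal sketch, assuming requests have already been parsed into (init_time, complete_time) tuples, which is a hypothetical format; this is not how xperf or xperfinfo derive their numbers.

```python
import heapq


def queue_depth_at_issue(requests):
    """For each request, in issue order, count the earlier requests still in flight.

    requests: iterable of (init_time, complete_time) tuples for one disk.
    """
    pending = []  # min-heap of completion times for requests still outstanding
    depths = []
    for init, complete in sorted(requests):
        # Retire everything that completed at or before this request was issued.
        while pending and pending[0] <= init:
            heapq.heappop(pending)
        depths.append(len(pending))
        heapq.heappush(pending, complete)
    return depths


sample = [(0.0, 5.0), (1.0, 2.0), (3.0, 6.0), (7.0, 8.0)]
print(queue_depth_at_issue(sample))  # [0, 1, 1, 0]
```

Counting in-flight requests makes no assumption about how many spindles sit behind the disk, so it remains meaningful for arrays where the serialized single-spindle model does not hold.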

Page 14: Capturing comprehensive storage workload traces in windows

xperf Demo

• System Config
• CPU
• DPC & ISR
• Process
• Disk Summary Table
• Disk I/O Detail
• Disk I/O Detail Summary
• Hard Faults

Page 15: Capturing comprehensive storage workload traces in windows

Sample xperf Screenshots

• Sidebar chart selection
• Selecting a time range
• CPU Usage Summary Table
• DPC and ISR CPU Usage Frames
• Disk I/O Summary Table
• Disk I/O Detail
• Disk I/O Detail Summary Table

Page 16: Capturing comprehensive storage workload traces in windows

Sidebar Frame Selector



Frame Scrollbar

Page 17: Capturing comprehensive storage workload traces in windows

Selecting a Time Range (on CPU Usage Frame)


Page 18: Capturing comprehensive storage workload traces in windows

Go to CPU Usage Summary Table

Context-Menu Summary Table


Page 19: Capturing comprehensive storage workload traces in windows

CPU Usage Summary Table

Status Bar Report

% of Time excluding DPC and ISR

% Total Time

Selected Time Interval

Close Summary Table


Page 20: Capturing comprehensive storage workload traces in windows

DPC and Interrupt CPU Usage Frames


Page 21: Capturing comprehensive storage workload traces in windows

Go to Disk I/O Summary Table

Summary Table


Page 22: Capturing comprehensive storage workload traces in windows

Disk I/O Summary Table


Individual I/Os


Page 23: Capturing comprehensive storage workload traces in windows

Go to Disk I/O Detail

Detail Graph


Page 24: Capturing comprehensive storage workload traces in windows

Disk I/O Detail (Disk #0)


Change Disk Select


Page 25: Capturing comprehensive storage workload traces in windows

Disk I/O Detail (Disk #1)


Page 26: Capturing comprehensive storage workload traces in windows

Disk I/O Detail (Disk #1)



Page 27: Capturing comprehensive storage workload traces in windows

Disk I/O Detail Summary Table

Default sort field


Page 28: Capturing comprehensive storage workload traces in windows


• High-level control and decoding
• Merging and dumping of ETW traces
• Many command line actions to analyze and report on various aspects of a trace
• Various buffering and log file options
• Multiple timer sources
• Traces of boot activity

Page 29: Capturing comprehensive storage workload traces in windows

xperfinfo Demo

• Start/stop
• Provider list
• Dump
• Postprocessing summaries

Page 30: Capturing comprehensive storage workload traces in windows

Taking a Kernel Trace

• Start kernel trace; run scenario; stop and merge

    C:\analysis> xperfinfo -on base+FILE_IO+INTERRUPT
    C:\analysis> MyTestApp.exe
    C:\analysis> xperfinfo -d trace.etl

• Start user trace; run scenario; stop

    C:\analysis> xperfinfo -start MySession -on Kerberos+MRxSmb -f kerberos.etl
    C:\analysis> MyTestApp.exe
    C:\analysis> xperfinfo -stop MySession

• Hint: Retrieve all known kernel flags and groups

    C:\analysis> xperfinfo -help providers

Overview of xperfinfo

Page 31: Capturing comprehensive storage workload traces in windows

Dumping a Trace


    C:\analysis> xperfinfo -i trace.etl -o trace.txt
    [1/2] 100.0%
    [2/2] 100.0%
    C:\analysis> notepad trace.txt

Page 32: Capturing comprehensive storage workload traces in windows

xperfinfo Named Providers
• DISK_IO: Disk I/O
• DISK_IO_INIT: Disk I/O initiation
• SPLIT_IO: Split I/O
• FILE_IO: File system op end times/results
• FILE_IO_INIT: Create/open/close/read/write
• FILENAME: Create/delete/rundown
• HARD_FAULTS: Hard page faults
• ALL_FAULTS: All page faults including hard, copy-on-write, demand-zero faults
• DPC: Deferred Procedure Calls
• INTERRUPT: Interrupts
• DRIVERS: Driver events
• PROC_THREAD: Create/delete
• CSWITCH: Context switch
• COMPACT_CSWITCH
• DISPATCHER: CPU Scheduler
• PREFETCH: Prefetching
• LOADER: Image load/unload
• SYSCALL: System calls
• PROFILE: CPU sample profile
• MEMORY: Memory tracing
• POOL: Memory pool tracing
• VIRT_ALLOC: Virtual alloc reserve and release
• NETWORKTRACE: TCP/UDP, send/rcv
• REGISTRY: Registry tracing
• POWER: Power management
• WORKER_THREAD: System worker thread
• PERF_COUNTER: Process perf counters
• ALPC: Advanced Local Procedure Call
• …

Page 33: Capturing comprehensive storage workload traces in windows

Available xperfinfo Reports (“actions”)

• tracestats
• sysconfig
• dumper
• diskio
• filename
• hardfault
• pagefault
• dpcisr
• process
• cswitch
• drvdelay
• marks
• perfctrs
• profile
• registry
• boot
• suspend
• shutdown
• …


Page 34: Capturing comprehensive storage workload traces in windows

The old tools
• Installed with Windows
  – Logman: Collects performance counters
  – Tracerpt: Processes ETW log files or real-time sessions
• Installed with Driver Development Kit
  – Tracelog: Starts, stops, or enables trace logging
  – Tracefmt: Dumps ETW binary files into text files
  – Traceview: Controls and displays ETW information



Page 35: Capturing comprehensive storage workload traces in windows

The traces: Benchmark Workloads

• Easy to capture and make available
  – TPC-C, TPC-E, TPC-H, TPC-DS?
  – SAP-SD
  – Terminal Server
  – NetBench
  – SPC?
  – …

Page 36: Capturing comprehensive storage workload traces in windows

Example: TPC-C Trace

• Windows Server 2008 / SQL Server 2005
• ~32 minutes
• 93.5 million disk I/Os (58.6M read, 35.9M write)
• 16-socket, dual-core 3.4 GHz Intel Xeon (16 MB L3 cache)
• 256 GB RAM
• 1106 15Krpm FCSCSI disks
  – 79 database LUNs

Page 37: Capturing comprehensive storage workload traces in windows

Example: TPC-C Request Sizes

• 94.8% 8KB requests; 2.1% 16KB requests
• Remaining requests: [chart not reproduced in this transcript]

Page 38: Capturing comprehensive storage workload traces in windows

Example: TPC-C Trace Locality
• ~3% of all writes are within 128 sectors of the previous write to the same disk
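A locality figure of this kind can be computed in one pass over the write stream. The sketch below assumes I/Os are available as time-ordered (disk, op, start_sector) tuples, a hypothetical format, and measures distance between starting sectors; the slide does not say exactly how distance was defined, so treat that choice as an assumption.

```python
def near_previous_write_fraction(ios, max_distance=128):
    """Fraction of writes starting within max_distance sectors of the
    previous write to the same disk.

    ios: time-ordered iterable of (disk, op, start_sector), op is "R" or "W".
    """
    last_write = {}  # disk -> starting sector of the most recent write
    near = total = 0
    for disk, op, sector in ios:
        if op != "W":
            continue
        total += 1
        previous = last_write.get(disk)
        if previous is not None and abs(sector - previous) <= max_distance:
            near += 1
        last_write[disk] = sector
    return near / total if total else 0.0


sample = [
    (0, "W", 1_000),
    (0, "R", 5_000),    # reads are ignored
    (0, "W", 1_064),    # 64 sectors from the previous write on disk 0
    (1, "W", 1_000),    # first write on disk 1, nothing to compare against
    (0, "W", 900_000),  # far from the previous write on disk 0
    (1, "W", 1_100),    # 100 sectors from the previous write on disk 1
]
print(near_previous_write_fraction(sample))  # 0.4
```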

Page 39: Capturing comprehensive storage workload traces in windows

Example: Terminal Server Knowledge Worker (TS-KW) Trace

• Windows Server 2008, Office 2007
  – Word, Excel, and Outlook activity
  – 180 concurrent users
• ~45 seconds
  – >750,000 Context Switches
• 16 thousand disk writes (in-memory working set)
  – Mostly sequential
• 2-socket, quad-core 2.66 GHz Intel Xeon (4 MB L2 cache)
• 32 GB RAM (holds working set)
• One ATA 120GB disk

Page 40: Capturing comprehensive storage workload traces in windows

Example: TS-KW Interarrival Times

Page 41: Capturing comprehensive storage workload traces in windows

The traces: Production Workloads
• First set of internal Microsoft targets:
  – SQL Server
  – Exchange
  – SharePoint
  – File Server
  – Web Server
  – Media Server
  – SAP
  – Active Directory
  – Security Server
  – Backup
  – Search
  – Office Desktops, Laptops, Tablets

Page 42: Capturing comprehensive storage workload traces in windows

Example: SQL Server Replica for RADIUS Authentication Data for RAS & Wireless (worldwide)

• Windows Server 2003 / SQL Server 2005
• Three sequential 1-hour traces
  – 126.9 thousand I/Os (16.3K read, 110.5K write)
  – 122.6 thousand I/Os (10.8K read, 111.8K write)
  – 101.6 thousand I/Os (5.9K read, 95.8K write)
  – Locality:
    • ~25% sequential requests
    • More than half of all read and write requests are within 100,000 sectors of the immediately previous request (to the same disk)
  – Mostly 512KB reads in first trace; no 512KB reads in other traces
• 4-socket, hyperthreaded 1.9 GHz Intel Xeon
• 8 GB RAM
• Dual-port Gb network card
• Five 4GB “disks” (configuration unknown)

Page 43: Capturing comprehensive storage workload traces in windows

xperf Demo

• Find region of 512KB reads in Trace 1
  – Identify file being read
  – Examine corresponding Disk I/O Detail
• Hard Fault Frame
• Hard Fault Summary Table
  – File + File Offset → Disk + Disk Offset (top-to-bottom disk I/O tracing)

Page 44: Capturing comprehensive storage workload traces in windows

Select Read-Heavy Region


Page 45: Capturing comprehensive storage workload traces in windows

Disk I/O Summary Table


One particular *.mdf file

Page 46: Capturing comprehensive storage workload traces in windows

Disk I/O Detail (Disk #0)


Page 47: Capturing comprehensive storage workload traces in windows

Disk I/O Detail (Disk #3)


Most of this activity is to the mdf file indicated in the Disk I/O Summary Table

Page 48: Capturing comprehensive storage workload traces in windows

Select Hard Fault Region


Page 49: Capturing comprehensive storage workload traces in windows

Hard Fault Summary Table





Page 50: Capturing comprehensive storage workload traces in windows

RADIUS SQL Server Replica, Trace 2: Disk Offset Distribution for Disk 0

Page 51: Capturing comprehensive storage workload traces in windows

RADIUS SQL Server Replica, Trace 2: Disk Offset Distributions

Page 52: Capturing comprehensive storage workload traces in windows

RADIUS SQL Server Replica, Trace 2: Interarrival Times (within each disk)
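Per-disk interarrival times like those charted on this slide can be derived from disk I/O initiation events. A hedged sketch, assuming the events have already been parsed into time-ordered (timestamp_seconds, disk) tuples; that tuple layout is hypothetical and is not the xperfinfo dump format.

```python
from collections import defaultdict


def interarrival_times_per_disk(ios):
    """Gaps between consecutive request initiations, tracked separately per disk.

    ios: time-ordered iterable of (timestamp_seconds, disk).
    """
    last_seen = {}             # disk -> timestamp of its most recent request
    gaps = defaultdict(list)   # disk -> list of interarrival times
    for timestamp, disk in ios:
        if disk in last_seen:
            gaps[disk].append(timestamp - last_seen[disk])
        last_seen[disk] = timestamp
    return dict(gaps)


sample = [(0.0, 0), (0.5, 1), (1.0, 0), (2.5, 1), (3.0, 0)]
print(interarrival_times_per_disk(sample))  # {0: [1.0, 2.0], 1: [2.0]}
```

Keeping a separate "last seen" timestamp for each disk is what makes the result "within each disk"; a single shared timestamp would instead give system-wide interarrival times.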

Page 53: Capturing comprehensive storage workload traces in windows

The traces: System Configurations

• From mobile devices to datacenter servers
  – Scale-out and scale-up environments
• 1-32 sockets
• 1-64 cores
• 1-1000 GB RAM
• NTFS, FAT, Raw
• ATA, SATA, SCSI, SAS, FC
• Solid state drives

Page 54: Capturing comprehensive storage workload traces in windows

The traces: Postprocessing
• Simple scripts and programs (e.g., perl and C#) will be available to:
  – “Sanitize” traces by replacing some or all file, directory, and process names with generic strings
  – Extract basic statistics from xperf dumps on an overall, R/W, per-disk, or per-size basis
    • Request sizes
    • Spatial distributions
    • Queue lengths
    • Interarrival times
    • …
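As an illustration of the sanitization step, the sketch below replaces directory and file names with generic strings while mapping the same name to the same replacement every time. It is not the promised Microsoft script: the class and function names are made up, absolute Windows paths are assumed, and keeping the file extension is my own choice so that file types remain recognizable.

```python
import ntpath


class NameMap:
    """Maps each distinct name to a stable generic string such as dir_0001."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.mapping = {}

    def __call__(self, name):
        key = name.lower()  # Windows names are treated as case-insensitive
        if key not in self.mapping:
            self.mapping[key] = f"{self.prefix}_{len(self.mapping) + 1:04d}"
        return self.mapping[key]


def sanitize_path(path, dir_names, file_names):
    """Replace every directory and file name in an absolute Windows path."""
    drive, rest = ntpath.splitdrive(path)
    *dirs, filename = [part for part in rest.split("\\") if part]
    stem, ext = ntpath.splitext(filename)
    clean = [dir_names(d) for d in dirs] + [file_names(stem) + ext.lower()]
    return drive + "\\" + "\\".join(clean)


dir_names = NameMap("dir")
file_names = NameMap("file")

first = sanitize_path(r"C:\Data\Sales\orders.mdf", dir_names, file_names)
second = sanitize_path(r"c:\data\Sales\ORDERS.MDF", dir_names, file_names)
print(first)   # C:\dir_0001\dir_0002\file_0001.mdf
print(second)  # c:\dir_0001\dir_0002\file_0001.mdf
```

Because the mapping is stable, accesses to the same file can still be correlated after sanitization, which is what most locality and reuse analyses need.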

Page 55: Capturing comprehensive storage workload traces in windows

The tools and traces: Availability
• xperf & xperfinfo will be released with the next Windows SDK (in conjunction with Windows Vista SP1 and Windows Server 2008)
• Benchmark traces will be provided to the SNIA IOTTA group in Sept ’07
  – Scripts for sanitization and basic stats analysis included
• Production traces will be provided as they are captured and sanitized, hopefully on a monthly basis for years to come
  – Captures are in progress on multiple Microsoft IT servers with varying workloads
  – Traces will be dumped in manageable chunks
• All tools and traces have standard Microsoft disclaimers
• Microsoft would like to thank Seagate for providing disk drives to store the internal Microsoft trace repository!!!

Page 56: Capturing comprehensive storage workload traces in windows

xperf/xperfinfo Future Enhancements

• Equivalent file block → disk block event correlation for write requests (a la hard fault reads)
  – Mapped file writes
  – Lazy writer
  – Dirty page writer
  – Unbuffered writes
  – In the meantime, write-after-reads and (in some cases) sequential writes can be translated
• Built-in process/file/directory sanitization
• Extensibility
• …and much more!

Page 57: Capturing comprehensive storage workload traces in windows

Summary
• Event Tracing for Windows (ETW) = the engine
  – Instrumentation built into the retail Windows operating system
  – The NT Kernel Provider provides coverage of kernel-level activity
• xperf = the interactive browser
  – High-level graphs
  – Summary tables
  – Individual event detail
• xperfinfo = the command line automation tool
  – ETW controller and decoder
  – Exports human-readable decoding of all trace events
  – Many custom actions distilling various aspects of the trace
• Alpha version of xperf/xperfinfo can be requested from:
  – [email protected]
  – [email protected]
• Additional Resources
  – Event Tracing for Windows on MSDN
  – “Windows Internals 4th edition” by Russinovich and Solomon

Page 58: Capturing comprehensive storage workload traces in windows

The challenge: Capture and Share Traces!

• Microsoft is committed to gathering long-term (weeks/months) traces on many production systems within the corporate IT environment
• Start the wheels rolling in your organization to allow similar traces to be captured, sanitized, and published; use existing tools to start with and xperf/xperfinfo when they become available
• Create and share post-processing tools, simulators, models, etc., via the SNIA IOTTA repository
• Provide feedback on xperf & xperfinfo
  – Be patient, as this is an engineering analysis tool (not an MS product) and is supported as such.

Page 59: Capturing comprehensive storage workload traces in windows

Q & A

Page 60: Capturing comprehensive storage workload traces in windows

Backup Slides

Page 61: Capturing comprehensive storage workload traces in windows

What is ETW used for?

• Debug application bugs including hangs, crashes, or unexpected behavior
• Diagnose performance problems
• Track computing resource consumption at application transaction level for capacity planning


Page 62: Capturing comprehensive storage workload traces in windows

ETW vs. Performance Counters

ETW
• Individual events described using multiple standard/custom attributes
• Each event requires a timestamp
• Each event requires additional space
• An ETW trace can be used to compute aggregations on any group of events (filtered by time or any attributes) at post-processing time
  – Various perspectives
• Can zoom down to individual events

Performance Counters
• Aggregate information about groups of events
• Each sample requires a timestamp
• Each sample requires additional space
• Very light, events aggregated in
• A sampled performance counter trace provides a bottom aggregation level
  – Information below the bottom aggregation level is lost
• Individual event information is lost
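The practical difference is that an event trace can be regrouped after the fact along any attribute, while a counter sample has already discarded everything below its aggregation level. A small sketch with made-up events (the field names and values are illustrative, not an ETW schema):

```python
from collections import Counter

# Hypothetical decoded disk events; a real trace would hold millions of these.
events = [
    {"time": 0.10, "process": "sqlservr.exe", "disk": 0, "op": "R", "bytes": 8192},
    {"time": 0.40, "process": "sqlservr.exe", "disk": 1, "op": "W", "bytes": 8192},
    {"time": 1.20, "process": "System",       "disk": 0, "op": "W", "bytes": 65536},
    {"time": 1.70, "process": "sqlservr.exe", "disk": 0, "op": "R", "bytes": 16384},
]


def bytes_by(events, key, keep=lambda event: True):
    """Total bytes grouped by any attribute, over any filtered subset of events."""
    totals = Counter()
    for event in events:
        if keep(event):
            totals[event[key]] += event["bytes"]
    return dict(totals)


print(bytes_by(events, "process"))
# {'sqlservr.exe': 32768, 'System': 65536}
print(bytes_by(events, "disk", keep=lambda e: e["op"] == "W"))
# {1: 8192, 0: 65536}
print(bytes_by(events, "op", keep=lambda e: e["time"] < 1.0))
# {'R': 8192, 'W': 8192}

# A counter sampled once over the same period would retain only a single total.
print(sum(event["bytes"] for event in events))  # 98304
```

Each grouping above is chosen at post-processing time; none of them could be recovered from the single sampled total.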
