TRANSCRIPT
Optimizing Applications For Remote File Access Over WAN
Mathew George, Sr. Software Engineer, Microsoft Corporation
ES23
Agenda
  Introduction and Motivation
  Understanding the problem
  Improvements to the platform
  Application guidelines
    Optimizing “throughput oriented” applications
    Optimizing “interactive” applications
    General considerations
  Summary
Introduction: Why care about file access over a WAN?
  Storage consolidation is moving storage away from the application
    Branch office servers being consolidated to the data center
    Data center to data center movement of data for disaster recovery and content distribution
    Cloud storage
  Trends driving storage consolidation
    WAN bandwidth is increasing and cost/bandwidth is decreasing
    Cost reductions, better resource utilization, centralized management, and better uptime
Introduction: What makes an application WAN unfriendly?
  WAN bandwidth is still very valuable – it costs money
  Chatty applications can lead to long end user wait times when running over a WAN
  A bad programming model can cause app hangs when running over a WAN
    Humans typically want a response time of less than 5 sec
    A hung application is as bad as a crashed application
Introduction: Why should I change my application?
  Apps often assume files are local, and hence that data and metadata access is fast
    This assumption is invalid over a WAN
  Improvements made to the platform (Windows Vista and Windows 7) could expose new app bottlenecks
    Changes to APIs require apps to make use of them
  What kind of gains can I expect?
    2-10x for throughput oriented applications
    30% traffic reduction and response time improvement for interactive applications
Understanding The Problem: Understanding the network parameters
  Bandwidth and round trip latency (RTT)
  Bandwidth delay product (BDP)
    Example: BDP of a 100 ms, 100 Mbps link is 1.25 MB
  Network utilization
    Max theoretical utilization is the amount of outstanding data divided by the BDP
    Example: App posting 64KB data gets ~ 5% utilization
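
The two examples on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function names are made up for this note, and 64KB is taken as 65,536 bytes.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth delay product in bytes: bits in flight during one round trip, divided by 8."""
    return bandwidth_bps * rtt_seconds / 8


def max_utilization(outstanding_bytes: float, bandwidth_bps: float, rtt_seconds: float) -> float:
    """Upper bound on link utilization: outstanding data divided by the BDP, capped at 1."""
    return min(1.0, outstanding_bytes / bdp_bytes(bandwidth_bps, rtt_seconds))


if __name__ == "__main__":
    # The slide's link: 100 Mbps bandwidth, 100 ms round trip.
    print(f"BDP: {bdp_bytes(100e6, 0.100) / 1e6:.2f} MB")
    print(f"64 KB outstanding: {max_utilization(64 * 1024, 100e6, 0.100):.1%}")
```

The second figure comes out slightly above 5%, which matches the "~ 5%" on the slide.
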
Continental Bandwidth (Mb/s)
           North America   Europe   Asia   South America
  Mean     14              33       26     5.5
  Std dev  28              43       28     7.2
  Min      1.3             0.9      3.5    0.5
  Max      155             155      92     25
Continental Latencies (ms)
           North America   Europe   Asia
  Mean     108             109      163
  Std dev  69              91       81
  Min      20              5        20
  Max      375             372      460
Connectivity from branch offices to nearest data center, Fall 2007
Understanding The Problem: The SMB file I/O stack
  SMB (CIFS) and SMB2 – our core file sharing protocols
    SMB/CIFS had limitations running on high BDP networks
    The SMB2 protocol introduced with Windows Vista is optimized for high BDP networks
  Clients have the ability to cache file data and metadata
    Cache manager and the SMB redirector manage the caching
  The Win32 API provides the application interface
    Synchronous versus overlapped I/O
    Cached (Buffered) versus non-cached
    Handle versus path based APIs
    I/O cancellation APIs
  [Diagram: the file I/O stack. SMB Client: Application, Win32/NT File I/O APIs, Client Cache, SMB Redirector, Network Stack. Network. File Server: Network Stack, SMB Server, Server Cache, Filesystem, Disk.]
Understanding The Problem: Copying a large file
  Copy a 20 MB file from local disk to a remote server over a 1 Gbps, 100 ms link (Windows Server 2003)
  Observations
    Operation takes 38 sec
    Link BDP = 1 Gbps * 100 ms = 100 Mb (~ 12 MB)
    Throughput is 20 MB / 38 sec = 4.2 Mbps
    Network utilization is 4.2 Mbps / 1 Gbps = 0.42%
  Analysis
    The CopyFile API posts a single 64K buffer at a time
    Max theoretical utilization is 64KB / 12MB = 0.53%
    Observed utilization is lower because of other overheads
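
One way to see why the copy takes tens of seconds: if only one 64K buffer is outstanding at a time, every buffer costs at least one round trip. The sketch below is a simplified model, not a description of how CopyFile is implemented. It ignores transmission time and server processing, takes 20 MB as 20 * 2^20 bytes, and uses illustrative function names.

```python
import math


def serial_copy_seconds(file_bytes: int, buffer_bytes: int, rtt_seconds: float) -> float:
    """Lower bound on copy time when only one buffer is outstanding per round trip."""
    round_trips = math.ceil(file_bytes / buffer_bytes)
    return round_trips * rtt_seconds


def throughput_mbps(megabytes: float, seconds: float) -> float:
    """Throughput in megabits per second for a transfer of the given size and duration."""
    return megabytes * 8 / seconds


if __name__ == "__main__":
    floor = serial_copy_seconds(20 * 1024 * 1024, 64 * 1024, 0.100)
    print(f"Round-trip floor: {floor:.0f} s for 320 buffers")
    print(f"Observed throughput: {throughput_mbps(20, 38):.1f} Mbps")
```

Under these assumptions the floor is about 32 seconds, so the observed 38 seconds is mostly round-trip waiting with some additional overhead on top.
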
Understanding The Problem: Opening a Word document
  Open a 300KB Office 2003 document across a simulated 100 Mbps, 100 ms link
  Observations
    About 23 seconds to open the file
    Approximately 1200 SMB frames seen on the wire, including traffic in both directions
    Same file opened and closed repeatedly
    Data read multiple times from the server
    Significant metadata traffic
  Analysis
    The bulk of the data transfer is caused by SMB losing the ability to cache data because of multiple opens
    Windows Explorer and Office 2003 interfere with each other by doing I/O on the same file
Enabling High Throughput Applications: Platform Improvements
  SMB2 Redirector and Server
    Protocol supports larger buffer sizes for data and metadata operations
    Dynamic scaling of the number of outstanding operations based on network BDP
    Support for deeper I/O pipelines
    Automatic pipelining of large I/O requests
  Network (TCP/IP) stack
    Larger TCP window sizes
    High BDP optimizations
    Windows Vista and later OS releases
  Cache manager
    Larger I/O sizes
Enabling High Throughput Applications: Platform Improvements
  Significant optimizations to the CopyFile API
    Windows Server 2008 and later OS releases have these optimizations
    Uses 1 MB I/O requests (as opposed to 64 KB)
    Issues up to 8 outstanding I/O operations (as opposed to 1 at a time)
    All inbox file copy tools (copy, xcopy, robocopy, and Windows Explorer) see gains
  [Chart: Robocopy throughput comparison between Windows Server 2003 and Windows Server 2008 transferring a 4.5 GB file over a 1 Gbps WAN link]
    Pull = Copy from server to local disk
    Push = Copy from local disk to server
Enabling High Throughput Applications: Application Guidelines
  Use overlapped I/O instead of synchronous
    Use the FILE_FLAG_OVERLAPPED option to CreateFile
    ReadFile, WriteFile and DeviceIoControl APIs
      Wait for completion (GetOverlappedResult)
      Completion callback (“Ex” versions of the API)
    Use completion ports for even better throughput
  Issue sufficient I/O to fill the network BDP
    Limit I/Os based on end to end response time and resources
  Works best when buffering is turned off
  Helps SMB1, SMB2 as well as local I/O
  [Figure: The effect of pipelining on network utilization. Idle versus utilized link time, non-pipelined versus pipelined.]
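
The pipelining idea can be sketched without any Win32 code. The example below uses Python's asyncio as a stand-in for overlapped I/O, which is an assumption of this note and not the API the slide describes. `read_chunk` is a hypothetical coroutine playing the role of one overlapped ReadFile call, and the depth is derived from the BDP so that enough data is outstanding to cover one round trip.

```python
import asyncio
import math


def pipeline_depth(bdp_bytes: int, io_bytes: int) -> int:
    """Number of outstanding I/Os of io_bytes each needed to cover the BDP."""
    return max(1, math.ceil(bdp_bytes / io_bytes))


async def pipelined_read(read_chunk, total_bytes: int, io_bytes: int, depth: int) -> bytes:
    """Read total_bytes with at most `depth` requests in flight at once.

    read_chunk(offset, length) is a hypothetical coroutine standing in for one
    overlapped read; chunks are reassembled in offset order.
    """
    gate = asyncio.Semaphore(depth)  # bounds the number of requests in flight

    async def one(offset: int) -> bytes:
        async with gate:
            return await read_chunk(offset, min(io_bytes, total_bytes - offset))

    # gather() returns results in the order the requests were created.
    chunks = await asyncio.gather(*(one(off) for off in range(0, total_bytes, io_bytes)))
    return b"".join(chunks)
```

Capping the depth reflects the guideline to limit I/Os based on response time and resources: a deeper pipeline than the BDP requires only adds queuing.
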
Enabling High Throughput Applications: Application Guidelines
  Large I/O sizes allow the OS to process the request more efficiently
    Fewer passes through the I/O stack
    OS can segment the request into optimal sized chunks and pipeline each individual chunk
    Due to limitations in the SMB1 stack, use a 60K chunk when reading data and a 64K chunk while writing
    With SMB2 (or for local files), I/O sizes of around 1 MB work well
    Very large I/O sizes (> 16 MB) can result in memory fragmentation and resource shortages
  Use the CopyFile API for large data transfers
    When dealing with lots of small files, use multiple threads to issue parallel CopyFile calls
    For Windows 7, we are adding a multithreading option to the robocopy tool
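
The "many small files" guideline can be sketched as a thread pool that issues one copy per file. This is an illustration only: `shutil.copyfile` stands in for the CopyFile API, and the function name and the default of 8 workers are assumptions of this note, not values from the talk.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(src_dir: Path, dst_dir: Path, workers: int = 8) -> int:
    """Copy every regular file directly under src_dir into dst_dir using a thread pool.

    Returns the number of files copied. Subdirectories are not descended into.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    files = [p for p in src_dir.iterdir() if p.is_file()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any copy error.
        list(pool.map(lambda p: shutil.copyfile(p, dst_dir / p.name), files))
    return len(files)
```

With several copies in flight, the round trips of one file overlap with those of the others, which is where the gain for small files comes from.
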
Enabling High Throughput Applications: Application guidelines
  Take advantage of the cache manager by doing buffered I/O
    Useful in scenarios where the app cannot do asynchronous or large I/Os
    Can hide the delays caused by slow disks
    Provide hints when opening the file
      FILE_FLAG_RANDOM_ACCESS
      FILE_FLAG_SEQUENTIAL_SCAN
  Minimize extending writes
    Set the file size first before writing data
  Be cautious about
    Opening files with the FILE_FLAG_WRITE_THROUGH option
    Making frequent calls to FlushFileBuffers
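
"Set the file size first" can be illustrated in a few lines. The sketch below uses Python's `truncate` to extend the file once before any data is written. On Win32 the corresponding step would typically be SetFilePointerEx followed by SetEndOfFile, which are not named in the talk. The function name is hypothetical.

```python
import os


def write_preallocated(path: str, chunks: list[bytes]) -> int:
    """Set the final file size once, then fill it in, instead of growing the file write by write."""
    total = sum(len(c) for c in chunks)
    with open(path, "wb") as f:
        f.truncate(total)      # extend to the final size in one step
        f.seek(0)
        for chunk in chunks:
            f.write(chunk)     # writes land inside the already-sized file
    return os.path.getsize(path)
```

This only works when the final size is known up front; otherwise the writes still extend the file.
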
Developing Responsive Applications: Understanding the Windows I/O model
  Handle based I/O
    A handle is obtained by opening a file (via the CreateFile API)
    All I/O operations are performed on the handle (Example: ReadFile, WriteFile, LockFile, GetFileInformationByHandle, ReadDirectoryChanges)
    The handle is closed after use
  Path based APIs
    A sequence of 2 or more handle based primitives
    GetFileAttributes = Open + QueryAttributes + Close
    Similarly, SetFileAttributes, DeleteFile, MoveFile involve multiple I/O operations
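
The difference between the two styles shows up even in a portable sketch. The example below uses Python's `os` module as an analogy for the Win32 calls, which is an assumption of this note. Each path based helper resolves the path again, while the handle based version opens once and answers all three questions from a single query.

```python
import os
import stat


def describe_by_path(path: str) -> dict:
    """Three path based queries: each one resolves the path again."""
    return {
        "size": os.path.getsize(path),
        "mtime": os.path.getmtime(path),
        "is_file": os.path.isfile(path),
    }


def describe_by_handle(path: str) -> dict:
    """One open, one query on the handle, one close."""
    fd = os.open(path, os.O_RDONLY)
    try:
        info = os.fstat(fd)
        return {
            "size": info.st_size,
            "mtime": info.st_mtime,
            "is_file": stat.S_ISREG(info.st_mode),
        }
    finally:
        os.close(fd)
```

Both functions return the same answers; over a WAN the handle based version would be expected to cost fewer round trips.
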
Developing Responsive Applications: Understanding caching in the SMB context
  Data caching
    Keeping a copy of previously read data
    Holding onto data written by the application and “lazily” flushing the data to the server
    Win32 file I/O is buffered (cached) by default, except if opened with FILE_FLAG_NO_BUFFERING or FILE_FLAG_WRITE_THROUGH options.
  Metadata caching
    File attributes, directory listings can be cached
  Handle caching
    SMB client holds handle open after application has closed the file.
Developing Responsive Applications: Understanding caching in the SMB context
  Maintaining cache coherency
    Multiple clients accessing the same data
    SMB uses “opportunistic locks” (Oplocks)
    Completely hidden from the application
  Oplocks tell the client what it can cache
    Granted by the server when a file is opened
    BATCH oplock allows the SMB client to cache reads, writes and the handle (exclusive)
      SMB client can cache data even after the app closes the file
    Level II oplock allows the client to cache reads (shared)
      SMB client cannot cache data after the app closes the file
  Oplocks can be revoked by the server
    Client loses the ability to cache
Developing Responsive Applications: Data caching lost by opening multiple handles
  [Sequence diagram between Client and Server]
  CreateFile( GENERIC_READ | GENERIC_WRITE ): Create completes. Granted batch oplock.
  ReadFile: ReadFile completes. Data is cached.
  CloseHandle: Close is not sent out on wire.
  CreateFile( GENERIC_READ | GENERIC_WRITE ): Cached handle is re-used.
  WriteFile: Data is written to cache.
  CreateFile( GENERIC_READ ): Break Oplock. Flush cached data. Cache is destroyed. Create completes.
  WriteFile: No more caching!
Developing Responsive Applications: SMB2 leasing in Windows 7
  Enhancement to the SMB2 protocol in Windows 7 to support better caching semantics
    Better support existing applications
    Layered applications are hard to change
    Mitigate cross application interference
  Allows full caching when multiple handles are opened by the same “client”
  A new lease level which allows multiple clients to cache reads as well as handles
    Multiple clients can hold on to cached data after app closes handle
Developing Responsive Applications: SMB2 leasing in action
  [Sequence diagram between Client and Server]
  CreateFile( GENERIC_READ | GENERIC_WRITE ): Create completes. Granted lease.
  ReadFile: ReadFile completes. Data is cached.
  CloseHandle: Close is not sent out on wire.
  CreateFile( GENERIC_READ | GENERIC_WRITE ): Cached handle is re-used.
  WriteFile: Data is written to cache.
  CreateFile( GENERIC_READ ): Create completes.
  WriteFile: Data is written to cache.
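
The contrast between the oplock sequence and the leasing sequence can be captured in a toy model. This is a deliberately simplified sketch and not the real SMB state machine. The class name, the single `mode` string standing in for access mode, share mode and create options, and the rule that an identical open never breaks caching are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyClientCache:
    """Toy model of client-side caching for one file; not the real SMB state machine."""
    leasing: bool                      # True models SMB2 leasing (Windows 7)
    caching: bool = False
    open_modes: list = field(default_factory=list)
    wire_writes: int = 0               # writes that had to go to the server

    def create(self, mode: str) -> None:
        if not self.open_modes:
            self.caching = True        # first open: batch oplock or lease granted
        elif not self.leasing and mode not in self.open_modes:
            self.caching = False       # different second handle: oplock broken
        self.open_modes.append(mode)

    def write(self) -> None:
        if not self.caching:
            self.wire_writes += 1      # no cache: every write is a round trip


if __name__ == "__main__":
    for leasing in (False, True):
        c = ToyClientCache(leasing=leasing)
        c.create("rw"); c.write(); c.create("r"); c.write()
        print(f"leasing={leasing}: caching={c.caching}, wire writes={c.wire_writes}")
```

In the model, the second, read-only open costs the non-leasing client its cache, so the following write goes to the server, while the leasing client keeps writing to cache.
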
Developing Responsive Applications: More Windows 7 caching enhancements
  Transparent cache
    A secondary on-disk cache to augment the client’s in-memory cache
    Uses the Windows offline files infrastructure.
    Selectively enabled based on network latency and throughput.
  BranchCache
    A peer cache which works in conjunction with the “offline files” cache.
    Uses hashes generated by the server to fetch data from peers.
Developing Responsive Applications: Windows 7 BranchCache in action
  [Diagram: a Windows 7 Server reached over a high latency, low-bandwidth WAN link by two Windows 7 clients (Client 1 and Client 2)]
  First access to a file on the server pulls down the file over the slow WAN link (WAN access)
  Second access to the same file from another user in the branch is satisfied from the peer (local subnet access)
  Subsequent access from the same client is satisfied from the transparent cache (local machine access)
Developing Responsive Applications: Application guidelines for effective caching
  Avoid multiple open handles to the same file at the same time
    Use the handle based APIs if possible
    With SMB2 leasing, opening multiple handles on Windows 7 is no longer a problem
  Make use of SMB “handle collapsing”.
    Identical handles to the same file can be “collapsed” (same access mode, share mode and create options)
    Particularly useful for SMB1 because a collapsed open implies that oplocks are not broken.
Developing Responsive Applications: Application guidelines for effective caching
  Provide hints to the cache manager and the SMB redirector when opening the file
    FILE_FLAG_SEQUENTIAL_SCAN tells the cache manager to cache data just ahead of where the application is reading
    Incorrect hints can result in poor caching behavior.
  Other caveats
    Write-only opens are not cached
    Byte range locks cause loss of all caching if there are multiple handles open
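
Passing an access hint at open time might look like the sketch below. It relies on `os.O_SEQUENTIAL` and `os.O_RANDOM`, which Python exposes on Windows only. That these correspond to FILE_FLAG_SEQUENTIAL_SCAN and FILE_FLAG_RANDOM_ACCESS is an assumption of this note, not a statement from the talk, and on other platforms the `getattr` fallback turns the hint into a no-op. The function name is hypothetical.

```python
import os


def open_with_hint(path: str, sequential: bool) -> int:
    """Open a file for reading with an access-pattern hint where the platform offers one.

    Returns a file descriptor that the caller must close with os.close().
    """
    hint = getattr(os, "O_SEQUENTIAL" if sequential else "O_RANDOM", 0)
    binary = getattr(os, "O_BINARY", 0)   # Windows only: avoid newline translation
    return os.open(path, os.O_RDONLY | binary | hint)
```

The hint should match the real access pattern, since an incorrect hint can make caching worse.
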
Developing Responsive Applications: Platform Support for Metadata Caching
  Metadata queries have significant cost
    Each query may take up to 3 round trips
    Around 40% of SMB roundtrips are for file metadata
  Windows SMB clients can cache file metadata
    Metadata caching is best effort and there are very limited consistency guarantees
    Metadata caches expire after a fixed time
    SMB1 client caches only file attributes, timestamps, and file sizes by default
    SMB2 client caches directory enumeration in addition
Developing Responsive Applications: Application guidelines for metadata access
  Maximize use of the metadata cache
    GetFileAttributes, GetFileSize, GetFileTime are cached
    Directory enumeration via FindFirstFile/FindNextFile is cached for SMB2 only
    Only the FileBasicInfo, FileStandardInfo and FileNameInfo classes supported by the GetFileInformationByHandleEx API are cached
  Avoid repeated queries for non-cached metadata by caching at the application level
  Use the GetFileInformationByHandle(Ex) API
    Use large buffers for variable length queries
      Security descriptors, stream enumeration
    Use the GetFileInformationByHandleEx API to enumerate directories (SMB2 on Windows 7 only!)
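
Caching at the application level can be as simple as remembering query results for a short, fixed time. The sketch below wraps `os.stat` as a stand-in for any metadata query the platform does not cache. The class name, the 5 second default and the injectable clock are assumptions of this note. Like the platform cache, it trades consistency for fewer round trips.

```python
import os
import time


class MetadataCache:
    """Application-level cache of os.stat results with a fixed expiry time."""

    def __init__(self, ttl_seconds: float = 5.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # path -> (expiry time, stat result)
        self.lookups = 0               # queries that actually hit the file system

    def stat(self, path: str) -> os.stat_result:
        now = self._clock()
        hit = self._entries.get(path)
        if hit is not None and hit[0] > now:
            return hit[1]              # still fresh: no round trip
        self.lookups += 1
        result = os.stat(path)
        self._entries[path] = (now + self._ttl, result)
        return result
```

A short expiry keeps stale answers bounded. An application that changes the file itself should drop the entry rather than wait for it to expire.
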
Developing Responsive Applications: General considerations
  Support I/O cancellation
    Starting with Windows Vista, creates can be cancelled via the CancelSynchronousIo API
    CreateFile calls can sometimes incur connection establishment and authentication delays
    Majority of app hangs involve a code path trying to open the file
  Use overlapped I/O whenever possible
    Does not block
    Can be selectively cancelled via CancelIoEx API
  Don’t do blocking network I/O on your main application thread
  Don’t pipeline too much data
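
Keeping blocking I/O off the main thread can be sketched with a worker pool and a deadline. This is an illustration under stated assumptions: the function and pool names are hypothetical, and the worker thread is not interrupted on timeout. Truly cancelling the pending I/O requires the Win32 cancellation APIs named on this slide, which this sketch does not call.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_io_pool = ThreadPoolExecutor(max_workers=4)   # shared pool for blocking file I/O


def read_with_deadline(read_fn, timeout_seconds: float):
    """Run a blocking I/O callable on a worker thread and give up waiting after a deadline.

    Returns (True, result) on success or (False, None) on timeout, so the
    caller (for example a UI thread) is never blocked for longer than the deadline.
    """
    future = _io_pool.submit(read_fn)
    try:
        return True, future.result(timeout=timeout_seconds)
    except FutureTimeout:
        future.cancel()                # only helps if the task has not started yet
        return False, None
```

A caller might pass `lambda: open(path, "rb").read()` as `read_fn` and show a "still connecting" message when the first element of the result is False.
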
Summary
  For throughput oriented applications
    Fill the network BDP using asynchronous I/O and large I/O chunks.
    Use the CopyFile API when applicable.
    Use multithreading when operating on a large number of small files.
  For interactive apps
    Use the handle based APIs.
    Help the system cache data effectively by watching your open patterns.
    Watch out for metadata queries.
    Support cancellation.
Performance Monitoring Tools
  Process monitor
    Effectively track I/O issued by the application.
    Monitor file, registry, thread and process activity.
    Available at http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
  Netmon 3
    A network sniffer to monitor traffic
    Parsers are available for both SMB and SMB2 protocols.
    Available at http://blogs.technet.com/netmon/
Conclusion
  Be aware that your application will be used over a slow network even though you didn’t design for it
  We are constantly improving the platform
  The guidelines presented here are applicable to other WAN scenarios also
  Use the APIs provided by the system to your advantage
  Understanding how the system works can help us write well behaved apps
Evals & Recordings
  Please fill out your evaluation for this session at:
  This session will be available as a recording at: www.microsoftpdc.com
Q&A
  Please use the microphones provided
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.