Exchange Deployment Planning Services
Exchange 2010 Storage
This Exchange 2010 Storage module has the following goals:
• Exchange storage innovations
• Disk storage technology 2010+
• Microsoft Exchange 2010 storage architecture
− Store innovations
− ESE database innovations
• Exchange Server 2010 storage design
• Summary
During this session, focus on the following:
• How will we leverage the new storage improvements in our organization?
• What compliance requirements do we have around our messaging solution?
Ideal audience for this workshop:
• Messaging SME
• Network SME
• Security SME
Agenda
• Exchange storage innovations
• Disk storage technology 2010+
• Microsoft Exchange 2010 storage architecture
− Store innovations
− ESE database innovations
• Exchange 2010 storage design
• Summary
Exchange Server 2007: Storage Innovations
• Reduce storage I/O (70%)
• Use large amounts of memory (64-bit)
• Increased page size (4 KB -> 8 KB)
• Lower storage costs
• Large mailbox support (> 1 GB)
• Fast search/indexing
• Continuous replication (log shipping)
• High availability + fast recovery
Disk Storage Technology: 2010 and Beyond
• Disk capacity trend predicted to continue
− 2 TB desktop-class SATA disks available
− 3 - 4 TB soon
− 1 and 2 TB Nearline/Midline SAS disks available
• Sequential I/O throughput increasing linearly based on areal density (2010 SATA = ~250 MB/sec)
• Random I/O performance not expected to improve substantially (15K RPM is the ceiling)
• SSD/Flash
− High $/GB, low $/IO
− Write performance improving
− Reliability mostly addressed with wear leveling
Random vs. Sequential Disk I/O
• Random I/O
− Disk head has to move to process subsequent I/O
− Head movement = high I/O latency
− Seek latency limits IOPS
• Sequential I/O
− Disk head does not move to process subsequent I/O
− Stationary head = low I/O latency
− Disk RPM speed limits IOPS
• Example: 7.2K RPM SATA disk (20 ms latency)
− Random = 50 IOPS
− Sequential = 300+ IOPS
[Figure: disk head]
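The 50 IOPS figure follows from the per-I/O latency: a disk that spends about 20 ms on each random I/O can complete roughly 1000 / 20 = 50 of them per second. A minimal sketch of that arithmetic, assuming one outstanding I/O at a time; the function name and the 3.3 ms sequential service time are illustrative assumptions chosen only to land near the slide's "300+" figure, not measured values.

```python
def iops_from_latency(latency_ms: float) -> float:
    """Approximate IOPS for one disk servicing one I/O at a time."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


random_iops = iops_from_latency(20.0)     # 20 ms per random I/O (from the slide)
sequential_iops = iops_from_latency(3.3)  # assumed ~3.3 ms per sequential I/O
print(f"random: {random_iops:.0f} IOPS, sequential: {sequential_iops:.0f} IOPS")
```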
Exchange 2010 Storage Vision
• I/O reduction
• Sequential I/O
• Large, fast, low-cost mailboxes
• SATA/Tier 2 disk optimization
• Storage design flexibility
• RAID-less storage (JBOD)
Store Schema Changes: IOPS Reductions
• Store Schema: the way the Store organizes data in the ESE Database
• One simple theme: move away from doing many, random, small size, disk I/Os to doing fewer, sequential, large size, disk I/Os
Store Schema Changes: IOPS Reductions
• Significant benefits, including fast/efficient…
− OWA/Outlook Online Mode
  − …end user viewing for “cold” states/first time view creation
  − …Calendar operations
  − …Search performance
− Outlook Cached Mode/Exchange ActiveSync®
  − OST sync = sequential I/O
  − EAS sync = sequential I/O
− Server Management
  − …Move mailbox
  − …Content Index crawls
I/O Reductions: Store Table Architecture
[Figure: store tables in Exchange Server 2007 vs. Exchange 2010]
Exchange Server 2007
• Per database
− Mailbox Table: Jeff’s Mbx, Ann’s Mbx, Joe’s Mbx
− Folders Table: Jeff:Inbox, Ann:Drafts, Joe:Unread
− Message Table (Msg): Joe:Msg10, Jeff:Msg32, Ann:Msg180
− Attachments Table: Jeff:Excel.xls, Ann:Pic.bmp, Joe:Help.doc
• Per folder
− Message/Folder Table (MFT): Joe:Inbox:H1, Joe:Inbox:H2, Joe:Inbox:H3
Exchange 2010
• Per database
− Mailbox Table: Jeff’s Mbx, Ann’s Mbx, Joe’s Mbx
• Per mailbox
− Folders Table: Joe:Inbox, Joe:Drafts, Joe:Unread
− Message Header Table: Joe:H10, Joe:H302, Joe:H920
− Body Table: Joe:Msg10, Joe:Help.doc, Joe:Msg302
• Per view
− View Tables (e.g. From): Joe:H920, Joe:H302, Joe:H10
− Secondary indexes used for views
New store schema = no more single instance storage within a database
Store Schema Changes: Physical Contiguity
[Figure: B+ tree layout over DB pages (page numbers)]
• Exchange Server 2007: the pages of a B+ tree are scattered across the database = many, small size I/Os (1 per 8K page)
• Exchange 2010: the pages of a B+ tree are contiguous (e.g., 1078 to 1083, 3456 to 3458) = fewer, larger size, sequential I/Os
Store Schema Changes: Lazy View Updates
[Figure: DB I/O over time for the view "All Unread or Flagged items". Events: M1 arrives, M2 arrives, M1 flagged, M3 arrives, M2 deleted; then the user uses OWA/Outlook Online and switches to this view.]
• Exchange Server 2007 ("Nickel & Dime" approach): many, random I/Os (1 per update)
• Exchange 2010 ("Pay to Play" approach): fewer, sequential I/Os (1 per view)
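The difference between the two approaches can be sketched by counting view I/Os for the five events in the timeline. This is a toy model, not the Store's implementation: the `View` class, its fields, and the "one I/O per operation" accounting are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    lazy: bool                      # True = update the view only when it is opened
    io_count: int = 0
    pending: list = field(default_factory=list)

    def on_change(self, event: str) -> None:
        if self.lazy:
            self.pending.append(event)  # remember the change, no view I/O yet
        else:
            self.io_count += 1          # eager: one random I/O per update

    def open(self) -> None:
        if self.lazy and self.pending:
            self.io_count += 1          # one sequential pass applies all pending changes
            self.pending.clear()


events = ["M1 arrives", "M2 arrives", "M1 flagged", "M3 arrives", "M2 deleted"]
eager, lazy = View(lazy=False), View(lazy=True)
for e in events:
    eager.on_change(e)
    lazy.on_change(e)
eager.open()
lazy.open()
print(eager.io_count, lazy.io_count)  # 5 1
```

If the user never opens the view, the lazy model performs no view I/O at all, which is the point of paying only when the view is actually used.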
IOPS Reductions: ESE Changes
• Optimize for new store schema
− Space hints (allocate database space in a contiguous manner)
− Rewrote how database maintenance works (maintain database contiguity)
− Utilize space efficiently (database compression)
• Increase I/O sizes
− DB page size increased from 8 KB to 32 KB
− Improved read/write I/O coalescing (gap coalescing)
− Provide improved async read capability (pre-read)
• Increase cache effectiveness
− 100 MB checkpoint depth (HA configurations only)
− DB cache compression (dehydration)
− DB cache priority (fast evict)
I/O Reductions: Allocate Space Based on Contiguity
• Database table space allocation hints
− Allocate DB space based on either data compactness or data contiguity (based on usage pattern)
[Figure: pages in the DB cache (Page X: Msg Header, Page Y: Msg Header, Page Z: Event History) being written to a disk where Pages 1 and 3 are already used]
• Space contiguity (sequential/bloat): Msg Header pages are written to Pages 4 and 5
• Space compactness (random/compact): the Event History page is written to Page 2
I/O Reductions: Maintain Contiguity Over Time
ESE function comparison, Exchange 2007 SP1 vs. Exchange 2010:
• Cleanup (deleted items/mailboxes)
− Exchange 2007 SP1: Cleanup performed during OLD, which occurs during the OLM time window
− Exchange 2010: ESE performs cleanup at run time (when store hard delete occurs). Happens during Store dumpster cleanup (OLM); pages are zeroed by default.
• Space Compaction
− Exchange 2007 SP1: Database is compacted and space reclaimed during OLD
− Exchange 2010: Database is compacted and space reclaimed at run time by OLD2. Auto-throttled.
• Maintain Contiguity
− Exchange 2007 SP1: N/A. Contiguity is compromised by space compaction.
− Exchange 2010: Database is analyzed for contiguity and space at run time and is defragmented in the background (B+Tree Defrag/OLD2). Auto-throttled.
• Database Checksum
− Exchange 2007 SP1: When configured, ½ of the OLD maintenance window is reserved for sequential scan (checksum), manual throttle. Active DB copy only.
− Exchange 2010: Two options (both active and passive copies):
  1. Run DB checksum in the background 24x7 (default). Sequential I/O.
  2. Run DB checksum during the OLM window. Sequential I/O.
I/O Reduction: Database Contiguity Results
[Figure: contiguity maps over DB page numbers from production/dogfood database analysis. Blue = contiguous (good), red = fragmented (bad).]
• Exchange Server 2007 Message Header Table (aka MFT): fragmented
• Exchange 2010 Message Header Table (aka MsgHeader): contiguous, with random deletes at the tail
Database Compression: Mitigate DB Space Growth
• Store schema change, space hints, B+Tree defrag, and 32 KB page size combine to increase DB file size by 20%.
• Growth is 100% mitigated by database compression
− 7-bit/XPRESS (based on LZ77) compression for message headers and text/HTML bodies (long values)
[Chart: DB File Size Comparison, relative DB file size]
• E2007/RTF: 1.00
• E2010/RTF: 1.20
• E2010/Mix: 1.00
• E2010/HTML: 0.88
Test configuration: 1 database, 750 x 250 MB mailboxes; RTF = RTF compressed; Mix = 77% HTML, 15% RTF, 8% text; average message size = ~50 KB
DB Page Size Increased to 32 KB: IOPS Reductions
[Figure: reading a 20 KB message from disk into the DB cache]
• Ex2007 DB read, 8 KB pages: Page 1 (Msg Header), Page 3 (Msg Body), and Page 5 (Msg Body) are read separately; Pages 2 and 4 hold unrelated data = 3 read I/Os
• Ex2010 DB read, 32 KB pages: Page 1 (32 KB) holds both Msg Header and Msg Body; Page 2 (32 KB) holds unrelated data = 1 read I/O
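The read counts in this comparison match a simple ceiling division of the message size by the page size. A minimal sketch, assuming one read I/O per page and no coalescing; the function name is illustrative.

```python
import math


def pages_needed(item_kb: int, page_kb: int) -> int:
    """Number of database pages (and, here, read I/Os) needed to hold an item."""
    return math.ceil(item_kb / page_kb)


print(pages_needed(20, 8))   # 3 pages with 8 KB pages
print(pages_needed(20, 32))  # 1 page with 32 KB pages
```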
I/O Gap Coalescing: IOPS Reduction (Read Case)
[Figure: reading Page 1 (Msg Header), Page 3 (Msg Body), and Page 5 (Msg Body) from disk; Pages 2 and 4 hold unrelated data]
• Ex2007 DB read behavior: Pages 1, 3, and 5 are each read into the DB cache separately = 3 read I/Os
• Ex2010 DB read behavior: Pages 1 through 5 are read in one I/O; Pages 1, 3, and 5 go to the DB cache, and Pages 2 and 4 go to a temp buffer = 1 read I/O
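The read case can be sketched as merging nearby page requests into one ranged read, accepting that the pages in the gaps are read and then thrown away. This is an illustration of the idea only: the function name and the `max_gap` parameter are assumptions, and the slides do not state what gap size ESE actually tolerates.

```python
def coalesce_reads(pages, max_gap=1):
    """Group page numbers into (first, last) read ranges.

    Two requested pages end up in the same range when at most
    `max_gap` unrequested pages lie between them.
    """
    ranges = []
    for p in sorted(set(pages)):
        if ranges and p - ranges[-1][1] - 1 <= max_gap:
            ranges[-1][1] = p          # extend the current range across the gap
        else:
            ranges.append([p, p])      # start a new read
    return [tuple(r) for r in ranges]


wanted = [1, 3, 5]
print(coalesce_reads(wanted, max_gap=0))  # [(1, 1), (3, 3), (5, 5)] -> 3 read I/Os
print(coalesce_reads(wanted, max_gap=1))  # [(1, 5)] -> 1 read I/O; pages 2 and 4 are discarded
```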
DB Cache Compression: IOPS Reductions
• Problem: New store schema + 32 KB pages can reduce the efficiency of the cache (e.g., a page with 8 KB of data consumes 32 KB of memory in the DB cache)
• Solution: Implement DB cache compression to shrink partially used cached pages in memory, allowing a more effective cache
[Figure: Page 1 (32 KB, holding 8 KB of data) moving from disk to the DB cache]
1. 32 KB page with only 8 KB of data is read off disk
2. 32 KB page is compressed to an 8 KB in-memory image
• Up to 30% more cache per mailbox server
• More cache = less DB I/O
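The memory saving can be sketched as rounding a page's used bytes up to some allocation granule instead of always charging the full 32 KB. The 4 KB granule and the function name below are assumptions for illustration; the slides only state the 32 KB to 8 KB example.

```python
PAGE_KB = 32  # on-disk page size in Exchange 2010 (from the slides)


def cached_size_kb(used_kb: int, dehydrate: bool, granule_kb: int = 4) -> int:
    """Memory charged to the cache for one page, with or without dehydration."""
    if not dehydrate:
        return PAGE_KB
    granules = -(-used_kb // granule_kb)  # ceiling division
    return min(PAGE_KB, max(granule_kb, granules * granule_kb))


print(cached_size_kb(8, dehydrate=False))  # 32
print(cached_size_kb(8, dehydrate=True))   # 8
```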
100 MB Checkpoint Depth (Active Copies): IOPS Reductions
• Checkpoint depth: amount of data waiting to be committed to the database file (edb)
• Default checkpoint depth max increased from 20 MB to 100 MB for active mailbox databases configured for HA (non-HA is 20 MB, passive is 5 MB)
• Deep checkpoint benefit = efficient DB writes (~40% reduction)
• Deep checkpoint risks: long store shutdown times, long crash recovery times
− Risk mitigation: shut down databases in parallel, fail over on store crash
[Chart: Database Pages Repeatedly Written/sec and DB Writes/sec (avg) vs. Checkpoint Depth (MB), for depths of 20, 40, 60, 80, 100, and 200 MB]
• Loadgen test: 3000 mailboxes, 12 DBs, Outlook 2007 Online Very Heavy profile
• 100 MB checkpoint depth = 40% DB write I/O reduction
IOPS Reduction: DB Cache Priority
• Problem: Background and other operations can pollute the cache (e.g., checksumming, OLD2, HA log replay)
• Solution: Implement DB cache priority to allow lower cache priorities for background/replay operations
• ESE caching algorithm = LRU-K (a Least Recently Used variant)
[Figure: DB cache timeline (past, now, future) from cache entry to cache eviction for Outlook message reads, HA log replay (passive), and DB maintenance]
Ex2007 vs. Ex2010 IOPS Reduction Results
[Chart: DB IOPS Comparison, E2007 vs. E2010; series: DB Read IO/sec, DB Write IO/sec, DB IO/sec; y-axis 0 to 500]
• 70%+ reduction
• Test configuration: 3000 mailboxes, 3 MB DB cache/user, Loadgen Outlook 2007 Online Very Heavy profile, 250 MB mailbox size
Exchange IOPS Trend
[Chart: DB IOPS/Mailbox for Exchange 2003, Exchange 2007, and Exchange 2010; y-axis 0 to 1 IOPS/mailbox]
• 90%+ reduction
Optimize for SATA: Smooth Database Write I/O
[Chart: Exchange 2010 Smooth DB I/O Benefit; Exchange Server 2010 baseline vs. Exchange Server 2010 with smooth DB I/O]
• DB read latency (ms): 49 baseline, 34 smooth
• Log write latency (ms): 3.7 baseline, 0.7 smooth
• RPC average latency: 10.1 baseline, 5.1 smooth (50% reduction)
• Test configuration: 3000 mailboxes, 3 MB DB cache/user, 12 x 7.2K SATA disks (DB/logs on same spindles), Loadgen Outlook 2007 Online Very Heavy profile
JBOD/RAID-less Storage: Now an Option
• JBOD: 1 disk = 1 database (with logs)
• Requires Exchange 2010 High Availability (3+ DB copies)
• Annual disk failure rate (AFR) = 5%
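To see what a 5% AFR means operationally, the sketch below estimates how many disk failures (and therefore database failovers and re-seeds) a server might see per year. It assumes independent failures and a hypothetical 12-disk server; both are illustrative assumptions, not figures from this slide.

```python
AFR = 0.05  # annual disk failure rate (from the slide)


def expected_failures_per_year(disks: int, afr: float = AFR) -> float:
    """Expected number of disk failures per year across `disks` disks."""
    return disks * afr


def prob_at_least_one_failure(disks: int, afr: float = AFR) -> float:
    """Chance that at least one disk fails in a year, assuming independence."""
    return 1.0 - (1.0 - afr) ** disks


disks = 12  # hypothetical server with 12 JBOD disks
print(f"{expected_failures_per_year(disks):.1f} expected failures/year")
print(f"{prob_at_least_one_failure(disks):.0%} chance of at least one failure/year")
```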
JBOD Advantages: Reducing Storage Costs/Complexity
• Eliminates unnecessary DB copies: server and storage redundancy can be symmetrical
• Reduces disk I/O: eliminates RAID write penalty
• Enables simple storage design: 1 disk = 1 database (with logs)
• Enables simple storage failure recovery
JBOD Challenges: Exchange HA/Storage Must Replace RAID Functionality
• Disk striping performance (e.g. RAID10) cannot be leveraged
• Disk failure = database failover (~30 second outage)
• Re-enabling resiliency = spare disk assignment/partitioning/format/DB re-seed (scriptable)
• Soft disk errors (bad blocks) must be detected and repaired
JBOD/RAID-less Storage: Exchange 2010 Optimizations
• Optimize HA failovers/switchovers
− ESE tuned to leverage DB cache between passive->active transitions (cache warming)
• Improve storage failure detection (bad blocks/corruption)
− Active/passive copy background scan (checksum)
− Active/passive copy lost write detection
• Improve database seeding/repair
− Utilize DB passive copy for seeding source
− Avoid re-seed by using single page restore (active and passive)
• Improve HA storage failure detection and failover
− HA now detects storage failures and automatically fails over (~30 seconds)
JBOD/RAID-less Storage: Single Page Restore (Active)
[Figure: Database Availability Group (DAG) with Mailbox Server Nodes 1, 2, and 3 hosting DB1-Active, DB1-CopyA, and DB1-CopyB; each copy has a database (Page1, Page2, Page3) and a log]
1. Page corruption detected on active copy (e.g., -1018)
2. Active DB places marker in log stream to notify passive copies to ship up-to-date page
3. Passive receives log and replays up to marker, retrieves good page, invokes Replay Service callback, and ships page
4. Active receives good page, writes page to log; DB page is patched
5. Subsequent page repair from additional copies is ignored
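The active-copy repair steps can be sketched as a toy model: detect a bad page by checksum, take the first good copy of that page offered by a passive copy, and ignore any later offers. Everything here is hypothetical scaffolding (the `DbCopy` class, `repair_active`, and CRC32 as the checksum); the log-stream marker and Replay Service callback described above are deliberately left out.

```python
import zlib


class DbCopy:
    """Toy database copy: page number -> bytes, plus a stored checksum per page."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.sums = {n: zlib.crc32(d) for n, d in self.pages.items()}

    def is_corrupt(self, n: int) -> bool:
        return zlib.crc32(self.pages[n]) != self.sums[n]

    def good_page(self, n: int):
        """Return the page if it verifies, else None."""
        return None if self.is_corrupt(n) else self.pages[n]


def repair_active(active: DbCopy, passives, n: int) -> bool:
    """Patch page n on the active copy from the first passive with a good page."""
    if not active.is_corrupt(n):
        return False
    for passive in passives:
        page = passive.good_page(n)
        if page is not None:
            active.pages[n] = page  # patch the page; later responses are ignored
            return True
    return False


pages = {1: b"page-1", 2: b"page-2", 3: b"page-3"}
active, copy_a, copy_b = DbCopy(pages), DbCopy(pages), DbCopy(pages)
active.pages[2] = b"garbage"             # simulate corruption on the active copy
print(repair_active(active, [copy_a, copy_b], 2), active.pages[2])
```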
JBOD/RAID-less Storage: Single Page Restore (Passive)
[Figure: DAG with Mailbox Server Nodes 1, 2, and 3 hosting DB1-Active, DB1-CopyA, and DB1-CopyB; each copy has a database (Page1, Page2, Page3) and a log]
1. Page corruption detected on passive DB copy (e.g., -1018)
2. Passive copy pauses log replay (log copying continues)
3. Passive retrieves the corrupted page (by page number) from the active copy using the DB seeding infrastructure
4. Passive copy waits until the log file that meets the max required generation requirement is copied/inspected, then patches the page
5. Passive resumes log replay
Exchange 2010 Storage Design Flexibility
• Exchange Online archive provides mailbox storage flexibility
− One mailbox per user or two
• Exchange 2010 optimized for DAS storage, but SAN storage is supported
− IOPS reductions/SATA optimizations enable lower performing storage
− Exchange 2010 HA architected for DAS (simpler)
• JBOD* and RAID storage support
• Exchange 2010 optimized for Tier 2 (SATA) disks, but enterprise disks are supported
• SSD/Flash storage supported but not recommended for mainstream due to high $/GB
• Max 100 databases/server; storage groups are gone
• Max recommended DB size = 2 TB**
• Max recommended folder item count = 100 K***
* 3+ copies
** 2+ copy High Availability only
*** Assuming no 3rd party applications (OWA/Outlook Online)
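The limits on this slide lend themselves to a quick design sanity check. The sketch below is a hypothetical helper, not an Exchange tool: the function, its parameters, and the message wording are all assumptions, and only the numeric limits come from the slide.

```python
MAX_DATABASES_PER_SERVER = 100
MAX_DB_SIZE_TB = 2.0          # recommended maximum (slide: 2+ copy HA only)
MAX_FOLDER_ITEMS = 100_000    # recommended maximum items per folder
MIN_COPIES_FOR_JBOD = 3


def check_design(databases, db_size_tb, copies, jbod, max_folder_items):
    """Return a list of guideline violations for a proposed mailbox server design."""
    issues = []
    if databases > MAX_DATABASES_PER_SERVER:
        issues.append(f"{databases} databases exceeds {MAX_DATABASES_PER_SERVER} per server")
    if db_size_tb > MAX_DB_SIZE_TB:
        issues.append(f"{db_size_tb} TB database exceeds recommended {MAX_DB_SIZE_TB} TB")
    if jbod and copies < MIN_COPIES_FOR_JBOD:
        issues.append(f"JBOD needs {MIN_COPIES_FOR_JBOD}+ database copies, found {copies}")
    if max_folder_items > MAX_FOLDER_ITEMS:
        issues.append(f"{max_folder_items} items per folder exceeds {MAX_FOLDER_ITEMS}")
    return issues


for issue in check_design(databases=101, db_size_tb=2.5, copies=2,
                          jbod=True, max_folder_items=150_000):
    print("-", issue)
```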
Summary
• Exchange 2010 store has…
− Reduced DB IOPS by 70%+ ... again!
− Optimized for large mailboxes (10 GB+) and 100K item counts
− Optimized for large/slow/low-cost disks (SATA/Tier 2)
− Made JBOD/RAID-less storage a viable option
− Enabled unmatched storage flexibility to push storage capex costs down
End of Exchange 2010 Storage Module
For More Information
• Exchange Server Tech Center: http://technet.microsoft.com/en-us/exchange/default.aspx
• Planning services: http://technet.microsoft.com/en-us/library/cc261834.aspx
• Microsoft IT Showcase Webcasts: http://www.microsoft.com/howmicrosoftdoesitwebcasts
• Microsoft TechNet: http://www.microsoft.com/technet/itshowcase
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after
the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.