Dovecot Mail Storage Timo Sirainen

Post on 23-Feb-2016

Page 1: Dovecot Mail Storage

Dovecot Mail Storage

Timo Sirainen

Page 2

Me: Timo Sirainen

• Born 1979 in Finland
• First C64 BASIC programs around 1988
• Open source coding since about 1998
  – Irssi IRC client 1999-2004, still widely used
• Worked as programmer since 1999
• Went to university in 2006
• Dovecot project started in 2002
  – Working full time on it since about 2007
  – 2009: Rackspace, USA
  – 2010: SAPO, Portugal

Page 3

Dovecot

• Open source IMAP/POP3 server
  – Only mail retrieval to clients, no mail sending
• First version released in 2002
• Mostly written by me
  – Except Sieve by Stephan Bosch
• High performance is an important goal
  – Disk I/O is the typical bottleneck -> everything is optimized to reduce it

Page 4

Talk Overview

• Traditional mailbox formats
• Dovecot indexes
• Dovecot mailbox formats
• Full text search indexes
• Future ideas

Page 5

mbox

• One file per mailbox
• Metadata in headers that are filtered out
  – X-UID, Status, X-Status, X-Keywords, etc.
• Deleting requires moving data around
  – Fragile: corruption if crashes in the middle
  – Slow when deleting old messages
• May become fragmented with constant appends
• But non-fragmented file is fast to read

Page 6

Maildir

• One file per message
  – Reading through all files can be slow
• Message flags in filename (name:2,<flags>)
  – Lots of renaming
  – Finding the current filename can be difficult
• Maildir is lockless? Not so much, Dovecot uses write/sync lock
  – Otherwise files can temporarily be lost during renames
    • Was the file really deleted or just renamed?
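
The "flags in filename" point can be illustrated with a small sketch. It assumes the usual Maildir convention in which the flags follow ":2," as single letters in ASCII order (for example S = seen, R = replied, F = flagged). The function names and the sample filename are made up for illustration and are not Dovecot code.

```python
def parse_maildir_filename(filename):
    """Split 'name:2,<flags>' into (base name, set of flag letters)."""
    base, sep, info = filename.partition(":2,")
    flags = set(info) if sep else set()
    return base, flags


def with_flags(filename, flags):
    """Return the filename the message must be renamed to for these flags."""
    base, _ = parse_maildir_filename(filename)
    # Flag letters are written in ASCII order.
    return base + ":2," + "".join(sorted(flags))


if __name__ == "__main__":
    old = "1455000000.M1P2.host:2,S"       # hypothetical filename
    new = with_flags(old, {"S", "F"})       # user flags the message
    print(old, "->", new)                   # every flag change is a rename
```

Because the base name stays fixed while the suffix changes, a reader that cached the old name has to rescan the directory to find the message again, which is the "finding the current filename" problem mentioned above.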

Page 7

Dovecot Index Files

• Main index
  – List of messages
  – Message flags
  – Offsets to cache records
• Cache file
  – Message size, some headers, etc.
  – Keep only data that client actually uses
    • Different clients want different data for different amounts of time

Page 8

Dovecot Main Index

• In two files:
  – dovecot.index: Somewhat recent snapshot
  – dovecot.index.log: Recent changes
• All changes go through the log
• Readers read snapshot to memory and apply latest changes from log
  – Once opened, only need to read log updates
    • Very efficient with remote filesystems (NFS, cluster FSes)!
• Snapshot is updated “once in a while”
  – Tries to minimize disk I/O
  – Writes are usually more expensive than reads
• Log also useful for finding “what changed” events for IMAP clients
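
The snapshot-plus-log idea can be sketched in a few lines. This is a toy in-memory model, not Dovecot's on-disk format: the record types ("append", "add_flag", "expunge"), the IndexView class and the list-based log are all assumptions made for illustration.

```python
class IndexView:
    """In-memory view built from a snapshot plus the log records after it."""

    def __init__(self, snapshot, log, snapshot_log_offset):
        # snapshot: {uid: set of flags}, valid up to snapshot_log_offset
        self.flags = {uid: set(f) for uid, f in snapshot.items()}
        self.log_offset = snapshot_log_offset
        self.refresh(log)

    def refresh(self, log):
        """Apply log records added since the last refresh and return them."""
        new_records = log[self.log_offset:]
        for op, uid, flag in new_records:
            if op == "append":
                self.flags[uid] = set()
            elif op == "add_flag":
                self.flags[uid].add(flag)
            elif op == "expunge":
                self.flags.pop(uid, None)
        self.log_offset = len(log)
        return new_records


if __name__ == "__main__":
    log = [("append", 1, None), ("append", 2, None), ("add_flag", 1, "\\Seen")]
    snapshot = {1: set(), 2: set()}          # covers the first two records
    view = IndexView(snapshot, log, 2)
    log.append(("expunge", 2, None))         # another session deletes UID 2
    print(view.refresh(log), view.flags)
```

After the initial open, refresh() only touches the tail of the log, and the records it returns are exactly the "what changed" events a server would report to an IMAP client.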

Page 9

Dovecot Cache

• The main reason for Dovecot’s good performance
• Different IMAP clients want different data
  – Caching data that client doesn’t use wastes disk space and disk I/O
• Flexible format, allows adding any number of fields
  – Per-field caching decisions: “no”, “temporary”, “permanent”
• Cached fields never change (IMAP guarantees)
  – Data is added without locking -> duplicate data is possible
• Once in a while the file is recreated -> deleted and unwanted records are dropped

Page 10

Locking

• Lock waits are bad
  – Higher user visible latency
  – Timeout failures during high load
• Dovecot v0.99 used traditional read/write index locks
  – Locking timeout problems
  – Redesigned v1.0 to do lockless reads

Page 11

Lockless reads: rename()

• For:
  – Small files
  – Rarely changing files
  – If a large part of the file changes
• Writer
  – Lock
  – If file has changed, read+update internal state
  – Write the updated data to temp file
  – rename() over the original file
  – Unlock
• Reader
  – Just read the file.

[Diagram: file versions #1 and #2; the temp file replaces the original via rename()]
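
The writer steps above can be sketched as follows. This is a minimal illustration rather than Dovecot's implementation: the function name is made up, the writer-side lock is left out for brevity, and os.replace() stands in for rename().

```python
import os
import tempfile


def atomic_rewrite(path, data):
    """Replace the contents of path so readers see either old or new data."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory (same filesystem) so the
    # final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)   # rename() over the original file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "state.dat")   # hypothetical file name
    atomic_rewrite(target, b"version 1")
    atomic_rewrite(target, b"version 2")
    with open(target, "rb") as f:
        print(f.read())
```

A reader that opened the file before the rename keeps reading the old version through its open descriptor, which is why readers need no lock. Concurrent writers still need one, as the slide says.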

Page 12

Lockless reads: Appends

• For append-only files with “size” header in each written record
• Writer
  – Lock
  – Write data with size=0
  – Write size with each byte’s highest bit set to 1
  – Unlock
• Reader
  – Read one record at a time
  – Stop when seeing a size that isn’t fully written

[Diagram: record layout with Size header and Data]

Bits    Content
0-6     Bits 0-6 of size
7       Always 1
8-14    Bits 7-13 of size
15      Always 1
etc.
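
The size encoding in the table can be sketched like this. The 4-byte field width, the least-significant-group-first byte order and all function names are assumptions made for illustration; the slide only specifies that every size byte carries 7 bits of the value and has its highest bit set.

```python
SIZE_BYTES = 4  # assumed width of the size field: 4 bytes = 28 usable bits


def encode_size(size):
    """Encode size in 7-bit groups, each byte with its highest bit set."""
    if not 0 <= size < 1 << (7 * SIZE_BYTES):
        raise ValueError("size does not fit in the size field")
    return bytes(0x80 | ((size >> (7 * i)) & 0x7F) for i in range(SIZE_BYTES))


def decode_size(field):
    """Return the size, or None if the field is not fully written yet."""
    if len(field) < SIZE_BYTES:
        return None
    if any((b & 0x80) == 0 for b in field[:SIZE_BYTES]):
        return None  # a byte without the marker bit: writer has not finished
    return sum((b & 0x7F) << (7 * i) for i, b in enumerate(field[:SIZE_BYTES]))


def read_records(buf):
    """Read records until the first one whose size is not fully written."""
    records, pos = [], 0
    while True:
        size = decode_size(buf[pos:pos + SIZE_BYTES])
        if size is None:
            break
        start = pos + SIZE_BYTES
        if start + size > len(buf):
            break
        records.append(buf[start:start + size])
        pos = start + size
    return records


if __name__ == "__main__":
    finished = encode_size(5) + b"hello"
    in_progress = bytes(SIZE_BYTES) + b"world"   # data written, size still 0
    print(read_records(finished + in_progress))
```

Because a completed size never contains a zero byte, a reader can tell a record that is still being written (size bytes still 0) from a finished one, even a finished record of length 0, without taking a lock.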

Page 13

Lockless writes in future?

• open(path, O_APPEND) usually provides atomic writes
  – Except with NFS
  – write() may also return fewer bytes than intended? (signal, out of space)
  – read() during a write may see incomplete data?
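
As a rough illustration of the concern about short writes, the sketch below appends a record with O_APPEND and reports whether the whole record went out in a single write(). The function name is hypothetical, and how a caller should recover from a partial record is deliberately left open, since the slide poses it as an unsolved question.

```python
import os


def append_record(path, record):
    """Append record with one write(); return True only if fully written."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        # With O_APPEND the kernel positions each write at end of file, so
        # local writers do not overwrite each other (NFS is the exception).
        written = os.write(fd, record)
    finally:
        os.close(fd)
    # A short write (signal, out of space) leaves a partial record behind.
    return written == len(record)


if __name__ == "__main__":
    import tempfile
    log_path = os.path.join(tempfile.mkdtemp(), "append.log")  # made-up name
    print(append_record(log_path, b"record-1\n"))
    print(append_record(log_path, b"record-2\n"))
```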

Page 14

Single-dbox

• One file per message (u.<IMAP UID>)
• Files have immutable metadata section
  – GUID, POP3 UIDL, received date, etc.
• Advantages over Maildir:
  – Filenames don’t change
  – No IMAP UID <-> filename mapping required
• Flags stored only in Dovecot index files
  – Automatically creates dovecot.index.backup once in a while
  – When fixing corruption, tries very hard to preserve flags based on (corrupted) index and backup files

Page 15

Multi-dbox

• Multiple messages in a single file (m.<id>)
  – File format same as with single-dbox
• Multiple files in a single mailbox
  – Files are about 2 MB (configurable)
    • Larger files -> less fragmentation, but deletion slower
    • Preallocation
  – Can be rotated every n days (for incremental backups)
  – Delayed (ioniced) nightly deletions (“doveadm purge”)
• Crash or power loss can’t corrupt or lose data
• Tries very hard to preserve as much data as possible in case of (filesystem) corruption.
  – Saves a backup of the original broken file

Page 16

Benchmarks

• Realistic IMAP benchmarks are difficult to do
• Depends on clients and user behavior

Page 17

Benchmarks

• Reading 10k messages via IMAP

SSD, OSX, HFS+      Uncached   Cached
mbox                2.9 s      1.6 s
Maildir             3.9 s      0.6 s
Single-dbox         3.9 s      0.6 s
Multi-dbox          1.5 s      0.4 s

HDD, Linux, ext4    Uncached   Cached
mbox                2.8 s      2.3 s
Maildir             8.0 s      0.9 s
Single-dbox         6.8 s      0.9 s
Multi-dbox          1.6 s      0.7 s

Page 18

Benchmarks: # NFS ops

• Reading 10k messages via IMAP
• Above: uncached, below: cached

[Bar chart: number of NFS operations (axis 0 to 35000) for mdbox, sdbox, Maildir and mbox, broken down into Reads, Lookup, Access and Getattr]

Page 19

Benchmarks: # NFS ops

Random IMAP commands sent with:
imaptest logout=5 msgs=1000 delete=10 expunge=10 secs=60 seed=1

L+A+G = lookup + access + getattr

[Chart: NFS operations for mbox, Maildir, sdbox and mdbox, broken down into Read, Write, Readdir, L+A+G and Other]

Page 20

New dbox-only Features

Page 21

Alternative Mail Storage

• Users rarely access their old mails
• Lower performance storage is cheaper -> Move old mails there
• dbox supports “alternative path” setting: if a u.* or m.* file isn’t found in the primary path, it’s looked up from the alternative path
  – Files could even be moved with /bin/mv
    • But easier/safer with “doveadm altmove”
  – This would be difficult with Maildir because its filenames change
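
The lookup order described above amounts to a simple fallback. The sketch below is illustrative only: the function name and directory layout are assumptions, not Dovecot's actual code or configuration.

```python
import os


def find_dbox_file(name, primary_dir, alt_dir):
    """Return the path of a u.* / m.* file, trying primary storage first."""
    for directory in (primary_dir, alt_dir):
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    import shutil
    import tempfile
    primary, alt = tempfile.mkdtemp(), tempfile.mkdtemp()
    with open(os.path.join(primary, "u.42"), "wb") as f:
        f.write(b"old mail")
    print(find_dbox_file("u.42", primary, alt))       # found in primary
    shutil.move(os.path.join(primary, "u.42"), alt)   # like /bin/mv
    print(find_dbox_file("u.42", primary, alt))       # now found in alt
```

This works only because the filename (u.42 here) never changes. With Maildir, a flag change could rename the file in between, which is the difficulty the slide points out.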

Page 22

Detached Mail Attachments

• MIME parts can be saved to external files
  – Only if they’re large enough (default: 128 kB)
  – Also can be filtered based on Content-Type, etc. headers
    • Avoid extra disk seek for downloading attachments that clients automatically display inline
• Supports saving base64 encoded MIME parts decoded (25% less disk space)
  – Only if re-encoding can be done to 100% original
• dbox-only
  – Metadata contains pointers to external parts
• Saving is done via simplified “filesystem API”
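
The "only if re-encoding can be done to 100% original" condition can be sketched as a round-trip check. The assumptions here are mine, not from the talk: every line (including the last) ends with CRLF, all lines except the last have the length of the first line, and the function names are hypothetical.

```python
import base64
import binascii


def reencode(decoded, line_len):
    """Base64-encode with fixed-length lines, each terminated by CRLF."""
    flat = base64.b64encode(decoded)
    lines = [flat[i:i + line_len] for i in range(0, len(flat), line_len)]
    return b"".join(line + b"\r\n" for line in lines)


def decode_if_reversible(encoded):
    """Return (decoded, line_len) if the original can be rebuilt exactly."""
    first_line, _, _ = encoded.partition(b"\r\n")
    line_len = len(first_line)
    if line_len == 0 or line_len % 4 != 0:
        return None
    try:
        decoded = base64.b64decode(encoded.replace(b"\r\n", b""), validate=True)
    except binascii.Error:
        return None
    # Store decoded only if re-encoding gives back the exact original bytes.
    if reencode(decoded, line_len) != encoded:
        return None
    return decoded, line_len


if __name__ == "__main__":
    body = reencode(bytes(range(256)) * 3, 76)
    decoded, line_len = decode_if_reversible(body)
    print(len(body), "->", len(decoded), "bytes, line length", line_len)
```

Base64 turns every 3 bytes into 4 characters, so the decoded form is 25% smaller before counting line breaks. Irregularly wrapped input fails the round trip and would have to be stored as-is.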

Page 23

Single Instance Storage

• Storage’s internal deduplication
  – Could be enabled only for attachment storage
• Dovecot’s SIS
  – FS API backend
  – Based on file hashes and hard links
• Hash is configurable (e.g. SHA256 + size)
  – Byte-by-byte verification after hash found
    a) Never, trust hash uniqueness (not implemented)
    b) Immediate comparison during saving
    c) Delayed (nightly) comparison and deduplication

Page 24

Dovecot SIS

• Attachments saved to “HA/SH/HASH-GUID” under global attachment dir (e.g. /var/attachments/)
  – GUID guarantees filename uniqueness
  – e.g. file with hash “123456” is saved to 12/34/123456-GUID
  – “HA” and “SH” may be symlinks to other mounts
• SIS is done by hard linking HA/SH/hashes/HASH to HA/SH/HASH-GUID if it exists.
  – Basically: “ln hashes/123456 123456-guid”
  – No attempts to create cross-mount hard links
• Safe to move/backup/restore attachment files
  – But hashes/HASH is auto-deleted only when its link count drops from 2 to 1. External changes may leak it.
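
The directory layout and hard-linking scheme can be sketched as below. This is a simplified model, not Dovecot's code: the function names are made up, SHA-256 is just one possible hash, the comparison is done immediately while saving (option b on the previous slide), and creating hashes/HASH from the first saved copy is my assumption about how the first instance gets registered.

```python
import hashlib
import os


def same_content(path, data):
    """Byte-by-byte verification against an already stored file."""
    with open(path, "rb") as f:
        return f.read() == data


def sis_save(attach_dir, data, guid):
    """Save an attachment as HA/SH/HASH-GUID, deduplicating via hard links."""
    digest = hashlib.sha256(data).hexdigest()
    subdir = os.path.join(attach_dir, digest[0:2], digest[2:4])
    hashes_dir = os.path.join(subdir, "hashes")
    os.makedirs(hashes_dir, exist_ok=True)
    target = os.path.join(subdir, digest + "-" + guid)
    canonical = os.path.join(hashes_dir, digest)
    if os.path.exists(canonical) and same_content(canonical, data):
        # Equivalent of: ln hashes/HASH HASH-GUID
        os.link(canonical, target)
    else:
        with open(target, "wb") as f:
            f.write(data)
        if not os.path.exists(canonical):
            os.link(target, canonical)   # assumed: first copy registers HASH
    return target


if __name__ == "__main__":
    import tempfile
    root = tempfile.mkdtemp()
    first = sis_save(root, b"attachment bytes", "guid-a")
    second = sis_save(root, b"attachment bytes", "guid-b")
    print(os.stat(first).st_nlink)   # both names and hashes/HASH share one inode
```

Both directories for a given hash live under the same HA/SH prefix, so the link never has to cross a mount even when "HA" or "SH" is a symlink elsewhere.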

Page 25

Full Text Search Indexes

• Dovecot has abstract FTS API
• IMAP protocol says search is about “substring matching” (e.g. “ello” matches “hello”)
  – Almost no FTS engines support this
  – Few people seem to care about this anymore
• Currently supported FTS backends:
  – Squat: Dovecot’s own indexer, supports substring matching.
    • Currently index updating is too inefficient
  – Apache Solr
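
The gap between IMAP's substring semantics and typical word-based engines can be shown with a toy comparison. Both functions are deliberately naive stand-ins, not how Squat or Solr work internally.

```python
def substring_match(text, query):
    """IMAP-style search: the query may match anywhere inside a word."""
    return query.lower() in text.lower()


def token_match(text, query):
    """Word-based search: the query must equal a whole whitespace token."""
    return query.lower() in text.lower().split()


if __name__ == "__main__":
    body = "hello world"
    print(substring_match(body, "ello"), token_match(body, "ello"))
    print(substring_match(body, "hello"), token_match(body, "hello"))
```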

Page 26

FTS: Solr

• Solr is a search engine server using Lucene
• Dovecot talks to Solr via HTTP
• Sharding via per-user fts_solr setting

Page 27

Future

• FS API used for indexes and dbox
  – Support for key-value databases
  – Asynchronous disk I/O