one billion notes as 'small data' (dave engberg)

One Billion Notes as ‘Small Data’ Dave Engberg, CTO

Upload: ontico

Post on 12-Jul-2015


Page 1: One billion notes as 'Small Data' (Dave Engberg)

One Billion Notes as ‘Small Data’

Dave Engberg, CTO

Evernote Overview

Launched in 2008

“Freemium” subscription service

Mobile and desktop

33.3 million registered accounts

1.12 billion notes

1.6 billion unique attachments

Evernote Product Family

Food

Clearly

Peek

Hello

Skitch

Penultimate

Worldwide Reach

Last 30 days

Accounts created 1.9 million

Distinct users 11 million

Active clients 1337

New notes 77 million

Note edits 152 million

HTTP(S) requests 14 billion

Image OCR 90 million
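The 30-day totals are easier to compare with server capacity as per-second rates. This minimal sketch assumes a window of exactly 30 days and uniform traffic, so it gives averages only; peak rates will be higher.

```python
# Convert the slide's 30-day totals into average per-second rates.
# Assumption: a 30-day window of exactly 30 * 86,400 seconds and uniform
# traffic; real peaks will exceed these averages.
SECONDS_PER_30_DAYS = 30 * 24 * 60 * 60  # 2,592,000 s

monthly_totals = {
    "new notes": 77_000_000,
    "note edits": 152_000_000,
    "HTTP(S) requests": 14_000_000_000,
    "image OCR": 90_000_000,
}


def average_rate_per_second(total: int, window_s: int = SECONDS_PER_30_DAYS) -> float:
    """Average events per second over the window."""
    return total / window_s


for name, total in monthly_totals.items():
    print(f"{name}: ~{average_rate_per_second(total):,.1f}/s")
```

By this arithmetic, 14 billion requests a month averages out to roughly 5,400 HTTP(S) requests per second, while new notes arrive at about 30 per second.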

Architecture philosophy

Only optimize for your application’s top 1-2 challenges

Evernote CPU

Evernote service requirements

Component | Size | Avg Load | Peak Load
CPU | - | Low | Medium
Bandwidth | | |
Latency | | |
File storage | | |
Metadata | | |

Network load

Smooth traffic (no bursts)

Nearly symmetric

Sync clients: not latency-sensitive

100 TB in, 180 TB out per month
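Monthly transfer volumes translate into an average link rate. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bit/s) and a 30-day month.

```python
# Convert monthly transfer volumes into an average bit rate.
# Assumptions: decimal units and a 30-day month.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def tb_per_month_to_mbps(tb: float) -> float:
    """Average megabits per second needed to move `tb` terabytes in a month."""
    bits = tb * 10**12 * 8
    return bits / SECONDS_PER_MONTH / 10**6


inbound = tb_per_month_to_mbps(100)
outbound = tb_per_month_to_mbps(180)
print(f"in:  ~{inbound:.0f} Mbps average")
print(f"out: ~{outbound:.0f} Mbps average")
```

That works out to roughly 309 Mbps in and 556 Mbps out on average, which is consistent with the smooth, nearly symmetric traffic described above and sits below the ~800 Mbps peak quoted on the networking slide.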

Evernote service requirements

Component | Size | Avg Load | Peak Load
CPU | - | Low | Medium
Bandwidth | - | Medium | Medium
Latency | - | Low | Low
File storage | | |
Metadata | | |

File Storage

2.8 billion attachments

1.6 billion unique files

374 TB, de-duplicated

Effectively permanent

High redundancy required (stored 3x)
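The gap between 2.8 billion attachments and 1.6 billion unique files is what de-duplication saves. The slides do not say how Evernote de-duplicates; this hypothetical sketch assumes content addressing, keying each blob by its SHA-256 hash and reference-counting duplicates, with an in-memory dict standing in for real storage (`DedupStore` is an invented name).

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical content-addressed store: identical bytes are kept once and
# reference-counted. SHA-256 keys and in-memory dicts are assumptions for
# illustration, not Evernote's actual storage layer.


@dataclass
class DedupStore:
    blobs: dict[str, bytes] = field(default_factory=dict)
    refcounts: dict[str, int] = field(default_factory=dict)

    def put(self, data: bytes) -> str:
        """Store `data` once per unique content and return its content key."""
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blobs:
            self.blobs[key] = data  # first copy: actually written
        self.refcounts[key] = self.refcounts.get(key, 0) + 1
        return key

    def stored_bytes(self) -> int:
        """Bytes held after de-duplication (before any replication)."""
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
for attachment in [b"receipt.png bytes", b"logo bytes", b"logo bytes"]:
    store.put(attachment)

print(len(store.refcounts), "unique of", sum(store.refcounts.values()), "attachments")
```

Replication is applied after de-duplication: storing each of the 374 TB of unique data three times means about 1,122 TB of raw disk.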

Evernote service requirements

Component | Size | Avg Load | Peak Load
CPU | - | Low | Medium
Bandwidth | - | Medium | Medium
Latency | - | Low | Low
File storage | High | Low | Medium
Metadata | | |

Metadata

Strong ACID transactional DB: < 10 TB; peak read IOPS: 350; peak write IOPS: 50

Near-realtime search: < 10 TB; peak read IOPS: 800; peak write IOPS: 500

Evernote service requirements

Component | Size | Avg Load | Peak Load
CPU | - | Low | Medium
Bandwidth | - | Medium | Medium
Latency | - | Low | Low
File storage | High | Low | Medium
Metadata | Low | Medium | High

Sharded architecture

Hardware: SuperMicro 1U, 2x L5630 CPU, 96 GB RAM, 6x 300 GB Intel SSD, LSI RAID 5 (+spare), ~$8,000

Hardware: SuperMicro 4U, 1x L5630 CPU, 12 GB RAM, 24x 3 TB (Seagate), LSI RAID 6 (x3), ~$12,000

Software: Tomcat, Java 6, MySQL 5.1, DRBD, Xen, Debian stable

Software: Apache, mod_dav, Debian stable
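The usable capacity of the two builds follows from standard RAID overheads. This sketch assumes that RAID 5 gives up one disk to parity and RAID 6 two, that the "(+spare)" SSD is one of the six drives and holds no data, and that "(x3)" means the 24 disks form three equal RAID 6 arrays; filesystem overhead is ignored.

```python
# Usable capacity of the two server builds listed above.
# Assumptions: RAID 5 loses one disk to parity, RAID 6 loses two; one of the
# six SSDs is a hot spare; the 24 HDDs form three 8-disk RAID 6 arrays.


def raid5_usable(disks: int, disk_size: float) -> float:
    return (disks - 1) * disk_size


def raid6_usable(disks: int, disk_size: float) -> float:
    return (disks - 2) * disk_size


# Shard box: 6x 300 GB SSD, one kept as a hot spare, five in RAID 5.
shard_gb = raid5_usable(disks=6 - 1, disk_size=300)

# Storage box: 24x 3 TB split into three 8-disk RAID 6 arrays.
arrays, disks_per_array = 3, 24 // 3
storage_tb = arrays * raid6_usable(disks=disks_per_array, disk_size=3)

print(f"shard box:   {shard_gb:.0f} GB usable")
print(f"storage box: {storage_tb:.0f} TB usable")
```

Under these assumptions the storage box yields 54 TB, matching the "WebDAV server, 54 TB" figure used in the file storage cost comparison.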

265 shards

Around 400 Linux servers overall
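In a sharded design like this, each user's data lives on exactly one shard, so every request must first be routed to the right one. The slides give the shard count and a 200,000-users-per-shard figure but do not describe how users are assigned. This hypothetical sketch assumes a directory lookup where new accounts fill the newest shard until it is full; `ShardDirectory`, `assign`, and `lookup` are invented names.

```python
from dataclasses import dataclass, field

# Hypothetical directory-based sharding: users are pinned to one shard and
# new accounts fill the newest shard until it reaches capacity. The scheme
# and all names here are illustrative assumptions.


@dataclass
class ShardDirectory:
    capacity: int = 200_000
    user_to_shard: dict[int, int] = field(default_factory=dict)
    shard_sizes: list[int] = field(default_factory=lambda: [0])

    def assign(self, user_id: int) -> int:
        """Place a new user on the newest shard, opening a new one when full."""
        if user_id in self.user_to_shard:
            return self.user_to_shard[user_id]
        if self.shard_sizes[-1] >= self.capacity:
            self.shard_sizes.append(0)
        shard = len(self.shard_sizes) - 1
        self.shard_sizes[shard] += 1
        self.user_to_shard[user_id] = shard
        return shard

    def lookup(self, user_id: int) -> int:
        """Route a request: all reads and writes for a user hit one shard."""
        return self.user_to_shard[user_id]


directory = ShardDirectory(capacity=2)  # tiny capacity to show rollover
for uid in [101, 102, 103]:
    directory.assign(uid)
print(directory.lookup(103))
```

Pinning a user to one shard keeps that user's transactions local to a single MySQL instance, at the cost of needing a lookup step before each request.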

Tiers of a cloud

Cloud Provider Strengths

Applications with bursts: bandwidth, storage, compute

CPU-bound applications

Applications with low or fixed storage, low-to-medium IOPS

Fewer operations staff

Evernote’s Service

Consistent network usage

Consistent compute usage

File storage grows indefinitely: users * time

Random-IOPS bound

CPU + Metadata comparison

Evernote shard

440 GB usable per VM; sysbench: 5,000 read/write IOPS; $8,000

Over 4 years: $166/month

200,000 users/shard: $0.01/user/year

AWS EC2 + EBS

High-Memory 2XL: $300/month

Provisioned IOPS EBS volume (max 1,000 IOPS): $155/month

Total: $455/month
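The per-shard figures above follow from simple amortization. This sketch reproduces them, assuming straight-line amortization of the $8,000 server over 48 months with no power, space, or staff costs included (the slide does not list any).

```python
# Reproduce the shard-versus-EC2 cost arithmetic from the slide.
# Assumption: straight-line amortization over 48 months, hardware cost only.
SHARD_PRICE = 8_000
MONTHS = 4 * 12
USERS_PER_SHARD = 200_000

shard_monthly = SHARD_PRICE / MONTHS                  # ~$166.67/month
per_user_year = shard_monthly * 12 / USERS_PER_SHARD  # ~$0.01/user/year

aws_monthly = 300 + 155  # High-Memory 2XL instance + provisioned-IOPS EBS

print(f"shard: ${shard_monthly:.2f}/month, ${per_user_year:.3f}/user/year")
print(f"AWS:   ${aws_monthly}/month ({aws_monthly / shard_monthly:.1f}x)")
```

On these numbers the EC2 option costs about 2.7 times as much per month, while its EBS volume is capped at 1,000 IOPS against the 5,000 IOPS measured on the shard.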

File storage

Evernote WebDAV

WebDAV server, 54 TB: $12,100

Over 4 years: $250/month, $4.70/TB/month

With triple redundancy: $14/TB/month

AWS S3

$99/TB/month
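The per-TB storage figures can be checked the same way. This sketch assumes 48-month straight-line amortization, 54 TB usable per WebDAV server, and three stored copies of every file, and takes the slide's $99/TB/month S3 figure as given.

```python
# WebDAV storage cost per TB, following the slide's arithmetic.
# Assumptions: 48-month amortization, 54 TB usable per server, 3 copies.
SERVER_PRICE = 12_100
USABLE_TB = 54
MONTHS = 48
COPIES = 3
S3_PER_TB = 99  # slide's S3 price, $/TB/month

monthly = SERVER_PRICE / MONTHS       # ~$252/month
per_tb = monthly / USABLE_TB          # ~$4.67/TB/month
per_tb_replicated = per_tb * COPIES   # ~$14/TB/month

print(f"${per_tb:.2f}/TB/month, ${per_tb_replicated:.2f} with {COPIES} copies")
print(f"S3 is ~{S3_PER_TB / per_tb_replicated:.1f}x more per TB")
```

Applied to the 374 TB of unique files, that is roughly $5,200 per month in-house versus roughly $37,000 per month at the quoted S3 rate.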

Networking comparison

Evernote managed

$5/Mbps/month

~800 Mbps peak

$4,000/month

AWS networking (out)

~$0.08/GB

180 TB/month

$13,700/month
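The two networking models bill differently: managed transit on peak bandwidth, cloud egress on volume transferred. This sketch assumes a flat $0.08/GB egress rate and decimal units (180 TB = 180,000 GB). AWS egress pricing is tiered, which may be why the slide's $13,700 comes in somewhat below the flat-rate product.

```python
# Networking cost comparison from the slide.
# Assumptions: managed transit billed on peak Mbps; AWS egress billed at a
# flat ~$0.08/GB (real AWS pricing is tiered); decimal GB.
managed = 5 * 800          # $5 per Mbps * ~800 Mbps peak
egress_gb = 180 * 1_000    # 180 TB/month in decimal GB
aws_flat = egress_gb * 0.08

print(f"managed: ${managed:,}/month")
print(f"AWS at a flat $0.08/GB: ${aws_flat:,.0f}/month")
```

Either way, billing on a smooth ~800 Mbps peak costs around a third of paying per GB for 180 TB of outbound traffic.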