bigger, better, faster, more


Upload: jovita

Post on 23-Feb-2016



TRANSCRIPT

Page 1: Bigger,  Better, Faster, More
Page 2: Bigger,  Better, Faster, More

Bigger, Better, Faster, More

An Introduction to Super-Scalability

Page 3: Bigger,  Better, Faster, More

But first…
THE ARMS RACE

Page 4: Bigger,  Better, Faster, More

1 ENIAC 1 Teletype

Page 5: Bigger,  Better, Faster, More

1 Mainframe N Terminals

Page 6: Bigger,  Better, Faster, More

N Servers N Terminals

Page 7: Bigger,  Better, Faster, More

N Servers N PCs

Page 8: Bigger,  Better, Faster, More

N Web Servers N Browsers

Page 9: Bigger,  Better, Faster, More

N Web Servers N AJAX Apps

Page 10: Bigger,  Better, Faster, More

N Clusters N AJAX Apps

Page 11: Bigger,  Better, Faster, More

N Clusters N*M Phones

Page 12: Bigger,  Better, Faster, More

N Cloudlets N*M Phones

Page 13: Bigger,  Better, Faster, More

And So On…

Page 14: Bigger,  Better, Faster, More

What is Scalability?

Page 15: Bigger,  Better, Faster, More

Scalability = Ability to do More

Page 16: Bigger,  Better, Faster, More

More What?

Page 17: Bigger,  Better, Faster, More

More Processing

Page 18: Bigger,  Better, Faster, More

Processing Takes Resources

Page 19: Bigger,  Better, Faster, More

Types of Resources

• CPU
• Disk
• Memory
• Network

Page 20: Bigger,  Better, Faster, More

Types of Utilization

Time / Throughput

Space / Capacity

Page 21: Bigger,  Better, Faster, More

Types of Utilization

Time / Throughput

Space / Capacity

Complexity

Locking

Page 22: Bigger,  Better, Faster, More

Resources & Utilization

Page 23: Bigger,  Better, Faster, More

We Want More! (but how to scale?)

Page 24: Bigger,  Better, Faster, More

How to Scale

Just make it bigger (vertical scaling)

Page 25: Bigger,  Better, Faster, More

We Want Even More! (super-scalability)

Page 26: Bigger,  Better, Faster, More

Scaling Strategies

Space: Bigger
Complexity: Better
Time: Faster
Locking: More

Page 27: Bigger,  Better, Faster, More

Bigger (Space)

Not Super
• One big data store
• One big memory store
• Make it bigger
• Make it redundant
• E.g. Full activity logging

Partitioning
• Sharding / Hashing
• Growth = Add Partition
• Tradeoff: Splitting Partitions
• Tradeoff: Redundancy becomes a distribution problem

[Diagram: A, B, C, …]
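A minimal sketch of the "Sharding / Hashing" idea and the "Splitting Partitions" tradeoff, assuming simple modulo hashing. The names `shard_for` and `moved_keys` and the `user:N` key format are hypothetical, chosen only for illustration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_keys(keys, old_count: int, new_count: int):
    """Return the keys whose shard changes when the shard count changes."""
    return [k for k in keys if shard_for(k, old_count) != shard_for(k, new_count)]


# Hypothetical workload: 1000 user keys spread over 4 shards, then grown to 5.
keys = [f"user:{i}" for i in range(1000)]
moved = moved_keys(keys, 4, 5)
print(f"{len(moved)} of {len(keys)} keys move when growing from 4 to 5 shards")
```

With plain modulo hashing, adding one partition relocates many keys, which is one reason growth by "add partition" is a tradeoff rather than a free operation.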

Page 28: Bigger,  Better, Faster, More

Better (Complexity)

Not Super
• Number of objects increase
• As relations increase, add time or space requirements
• Common with graph problems
• E.g. PageRank

Distribution
• Chop up problem / workload
• Map/Reduce
• Tradeoff: coordination
• Tradeoff: network
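A small sketch of "chop up the workload" in the Map/Reduce style, assuming a word-count task. Threads stand in for separate workers here; the function names are hypothetical, and a real deployment would distribute chunks across processes or machines, which is where the coordination and network tradeoffs appear.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk of the workload."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge two partial results."""
    return a + b


def word_count(lines, workers: int = 4) -> Counter:
    # Chop the workload into one chunk per worker.
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())


lines = ["bigger better faster more", "more more faster"]
totals = word_count(lines)
```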

Page 29: Bigger,  Better, Faster, More

Faster (Time)

Not Super
• Tune your code
• Tune your database
• Tune your network
• Better hardware

Optimization
• As fast as possible
• Can’t scale as fast as growth

• Specialization – ONE thing
• Caching – Reduces work in trade for space
• Tradeoff: space
• Tradeoff: coordination
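The caching point can be sketched with Python's standard `functools.lru_cache`. The function `expensive_lookup` and the `calls` counter are hypothetical stand-ins for slow work; `maxsize` is the space given up in exchange for time.

```python
from functools import lru_cache

calls = 0  # counts how often the slow path actually runs


@lru_cache(maxsize=1024)  # space traded for time: up to 1024 cached results
def expensive_lookup(item_id: int) -> int:
    global calls
    calls += 1
    return item_id * item_id  # stand-in for a slow computation or query


results = [expensive_lookup(7) for _ in range(3)]
```

Only the first call does the work; the other two are served from memory. Once several servers each hold their own cache, keeping those copies fresh becomes the coordination tradeoff.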

Page 30: Bigger,  Better, Faster, More

More (Locking)

Not Super
• One at a time
• Serialized access

Parallelizing / Estimating
• Separate reads & writes
• Non-locking estimation
• Reduce contention
• Tradeoff: space
• Tradeoff: coordination
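One way to picture "reduce contention" and "non-locking estimation" is a striped counter: writers take one of several small locks instead of a single global one, and readers sum the stripes without locking. This is a sketch under those assumptions; `StripedCounter` is a hypothetical name, not a library class.

```python
import threading


class StripedCounter:
    """Counter split into stripes so writers rarely contend on the same lock."""

    def __init__(self, stripes: int = 8):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._counts = [0] * stripes  # extra space: one slot per stripe

    def increment(self, amount: int = 1) -> None:
        i = threading.get_ident() % len(self._counts)
        with self._locks[i]:  # only threads mapped to the same stripe wait here
            self._counts[i] += amount

    def estimate(self) -> int:
        # No locks taken: while writers are active this may miss in-flight updates.
        return sum(self._counts)


counter = StripedCounter()


def worker():
    for _ in range(1000):
        counter.increment()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = counter.estimate()  # exact here, because all writers have finished
```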

Page 31: Bigger,  Better, Faster, More

But Which Technique(s)?

Page 32: Bigger,  Better, Faster, More

It Depends!

Page 33: Bigger,  Better, Faster, More

All: Divide & Conquer

Partitions: Data & Processing
• Sharding
• Worker Processes

Coordination: Distribution & Ordering
• Queues & Managers
• Separate Read/Write Access

What does this make the system look like?

Page 34: Bigger,  Better, Faster, More

And now…
SOME THEORY

Page 35: Bigger,  Better, Faster, More

ACID: reliable transaction systems

• Atomicity – all or nothing
• Consistency – always correct
• Isolation – changesets executed independently
• Durability – once committed, stays so

Really hard to scale in one big block (although SSDs + RAM help!)
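As a small illustration of atomicity ("all or nothing"), the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `accounts` table and the `transfer` helper are hypothetical; the `CHECK` constraint exists only to force a failure midway through the second transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src: str, dst: str, amount: int) -> bool:
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 60)      # succeeds
failed = transfer(conn, "alice", "bob", 60)  # would overdraw alice: rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer credits bob before the debit fails, yet the rollback leaves no trace of the credit.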

Page 36: Bigger,  Better, Faster, More

Maybe It’s Not so Important? (it depends)

Page 37: Bigger,  Better, Faster, More

BASE is easier

• Basically Available
• Soft State
• Eventual Consistency

A node will either eventually get a change or retire
Well…still need conflict resolution

BASE is NOT ACID (get it?)

Page 38: Bigger,  Better, Faster, More

Can we have a Balanced pH?

Page 39: Bigger,  Better, Faster, More

CAP Theorem

Choose TWO:
• Consistency
• Availability
• Partition tolerance

[Diagram: Manager, Replica 1, Replica 2, Client 1, Client 2; callout: "Double Outage!"]

Page 40: Bigger,  Better, Faster, More

Designing a scalable system

Page 41: Bigger,  Better, Faster, More

It Depends!

Page 42: Bigger,  Better, Faster, More

Understand Your Scale Points

• Log
• Profile
• Tune
• Test
• Divide
• Compare
• Partition
• No, really, log a lot
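A minimal sketch of "log" and "profile" in practice: a decorator that records how long each call takes, using only the standard library. The names `timed`, `timings`, and `build_report` are hypothetical.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scale_points")
timings = {}  # function name -> list of elapsed seconds


def timed(func):
    """Log and record the wall-clock time of every call to func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            timings.setdefault(func.__name__, []).append(elapsed)
            log.info("%s took %.6f s", func.__name__, elapsed)

    return wrapper


@timed
def build_report(n: int) -> int:
    return sum(i * i for i in range(n))  # stand-in for real work


result = build_report(1000)
```

Collected timings like these show where a system actually spends its time before any partitioning or tuning decision is made.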

Page 43: Bigger,  Better, Faster, More

Fallacies of Distributed Computing

1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.

Page 44: Bigger,  Better, Faster, More

SOME “SCALY” TOOLS

Page 45: Bigger,  Better, Faster, More

CQRS Pattern

Separate operations for:
• Command – perform an action
• Query – returns data about state

• Promotes simpler programs
• Allows Command Queues
• Reduces locking
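A rough sketch of the Command/Query split with a command queue. All names (`AddLike`, `LikesService`) are hypothetical, and a single in-process `deque` stands in for a real queue or bus.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class AddLike:
    """Command: a request to change state. It returns no data."""
    user_id: str
    item_id: str


class LikesService:
    def __init__(self):
        self._commands = deque()   # command queue
        self._likes_by_user = {}   # state read by queries

    # Command side: enqueue only, nothing is returned.
    def submit(self, command: AddLike) -> None:
        self._commands.append(command)

    # A single consumer applies commands in order, so writers do not lock state.
    def process_pending(self) -> int:
        processed = 0
        while self._commands:
            cmd = self._commands.popleft()
            self._likes_by_user.setdefault(cmd.user_id, set()).add(cmd.item_id)
            processed += 1
        return processed

    # Query side: returns data and never changes state.
    def items_liked_by(self, user_id: str):
        return sorted(self._likes_by_user.get(user_id, set()))


service = LikesService()
service.submit(AddLike("u1", "i2"))
service.submit(AddLike("u1", "i1"))
before = service.items_liked_by("u1")  # commands not yet applied
applied = service.process_pending()
after = service.items_liked_by("u1")
```

Queries issued before the queue is drained see the old state, which is the eventual-consistency cost of this design.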

Page 46: Bigger,  Better, Faster, More

A Scaly Stack

SaaS
• Applications

PaaS
• Storage
• Identity
• Runtime
• Queue / Bus

IaaS
• Compute
• Block Data
• Network

Page 47: Bigger,  Better, Faster, More

Infrastructure as a Service

Component: Example
Compute: Amazon EC2, Azure Web/Worker Roles
Storage: Amazon S3, Azure TableStore
Network: Any CDN

Page 48: Bigger,  Better, Faster, More

Platform as a Service

Component: Example
Database: SQL Azure, Postgres, MySQL
NoSQL: Cassandra, Redis, BigTable, MongoDB
Cache: Memcache
Queue: Azure Service Bus
Processing: Hadoop, Storm

Page 49: Bigger,  Better, Faster, More

Application as a Service

Salesforce? (Also sort of a platform)

Whateva!

Page 50: Bigger,  Better, Faster, More

AN EXAMPLE
Cassandra

Page 51: Bigger,  Better, Faster, More

Cassandra

• A “scalable” key-value store
• Automatic partitioning
• Automatic replicas

Page 53: Bigger,  Better, Faster, More

Cassandra Data Model

Page 54: Bigger,  Better, Faster, More

So All is Good, Right?

Page 55: Bigger,  Better, Faster, More

A RELATIONAL EXAMPLE
Worse than SQL Tuning?

Page 56: Bigger,  Better, Faster, More

Our Database

Page 57: Bigger,  Better, Faster, More

Know your Access Patterns

• Get user by user id
• Get item by item id
• Get all the items that a particular user likes
• Get all the users who like a particular item

Page 58: Bigger,  Better, Faster, More

Cassandra Model #1: Relational-y

Can’t get all the items that a particular user likes (without a giant scan)

Page 59: Bigger,  Better, Faster, More

Cassandra Model #2: Indexes

N-M relationship is modeled with two tables. But Properties require secondary lookups.
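A sketch of the two-table index idea, with plain Python dicts standing in for Cassandra tables. This is not Cassandra client code; all names and sample rows are hypothetical. Every like is written twice, once per access direction, and item properties still need a second lookup.

```python
# Entity "tables": row key -> properties.
users = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}}
items = {"i1": {"title": "Slides"}, "i2": {"title": "Video"}}

# Index "tables": one per direction of the N-M relationship.
items_by_user = {}  # user id -> set of item ids
users_by_item = {}  # item id -> set of user ids


def like(user_id: str, item_id: str) -> None:
    """Record a like in both index tables."""
    items_by_user.setdefault(user_id, set()).add(item_id)
    users_by_item.setdefault(item_id, set()).add(user_id)


def titles_liked_by(user_id: str):
    """Index lookup, then a secondary lookup per item to fetch its title."""
    return sorted(items[i]["title"] for i in items_by_user.get(user_id, ()))


like("u1", "i1")
like("u1", "i2")
like("u2", "i2")
```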

Page 60: Bigger,  Better, Faster, More

Cassandra Model #3: Denormalization

Can put some data in the indexes if your queries need it. (Or serialize data.)

Page 61: Bigger,  Better, Faster, More

Cassandra Model #4: SuperColumns?

SuperColumns let you store other dimensions of data. (eek?)

Page 62: Bigger,  Better, Faster, More

Cassandra Model #5: Time order

Composite (sorted) column keys let you do neat things like time-order the mapping.
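The effect of composite sorted column keys can be imitated in plain Python with a sorted list of `(timestamp, item_id)` pairs per row. This is only an analogy, assuming integer timestamps; it is not Cassandra client code, and the names are hypothetical.

```python
import bisect

likes_by_user = {}  # row key -> sorted list of (timestamp, item_id) column keys


def like(user_id: str, timestamp: int, item_id: str) -> None:
    """Insert a column so the row stays sorted by (timestamp, item_id)."""
    bisect.insort(likes_by_user.setdefault(user_id, []), (timestamp, item_id))


def likes_since(user_id: str, since: int):
    """Range scan: items liked at or after `since`, oldest first."""
    columns = likes_by_user.get(user_id, [])
    start = bisect.bisect_left(columns, (since,))
    return [item_id for _, item_id in columns[start:]]


like("u1", 30, "i3")
like("u1", 10, "i1")
like("u1", 20, "i2")
recent = likes_since("u1", 20)
```

Because the columns are kept in order at write time, a time-range query becomes a slice rather than a scan.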

Page 63: Bigger,  Better, Faster, More

IT DEPENDS!
Roll your own model – see www.datastax.com for great data model articles

Page 64: Bigger,  Better, Faster, More

Conflict Resolution in Cassandra

• Each Tuple has a Timestamp
• Last change wins
• Requires clock synchronization
• (Working on other strategies)
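A minimal sketch of "last change wins", assuming each value carries an integer timestamp. `Cell` and `resolve` are hypothetical names, and the tie-break on value is an assumption added here so that every replica picks the same winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # assigned by the writer's clock


def resolve(a: Cell, b: Cell) -> Cell:
    """Keep the cell with the newest timestamp; break ties by value (assumption)."""
    return max(a, b, key=lambda c: (c.timestamp, c.value))


winner = resolve(Cell("old", 1), Cell("new", 2))

# Clock skew: this write happened later in real time, but the writer's clock
# was behind, so it carries a smaller timestamp and loses.
skewed = resolve(Cell("written first", 100), Cell("written later", 90))
```

The second case shows why the slide lists clock synchronization as a requirement.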

Page 65: Bigger,  Better, Faster, More

But wait, there’s more…
THE FUTURE

Page 66: Bigger,  Better, Faster, More

N*M*Q Cloudlets N*M*Q Devices

Page 67: Bigger,  Better, Faster, More

The Internet of Things
It’s coming. Can your servers handle it?

Page 68: Bigger,  Better, Faster, More

Things are Getting Smarter

• Arduino
• Netduino
• Raspberry Pi ($25)

Page 69: Bigger,  Better, Faster, More

Servers will do Server Things

• Cross-thing sharing
• Data storage
• Analysis

Page 70: Bigger,  Better, Faster, More

How Will We Survive?

• Communication
• Network Effect
• Analytics

Page 71: Bigger,  Better, Faster, More

Cell Computing

• Self-sufficient unit of scale
• All components required to operate a portion of workload
• Known performance characteristics
• Known cost to interact with other cells

Page 72: Bigger,  Better, Faster, More

THINK BIG
How big is your project?

Page 73: Bigger,  Better, Faster, More

Some Scale

• 50,000 doctors
• 100 editors
• 500GB of data

Does it matter?