Design Tradeoffs in Distributed Systems: How Southwest Airlines Uses Geode


Upload: pivotal

Post on 12-Jan-2017


Design Tradeoffs in Distributed Systems

How Southwest Uses Apache Geode

Brian Dunlap @brianwdunlap

Technical Lead - Aircraft Systems

March 8th, 2016

APACHE GEODE AT SOUTHWEST

OPS SUITE, CARGO, CREW, SOUTHWEST.COM

Optimize decisions with integrated schedule information.

Grow over the next 10 years.

Show a real-time view of our operational day to 1,000s of web users.

Scale across data centers and ensure consistency.

Support our most critical operational systems.

Southwest’s Network Operations Control integrates decision makers.

[Route map: Southwest destinations across the United States, Mexico, Central America, and the Caribbean]

NOC: CREW, PASSENGER, MAINTENANCE, FLIGHT, GATE, CARGO, AIRCRAFT, FACILITY

OPS SUITE CONSUMES 10M JMS MESSAGES DAILY

RECOVERY OPTIMIZATION USES OVER 1,000,000 SCHEDULES, WITH A RESPONSE TIME MEASURED IN SECONDS

RECOVERY OPTIMIZATION DRIVES 4,000 FLIGHTS, 700 AIRCRAFT, 500K PASSENGERS / DAY

Luv!

Thinking about boundaries

TEAMS: SOFTWARE FOCUS, ORG FOCUS
DOMAINS: CORE DOMAIN, SUPPORTING DOMAIN
GEODE: NODES ACROSS AZs, GC, WAN
PATTERNS: PARALLEL PROCESSING, ASYNC BEHAVIOR

Domain tradeoffs


What do you own?

What do you need?

How long can you keep it?

Books by: @ericevans0, @VaughnVernon

Get Organized! Domain-Driven Design (DDD)

CORE DOMAIN, SUPPORTING DOMAIN, UBIQUITOUS LANGUAGE, AGGREGATES, DOMAIN EVENTS

What do you own? (core) <invest>

What do you need? (supporting) <simplify>

How long can you keep it?<intentional>

Crew Maint Pax Cargo Flight Gate

Existing domain silos…

OVER 15 YEARS OF COMPLEXITY

Crew Maint Pax Cargo Flight Gate

CORE DOMAINS, SUPPORTING DOMAINS

INCREMENTAL STEPS

SELECTED INTEGRATION

What do you own? (core) <focus>

What do you need? (supporting) <simplify>

How long can you keep it?<intentional>

Adding is very easy. Watch out for data that’s around for too long.

Does all of this data need to be in memory?

Data at rest for a long time? (>365 days)

GEODE REGION SIZES

Determine if each subdomain should use Geode.

Don’t make an automatic decision.


Maybe it needs an entirely different home?
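One way to answer the in-memory question with numbers instead of habit is a quick audit of each region. The sketch below uses a hypothetical, library-free helper, `audit_region`. It assumes you have already exported each entry's key, last-modified time and serialized size. The export step is not shown and is not a Geode API. The helper reports how many bytes have been at rest longer than the 365-day threshold.

```python
from datetime import datetime, timedelta

# Threshold from the slide: data at rest for more than 365 days.
AT_REST_THRESHOLD = timedelta(days=365)


def audit_region(entries, now):
    """Split a region's entries into hot and at-rest bytes.

    entries: dict of key -> (last_modified: datetime, size_bytes: int)
    Returns (hot_bytes, at_rest_bytes, sorted list of at-rest keys).
    """
    hot_bytes = 0
    at_rest_bytes = 0
    at_rest_keys = []
    for key, (last_modified, size_bytes) in entries.items():
        if now - last_modified > AT_REST_THRESHOLD:
            at_rest_bytes += size_bytes
            at_rest_keys.append(key)
        else:
            hot_bytes += size_bytes
    return hot_bytes, at_rest_bytes, sorted(at_rest_keys)


# Hypothetical entries keyed by made-up schedule IDs.
entries = {
    "SCHED-1": (datetime(2016, 3, 1), 400),
    "SCHED-2": (datetime(2014, 1, 1), 1_000),
    "SCHED-3": (datetime(2015, 1, 1), 600),
}
hot, at_rest, old_keys = audit_region(entries, now=datetime(2016, 3, 8))
share = at_rest / (hot + at_rest)  # fraction of the region's bytes that are cold
```

In this made-up example, 80% of the bytes are cold. That kind of result suggests the data may need an entirely different home.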


Pattern tradeoffs


How far? How fast?

The chart of scalability!

OLD → NEW
NORMALIZED JOINS → REGIONS FOR READS, REGIONS FOR AGGREGATES
BLOCKING THREADS → ASYNC (AKKA / ACTORS)
ACTIVE / PASSIVE → ACTIVE / ACTIVE
MUTABLE STATE → IMMUTABILITY / EVENT SOURCING, DATA CONVERGENCE
CRUD → CQRS / DDD, EVENT DRIVEN

ServiceManagerHandlerImpl

We’re learning!


We write immutable domain events into event regions.

Clients receive events using Geode CQs.

Clients checkpoint their position into separate regions.

Event regions expire messages.

checkpointing
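The event-region pattern can be sketched without Geode to show how the pieces fit together. The sketch has four parts:

- Immutable events.
- A listener standing in for a continuous query (CQ).
- A checkpoint kept in a separate map.
- Time-to-live expiry.

Everything here is a hypothetical, in-memory illustration, not Geode's API. That includes `EventRegion`, `Client`, the sequence numbering and the 60-second TTL.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DomainEvent:
    """Immutable domain event: frozen=True rejects attribute assignment."""
    seq: int
    created_at: float  # seconds
    payload: str


class EventRegion:
    """In-memory stand-in for an event region with time-to-live expiry."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.entries: Dict[int, DomainEvent] = {}
        self.listeners: List[Callable[[DomainEvent], None]] = []
        self.next_seq = 1

    def register_listener(self, listener: Callable[[DomainEvent], None]) -> None:
        # Plays the role of a CQ: invoked for every newly written event.
        self.listeners.append(listener)

    def append(self, payload: str, now: float) -> DomainEvent:
        event = DomainEvent(self.next_seq, now, payload)
        self.next_seq += 1
        self.entries[event.seq] = event
        for listener in self.listeners:
            listener(event)
        return event

    def expire(self, now: float) -> None:
        # Drop events older than the TTL, as entry expiration would.
        self.entries = {s: e for s, e in self.entries.items()
                        if now - e.created_at <= self.ttl}

    def events_after(self, seq: int) -> List[DomainEvent]:
        return [self.entries[s] for s in sorted(self.entries) if s > seq]


class Client:
    def __init__(self, client_id: str, events: EventRegion, checkpoints: Dict[str, int]):
        self.client_id = client_id
        self.events = events
        self.checkpoints = checkpoints  # separate "checkpoint region"
        self.seen: List[str] = []

    def handle(self, event: DomainEvent) -> None:
        self.seen.append(event.payload)
        self.checkpoints[self.client_id] = event.seq  # record position after processing

    def catch_up(self) -> None:
        # After a restart, replay unexpired events newer than the checkpoint.
        for event in self.events.events_after(self.checkpoints.get(self.client_id, 0)):
            self.handle(event)


events = EventRegion(ttl_seconds=60)
checkpoints: Dict[str, int] = {}
first = Client("gate-view", events, checkpoints)
events.register_listener(first.handle)
events.append("WN100 delayed", now=0)
events.append("WN200 canceled", now=10)
events.listeners.clear()                 # simulate the client going away
events.append("WN300 diverted", now=20)  # written while nobody is listening

restarted = Client("gate-view", events, checkpoints)
restarted.catch_up()                     # resumes after checkpoint 2
events.expire(now=75)                    # events from t=0 and t=10 expire
```

Keeping the checkpoint outside the event region lets a restarted client resume where it left off. Expiry bounds how far back it can resume.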

Akka Cluster manages Actor Singletons which coordinate parallel processing based on a logical groupId.
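The idea of one coordinator per logical groupId can be sketched with plain Python threads. Each group gets its own single-worker executor. Messages within a group are processed in order, while different groups proceed in parallel. This only mimics what the slide attributes to Akka Cluster Singletons. It is not Akka, it has no clustering or failover, and the groupIds are made up.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class GroupCoordinator:
    """One single-threaded lane per logical groupId.

    Messages for the same group run strictly in submission order,
    while different groups run in parallel on their own lanes.
    """

    def __init__(self) -> None:
        self._lanes: Dict[str, ThreadPoolExecutor] = {}
        self._lock = threading.Lock()
        self.processed: Dict[str, List[int]] = {}

    def _lane_for(self, group_id: str) -> ThreadPoolExecutor:
        # Lazily create exactly one lane per group (the "singleton" for that group).
        with self._lock:
            if group_id not in self._lanes:
                self._lanes[group_id] = ThreadPoolExecutor(max_workers=1)
            return self._lanes[group_id]

    def _process(self, group_id: str, msg_id: int) -> None:
        # Stand-in for real work; records the order messages were handled in.
        with self._lock:
            self.processed.setdefault(group_id, []).append(msg_id)

    def submit(self, group_id: str, msg_id: int):
        return self._lane_for(group_id).submit(self._process, group_id, msg_id)

    def shutdown(self) -> None:
        with self._lock:
            lanes = list(self._lanes.values())
        for lane in lanes:  # wait outside the lock so workers can finish
            lane.shutdown(wait=True)


coordinator = GroupCoordinator()
for i in range(5):
    coordinator.submit("WN1234", i)  # hypothetical groupIds (flight numbers)
    coordinator.submit("WN5678", i)
coordinator.shutdown()
```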

Backpressure is implemented through a competing consumer pattern. Take a look at Akka Streams!

All Geode replicate regions use distributed-ack scope. We don’t want to converge (where some write wins).

coordination (*important concept)
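Backpressure through competing consumers can be shown with a bounded queue from Python's standard library. Several consumers pull from the same queue. Because the queue has a fixed capacity, the producer blocks when consumers fall behind. The slide points to Akka Streams for the real implementation. This sketch only illustrates the pattern, and `run_competing_consumers` and its parameters are hypothetical.

```python
import queue
import threading


def run_competing_consumers(messages, consumers=3, capacity=10):
    """Several consumers compete for work on one bounded queue.

    The bound is the backpressure: when consumers fall behind, put() blocks
    the producer instead of letting an unbounded backlog build up in memory.
    Returns (sorted results, peak queue depth seen by the producer).
    """
    work: queue.Queue = queue.Queue(maxsize=capacity)
    results = []
    results_lock = threading.Lock()

    def consume() -> None:
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this consumer
                return
            with results_lock:
                results.append(item * 2)  # stand-in for real processing

    threads = [threading.Thread(target=consume) for _ in range(consumers)]
    for t in threads:
        t.start()
    peak_depth = 0
    for m in messages:
        work.put(m)  # blocks while the queue is full
        peak_depth = max(peak_depth, work.qsize())
    for _ in threads:
        work.put(None)  # one sentinel per consumer
    for t in threads:
        t.join()
    return sorted(results), peak_depth


results, peak = run_competing_consumers(range(100), consumers=3, capacity=10)
```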

JMS adapter
Command adapters
Command handlers - to CQ clients
View model builders - to CQ clients
JMS publishers

data flow
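The data flow can be read as a chain of small functions. The sketch below is a hypothetical, in-process version of that chain for a single gate change. The `FLIGHT|GATE` wire format, the command and event types, and all function names are invented for illustration. In the real system, each stage would talk to JMS or Geode rather than to local dicts and lists.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ChangeGate:  # command
    flight: str
    new_gate: str


@dataclass(frozen=True)
class GateChanged:  # immutable domain event
    flight: str
    old_gate: str
    new_gate: str


def jms_adapter(raw: str) -> ChangeGate:
    # Translate an inbound message into a command (hypothetical format).
    flight, gate = raw.split("|")
    return ChangeGate(flight, gate)


def command_handler(cmd: ChangeGate, gates: Dict[str, str]) -> GateChanged:
    # Write side: update current state and emit an event describing the change.
    old = gates.get(cmd.flight, "")
    gates[cmd.flight] = cmd.new_gate
    return GateChanged(cmd.flight, old, cmd.new_gate)


def view_model_builder(event: GateChanged, board: Dict[str, str]) -> None:
    # Read side: project the event into a display-ready view model.
    board[event.flight] = f"Gate {event.new_gate}"


def jms_publisher(event: GateChanged, outbox: List[str]) -> None:
    # Outbound: publish the change downstream (here, append to a list).
    outbox.append(f"{event.flight}:{event.old_gate}->{event.new_gate}")


gates: Dict[str, str] = {}
board: Dict[str, str] = {}
outbox: List[str] = []
for raw in ["WN100|A1", "WN100|A4"]:
    event = command_handler(jms_adapter(raw), gates)
    view_model_builder(event, board)
    jms_publisher(event, outbox)
```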

PUSH or PULL: How do we scale expensive read I/O?

Contain expensive reads

With CQRS view model builders, perform the heavy, state-enriching “select *” once. Push read updates instead of polling (Geode CQs).
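To see why pushing beats polling for expensive reads, count how often the heavy query runs. This sketch is hypothetical: no Geode is involved, and the client and poll counts are arbitrary. In it, 1,000 polling clients each run the query on every poll. A single view model builder runs it once per change and pushes the result to every subscriber. The push side stands in for delivering updates through Geode CQs.

```python
class ExpensiveStore:
    """Stand-in for a heavy, state-enriching "select *"; counts executions."""

    def __init__(self) -> None:
        self.rows = {}
        self.reads = 0

    def select_all(self) -> dict:
        self.reads += 1
        return dict(self.rows)


def polling(store: ExpensiveStore, clients: int, polls: int) -> None:
    # Every client re-runs the expensive query on every poll, changed or not.
    for _ in range(polls):
        for _ in range(clients):
            store.select_all()


def pushing(store: ExpensiveStore, clients: int, changes) -> dict:
    # One view model builder runs the query once per change and pushes
    # the same snapshot to every subscribed client.
    views = {}
    for key, value in changes:
        store.rows[key] = value
        snapshot = store.select_all()
        for client in range(clients):
            views[client] = snapshot
    return views


poll_store, push_store = ExpensiveStore(), ExpensiveStore()
polling(poll_store, clients=1_000, polls=60)
changes = [("WN100", "delayed"), ("WN200", "on time"), ("WN100", "boarding")]
views = pushing(push_store, clients=1_000, changes=changes)
```

Polling costs 60,000 heavy reads here, while pushing costs one per change (3). Both leave every client with the same view.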

Conflate triggering view model rebuild events
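Conflation means that when several events arrive for the same view model before it is rebuilt, only one rebuild runs, using the latest state. Here is a minimal sketch, assuming each trigger carries a view key and a sequence number (both hypothetical):

```python
from typing import Iterable, List, Tuple


def conflate(triggers: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Collapse a burst of rebuild triggers to one per view key.

    triggers: (view_key, event_seq) pairs in arrival order.
    Keeps the latest seq for each key; keys stay in first-seen order
    because reassigning an existing dict key keeps its position.
    """
    latest = {}
    for key, seq in triggers:
        latest[key] = seq
    return list(latest.items())


burst = [("WN100", 1), ("WN200", 2), ("WN100", 3), ("WN100", 4), ("WN300", 5)]
rebuilds = conflate(burst)  # 5 triggers become 3 rebuilds
```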

Be careful with timeouts!

Be careful with alerts!

Be careful with joins!

Be careful with large values!

Be careful with old habits!

safety tips

Teams

Distributed systems are created by distributed teams.

Communication coordination is a thing.

Integrate Geode security with a directory
Tune JVM size and GC
Deploy and upgrade environments
Size and configure VMs
Support production events
Enable WAN Gateway Senders / Receivers
Load snapshots between environments
Automate starting and stopping clusters
Teach distributed concepts, like CAP

How do we share new distributed system responsibilities?

DBAs, UNIX, DEVs, Middleware, Release Management, Offshore Support, New Geode Team, DevOps

EARLIER IS BETTER

Learn to luv conversation tension.

When there’s tension, you’re on the right track!

opssuite-allschedule-core.jar

Use separate repos to help with boundaries.
Align teams with repo ownership.
Minimize jar dependencies across teams.

EMBRACE a 100X MENTALITY!

What does 100x mean? (msg/sec)

Normal rate: 50
Busy rate: 500
Recover rate: 5,000

HOW FAST CAN WE RECOVER?
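The 100x rates answer "how fast can we recover?" once you add a backlog. Assume a hypothetical 30-minute outage at the busy rate. The backlog is then 500 msg/sec × 1,800 sec = 900,000 messages. It drains at the difference between processing capacity and the rate at which new messages keep arriving:

```python
import math

NORMAL_RATE = 50      # msg/sec, from the slide
BUSY_RATE = 500       # 10x normal
RECOVER_RATE = 5_000  # 100x normal


def drain_seconds(backlog: int, capacity: float, incoming: float) -> float:
    """Seconds to clear a backlog while new messages keep arriving."""
    if capacity <= incoming:
        return math.inf  # the backlog never shrinks
    return backlog / (capacity - incoming)


outage_seconds = 30 * 60                  # hypothetical 30-minute outage
backlog = BUSY_RATE * outage_seconds      # 900,000 queued messages
at_100x = drain_seconds(backlog, RECOVER_RATE, BUSY_RATE)      # 200 seconds
at_2x_busy = drain_seconds(backlog, 2 * BUSY_RATE, BUSY_RATE)  # 1,800 seconds
at_busy = drain_seconds(backlog, BUSY_RATE, BUSY_RATE)         # never catches up
```

A system sized only for the busy rate never catches up. Sized for 100x normal, it is current again in a little over three minutes.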

Create great learning resources

Watch out for old habits!

Geode


Prefer less-shared disk I/O (local to a VM rack, or dedicated).

Prefer larger + fewer Geode nodes (4 larger nodes vs. 8 smaller ones).

Take advantage of availability zones (AZs).

CONVERSATION LEADERSHIP ACROSS TEAMS

SHARED or SHARED LESS: What infrastructure supports Geode?
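Taking advantage of AZs mostly means never letting a primary and its redundant copy share a zone. Geode members can set a `redundancy-zone` property for this. Geode then avoids placing redundant copies of partitioned data in members of the same zone. The sketch below only illustrates that constraint with a simple rotation over zones. It is not Geode's actual placement algorithm, and the member and zone names are made up.

```python
def place_buckets(members, buckets, redundant_copies=1):
    """Assign each bucket a primary plus redundant copies, one per zone.

    members: dict of member name -> availability zone.
    Returns dict of bucket id -> [primary, redundant...].
    """
    zones = {}
    for name, zone in sorted(members.items()):
        zones.setdefault(zone, []).append(name)
    zone_names = sorted(zones)
    if redundant_copies + 1 > len(zone_names):
        raise ValueError("need at least as many zones as copies")
    placement = {}
    for bucket in range(buckets):
        slot = bucket // len(zone_names)  # which member to use inside a zone
        copies = []
        for i in range(redundant_copies + 1):
            # Rotate the starting zone per bucket so primaries spread across zones.
            zone = zone_names[(bucket + i) % len(zone_names)]
            in_zone = zones[zone]
            copies.append(in_zone[slot % len(in_zone)])
        placement[bucket] = copies
    return placement


# Four larger nodes split across two hypothetical AZs.
members = {"node1": "az1", "node2": "az1", "node3": "az2", "node4": "az2"}
placement = place_buckets(members, buckets=4)
```

Each bucket's copies land in different AZs, and every node ends up primary for exactly one bucket. Losing a whole zone therefore leaves one copy of everything.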

Know your memory (and GC) limits.

Watch out for slow heap growth that triggers continuous GC.

-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=60
-Xloggc:/your/path/node-name.GC.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCCause
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=20
-XX:GCLogFileSize=5M

Check out GCViewer for GC log analysis.
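Slow heap growth shows up in the GC log as a rising floor: the old-generation occupancy left right after each collection. Once that floor reaches the CMSInitiatingOccupancyFraction (60 in the flags above), the old generation is already past the trigger when a cycle ends. The next concurrent cycle then tends to start right away. The minimal sketch below estimates how many collections remain before that happens. It assumes you have already extracted post-GC occupancy percentages from the log, for example with GCViewer; the parsing step is not shown.

```python
import math
from typing import List, Optional


def cycles_until_threshold(post_gc_percent: List[float],
                           threshold: float = 60.0) -> Optional[int]:
    """Estimate GC cycles left before live data alone crosses the threshold.

    post_gc_percent: old-gen occupancy (%) measured right after each GC.
    Returns 0 if already at or over the threshold, None if the floor is
    flat or falling, otherwise a least-squares linear extrapolation.
    """
    n = len(post_gc_percent)
    if n < 2:
        return None
    latest = post_gc_percent[-1]
    if latest >= threshold:
        return 0
    mean_x = (n - 1) / 2
    mean_y = sum(post_gc_percent) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(post_gc_percent))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den  # percentage points gained per GC cycle
    if slope <= 0:
        return None
    return math.ceil((threshold - latest) / slope)


growing = cycles_until_threshold([40, 42, 44, 46, 48])  # floor rises 2 points per cycle
steady = cycles_until_threshold([45, 44, 45, 44])       # no upward trend
```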

Essential tool for real-time decision optimization testing!

Helpful for QA performance and functional testing.

Wonderful Geode feature!

WAN Gateway

Optimization binary consumes PDX via C++ Native Client

Moving > 200 MB per optimization request

Be careful with refactoring PDX data types!

C++ Native Client
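A quick back-of-the-envelope check shows why moving more than 200 MB per optimization request matters when responses are measured in seconds. The link speeds below are assumptions for illustration. The result is only a lower bound: it ignores PDX serialization, protocol overhead and contention.

```python
def wire_seconds(payload_mb: float, link_gbps: float) -> float:
    """Lower bound on time to move one payload over a link.

    Uses decimal units (1 MB = 10**6 bytes, 1 Gbps = 10**9 bits/sec) and
    ignores serialization, protocol overhead, and contention.
    """
    bits = payload_mb * 10**6 * 8
    return bits / (link_gbps * 10**9)


one_gig = wire_seconds(200, 1)   # about 1.6 s per request
ten_gig = wire_seconds(200, 10)  # about 0.16 s per request
```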

Questions
