AWS Meetup - Nordstrom Data Lab and the AWS Cloud

Posted on 15-Jan-2015


DESCRIPTION

The Nordstrom Data Lab is building out an API that powers product recommendations for our customers online and beyond. Recommendo, our flagship product, was built from the ground up using Node.js and AWS in a little over three months. Since launching in November 2013, we've served over three billion recommendations and survived Black Friday and Cyber Monday without breaking a sweat. We'll share what we've learned building and operating a high-traffic API on AWS platform-as-a-service offerings, focusing on Node.js, Elastic Beanstalk, and DynamoDB. Additionally, we'll discuss some of the cultural challenges and opportunities of adopting the public cloud in a large corporate IT organization. In short, we believe there are tremendous advantages to be had for enterprises willing to make the leap to the cloud.

TRANSCRIPT

Presenting: Jason Wilson & David Von Lehman

AWS and the Nordstrom Data Lab

Recommendo Overview

• RESTful product recommendations API
• Live on nordstrom.com in November
• Service emails live in January
• Lives in the AWS cloud – Elastic Beanstalk, DynamoDB, node.js
• 3rd-party rec vendors don’t tap into what is unique about Nordstrom or fashion

By the Numbers

• Over 4 billion recommendations served
• >3 million API hits per day
• 105 days between first commit and go-live (Aug 6 and Nov 19, respectively)
• 5 servers with auto-scaling to 20 (turns out we don’t need them)
• 90ms average request latency

50/50 test against incumbent vendor
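The slide's figures can be cross-checked with a little arithmetic. This sketch assumes both the first commit and go-live fell in 2013 (the description gives November 2013 for launch) and uses the slide's >3 million hits/day and 90ms average latency; the last step applies Little's law (average requests in flight = arrival rate × average latency).

```typescript
// Back-of-the-envelope checks on the "By the Numbers" slide.
// Assumption: the first commit (Aug 6) and go-live (Nov 19) both fell in 2013.

const MS_PER_DAY = 24 * 60 * 60 * 1000;
const SECONDS_PER_DAY = 24 * 60 * 60;

// Whole days between two UTC dates.
function daysBetweenUtc(from: Date, to: Date): number {
  return Math.round((to.getTime() - from.getTime()) / MS_PER_DAY);
}

// Date.UTC months are 0-indexed: 7 = August, 10 = November.
const daysToLaunch = daysBetweenUtc(new Date(Date.UTC(2013, 7, 6)), new Date(Date.UTC(2013, 10, 19)));

// 3 million API hits spread evenly over 24 hours.
const avgRequestsPerSecond = 3_000_000 / SECONDS_PER_DAY;

// Little's law: average requests in flight = arrival rate x average latency (90 ms).
const avgInFlight = avgRequestsPerSecond * 0.09;

console.log(daysToLaunch);                    // 105
console.log(avgRequestsPerSecond.toFixed(1)); // 34.7
console.log(avgInFlight.toFixed(1));          // 3.1
```

Roughly three requests in flight on average is consistent with the note that the extra auto-scaling capacity went largely unused. These are daily averages only; peak traffic (Black Friday, Cyber Monday) runs well above them.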

How We Built It

• Continuous integration and deployment from the first week
• 90+ percent code coverage
• Fewer moving parts == less to monitor, fewer ways for things to go wrong
• Fully PaaS-based to minimize sysadmin responsibilities
• How can we support this ourselves without carrying pagers?

DynamoDB

• Fully managed NoSQL database-as-a-service
• Web API with SDK support for Python, Ruby, node.js, .NET, and Java
• High-performance queries, backed by SSD
• Maintains predictable performance for data at any size through horizontal scale-out
• Auto replication across 3 availability zones
• Need to understand data access patterns up front
• Pay for only what you use/need – both storage and R/W throughput
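Because you pay for read/write throughput, capacity planning is mostly arithmetic over the access patterns identified up front. Below is a minimal sketch of DynamoDB's provisioned-capacity rules (a read capacity unit covers one strongly consistent read per second of up to 4 KB, an eventually consistent read costs half that, and a write capacity unit covers one write per second of up to 1 KB). The item sizes, request rates, and 10x peak factor are hypothetical, not Nordstrom's actual table settings.

```typescript
// Provisioned-throughput arithmetic for DynamoDB:
// - 1 read capacity unit (RCU) = one strongly consistent read/second of an item up to 4 KB;
//   an eventually consistent read costs half as much.
// - 1 write capacity unit (WCU) = one write/second of an item up to 1 KB.
// Item sizes are rounded up to the next 4 KB (reads) or 1 KB (writes) boundary.

function readCapacityUnits(itemSizeKB: number, readsPerSecond: number, eventuallyConsistent: boolean): number {
  const unitsPerRead = Math.ceil(itemSizeKB / 4);
  const strongUnits = unitsPerRead * readsPerSecond;
  return eventuallyConsistent ? Math.ceil(strongUnits / 2) : strongUnits;
}

function writeCapacityUnits(itemSizeKB: number, writesPerSecond: number): number {
  return Math.ceil(itemSizeKB) * writesPerSecond;
}

// Hypothetical workload: 3 KB recommendation items read ~35 times/second on average,
// plus a 10x multiplier for peaks; 2.5 KB items written 4 times/second.
console.log(readCapacityUnits(3, 35, true));  // 18
console.log(readCapacityUnits(3, 350, true)); // 175
console.log(writeCapacityUnits(2.5, 4));      // 12
```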

Node.js

• JavaScript on the server atop the Google V8 engine
• Asynchronous event loop makes it ideal for real-time, data-intensive applications
• Vibrant open-source community around the excellent npm package manager (50K+ packages)
• Seeing increased adoption in enterprises including Wal-Mart, LinkedIn, PayPal, Dow Jones, Microsoft, New York Times
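The event-loop point is easiest to see with two overlapping lookups. In this sketch, `fetchRecs` is a hypothetical stand-in for a non-blocking call such as a DynamoDB query, and `await Promise.resolve()` simulates waiting on I/O: while one request waits, the single thread is free to start the next.

```typescript
// Illustrates how the event loop interleaves independent requests on one thread.
// fetchRecs is hypothetical; the await stands in for non-blocking I/O.

async function fetchRecs(productId: string, log: string[]): Promise<string[]> {
  log.push(`start ${productId}`);
  // Awaiting yields control back to the event loop, so other work can start meanwhile.
  await Promise.resolve();
  log.push(`end ${productId}`);
  return [`rec-for-${productId}`];
}

async function handleTwoRequests(): Promise<string[]> {
  const log: string[] = [];
  await Promise.all([fetchRecs("a", log), fetchRecs("b", log)]);
  return log;
}

// Both lookups start before either finishes: start a, start b, end a, end b
handleTwoRequests().then((log) => console.log(log.join(", ")));
```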

JavaScript – Learn to Love It

• No type checking; errors don’t surface until runtime
• Not classical OO
• var keyword
• Callback hell
• Server debugging too hard
• But wait...

• Chrome and V8
• Dynamic can be your friend
• npm!
• express, async, mocha
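"Callback hell" is the nesting that builds up when each asynchronous step depends on the previous one. The sketch contrasts the two styles; `getCustomer` and `getSimilar` are hypothetical data-access functions in Node's error-first callback convention, and `toPromise` is a hand-rolled adapter.

```typescript
// Node-style error-first callback.
type Callback<T> = (err: Error | null, result?: T) => void;

interface Customer {
  id: string;
  lastViewedSku: string;
}

// Hypothetical data-access functions written in callback style.
function getCustomer(id: string, cb: Callback<Customer>): void {
  cb(null, { id, lastViewedSku: "sku-123" });
}

function getSimilar(sku: string, cb: Callback<string[]>): void {
  cb(null, [`${sku}-a`, `${sku}-b`]);
}

// Callback style: each dependent step nests deeper and repeats the error check.
function recsWithCallbacks(customerId: string, cb: Callback<string[]>): void {
  getCustomer(customerId, (err, customer) => {
    if (err || !customer) {
      cb(err ?? new Error("customer not found"));
      return;
    }
    getSimilar(customer.lastViewedSku, (err2, recs) => {
      if (err2) {
        cb(err2);
        return;
      }
      cb(null, recs);
    });
  });
}

// Adapt a one-argument callback-style function so it returns a promise.
function toPromise<A, T>(fn: (arg: A, cb: Callback<T>) => void): (arg: A) => Promise<T> {
  return (arg) =>
    new Promise<T>((resolve, reject) => {
      fn(arg, (err, result) => (err ? reject(err) : resolve(result as T)));
    });
}

const getCustomerAsync = toPromise(getCustomer);
const getSimilarAsync = toPromise(getSimilar);

// Same logic with async/await: straight-line code, one place to handle errors.
async function recsWithAwait(customerId: string): Promise<string[]> {
  const customer = await getCustomerAsync(customerId);
  return getSimilarAsync(customer.lastViewedSku);
}

recsWithCallbacks("c1", (err, recs) => console.log(err, recs));
recsWithAwait("c1").then((recs) => console.log(recs));
```

Node 8 and later ship `util.promisify` for the same adapter; the `async` library named on the slide attacks the same nesting problem with control-flow helpers such as `async.waterfall`.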

AWS Components

• EC2 – Provides web-scale computing as a service.

• ELB – Elastic Load Balancer. Routes incoming traffic to EC2 instances and scales up to meet demand.

• Auto-scaling group – a logical collection of EC2 instances behind an ELB
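An auto-scaling group with custom triggers boils down to a bounded threshold rule. This sketch uses the 5-instance minimum and 20-instance maximum from the "By the Numbers" slide; the metric (average CPU %), thresholds, and step sizes are hypothetical, and real triggers also apply a breach duration and cooldown before acting, which is omitted here.

```typescript
// Threshold-based scaling decision for an auto-scaling group.
// Metric, thresholds and step sizes are hypothetical; min/max match the 5-to-20 range.

interface ScalingPolicy {
  minInstances: number;
  maxInstances: number;
  upperThreshold: number; // scale out when the metric exceeds this
  lowerThreshold: number; // scale in when the metric falls below this
  scaleOutBy: number;
  scaleInBy: number;
}

function desiredCapacity(current: number, metric: number, policy: ScalingPolicy): number {
  let next = current;
  if (metric > policy.upperThreshold) {
    next = current + policy.scaleOutBy;
  } else if (metric < policy.lowerThreshold) {
    next = current - policy.scaleInBy;
  }
  // Never leave the group's configured bounds.
  return Math.min(policy.maxInstances, Math.max(policy.minInstances, next));
}

const policy: ScalingPolicy = {
  minInstances: 5,
  maxInstances: 20,
  upperThreshold: 70, // e.g. average CPU %
  lowerThreshold: 30,
  scaleOutBy: 2,
  scaleInBy: 1,
};

console.log(desiredCapacity(5, 85, policy));  // 7
console.log(desiredCapacity(5, 10, policy));  // 5 (already at the minimum)
console.log(desiredCapacity(20, 95, policy)); // 20 (already at the maximum)
```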


Elastic Beanstalk

• AWS PaaS – lightweight abstraction layer atop EC2/ELB at no additional cost
• More transparent than Azure or Heroku
• Supports Java, .NET, Python, Node.js, PHP, and Ruby
• git push deployment
• Auto-scaling group with custom triggers and auto-applied config
• Possible to configure the AMI, including yum packages, environment variables, and more
• Supports custom AMIs
• Automated health checks
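Environment variables set through Elastic Beanstalk reach a Node app as `process.env`. A minimal sketch of reading them into typed, validated config at startup; the variable names (`RECS_TABLE_NAME`, `CACHE_TTL_SECONDS`, `LOG_LEVEL`) and defaults are hypothetical, and the environment is passed in as a plain record so the function is easy to test.

```typescript
// Typed configuration read from environment variables, e.g. Elastic Beanstalk
// environment properties. Variable names and defaults are hypothetical.

interface AppConfig {
  recsTableName: string;
  cacheTtlSeconds: number;
  logLevel: "debug" | "info" | "warn" | "error";
}

type Env = Record<string, string | undefined>;

const LOG_LEVELS = ["debug", "info", "warn", "error"] as const;

function loadConfig(env: Env): AppConfig {
  const recsTableName = env["RECS_TABLE_NAME"];
  if (!recsTableName) {
    // Fail fast at startup rather than on the first request.
    throw new Error("RECS_TABLE_NAME is required");
  }

  const cacheTtlSeconds = Number(env["CACHE_TTL_SECONDS"] ?? "300");
  if (!Number.isInteger(cacheTtlSeconds) || cacheTtlSeconds < 0) {
    throw new Error(`CACHE_TTL_SECONDS must be a non-negative integer, got ${env["CACHE_TTL_SECONDS"]}`);
  }

  const level = env["LOG_LEVEL"] ?? "info";
  const logLevel = LOG_LEVELS.find((l) => l === level);
  if (!logLevel) {
    throw new Error(`LOG_LEVEL must be one of ${LOG_LEVELS.join(", ")}`);
  }

  return { recsTableName, cacheTtlSeconds, logLevel };
}

// In Node this would be called as loadConfig(process.env).
console.log(loadConfig({ RECS_TABLE_NAME: "recs-dev", LOG_LEVEL: "debug" }));
```

Validating once at boot means a misconfigured environment fails its first deployment health check instead of failing requests later.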

Continuous Deployment

Development: git push to dev branch → Jenkins CI (unit tests) → git push to EB

Production: git pull dev → git checkout master → git merge dev → git push master → Jenkins CI (unit tests) → git push to EB (prod)
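Stripped of tooling, the two-stage pipeline is a test gate plus a branch-to-environment mapping. A hedged sketch of that decision logic; the environment names are hypothetical, and in the actual pipeline Jenkins and `git push` to Elastic Beanstalk do this work.

```typescript
// Decision logic of the two-stage pipeline: a branch maps to an Elastic Beanstalk
// environment, and nothing deploys unless the unit tests pass.
// Environment names are hypothetical.

type Deployment = { environment: string } | { skipped: string };

const ENVIRONMENT_FOR_BRANCH: Record<string, string> = {
  dev: "recommendo-dev",
  master: "recommendo-prod",
};

function planDeployment(branch: string, testsPassed: boolean): Deployment {
  if (!testsPassed) {
    return { skipped: `unit tests failed on ${branch}` };
  }
  const environment = ENVIRONMENT_FOR_BRANCH[branch];
  if (environment === undefined) {
    return { skipped: `no environment mapped to branch ${branch}` };
  }
  return { environment };
}

console.log(planDeployment("dev", true));       // { environment: 'recommendo-dev' }
console.log(planDeployment("master", false));   // { skipped: 'unit tests failed on master' }
console.log(planDeployment("feature-x", true)); // { skipped: 'no environment mapped to branch feature-x' }
```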

Performance testing

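• Initial performance was poor.
• Disable DNS caching when load testing against ELB (ELB scales by adding nodes at new IP addresses, so a load generator holding a cached address keeps hitting the original nodes).
• Pre-warm ELB for higher up-front throughput
• jmeter-ec2, bees with machine guns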

Early Perf results – YIKES!

[Charts: transactions per second; response time (seconds)]
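The early-results charts plot throughput and response time. A minimal sketch of turning raw load-test samples into those measures; the sample data are made up, and tools like jmeter-ec2 report their own summaries, so this only shows the arithmetic. Percentiles tend to be more telling than averages: the single 1000 ms outlier leaves the median untouched but dominates p95.

```typescript
interface Sample {
  startMs: number;    // when the request was sent, relative to the start of the test
  durationMs: number; // response time
}

// Nearest-rank percentile over values already sorted ascending.
function percentile(sortedValues: number[], p: number): number {
  if (sortedValues.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(0, rank - 1)];
}

function summarize(samples: Sample[]): { transactionsPerSecond: number; p50Ms: number; p95Ms: number } {
  const sorted = samples.map((s) => s.durationMs).sort((a, b) => a - b);
  const firstStart = Math.min(...samples.map((s) => s.startMs));
  const lastEnd = Math.max(...samples.map((s) => s.startMs + s.durationMs));
  return {
    // Completed transactions divided by wall-clock seconds from first send to last response.
    transactionsPerSecond: samples.length / ((lastEnd - firstStart) / 1000),
    p50Ms: percentile(sorted, 50),
    p95Ms: percentile(sorted, 95),
  };
}

// Made-up samples: one request every 100 ms, with a single slow outlier.
const observed = [50, 60, 70, 80, 90, 100, 110, 120, 130, 1000];
const samples: Sample[] = observed.map((durationMs, i) => ({ startMs: i * 100, durationMs }));

const summary = summarize(samples);
console.log(summary.p50Ms, summary.p95Ms);             // 90 1000
console.log(summary.transactionsPerSecond.toFixed(2)); // 5.26
```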

Performance tuning

• New Relic, Nodetime – real-time performance monitoring of the Node runtime
• node-mem-watch – evented inspection of heap, GC events, leak events, and heap diffing
• ssh into instances

Real Performance

• Pleasantly surprised
• Average latency ~90ms
• Dynamo response times <10ms
• Handful of auto-scaling up-and-back-down events
• One outage due to bad exception handling

[Chart: DynamoDB, with labels 400% and 64%]

Lessons Learned / Pitfalls

• True zero-downtime deployment is difficult to achieve
• Thoroughly explore the Elastic Beanstalk configuration options
• Catch those errors – a rogue unhandled exception can bring it all down
• Health checks that actually do something
• Out-of-the-box monitoring is pretty good
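Two of these lessons fit in a few lines. The sketch wraps handlers so a thrown error or rejected promise becomes a 500 response instead of an unhandled exception, and builds a health check on a dependency probe, so an instance that cannot reach its data reports itself unhealthy to the load balancer (an HTTP health check treats a non-200 response as a failure). `HttpResult`, `safe`, and `healthCheck` are hypothetical names, not Express or Elastic Beanstalk APIs. A try/catch only covers the awaited path; an error thrown inside a plain callback (for example one passed to `setTimeout`) still escapes and must be handled where it is raised.

```typescript
// Hypothetical minimal response shape; not an Express or Elastic Beanstalk type.
interface HttpResult {
  status: number;
  body: string;
}

type Handler = () => Promise<HttpResult>;

// Wrap a handler so a thrown error or rejected promise becomes a 500 response
// instead of an unhandled exception that can take the whole process down.
function safe(handler: Handler): Handler {
  return async () => {
    try {
      return await handler();
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { status: 500, body: `internal error: ${message}` };
    }
  };
}

// A health check that exercises a real dependency instead of always answering 200.
// `probe` is hypothetical, e.g. a cheap read against the recommendations table.
function healthCheck(probe: () => Promise<void>): Handler {
  return safe(async () => {
    await probe();
    return { status: 200, body: "ok" };
  });
}

// Usage: a failing dependency turns into an unhealthy (500) response.
healthCheck(async () => {
  throw new Error("table unreachable");
})().then((res) => console.log(res.status, res.body)); // 500 internal error: table unreachable
```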

Harness the Cloud

[Chart: % of time spent on infrastructure vs. experience for On-Premise, IaaS, and PaaS]

Agility vs. Industrial Strength

• Industrial strength: logging, monitoring, redundancy, deployment automation, high availability, scalability
• Agility: iterative development, build to experiment, evolutionary architecture, change tolerant, frequent releases, small teams
• Do both!

Security

PaaS Venn Diagram

[Venn diagram with labels: Robust Systems, Rapid Delivery, Platypus as a Service]

Recommendo 2.0

• SKU-based recommendations – size!
• Truly personalized recs based on individual browse and purchase history

[Architecture diagram with components: DynamoDB, Batch Recs, Real-Time, Refinery, Scorer, Ingester, Redis, Streams]

Additional AWS Services

• ElastiCache and Redis
• Elastic Beanstalk worker tiers
• SQS
• S3
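The slides don't say exactly how Redis fits into the Recommendo 2.0 design. One common pattern for putting ElastiCache/Redis in front of DynamoDB is cache-aside, sketched here under stated assumptions: a `Map` stands in for Redis (which can expire keys natively, e.g. `SET key value EX seconds`), the loader function stands in for a DynamoDB read of batch recs, and `TtlCache`/`getRecs` are hypothetical names. The current time is passed in explicitly so expiry is deterministic.

```typescript
// Cache-aside lookup: try the cache first, fall back to the backing store on a miss,
// then populate the cache with a time-to-live. A Map stands in for Redis and a
// loader function stands in for the DynamoDB batch-recommendations table.

interface Entry<V> {
  value: V;
  expiresAt: number;
}

class TtlCache<V> {
  private readonly entries = new Map<string, Entry<V>>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined || entry.expiresAt <= now) {
      this.entries.delete(key); // drop expired entries lazily
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now: number): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

async function getRecs(
  sku: string,
  cache: TtlCache<string[]>,
  loadFromStore: (sku: string) => Promise<string[]>,
  now: number,
): Promise<string[]> {
  const cached = cache.get(sku, now);
  if (cached !== undefined) return cached;
  const recs = await loadFromStore(sku);
  cache.set(sku, recs, now);
  return recs;
}

// Usage: count how often the backing store is actually read.
let storeReads = 0;
const loader = async (sku: string): Promise<string[]> => {
  storeReads++;
  return [`${sku}-similar`];
};
const cache = new TtlCache<string[]>(60_000);

async function demo(): Promise<void> {
  await getRecs("sku-1", cache, loader, 0);      // miss: reads the store
  await getRecs("sku-1", cache, loader, 30_000); // hit: served from cache
  await getRecs("sku-1", cache, loader, 61_000); // expired: reads the store again
  console.log(storeReads); // 2
}

demo();
```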

Wrap-Up

• Recommendo – initial success, now building upon what we have learned

• Node.js + DynamoDB + Elastic Beanstalk is a winning combination

• Possible to outperform an incumbent vendor solution in a competitive, differentiating capability

• Cloud and PaaS enable small teams to move quickly and deliver solid, production-caliber systems

• Incremental cost of “gold plating” steadily shrinking

• Your company benefits when the percentage of resources devoted to core competency is maximized

Thank you

• Questions / comments?
• @davidvlsea
• ds@nordstrom.com
