© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Christian Beedgen
October 2015
5 Years of Building SaaS on AWS
A Story by Sumo Logic
$ whoami
Co-Founder & CTO, Sumo Logic
Cloud-based Machine Data Analytics Service
Applications, Operations, Security
Chief Architect, ArcSight
Major SIEM player in the enterprise space
Log Management for security and compliance
From Data to Decisions
DEVOPS
Streamline continuous delivery
Monitor KPIs and Metrics
Accelerate Troubleshooting
IT INFRASTRUCTURE AND OPERATIONS
Monitor all workloads
Troubleshoot and increase uptime
Simplify, modernize, and save costs
COMPLIANCE AND SECURITY
Automate and demonstrate compliance
Audit all systems
Think beyond rules
Cloud Analytics Platform
From Data to Decisions
[Diagram: the Cloud Analytics Platform with DevOps, IT Infrastructure and Operations, and Compliance and Security; one collector each in Customer A Cloud, Customer A Data Center, Customer B Data Center, and Customer B Cloud]
Why SaaS?
Because enterprise software sucks™
Too much pain for the customer
Time spent running the system is not spent using the system
Expensive once hardware and people are added in
Disastrous for the vendor
No control over the runtime, hard to diagnose problems
Kills innovation because each release lives forever
Why AWS?
We are developers, not data center people
AWS has turned the data center into an API
As developers, we understand reuse (libraries, OSs, …)
Today’s systems require reuse on a higher level
Do you really want to care for 4,000 machines? HA? DR?
Anti-monolithic
In previous gigs, we dealt with monolithic systems
With Sumo, we knew what we needed to build, no MVP required
Get data into the system, index it, provide query function
So we had a logical breakdown immediately
And we knew it had to scale…
…not just to the biggest customer, but to all customers!
Ingestion Path
[Diagram labels: Receiver, Bus, Index, Raw, CQ, S3]
Analytics Path
[Diagram labels: Query, Service, CQ, S3]
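The three steps named earlier (get data in, index it, provide a query function) can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the `Bus` and `Index` classes, the in-memory queue, and the whitespace tokenizer are all hypothetical stand-ins, not Sumo Logic's implementation, and the Raw, CQ, and S3 boxes from the diagrams are left out.

```python
from collections import defaultdict, deque


class Bus:
    """In-memory stand-in for the message bus between receiver and indexer."""

    def __init__(self):
        self._queue = deque()

    def publish(self, message):
        self._queue.append(message)

    def drain(self):
        # Yield messages in arrival order until the queue is empty.
        while self._queue:
            yield self._queue.popleft()


class Index:
    """Tiny inverted index: token -> set of message ids."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._raw = []

    def add(self, text):
        message_id = len(self._raw)  # ids are assigned in insertion order
        self._raw.append(text)
        for token in text.lower().split():
            self._postings[token].add(message_id)

    def search(self, token):
        ids = sorted(self._postings.get(token.lower(), set()))
        return [self._raw[i] for i in ids]


def receive(bus, lines):
    """Receiver: accept raw log lines and hand them to the bus."""
    for line in lines:
        bus.publish(line)


def index_from_bus(bus, index):
    """Indexer: consume everything currently on the bus and index it."""
    for text in bus.drain():
        index.add(text)


bus, index = Bus(), Index()
receive(bus, ["ERROR disk full", "INFO login ok", "ERROR timeout"])
index_from_bus(bus, index)
print(index.search("error"))  # the query step
```

Because the receiver and the indexer only share the bus, each side could in principle be scaled on its own, which is the point of the logical breakdown.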
Scale Today
50 TB of new incoming data per day
Double-digit PB of data under management
>2,000,000 queries/day
Thousands of instances in 4 regions globally
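To put the daily figures into per-second terms, here is a small back-of-the-envelope calculation. It assumes decimal units (1 TB = 10**12 bytes) and a perfectly even spread over the day, which real traffic will not have, so treat the results as averages only. The query figure on the slide is a lower bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures from the slide; decimal units are an assumption.
ingest_bytes_per_day = 50 * 10**12
queries_per_day = 2_000_000  # the slide says "more than"

avg_ingest_mb_per_s = ingest_bytes_per_day / SECONDS_PER_DAY / 10**6
avg_queries_per_s = queries_per_day / SECONDS_PER_DAY

print(f"average ingest: {avg_ingest_mb_per_s:.0f} MB/s")
print(f"average query rate: {avg_queries_per_s:.1f} queries/s")
```

Under these assumptions the averages come out at roughly 579 MB/s of new data and at least 23 queries per second.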
Divide & Conquer
3 to 1000s of instances!
Each box in the previous images is an application
Blast radius, bulkheading, separation of concerns
Not everything will break all the time: repair engines, not the plane
Not everybody will need to work on everything all the time
What We Actually Did
Compose applications from layers of modules
Whole system is Scala on top of the JVM
One Maven POM per module, one main() per application
Initially one GitHub repository per module, today just one project
Right size AWS instance for each application cluster
Each application exposes a façade
Avro over HTTP, or Avro over HornetQ, or Avro over Kafka
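One way to read "each application exposes a façade" is that callers see plain methods while the wire transport stays swappable. The sketch below illustrates that idea only. `Transport`, `InProcessTransport`, `IndexFacade`, and `index_handler` are hypothetical names, the transport runs in process instead of over HTTP, HornetQ, or Kafka, and JSON stands in for Avro to keep the example dependency-free.

```python
import json
from abc import ABC, abstractmethod


class Transport(ABC):
    """Carries an encoded request to an application and returns the reply."""

    @abstractmethod
    def call(self, payload: bytes) -> bytes: ...


class InProcessTransport(Transport):
    """Hypothetical transport that invokes the handler directly (no network)."""

    def __init__(self, handler):
        self._handler = handler

    def call(self, payload: bytes) -> bytes:
        return self._handler(payload)


def encode(message: dict) -> bytes:
    # JSON is a stand-in; the talk names Avro as the actual encoding.
    return json.dumps(message).encode("utf-8")


def decode(payload: bytes) -> dict:
    return json.loads(payload.decode("utf-8"))


class IndexFacade:
    """Client-side facade: callers see methods, not the transport."""

    def __init__(self, transport: Transport):
        self._transport = transport

    def count(self, term: str) -> int:
        reply = self._transport.call(encode({"op": "count", "term": term}))
        return decode(reply)["count"]


def index_handler(payload: bytes) -> bytes:
    """Server side of the hypothetical index application."""
    request = decode(payload)
    docs = ["error disk full", "info login ok", "error timeout"]
    hits = sum(request["term"] in doc.split() for doc in docs)
    return encode({"count": hits})


facade = IndexFacade(InProcessTransport(index_handler))
print(facade.count("error"))
```

Swapping the transport would mean passing a different `Transport` subclass to the façade; the calling code would not change.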
How I Actually Visualize Microservices
2 to the power of 5 services (“32”), 170+ modules
Don’t even ask about the # of dependencies
At least 3 of each – everything is a separately scalable cluster
Service Discovery
Loose coupling in the large…
A deployment is made up of many things
Some of these things need to talk to each other
Some of these things come and go
Don’t pass in a huge list of static dependencies
Start each application with one parameter
$ bin/receiver prod.service-registry.sumologic.com
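The one-parameter startup can be sketched as follows. The `ServiceRegistry` class and its `register`, `deregister`, and `lookup` methods are a hypothetical API made up for this illustration, the addresses are invented, and the `REGISTRIES` table replaces a real network lookup so that the example runs on its own. It is not how Sumo Logic's registry works.

```python
class ServiceRegistry:
    """In-memory stand-in for a service registry (hypothetical API)."""

    def __init__(self):
        self._instances = {}

    def register(self, service, address):
        self._instances.setdefault(service, set()).add(address)

    def deregister(self, service, address):
        self._instances.get(service, set()).discard(address)

    def lookup(self, service):
        return sorted(self._instances.get(service, set()))


# Hypothetical table mapping registry host names to registries, so that the
# example needs no network access.
REGISTRIES = {"prod.service-registry.sumologic.com": ServiceRegistry()}


def start_receiver(registry_host, own_address):
    """Start with one parameter; discover everything else at runtime."""
    registry = REGISTRIES[registry_host]
    registry.register("receiver", own_address)
    return registry


registry = start_receiver("prod.service-registry.sumologic.com", "10.0.0.5:8080")
registry.register("index", "10.0.1.7:9000")
registry.register("index", "10.0.1.8:9000")
print(registry.lookup("index"))

registry.deregister("index", "10.0.1.7:9000")  # instances come and go
print(registry.lookup("index"))
```

The receiver never receives a static list of index addresses. It asks the registry each time, so instances that come and go are picked up without a restart.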
Anti-singletenant
Multi-dimensional scaling is predicated on multitenancy
This is a data processing platform – cost matters!
Autoscaling single tenants is too fine-grained for us
Also, efficiency… one code line “master” in deployment
Customers aren’t pets, they are cattle
Yum yum yum… FEATURE FLAGS!!!
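A plausible reading of the feature-flag slide is that one deployed code line can still behave differently per tenant. The sketch below shows the idea with a plain dictionary. The `FeatureFlags` class, the flag name, and the tenant ids are hypothetical, and a real system would keep the flags in a shared store.

```python
class FeatureFlags:
    """Per-tenant feature flags over a plain dict (hypothetical store)."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._overrides = {}  # (tenant_id, flag) -> bool

    def set_for_tenant(self, tenant_id, flag, enabled):
        if flag not in self._defaults:
            raise KeyError(f"unknown flag: {flag}")
        self._overrides[(tenant_id, flag)] = enabled

    def is_enabled(self, tenant_id, flag):
        # A tenant-specific override wins; otherwise use the global default.
        return self._overrides.get((tenant_id, flag), self._defaults[flag])


flags = FeatureFlags({"new_query_engine": False})
flags.set_for_tenant("tenant-42", "new_query_engine", True)

for tenant in ("tenant-42", "tenant-7"):
    engine = "new" if flags.is_enabled(tenant, "new_query_engine") else "old"
    print(tenant, "->", engine)
```

Both tenants run the same code; only the flag lookup differs, so a feature can be rolled out tenant by tenant without a separate branch or deployment.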
Just one typical Sumo Logic customer: 8x variance!
Money flushed down the toilet
Load per tenant fluctuates wildly, but aggregated system load just goes up slowly
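The smoothing effect of pooling tenants can be illustrated with a small simulation. Everything in it is an assumption for illustration: 100 tenants, 24 hourly samples, and each tenant's load drawn independently and uniformly between 1x and 8x of a common baseline. Real tenants are neither independent nor uniform, and the simulation does not model growth over time.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
TENANTS, HOURS = 100, 24

# Assumption: independent hourly loads between 1x and 8x of baseline,
# loosely mirroring the 8x variance on the slide.
loads = [[rng.uniform(1.0, 8.0) for _ in range(HOURS)] for _ in range(TENANTS)]


def peak_to_trough(series):
    """Ratio of the busiest sample to the quietest sample."""
    return max(series) / min(series)


tenant_ratios = [peak_to_trough(series) for series in loads]
aggregate = [sum(loads[t][h] for t in range(TENANTS)) for h in range(HOURS)]

print(f"mean per-tenant peak/trough: {sum(tenant_ratios) / TENANTS:.1f}x")
print(f"aggregate peak/trough:       {peak_to_trough(aggregate):.2f}x")
```

Sizing each tenant separately would mean provisioning for every tenant's own peak. Under these assumptions the pooled load stays much closer to its average, which is the cost argument for multitenancy.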
Anti-manual
We use Jenkins, of course
We still build system versions as cross-cuts and QA them
We are busy moving toward true continuous delivery
Application Groups for things that evolve together…
…and that can be deployed together
Prod, Long, Stag, Nite
dsh: Another AWS Deployment Tool
Model-driven, describe desired state, run to make it so
High performance due to parallelization
Covers all layers of the stack – AWS, OS, Sumo Logic
Easy to use and extend, scriptable CLI
Developer-friendly, Scala-based, high-level APIs
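"Describe desired state, run to make it so" is a reconciliation pattern, and a minimal version can be sketched as below. This is not dsh (which the talk describes as Scala-based). The function names, the cluster names, and the instance counts are hypothetical, and `apply_action` only returns a string where a real tool would call cloud APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(desired, actual):
    """Diff desired vs. actual instance counts into a list of actions."""
    actions = []
    for cluster in sorted(set(desired) | set(actual)):
        delta = desired.get(cluster, 0) - actual.get(cluster, 0)
        if delta > 0:
            actions.append(("launch", cluster, delta))
        elif delta < 0:
            actions.append(("terminate", cluster, -delta))
    return actions


def apply_action(action):
    """Pretend to carry out one action; a real tool would call cloud APIs."""
    verb, cluster, count = action
    return f"{verb} {count} x {cluster}"


def converge(desired, actual):
    """Apply all planned actions in parallel and return what was done."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        # pool.map returns results in the order of the planned actions.
        return list(pool.map(apply_action, plan(desired, actual)))


desired = {"receiver": 3, "index": 5, "query": 3}
actual = {"receiver": 3, "index": 2, "legacy": 1}
for line in converge(desired, actual):
    print(line)
```

Running `converge` a second time against a state that already matches produces an empty plan, which is what makes the model safe to re-run.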
[Diagram labels: Data Access Layer, Delivery, Authentication & Authorization, Metering, Monitoring, Ordering, Provisioning, Billing, Analytics, Resource Management, SaaS Application(s), Business Services, Core Platform Services, Interaction, Application, Additional Applications, Application Lifecycle Management]
[AWS services shown: EC2, Route53, S3, Glacier, CloudFront, DynamoDB, RDS, ElastiCache, RedShift, WorkSpaces, CloudWatch, CloudTrail, IAM, CodeDeploy, Beanstalk, CloudFormation, OpsWorks, SWF, EMR, Kinesis, SNS, Mobile Analytics, Cognito, Directory Service, CloudSearch, AppStream, SES, SQS, XCode, Data Pipeline]
3 ELBs (service, api, receiver)
EC2, obviously
RIs, dabbling with Spot
SES for alert emails to our customers
SQS for user registration from corporate website
Petabytes of S3
ElastiCache Memcache for client object caches
DynamoDB for feature flags and configuration
RDS MySQL for configuration and content objects
SimpleDB for deployment location
Sumo Logic
CloudWatch, CloudTrail
Sumo Logic!
Zuora for billing
Jenkins, GitHub
Our own automation framework – “dsh”
CloudFormation for Mesos cluster setup
Integrations
Generic S3 Collection
Amazon S3 Audit
Elastic Load Balancing
Amazon CloudFront
AWS CloudTrail
Amazon VPC Flow Logs
AWS Config
What Does the Future Hold?
Super happy to see Amazon EFS introduced
Borderline unnaturally excited about AWS KMS
Planning on using AWS Lambda as a “plugin system”
Implementing Mesos for new services
Very excited about Docker to enable better utilization
Thank You!
@raychaser