scalability and performance
TRANSCRIPT
Copyright © 2014 HashedIn Technologies Pvt Ltd. All rights reserved. – CONFIDENTIAL
Scalability & Performance
Principles & Techniques
What is a Performance Problem?
System is Slow for a Single User
What is a Scalability Problem?
Fast for a Single User, but Slow under Heavy Load
How do you measure Performance?
Response Time for 1 User
i.e. how long the user waits
How do you measure Scalability?
Number of Users that can work simultaneously with acceptable performance
95% of time is spent in frontend
But, we won’t talk about frontend performance improvements today.
Let’s talk about backend
Relationship between Performance & Scalability
Performance & Scalability Mantra
Strive for maximum throughput with acceptable response times
To Improve Scalability...
Improve Performance
OR
Add Capacity
Poor Performance & Scalability
Response Time : 1s
Workers : 4
Machines : 1
4 requests/second
Slowest response : 1s
Better Performance & Scalability
Response Time : 500ms
Workers : 4
Machines : 1
8 requests/second
Slowest response : 1s
Even Better Performance ...
Response Time : 100ms
Workers : 4
Machines : 1
40 requests/second
Slowest response : 1s
Great Performance & Scalability
Response Time : 10ms
Workers : 4
Machines : 1
400 requests/second
Slowest response : 1s
You cannot make response time 0ms
To support more than 400 requests/second…
Increase Number of Workers
With double the workers...
Response Time : 10ms
Workers : 8
Machines : 2
800 requests/second
Slowest response : 1s
and with even more workers...
Response Time : 10ms
Workers : 16
Machines : 4
1600 requests/second
Slowest response : 1s
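The requests/second figures on these slides follow from an idealized relation: if each worker serves one request at a time, throughput is the number of workers divided by the response time. The sketch below reproduces the numbers under that assumption (perfectly parallel workers, no contention). The function name `max_throughput` is illustrative and not from the slides.

```python
def max_throughput(workers: int, response_time_ms: int) -> float:
    """Idealized requests/second when each worker serves one request at a time."""
    return workers * 1000 / response_time_ms


# (workers, response time in ms) for each scenario on the slides
scenarios = [(4, 1000), (4, 500), (4, 100), (4, 10), (8, 10), (16, 10)]
for workers, response_ms in scenarios:
    print(f"{workers} workers @ {response_ms}ms -> "
          f"{max_throughput(workers, response_ms):.0f} requests/second")
```

Real systems rarely reach this ceiling, which is the point of the slides that follow.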
Coming Back To Reality
Increasing Capacity Should
Increase Throughput
It Does Not.
Not Until You Design Your Application Correctly.
Why?
You have to increase capacity at each layer.
Web & App Server, Cache Server, Database
CPU, Network, Disk
And there are locks.
Database Locks, Synchronized Code Blocks, Mutexes, File System Locks
What should our goal be?
1. Reduce Response Times
2. Make it possible to add more Capacity
Scaling Data Storage
Request path: User, Browser, Edge Server, Load Balancer, Web Server, App Server, REST API, Database
Caching layers along the path:
Browser Cache - All Browsers, All Mobiles
CDN - Akamai, AWS CloudFront
Web Accelerator - Varnish
Nginx / Apache - mod_proxy
Object Cache - Static Variables, Redis, Memcached, EHCache
Browser Cache end: Less Granularity, More Effective
Object Cache end: More Granularity, Less Effective
HTTP Caching Strategy
Cache Forever
Cache-Control: max-age
Expires:
➔ Use for static resources - css, images, js
➔ Change URL in HTML when resource is modified
➔ Use a pre-processor to simplify management
Do Not Cache
Cache-Control: max-age=0
Expires: 1970-01-01
➔ Use for HTML pages with dynamic content
➔ Avoid for static resources
Cache Temporarily
Cache-Control: max-age=3600
Expires: <now plus 1 hour>
➔ Use for high traffic public html pages - e.g. homepage
➔ Specify etag or expires header to use conditional GET
➔ Use javascript to load user specific data
➔ Avoid for static resources
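As a rough illustration of the three strategies, the sketch below builds the `Cache-Control` and `Expires` headers for each one. Several details are assumptions rather than content from the slides: the function name `cache_headers`, the strategy labels, and the choice of one year as a stand-in for "forever". It uses `max-age=0` for the do-not-cache case because HTTP defines `max-age` as a non-negative number of seconds.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

ONE_YEAR = 365 * 24 * 3600  # assumption: "forever" approximated as one year
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def cache_headers(strategy: str, now: datetime) -> dict:
    """Return response headers for 'forever', 'temporary' or 'none'."""
    if strategy == "forever":
        max_age = ONE_YEAR
    elif strategy == "temporary":
        max_age = 3600  # one hour, as on the slide
    elif strategy == "none":
        max_age = 0
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    # An Expires date in the past marks the response as already stale.
    expires = EPOCH if max_age == 0 else now + timedelta(seconds=max_age)
    return {
        "Cache-Control": f"max-age={max_age}",
        "Expires": format_datetime(expires, usegmt=True),
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for name in ("forever", "temporary", "none"):
        print(name, cache_headers(name, now))
```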
Cache Temporarily
Render this using JS after page load
Cache for 2h.
Don’t overdo. You cannot change the URL of your homepage if the content has to change.
Cache Static Files Permanently
Static files are loaded from browser cache
Cache Forever
If base.css changes, serve it from base.css?v=2
New URL. Fresh Download.
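The slide uses a manual version number (`?v=2`). One common variant, sketched here as an assumption rather than as the deck's method, derives the version from a hash of the file contents, so the URL changes exactly when the file does. The helper name `versioned_url` is hypothetical.

```python
import hashlib


def versioned_url(path: str, content: bytes) -> str:
    """Append a short content hash so a changed file gets a new URL."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{path}?v={digest}"


if __name__ == "__main__":
    print(versioned_url("base.css", b"body { color: black; }"))
    print(versioned_url("base.css", b"body { color: navy; }"))
```

A build step or template helper would typically call something like this when rendering the HTML, which matches the slide's advice to use a pre-processor.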
Cache-Control: private
Logged-in, user-specific pages or APIs.
Only Cached By Browser
CDN, Web Accelerator, Proxies and Web Servers will not cache.
Ask These Questions
1. How often does the data change?
2. Can you tolerate stale data? For how long?
3. How critical is the data? Can you lose some of it?
Caching Strategy : Split Objects
Split objects based on frequency of change.
Freshers create their profile once, but apply to jobs very often.
Before: one object
Fresher: Personal Details, Education & Work Ex, Job Application Status
After: two objects
Fresher Profile: Personal Details, Education & Work Ex
Job Applications: Job Application Status
Create Two Objects With Different TTLs
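A minimal sketch of the split, assuming a simple in-process cache with per-entry expiry. The class `TTLCache`, the key names, and the TTL values (one day for the profile, one minute for applications) are illustrative assumptions; a real deployment would more likely use Redis or Memcached.

```python
import time


class TTLCache:
    """Tiny in-memory cache where every entry has its own time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value


PROFILE_TTL = 24 * 3600   # assumption: profile rarely changes
APPLICATIONS_TTL = 60     # assumption: application status changes often


def cache_fresher(cache, fresher_id, profile, applications):
    """Store the two halves of a fresher separately, each with its own TTL."""
    cache.set(f"fresher:{fresher_id}:profile", profile, PROFILE_TTL)
    cache.set(f"fresher:{fresher_id}:applications", applications, APPLICATIONS_TTL)
```

With this layout, a change in application status expires or invalidates only the small, short-lived entry, while the profile stays cached.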
Cache In-Memory
➔ Frequently Accessed Data
➔ Infrequently Changing Data
➔ Configuration & Settings
Put them in a settings file
Make It Easy To Deploy Just The Settings
Manually Clear Cache
Cache Forever, Delete as Needed
➔ Dynamic Settings
➔ Throttles & Blacklists
Delete on Modification, Rebuild on Read
When data changes, delete it from the cache. Next read will automatically fill up the cache.
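The pattern can be sketched as follows. The class `CachedStore` is hypothetical, and a plain dict stands in for the real database so the example stays self-contained.

```python
class CachedStore:
    """Delete on modification, rebuild on read (often called cache-aside)."""

    def __init__(self, database: dict):
        self.database = database  # stand-in for a real database
        self.cache = {}
        self.db_reads = 0         # counts how often we had to hit the database

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: load from the database and fill the cache.
        self.db_reads += 1
        value = self.database[key]
        self.cache[key] = value
        return value

    def write(self, key, value):
        self.database[key] = value
        # Do not update the cached copy; delete it and let the next read rebuild it.
        self.cache.pop(key, None)
```

Deleting instead of updating keeps the write path simple: only the read path knows how to build a cache entry.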
DB Design For Scalability
First, Know your Data
1. Sizing: How many records in 6 months / 1 year?
2. Query Volumes: How many reads / writes?
3. Hot Tables: Most frequently accessed tables?
4. Criticality of Data: How important is it to not lose data?
5. Availability v/s Consistency: Which matters more for this data, staying usable or staying identical across copies?
What is Availability?
Ensuring your system can be used anytime
What is Consistency?
Data is in the same state across all copies
You can’t get both...
When the copies cannot reach each other, you can choose only one: Consistency or Availability
Two Salesmen Selling Apartments
➔ Each has a diary of sold flats
➔ Call and confirm before selling
What happens when one salesman is offline, and a customer calls?
1. He takes the order. But there is a chance the other salesman also sold the same flat… Not Consistent.
2. He does not take the order… Not Available.
Different Data, Different Guarantees
Sales Transactions must be Consistent
Product Catalog must be Available
Start with a Normalized Schema...
…which essentially means no redundant data.
Optimize Heavy Read Operations
Selectively de-normalize to eliminate joins.
1. Counts of objects
2. Summary Statistics
3. Events / Activities
Denormalize - Number of people watching this Issue
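One way to denormalize a watcher count is to store it on the issue row and update it in the same transaction that adds the watcher. The sketch below uses SQLite from the Python standard library. The table and column names (`issues`, `issue_watchers`, `watcher_count`) are assumptions for illustration, not taken from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues (
    id INTEGER PRIMARY KEY,
    title TEXT,
    watcher_count INTEGER NOT NULL DEFAULT 0  -- denormalized copy of COUNT(*)
);
CREATE TABLE issue_watchers (
    issue_id INTEGER,
    user_id INTEGER,
    PRIMARY KEY (issue_id, user_id)
);
""")


def add_watcher(conn, issue_id, user_id):
    """Insert the watcher and bump the stored count in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO issue_watchers (issue_id, user_id) VALUES (?, ?)",
            (issue_id, user_id),
        )
        conn.execute(
            "UPDATE issues SET watcher_count = watcher_count + 1 WHERE id = ?",
            (issue_id,),
        )


def watcher_count(conn, issue_id):
    """Read the count without joining or counting the watchers table."""
    row = conn.execute(
        "SELECT watcher_count FROM issues WHERE id = ?", (issue_id,)
    ).fetchone()
    return row[0]
```

The read becomes a single-row lookup, at the cost of one extra write per change.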
Not everything needs to be Accurate
Choose wisely between Accuracy and Performance
Create a Separate Reporting Schema
1. Use a Star Schema
2. Aggregate Data
a. by time: hour, day, week, month, quarter, year
b. by region: north, south, east, west, central
Use a Search Engine
1. Relational DB as Source of Truth
2. Solr or Elasticsearch as Index
3. Cron Job to update Search Engine
Things to Avoid in an RDBMS
1. Don’t store files in DB
2. Don’t create task queues
3. Don’t maintain counters
Use Read Replicas
Use Master-Slave Replication, and use Slaves for Reads.
Only use for non-transactional reads.
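A rough sketch of how an application might route queries under this rule: writes and anything inside a transaction go to the master, other reads rotate across the replicas. The class `ReplicaRouter` and its parameters are hypothetical, and plain strings stand in for real database connections.

```python
import itertools


class ReplicaRouter:
    """Send writes and transactional reads to the primary, other reads to replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; None means there are no replicas configured.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def connection_for(self, is_write: bool, in_transaction: bool = False):
        if is_write or in_transaction or self._replicas is None:
            return self.primary
        return next(self._replicas)


if __name__ == "__main__":
    router = ReplicaRouter("master", ["replica-1", "replica-2"])
    print(router.connection_for(is_write=True))    # master
    print(router.connection_for(is_write=False))   # replica-1
    print(router.connection_for(is_write=False))   # replica-2
```

Replicas can lag behind the master, which is why reads that must see a transaction's own writes stay on the master.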
Shard your Data
1. Choose shard key wisely
Location is usually a poor choice
2. Sharding later is painful
If you think you may need it, shard upfront.
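The slide does not say which shard key to prefer. One common approach, shown here as an assumption, is to hash a high-cardinality key such as a user id so that rows spread evenly across shards, whereas a key like location tends to concentrate load on a few shards. The function name `shard_for` is illustrative.

```python
import hashlib
from collections import Counter


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    # Hypothetical user ids, spread over 4 shards.
    spread = Counter(shard_for(f"user-{i}", 4) for i in range(1000))
    print(sorted(spread.items()))
```

Note that a plain modulo scheme like this reshuffles most keys when `shard_count` changes, which is one reason sharding later is painful.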
Async & Availability - Best Friends
Offload Work to a Job Queue
Pre-computation
Slow Jobs
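A minimal in-process sketch of offloading work: the request handler only enqueues the job and returns at once, while a background worker does the slow part. Everything here is illustrative (the names `handle_request`, `worker`, and the fake report job). A production system would normally use a dedicated queue rather than a thread in the web process.

```python
import queue
import threading

jobs = queue.Queue()
results = []


def worker():
    """Pull jobs off the queue and run them until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        results.append(job())
        jobs.task_done()


def handle_request(report_id):
    """Enqueue the slow work and respond immediately."""
    jobs.put(lambda: f"report-{report_id}-built")
    return "accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()
```

The user gets a fast response, and the system stays available even when the slow job takes minutes.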
Database Optimization
Quick Recap
For New Systems - See Database Design
For Existing Systems - See Database Optimization
Asking Deepak for a Notepad
How does he get you a notepad? The stationery shop is the database.
1. Waits for the elevator
2. Walks down the street
3. Waits for the pedestrian traffic light
4. Reaches Store
5. Waits for the previous customer
6. Requests a Notepad
7. Waits for the attendant to search
8. Bonus : Attendant misplaces notepads
Three General Techniques
1. Minimize Queries
2. Do More Work in One Trip
3. Make the Query Efficient
The Fastest Query...
is Never Executed
Cache Aggressively to Minimize Queries
Don’t use ORM for Reports
DB Optimization : More in One Query
N + 1 Problem
doc_ids = db.query("select id from documents where user = ?", user)
docs = []
for docid in doc_ids:
    doc = db.query("select … from documents where id = ?", docid)
    docs.append(doc)
Query in a Loop = Disaster
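The fix is to fetch everything in one trip. The sketch below contrasts the two approaches using SQLite from the standard library. The `title` column and the sample rows are assumptions, since the slide elides the selected columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, user TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO documents (user, title) VALUES (?, ?)",
    [("asha", "a"), ("asha", "b"), ("ravi", "c")],
)


def docs_n_plus_one(conn, user):
    """Anti-pattern: 1 query for the ids, then 1 more query per id."""
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM documents WHERE user = ? ORDER BY id", (user,))]
    return [
        conn.execute("SELECT id, title FROM documents WHERE id = ?", (doc_id,)).fetchone()
        for doc_id in ids
    ]


def docs_single_query(conn, user):
    """Same result in a single round trip."""
    return conn.execute(
        "SELECT id, title FROM documents WHERE user = ? ORDER BY id", (user,)
    ).fetchall()
```

Both functions return the same rows. The first issues N + 1 queries for N documents, the second always issues one.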
Homogeneous Queries - Union All
Tables: monthly_income, monthly_expenditure
Select ‘income’ as heading, month, income from monthly_income
UNION ALL
Select ‘expenditure’ as heading, month, expenses from monthly_expenditure
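To see the query run end to end, here is a small SQLite sketch. The column types and the sample values are assumptions. The point is that two similarly shaped queries come back in a single trip to the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monthly_income (month TEXT, income INTEGER);
CREATE TABLE monthly_expenditure (month TEXT, expenses INTEGER);
INSERT INTO monthly_income VALUES ('2014-01', 500);
INSERT INTO monthly_expenditure VALUES ('2014-01', 300);
""")

# One round trip instead of two separate SELECTs.
rows = conn.execute("""
    SELECT 'income' AS heading, month, income AS amount FROM monthly_income
    UNION ALL
    SELECT 'expenditure' AS heading, month, expenses FROM monthly_expenditure
""").fetchall()

for heading, month, amount in rows:
    print(heading, month, amount)
```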
Anti-Pattern : Fetch and Update
Bulk Operations & Batch Inserts
DB Optimization : Efficient Queries
Find matching lines from a book
Finding a word in a novel
Full Table Scan
Finding a Topic in a Tech Book
Index Seek
Lookup Meaning of a Word
Find by Primary Key
Clustered Index
Data is Sorted
Find Words like ab* and ac*
Clustered Indexes are Great for Range Queries
Query Planner Algorithm
For each table in a query :
    Find constraining columns (where, join)
    For each Index on the table :
        Find if the index can be used
    If multiple indexes :
        Find Best Index
Find if Index can be used
1. Index column must be in where clause
2. For multi-column indexes, the starting columns must be in where clause
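These two rules can be expressed as a small check. This is a simplified model, not how any particular database implements its planner: it only tests whether the leading column of an index appears among the constrained columns. The function names and the example index names are hypothetical.

```python
def index_usable(index_columns, where_columns):
    """An index is a candidate only if its first column is constrained by the query."""
    return bool(index_columns) and index_columns[0] in where_columns


def usable_indexes(indexes, where_columns):
    """Return the names of all indexes that pass the leading-column rule."""
    return [
        name for name, columns in indexes.items()
        if index_usable(columns, where_columns)
    ]


if __name__ == "__main__":
    indexes = {
        "idx_user": ["user"],
        "idx_user_created": ["user", "created_at"],
        "idx_created_status": ["created_at", "status"],
    }
    # WHERE user = ? can use the first two indexes.
    print(usable_indexes(indexes, {"user"}))
    # WHERE status = ? matches only a non-leading column, so no index qualifies.
    print(usable_indexes(indexes, {"status"}))
```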
Best Index?
Index Cardinality
Table Statistics
When is an Index NOT Used?
Slow Queries / Profiler
MySQL Slow Query Log
MS SQL Profiler
Anti-Pattern : IN clause with subquery
Select …. from table1 where id in (select id from table2 where…)
Database Locks & Isolation Levels
What is a Lock?
Mechanism to prevent data corruption when multiple people access the database concurrently.