Tweaking performance on high-load projects (Dmitriy Dumanskiy)
DESCRIPTION
AI&Big Data Lab conference, April 12, 2014
TRANSCRIPT
Tweaking performance on high-load projects
Dmitriy Dumanskiy
Cogniance, mGage project
Java Team Lead
Project evolution
Mgage Mobclix
XXXX
Mgage delivery load
3 billion req/month. ~8 c3.xlarge Amazon instances.
Average load: 2400 req/sec
Peak: x10
Mobclix delivery load
14 billion req/month. ~16 c3.xlarge Amazon instances.
Average load: 6000 req/sec
Peak: x6
XXXX delivery load
20 billion req/month. ~14 c3.xlarge Amazon instances.
Average load: 11000 req/sec
Peak: x6
Is it a lot?
Average load: 11000 req/sec
Twitter: new tweets
15 billion a month
Average load: 5700 req/sec
Peak: x30
Delivery load
| Project | Requests per month | Max load per instance, req/sec | Requirements | Servers, AWS c3.xlarge |
|---|---|---|---|---|
| Mgage | 3 billion | 300 | HTTP, 95% < 60ms | 8 |
| Mobclix | 14 billion | 400 | HTTP, 95% < 100ms | 16 |
| XXXX | 20 billion | 800 | HTTPS, 99% < 100ms | 14 |
Delivery load
c3.xlarge: 4 vCPU, 2.8 GHz Intel Xeon E5-2680. LA ~2-3.
1-2 cores reserved for sudden peaks.
BE tech stacks
Mobclix: Spring, iBatis, MySQL, Solr, Vertica, Cascading, Tomcat
Mgage: Spring, Hibernate, Postgres, distributed EhCache, Hadoop, Voldemort, JBoss
XXXX: Spring, Hibernate, MySQL, Solr, Cascading, Redis, Tomcat
Initial problem
● ~1000 req/sec
● Peaks 6x
● 99% HTTPS with response time < 100ms
Real problem
● ~85 mln active users, ~115 mln registered users
● 11.5 messages per user per day
● ~11000 req/sec
● Peaks 6x
● 99% HTTPS with response time < 100ms
● Reliable and scalable for future growth up to 80k
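The ~11000 req/sec figure follows directly from the user and message numbers above; a quick back-of-the-envelope check (slide figures only, the method names are illustrative):

```java
// Sanity check of the average-load figure from the slide.
public class LoadEstimate {
    // total messages per day across all active users
    public static long dailyRequests(long activeUsers, double msgsPerUserPerDay) {
        return (long) (activeUsers * msgsPerUserPerDay);
    }

    // average requests per second over a 24h (86400s) day
    public static long avgReqPerSec(long activeUsers, double msgsPerUserPerDay) {
        return dailyRequests(activeUsers, msgsPerUserPerDay) / 86_400L;
    }

    public static void main(String[] args) {
        // ~85 mln active users, 11.5 messages per user per day (slide figures)
        System.out.println(avgReqPerSec(85_000_000L, 11.5)); // ~11300 req/sec
    }
}
```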
Architecture
AdServer | Console (UI) | Reporting
Architecture
Console (UI)
MySQL
SOLR Master
SOLR Slave | SOLR Slave | SOLR Slave
SOLR? Why?
● Pros:
○ Quick search on complex queries
○ Lots of built-in features (master-slave replication, RDBMS integration)
● Cons:
○ HTTP only; embedded Solr performs worse
○ Not easy for beginners
○ Max load is ~100 req/sec per instance
“Simple” query
"-(-connectionTypes:"+"\""+getConnectionType()+"\""+" AND connectionTypes:[* TO *]) AND "+"-connectionTypeExcludes:"+"\""+getConnectionType()+"\""+" AND " + "-(-
OSes:"+"(\""+osQuery+"\" OR \""+getOS()+"\")"+" AND OSes:[* TO *]) AND " + "-osExcludes:"+"(\""+osQuery+"\" OR \""+getOS()+"\")"+" AND (runOfNetwork:T OR
appIncludes:"+getAppId()+" OR pubIncludes:"+getPubId()+" OR categories:("+categoryList+"))" +" AND -appExcludes:"+getAppId()+" AND -pubExcludes:"
+getPubId()+" AND -categoryExcludes:("+categoryList+") AND " + keywordQuery+" AND " + "-(-devices:"+"\""+getHandsetNormalized()+"\""+" AND devices:[* TO *]) AND " + "-deviceExcludes:"+"\""+getHandsetNormalized()+"\""+" AND " + "-(-carriers:"+"\""
+getCarrier()+"\""+" AND carriers:[* TO *]) AND " + "-carrierExcludes:"+"\""+getCarrier()+"\""+" AND " + "-(-locales:"+"(\""+locale+"\" OR \""+langOnly+"\")"
+" AND locales:[* TO *]) AND " + "-localeExcludes:"+"(\""+locale+"\" OR \""+langOnly+"\") AND " + "-(-segments:("+segmentQuery+") AND segments:[* TO *]) AND " + "-segmentExcludes:("+segmentQuery+")" + " AND -(-geos:"+geoQuery+" AND geos:[*
TO *]) AND " + "-geosExcludes:"+geoQuery
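Most of the concatenation above repeats one pattern: "match if the targeting field is unset or contains the value, and the excludes field does not". A hedged illustration of factoring that pattern into a helper (field and value names here are made up, not the project's actual code):

```java
// Hypothetical helper for one include/exclude clause of the kind
// concatenated on the slide. Field and value names are illustrative.
public class SolrClause {
    // "field is absent OR contains value, AND fieldExcludes does not contain value"
    public static String includeOrMissing(String field, String value) {
        return "-(-" + field + ":\"" + value + "\" AND " + field + ":[* TO *])"
             + " AND -" + field + "Excludes:\"" + value + "\"";
    }

    public static void main(String[] args) {
        System.out.println(includeOrMissing("carriers", "vodafone"));
    }
}
```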
Architecture
MySQL
Solr Master
SOLR Slave + AdServer | SOLR Slave + AdServer | SOLR Slave + AdServer
No-SQL
AdServer - Solr Slave
Delivery:
volatile DeliveryData cache;

Cron job:
DeliveryData tempCache = loadData();
cache = tempCache;
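The two snippets above form one pattern: delivery threads read an immutable snapshot through a volatile field, while a cron-style job builds a fresh snapshot off to the side and swaps the reference in a single atomic write. A minimal sketch (DeliveryData's contents and the method names are assumptions, not the project's real types):

```java
import java.util.Map;

// Sketch of the slide's volatile-snapshot pattern.
public class SnapshotCache {
    public static final class DeliveryData {
        public final Map<String, String> ads; // stand-in for the real payload
        public DeliveryData(Map<String, String> ads) { this.ads = ads; }
    }

    // volatile guarantees readers always see a fully constructed snapshot
    private volatile DeliveryData cache = new DeliveryData(Map.of());

    // hot path: lock-free read of the current snapshot
    public DeliveryData snapshot() { return cache; }

    // cron job: build the new snapshot elsewhere, then swap in one write
    public void refresh(DeliveryData fresh) {
        cache = fresh;
    }
}
```

The read path never takes a lock; the only synchronization cost is the volatile read.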
Why no-sql?
● Realtime data
● Quick response time
● Simple queries by key
● 1-2 queries to no-sql on every request. Average load 10-20k req/sec and >120k req/sec in peaks.
● Cheap solution
Why Redis? Pros
● Easy and light-weight
● Low latency and response time: 99% is < 1ms, average latency is ~0.2ms
● Up to 100k 'get' commands per second on c1.xlarge
● Cool features (atomic increments, sets, hashes)
● Ready AWS service: ElastiCache
Why Redis? Cons
● Single-threaded out of the box
● To utilize all cores: sharding/clustering
● Scaling/failover is not easy
● Limited to max instance memory (240GB largest on AWS)
● Persistence/swapping may delay responses
● Cluster solution not production-ready
DynamoDB vs Redis

| | Price per month | Put, 95% | Get, 95% | Req/sec |
|---|---|---|---|---|
| DynamoDB | $58 | 300ms | 150ms | 50 |
| DynamoDB | $580 | 60ms | 8ms | 780 |
| DynamoDB | $5800 | 16ms | 8ms | 1250 |
| Redis (c1.medium) | $200 | 3ms | <1ms | 4000 |
| ElastiCache (c1.xlarge) | $600 | <1ms | <1ms | 10000 |
What about others?
● Cassandra
● Voldemort
● Memcached
Redis RAM problem
● 1 user entry: ~80 bytes to 3kb
● ~85 mln users
● Required RAM: ~1 GB to 300 GB
Data compression speed
Data compression size
Data compression
JSON → Kryo binary → 4x less data → gzipping → 2x less data == 8x less data
Now we need < 40 GB
+ Less load on network stack
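The second stage of the pipeline above can be sketched with the JDK's own gzip stream (a minimal sketch: the project used Kryo for the binary stage, so here a plain UTF-8 JSON string stands in for the Kryo bytes):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Gzip the already-serialized user entry before storing it in Redis.
public class EntryCompressor {
    public static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw); // closing the stream flushes the gzip trailer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // repetitive JSON-like payloads compress well
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50; i++) sb.append("{\"preferences\":[\"beer\",\"girls\"]}");
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println(raw.length + " -> " + gzip(raw).length);
    }
}
```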
AdServer BE
Average response time: ~1.2 ms
Load: 800 req/sec with LA ~4
c3.xlarge == 4 vCPU
AdServer BE
● Logging: 12% of time (5% on SSD)
● Response generation: 15% of time
● Redis request: 50% of time
● All business logic: 23% of time
Reporting
AdServer → S3 (delivery logs) → Hadoop ETL → S3 (aggregated logs) → MySQL → Console
Log structure
{
  "uid": "test",
  "platform": "android",
  "app": "xxx",
  "ts": 1375952275223,
  "pid": 1,
  "education": "Some-Highschool-or-less",
  "type": "new",
  "sh": 1280,
  "appver": "6.4.34",
  "country": "AU",
  "time": "Sat, 03 August 2013 10:30:39 +0200",
  "deviceGroup": 7,
  "rid": "fc389d966438478e9554ed15d27713f51",
  "responseCode": 200,
  "event": "ad",
  "device": "N95",
  "sw": 768,
  "ageGroup": "18-24",
  "preferences": ["beer", "girls"]
}
Log structure
● 1 mln records == 0.6 GB.
● ~900 mln records a day == ~0.55 TB.
● 1 month: up to 20 TB of data.
● Zipped data is ~10 times smaller.
Reporting
Customer: "And we need fancy reporting."
But 20 TB of data per month is huge. So what can we do?
Reporting
Dimensions: device, os, osVer, screenWidth, screenHeight, country, region, city, carrier, advertisingId, preferences, gender, age, income, sector, company, language, etc.

Use case:
I want to know how many users saw my ad in San Francisco.
Reporting
Geo table: Country, City, Region, CampaignId, Date, counters
Device table: Device, Carrier, Platform, CampaignId, Date, counters
Uniques table: CampaignId, UID
Predefined report types → aggregation by predefined dimensions → 500-1000 times less data
20 TB per month → 40 GB per month
Of course - Hadoop
● Pros:
○ Unlimited (well, it depends) horizontal scaling
● Cons:
○ Not real-time
○ Processing time directly depends on code quality and on infrastructure cost
○ Not all input can be scaled
○ Cluster startup is so... long
Alternatives?
● Storm
● Redshift
● Vertica
● Math models?
Elastic MapReduce
● Easy setup
● Easy to extend
● Easy to monitor
Timing
● Hadoop (cascading):
○ 25 GB in peak hour takes ~40 min (-10 min). CSV output 300MB. With a cluster of 4 c3.xlarge.
● MySQL:
○ Put 300MB into DB with insert statements: ~40 min.
● MySQL:
○ Put 300MB into DB with optimizations: ~5 min.
Optimizations
● No "insert into", only "load data": ~10 times faster
● ENGINE=MyISAM vs InnoDB when possible: ~5 times faster
● For "upsert": a temp table with ENGINE=MEMORY saves IO
Cascading

Hadoop:
void map(K key, V val, OutputCollector collector) {
...
}
void reduce(K key, Iterator<V> vals, OutputCollector collector) {
...
}
Cascading:
Scheme sinkScheme = new TextLine(new Fields("word", "count"));
Pipe assembly = new Pipe("wordcount");
assembly = new Each(assembly, new Fields( "line" ), new RegexGenerator(new Fields("word"), ",") );
assembly = new GroupBy(assembly, new Fields( "word"));
Aggregator count = new Count(new Fields( "count"));
assembly = new Every(assembly, count);
Why cascading?
Hadoop Job 1
Hadoop Job 2
Hadoop Job 3
Result of one job should be processed by another job
Lessons Learned
Redis sharding
Redis shard 0 | Redis shard 1 | Redis shard 2
AdServer
shardNumber = UID.hashCode() % 3
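One subtlety with the shard formula above: `String.hashCode()` can be negative, so a plain `%` can yield a negative shard number. A defensive sketch using `Math.floorMod` (the constant and names are illustrative):

```java
// Shard selection by user id, guarded against negative hash codes.
public class ShardRouter {
    public static final int SHARDS = 3; // as on the slide

    public static int shardFor(String uid) {
        // floorMod always returns a value in [0, SHARDS), even when
        // hashCode() is negative (a plain % would go negative there)
        return Math.floorMod(uid.hashCode(), SHARDS);
    }
}
```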
Resharding problem
All data already in shards, how to add new shards?
Resharding problem. Solution
Old shard, new shard, AdServer:
1. Get UID from the new shard; if not present, go to (a)
a) Get UID from the old shard
2. Save UID to the new shard
Removal script
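The migration steps above are a read-through migration: reads hit the new shard first, fall back to the old one on a miss, and write what they found into the new shard so data drains over naturally. A sketch with plain maps standing in for the two Redis shards (not the project's actual code):

```java
import java.util.HashMap;
import java.util.Map;

// Read-through migration between an old and a new shard.
public class ReshardingReader {
    public final Map<String, String> oldShard = new HashMap<>();
    public final Map<String, String> newShard = new HashMap<>();

    public String get(String uid) {
        String value = newShard.get(uid);      // 1. try the new shard first
        if (value == null) {
            value = oldShard.get(uid);         // a) fall back to the old shard
            if (value != null) {
                newShard.put(uid, value);      // 2. migrate on first access
            }
        }
        return value; // a separate removal script later cleans the old shard
    }
}
```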
Postgres partitioning
● Queries on small partitions
● Distributed index
● Smaller index size
● Small partitions may fit in RAM
● Easy to remove/move
Cost of IO
L1 cache: 3 cycles
L2 cache: 14 cycles
RAM: 250 cycles
Disk: 41,000,000 cycles
Network: 240,000,000 cycles
Cost of IO
@Cacheable is everywhere
Hadoop
Map input: 300 MB
Map output: 80 GB
Hadoop
● mapreduce.map.output.compress = true
● codecs: GZip, BZ2 - CPU intensive
● codecs: LZO, Snappy
● codecs: JNI
~x10
Hadoop
Consider Combiner
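Why a combiner helps here: word-count-style map output emits one (key, 1) pair per occurrence, and combining on the map side collapses them to one (key, n) pair per distinct key before anything crosses the network. A stand-in sketch with plain maps instead of Hadoop's Writable records:

```java
import java.util.HashMap;
import java.util.Map;

// Simulates the effect of a count-style combiner on map output volume.
public class CombinerEffect {
    // what the reducers would receive after map-side combining
    public static Map<String, Integer> combine(String[] words) {
        Map<String, Integer> combined = new HashMap<>();
        for (String w : words) {
            combined.merge(w, 1, Integer::sum);
        }
        // words.length pairs have shrunk to combined.size() pairs
        return combined;
    }
}
```

With the 300 MB in / 80 GB out skew from the previous slide, shrinking map output is exactly where the win is.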
Hadoop
Text, IntWritable, BytesWritable, NullWritable, etc
Simpler - better
Hadoop
Missing data:
map(T value, ...) {
  Log log = parse(value);
  Data data = dbWrapper.getSomeMissingData(log.getCampId());
}
Wrong
Hadoop
Unnecessary data:
map(T value, ...) {
  Log log = parse(value);
  Key resultKey = makeKey(log.getCampName(), ...);
  output.collect(resultKey, resultValue);
}
Wrong
Hadoop
Unnecessary data:
RecordWriter.write(K key, V value) {
  Entity entity = makeEntity(key, value);
  dbWrapper.save(entity);
}
Wrong
Hadoop
public boolean equals(Object obj) {
EqualsBuilder equalsBuilder = new EqualsBuilder();
equalsBuilder.append(id, otherKey.getId());
...
}
public int hashCode() {
HashCodeBuilder hashCodeBuilder = new HashCodeBuilder();
hashCodeBuilder.append(id);
...
}
Wrong
Hadoop
public void map(...) {
  ...
  for (String word : words) {
output.collect(new Text(word), new IntVal(1));
}
}
Wrong
Hadoop
class MyMapper extends Mapper {
  Text wordText = new Text();
  IntVal one = new IntVal(1);

  public void map(...) {
    for (String word : words) {
      wordText.set(word);
      output.collect(wordText, one);
    }
  }
}
Network
Per 1 AdServer instance:
Incoming traffic: ~100 Mb/sec
Outgoing traffic: ~50 Mb/sec

LB total traffic: almost 10 Gb/sec
Amazon
AWS ElastiCache
SLOWLOG GET
1) 1) (integer) 35
   2) (integer) 1391709950
   3) (integer) 34155
   4) 1) "GET"
      2) "2ads10percent_rmywqesssitmfksetzvj"
2) 1) (integer) 34
   2) (integer) 1391709830
   3) (integer) 34863
   4) 1) "GET"
      2) "2ads10percent_tteeoomiimcgdzcocuqs"
AWS ElastiCache
35ms for a GET? WTF?
Even Java is faster
AWS ElastiCache
● Strange timeouts (with SO_TIMEOUT 50ms)
● No replication to another cluster
● «Cluster» is not a cluster
● Cluster uses regular instances, so you pay for 4 cores while using 1
AWS Limits. You never know where
● Network limit
● PPS rate limit
● LB limit
● Cluster start time up to 20 mins
● Scalability limits
● S3 is slow for many files
Facts
● HTTP is x2 faster than HTTPS
● HTTPS keep-alive: +80% performance
● Java 7 is 40% faster than Java 6 (in our case)
● All IO operations minimized