hadoop summit-ams-2014-04-03

HADOOP, FROM LAB TO 24/7 PRODUCTION http://criteolabs.com/jobs

Upload: sdanzanvillierscriteo

Post on 21-Aug-2014

DESCRIPTION

Criteo slides from the Hadoop Summit in Amsterdam

TRANSCRIPT

Page 1: Hadoop summit-ams-2014-04-03

HADOOP, FROM LAB TO 24/7 PRODUCTION

http://criteolabs.com/jobs

Page 2: Hadoop summit-ams-2014-04-03

criteolabs.com/jobs

ABOUT US

Jean-Baptiste NOTE

Ana DIN

From the Criteo HPC Team (+ Loïc / Serge / Maxime / Samuel / Yann / Stuart)

Page 3: Hadoop summit-ams-2014-04-03

CRITEO?

6 DATA CENTERS, 4 CONTINENTS. 120 BILLION REQUESTS/DAY*.

* EVERY DAY CRITEO IS CALLED MORE THAN 100 BILLION TIMES BY ADVERTISERS AND PUBLISHERS

54 OPEN POSITIONS IN PARIS’ R&D: http://criteolabs.com/jobs

Page 4: Hadoop summit-ams-2014-04-03

TALES OF A TECHNOLOGY ADOPTION

"Anything that can go wrong will go wrong" -- Murphy’s Law

Page 5: Hadoop summit-ams-2014-04-03

USAGE GROWTH

Usage of Hadoop is growing exponentially

• Learning curve is real
• Analysts discover interesting things with raw data
  – Which causes them to ask more questions
• Increased insight leads to a better product
  – Which leads to more data
• Data gains in value and more is kept (and studied!)
• YOU (the admin) are the bottleneck!

Page 6: Hadoop summit-ams-2014-04-03

TOPICS

• Administration automation
• Hadoop configuration tuning
• Network
• Multitenancy

Page 7: Hadoop summit-ams-2014-04-03

ADMINISTRATION AUTOMATION

Page 8: Hadoop summit-ams-2014-04-03

AUTOMATING DEPLOYMENTS

Rack and load!
• Machine is racked, cabled and provisioned for a role
• Chef is our one-stop shop for automation
• Diskless system install

INSTA-CLUSTER!

Page 9: Hadoop summit-ams-2014-04-03

OS DISKLESS: WHY

• Learn from the past
• Previous cluster: 1.5 years of operation
• 78% failure rate on /dev/sda at restart
• Disk usage symmetry
• Guaranteed statelessness

Page 10: Hadoop summit-ams-2014-04-03

OS DISKLESS: HOW

• PXE boot on a custom CentOS image
• Automated Chef bootstrap
• Everything done by Chef
  – Inventory
  – Firmware updates
  – OS / service deployment

Page 11: Hadoop summit-ams-2014-04-03

MAINTENANCE

• Evolutionary maintenance (version bumps)
• Not much to do in normal operations
• Most frequent issue is a flaky / slow-performing host
• Use Preprod / Prod for infra changes
• Progressive rollout vs. blackout

Page 12: Hadoop summit-ams-2014-04-03

MONITORING

• User-facing interfaces
• JobTracker
• Fsimage checkpointing
• HDFS usage and local disk usage
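One way to watch fsimage checkpointing is to alert when the newest fsimage file gets too old. The following is a minimal sketch of that idea, not the monitoring the team actually ran. It assumes checkpoint files have names starting with `fsimage` inside the directory you point it at; the directory and the two-hour threshold are hypothetical and should be adapted to your own checkpoint period.

```python
import os
import time


def newest_fsimage_age(name_dir, now=None):
    """Return the age in seconds of the most recent fsimage file, or None if there is none."""
    now = time.time() if now is None else now
    mtimes = [
        entry.stat().st_mtime
        for entry in os.scandir(name_dir)
        if entry.is_file()
        and entry.name.startswith("fsimage")      # assumption: checkpoint files start with "fsimage"
        and not entry.name.endswith(".md5")       # ignore checksum side files
    ]
    return now - max(mtimes) if mtimes else None


def checkpoint_is_stale(name_dir, max_age_seconds=2 * 3600, now=None):
    """True when no fsimage exists or the newest one is older than the threshold."""
    age = newest_fsimage_age(name_dir, now)
    return age is None or age > max_age_seconds
```

A check like `checkpoint_is_stale("/path/to/namenode/current")` can then be wired into whatever alerting system is already in place.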

Page 13: Hadoop summit-ams-2014-04-03

HADOOP CONFIG TUNING

Page 14: Hadoop summit-ams-2014-04-03

SYSTEM CONFIGS

• Hadoop is a DDoS on your infrastructure
  – Increase ARP retention (L2-specific)
  – Use NSCD
• Increase read-ahead
• Disable THP compaction
• MTU jumbo frames
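As an illustration of the THP item, here is a small sketch that reports which transparent huge page defrag option is active on a host. It assumes "THP compaction" refers to the kernel's THP defrag setting, and that the setting is exposed at one of the two sysfs paths listed (the second is the variant used by some RHEL/CentOS 6 kernels). The function names are made up for this example.

```python
import re
from pathlib import Path

# Assumed sysfs locations: upstream kernels first, then the RHEL/CentOS 6 variant.
THP_DEFRAG_PATHS = (
    "/sys/kernel/mm/transparent_hugepage/defrag",
    "/sys/kernel/mm/redhat_transparent_hugepage/defrag",
)


def active_setting(sysfs_text):
    """Return the bracketed (currently active) option, e.g. 'never', or None if absent."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def thp_defrag_status(paths=THP_DEFRAG_PATHS):
    """Read the first existing sysfs file and return its active option, else None."""
    for candidate in paths:
        path = Path(candidate)
        if path.is_file():
            return active_setting(path.read_text())
    return None


if __name__ == "__main__":
    print("THP defrag:", thp_defrag_status())
```

Run across the fleet (for instance from a Chef recipe or an inventory job), this makes it easy to spot hosts where the setting has drifted.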

Page 15: Hadoop summit-ams-2014-04-03

CLUSTER CONFIGS

Page 16: Hadoop summit-ams-2014-04-03

CLUSTER CONFIGS

• Adjust log settings (default is INFO,console)
• Increase handler counts (JT, NN, DN)
• Use namenode.service.handler.count
• Watch out for checkpointing loops

Page 17: Hadoop summit-ams-2014-04-03

NETWORK

Page 18: Hadoop summit-ams-2014-04-03

NETWORK TOPOLOGY

• One datacenter topology will not fit all
• Web traffic vs. Hadoop traffic
• Historical fat-tree hierarchy with layer 2 routing
• Switched to a meshed design (layer 3 soon)

Page 19: Hadoop summit-ams-2014-04-03

HADOOP TOPOLOGY

• Rack awareness (of course!)
  – Performance
  – Reliability
  – Maintenance (e.g. relocation)
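Rack awareness is commonly wired in through a topology script: Hadoop calls the script with one or more host addresses as arguments and reads one rack path per argument from standard output. The sketch below shows the general shape of such a script. The subnet-to-rack table and rack names are hypothetical, hostnames are not resolved, and this is not presented as Criteo's actual mapping.

```python
#!/usr/bin/env python3
import ipaddress
import sys

# Hypothetical subnet-to-rack table; a real one would come from your inventory.
RACKS = {
    "10.0.1.0/24": "/dc1/rack1",
    "10.0.2.0/24": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"

NETWORKS = [(ipaddress.ip_network(cidr), rack) for cidr, rack in RACKS.items()]


def rack_for(host):
    """Map an IP address to its rack path; fall back to the default rack."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return DEFAULT_RACK  # not an IP literal; hostname lookup is left out of this sketch
    for network, rack in NETWORKS:
        if addr in network:
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # One rack path per argument, in the same order as the arguments.
    print(" ".join(rack_for(arg) for arg in sys.argv[1:]))
```

Keeping the mapping in one table also helps with the maintenance case on the slide: relocating a machine becomes a one-line change.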

Page 20: Hadoop summit-ams-2014-04-03

MULTITENANCY

• HDFS quotas
• Scheduling (user-facing)
• Map / Reduce ratio
• Use YARN!

Page 21: Hadoop summit-ams-2014-04-03

SECURITY

Page 22: Hadoop summit-ams-2014-04-03

KERBEROS SETUP

• Dedicated KDC / realm
• Dedicated service principals
• Cross-realm trusts
• Delegate user management to your IT

Page 23: Hadoop summit-ams-2014-04-03

HTTPFS PROXIES

• Use multiple proxies
• Easy way to interconnect to the outside world
• Data injection / read with a simple curl
• High bandwidth transfers

Page 24: Hadoop summit-ams-2014-04-03

FILE FORMATS

• Multiple use cases (ML, BI analytics)
• Baseline JSON (+gzip) is OK
• Don’t optimize too early
• We still use it(*) at petabyte scale

(*) some teams also use Parquet and contributed to Hive integration
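For readers unfamiliar with the baseline format, the sketch below writes and reads gzip-compressed JSON with one record per line. The one-record-per-line layout and the helper names are assumptions made for this example; the slides only say "JSON (+gzip)".

```python
import gzip
import json


def write_json_lines(path, records):
    """Write an iterable of dicts as gzip-compressed JSON, one record per line."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")


def read_json_lines(path):
    """Yield the records of a gzip-compressed JSON-lines file, skipping blank lines."""
    with gzip.open(path, "rt", encoding="utf-8") as src:
        for line in src:
            if line.strip():
                yield json.loads(line)
```

One trade-off to keep in mind: a gzip file cannot be split by MapReduce, so each file is read by a single task and file sizes need to be chosen accordingly.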

Page 25: Hadoop summit-ams-2014-04-03

QUESTIONS?

Page 26: Hadoop summit-ams-2014-04-03

MY FELLOW CRITEOS WOULD KILL ME…

Did I say we’re hiring?

We’re hiring lots of engineers in 2014. Come join us!

http://criteolabs.com/jobs