Log analysis with Hadoop in livedoor 2013

Log analysis system with Hadoop in livedoor 2013 Winter. 2013/01/20, Hadoop Conference Japan 2013 Winter. TAGOMORI Satoshi (@tagomoris), NHN Japan Corp. Monday, January 21, 2013

Upload: satoshi-tagomori

Post on 27-Jan-2015


TRANSCRIPT

Page 1: Log analysis with Hadoop in livedoor 2013

Log analysis system with Hadoop

in livedoor 2013 Winter

2013/01/20

Hadoop Conference Japan 2013 Winter

TAGOMORI Satoshi (@tagomoris)

NHN Japan Corp.

Monday, January 21, 2013

Page 2: Log analysis with Hadoop in livedoor 2013

TAGOMORI SATOSHI (@TAGOMORIS)

NHN JAPAN CORP.

WEB SERVICE BUSINESS DIVISION, DEVELOPMENT DEPARTMENT 2

(IN JAN 2012, LIVEDOOR -> NHN JAPAN)

Page 3: Log analysis with Hadoop in livedoor 2013


Page 4: Log analysis with Hadoop in livedoor 2013


Page 5: Log analysis with Hadoop in livedoor 2013

livedoor in NHN Japan


Page 6: Log analysis with Hadoop in livedoor 2013


Page 7: Log analysis with Hadoop in livedoor 2013

large scale web services

400+ Web Servers

5Gbps @ Aug 2009

15Gbps @ Aug 2011

20+Gbps @ Jan 2013

(direct outbound + CDN)


Page 8: Log analysis with Hadoop in livedoor 2013

giant access log traffic

At Aug 2011 (HCJ2011)

From 96 servers

580GB/day


Page 9: Log analysis with Hadoop in livedoor 2013

giant access log traffic

NOW (At Jan 2013 HCJ2013W)

From 320+ servers

1.5+ TB/day (raw)

5,300,000,000+ lines/day

120,000+ lines/sec (peak time)

400Mbps log traffic
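The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes that "TB" means 10**12 bytes and treats the "+" figures as exact, so the results are rough lower bounds rather than measurements. It estimates the payload only; the 400Mbps on the slide is higher, possibly because of transport overhead or traffic counted at more than one hop, which is a guess and not something the slide states.

```python
# Back-of-the-envelope check of the Jan 2013 figures quoted on the slide.
# Assumption: "TB" is decimal (10**12 bytes); the slide does not say.
RAW_BYTES_PER_DAY = 1.5e12
LINES_PER_DAY = 5.3e9
PEAK_LINES_PER_SEC = 120_000
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate over a whole day, to compare against the peak rate.
avg_lines_per_sec = LINES_PER_DAY / SECONDS_PER_DAY

# Average size of one raw log line.
avg_bytes_per_line = RAW_BYTES_PER_DAY / LINES_PER_DAY

# Payload bandwidth at peak, ignoring any protocol overhead.
peak_payload_mbps = PEAK_LINES_PER_SEC * avg_bytes_per_line * 8 / 1e6

peak_to_avg = PEAK_LINES_PER_SEC / avg_lines_per_sec

print(f"average lines/sec : {avg_lines_per_sec:,.0f}")
print(f"average bytes/line: {avg_bytes_per_line:.0f}")
print(f"peak payload Mbps : {peak_payload_mbps:.0f}")
print(f"peak / average    : {peak_to_avg:.2f}")
```

Under these assumptions the peak rate is roughly twice the daily average, and an average line is a bit under 300 bytes.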

Page 10: Log analysis with Hadoop in livedoor 2013

What we want to do

COUNT PV,UU and others (daily)

COUNT Service metrics (daily/hourly)

FIND Surprising Errors [4xx,5xx] (immediately)

CHECK Response Times (immediately)

SEARCH Logs when troubles occur (hourly/immediately)

Page 11: Log analysis with Hadoop in livedoor 2013

Batches and Streams

Hadoop is for batches

High performance batch is important

HDFS has good performance

Stream log writing and calculations are also VERY VERY IMPORTANT

Hybrid System: Stream processing + Batch

Page 12: Log analysis with Hadoop in livedoor 2013

System Overview

[Diagram: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster (HDFS, YARN), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH]

Page 13: Log analysis with Hadoop in livedoor 2013

Hadoop in livedoor 2013

18 nodes (Master 3 + Slave 15)

120 cores, 180GB RAM, 100TB HDFS

CDH4.1.2

NameNode HA (QJM), WebHDFS

YARN, Hive + HiveServer1

Page 14: Log analysis with Hadoop in livedoor 2013

Fluentd in livedoor 2013

16 nodes (Deliver 4 + Worker 10 + Watcher 2)

Fluentd (latest release / trunk)

Ruby based message transfer daemon

Many plugins from rubygems.org


Page 15: Log analysis with Hadoop in livedoor 2013

Hadoop/Fluentd engineer in livedoor 2013

1 person.

Page 16: Log analysis with Hadoop in livedoor 2013

Processes Overview

Log collection / Archiving

Parse / Transform / Add flags

Load into Hive tables

On-demand queries

Scheduled queries

Stream aggregations + Notifications


Page 17: Log analysis with Hadoop in livedoor 2013

Past and present

1st gen: Fully batch (late 2011)

Scribed + Hadoop

2nd gen: Partially stream processing (early 2012)

Fluentd + Hadoop

3rd gen: Fully stream processing (late 2012)

Fluentd + Hadoop + Graph Tools

4th gen: New Cluster with CDH4 (early 2013)

Page 18: Log analysis with Hadoop in livedoor 2013

BREAK.


Page 19: Log analysis with Hadoop in livedoor 2013

1st gen: First impl.

[Diagram: Web Servers, Scribed, Archive Storage (scribed), Hadoop Cluster CDH3b2 (Hadoop Streaming), hiveserver, Shib. Link label: (LIBHDFS). Flow labels: STREAM, BATCH]

Page 20: Log analysis with Hadoop in livedoor 2013

Shib: Hive Web Client

https://github.com/tagomoris/shib

Page 21: Log analysis with Hadoop in livedoor 2013

1st gen: Fully batch

Log collection / Archiving

Parse / Transform / Add flags

Load into Hive tables

On-demand queries

Scheduled queries

Stream aggregations + Notifications

Scribed(libhdfs)

Hadoop Streaming

HiveServer + Shib


Page 22: Log analysis with Hadoop in livedoor 2013

1st gen: Fully batch

Simplicity: easy to implement

Shib: easy to run on-demand query

Latency: hourly rotation + import batch

Performance: import batch needs CPU

Scribed: libhdfs dependency problem


Page 23: Log analysis with Hadoop in livedoor 2013

2nd gen: +Fluentd

[Diagram: Web Servers, Fluentd Cluster, Archive Storage (scribed), Hadoop Cluster CDH3u2 (Hive), Cloudera Hoop, Huahin Manager, hiveserver, Shib. Flow labels: STREAM, BATCH]

Page 24: Log analysis with Hadoop in livedoor 2013

Fluentd stream processing

out_exec_filter

any filter programs with STDIN/STDOUT

compatible with Hadoop Streaming!

out_hoop

output plugin to write HDFS over Hoop

Hoop: a.k.a. HttpFs in Hadoop 2.0.x


Page 25: Log analysis with Hadoop in livedoor 2013

Fluentd stream processing

[Diagram: Web Servers, Fluentd deliver (x3), Fluentd worker (x6), Hoop Server, HDFS]

Page 26: Log analysis with Hadoop in livedoor 2013

Huahin Manager

REST API for:

JobTracker (MRv1)

ResourceManager (YARN)

HiveServer

http://huahinframework.org/huahin-manager/


Page 27: Log analysis with Hadoop in livedoor 2013

2nd gen: +Fluentd

Log collection / Archiving

Parse / Transform / Add flags

Load into Hive tables

On-demand queries

Scheduled queries

Stream aggregations + Notifications

Fluentd

Fluentd

HiveServer + Shib


Page 28: Log analysis with Hadoop in livedoor 2013

2nd gen: +Fluentd

Compatibility:

RPC based HDFS/JobTracker Access

Performance: import needs no CPU (Load Only)

Latency: hourly rotation only

Latency: hourly rotation for any queries

Hoop Server: SPOF / traffic bottleneck

Page 29: Log analysis with Hadoop in livedoor 2013

3rd gen: ++++++

[Diagram: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster CDH3u5 (Hive), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH]

Page 30: Log analysis with Hadoop in livedoor 2013

HttpFs (Hoop)

[Diagram: Client, HTTP, httpfs server, Java Native, NameNode + DataNode (x3)]

WebHDFS (CDH3u5 or CDH4)

[Diagram: Client, HTTP, NameNode + DataNode (x3)]
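The difference between the two diagrams shows up in how a write is addressed. With WebHDFS the client sends `op=CREATE` (HTTP PUT) or `op=APPEND` (HTTP POST) to the NameNode, receives a redirect to a DataNode, and sends the data there directly; with HttpFs the same REST calls go to the single httpfs server, which relays everything. The sketch below only builds the request URLs and performs no network I/O. The host name, the port 50070 and the file path are placeholders chosen for illustration, not values from the slides.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, hdfs_path, op, **params):
    """Build a WebHDFS REST URL of the form /webhdfs/v1/<path>?op=<OP>."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Placeholder values, not taken from the slides.
NAMENODE = "namenode.example.com"
PORT = 50070
LOG_PATH = "/log/access/20130121.log"

# Step 1 of a write: ask the NameNode (PUT for CREATE, POST for APPEND).
create_url = webhdfs_url(NAMENODE, PORT, LOG_PATH, "CREATE", overwrite="false")
append_url = webhdfs_url(NAMENODE, PORT, LOG_PATH, "APPEND")

# Step 2 (not shown): the NameNode answers with a redirect whose Location
# header names a DataNode; the client repeats the request there with the data.
print(create_url)
print(append_url)
```

Because every client talks to the DataNodes itself, there is no single relay host in the data path, which is the SPOF and bottleneck concern raised for the Hoop server in the 2nd generation.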


Page 31: Log analysis with Hadoop in livedoor 2013

Fluentd online aggregation

Semi-realtime aggregation to:

count errors of HTTP responses

calculate avg/%tiles of response time

draw graphs immediately

Many plugins for real time aggregation
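The aggregations listed above can be sketched in a few lines. This is an illustration of the idea only, not the Fluentd plugins the talk refers to: it takes the records of one time window as (status code, response time) pairs, counts them by status class, and computes the average and nearest-rank percentiles. The record shape and the function names are assumptions.

```python
from collections import Counter


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def aggregate(records):
    """Summarize one time window of (status_code, response_time) pairs."""
    counts = Counter()
    times = []
    for status, response_time in records:
        counts[f"{status // 100}xx"] += 1   # 200 -> "2xx", 503 -> "5xx"
        times.append(response_time)
    times.sort()
    summary = {"avg": sum(times) / len(times)}
    for pct in (90, 95, 98, 99):
        summary[f"p{pct}"] = percentile(times, pct)
    return dict(counts), summary


# Demo window: response times 1..100, two of them errors.
window = [(200, t) for t in range(1, 99)] + [(404, 99), (500, 100)]
counts, summary = aggregate(window)
print(counts)
print(summary)
```

Running this once per window (for example once per minute) and pushing the resulting numbers to a graph tool gives the semi-realtime view the slide describes.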


Page 32: Log analysis with Hadoop in livedoor 2013

Graph Tools: GrowthForecast / HRForecast

Graph drawing tools whose values are updated over a very simple HTTP request

GrowthForecast: Real-time values

HRForecast: Summarized (past) values
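As an illustration of how simple such an update request is, the sketch below builds (but does not send) a POST in the style of GrowthForecast's `/api/<service>/<section>/<graph>` endpoint with a `number` form field. The base URL, the service, section and graph names and the value are made up for the example, and the exact API should be checked against the GrowthForecast documentation before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def growthforecast_request(base_url, service, section, graph, number):
    """Build a POST request that would update one graph value."""
    url = f"{base_url}/api/{service}/{section}/{graph}"
    body = urlencode({"number": number}).encode("ascii")
    return Request(url, data=body, method="POST")


# Hypothetical names and host; nothing here is taken from the slides.
req = growthforecast_request(
    "http://gf.example.com:5125", "blog", "httpstatus", "5xx_count", 12
)
print(req.get_method(), req.full_url, req.data)

# Sending it would be a single call: urllib.request.urlopen(req)
```

One request per value is enough to keep a graph current, so an aggregation process can update its graphs every time it finishes a window.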


Page 33: Log analysis with Hadoop in livedoor 2013

HTTP Status/Response Time on GrowthForecast

HTTP STATUS: 2XX (BLUE), 3XX (GREEN), 4XX (ORANGE), 5XX (RED)

HTTP RESPONSE TIMES: AVG, [90, 95, 98, 99] PERCENTILE

http://kazeburo.github.com/GrowthForecast/


Page 34: Log analysis with Hadoop in livedoor 2013

ShibUI


Page 35: Log analysis with Hadoop in livedoor 2013

ShibUI

https://github.com/kazeburo/hrforecast


Page 36: Log analysis with Hadoop in livedoor 2013

3rd gen: +++++++

Log collection / Archiving

Parse / Transform / Add flags

Load into Hive tables

On-demand queries

Scheduled queries

Stream aggregations + Notifications

Fluentd

Fluentd

HiveServer + Shib

Fluentd

ShibUI

Page 37: Log analysis with Hadoop in livedoor 2013

3rd gen: +++++++

NO SPOF: for data stream

Real time monitoring

Queries for services:

Scheduled queries, Visualization

Latency: hourly rotation for any queries

SPOF: NameNode (VIP & DRBD is xxxx...)


Page 38: Log analysis with Hadoop in livedoor 2013

4th gen: NOW

[Diagram: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster CDH4 (HDFS, YARN), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH]

Page 39: Log analysis with Hadoop in livedoor 2013

4th gen: CDH4.1.2

NO SPOF: QJM based NameNode HA

Performance: YARN (?)

Latency: multiple rotations per hour (with a Hive table schema change)

NONE should be improved!


Page 40: Log analysis with Hadoop in livedoor 2013

Good parts for solo engineer:

RPC: Loosely-coupled architecture

High compatibility / Low maintenance cost

Open Source

All components are OSS

Open knowledge

Well covered in blogs / presentations

Page 41: Log analysis with Hadoop in livedoor 2013

OUR DRIVER IS "OPENNESS"

thanks to crouton & @kbysmnr !

Page 43: Log analysis with Hadoop in livedoor 2013

See also:

Hadoop and Subsystem in livedoor (2011)
http://www.slideshare.net/tagomoris/hadoop-and-subsystems-in-livedoor-hcj11f

Distributed message stream processing on Fluentd
http://www.slideshare.net/tagomoris/distributed-stream-processing-on-fluentd-fluentd

Hive Tools in NHN Japan
http://www.slideshare.net/tagomoris/hive-tools-in-nhn-japan-hadoopreading

OSS based large scale log aggregation in livedoor
http://www.slideshare.net/tagomoris/oss-nhntech

Fluentd and WebHDFS
http://www.slideshare.net/tagomoris/fluentd-and-webhdfs