gocon autumn (story of our own monitoring agent in golang)

Story of our own Monitoring Agent in golang @dxhuy LINE corp

Upload: huy-do

Post on 21-Jan-2018


TRANSCRIPT

Page 1: GOCON Autumn (Story of our own Monitoring Agent in golang)

Story of our own Monitoring Agent in golang

@dxhuy

LINE corp

Page 2: GOCON Autumn (Story of our own Monitoring Agent in golang)

Introduction

• @dxhuy • Vietnamese • Building monitoring stack at LINE

Page 3: GOCON Autumn (Story of our own Monitoring Agent in golang)

My goal today

• Join GoConference without lottery

Page 4: GOCON Autumn (Story of our own Monitoring Agent in golang)

My goal today

• Show that this is not 100% true

Page 5: GOCON Autumn (Story of our own Monitoring Agent in golang)

Today takeaway

→ Anatomy of a monitoring agent
→ How to design one
→ Challenges and lessons learned

Page 6: GOCON Autumn (Story of our own Monitoring Agent in golang)

Monitoring Agent !?

Page 7: GOCON Autumn (Story of our own Monitoring Agent in golang)
Page 8: GOCON Autumn (Story of our own Monitoring Agent in golang)
Page 9: GOCON Autumn (Story of our own Monitoring Agent in golang)

• Small application that runs on the host machine
• Collects host machine metrics
  • Request latency?
  • MySQL load?
  • Redis hit/miss rate?
  • .....
• Aggregates metrics (sum / avg / histogram ..)
• Sends them to a collector server → alert / chart ...
• statsd / collectd / telegraf ...

Page 10: GOCON Autumn (Story of our own Monitoring Agent in golang)
Page 11: GOCON Autumn (Story of our own Monitoring Agent in golang)

Not a generic log transfer

Page 12: GOCON Autumn (Story of our own Monitoring Agent in golang)

Why not reuse existing technology?

• Scale problem
• We need to write our own stack
• Problem of varied environments
• Management problem
• Development velocity problem

Page 13: GOCON Autumn (Story of our own Monitoring Agent in golang)

Let's start writing our own

Page 14: GOCON Autumn (Story of our own Monitoring Agent in golang)

Language

Page 15: GOCON Autumn (Story of our own Monitoring Agent in golang)
Page 16: GOCON Autumn (Story of our own Monitoring Agent in golang)
Page 17: GOCON Autumn (Story of our own Monitoring Agent in golang)

Features

Page 18: GOCON Autumn (Story of our own Monitoring Agent in golang)

• Modularity (for user)

• Buffer (prevent data loss)

• Management friendly (for admin)

Page 19: GOCON Autumn (Story of our own Monitoring Agent in golang)

Modularity

• What is modularity?
  • Easy to add new metrics from the user's point of view
  • Pluggable

Page 20: GOCON Autumn (Story of our own Monitoring Agent in golang)

Modularity

• How?
  • Input : get metric
  • Codec : understand metric
  • Output : send metric
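
The input → codec → output split can be pictured as three small functions chained together. The following is a minimal sketch in Go, not imonD's actual code: the simplified Metric type, the "name value" line format, and the function names readInput, decode, and write are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Metric is a simplified, hypothetical stand-in for the agent's metric model.
type Metric struct {
	Name string
	Val  float64
}

// Input: get the metric as raw bytes (here, a canned payload).
func readInput() []byte {
	return []byte("requests 42\nlatency_ms 3.5\n")
}

// Codec: understand the raw bytes and turn them into metrics.
func decode(raw []byte) ([]Metric, error) {
	var ms []Metric
	for _, line := range strings.Split(strings.TrimSpace(string(raw)), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return nil, err
		}
		ms = append(ms, Metric{Name: fields[0], Val: v})
	}
	return ms, nil
}

// Output: send the metrics somewhere (here, just format them as strings).
func write(ms []Metric) []string {
	out := make([]string, 0, len(ms))
	for _, m := range ms {
		out = append(out, fmt.Sprintf("%s=%g", m.Name, m.Val))
	}
	return out
}

func main() {
	ms, err := decode(readInput())
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, line := range write(ms) {
		fmt.Println(line)
	}
}
```

Because each stage only depends on the data handed to it, any one of them can be swapped (a different input source, another wire format, another destination) without touching the other two.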

Page 21: GOCON Autumn (Story of our own Monitoring Agent in golang)

// Metric is the central model for imonD
type Metric struct {
    ProtocolVersion ProtocolVer
    Name            string
    Val             Value
    TimeStamp       time.Time
    Fingerprint     Fingerprint
    Type            MetricType
    Labels          map[string]string
}

Page 22: GOCON Autumn (Story of our own Monitoring Agent in golang)

Input Plugin design

Page 23: GOCON Autumn (Story of our own Monitoring Agent in golang)

Input Plugin design

• Three important things:
  • Process model
  • Plugin model
  • Collecting model (push vs pull)

Page 24: GOCON Autumn (Story of our own Monitoring Agent in golang)

Process model

Single process vs multiple processes

Page 25: GOCON Autumn (Story of our own Monitoring Agent in golang)

Process model

- Adv : easy management / maintenance

- DisAdv : one bad plugin could affect the whole agent

Page 26: GOCON Autumn (Story of our own Monitoring Agent in golang)

Plugin model

Same language vs embedded language

Page 27: GOCON Autumn (Story of our own Monitoring Agent in golang)

Plugin model

- Adv: simple model, better maintenance
- DisAdv: each time a new plugin is added, the whole agent needs to be restarted

Page 28: GOCON Autumn (Story of our own Monitoring Agent in golang)

// InputPlugin represents an input plugin interface
type InputPlugin interface {
    Interval() config.Duration
    GracefulStop() error
    Name() string
    Type() InputType
}

type InputByte interface {
    Decoder() codec.Decoder
    ReadBytesWithContext(ctx context.Context) ([]byte, error)
}

type InputMetrics interface {
    ReadMetricsWithContext(ctx context.Context) (model.Metrics, error)
}

All plugins share the same interface
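
One way the agent core could drive plugins that share these interfaces is to poll each plugin at its own interval and pick the read path with a type switch. This is only a sketch under assumptions: the interfaces are trimmed down, Interval returns a plain time.Duration instead of config.Duration, Metrics is a string slice, and collectOnce, poll, readOnce, staticInput, and rawInput are hypothetical names that do not come from the talk.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Metrics is a simplified stand-in for model.Metrics (hypothetical).
type Metrics []string

// InputPlugin keeps only the methods this sketch needs.
type InputPlugin interface {
	Name() string
	Interval() time.Duration
}

type InputByte interface {
	ReadBytesWithContext(ctx context.Context) ([]byte, error)
}

type InputMetrics interface {
	ReadMetricsWithContext(ctx context.Context) (Metrics, error)
}

// collectOnce picks the read path from the optional interface the plugin implements.
func collectOnce(ctx context.Context, p InputPlugin) (Metrics, error) {
	switch in := p.(type) {
	case InputMetrics:
		return in.ReadMetricsWithContext(ctx)
	case InputByte:
		raw, err := in.ReadBytesWithContext(ctx)
		if err != nil {
			return nil, err
		}
		// A real agent would hand raw to the plugin's decoder here.
		return Metrics{string(raw)}, nil
	default:
		return nil, fmt.Errorf("plugin %q has no read method", p.Name())
	}
}

// poll calls collectOnce every Interval until ctx is done.
func poll(ctx context.Context, p InputPlugin, sink func(Metrics)) {
	ticker := time.NewTicker(p.Interval())
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if ms, err := collectOnce(ctx, p); err == nil {
				sink(ms)
			}
		}
	}
}

// staticInput is a toy plugin that returns already-parsed metrics.
type staticInput struct{}

func (staticInput) Name() string            { return "static" }
func (staticInput) Interval() time.Duration { return 10 * time.Millisecond }
func (staticInput) ReadMetricsWithContext(context.Context) (Metrics, error) {
	return Metrics{"up 1"}, nil
}

// rawInput is a toy plugin that returns undecoded bytes.
type rawInput struct{}

func (rawInput) Name() string            { return "raw" }
func (rawInput) Interval() time.Duration { return 10 * time.Millisecond }
func (rawInput) ReadBytesWithContext(context.Context) ([]byte, error) {
	return []byte("load 0.7"), nil
}

// readOnce is a convenience wrapper that reads a plugin once without a deadline.
func readOnce(p InputPlugin) (Metrics, error) {
	return collectOnce(context.Background(), p)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 55*time.Millisecond)
	defer cancel()
	poll(ctx, staticInput{}, func(ms Metrics) { fmt.Println(ms) })
	fmt.Println(readOnce(rawInput{}))
}
```

Passing the context down to the read call is what lets a slow plugin be cancelled when the agent shuts down.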

Page 29: GOCON Autumn (Story of our own Monitoring Agent in golang)

Collecting model

Push vs pull

Page 30: GOCON Autumn (Story of our own Monitoring Agent in golang)

Collecting model

- Adv: less impact on the middleware, simple model
- DisAdv: the application needs to expose something to "pull" (HTTP endpoint / file / ..)

Page 31: GOCON Autumn (Story of our own Monitoring Agent in golang)

func (i *MemcachedInput) ReadMetricsWithContext(ctx context.Context) (model.Metrics, error) {
    ..............
    conn, err := net.DialTimeout("tcp", i.endpoint, i.timeout.Duration)
    if err != nil {
        return nil, err
    }
    defer conn.Close()

    _, err = conn.Write([]byte("stats\n"))
    if err != nil {
        return nil, err
    }
    ..................
    scanner := bufio.NewScanner(conn)
    for scanner.Scan() {
        text := scanner.Text()
        if text == "END" {
            break
        }
        // Split entries which look like: STAT time 1488291730
        entries := strings.Split(text, " ")
        if len(entries) == 3 {
            v, err := strconv.ParseInt(entries[2], 10, 64)
            if err != nil {
                log.Debug("invalid value %s", entries[2])
                continue
            }
            ms = append(ms, *model.NewMetric(
                entries[1],
                model.Value(float64(v)),
                time.Now(),
                model.GaugeType,
            ))
        }
    }
    ..........
    return ms, nil
}

Pull example: contact the server directly

Page 32: GOCON Autumn (Story of our own Monitoring Agent in golang)

Codec Plugin / Output Plugin

Page 33: GOCON Autumn (Story of our own Monitoring Agent in golang)

type Encoder interface {
    //Name() string
    Encode(metrics model.Metrics) ([]byte, error)
    Name() string
}

type Decoder interface {
    //Name() string
    Decode(input []byte) (model.Metrics, error)
    Name() string
}

Codec interface
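
To make the codec interface concrete, here is a small sketch of a type that satisfies both Encoder and Decoder. It assumes a simplified Metrics type and uses JSON as the wire format; JSONCodec is a hypothetical name, and the talk does not say which formats the real agent supports.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metric and Metrics are simplified stand-ins for the model package (hypothetical).
type Metric struct {
	Name string  `json:"name"`
	Val  float64 `json:"val"`
}

type Metrics []Metric

type Encoder interface {
	Encode(metrics Metrics) ([]byte, error)
	Name() string
}

type Decoder interface {
	Decode(input []byte) (Metrics, error)
	Name() string
}

// JSONCodec implements both Encoder and Decoder.
type JSONCodec struct{}

func (JSONCodec) Name() string { return "json" }

func (JSONCodec) Encode(ms Metrics) ([]byte, error) {
	return json.Marshal(ms)
}

func (JSONCodec) Decode(input []byte) (Metrics, error) {
	var ms Metrics
	if err := json.Unmarshal(input, &ms); err != nil {
		return nil, err
	}
	return ms, nil
}

// Compile-time checks that JSONCodec satisfies both interfaces.
var (
	_ Encoder = JSONCodec{}
	_ Decoder = JSONCodec{}
)

func main() {
	c := JSONCodec{}
	raw, err := c.Encode(Metrics{{Name: "cpu", Val: 0.5}})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Printf("%s\n", raw)

	back, err := c.Decode(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(back[0].Name, back[0].Val)
}
```

An input that reads raw bytes only needs the Decoder half, and an output only needs the Encoder half, which is why the two are separate interfaces.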

Page 34: GOCON Autumn (Story of our own Monitoring Agent in golang)

// OutputPlugin represents an output plugin interface
type OutputPlugin interface {
    WriteWithContext(ctx context.Context, metrics model.Metrics) error // for cancellable write
    Encoder() codec.Encoder
    Interval() config.Duration
    GracefulStop() error
    WalReader() wal.LogReader
    Name() string
}

Output interface

Page 35: GOCON Autumn (Story of our own Monitoring Agent in golang)

Buffer design

Page 36: GOCON Autumn (Story of our own Monitoring Agent in golang)

Buffer design

Each output maintains its own offset i. The offset is updated when the output succeeds.

Page 37: GOCON Autumn (Story of our own Monitoring Agent in golang)

Buffer design

• Advantages
  • When an output fails, just roll back the index
  • Chunks are organized into segments (each segment ~ 1GB)
  • To clean up, just delete old segments that have already been consumed by all outputs
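
The offset and segment ideas above can be sketched with an in-memory log. This is an illustration under stated assumptions, not the agent's on-disk format: entries are strings, a segment is 4 entries instead of ~1GB, nothing is goroutine-safe, and the names Log, Reader, and Cleanup are hypothetical.

```go
package main

import "fmt"

// segmentSize is tiny on purpose; the slides mention segments of about 1GB.
const segmentSize = 4

// Log is an append-only buffer shared by all outputs.
type Log struct {
	base    int64    // absolute offset of entries[0]
	entries []string // retained entries
	readers []*Reader
}

// Reader tracks one output's position in the log.
type Reader struct {
	log    *Log
	offset int64
}

func (l *Log) NewReader() *Reader {
	r := &Reader{log: l, offset: l.base}
	l.readers = append(l.readers, r)
	return r
}

func (l *Log) Append(e string) { l.entries = append(l.entries, e) }

// Read returns up to n entries and advances the reader's offset.
func (r *Reader) Read(n int) []string {
	start := r.offset - r.log.base
	end := start + int64(n)
	if limit := int64(len(r.log.entries)); end > limit {
		end = limit
	}
	r.offset = r.log.base + end
	return r.log.entries[start:end]
}

func (r *Reader) CurrentOffset() int64 { return r.offset }

// SetOffset is how a failed output rolls back to replay the same entries.
func (r *Reader) SetOffset(off int64) { r.offset = off }

// Cleanup drops whole segments that every reader has consumed
// and returns how many segments were removed.
func (l *Log) Cleanup() int {
	if len(l.readers) == 0 {
		return 0
	}
	low := l.readers[0].offset
	for _, r := range l.readers[1:] {
		if r.offset < low {
			low = r.offset
		}
	}
	segments := (low - l.base) / segmentSize
	drop := segments * segmentSize
	l.entries = l.entries[drop:]
	l.base += drop
	return int(segments)
}

func main() {
	l := &Log{}
	fast, slow := l.NewReader(), l.NewReader()
	for i := 0; i < 10; i++ {
		l.Append(fmt.Sprintf("metric-%d", i))
	}

	fast.Read(8) // this output succeeds, so its offset stays advanced

	prev := slow.CurrentOffset()
	batch := slow.Read(8)
	fmt.Println("slow output failed to send", len(batch), "entries; rolling back")
	slow.SetOffset(prev)
	fmt.Println("segments dropped:", l.Cleanup())

	slow.Read(8) // the retry succeeds
	fmt.Println("segments dropped:", l.Cleanup())
}
```

Cleanup is driven by the slowest reader, so one stuck output keeps old segments alive; that is the trade-off for never losing data an output has not yet sent.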

Page 38: GOCON Autumn (Story of our own Monitoring Agent in golang)

Buffer design

• Other concerns
  • Serialization
    • It's not hard to write your own serialization method (link)
  • mmap vs file read
    • Not much difference in our case
    • mmap index management is cumbersome to write because it has to manipulate addresses at 2^n boundaries
  • Concurrent write vs synchronized write
    • Synchronized write for data safety

https://www.slideshare.net/dxhuy88/story-writing-byte-serializer-in-golang

Page 39: GOCON Autumn (Story of our own Monitoring Agent in golang)

Buffer design

type LogReader interface {
    Read() (model.Metrics, error)
    Read1() (model.Metrics, error)
    CurrentOffset() int64
    SetOffset(int64) error
    Destroy() error
}

type LogWriter interface {
    Write(*model.Metrics) error
    LastOffset() int64
}

Page 40: GOCON Autumn (Story of our own Monitoring Agent in golang)

Management friendly

• Monitoring agents is f**king hard

• Deploying agents at large scale is painful

Page 41: GOCON Autumn (Story of our own Monitoring Agent in golang)

Potential risk

• Dying without anyone noticing
• Excessive resource consumption
• Buffer overflow
• Dirty data
• Resend storm

Page 42: GOCON Autumn (Story of our own Monitoring Agent in golang)

Resend storm is awful

Page 43: GOCON Autumn (Story of our own Monitoring Agent in golang)

How we solve those problems

• Expose agent state as an HTTP endpoint
  • and monitor all agents using Prometheus
• Monitor everything
  • Aliveness / CPU / memory / output lag
• Use a circuit breaker / jittered resend to prevent resend storms

Page 44: GOCON Autumn (Story of our own Monitoring Agent in golang)

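// In this breaker, CLOSE means sending is blocked and OPEN means sending is allowed.
// NOTE: b.state is touched from several goroutines; real code should guard it
// with a mutex or an atomic.
func (b *AutoOpenBreaker) Close() {
    log.Info("close breaker for %v", b.autoOpenTime)
    b.state = CLOSE
    b.closeTime = time.Now()
    go b.autoOpen()
}

func (b *AutoOpenBreaker) open() {
    b.state = OPEN
}

func (b *AutoOpenBreaker) IsOpen() bool {
    return b.state == OPEN
}

func (b *AutoOpenBreaker) autoOpen() {
    // time.After fits a one-shot wait; time.Tick creates a ticker that is never stopped.
    <-time.After(b.autoOpenTime)
    log.Info("auto open breaker after %v", b.autoOpenTime)
    b.open()
}

Circuit breaker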
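
A self-contained variant of the auto-opening breaker can be written with a mutex and time.AfterFunc. This is a sketch under assumptions rather than the agent's code: the generation counter, the openIfCurrent method, and the NewAutoOpenBreaker constructor are hypothetical additions, and the naming keeps the slide's convention where CLOSE blocks sending and OPEN allows it.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type breakerState int

const (
	OPEN  breakerState = iota // sending allowed
	CLOSE                     // sending blocked
)

// AutoOpenBreaker blocks sending when closed and reopens by itself later.
type AutoOpenBreaker struct {
	mu           sync.Mutex
	state        breakerState
	autoOpenTime time.Duration
	generation   int // incremented on every Close so stale timers are ignored
}

func NewAutoOpenBreaker(autoOpenTime time.Duration) *AutoOpenBreaker {
	return &AutoOpenBreaker{autoOpenTime: autoOpenTime}
}

// Close blocks sending and schedules an automatic reopen.
func (b *AutoOpenBreaker) Close() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.state = CLOSE
	b.generation++
	gen := b.generation
	time.AfterFunc(b.autoOpenTime, func() { b.openIfCurrent(gen) })
}

// openIfCurrent reopens the breaker unless a newer Close has happened since.
func (b *AutoOpenBreaker) openIfCurrent(gen int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if gen == b.generation {
		b.state = OPEN
	}
}

func (b *AutoOpenBreaker) IsOpen() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.state == OPEN
}

func main() {
	b := NewAutoOpenBreaker(50 * time.Millisecond)
	fmt.Println("open at start:", b.IsOpen())
	b.Close()
	fmt.Println("open right after Close:", b.IsOpen())
	time.Sleep(100 * time.Millisecond)
	fmt.Println("open after waiting:", b.IsOpen())
}
```

An output would typically check IsOpen before each send and call Close after repeated failures, so a struggling collector gets a quiet period instead of a flood of retries.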

Page 45: GOCON Autumn (Story of our own Monitoring Agent in golang)

func (i *Output) retry(left int, cancelCtx context.Context, f func() error) error {
    select {
    case <-cancelCtx.Done():
        return fmt.Errorf("got cancelled")
    default: // no-op
    }

    // jitter retry
    m := math.Min(capacity, float64(base*math.Pow(2.0, float64(maxRetry-left))))
    s := rand.Intn(int(m))
    log.Debug("retry sleep %d second", s)
    time.Sleep(time.Duration(s) * time.Second)

    // do some work....
}

jitter
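
The jittered retry above computes a ceiling of min(capacity, base * 2^attempt) and sleeps a random time below it. A complete version might look like the sketch below, which is an assumption-laden illustration and not the agent's code: backoffCeiling and this retry signature are hypothetical, durations replace whole seconds, and the sleep is made cancellable through the context.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// backoffCeiling returns min(capacity, base * 2^attempt).
func backoffCeiling(base, capacity time.Duration, attempt int) time.Duration {
	d := base
	for i := 0; i < attempt && d < capacity; i++ {
		d *= 2
	}
	if d > capacity {
		d = capacity
	}
	return d
}

// retry calls f until it succeeds or maxRetry retries are used up.
// Between attempts it sleeps a random duration in [0, ceiling], so agents
// that failed at the same moment do not all resend at the same moment.
func retry(ctx context.Context, maxRetry int, base, capacity time.Duration, f func() error) error {
	var err error
	for attempt := 0; attempt <= maxRetry; attempt++ {
		if err = f(); err == nil {
			return nil
		}
		if attempt == maxRetry {
			break
		}
		ceiling := backoffCeiling(base, capacity, attempt)
		sleep := time.Duration(rand.Int63n(int64(ceiling) + 1))
		select {
		case <-ctx.Done():
			return ctx.Err() // cancelled while waiting
		case <-time.After(sleep):
		}
	}
	return err
}

func main() {
	calls := 0
	err := retry(context.Background(), 5, 10*time.Millisecond, 100*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errors.New("collector unavailable")
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

The cap matters as much as the randomness: without it, a long outage would push the sleep time so high that agents would take very long to notice the collector is back.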

Page 46: GOCON Autumn (Story of our own Monitoring Agent in golang)

Agent monitoring using prometheus / grafana

Page 47: GOCON Autumn (Story of our own Monitoring Agent in golang)

Export the agent's own metrics at http://host:port/agent_metrics
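
An endpoint like this can be served with the standard library alone by writing one "name value" line per metric, which the Prometheus text format accepts. The sketch below is an assumption, not the agent's implementation: the metric names, the counters, and renderMetrics are hypothetical, and a real agent might use the Prometheus client library instead.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

var (
	startTime = time.Now()
	sentTotal int64 // metrics sent so far, updated atomically
	outputLag int64 // entries written to the buffer but not yet sent
)

// renderMetrics formats the agent's state as plain "name value" lines.
func renderMetrics(uptimeSeconds, sent, lag int64) string {
	return fmt.Sprintf(
		"agent_uptime_seconds %d\nagent_metrics_sent_total %d\nagent_output_lag_entries %d\n",
		uptimeSeconds, sent, lag)
}

func agentMetricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	io.WriteString(w, renderMetrics(
		int64(time.Since(startTime).Seconds()),
		atomic.LoadInt64(&sentTotal),
		atomic.LoadInt64(&outputLag),
	))
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/agent_metrics", agentMetricsHandler)

	// httptest keeps this sketch self-terminating; a real agent would
	// call http.ListenAndServe with the mux instead.
	srv := httptest.NewServer(mux)
	defer srv.Close()

	atomic.AddInt64(&sentTotal, 3)

	resp, err := http.Get(srv.URL + "/agent_metrics")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Print(string(body))
}
```

If the scrape itself fails, that already answers the aliveness question, which is why a pull-based check suits the "dying without anyone noticing" risk.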

Page 48: GOCON Autumn (Story of our own Monitoring Agent in golang)

Admin page

Page 49: GOCON Autumn (Story of our own Monitoring Agent in golang)

Finally

• Golang is awesome
  • Quick to prototype, works everywhere
• Never, ever write your own agent
  • ... unless you have to
  • But it's fun because there are a lot of problems

Page 50: GOCON Autumn (Story of our own Monitoring Agent in golang)

We're hiring