introduction · rely on the cloud for scaling keep raw data to keep your options open 5. netbeam...


Post on 30-Jul-2020


TRANSCRIPT

Page 1: Introduction · Rely on the cloud for scaling Keep raw data to keep your options open 5. netbeam Network Analytics in Google Cloud Three Pillars 1. Real time analytics Low latency,

Introduction

● Harsh realities of network analytics
● netbeam
● Demo
● Technology Stack
● Alternative Approaches
● Lessons Learned


ESnet Data, Analytics and Visualization Architecture


The Harsh Realities of Network Analytics

1. It’s a mess

2. Things change

3. There’s always more

4. It’s never really done

● Your data isn’t neat and tidy

● Time and money are limited

● More devices & more telemetry

● What you need today may not be what you need tomorrow.


Coping strategies

1. It’s a mess

2. Things change

3. There’s always more

4. It’s never really done

● Design knowing things won’t be tidy

● “What” not “How”

● Rely on the cloud for scaling

● Keep raw data to keep your options open


netbeam

Network Analytics in Google Cloud

Three Pillars

1. Real time analytics
   ○ Low latency, incomplete

2. Offline analytics
   ○ High latency, complete

3. Flexible data model
   ○ Changing needs? Recompute from raw data!

Secret sauce: Apache Beam


What is Apache Beam?

1. The Beam Programming Model

2. SDKs for writing Beam pipelines

3. Runners for existing distributed processing backends

○ Apache Apex

○ Apache Flink

○ Apache Spark

○ Google Cloud Dataflow

○ Local runner for testing

Slide courtesy of the Apache Beam Project

The Evolution of Apache Beam

[Figure: lineage of Google data systems shown on the slide: MapReduce, BigTable, Dremel, Colossus, Flume, Megastore, Spanner, PubSub, Millwheel, Google Cloud Dataflow, Apache Beam]

Slide courtesy of the Apache Beam Project

Architecture Diagram

[Diagram components: SNMP collection system, Apache Beam (Stream Processing), BigQuery (immutable), Bigtable (realtime), Apache Beam (Batch Processing), BigQuery (historical), Old SNMP system (avro), API, Client]


Architecture Diagram

[Diagram as on the previous slide, with the processing stages labelled Align/rates, Rollups (5m, 1h, 1d avg) and Percentiles]

● Google Pubsub
● Uses Python outside of Google Cloud to poll devices and write to Pubsub topic
● Code within Google Cloud subscribes to topic to process data


Architecture Diagram

● Apache Beam / Google Dataflow
● Stream processing
● Subscribes to Pubsub topic
● Raw data is written to BigQuery
● Real time transformed data (e.g. aligned data rates) written to Bigtable
● Writes and makes use of metadata in Bigtable (not shown)
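
To make "aligned data rates" concrete, here is a minimal plain-Python sketch of turning cumulative SNMP byte counters into rates. The function name, the 30 second step and the choice to floor timestamps to the step boundary are assumptions for illustration; the deck does not say how netbeam aligns samples or handles counter wraps.

```python
from typing import Iterable, Iterator, Tuple

# (unix timestamp in seconds, cumulative byte counter); an assumed sample shape
Sample = Tuple[int, int]


def compute_rates(samples: Iterable[Sample], step: int = 30) -> Iterator[Tuple[int, float]]:
    """Yield (aligned timestamp, bits per second) for each consecutive pair of samples."""
    previous = None
    for ts, counter in sorted(samples):
        if previous is not None:
            prev_ts, prev_counter = previous
            delta_t = ts - prev_ts
            delta_c = counter - prev_counter
            # Skip duplicate timestamps and counter resets rather than guessing a value.
            if delta_t > 0 and delta_c >= 0:
                aligned = ts - (ts % step)  # floor to the step boundary (simplification)
                yield aligned, delta_c * 8 / delta_t
        previous = (ts, counter)


rates = list(compute_rates([(1000, 0), (1031, 3100), (1061, 6100)]))
```

Flooring is the simplest possible alignment; a production pipeline would more likely interpolate between samples.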


Architecture Diagram

● Cloud Bigtable
● Like HBase
● Write to cells in rows, indexed by keys
● We write 1 day of data to a single row (columns are the time of day, key is metric and day)
● Fast access to row by key, can serve data from here

● Store one year
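
The one-day-per-row layout can be sketched in plain Python as follows. The key format, the "::" separator and the seconds-since-midnight column name are assumptions for illustration, not netbeam's actual schema.

```python
from datetime import datetime, timezone


def row_key(metric: str, ts: datetime) -> str:
    """Hypothetical row key: metric name plus the UTC day the sample falls on."""
    day = ts.astimezone(timezone.utc)
    return f"{metric}::{day:%Y-%m-%d}"


def column_qualifier(ts: datetime) -> str:
    """Hypothetical column name: seconds since UTC midnight, zero padded so columns sort by time."""
    t = ts.astimezone(timezone.utc)
    return f"{t.hour * 3600 + t.minute * 60 + t.second:05d}"


sample_time = datetime(2017, 8, 15, 13, 30, 0, tzinfo=timezone.utc)
key = row_key("rtr1::xe-0/0/0::in", sample_time)
col = column_qualifier(sample_time)
```

With this shape, every sample for one metric on one day lands in the same row, so a single key lookup returns the whole day.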


Architecture Diagram

● BigQuery
● Data warehousing solution
● Cheap storage, SQL access, but not suitable for real-time access
● Allows SQL queries for ad hoc investigation
● We store our source of truth here
● Also store historical data (7 years), imported via avro files


Architecture Diagram

● Apache Beam / Google Dataflow
● Batch processing
● Run with cron job
● Recalculate Bigtable data each night from source of truth in BigQuery
● Process Bigtable rows into new rows of 5 min, 1 hr and 1 day aggregations
● Additional pre-computed views, e.g. percentiles for traffic distribution over a month
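
The rollup step amounts to averaging points into fixed windows. A minimal plain-Python sketch is shown below; the function name and the (timestamp, value) input shape are assumptions, and the real job runs as a Beam batch pipeline rather than in a single process.

```python
from collections import defaultdict
from statistics import mean


def rollup(points, window_seconds):
    """Average (unix timestamp, value) points into windows keyed by window start."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_seconds].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


points = [(0, 10.0), (30, 20.0), (300, 40.0), (330, 60.0), (3600, 5.0)]
five_minute = rollup(points, 300)   # 5 min averages
hourly = rollup(points, 3600)       # 1 hr averages
```

Because the function only depends on the raw points, rerunning it with a different window is all it takes to produce a new aggregation level.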


Architecture Diagram

● API
● Currently runs on App Engine
● Node.js
● Serves data out of Bigtable
● Timeseries data is served as ‘tiles’, each tile is one row
● Would like to use Cloud Endpoints and provide a gRPC service
● Looking forward to grpc-web solution
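
Since each tile is one row, serving a time range comes down to listing the row keys that cover it. The sketch below assumes one tile per day and a hypothetical "metric::YYYY-MM-DD" key format; the actual API is written in Node.js and its key scheme is not given in the slides.

```python
from datetime import date, timedelta


def tile_keys(metric, start, end):
    """Row keys, one tile per day, covering start..end inclusive (hypothetical key format)."""
    keys = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        keys.append(f"{metric}::{day:%Y-%m-%d}")
    return keys


keys = tile_keys("rtr1::xe-0/0/0::in", date(2017, 8, 30), date(2017, 9, 1))
```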


Use case example: Historical Trends

[Diagram components: Stream to BQ, SNMP collection system, Old SNMP system (avro), BigQuery (historical), Per-day interface totals, Per-month totals, Bigtable, Dataserver API (node.js), Client]

Bigtable rows:

Row snmp-daily::2017-08::$interface
  Jan 1     Jan 2     ...   Dec 31
  1.8 Pb    1.9 Pb    ...   3.1 Pb

Row snmp-monthly-totals
  Jan 1991  Feb 1991  ...   Sep 2017
  28 Gb     29 Gb     ...   56 Pb
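
The step from per-day totals to per-month totals is a grouped sum. A minimal plain-Python sketch, assuming the daily totals arrive as a mapping from "YYYY-MM-DD" strings to byte counts (an illustrative shape, not the actual row format):

```python
from collections import defaultdict


def monthly_totals(daily_totals):
    """Sum {'YYYY-MM-DD': bytes} into {'YYYY-MM': bytes}."""
    totals = defaultdict(int)
    for day, total in daily_totals.items():
        totals[day[:7]] += total  # the first 7 characters are 'YYYY-MM'
    return dict(totals)


daily = {"2017-08-30": 5, "2017-08-31": 7, "2017-09-01": 11}
per_month = monthly_totals(daily)
```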


Use case: real time anomaly detection

[Diagram components: Stream to BQ, SNMP collection system, BigQuery, Baseline generation, Anomaly detection, Bigtable, Dataserver API (node.js), Client]

Baseline generation: generates avg for each interface over the past 3 months for that hour/day

Row baseline::5m::avg::$interface
  Mon 12am  Mon 1am  Mon 2am  ...  Sun 11pm
  2.1       1.9      0.3      ...  0.5

Anomaly detection: compares baseline to real time values to generate current deviation from normal

Row anomaly::5m::avg
  iface-1  iface-2  ...  iface-n
  +0.1     +2.0     ...  -1.5
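
The baseline and deviation steps can be sketched in plain Python as below. The function names and the (unix timestamp, value) input shape are assumptions; the slide only states that the baseline is an average per interface for each hour/day slot and that the anomaly value is the difference from it.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def slot(ts):
    """Map a unix timestamp to its (weekday, hour) slot in UTC; Monday is 0."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    return t.weekday(), t.hour


def build_baseline(history):
    """Average historical (timestamp, value) points per (weekday, hour) slot."""
    by_slot = defaultdict(list)
    for ts, value in history:
        by_slot[slot(ts)].append(value)
    return {s: mean(values) for s, values in by_slot.items()}


def deviation(baseline, ts, value):
    """Current value minus the baseline for the same slot, or None if no baseline exists."""
    expected = baseline.get(slot(ts))
    return None if expected is None else value - expected


WEEK = 7 * 24 * 3600
history = [(0, 2.0), (WEEK, 2.2), (3600, 1.9)]  # unix time 0 is a Thursday, 00:00 UTC
baseline = build_baseline(history)
current = deviation(baseline, 2 * WEEK, 4.1)
```

A plain difference is the simplest deviation measure; scaling by the slot's standard deviation would be a natural refinement, though the slides do not say whether netbeam does so.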


Use case example: Percentiles


[Diagram components: Stream to Bigtable, SNMP collection system, Bigtable, Daily rollups (5m avg), Percentiles, Dataserver API (node.js), Client]

Bigtable rows:

Row rollup-month-5m::2017-08::$interface::in
  1        2        ...   8640
  6Gbps    5Gbps    ...   2Gbps

Row percentiles::2017-08::$interface::in
  1 pct     2 pct     ...   99 pct
  0.1 Gbps  0.3 Gbps  ...   22.1Gbps
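
A percentile row can be derived from a month of 5 minute averages with a sort and a rank lookup. The sketch below uses the nearest-rank definition, which is an assumption; the slides do not say which percentile method netbeam uses, and the column labels simply mirror the row shown on the slide.

```python
def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list (pct in 1..100)."""
    # Integer ceiling of pct * n / 100 avoids floating point rounding surprises.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def percentile_row(values):
    """Build {'1 pct': ..., ..., '99 pct': ...} from one month of rate values."""
    ordered = sorted(values)
    return {f"{p} pct": percentile(ordered, p) for p in range(1, 100)}


rates_gbps = [float(v) for v in range(100, 0, -1)]  # 100.0 down to 1.0, unsorted on purpose
row = percentile_row(rates_gbps)
```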


Example: Computing Total Traffic

    # Python Beam SDK
    import apache_beam as beam
    from apache_beam.io import ReadFromText

    # FormatCSVDoFn, group_by_device_interface and compute_rate are helpers
    # defined in the full code linked below.
    pipeline = beam.Pipeline('DirectRunner')

    (pipeline
     | 'read' >> ReadFromText('./example.csv')
     | 'csv' >> beam.ParDo(FormatCSVDoFn())
     # Key each row by device and interface, then compute per-interface rates.
     | 'ifName key' >> beam.Map(group_by_device_interface)
     | 'group by iface' >> beam.GroupByKey()
     | 'compute rate' >> beam.FlatMap(compute_rate)
     # Re-key by timestamp and sum the inbound rates across interfaces.
     | 'timestamp key' >> beam.Map(lambda row: (row['timestamp'], row['rateIn']))
     | 'group by timestamp' >> beam.GroupByKey()
     | 'sum by timestamp' >> beam.Map(lambda rates: (rates[0], sum(rates[1])))
     | 'format' >> beam.Map(lambda row: '{},{}'.format(row[0], row[1]))
     | 'save' >> beam.io.WriteToText('./total_by_timestamp'))

    pipeline.run()

Full code available at: http://x1024.net/blog/2017/05/chinog-flexible-network-analytics-in-the-cloud/

Our Stack

● Apache Beam using Scio
● Google Cloud Platform
  ○ Dataflow
  ○ Bigtable
  ○ BigQuery
  ○ Pub/Sub
  ○ App Engine
● Languages
  ○ Scala
  ○ JavaScript / TypeScript
  ○ Python

[Logos: Cloud Dataflow, BigQuery, Cloud Bigtable, Cloud Endpoints, App Engine, Cloud Pub/Sub]

Current Status & Future Plans

Current

Release candidate for SNMP data:

● Ingest to BigQuery is working
● Migration of historical data is complete
● Streaming ingest to Bigtable
● Early version of utilization visualization
● Simple data server can provide data to clients, but gRPC API coming
● Interface time series charts functional

Future

More types of data:

● Flow data
● perfSONAR

Machine Learning

Anomaly Detection

“Mash up” various data sources

Why not InfluxDB, Elastic or ${FAVORITE_DB}

● We have a data processing problem, not a data storage problem per se.
  ○ Beam and the ecosystem around it give a huge amount of flexibility -- can try new ideas as they occur to us
  ○ Ability to move to different platform components
  ○ Machine learning (TensorFlow and others)
● InfluxDB & Elastic
  ○ Require care and feeding -- have to think about disks and machines, etc.
  ○ At our last evaluation (a while ago now) InfluxDB wasn’t able to keep up with our load -- this may have changed but other benefits outweigh that.
  ○ Elastic doesn’t seem to be a good fit for long term storage -- everything is in the “hot” tier

Why the cloud? Why Google Cloud Platform?

Why the cloud?

● Focus on our problems not on infrastructure
● Scalability without needing to own lots of systems
● Managed services for databases and compute

Why Google Cloud?

● Apache Beam was Google Dataflow when we first encountered it
● More cohesive ecosystem than AWS in our experience
● Although we have used Google Cloud specific services, the approach is portable to other environments

Lessons learned / Life in the cloud / Good & Bad

The Good

● Not a silver bullet, but makes many things easier
● Scaling! We processed 9,902,585,175 data points in 3.5 hours
● Focus on your services, not on infrastructure
● Scio and Scala allow working at a high level of abstraction

The Not So Good

● GCP tech support is pretty bad
● Python is a second class citizen in Beam for now
● Scala is powerful but challenging at times
● Learning curve is pretty steep in places

Thank you!

Jon Dugan <[email protected]>

● MyESnet: https://my.es.net
● ESnet Open Source: http://software.es.net/
  ○ http://software.es.net/react-timeseries-charts/
  ○ http://software.es.net/pond/
  ○ http://software.es.net/react-network-diagrams/
● Scio: https://github.com/spotify/scio
● Beam: https://beam.apache.org

The ESnet netbeam team:

● Peter Murphy
● Monte Goode
● Sowmya Balasubramanian
● Scott Richmond