time series database open source distributed introducing ...files.meetup.com/1406240/introducing...

Introducing InfluxDB, an open source distributed time series database Paul Dix @pauldix [email protected]


Post on 25-Sep-2019


TRANSCRIPT

Page 1: time series database open source distributed Introducing ...files.meetup.com/1406240/Introducing InfluxDB (r-curl).pdf · Introducing InfluxDB, an open source distributed time series

Introducing InfluxDB, an open source distributed time series database

Paul Dix
@pauldix
[email protected]


About me

● Co-founder, CEO of Errplane (YC W13)
● Organizer of NYC Machine Learning
● Author of “Service Oriented Design with Ruby & Rails”


Series editor for Addison Wesley’s “Data & Analytics”


What is a time series?


Metrics


Events

● Measurements
● Exceptions
● Page Views
● User actions
● Commits
● Deploys
● Things happening in time...


Analytics

operations, developers, users, business


Things you want to ask questions about, visualize, or summarize over time.


Actually a summarization


Also a summarization


What about...

“...order by some_time_col”


Why a database for time series?


Billions of data points. Scale horizontally.


HTTP native. API to build on.


Built-in tools for downsampling and summarizing


Automatically clear out old data if we want


Process or monitor data as it comes in, like Storm


Visualize and Summarize

● Graphs & dashboards
● Last 10 minutes
● Last 4 hours
● Last 24 hours
● Past week
● Past month
● YTD
● All Time


Data Collection

● Statsd - https://github.com/etsy/statsd/
● CollectD - http://collectd.org/
● Heka - https://github.com/mozilla-services/heka
● l2met - https://github.com/ryandotsmith/l2met
● Libraries
● Framework integrations
● Cloud integrations (AWS, OpenStack)
● Third-party integrations


Existing Tools

● RRDTool (metrics)
● Graphite (metrics)
● OpenTSDB (metrics + events)
● Kairos (metrics + events)
● and others...


Something missing...


InfluxDB: harness lightning, get 1.21 gigawatts.


InfluxDB

● Written in Go
● Uses LevelDB for storage (may change)
● Self contained binary
● No external dependencies
● Distributed (in December)


HTTP Native

● Read/write data via HTTP
● Manage via HTTP
● Security model to allow access directly from browser


How data is organized

● Databases (like in MySQL, Postgres, etc)
● Time series (kind of like tables)
● Points or events (kind of like rows)


Security

● Cluster admins
● Database admins
● Database users
  ○ read permissions
    ■ only certain series
    ■ only queries with a column having a specific value (e.g. customer_id=32)
  ○ write permissions
    ■ only certain series
    ■ only with columns having a specific value
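One way to picture the read rules for a database user is as a small check applied to every query. The sketch below is only an illustration in Python: the `permissions` dictionary layout and the `can_read` helper are hypothetical and are not InfluxDB's actual configuration format or API.

```python
def can_read(permissions, series, query_filters):
    """Return True if a query on `series` with the given column filters is allowed.

    `permissions` is a hypothetical structure, invented for this sketch:
      {"read": {"series": {...allowed names...} or None,
                "require_columns": {column: required_value}}}
    """
    rule = permissions.get("read")
    if rule is None:
        return False  # no read permission at all
    allowed_series = rule.get("series")
    if allowed_series is not None and series not in allowed_series:
        return False  # "only certain series"
    required = rule.get("require_columns", {})
    # "only queries with a column having a specific value"
    return all(query_filters.get(col) == val for col, val in required.items())


perms = {"read": {"series": {"user_events"}, "require_columns": {"customer_id": 32}}}
print(can_read(perms, "user_events", {"customer_id": 32}))
print(can_read(perms, "user_events", {"customer_id": 33}))
print(can_read(perms, "deploys", {"customer_id": 32}))
```

Write permissions could be modelled the same way, checking the column values of each incoming point instead of the query filters.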


InfluxDB Setup

● http://play.influxdb.org
● OSX
  ○ brew update && brew install influxdb
● http://influxdb.org/download
● Ubuntu
  ○ sudo dpkg -i influxdb_latest_amd64.deb
● RedHat
  ○ sudo rpm -ivh influxdb-latest-1.i686.rpm


Examples, but sadly no R :(


HTTP API docs at http://influxdb.org/docs/api/http


https://github.com/influxdb/influxdb-r

fork, write sweet code, submit PR, be loved and adored FOREVER


Create a database

curl -X POST \
  'http://localhost:8086/db?u=root&p=root' \
  -d '{"name":"mydb", "replicationFactor": 3}'
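The same request can be assembled from any language with an HTTP client. A minimal Python sketch, using only the standard library, that builds (but does not send) the request from the curl example; the helper name `build_create_db_request` is made up for illustration, and the URL, credentials and body are the ones on the slide.

```python
import json
import urllib.parse
import urllib.request


def build_create_db_request(base_url, user, password, name, replication_factor):
    """Build the POST request from the slide without sending it."""
    query = urllib.parse.urlencode({"u": user, "p": password})
    body = json.dumps({"name": name, "replicationFactor": replication_factor})
    return urllib.request.Request(
        url=f"{base_url}/db?{query}",
        data=body.encode("utf-8"),
        method="POST",
    )


req = build_create_db_request("http://localhost:8086", "root", "root", "mydb", 3)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it to a running server: urllib.request.urlopen(req)
```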


Add a user

curl -X POST \
  'http://.../db/mydb/users?u=root&p=root' \
  -d '{"name":"paul", "password": "foo", "admin": true}'


Write points

curl -X POST \
  'http://localhost:8086/db/mydb/series?u=paul&p=pass' \
  -d '[{"name":"foo", "columns":["val"], "points": [[3]]}]'
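The request body is a list of series, each with a name, a list of column names, and a list of points whose values line up with the columns. A small Python sketch of producing that shape from ordinary dictionaries; `to_series_payload` is a hypothetical helper, not part of any InfluxDB client library.

```python
import json


def to_series_payload(name, rows):
    """Convert dict rows into the [{name, columns, points}] body shown on the slide."""
    # Use every key seen in any row as a column, in a stable (sorted) order.
    columns = sorted({key for row in rows for key in row})
    # Each point lists its values in column order; missing values become None (JSON null).
    points = [[row.get(col) for col in columns] for row in rows]
    return [{"name": name, "columns": columns, "points": points}]


payload = to_series_payload("foo", [{"val": 3}])
print(json.dumps(payload))
```

The printed JSON can be passed as the `-d` argument of the curl command above.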


Querying

curl \
  'http://...:8086/db/mydb/series?u=paul&p=pass&q=...'


SQL(ish) Query Language

select * from user_events where time > now() - 4h
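Because the query travels in the `q` URL parameter, it has to be URL-encoded. A Python sketch of building the query URL; the host `localhost:8086` is an assumption borrowed from the write example (the query slide elides the host), and `build_query_url` is a name invented here.

```python
import urllib.parse


def build_query_url(base_url, db, user, password, query):
    """Return the GET URL for a query, with the query text URL-encoded into `q`."""
    params = urllib.parse.urlencode({"u": user, "p": password, "q": query})
    return f"{base_url}/db/{db}/series?{params}"


query = "select * from user_events where time > now() - 4h"
url = build_query_url("http://localhost:8086", "mydb", "paul", "pass", query)
print(url)
```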


JSON data returned

[
  {
    "name": "foo",
    "columns": ["time", "sequence_number", "val1", "val2"],
    "points": [
      [1384295094, 3, "paul", 23],
      [1384295094, 2, "john", 92],
      [1384295094, 1, "todd", 61]
    ]
  },
  {...}
]
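Since each point is a plain list aligned with `columns`, pairing the two gives one dictionary per row. A Python sketch using the sample response from the slide (the elided `{...}` entry is left out); `rows_from_response` is an illustrative helper, not a library function.

```python
import json

response_text = """
[{"name": "foo",
  "columns": ["time", "sequence_number", "val1", "val2"],
  "points": [[1384295094, 3, "paul", 23],
             [1384295094, 2, "john", 92],
             [1384295094, 1, "todd", 61]]}]
"""


def rows_from_response(text):
    """Flatten the series/columns/points structure into a list of row dicts."""
    rows = []
    for series in json.loads(text):
        for point in series["points"]:
            row = dict(zip(series["columns"], point))
            row["series"] = series["name"]  # remember which series the row came from
            rows.append(row)
    return rows


rows = rows_from_response(response_text)
for row in rows:
    print(row)
```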


select count(state) from user_events
group by time(5m), state
where time > now() - 7d
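Conceptually, `group by time(5m), state` puts every point into a five-minute bucket and counts points per (bucket, state) pair. The Python sketch below only illustrates that idea on in-memory data; it says nothing about how InfluxDB executes the query, and the sample events are invented.

```python
from collections import Counter


def count_by_time_and_state(events, bucket_seconds=300):
    """Count events per (bucket start, state); events are (unix_seconds, state) pairs."""
    counts = Counter()
    for timestamp, state in events:
        # Round the timestamp down to the start of its bucket.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        counts[(bucket_start, state)] += 1
    return dict(counts)


events = [(0, "NY"), (10, "NY"), (299, "CA"), (300, "NY"), (601, "CA")]
result = count_by_time_and_state(events)
for key in sorted(result):
    print(key, result[key])
```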


select percentile(value, 90) from response_times
group by time(30s)
where time > now() - 1h
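To make the result of this query concrete, here is a Python sketch that computes a 90th percentile per 30-second bucket. It assumes the nearest-rank definition of a percentile; the slides do not say which definition InfluxDB uses, so treat the exact values as illustrative. The sample points are invented.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile (an assumed definition) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def percentile_by_bucket(points, pct, bucket_seconds):
    """Group (unix_seconds, value) points into time buckets and take a percentile of each."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        buckets[timestamp - timestamp % bucket_seconds].append(value)
    return {start: percentile(vals, pct) for start, vals in sorted(buckets.items())}


# Ten points in the first 30s bucket, two in the second.
points = [(t, t + 1) for t in range(10)] + [(30, 100), (45, 200)]
summary = percentile_by_bucket(points, 90, 30)
print(summary)
```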


Continuous Queries (downsampling)

select percentile(value, 90) from response_times
group by time(5m)
into response_times.percentiles.90
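A continuous query keeps running as data arrives and writes each finished bucket's summary into another series. The Python sketch below mimics that behaviour in memory under two simplifying assumptions that are not from the slides: points arrive in time order, and the summary is a mean rather than a percentile, to keep the code short. The class name `ContinuousMean` is invented.

```python
class ContinuousMean:
    """Emit the mean of each time bucket once a point from a later bucket arrives."""

    def __init__(self, bucket_seconds):
        self.bucket_seconds = bucket_seconds
        self.current_start = None  # start time of the bucket being filled
        self.values = []

    def add(self, timestamp, value):
        """Add a point; return (bucket_start, mean) if a bucket just closed, else None."""
        start = timestamp - timestamp % self.bucket_seconds
        emitted = None
        if self.current_start is not None and start > self.current_start:
            emitted = (self.current_start, sum(self.values) / len(self.values))
            self.values = []
        if self.current_start is None or start > self.current_start:
            self.current_start = start
        self.values.append(value)
        return emitted


downsampler = ContinuousMean(bucket_seconds=300)
outputs = [downsampler.add(t, v) for t, v in [(0, 1), (100, 3), (300, 10), (650, 4)]]
print(outputs)
```

Each non-None output is what would be written to the downsampled series, so the raw points can later be dropped without losing the summary.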


Continuous queries for real-time processing & monitoring


Regexes

select * from events
where email =~ /.*gmail\.com/
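The `=~` filter keeps only the points whose column value matches the regular expression. A Python sketch of the same filter using the standard `re` module; it assumes unanchored (search) matching, which is how Python treats this pattern and may differ in detail from InfluxDB's regex engine. The sample rows are invented.

```python
import re


def where_matches(rows, column, pattern):
    """Keep rows whose string value in `column` matches `pattern` anywhere."""
    regex = re.compile(pattern)
    return [
        row for row in rows
        if isinstance(row.get(column), str) and regex.search(row[column])
    ]


rows = [
    {"email": "paul@gmail.com"},
    {"email": "john@example.com"},
    {"email": "todd@gmail.community"},
]
matched = where_matches(rows, "email", r".*gmail\.com")
print([row["email"] for row in matched])
```

With unanchored matching the third address also matches, because `gmail.com` appears inside it; appending `$` to the pattern would restrict it to addresses that end in `gmail.com`.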


select percentile(value, 99)
from /stats\.*/
into :series_name.percentiles.99
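Here the regex selects source series by name, and `:series_name` in the `into` clause is replaced with each matching name, so one query fans out into one target series per source. A Python sketch of just the naming step; the series names are invented, and unanchored matching is assumed.

```python
import re


def target_names(series_names, pattern, template):
    """Map each series name matching `pattern` to its target name from `template`."""
    regex = re.compile(pattern)
    return {
        name: template.replace(":series_name", name)
        for name in series_names
        if regex.search(name)
    }


names = ["stats.cpu", "stats.mem", "events"]
targets = target_names(names, r"stats\.*", ":series_name.percentiles.99")
print(targets)
```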


select count(value)
from seriesA merge seriesB
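One reading of `merge` is that the points of both series are interleaved by time into a single stream, which the aggregate then runs over. The slide does not spell out the semantics, so the Python sketch below is an assumption; it relies on both inputs already being sorted by time, and the sample points are invented.

```python
import heapq


def merge_series(series_a, series_b):
    """Interleave two time-sorted lists of (unix_seconds, value) points by time."""
    return list(heapq.merge(series_a, series_b, key=lambda point: point[0]))


series_a = [(1, 10), (4, 40)]
series_b = [(2, 20), (3, 30), (5, 50)]
merged = merge_series(series_a, series_b)
print(merged)
print(len(merged))  # the equivalent of count(value) over the merged stream
```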


Querying

● Functions
  ○ count, min, max, mean, distinct, median, mode, percentiles, derivative, stddev
● Where clauses
● Group by clauses (time and other columns)
● Periodically delete old raw data
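Periodically deleting old raw data amounts to dropping every point older than a cutoff, typically after a downsampled summary has been stored elsewhere. A minimal in-memory Python sketch of the idea; the function name and the seven-day window are illustrative choices, not InfluxDB settings.

```python
def drop_older_than(points, now, max_age_seconds):
    """Return only the (unix_seconds, value) points no older than max_age_seconds."""
    cutoff = now - max_age_seconds
    return [point for point in points if point[0] >= cutoff]


SEVEN_DAYS = 7 * 24 * 60 * 60
now = 1_000_000
points = [(now - SEVEN_DAYS - 1, 1), (now - SEVEN_DAYS, 2), (now - 60, 3)]
kept = drop_older_than(points, now, SEVEN_DAYS)
print(kept)
```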


Built in UI


CLI


Libraries

● Ruby
● Frontend JS
● Node
● Python
● PHP
● Go (soon)
● Java (soon)


Ideas to come...

● Custom functions
  ○ Embedded LUA, YARN like interface, or both?
● Custom real-time queries
  ○ define custom logic and InfluxDB will feed it data
● Queries triggering web hooks
  ○ pair with custom functions for monitoring/anomaly detection


Project Status

● Based on work at https://errplane.com
  ○ 2 billion points per month
● http://influxdb.org
● Code available at https://github.com/influxdb
● API finalized in the next month
● Clustered version in December
● Production ready by end of year


We’re available for consulting/help


We need your help

● API, what else would you like to see?
● Client libraries
● Visualization tools
● Data collection integrations
● Comments/feedback on the mailing list
● http://influxdb.org/overview/


Share the love

● Star or watch the project on http://github.com/influxdb/influxdb

● Tweet, blog, shout, whisper
● Participate in discussions on mailing list


Come to the hackfest

● Monday, December 2nd at Pivotal
● http://meetup.com/nyc-influxdb-user-group


OSS lives and dies by adoption/popularity


MongoDB has 4,406 stars


MongoDB valued at $1.2B


Each star worth $272,355.00


Help InfluxDB get to 10k stars!

go forth and build!


Thanks!

@pauldix

[email protected]