Introduction to ELK
2016-03-16, Yuhsuan Chen
ELK
• What is ELK?
– Elasticsearch: provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents
– Logstash: a tool to collect, process, and forward events and log messages
– Kibana: provides visualization capabilities on top of the content indexed on an Elasticsearch cluster

[Architecture diagram: Servers running Filebeat -> Logstash -> Elasticsearch -> Kibana -> Nginx -> User]
ELK
• Who uses it?
– SoundCloud: https://www.elastic.co/assets/blt7955f90878661eec/case-study-soundcloud.pdf
– Tango: https://www.elastic.co/assets/blt0dc7d9c62f60d38f/case-study-tango.pdf
– Gogolook: http://www.slideshare.net/tw_dsconf/elasticsearch-kibana
– KKBOX: http://www.slideshare.net/tw_dsconf/kkbox-51962632
Beats
• The Beats are open source data shippers that you install as agents on your servers to send different types of operational data to Elasticsearch.
• Beats can send data directly to Elasticsearch or send it to Elasticsearch via Logstash, which you can use to enrich or archive the data.
Logstash
• The ingestion workhorse for Elasticsearch and more
– Horizontally scalable data processing pipeline with strong Elasticsearch and Kibana synergy
• Pluggable pipeline architecture
– Mix, match, and orchestrate different inputs, filters, and outputs to play in pipeline harmony
• Community-extensible and developer-friendly plugin ecosystem
– Over 200 plugins available, plus the flexibility of creating and contributing your own
Logstash
• Logstash Download Link
– Linux: https://download.elastic.co/logstash/logstash/packages/debian/logstash_2.2.2-1_all.deb
– Windows: https://download.elastic.co/logstash/logstash/logstash-2.2.2.zip
• How to set up Logstash
– http://howtorapeurjob.tumblr.com/post/140724250861/
Logstash
• Start up
– Linux:
service logstash restart
/opt/logstash/bin/logstash -f /etc/logstash/conf.d/ -v
– Windows:
{file_path}\bin\logstash.bat

• Parameters
-h or --help        show help information
-t or --configtest  test whether the config file is valid
-f                  specify the config file or directory
-v or --debug       output debug information
-w                  number of worker threads (default: 4)
-l                  specify the log output path
Logstash - Configure
• Configure
– The main structure has three sections: input / filter / output; each supports multiple inputs, outputs, and plugins
– The configuration can live in one file or be split across several files; Logstash automatically picks up the matching pieces

• Configure Structure

input { … }
filter { … }
output { … }
Logstash - INPUT
• stdin
• beats

input {
  # read events typed on the console
  stdin {}
}

input {
  # the data source is Beats on port 5044; incoming events get type "HTCLog"
  beats {
    port => 5044
    type => "HTCLog"
  }
}
Logstash - INPUT
• generator

input {
  generator {
    # the data source is given as lines, e.g. the two lines below
    lines => [
      "02-23 10:19:58.588  1696  1750 D 1",
      "02-23 10:19:58.695   904   904 D 2"
    ]
    # how many times each line is emitted; here, only once
    count => 1
    type => "device_log"
  }
}
Logstash - INPUT
• file

input {
  file {
    # input file paths; an array can list multiple paths
    path => [
      "/home/elasticadmin/Desktop/TestLog/*",
      "/home/elasticadmin/Desktop/TestLog/**/*.txt"
    ]
    # interval for discovering new files under the watched paths; default is 15 seconds
    discover_interval => 1
    # where to start reading a file; the default is the end,
    # so set "beginning" to read existing content from the start
    start_position => "beginning"
    # records which files have already been read; delete this file to make
    # Logstash re-read old data. On Linux, set it to /dev/null to re-read
    # everything on every start
    sincedb_path => "/home/elasticadmin/Desktop/.sincedb_TestLog"
  }
}
Logstash - FILTER
• grok
• Grok test web: http://grokdebug.herokuapp.com/

filter {
  # if the incoming data's type is HTCLog, use the grok plugin to parse it
  if [type] == "HTCLog" {
    grok {
      match => { "message" => "(?<log_time>(0[0-9]+|1[0-2])-[0-6][0-9] %{TIME})\s+(?<pid>([0-9]+))\s+(?<ppid>([0-9]+))\s+(?<log_level>([Ww]))\s+(?<log>(.*))" }
      # add a field named file_type with the value kernel_log to the output
      add_field => { "file_type" => "kernel_log" }
    }
  }
}
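As a rough illustration, the grok pattern above can be approximated with a plain Python regex. This is only a sketch: `%{TIME}` is expanded by hand, grok's `(?<name>...)` groups become Python's `(?P<name>...)`, and the sample log line is invented for the example.

```python
import re

# Approximation of the grok pattern: month-day timestamp, pid, ppid,
# a W/w log level, then the rest of the message
pattern = re.compile(
    r"(?P<log_time>(0[0-9]|1[0-2])-[0-6][0-9] \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>[0-9]+)\s+(?P<ppid>[0-9]+)\s+"
    r"(?P<log_level>[Ww])\s+(?P<log>.*)"
)

# hypothetical sample line in the same shape as the generator lines earlier
line = "02-23 10:19:58.588  1696  1750 W ThermalEngine: battery temp high"
m = pattern.match(line)
fields = m.groupdict()  # the named groups become event fields, like in grok
```

The grok debugger linked above does the same kind of extraction interactively.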
Logstash - FILTER
• mutate
filter {
  mutate {
    # add a field
    add_field => { "add_word_for_host" => "Hello world, from %{host}" }
    # add a tag (add_tag takes an array, not a hash)
    add_tag => [ "Hello world, from %{host}" ]
    # remove the host field
    remove_field => [ "host" ]
    # remove the host tag
    remove_tag => [ "host" ]
    # rename the host field to hostname
    rename => { "host" => "hostname" }
    # replace the content of message
    replace => { "message" => "The message was removed" }
  }
}
Logstash - OUTPUT
• stdout
• file

output {
  # print results to the console
  stdout {}
}

output {
  file {
    # output file path
    path => "C:\Users\Yuhsuan_chen\Desktop\Elastic\logstash\result.gzip"
    # gzip-compress the output file
    gzip => true
  }
}
Logstash - OUTPUT
• elasticsearch
output {
  elasticsearch {
    # the Elasticsearch endpoint
    hosts => ["10.8.49.16:9200"]
    # the index name to write data into
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    # the type name to write data into
    document_type => "%{[@metadata][type]}"
  }
}
Elasticsearch
Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time.
Elasticsearch
• Elasticsearch on Ubuntu
– http://howtorapeurjob.tumblr.com/post/140678763346/
• Elasticsearch cluster on Windows
– http://howtorapeurjob.tumblr.com/post/140318838221/
• Elasticsearch Cluster Design
– http://howtorapeurjob.tumblr.com/post/140438605026/
• CRUD in Elasticsearch
– http://howtorapeurjob.tumblr.com/post/140323820711/
• Query DSL in Elasticsearch
– http://howtorapeurjob.tumblr.com/post/140616855246/
Elasticsearch on Ubuntu
• Close Firewall & Edit Configure File

# close the firewall
sudo ufw disable
# edit the configure file
sudo gedit /etc/elasticsearch/elasticsearch.yml

• Configure File

# elasticsearch.yml
network.host: host_name
cluster.name: "scd500"
node.name: "s1"
discovery.zen.ping.unicast.hosts: ["host_name"]
discovery.zen.ping.multicast.enabled: false
discovery.zen.minimum_master_nodes: 1
node.master: true
node.data: true

• Restart Service

# restart the service
sudo service elasticsearch restart
Elasticsearch on Ubuntu
• Install a REST client plugin for your browser
– Firefox: https://goo.gl/tzwE76 (RESTClient)
– Chrome: https://goo.gl/GCc75v (Postman)

• Check node status
http://hostname:9200/_cluster/health?pretty

• Check service status
http://hostname:9200
Elasticsearch cluster on Windows
• Set Java Path on Windows
• Set Configuration
File Path: \elasticsearch-2.2.0\config\elasticsearch.yml

# Node S1: \elasticsearch-2.2.0\config\elasticsearch.yml
network.host: ES1                            # ES2 / ES3 on the other nodes
cluster.name: "scd500"
node.name: "s1"                              # "s2" / "s3" on the other nodes
discovery.zen.ping.unicast.hosts: ["ES1"]
discovery.zen.ping.multicast.enabled: false
discovery.zen.minimum_master_nodes: 1
node.master: true
node.data: true
Elasticsearch cluster on Windows
• Settings in elasticsearch.yml
– network.host: use a hostname that can be pinged, and remember to turn off the firewall
– cluster.name: must be identical on every node; otherwise nodes on the same subnet are still not treated as one cluster
– node.name: the node's name; optional, but if unset a new name is generated on every restart
– discovery.zen.ping.unicast.hosts: how nodes find each other; in practice, inside the HTC domain this must be set to your master host for nodes to join the cluster automatically, while on a single machine discovery works even without it
– discovery.zen.ping.multicast.enabled: prevents discovering unexpected clusters or nodes
– node.master: whether this is a master node; if unspecified, it is set automatically when the node joins
– node.data: whether this node stores data
– path.data: where Elasticsearch stores its data
– path.logs: where Elasticsearch stores its logs
– http.port: the default port is 9200
Elasticsearch Cluster Design
• Check Cluster Status
– Status: red -> problems with shards and replicas
– Status: yellow -> shards are active, but some replicas have problems
– Status: green -> shards and replicas are all active

GET localhost:9200/_cluster/health
Elasticsearch Cluster Design
• Index / Shard / Replicas
– The shard is Elasticsearch's most basic storage unit
– When more than one storage unit is configured, an algorithm decides where each piece of data is stored
– replicas is the number of copies; decide on the cluster layout when the index is created
Elasticsearch Cluster Design
• Single Shard
– a single storage unit on one machine, with no replicas

POST localhost:9200/{index_name}
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
Elasticsearch Cluster Design
• 2 Shards
– when another node is brought up, the system automatically moves the second shard to Node2

POST localhost:9200/{index_name}
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}
Elasticsearch Cluster Design
• 3 Shards, 1 Replica
– when other nodes are brought up, the system automatically spreads the shards and replicas across them

POST localhost:9200/{index_name}
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

[Diagrams: 3 shards / 0 replicas / 1 node; 3 shards / 1 replica / 2 nodes; 3 shards / 1 replica / 3 nodes]
CRUD in Elasticsearch
• REST API
In Elasticsearch, the REST API is structured as follows:

http://localhost:9200/<index>/<type>/[<id>]

index and type are required; id may be specified or omitted.
When inserting data without an id, send the request with POST, not PUT.

As an example, let's design a Facebook-like service and walk through the CRUD operations.
Every message contains the following basic fields:

{
  "message": "",
  "feel": "",
  "location": "",
  "datetime": ""
}
CRUD in Elasticsearch
• Create Index
The indices are empty by default, so first create an index named facebook.

– Linux
curl -XPOST 'http://localhost:9200/facebook'

– REST Client
POST http://localhost:9200/facebook

– Result
{
  "acknowledged": true
}

• Query Indices
GET http://localhost:9200/_cat/indices?v

– Result
health status index    pri rep docs.count docs.deleted store.size pri.store.size
yellow open   facebook   5   1          0            0       795b           795b
CRUD in Elasticsearch
• Create Document
– Linux

curl -XPOST 'http://localhost:9200/facebook/yuhsuan_chen' -d '{"message": "Hello!!!","feel": "Good","location": "Taoyuan City","datetime": "2016-03-01T12:00:00"}'

– REST Client

POST http://localhost:9200/facebook/yuhsuan_chen
{
  "message": "Hello!!!",
  "feel": "Good",
  "location": "Taoyuan City",
  "datetime": "2016-03-01T12:00:00"
}

– Result

{
  "_index": "facebook",
  "_type": "yuhsuan_chen",
  "_id": "AVM7YFHl16LIHfUR_IEh",   -> auto-generated, or you can specify your own
  "_version": 1,                   -> which revision of the document this is
  "_shards": {
    "total": 2,                    -> how many shard copies there are
    "successful": 1,
    "failed": 0
  },
  "created": true
}
CRUD in Elasticsearch
• Query Data

# query without restrictions
GET http://localhost:9200/_search

# query by keyword
GET http://localhost:9200/_search?q='good'

# query within a specific index
GET http://localhost:9200/facebook/_search?q='good'

# query within a specific type
GET http://localhost:9200/facebook/yuhsuan_chen/_search?q='good'

# find documents whose message field contains hello
GET http://localhost:9200/facebook/_search?q=message:hello

# find documents that have a datetime field and contain hello in the other data
GET http://localhost:9200/facebook/_search?_field=datetime&q=hello
CRUD in Elasticsearch
• Delete Data

# delete a document by id
DELETE http://localhost:9200/facebook/yuhsuan_chen/1

# delete an index
DELETE http://localhost:9200/facebook

# delete a type
Deleting a type is no longer supported since version 2.2; you can only delete the index.
https://goo.gl/znbOuB
CRUD in Elasticsearch
• Update Data

# replace the whole document
POST http://localhost:9200/facebook/yuhsuan_chen/1
{
  "message": "Hello!!!",
  "feel": "Good",
  "location": "Taoyuan City",
  "datetime": "2016-03-01T12:00:00"
}

# update a single field of the document
POST http://localhost:9200/facebook/yuhsuan_chen/1/_update
{
  "doc": {
    "message": "haha"
  }
}
CRUD in Elasticsearch
• Bulk
– If the POST URL already contains the index and type, they can be omitted from the action lines
– To target a different index or type, define _index and _type in the action line
– id may be omitted, but delete still needs the id to work

One very important point: the body must end with a newline character!

# create data
POST http://localhost:9200/facebook/yuhsuan_chen/_bulk
{"create":{"_id":"1"}}
{"message":"Hello!!!","feel":"Good","location":"Taoyuan City","datetime":"2016-03-01T12:00:00"}
{"create":{"_index":"facebook","_type":"Other","_id":"2"}}
{"message":"The weather is so good.","feel":"Comfortable","location":"Taoyuan City","datetime":"2016-03-01T12:01:00"}
{"create":{"_id":"3"}}
{"message":"I like my job.","feel":"bad","location":"Taoyuan City","datetime":"2016-03-01T12:02:00"}
{"create":{}}
{"message":"Time for launch.","feel":"hungry","location":"Taoyuan City","datetime":"2016-03-01T12:03:00"}
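The bulk format alternates one action line with one source line, and the whole body must end with a newline. A minimal Python sketch of building such a body (the documents are shortened versions of the ones above):

```python
import json

# (action, source) pairs in the bulk format: one JSON object per line
actions = [
    ({"create": {"_id": "1"}},
     {"message": "Hello!!!", "feel": "Good"}),
    ({"create": {"_id": "2"}},
     {"message": "The weather is so good.", "feel": "Comfortable"}),
]

# join action and source lines; every line is newline-terminated, so the
# body automatically ends with the required trailing newline
body = "".join(
    json.dumps(action) + "\n" + json.dumps(source) + "\n"
    for action, source in actions
)
```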
CRUD in Elasticsearch
• Bulk
Remember: the body must end with a newline character!

# create three documents, change the first one's location to Taipei City, and delete the second
POST http://localhost:9200/facebook/_bulk
{"create":{"_type":"test","_id":"1"}}
{"message":"test1","feel":"Good","location":"Taoyuan City","datetime":"2016-03-01T12:00:00"}
{"create":{"_type":"test","_id":"2"}}
{"message":"test2","feel":"Good","location":"Taoyuan City","datetime":"2016-03-01T12:00:00"}
{"create":{"_type":"test","_id":"3"}}
{"message":"test3","feel":"Good","location":"Taoyuan City","datetime":"2016-03-01T12:00:00"}
{"update":{"_type":"test","_id":"1"}}
{"doc":{"location":"Taipei City"}}
{"delete":{"_type":"test","_id":"2"}}

# show the final result of the operations above
GET http://localhost:9200/facebook/_search?q=_type:test
CRUD in Elasticsearch
• Import Data
– Syntax

curl -XPOST localhost:9200/_bulk --data-binary @file.json

– Example

curl -XPOST "http://yuhsuan_chen_w7p:9200/_bulk" --data-binary @C:\Users\Yuhsuan_chen\Desktop\shakespeare.json
Query DSL in Elasticsearch
• Special note
With the Linux curl package, you can pass query parameters with GET directly.
The Windows curl package has trouble parsing strings, especially single vs. double quotes, so the Windows version of curl is not recommended; on Windows, use Postman for queries instead.
If you query from code, most HTTP libraries do not support a GET request with a body, so use POST.
The IETF documents do not explicitly say whether GET may carry parameters; the Elasticsearch developers consider GET more appropriate semantically, but in practice, querying with POST avoids problems.
http://tools.ietf.org/html/rfc7231#section-4.3.1

# Create test data
POST http://localhost:9200/facebook/yuhsuan_chen/_bulk
{"create":{"_id":"1"}}
{"message":"Hello!!!","feel":"Good","location":"Taoyuan City","datetime":"2016-03-01T12:00:00","temp":25.6}
{"create":{"_index":"facebook","_type":"Other","_id":"2"}}
{"message":"The weather is so good.","feel":"Comfortable","location":"Taipei City","datetime":"2016-03-01T12:01:00","temp":27.3}
{"create":{"_id":"3"}}
{"message":"I like my job.","feel":"bad","location":"Taoyuan City","datetime":"2016-03-01T12:02:00"}
{"create":{}}
{"message":"Time for launch.","feel":"hungry","location":"Taoyuan City","datetime":"2016-03-01T12:03:00","temp":30.3}
Query DSL in Elasticsearch
• Query All Data
• Query Specific Field Data
# use match_all to query all data
POST http://localhost:9200/facebook/_search
{
  "query": {
    "match_all": {}
  }
}

# use match to query "city" in the location field
POST http://localhost:9200/facebook/_search
{
  "query": {
    "match": {
      "location": "city"
    }
  }
}
Query DSL in Elasticsearch
• Query Multi Data
• Query Value
# find documents whose location/message fields include hello/good
POST http://localhost:9200/facebook/_search
{
  "query": {
    "multi_match": {
      "query": ["hello", "good"],
      "fields": ["location", "message"]
    }
  }
}

# find temp >= 27; gt -> greater than, gte -> greater than or equal,
# lt -> less than, lte -> less than or equal
POST http://localhost:9200/facebook/yuhsuan_chen/_search
{
  "query": {
    "range": {
      "temp": {
        "gte": 27
      }
    }
  }
}
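To make the range semantics concrete, here is a sketch of what the gte condition selects, applied locally to the temp values from the test data inserted earlier:

```python
# the documents with a temp field from the earlier bulk insert
docs = [
    {"message": "Hello!!!", "temp": 25.6},
    {"message": "The weather is so good.", "temp": 27.3},
    {"message": "Time for launch.", "temp": 30.3},
]

# "range": {"temp": {"gte": 27}} keeps documents with temp >= 27
hits = [d for d in docs if d["temp"] >= 27]
```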
Query DSL in Elasticsearch
• Query Data By Regular Expression
• Order Data
# find messages that match "fo.*" or "he.*"
POST http://localhost:9200/facebook/yuhsuan_chen/_search
{
  "query": {
    "regexp": {
      "message": "fo.*|he.*"
    }
  }
}
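Elasticsearch regexp queries are anchored to the whole value they run against, so Python's `re.fullmatch` is the closest local analogue. A sketch with invented messages (this ignores how the field is analyzed into terms):

```python
import re

messages = ["food is ready", "hello there", "goodbye"]

# "fo.*|he.*" must match the entire string, like an ES regexp query
hits = [m for m in messages if re.fullmatch(r"fo.*|he.*", m)]
```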
# query all data, ordering _type descending, then _id ascending
POST http://localhost:9200/facebook/yuhsuan_chen/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {"_type": {"order": "desc"}},
    {"_id": {"order": "asc"}}
  ]
}
Query DSL in Elasticsearch
• Query Condition
– must: the condition must match
– must_not: the condition must not match
– should: matching documents get a higher relevance score
– at least one should clause is required, unless a must / must_not clause is present

# message must match ".*i.*", message must not contain "good",
# and documents whose feel contains bad are ranked higher
POST http://localhost:9200/facebook/yuhsuan_chen/_search
{
  "query": {
    "bool": {
      "must": {"regexp": {"message": ".*i.*"}},
      "must_not": {"regexp": {"message": "good"}},
      "should": {"match": {"feel": "bad"}}
    }
  }
}
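Setting scoring aside, the must / must_not part of the bool query behaves like a plain filter. A sketch over documents mirroring the earlier bulk inserts:

```python
import re

docs = [
    {"_id": "1", "message": "Hello!!!", "feel": "Good"},
    {"_id": "2", "message": "The weather is so good.", "feel": "Comfortable"},
    {"_id": "3", "message": "I like my job.", "feel": "bad"},
]

# must: message matches ".*i.*"; must_not: message contains "good"
# (should only affects ranking, not which documents match)
hits = [
    d for d in docs
    if re.search(r"i", d["message"]) and "good" not in d["message"].lower()
]
```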
Query DSL in Elasticsearch
• Query by "bool"
– With the should clause, _id:3 is ranked first; without it, _id:3 ranks lower
Kibana
Kibana is an open source analytics and visualization platform designed to work with Elasticsearch. You use Kibana to search, view, and interact with data stored in Elasticsearch indices.
• Flexible analytics and visualization platform
• Real-time summary and charting of streaming data
• Intuitive interface for a variety of users
• Instant sharing and embedding of dashboards
• Configure Kibana & Elasticsearch
– http://howtorapeurjob.tumblr.com/post/140733691641/
Kibana
• Set Configuration

sudo gedit /opt/kibana/config/kibana.yml

• YML

# \config\kibana.yml
server.host: "yuhsuan_chen_w7p.htctaoyuan.htc.com.tw"
elasticsearch.url: "http://yuhsuan_chen_w7p.htctaoyuan.htc.com.tw:9200"
Kibana
• Download Sample Files
https://www.elastic.co/guide/en/kibana/3.0/snippets/shakespeare.json
https://github.com/bly2k/files/blob/master/accounts.zip?raw=true
https://download.elastic.co/demos/kibana/gettingstarted/logs.jsonl.gz

• Create Index

# Index: shakespeare
curl -XPUT http://localhost:9200/shakespeare
{
  "mappings": {
    "_default_": {
      "properties": {
        "speaker": {"type": "string", "index": "not_analyzed"},
        "play_name": {"type": "string", "index": "not_analyzed"},
        "line_id": {"type": "integer"},
        "speech_number": {"type": "integer"}
      }
    }
  }
}
Kibana
• Create Index

# Index: logstash-{datetime} (repeat for 2015.05.18, .19, and .20)
curl -XPUT http://localhost:9200/logstash-2015.05.18
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
Kibana
• Import Data

# import data
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
curl -XPOST 'localhost:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare.json
curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @logs.jsonl

• Check All Indices

# check indices
curl 'localhost:9200/_cat/indices?v'

• Connect to Kibana
http://localhost:5601

If Nginx forwarding is set up, the port can be omitted.
The first startup takes a little time while Kibana initializes.
If you are not redirected to the settings page, click Settings yourself.
Kibana
Create three index patterns so Kibana can search them quickly:
logstash-*
ba*
shakes*
Kibana
This means that when searching in Kibana, we can first filter which Elasticsearch index to read from.

Here we can see all the related field attributes, and adjust them if needed.
Back in Discover, select the ba* index pattern and search for accounts with account number < 100 and balance > 47500; there are five results in total.

To narrow the display to particular fields, click Available Fields -> add on the left to see only the selected field data.

account_number:<100 AND balance:>47500
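The Kibana query "account_number:<100 AND balance:>47500" is just an AND of two field conditions. A sketch over invented account records (the real data comes from accounts.json):

```python
# hypothetical sample of the bank accounts data set
accounts = [
    {"account_number": 5, "balance": 48000},
    {"account_number": 42, "balance": 30000},
    {"account_number": 150, "balance": 50000},
]

# account_number:<100 AND balance:>47500
hits = [
    a for a in accounts
    if a["account_number"] < 100 and a["balance"] > 47500
]
```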
Kibana
• Build a pie chart of the balance distribution by age
Choose Visualize -> Pie chart -> From a new search -> Select an index pattern -> ba*
Choose buckets -> Split Slices -> Aggregation -> Range -> Field -> balance -> enter the range values
Kibana
Next, add another Split Slices -> Aggregation -> Terms -> Field -> age -> Order By -> metric: Count -> Order -> Descending -> Size -> 5
Finally, press the green triangle at the top to generate the chart.

To save the chart, use the save button at the top right.
Kibana
• Build a Dashboard
Select saved charts to place them on the Dashboard; their size, position, and so on can be adjusted freely.
Nginx
• Nginx port forwarding
– http://howtorapeurjob.tumblr.com/post/140665750171/
ELK parse system log on Ubuntu
• How to set up ELK on Ubuntu
– http://howtorapeurjob.tumblr.com/post/140837053206/