TRANSCRIPT
BASEL | BERN | BRUGG | BUCHAREST | DÜSSELDORF | FRANKFURT A.M. | FREIBURG I. BR. | GENEVA HAMBURG | COPENHAGEN | LAUSANNE | MANNHEIM | MUNICH | STUTTGART | VIENNA | ZURICH
http://guidoschmutz.wordpress.com | @gschmutz
Streaming Visualization
DOAG Konferenz 2019
Guido Schmutz
Agenda
1. Motivation / Introduction
2. Stream Data Integration & Stream Analytics Ecosystem
3. Three Blueprints for Streaming Visualization
End-to-End Demo available here: https://github.com/gschmutz/various-demos/tree/master/streaming-visualization
Guido
Working at Trivadis for more than 22 years
Consultant, Trainer, Platform Architect for Java, Oracle, SOA and Big Data / Fast Data
Oracle Groundbreaker Ambassador & Oracle ACE Director
@gschmutz guidoschmutz.wordpress.com
Motivation / Introduction
Timely decisions require new data immediately
Keep the data in motion …
[Diagram: Data at Rest vs. Data in Motion; at rest: Store, then Analyze/Visualize, then Act; in motion: Store, Visualize/Analyze and (Re)Act while the data flows]
Reference Architecture for Data Analytics Solutions
[Diagram: Bulk Sources (File, DB) and Event Sources (IoT Data, Mobile Apps, Social, Telemetry) feed an Event Hub via Data Flow and Change Data Capture; Parallel Processing over Raw, Refined and Results storage; Stream Analytics with Stream Processors, plus Microservices and Enterprise Apps with state, logic and APIs; Edge Nodes with Rules; File Import / SQL Import and SQL Export to an Enterprise Data Warehouse with BI Tools; Search / Explore via a Search Service]
Two Types of Stream Processing (by Gartner)

Stream Data Integration
• focuses on the ingestion and processing of data sources, targeting real-time extract-transform-load (ETL) and data integration use cases
• filters and enriches the data

Stream Analytics
• targets analytics use cases
• calculates aggregates and detects patterns to generate higher-level, more relevant summary information (complex events)
• complex events may signify threats or opportunities that require a response from the business
Gartner: Market Guide for Event Stream Processing, Nick Heudecker, W. Roy Schulte
Stream Data Integration & Stream Analytics Ecosystem
[Diagram: product landscape for Stream Data Integration, Stream Analytics and Event Hubs, spanning from the Edge to the data center, split into open source and closed source offerings; source: adapted from Tibco]
Apache Kafka – A Streaming Platform
[Diagram: Kafka Cluster with Brokers 1–3 and a Zookeeper Ensemble (ZK 1–3); Producers 1–3 and Consumers 1–3; Schema Registry; management tooling: Control Center, Kafka Manager, KAdmin, kafkacat]

Data Retention:
• Never
• Time (TTL) or size-based
• Log-compacted
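The three retention modes decide when a record may be deleted; log compaction is the least obvious one: Kafka keeps at least the latest record per key and discards older values for that key. A minimal sketch of that semantics in plain Python (an illustration only, not Kafka code):

```python
def compact(log):
    """Simulate Kafka log compaction: of all records sharing a key, only
    the one with the highest offset survives; the relative order of the
    surviving records is preserved."""
    latest = {}  # key -> offset of the last record seen for that key
    for offset, (key, _value) in enumerate(log):
        latest[key] = offset
    return [(key, value)
            for offset, (key, value) in enumerate(log)
            if latest[key] == offset]

log = [("alice", "v1"), ("bob", "v1"), ("alice", "v2"),
       ("bob", "v2"), ("alice", "v3")]
print(compact(log))  # [('bob', 'v2'), ('alice', 'v3')]
```

In the real broker, compaction runs in the background per partition segment, so older values may still be visible for a while; the guarantee is only "at least the latest value per key".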
Apache Kafka – A Streaming Platform
[Diagram: Source and Sink Connectors (Kafka Connect), the KSQL Engine and Kafka Streams sitting on top of the Kafka Broker; example topic: trucking_driver]
Demo using Kafka Stack for Stream Data Integration
[Diagram: data sources flow via Data Flow into Stream Data Integration & Stream Analytics and an Event Hub, then on to the Streaming Visualization consumer]

Filter: #doag2019, …
User: @gschmutz
Demo: Kafka Connect to retrieve Tweets
curl -X "POST" "$DOCKER_HOST_IP:8083/connectors" \
  -H "Content-Type: application/json" \
  --data '{
  "name": "twitter-source",
  "config": {
    "connector.class": "com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector",
    "twitter.oauth.consumerKey": "xxxxx",
    "twitter.oauth.consumerSecret": "xxxxx",
    "twitter.oauth.accessToken": "xxxx",
    "twitter.oauth.accessTokenSecret": "xxxxx",
    "process.deletes": "false",
    "filter.keywords": "#doag2019",
    "filter.userIds": "15148494",
    "kafka.status.topic": "tweet-raw-v1",
    "tasks.max": "1"
  }
}'
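The same registration can be scripted instead of typed; a sketch using only Python's standard library, with a hypothetical helper, a localhost Connect URL and a trimmed-down config (the request is built but not sent, since it needs a running Connect worker):

```python
import json
import urllib.request

def build_connector_request(connect_url, name, config):
    """Build the POST request that registers a connector with the
    Kafka Connect REST API; sending it is left to the caller."""
    payload = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_connector_request(
    "http://localhost:8083",  # assumed Connect worker address
    "twitter-source",
    {
        "connector.class": "com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector",
        "filter.keywords": "#doag2019",
        "kafka.status.topic": "tweet-raw-v1",
        "tasks.max": "1",
    },
)
# urllib.request.urlopen(req) would submit it to a running worker
```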
Demo: KSQL for Streaming ETL
CREATE STREAM tweet_s
WITH (KAFKA_TOPIC='tweet-v1', VALUE_FORMAT='AVRO', PARTITIONS=8) AS
SELECT id, createdAt, text, user->screenName
FROM tweet_raw_s;

CREATE STREAM tweet_raw_s WITH (KAFKA_TOPIC='tweet-raw-v1', VALUE_FORMAT='AVRO');

SELECT id, lang, removestopwords(split(LCASE(text), ' ')) AS word
FROM tweet_raw_s
WHERE lang = 'en' OR lang = 'de';

SELECT id, LCASE(hashtagentities[0]->text)
FROM tweet_raw_s
WHERE hashtagentities[0] IS NOT NULL;
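`split` and `LCASE` are built-in KSQL functions, while `removestopwords` is a UDF from the demo project. What the word-extraction query does can be approximated in plain Python; the stopword list here is illustrative only and need not match the UDF's:

```python
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "rt"}  # made-up list

def extract_words(text):
    """Approximate removestopwords(split(LCASE(text), ' ')):
    lowercase, split on blanks, drop stopwords and empty tokens."""
    return [w for w in text.lower().split(" ")
            if w and w not in STOPWORDS]

print(extract_words("The latest news of Kafka and KSQL"))
# ['latest', 'news', 'kafka', 'ksql']
```

Downstream aggregations (e.g. a word count per window) then group over these tokens.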
Demo using Kafka Stack for Stream Data Integration
Filter: #voxxeddaysbanff, #java, #kafka, …
User: @VoxxedDaysBanff, @gschmutz
Visualization: many many options!
But do they all support Streaming Data?
Three Blueprints for Streaming Visualization
BP1: Fast datastore with regular polling from consumer
[Diagram: data sources flow via Stream Data Integration & Stream Analytics into an Event Hub and on into a Data Store, crossing from data in motion to data at rest; the Streaming Visualization consumer polls the Data Store through an API]
BP1-1: Elasticsearch / Kibana
Alternatives: SOLR & Banana
BP1-2: InfluxDB / Grafana or Chronograf
Alternatives: Prometheus & Grafana; Druid & Superset
BP1-3: NoSQL & Custom Web
BP1-4: Kafka Streams Interactive Query & Custom App
Alternatives: Flink, …
BP2: Direct Streaming to the Consumer
[Diagram: data sources flow via Stream Data Integration & Stream Analytics into an Event Hub and are pushed over a Channel/Protocol and API directly to the Streaming Visualization consumer; the data stays in motion]
BP2-1: Kafka Connect to Slack / WhatsApp
Alternatives: Twitter, SMS, …
BP2-1: Demo Kafka Connect to Slack

curl -X "POST" "$DOCKER_HOST_IP:8083/connectors" \
  -H "Content-Type: application/json" \
  --data '{
  "name": "slack-sink",
  "config": {
    "connector.class": "net..SlackSinkConnector",
    "tasks.max": "1",
    "topics": "slack-notify",
    "slack.token": "XXXX",
    "slack.channel": "general",
    "message.template": "tweet by ${USER_SCREENNAME} with ${TEXT}"
  }
}'
BP2-2: Kafka to Tipboard (Dashboard Solution)
Alternatives: Dashing, Geckoboard, …
BP2-2: Demo Kafka to Tipboard (Dashboard Solution)
http://allegro.tech/tipboard/
BP2-2: Demo Kafka to Tipboard (Dashboard Solution)

import json
import requests

def prepare_for_just_value(data):
    # wrap the raw count in a Tipboard "just_value" tile payload,
    # e.g. {"title": "# Tweets:", "description": "per hour", "just-value": "23"}
    return {'title': '# Tweets:', 'description': 'per hour', 'just-value': data}

c.subscribe(['DASH_TWEET_COUNT_BY_HOUR_T'])
while True:
    msg = c.poll(1.0)
    if msg is None:
        continue
    data = json.loads(msg.value().decode('utf-8'))
    data_prepared = prepare_for_just_value(data.get('NOF_TWEETS'))
    data_to_push = {'tile': TILE_NAME, 'key': TILE_KEY,
                    'data': json.dumps(data_prepared)}
    resp = requests.post(API_URL_PUSH, data=data_to_push)
BP2-3: Web Sockets / SSE & Custom Modern Web App
Server-Sent Events (SSE)
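With SSE the server keeps one HTTP response open and writes events in a simple text framing defined by the SSE spec (`event:`/`data:` lines, blank-line terminated). A sketch of that wire format in Python; the tweet payload is made up:

```python
import json

def sse_frame(data, event=None):
    """Serialize one Server-Sent Event: an optional 'event:' line,
    one 'data:' line per payload line, terminated by a blank line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"

frame = sse_frame(json.dumps({"user": "gschmutz", "text": "hello #doag2019"}),
                  event="tweet")
print(frame)
```

On the client side, the browser's built-in EventSource API reassembles these frames, so the custom web app only registers a handler per event name.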
BP3: Streaming SQL Result to Consumer
[Diagram: data sources flow via Stream Data Integration & Stream Analytics into an Event Hub; the Streaming Visualization consumer receives a continuously updated SQL query result through an API; the data stays in motion]
BP3-1: KSQL and Arcadia Data
BP3-2: KSQL with REST API to Custom Web App
BP3-2: Demo KSQL with REST API
curl -X POST -H 'Content-Type: application/vnd.ksql.v1+json' \
  -i http://analyticsplatform:8088/query --data '{
  "ksql": "SELECT text FROM tweet_raw_s;",
  "streamsProperties": { "ksql.streams.auto.offset.reset": "latest" }
}'
{"row":{"columns":["The latest The Naji Filali Daily! https://t.co/9E6GonrySE Thanks to @Xavier_Porter1 @ClouMedia #ai #bigdata"]},"errorMessage":null,"finalMessage":null}
{"row":{"columns":["RT @Futurist_Invest: This robot can copy your face! Creepy \n\n#SaturdayThoughts#SaturdayMorning #creepy #bots #bot #AI #bigdata #robotics #…"]},"errorMessage":null,"finalMessage":null}
{"row":{"columns":["She’s back telling us all about why datathons are exciting now :) Catch her while you can! �@ARUKscientist� �@S_Bauermeister� #bigdata #ARUKConfhttps://t.co/Br484db5ut"]},"errorMessage":null,"finalMessage":null}
{"row":{"columns":["Blockchain Competitive Innovation Advantage"]},"errorMessage":null,"finalMessage":null}
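The `/query` endpoint streams one JSON object per row, as the output above shows. Pulling the column values out of such a response can be sketched in Python; this works on captured lines rather than a live connection, and the sample tweets are shortened stand-ins:

```python
import json

def parse_ksql_rows(lines):
    """Extract the column lists from a KSQL /query response stream,
    skipping blank keep-alive lines and final/error-only messages."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if msg.get("row"):
            rows.append(msg["row"]["columns"])
    return rows

captured = [
    '{"row":{"columns":["tweet one"]},"errorMessage":null,"finalMessage":null}',
    '',
    '{"row":{"columns":["tweet two"]},"errorMessage":null,"finalMessage":null}',
    '{"row":null,"errorMessage":null,"finalMessage":"Query terminated"}',
]
print(parse_ksql_rows(captured))  # [['tweet one'], ['tweet two']]
```

A live client would iterate over the chunked HTTP response line by line and feed each line through the same logic.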
BP3-3: Spark Streaming & Oracle Stream Analytics
BP3-3: Demo Spark Streaming & Oracle Stream Analytics
https://www.oracle.com/middleware/technologies/complex-event-processing.html
Summary
BP1: Fast Store & Polling
• "classic" pattern
• not end-to-end "data in motion": the data is at rest before visualization
• the slight delay might not be acceptable for a monitoring dashboard
• can use the full power of the data store(s), e.g. NoSQL
• in-memory stores reduce the overhead
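BP1's refresh cycle boils down to a timer-driven query against the store. A generic sketch of that loop, with a stubbed `fetch` callable standing in for the real data-store query so it runs offline (all names and the interval are made up):

```python
import time

def poll(fetch, render, interval_s, rounds):
    """Call fetch() every interval_s seconds and hand the result to
    render(); a real dashboard would loop forever instead of `rounds`."""
    for _ in range(rounds):
        render(fetch())
        time.sleep(interval_s)

results = []
counter = iter(range(3))  # stub for e.g. an Elasticsearch aggregation query
poll(fetch=lambda: next(counter), render=results.append,
     interval_s=0.01, rounds=3)
print(results)  # [0, 1, 2]
```

The interval is the knob behind the "slight delay" trade-off above: shorter intervals mean fresher dashboards but more load on the store.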
BP2: Stream to Consumer
• minimal latency
• more difficult on the client side
• good if the stream directly holds what should be displayed
• more difficult if the data in the stream needs to be analyzed before visualization
• no historical info available
BP3: Streaming SQL
• minimal latency
• the power of a SQL query engine is available for visualization
• possibility for "self-service" style visualization
• some analytics are more difficult on streaming data
• no historical info available