TRANSCRIPT
Budapest Data Forum, 2018
Structured Streaming in Spark
Why Real-time?
Why Spark Streaming?
Why Real-time?
How to choose a streaming tool?
The Apache landscape
streams
Sometimes you just want to keep it simple
Remember this from 1 hour ago?
So, our fancy tools
streams
How to choose a fancy streaming tool?
Popularity
See the bigger picture
Throughput
source: https://databricks.com/blog/2017/10/11/benchmarking-structured-streaming-on-databricks-runtime-against-state-of-the-art-streaming-systems.html
*as the Spark folks measured it
Throughput
source: https://data-artisans.com/blog/curious-case-broken-benchmark-revisiting-apache-flink-vs-databricks-runtime
*as the Flink folks measured it
Developers!
Latency
Native Streaming (event-based processing)
vs
Microbatching
streams
trident
https://www.theguardian.com/technology/2014/feb/05/why-google-engineers-designers
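The latency trade-off above can be condensed into a toy Python sketch. This is not Spark or Flink code and all names are invented: an event-based engine hands each record to the operator immediately, while a microbatch engine buffers records and runs a small batch job per group, so a record waits for its batch to trigger.

```python
# Toy sketch (illustrative names, not any real engine's API) of
# per-event processing vs microbatching.

def native_streaming(events, handle):
    # Event-based engines process each record as it arrives,
    # so per-record latency is roughly the processing time alone.
    for e in events:
        handle(e)

def microbatching(events, handle, batch_size=3):
    # Microbatch engines buffer records and process each group as a
    # small batch job; a record waits until its batch is triggered,
    # which adds the batch interval to its latency.
    batch = []
    for e in events:
        batch.append(e)
        if len(batch) == batch_size:
            handle(batch)
            batch = []
    if batch:                  # flush the final partial batch
        handle(batch)

seen = []
microbatching([1, 2, 3, 4, 5], seen.append)
# → seen holds two batches: [1, 2, 3] and [4, 5]
```

Microbatching trades this extra latency for simpler fault tolerance and higher throughput, which is the debate the slide refers to.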
Structured Streaming
Pain points to solve
• Interoperability: batch, interactive and real-time analytics
• Event-time based processing: event time instead of processing time
• End-to-end guarantees: consistent data throughout the whole pipeline, exactly-once processing
Unbounded Table
image credit: http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
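The unbounded-table model can be mimicked in a few lines of plain Python. This is a simulation of the mental model, not Spark's API: every trigger appends the new input rows to a conceptually infinite table, and the result is whatever a batch-style query over the whole table would return.

```python
# Simulation of the unbounded-table model (not Spark itself):
# a streaming word count expressed as repeated batch queries.
from collections import Counter

table = []  # the "unbounded input table"

def trigger(new_rows):
    table.extend(new_rows)   # new data = new rows appended to the table
    return Counter(table)    # batch-style aggregation over the full table

trigger(["cat", "dog"])
result = trigger(["dog"])
# → counts reflect everything seen so far: dog=2, cat=1
```

This is why the same DataFrame code can run in batch, interactive, and streaming mode: the streaming query is defined as if it were a batch query over the ever-growing table, and the engine incrementalizes it.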
Late data
Handling late data with Watermarking
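A toy sketch of the watermarking idea (illustrative names, not Spark's internals): the engine tracks the maximum event time seen so far, and admits late events only while they fall within the configured delay threshold; anything older can be dropped and its state cleaned up.

```python
# Toy sketch of watermarking (not Spark's implementation): events are
# admitted while event_time >= max_event_time - delay_threshold.

class Watermark:
    def __init__(self, delay_threshold):
        self.delay = delay_threshold
        self.max_event_time = float("-inf")

    def admit(self, event_time):
        """True if the event is on time or acceptably late."""
        self.max_event_time = max(self.max_event_time, event_time)
        return event_time >= self.max_event_time - self.delay

wm = Watermark(delay_threshold=10)
on_time = wm.admit(100)   # first event, always admitted
late_ok = wm.admit(95)    # 5 units late, within the 10-unit threshold
too_late = wm.admit(85)   # 15 units late, dropped
```

The threshold is a trade-off: a larger delay tolerates later data but forces the engine to keep aggregation state around for longer.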
The drama of Exactly-once processing (Act I)
Spark: give me data
Kafka: you were at the 10th line, there you go with the 11th.
Spark: got it, thanks! Consider line 11 done.
Spark: Hey Postgres, store the results please
Postgres: OK!
Spark: give me data
Kafka: you were at the 11th line, there you go with the 12th.
...
The drama of Exactly-once processing (Act II)
Spark: give me data
Kafka: you were at the 12th line, there you go with the 13th.
Spark: got it, thanks! Consider line 13 done.
Spark: Hey Postgres, store the re.....
Claudius: Hey Spark, got thirsty? ;)
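Act II's failure comes from treating the offset commit and the sink write as two independent steps: crash between them and line 13 is either lost or duplicated. A hedged sketch of the standard remedy (illustrative names, not the Kafka or Spark APIs): a replayable source plus an idempotent sink keyed by offset, so replaying after a crash is harmless.

```python
# Toy sketch of exactly-once via replayable source + idempotent sink
# (illustrative names only, not Kafka's or Spark's actual APIs).

class Pipeline:
    def __init__(self):
        self.committed_offset = 0   # what we told the source is "done"
        self.sink = {}              # idempotent sink: upserts keyed by offset

    def process(self, offset, record, crash_before_write=False):
        if crash_before_write:
            raise RuntimeError("poisoned by Claudius")
        self.sink[offset] = record      # upsert: replaying is harmless
        self.committed_offset = offset  # commit only AFTER the write

p = Pipeline()
p.process(11, "line 11")
try:
    p.process(12, "line 12", crash_before_write=True)  # the crash
except RuntimeError:
    pass
# On restart we replay from committed_offset + 1, i.e. offset 12 again:
p.process(12, "line 12")
# → the sink holds each line exactly once despite the crash and replay
```

This is the shape of Structured Streaming's end-to-end guarantee: the source must be replayable and the sink idempotent (or transactional), so that at-least-once replay plus deduplication yields exactly-once results.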
Demo
Summary
• Only use fancy tools if you need them ;)
• Structured Streaming
  • Great concept
  • Access to core Spark functionalities
  • Probably takes 1-2 years to make it feature-rich