Flink and NiFi, Two Stars in the Apache Big Data Constellation
Matthew Ring, Chicago Apache Flink Meetup, Jan. 19, 2016
About me:
● Matthew Ring is currently a Senior Software Engineer at HP Enterprise.
● Matt has been a professional Java developer in multiple industries, including finance, healthcare, and education, since 1999.
● Prior to that, he was an electrical engineer in defense communications.
● He is currently working on a new Investigative Analytics product for HP Enterprise.
● He has presented talks at JavaOne and Bank of America's developer conferences.
● His GitHub is https://github.com/mring33621
What is NiFi?
Origin:
NSA -> Onyara -> Apache NiFi -> Hortonworks DataFlow
Summary:
Visual Dataflow Programming for Big Data/Fast Data Ingestion!
(Or, yet another package where you drop stuff on the screen and connect it with arrows)
What is NiFi?
IMHO, good for:
● Ingestion
● Format Conversion
● Light (simple) Processing
● Delivery to other systems
Together?
● Similar, but different...
● Friends in common:
○ Sockets
○ Kafka
○ HDFS
○ Flume
○ RabbitMQ
○ NATS Messaging
○ Elasticsearch
○ Solr
● There is also the option of direct NiFi <-> Flink connections!
Together?
● NiFi is visual
● NiFi keeps a paper trail of the data running through it
● Supports monitoring/metrics reporting
○ Ambari
○ Ganglia
○ Riemann
● Oh, and you can modify flows while they are LIVE!
● NiFi has more friends to bring to the party:
○ JSON/Avro/Parquet/Kite
○ HTTP/S, UDP, S/FTP
○ Text matching/parsing with regex
○ Tagging (metadata)
○ Scripting
○ AWS S3, SQS, SNS, Azure events
○ Tailing/Syslog
○ HL7
○ MongoDB
○ HBase
○ SQL
○ JMS
○ Images
○ ...AND MORE!
Paper Trail!
NiFi records:
● Content
● Metadata
● Provenance (touches)
Sooooo what?
● Allows replay of individual items!
● Queryable through UI or REST interface
● Assists in post hoc data forensics (compliance? legal discovery?)
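To make the link between the recorded paper trail and replay concrete, here is a minimal in-memory sketch in plain Java. It is not NiFi's API or storage model: the class `ProvenanceLog`, its `Event` fields, and all method names are hypothetical, chosen only to show that keeping content, metadata, and a list of touches is enough to reconstruct an item's history and hand its content back.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model for illustration only; not NiFi's API.
final class ProvenanceLog {

    /** One "touch": which component handled which item, and what it did. */
    static final class Event {
        final String itemId;
        final String component;
        final String action;

        Event(String itemId, String component, String action) {
            this.itemId = itemId;
            this.component = component;
            this.action = action;
        }
    }

    private final Map<String, byte[]> content = new HashMap<>();
    private final Map<String, Map<String, String>> metadata = new HashMap<>();
    private final List<Event> events = new ArrayList<>();

    /** Stores an item's content and metadata when it first enters the flow. */
    void receive(String itemId, byte[] bytes, Map<String, String> attributes) {
        content.put(itemId, bytes.clone());
        metadata.put(itemId, new HashMap<>(attributes));
        events.add(new Event(itemId, "ingest", "RECEIVE"));
    }

    /** Records that a component touched an item. */
    void touch(String itemId, String component, String action) {
        events.add(new Event(itemId, component, action));
    }

    /** Returns the stored metadata of an item, or an empty map if unknown. */
    Map<String, String> attributesOf(String itemId) {
        Map<String, String> attributes = metadata.get(itemId);
        return attributes == null
                ? Collections.<String, String>emptyMap()
                : Collections.unmodifiableMap(attributes);
    }

    /** Returns the ordered history of a single item (its paper trail). */
    List<Event> historyOf(String itemId) {
        List<Event> result = new ArrayList<>();
        for (Event e : events) {
            if (e.itemId.equals(itemId)) {
                result.add(e);
            }
        }
        return Collections.unmodifiableList(result);
    }

    /** Replays one item: returns a copy of its stored content and logs the replay. */
    byte[] replay(String itemId) {
        byte[] bytes = content.get(itemId);
        if (bytes == null) {
            throw new IllegalArgumentException("unknown item: " + itemId);
        }
        events.add(new Event(itemId, "replay", "REPLAY"));
        return bytes.clone();
    }
}
```

In this sketch the replay itself is logged as another event, so the history of an item also shows when it was re-sent.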
Downsides?
● Weak deployment paradigm
○ Can import/export flow templates
○ But various processor config values will need to be updated by hand when moving from env to env
● Weak clustering story
○ non-elastic
○ SPOF master node
● Weak querying capability from UI
● Most processors are micro-batching (event-time stream processing is still experimental)
● Sometimes tedious -- have to think in terms of several little, built-in pieces to get a simple job done
Demo Notes
Custom Java code provides:
● synthetic intraday ticks
● trader state management
● glue logic
● websocket backend for dashboard UI
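The slides do not show how the demo produces its synthetic intraday ticks. One common way to do it is a seeded random walk, sketched below in plain Java. Everything here is an assumption for illustration: the class `SyntheticTicks`, the `Tick` fields, and the parameters (`maxStepPct`, `intervalMillis`, `seed`) are hypothetical names, not the demo's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; the demo's actual tick generator is not shown in the slides.
final class SyntheticTicks {

    /** A single price observation for one symbol. */
    static final class Tick {
        final String symbol;
        final long timestampMillis;
        final double price;

        Tick(String symbol, long timestampMillis, double price) {
            this.symbol = symbol;
            this.timestampMillis = timestampMillis;
            this.price = price;
        }
    }

    /**
     * Generates count ticks as a random walk. Each step changes the price by a
     * uniformly drawn percentage within +/- maxStepPct, with a floor of 0.01.
     * The same seed always yields the same series.
     */
    static List<Tick> generate(String symbol, double startPrice, int count,
                               double maxStepPct, long startMillis,
                               long intervalMillis, long seed) {
        Random random = new Random(seed);
        List<Tick> ticks = new ArrayList<>(count);
        double price = startPrice;
        for (int i = 0; i < count; i++) {
            // Uniform step, in percent, between -maxStepPct and +maxStepPct.
            double stepPct = (random.nextDouble() * 2.0 - 1.0) * maxStepPct;
            price = Math.max(0.01, price * (1.0 + stepPct / 100.0));
            ticks.add(new Tick(symbol, startMillis + i * intervalMillis, price));
        }
        return ticks;
    }
}
```

For example, `SyntheticTicks.generate("ACME", 100.0, 390, 0.5, 0L, 60_000L, 42L)` would give one tick per minute for a 390-minute session. Fixing the seed keeps demo runs repeatable.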
Custom HTML/JS code provides:
● live dashboard UI
● smoothie.js charts
● knockout.js binding/templating
NiFi:
● observes orders
○ can deny orders based on ‘compliance rules’
● observes executions
○ routes ‘suspicious’ executions to file system for future scrutiny
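In the demo this deny-and-route logic is configured in NiFi itself, and the slides do not list the actual rules. The plain-Java sketch below only illustrates the idea of rule-based routing: each item is checked against named predicates and sent to the first matching route, otherwise to a default. The classes `RuleRouter` and `DemoOrder`, the route names, and the example rules are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical order type; the fields are assumptions made for this example.
final class DemoOrder {
    final String trader;
    final String symbol;
    final int quantity;

    DemoOrder(String trader, String symbol, int quantity) {
        this.trader = trader;
        this.symbol = symbol;
        this.quantity = quantity;
    }
}

// Hypothetical sketch of rule-based routing; not a NiFi processor.
final class RuleRouter<T> {

    // Insertion order is kept so that rules are evaluated in the order they were added.
    private final Map<String, Predicate<T>> rules = new LinkedHashMap<>();
    private final String defaultRoute;

    RuleRouter(String defaultRoute) {
        this.defaultRoute = defaultRoute;
    }

    /** Registers one predicate per route name; re-using a name replaces its rule. */
    RuleRouter<T> when(String route, Predicate<T> rule) {
        rules.put(route, rule);
        return this;
    }

    /** Returns the route of the first matching rule, or the default route. */
    String route(T item) {
        for (Map.Entry<String, Predicate<T>> entry : rules.entrySet()) {
            if (entry.getValue().test(item)) {
                return entry.getKey();
            }
        }
        return defaultRoute;
    }
}
```

A router for orders might be set up as `new RuleRouter<DemoOrder>("accept").when("deny", o -> o.quantity > 10_000)`. The same class could route executions to a "suspicious" destination with a different predicate.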
Flink Streaming provides:
● trade recommendation engine
● execution engine