Java High Level Stream API
![Page 1: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/1.jpg)
Stream API For Apex
June 2016
![Page 2: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/2.jpg)
Apex Overview
![Page 3: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/3.jpg)
Apex Overview
• YARN is the resource manager
• HDFS used for storing any persistent state
![Page 4: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/4.jpg)
Current Development Model
Directed Acyclic Graph (DAG)
[Diagram: tuples pass through a chain of operators connected by streams, labeled Filtered Stream, Enriched Stream, and Output Stream]
● Stream is a sequence of data tuples
● Typical operator takes one or more input streams, performs computations & emits one or more output streams
● Each operator is your custom business logic in Java, or a built-in operator from our open source library
● Operator has many instances that run in parallel and each instance is single-threaded
● Directed Acyclic Graph (DAG) is made up of operators and streams
![Page 5: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/5.jpg)
Current Application Example
@ApplicationAnnotation(name="WordCountDemo")
public class Application implements StreamingApplication
{
  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    WordCountInputOperator input = dag.addOperator("wordinput", new WordCountInputOperator());
    UniqueCounter<String> wordCount = dag.addOperator("count", new UniqueCounter<String>());
    ConsoleOutputOperator consoleOperator = dag.addOperator("console", new ConsoleOutputOperator());
    dag.addStream("wordinput-count", input.outputPort, wordCount.data);
    dag.addStream("count-console", wordCount.count, consoleOperator.input);
  }
}
![Page 6: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/6.jpg)
Why we need a high-level API
o Easier for beginners to start with
o Fluent API
o Smaller learning curve
o Transform methods in one place vs. operator library
o Operator API provides flexibility while high-level API provides ease of use
![Page 7: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/7.jpg)
Stream API
<<interface>> ApexStream<T>: map(..), filter(..), …, addOperator(...), with(prop, val), …, window(Opt...)
<<interface>> WindowedStream<T>: group(..), groupByKey(...), reduce(..), fold(..), join(..), count(..), …, window(Opt...)
![Page 8: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/8.jpg)
Stream API (Application Example)
@ApplicationAnnotation(name = "WordCountStreamingApiDemo")
public class ApplicationWithStreamAPI implements StreamingApplication
{
  @Override
  public void populateDAG(DAG dag, Configuration configuration)
  {
    String localFolder = "./src/test/resources/data";
    ApexStream<String> stream = StreamFactory
        .fromFolder(localFolder)
        .flatMap(new Split())
        .window(new WindowOption.GlobalWindow(),
            new TriggerOption().withEarlyFiringsAtEvery(Duration.millis(1000)).accumulatingFiredPanes())
        .countByKey(new ConvertToKeyVal()).print();
    stream.populateDag(dag);
  }
}
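The Split and ConvertToKeyVal classes are not shown on the slide. As a rough analogy only, the sketch below reproduces the split-then-count-by-key logic with plain java.util.stream over an in-memory list. It uses no Apex classes, ignores windowing and triggers, and the names WordCountAnalogy, split, and countByKey are hypothetical helpers chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java analogy only: no Apex classes are used here.
class WordCountAnalogy {

    // Plays the role of the flatMap(new Split()) step: one line in, many words out.
    static List<String> split(String line) {
        return Arrays.stream(line.trim().split("\\s+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    // Plays the role of countByKey: the word is the key and each occurrence adds 1.
    static Map<String, Long> countByKey(List<String> lines) {
        return lines.stream()
                .flatMap(line -> split(line).stream())
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not", "to be");
        // Prints {be=2, not=1, or=1, to=2}
        System.out.println(countByKey(lines));
    }
}
```

Unlike this batch analogy, the Apex pipeline above runs over a global window and fires early results every second, so counts are emitted repeatedly as data arrives.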
![Page 9: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/9.jpg)
How it works
o ApexStream<T> represents a bounded or unbounded data set of type T
o ApexStream<T> also holds a graph data structure of all operators and the connections between them, from the input to the current point
o Each transform method attaches one or more operators to the current graph data structure and returns a new ApexStream object
o The graph data structure is not translated into an Apex DAG until the populateDag or run method is called
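The deferred-translation idea in the points above can be sketched in a few lines of plain Java. This is not the Apex implementation: SketchStream, fromSource, transform, and populate are hypothetical names, the graph is simplified to a linear chain of operator names, and "translation" just produces a list of connection strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for ApexStream<T>; it only records operator names.
final class SketchStream<T> {
    // Operators recorded from the input up to the current point.
    private final List<String> operators;

    private SketchStream(List<String> operators) {
        this.operators = Collections.unmodifiableList(operators);
    }

    static <T> SketchStream<T> fromSource(String sourceName) {
        List<String> ops = new ArrayList<>();
        ops.add(sourceName);
        return new SketchStream<>(ops);
    }

    // Each transform copies the graph so far, appends one operator,
    // and returns a new stream object; the receiver is left unchanged.
    <R> SketchStream<R> transform(String operatorName) {
        List<String> extended = new ArrayList<>(operators);
        extended.add(operatorName);
        return new SketchStream<>(extended);
    }

    // Nothing is built until this call. It turns the recorded chain into
    // a list of connections, loosely mirroring what populateDag does for a DAG.
    List<String> populate() {
        List<String> connections = new ArrayList<>();
        for (int i = 0; i + 1 < operators.size(); i++) {
            connections.add(operators.get(i) + " -> " + operators.get(i + 1));
        }
        return connections;
    }
}

class LazyGraphDemo {
    public static void main(String[] args) {
        SketchStream<String> lines = SketchStream.fromSource("fileInput");
        SketchStream<String> words = lines.transform("split");
        SketchStream<String> counts = words.transform("countByKey");
        // Prints [fileInput -> split, split -> countByKey]
        System.out.println(counts.populate());
        // Prints [] because the original stream still holds only the source
        System.out.println(lines.populate());
    }
}
```

Returning a new object from each transform keeps earlier stream references valid, which is what makes method chaining and branching from an intermediate stream safe in this sketch.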
![Page 10: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/10.jpg)
How it works (Cont'd)
![Page 11: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/11.jpg)
Current Status
○ Method chaining for readability
○ Stateless transforms (map, flatMap, filter)
○ Some inputs and outputs are available (file, console, Kafka)
○ Some interoperability (addOperator, getDag, set property/attributes, etc.)
○ Local mode and distributed mode
○ Anonymous function class support
○ Extensible
![Page 12: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/12.jpg)
Current Status (Cont'd)
○ WindowedStream is in a pull request, along with the operators that support it
○ A few window transforms (count, reduce, etc.)
○ 3 window types (fixed window, sliding window, session window)
○ 3 trigger types (early trigger, late trigger, at watermark)
○ 3 accumulation modes (accumulate, discard, accumulation_retraction)
○ In-memory window state (checkpointed)
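To make the three accumulation modes concrete, here is a minimal plain-Java sketch of what a summing window might emit each time a trigger fires. It is not Apex code: the class, the enum constants, and the "retract" output format are assumptions made for illustration, following the usual Beam-style meaning of the modes.

```java
import java.util.ArrayList;
import java.util.List;

class AccumulationModeSketch {
    // Hypothetical names for the three modes listed on the slide.
    enum Mode { ACCUMULATING, DISCARDING, ACCUMULATING_AND_RETRACTING }

    // Each inner list holds the values that arrived between two trigger firings.
    // The result lists what a summing window would emit at each firing.
    static List<String> firePanes(List<List<Integer>> batches, Mode mode) {
        List<String> emitted = new ArrayList<>();
        int total = 0;               // sum of everything seen in the window so far
        boolean firedBefore = false; // a retraction needs an earlier result to cancel
        for (List<Integer> batch : batches) {
            int delta = 0;           // sum of values since the previous firing
            for (int value : batch) {
                delta += value;
            }
            int previousTotal = total;
            total += delta;
            switch (mode) {
                case ACCUMULATING:
                    emitted.add(String.valueOf(total));
                    break;
                case DISCARDING:
                    emitted.add(String.valueOf(delta));
                    break;
                case ACCUMULATING_AND_RETRACTING:
                    if (firedBefore) {
                        emitted.add("retract " + previousTotal);
                    }
                    emitted.add(String.valueOf(total));
                    break;
            }
            firedBefore = true;
        }
        return emitted;
    }
}
```

For batches [1, 2], [3], [4, 5], this sketch emits 3, 6, 15 when accumulating, 3, 3, 9 when discarding, and 3, retract 3, 6, retract 6, 15 when accumulating and retracting.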
![Page 13: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/13.jpg)
Roadmap
○ Persistent window state for windowed operators (large state)
○ Fully follow the Beam model (window, trigger, watermark)
○ Rich selection of windowed transforms (group, combine, join)
○ Support custom window assignor
○ Support custom trigger
○ More inputs/outputs (HBase, Cassandra, JDBC, etc.)
○ Better schema support
○ More language support (Java 8, Scala, etc.)
○ What the community asks for
![Page 14: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/14.jpg)
Resources
○ Apache Apex website - http://apex.apache.org/
○ Subscribe - http://apex.apache.org/community.html
○ Download - http://apex.apache.org/downloads.html
○ Twitter - @ApacheApex; Follow - https://twitter.com/apacheapex
○ Facebook - https://www.facebook.com/ApacheApex/
○ Meetup - http://www.meetup.com/topics/apache-apex
○ SlideShare - http://www.slideshare.net/ApacheApex/presentations
○ More Examples - https://github.com/DataTorrent/examples
○ Pull requests - https://github.com/apache/apex-malhar/pull/319 and https://github.com/apache/apex-malhar/pull/327
![Page 15: Java High Level Stream API](https://reader035.vdocuments.net/reader035/viewer/2022081520/587136a51a28abf0568b5de5/html5/thumbnails/15.jpg)
Demo & Code Example
○ Word Count
○ AutoComplete