apex hands-on lab - into the code! - meetupfiles.meetup.com/18978602/university program - writing an...

Post on 14-May-2018

Akshay Gore, Bhupesh Chawda
DataTorrent

Apex Hands-on Lab - Into the code!

Getting started with your first Apex Application!

© 2015 DataTorrent

Operators
• Input Adapter vs. Generic Operators?
• What are streams?
• What are ports?

Apex Operator Lifecycle

Apex Streaming Application

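import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;
import org.apache.hadoop.conf.Configuration;

public class Application implements StreamingApplication {
  @Override
  public void populateDAG(DAG dag, Configuration conf) {
    // Add operators to the DAG: dag.addOperator(args)
    // Add streams between operators: dag.addStream(args)
    // Additional config + hints to YARN (optional)
  }
}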

Apex Application - FilterWords

Apex Application DAG

● Problem statement - Filter words in the file
  ○ Read a file located on HDFS
  ○ Split each line into words, check that each word is not one of the forbidden words, and write it to HDFS

[Diagram: HDFS → Lines → Filtered Words → HDFS]
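
The per-line logic in the problem statement can be sketched in plain Java, outside Apex. This is a minimal sketch, not the lab's code: the class name FilterWordsSketch, the whitespace tokenizer, and the forbidden-word list are all assumptions made for illustration, and no HDFS or Apex API is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the Tokenize and Processor steps; not an Apex operator.
class FilterWordsSketch {
    // Assumed forbidden words, chosen only for this example.
    static final Set<String> FORBIDDEN = new HashSet<>(Arrays.asList("the", "a", "an"));

    // Tokenize step: split a line into words on runs of whitespace.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Processor step: keep only the words that are not forbidden (case-insensitive).
    static List<String> filter(List<String> words) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (!FORBIDDEN.contains(word.toLowerCase())) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> kept = filter(tokenize("The quick fox jumps over a lazy dog"));
        System.out.println(kept); // prints [quick, fox, jumps, over, lazy, dog]
    }
}
```

In the lab application these two steps live in separate operators connected by a stream; they are combined in one class here only to keep the sketch short.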

FilterWords Application DAG

[Diagram: HDFS → Reader → (Lines) → Tokenize → (Words) → Processor → (Filtered Words) → Writer → HDFS]
• Reader: Input Operator (Adapter)
• Tokenize, Processor: Generic Operators
• Writer: Output Operator (Adapter)

Prerequisites
• Java 1.7 or above
• Maven 3.0 or above
• Apache Apex projects:
  • Apache Apex Core: core platform, engine
  • Apache Apex Malhar: operators library
• Hadoop cluster in running state
• Your favourite IDE - Eclipse / vi

Demo time!
• Apex application structure
• Application code walk through
• How to execute the application
• Assignment

Assignment - WordCount

Apex Application DAG

● Problem statement - Count occurrences of words in a file
  ○ Read a file located on HDFS
  ○ Emit the counts at the end of every window and write them to HDFS

[Diagram: HDFS → Lines → <Word, Count> → HDFS]

Assignment - Word Count Application DAG

[Diagram: HDFS → Reader → (Lines) → Tokenize → (Words) → Counter → (<Word, Count>) → Output → HDFS]

Assignment - What you need to do

[Diagram, existing pipeline: Reader → (String: Line) → Tokenizer → (String: Words) → Processor → (String: Words’) → Writer]
[Diagram, assignment: Counter → (Map: {Word: Count}) → Writer]

Assignment - Hints
● Create a copy of Processor.java. Name it Counter.java
● Modify Counter.java as follows:
  ○ Define a data structure which can hold counts for words
  ○ Process method of input port must count the occurrences
  ○ Clear the counts in beginWindow() call
  ○ Emit the counts in endWindow() call

Solution - Changes to Counter.java
● Need to define a data structure which can hold counts for words

    private HashMap<String, Integer> counts = new HashMap<>();

● Process method of input port must count the occurrences

    if (counts.containsKey(refinedWord)) {
      counts.put(refinedWord, counts.get(refinedWord) + 1);
    } else {
      counts.put(refinedWord, 1);
    }

● Clear the counts in beginWindow call

    counts.clear();

● Emit the counts in endWindow call

    output.emit(counts.toString());

● Run Application Test
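
Put together, the solution fragments behave roughly like the following plain-Java simulation. It is a sketch under assumptions: CounterSketch is a hypothetical stand-in that mirrors the beginWindow / process / endWindow call order but does not extend any Apex class, and the emitted list takes the place of the real output port.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical stand-in for Counter.java; it mimics the window callbacks without Apex.
class CounterSketch {
    private final HashMap<String, Integer> counts = new HashMap<>();
    // Stand-in for the output port: collects what would have been emitted.
    final List<String> emitted = new ArrayList<>();

    // Called at the start of each window: forget the previous window's counts.
    void beginWindow() {
        counts.clear();
    }

    // Called once per incoming word: increment that word's count.
    void process(String refinedWord) {
        if (counts.containsKey(refinedWord)) {
            counts.put(refinedWord, counts.get(refinedWord) + 1);
        } else {
            counts.put(refinedWord, 1);
        }
    }

    // Called at the end of each window: emit the counts as a string.
    void endWindow() {
        emitted.add(counts.toString());
    }

    public static void main(String[] args) {
        CounterSketch counter = new CounterSketch();
        counter.beginWindow();
        counter.process("apex");
        counter.process("apex");
        counter.endWindow();
        counter.beginWindow();
        counter.process("apex");
        counter.endWindow();
        System.out.println(counter.emitted); // prints [{apex=2}, {apex=1}]
    }
}
```

The second window reports 1 rather than 3 because the map is cleared at the start of every window, so each emitted value covers a single window only.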

Assignment - Are we done yet?
● Change the DAG
  ○ Replace Processor operator with the newly created operator - Counter

Assignment - Slight change
● We are emitting a Map, but it is still sent as a String.
  ○ Change the type of the output port of Counter to Map
  ○ Change the type of the input port of Writer to Map
  ○ Make appropriate changes to Writer so that it reads a Map and writes it in a format where each line belongs to a single word.
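
One way the Writer side of this change could format a Map is sketched below. The class name MapLineFormatter, the "word: count" line layout, and the sorted key order are assumptions for illustration; the lab does not prescribe them, and the Writer's actual file-writing code is not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; the real Writer operator's API is not reproduced here.
class MapLineFormatter {
    // Turn a {word: count} map into text with one "word: count" line per word.
    static String toLines(Map<String, Integer> counts) {
        // Sorting by word makes the output order deterministic (an assumption, not a requirement).
        Map<String, Integer> sorted = new TreeMap<String, Integer>(counts);
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : sorted.entrySet()) {
            sb.append(entry.getKey()).append(": ").append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("hadoop", 1);
        counts.put("apex", 3);
        System.out.print(toLines(counts));
    }
}
```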

Assignment - Final change
● Change the code such that each count is the overall count, not just the count for each window.
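
One possible approach, sketched as a plain-Java stand-in rather than a real operator: stop clearing the map in beginWindow() so the counts carry over from window to window. RunningCounterSketch is hypothetical and does not use the Apex API; it emits a copy of the map so that later updates do not alter what was already emitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Counter that keeps overall counts; not an Apex operator.
class RunningCounterSketch {
    private final Map<String, Integer> counts = new HashMap<String, Integer>();
    // Stand-in for a Map-typed output port.
    final List<Map<String, Integer>> emitted = new ArrayList<Map<String, Integer>>();

    void beginWindow() {
        // Intentionally empty: counts are not cleared, so they accumulate across windows.
    }

    // Called once per incoming word: increment that word's running count.
    void process(String word) {
        Integer current = counts.get(word);
        counts.put(word, current == null ? 1 : current + 1);
    }

    // Emit a snapshot of the running totals at the end of each window.
    void endWindow() {
        emitted.add(new HashMap<String, Integer>(counts));
    }

    public static void main(String[] args) {
        RunningCounterSketch counter = new RunningCounterSketch();
        counter.beginWindow();
        counter.process("apex");
        counter.process("apex");
        counter.endWindow();
        counter.beginWindow();
        counter.process("apex");
        counter.endWindow();
        System.out.println(counter.emitted); // prints [{apex=2}, {apex=3}]
    }
}
```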

Summary - Recap
• Writing Apache Apex operators
• Chaining the operators into an Apache Apex application
• Executing the application on the Apache Apex platform

Where to go from here?
• Apache Apex Documentation - http://apex.incubator.apache.org/docs.html
• Apache Apex Core Git - https://github.com/apache/incubator-apex-core
• Apache Apex Malhar Git - https://github.com/apache/incubator-apex-malhar
• Join Users Mailing List - [email protected]
• Join Dev Mailing List - [email protected]
• Send queries to Users Mailing List - [email protected]
• Send queries to Dev Mailing List - [email protected]

Thank You