map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson Map(), flatMap() and reduce() are your new best friends: Simpler collections, concurrency, and big data Chris Richardson Author of POJOs in Action Founder of the original CloudFoundry.com @crichardson [email protected] http://plainoldobjects.com

Upload: chris-richardson

Post on 01-Dec-2014


Category:

Software



DESCRIPTION

Higher-order functions such as map(), flatMap(), filter() and reduce() have their origins in mathematics and ancient functional programming languages such as Lisp. But today they have entered the mainstream and are available in languages such as JavaScript, Scala and Java 8. They are well on their way to becoming an essential part of every developer’s toolbox. In this talk you will learn how these and other higher-order functions enable you to write simple, expressive and concise code that solves problems in a diverse set of domains. We will describe how you use them to process collections in Java and Scala. You will learn how functional Futures and Rx (Reactive Extensions) Observables simplify concurrent code. We will even talk about how to write big data applications in a functional style using libraries such as Scalding.

TRANSCRIPT

Page 1: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Map(), flatMap() and reduce() are your new best friends:

Simpler collections, concurrency, and big data

Chris Richardson

Author of POJOs in Action

Founder of the original CloudFoundry.com

@crichardson

[email protected]

http://plainoldobjects.com

Page 2: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Presentation goal

How functional programming simplifies your code

Show that map(), flatMap() and reduce() are remarkably versatile functions

Page 3: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

About Chris

Page 4: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

About Chris

Founder of a buzzword compliant (stealthy, social, mobile, big data, machine learning, ...) startup

Consultant helping organizations improve how they architect and deploy applications using cloud, micro services, polyglot applications, NoSQL, ...

Page 5: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Agenda

Why functional programming?

Simplifying collection processing

Eliminating NullPointerExceptions

Simplifying concurrency with Futures and Rx Observables

Tackling big data problems with functional programming

Page 6: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Functional programming is a programming paradigm

Functions are the building blocks of the application

Best done in a functional programming language

Page 7: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Functions as first class citizens

Assign functions to variables

Store functions in fields

Use and write higher-order functions:

Take functions as parameters

Return functions as values
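The points above can be sketched in Java 8. This is only an illustration: the class name HigherOrderDemo and the helpers applyTwice and adder are hypothetical names invented for this example, not part of the talk.

```java
import java.util.function.Function;

// Hypothetical demo class; all names here are illustrative assumptions.
final class HigherOrderDemo {

    // A function assigned to a variable (here, stored in a static field).
    static final Function<Integer, Integer> SQUARE = x -> x * x;

    // A higher-order function that takes a function as a parameter
    // and applies it twice to the given value.
    static <T> T applyTwice(Function<T, T> f, T value) {
        return f.apply(f.apply(value));
    }

    // A higher-order function that returns a function as its value.
    static Function<Integer, Integer> adder(int n) {
        return x -> x + n;
    }
}
```

For instance, HigherOrderDemo.applyTwice(HigherOrderDemo.SQUARE, 3) squares 3 and then squares the result, and SQUARE.andThen(adder(1)) composes the two functions into a new one.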

Page 8: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Avoids mutable state

Use:

Immutable data structures

Single assignment variables

Some functional languages such as Haskell don’t allow side-effects

Page 9: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Why functional programming?

"the highest goal of programming-language design to enable good ideas to be elegantly expressed"

http://en.wikipedia.org/wiki/Tony_Hoare

Page 10: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Why functional programming?

More expressive

More concise

More intuitive - solution matches problem definition

Functional code is usually much more composable

Immutable state:

Less error-prone

Easy parallelization and concurrency

But be pragmatic

Page 11: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

An ancient idea that has recently become popular

Page 12: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Mathematical foundation:

λ-calculus

Introduced by Alonzo Church in the 1930s

Page 13: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Lisp = an early functional language invented in 1958

http://en.wikipedia.org/wiki/Lisp_(programming_language)

[Timeline: 1940 to 2010]

Garbage collection, dynamic typing, self-hosting compiler, tree data structures

(defun factorial (n) (if (<= n 1) 1 (* n (factorial (- n 1)))))

Page 14: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

My final year project in 1985: Implementing SASL in LISP

sieve (p:xs) = p : sieve [x | x <- xs, rem x p > 0];

primes = sieve [2..]

A list of integers starting with 2

Filter out multiples of p

Page 15: Map, flatmap and reduce are your new best friends (javaone, svcc)

Mostly an Ivory Tower technology

Lisp was used for AI

FP languages: Miranda, ML, Haskell, ...

“Side-effects kills kittens and puppies”

Page 16: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

http://steve-yegge.blogspot.com/2010/12/haskell-researchers-announce-discovery.html


Page 17: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

But today FP is mainstream

Clojure - a dialect of Lisp

Scala - a hybrid OO/functional language

F# - a hybrid OO/FP language for .NET

Java 8 has lambda expressions

Page 18: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Java 8 lambda expressions are functions

x -> x * x

x -> {
  // Use <= so that perfect squares such as 4 and 9 are not reported as prime
  for (int i = 2; i <= Math.sqrt(x); i = i + 1) {
    if (x % i == 0) return false;
  }
  return true;
};

(x, y) -> x * x + y * y
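A lambda expression only compiles once it has a target type. As a rough sketch, the three lambdas on this slide could be bound to the standard functional interfaces in java.util.function as follows. The class and constant names are assumptions made for this example, and the divisor check uses `<=` in the loop bound so that perfect squares such as 4 and 9 are rejected.

```java
import java.util.function.IntBinaryOperator;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

// Hypothetical holder class; the names are illustrative assumptions.
final class LambdaTargets {

    // int -> int
    static final IntUnaryOperator SQUARE = x -> x * x;

    // int -> boolean: true when no i in [2, sqrt(x)] divides x.
    // Note that it also returns true for x < 2; a real primality
    // test would need to reject those values separately.
    static final IntPredicate HAS_NO_DIVISOR = x -> {
        for (int i = 2; i <= Math.sqrt(x); i = i + 1) {
            if (x % i == 0) return false;
        }
        return true;
    };

    // (int, int) -> int
    static final IntBinaryOperator SUM_OF_SQUARES = (x, y) -> x * x + y * y;
}
```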

Page 19: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Agenda

Why functional programming?

Simplifying collection processing

Eliminating NullPointerExceptions

Simplifying concurrency with Futures and Rx Observables

Tackling big data problems with functional programming

Page 20: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Lots of application code = collection processing:

Mapping, filtering, and reducing

Page 21: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Social network example

public class Person {

  enum Gender { MALE, FEMALE }

  private Name name;
  private LocalDate birthday;
  private Gender gender;
  private Hometown hometown;
  private Set<Friend> friends = new HashSet<Friend>();
  ....
}

public class Friend {
  private Person friend;
  private LocalDate becameFriends;
  ...
}

public class SocialNetwork {
  private Set<Person> people;
  ...
}

Page 22: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Mapping, filtering, and reducing

public class Person {

  public Set<Hometown> hometownsOfFriends() {
    Set<Hometown> result = new HashSet<>();            // Declare result variable
    for (Friend friend : friends) {                    // Iterate
      result.add(friend.getPerson().getHometown());    // Modify result
    }
    return result;                                     // Return result
  }
}

Page 23: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Mapping, filtering, and reducing

public class SocialNetwork {

  private Set<Person> people;

  ...

  public Set<Person> lonelyPeople() {
    Set<Person> result = new HashSet<Person>();    // Declare result variable
    for (Person p : people) {                      // Iterate
      if (p.getFriends().isEmpty())
        result.add(p);                             // Modify result
    }
    return result;                                 // Return result
  }
}

Page 24: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Mapping, filtering, and reducing

public class SocialNetwork {

  private Set<Person> people;

  ...

  public int averageNumberOfFriends() {
    int sum = 0;                           // Declare scalar result variable
    for (Person p : people) {              // Iterate
      sum += p.getFriends().size();        // Modify result
    }
    return sum / people.size();            // Return result
  }
}

Page 25: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Problems with this style of programming

Lots of verbose boilerplate - basic operations require 5+ LOC

Imperative (how to do it) NOT declarative (what to do)

Mutable variables are potentially error prone

Difficult to parallelize

Page 26: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Java 8 streams to the rescue

A sequence of elements

“Wrapper” around a collection

Streams are lazy, i.e. can be infinite

Provides a functional/lambda-based API for transforming, filtering and aggregating elements

Much simpler, cleaner and declarative code

Page 27: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Using Java 8 streams - mapping

class Person ..

  private Set<Friend> friends = ...;

  public Set<Hometown> hometownsOfFriends() {
    return friends.stream()
      .map(f -> f.getPerson().getHometown())    // transforming lambda expression
      .collect(Collectors.toSet());
  }

Page 28: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

The map() function

s1 a b c d e ...

s2 f(a) f(b) f(c) f(d) f(e) ...

s2 = s1.map(f)
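The diagram above can be tried on plain values. In this minimal sketch, f is String::length and s1 is a list of words; the class name MapDemo and the method lengths are assumptions made for the example.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class illustrating s2 = s1.map(f).
final class MapDemo {

    // Applies f = String::length to every element, preserving order and count.
    static List<Integer> lengths(List<String> words) {
        return words.stream()
                .map(String::length)
                .collect(Collectors.toList());
    }
}
```

The result always has exactly one output element per input element, which is what distinguishes map() from filter() and flatMap().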

Page 29: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Using Java 8 streams - filtering

public class SocialNetwork {

  private Set<Person> people;

  ...

  public Set<Person> lonelyPeople() {
    return people.stream()
      .filter(p -> p.getFriends().isEmpty())    // predicate lambda expression
      .collect(Collectors.toSet());
  }
}

Page 30: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Using Java 8 streams - friend of friends V1

class Person ..

  public Set<Person> friendOfFriends() {
    Set<Set<Friend>> fof = friends.stream()
      .map(friend -> friend.getPerson().friends)
      .collect(Collectors.toSet());
    ...
  }

Using map() => Set of Sets :-(

Somehow we need to flatten

Page 31: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Using Java 8 streams - mapping

class Person ..

  public Set<Person> friendOfFriends() {
    return friends.stream()
      .flatMap(friend -> friend.getPerson().friends.stream())    // maps and flattens
      .map(Friend::getPerson)
      .filter(person -> person != this)
      .collect(Collectors.toSet());
  }

Page 32: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Chaining with flatMap()

s1 a b ...

s2 f(a)0 f(a)1 f(b)0 f(b)1 f(b)2 ...

s2 = s1.flatMap(f)
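A small sketch of the same picture on plain values: each input element is mapped to a stream of zero or more outputs, and the per-element streams are concatenated into one. FlatMapDemo and words are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class illustrating s2 = s1.flatMap(f).
final class FlatMapDemo {

    // f turns one line into a stream of its words; flatMap() flattens
    // the resulting stream of streams into a single stream of words.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }
}
```

With map() instead of flatMap() the same lambda would yield a stream of streams, the nested-collection problem that the friend-of-friends example runs into.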

Page 33: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Using Java 8 streams - reducing

public class SocialNetwork {

  private Set<Person> people;

  ...

  public long averageNumberOfFriends() {
    return people.stream()
      .map(p -> p.getFriends().size())
      .reduce(0, (x, y) -> x + y)
      / people.size();
  }

The reduce() call is equivalent to:

  int x = 0;
  for (int y : inputStream)
    x = x + y;
  return x;

Page 34: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

The reduce() function

s1 a b c d e ...

x = s1.reduce(initial, f)

f(f(f(f(f(f(initial, a), b), c), d), e), ...)
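To make the nesting concrete, here is a small sketch that computes the same sum twice: once with Stream.reduce() and once with the explicit loop that it replaces. The class and method names are assumptions made for this example. For a sequential stream the two agree; Java additionally requires the accumulator function to be associative so that the result stays the same on a parallel stream.

```java
import java.util.List;

// Hypothetical demo class illustrating x = s1.reduce(initial, f).
final class ReduceDemo {

    // initial = 0, f = (x, y) -> x + y
    static int sum(List<Integer> xs) {
        return xs.stream().reduce(0, (x, y) -> x + y);
    }

    // The imperative loop that the reduce() call stands for.
    static int sumWithLoop(List<Integer> xs) {
        int acc = 0;
        for (int y : xs) {
            acc = acc + y;
        }
        return acc;
    }
}
```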

Page 35: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Newton's method for calculating sqrt(x)

It’s an iterative algorithm

initial value = guess

betterValue = value - (value * value - x) / (2 * value)

Iterate until |value - betterValue| < precision

Page 36: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Functional square root in Scala

package net.chrisrichardson.fp.scala.squareroot

object SquareRootCalculator {

  def squareRoot(x: Double, precision: Double) : Double =
    // Creates an infinite stream: seed, f(seed), f(f(seed)), .....
    Stream.iterate(x / 2)( value => value - (value * value - x) / (2 * value) ).
      // a, b, c, ... => (a, b), (b, c), (c, ...), ...
      sliding(2).map( s => (s.head, s.last)).
      // Find the first convergent approximation
      find { case (value , newValue) => Math.abs(value - newValue) < precision}.
      get._2
}
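For comparison, a rough Java 8 counterpart of the same idea. Java 8 streams have no sliding(), so this sketch iterates over (value, betterValue) pairs directly. The class name NewtonSquareRoot and the helper improve are assumptions made for the example, and the code assumes x > 0 and precision > 0 (with x = 0 the seed is 0 and the iteration never converges).

```java
import java.util.stream.Stream;

// Hypothetical Java 8 sketch of Newton's method with an infinite stream.
final class NewtonSquareRoot {

    // One Newton step: betterValue = value - (value * value - x) / (2 * value)
    static double improve(double value, double x) {
        return value - (value * value - x) / (2 * value);
    }

    // Assumes x > 0 and precision > 0.
    static double squareRoot(double x, double precision) {
        double seed = x / 2;
        // Infinite stream of {value, betterValue} pairs; findFirst() stops
        // evaluating as soon as two successive values are close enough.
        return Stream.iterate(
                        new double[] {seed, improve(seed, x)},
                        pair -> new double[] {pair[1], improve(pair[1], x)})
                .filter(pair -> Math.abs(pair[0] - pair[1]) < precision)
                .findFirst()
                .get()[1];
    }
}
```

Because the stream is lazy, only as many Newton steps are computed as are needed to reach the requested precision.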

Page 37: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Adopting FP with Java 8 is straightforward

Switch your application to Java 8

Start using streams and lambdas

Eclipse can refactor anonymous inner classes to lambdas

Or write modules in Scala: more expressive and runs on older JVMs

Page 38: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Agenda

Why functional programming?

Simplifying collection processing

Eliminating NullPointerExceptions

Simplifying concurrency with Futures and Rx Observables

Tackling big data problems with functional programming

Page 39: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Tony’s $1B mistake

“I call it my billion-dollar mistake. It was the invention of the null reference in 1965....But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement...”

http://qconlondon.com/london-2009/presentation/Null+References:+The+Billion+Dollar+Mistake

Page 40: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Coding with null pointers

class Person

  public Friend longestFriendship() {
    Friend result = null;                    // Return null if no friends
    for (Friend friend : friends) {
      if (result == null || friend.getBecameFriends().isBefore(result.getBecameFriends()))
        result = friend;
    }
    return result;
  }

Friend oldestFriend = person.longestFriendship();
if (oldestFriend != null) {                  // Null check is essential yet easily forgotten
  ...
} else {
  ...
}

Page 41: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Java 8 Optional<T>

A wrapper for nullable references

It has two states:

empty ⇒ throws an exception if you try to get the reference

non-empty ⇒ contains a non-null reference

Provides methods for: testing whether it has a value, getting the value, ...

Use an Optional<T> parameter if caller can pass in null

Return reference wrapped in an instance of this type instead of null

Uses the type system to explicitly represent nullability

Page 42: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Coding with optionals

class Person

  public Optional<Friend> longestFriendship() {
    Friend result = null;
    for (Friend friend : friends) {
      if (result == null || friend.getBecameFriends().isBefore(result.getBecameFriends()))
        result = friend;
    }
    return Optional.ofNullable(result);
  }

Optional<Friend> oldestFriend = person.longestFriendship();
// Might throw java.util.NoSuchElementException: No value present
// Friend dangerous = oldestFriend.get();
if (oldestFriend.isPresent()) {
  ...oldestFriend.get()
} else {
  ...
}

Page 43: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Using Optionals - better

Optional<Friend> oldestFriendship = ...;

Avoid calling isPresent() and get()

Friend whoToCall1 = oldestFriendship.orElse(mother);

Friend whoToCall2 = oldestFriendship.orElseGet(() -> lazilyFindSomeoneElse());

Friend whoToCall3 = oldestFriendship.orElseThrow(() -> new LonelyPersonException());

Page 44: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Transforming with map()

public class Person {

  public Optional<Friend> longestFriendship() { return ...; }

  public Optional<Long> ageDifferenceWithOldestFriend() {
    Optional<Friend> oldestFriend = longestFriendship();
    return oldestFriend.map(of -> Math.abs(of.getPerson().getAge() - getAge()));
  }

Eliminates messy conditional logic

Page 45: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Chaining with flatMap()

class Person

  public Optional<Friend> longestFriendship() {...}

  public Optional<Friend> longestFriendshipOfLongestFriend() {
    return longestFriendship()
      .flatMap(friend -> friend.getPerson().longestFriendship());
  }

not always a symmetric relationship. :-)
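The Optional operations from the last few slides can be exercised with a self-contained sketch. It replaces the Person and Friend classes with a simple name-to-name lookup table, so the class OptionalDemo, the BEST_FRIEND table and all method names are assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical demo class; the lookup table stands in for the talk's domain model.
final class OptionalDemo {

    private static final Map<String, String> BEST_FRIEND = new HashMap<>();
    static {
        BEST_FRIEND.put("alice", "bob");
        BEST_FRIEND.put("bob", "carol");
        // carol has no entry, i.e. no best friend
    }

    // Wraps a possibly-null lookup result instead of returning null.
    static Optional<String> bestFriendOf(String name) {
        return Optional.ofNullable(BEST_FRIEND.get(name));
    }

    // flatMap() chains a second Optional-returning call without nesting.
    static Optional<String> bestFriendOfBestFriend(String name) {
        return bestFriendOf(name).flatMap(OptionalDemo::bestFriendOf);
    }

    // map() transforms the value if present; orElse() supplies the fallback.
    static String whoToCall(String name) {
        return bestFriendOf(name).map(String::toUpperCase).orElse("MOTHER");
    }
}
```

None of these methods calls isPresent() or get(); an empty Optional simply flows through map() and flatMap() unchanged until orElse() resolves it.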

Page 46: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Agenda

Why functional programming?

Simplifying collection processing

Eliminating NullPointerExceptions

Simplifying concurrency with Futures and Rx Observables

Tackling big data problems with functional programming

Page 47: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Let’s imagine you are performing a CPU intensive operation

class Person ..

  public Set<Hometown> hometownsOfFriends() {
    return friends.stream()
      .map(f -> cpuIntensiveOperation(f))
      .collect(Collectors.toSet());
  }

Page 48: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Parallel streams = simple concurrency

class Person ..

  public Set<Hometown> hometownsOfFriends() {
    return friends.parallelStream()
      .map(f -> cpuIntensiveOperation(f))
      .collect(Collectors.toSet());
  }

Potentially uses N cores ⇒ up to Nx speed up

Perhaps this will be faster. Perhaps not
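A minimal sketch of what switching to a parallel stream preserves: the result is the same, only the execution strategy changes. Whether it actually runs faster depends on the workload and the machine, so it needs to be measured. ParallelDemo and sumOfSquares are hypothetical names for this example.

```java
import java.util.stream.LongStream;

// Hypothetical demo class comparing sequential and parallel evaluation.
final class ParallelDemo {

    // Sums i * i for i in [1, n], optionally on a parallel stream.
    static long sumOfSquares(long n, boolean parallel) {
        LongStream range = LongStream.rangeClosed(1, n);
        if (parallel) {
            range = range.parallel();
        }
        // The lambda is stateless and the sum is associative, so the
        // parallel and sequential pipelines produce the same answer.
        return range.map(i -> i * i).sum();
    }
}
```

For cheap per-element work like this, the cost of splitting and merging can outweigh the gain, which is one reason the parallel version may turn out slower.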

Page 49: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Let’s imagine that you are writing code to display the

products in a user’s wish list

Page 50: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

The need for concurrency

Step #1

Web service request to get the user profile including wish list (list of product Ids)

Step #2

For each productId: web service request to get product info

Sequentially ⇒ terrible response time

Need to fetch productInfo concurrently

Composing sequential + scatter/gather-style operations is very common

Page 51: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Futures are a great concurrency abstraction

http://en.wikipedia.org/wiki/Futures_and_promises

Page 52: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

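Composition with futures

[Diagram: a client on the main thread initiates asynchronous operation 1 and asynchronous operation 2, which run on a worker thread or as event-driven code. Each operation sets its outcome on a future (Future 1, Future 2), and the outcome is read from the future with get.]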

Page 53: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Benefits

Simple way for multiple concurrent activities to communicate safely

Abstraction:

Client does not know how the asynchronous operation is implemented, e.g. thread pool, event-driven, ....

Easy to implement scatter/gather:

Scatter: Client can invoke multiple asynchronous operations and gets a Future for each one.

Gather: Get values from the futures

Page 54: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

But composition with basic futures is difficult

Java 7 future.get([timeout]):

Blocking API ⇒ client blocks thread ⇒ poor scalability

Difficult to compose multiple concurrent operations

Futures with callbacks:

e.g. Guava ListenableFutures, Spring 4 ListenableFuture

Attach callbacks to all futures and asynchronously consume outcomes

But callback-based code = messy code

See http://techblog.netflix.com/2013/02/rxjava-netflix-api.html

We need functional futures!

Page 55: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Functional futures - Scala, Java 8 CompletableFuture

def asyncPlus(x : Int, y :Int): Future[Int] = ... x + y ...

val future2 = asyncPlus(4, 5).map{ _ * 3 }

assertEquals(27, Await.result(future2, 1 second))

Asynchronously transforms future

def asyncSquare(x : Int) : Future[Int] = ... x * x ...

val f2 = asyncPlus(5, 8).flatMap { x => asyncSquare(x) }

assertEquals(169, Await.result(f2, 1 second))

Calls asyncSquare() with the eventual outcome of asyncPlus(), i.e. chaining
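The slide title also mentions Java 8's CompletableFuture. As a rough sketch, the same two examples could look like this in Java, where thenApply() plays the role of map() and thenCompose() the role of flatMap(). The class name and the use of supplyAsync() on the common pool are assumptions made for this example.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical Java 8 counterpart of the Scala examples above.
final class FunctionalFutures {

    static CompletableFuture<Integer> asyncPlus(int x, int y) {
        return CompletableFuture.supplyAsync(() -> x + y);
    }

    static CompletableFuture<Integer> asyncSquare(int x) {
        return CompletableFuture.supplyAsync(() -> x * x);
    }

    // map(): asynchronously transforms the eventual outcome.
    static CompletableFuture<Integer> plusThenTriple(int x, int y) {
        return asyncPlus(x, y).thenApply(sum -> sum * 3);
    }

    // flatMap(): chains a second asynchronous operation onto the first.
    static CompletableFuture<Integer> plusThenSquare(int x, int y) {
        return asyncPlus(x, y).thenCompose(FunctionalFutures::asyncSquare);
    }
}
```

Calling join() on the returned futures blocks until the result is available, which is convenient in a test but defeats the purpose in production code.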

Page 56: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

map() etc are asynchronous

outcome2

f2

f2 = f1 map (someFn)

Outcome1

f1

Implemented using callbacks

outcome2 = someFn(outcome1)

Page 57: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Scala wish list service

class WishListService(...) {

  def getWishList(userId : Long) : Future[WishList] = {
    userService.getUserProfile(userId).                       // Future[UserProfile]
      map { userProfile => userProfile.wishListProductIds}.   // Future[List[Long]]
      flatMap { productIds =>
        val listOfProductFutures =                            // List[Future[ProductInfo]]
          productIds map productInfoService.getProductInfo
        Future.sequence(listOfProductFutures)                 // Future[List[ProductInfo]]
      }.
      map { products => WishList(products) }                  // Future[WishList]
  }
}

Page 58: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Using Java 8 CompletableFutures

public CompletableFuture<Wishlist> getWishlistDetails(long userId) {
  // thenComposeAsync() plays the role of flatMap()
  return userService.getUserProfile(userId).thenComposeAsync(userProfile -> {

    Stream<CompletableFuture<ProductInfo>> s1 =
      userProfile.getWishListProductIds()
        .stream()
        .map(productInfoService::getProductInfo);

    Stream<CompletableFuture<List<ProductInfo>>> s2 =
      s1.map(fOfPi -> fOfPi.thenApplyAsync(pi -> Arrays.asList(pi)));

    CompletableFuture<List<ProductInfo>> productInfos =
      s2.reduce((f1, f2) -> f1.thenCombine(f2, ListUtils::union))
        .orElse(CompletableFuture.completedFuture(Collections.emptyList()));

    // thenApply() plays the role of map(); pass the product list to the wish list
    return productInfos.thenApply(list -> new Wishlist(list));
  });
}

Java 8 is missing Future.sequence()
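One way to fill the gap is a small helper built on CompletableFuture.allOf(). This is a sketch of a common idiom rather than anything from the talk; the class name FutureSequence and the method sequence are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical helper: turns a List of futures into a future of a List.
final class FutureSequence {

    static <T> CompletableFuture<List<T>> sequence(List<CompletableFuture<T>> futures) {
        // allOf() completes when every input future has completed.
        CompletableFuture<Void> allDone =
                CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
        // At that point join() on each future returns immediately,
        // so collecting the values does not block.
        return allDone.thenApply(ignored ->
                futures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }
}
```

The resulting list keeps the order of the input list. If any input future fails, the returned future completes exceptionally as well.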

Page 59: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Your mouse is your database

Erik Meijer

http://queue.acm.org/detail.cfm?id=2169076

Page 60: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Introducing Reactive Extensions (Rx)

The Reactive Extensions (Rx) is a library for composing asynchronous and event-based programs .... Using Rx, developers represent asynchronous data streams with Observables, query asynchronous data streams using LINQ operators, and .....

https://rx.codeplex.com/

Page 61: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

About RxJava

Reactive Extensions (Rx) for the JVM

Developed by Netflix

Original motivation was to provide rich, functional Futures

Implemented in Java

Adaptors for Scala, Groovy and Clojure

Embraced by Akka and Spring Reactor: http://www.reactive-streams.org/

https://github.com/Netflix/RxJava

Page 62: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

RxJava core concepts

// An asynchronous stream of items
trait Observable[T] {
  def subscribe(observer : Observer[T]) : Subscription    // Subscription is used to unsubscribe
  ...
}

// The Observable notifies its Observers
trait Observer[T] {
  def onNext(value : T)
  def onCompleted()
  def onError(e : Throwable)
}

Page 63: Map, flatmap and reduce are your new best friends (javaone, svcc)

Comparing Observable to...

Observer pattern - similar but adds

Observer.onCompleted()

Observer.onError()

Iterator pattern - mirror image

Push rather than pull

Futures - similar

Can be used as Futures

But Observables = a stream of multiple values

Collections and Streams - similar

Functional API supporting map(), flatMap(), ...

But Observables are asynchronous

Page 64: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Fun with observables

val every10Seconds = Observable.interval(10 seconds)

val oneItem = Observable.items(-1L)

val ticker = oneItem ++ every10Seconds

// ticker emits:  -1     0      1     ...
//          at:   t=0    t=10   t=20  ...

val subscription = ticker.subscribe { (value: Long) => println("value=" + value) }
...
subscription.unsubscribe()

Page 65: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Observables as the result of an asynchronous operation

def getTableStatus(tableName: String) : Observable[DynamoDbStatus] =
  Observable { subscriber: Subscriber[DynamoDbStatus] =>

    amazonDynamoDBAsyncClient.describeTableAsync(
      new DescribeTableRequest(tableName),
      new AsyncHandler[DescribeTableRequest, DescribeTableResult] {

        override def onSuccess(request: DescribeTableRequest, result: DescribeTableResult) = {
          subscriber.onNext(DynamoDbStatus(result.getTable.getTableStatus))
          subscriber.onCompleted()
        }

        override def onError(exception: Exception) = exception match {
          case t: ResourceNotFoundException =>
            subscriber.onNext(DynamoDbStatus("NOT_FOUND"))
            subscriber.onCompleted()
          case _ =>
            subscriber.onError(exception)
        }
      })
  }

Page 66: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Transforming/chaining observables with flatMap()

val tableStatus = ticker.flatMap { i =>
  logger.info("{}th describe table", i + 1)
  getTableStatus(name)
}

// tableStatus emits:  Status1  Status2  Status3  ...
//               at:   t=0      t=10     t=20     ...

+ Usual collection methods: map(), filter(), take(), drop(), ...

Page 67: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Calculating rolling average

class AverageTradePriceCalculator {

  def calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = {
    ...
  }
}

case class Trade(
  symbol : String,
  price : Double,
  quantity : Int
  ...)

case class AveragePrice(symbol : String, price : Double, ...)

Page 68: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Calculating average prices

def calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = {

  trades.groupBy(_.symbol).map { symbolAndTrades =>
    val (symbol, tradesForSymbol) = symbolAndTrades

    val openingEverySecond =
      Observable.items(-1L) ++ Observable.interval(1 seconds)

    def closingAfterSixSeconds(opening: Any) =
      Observable.interval(6 seconds).take(1)

    tradesForSymbol.window(...).map { windowOfTradesForSymbol =>
      windowOfTradesForSymbol.fold((0.0, 0, List[Double]())) { (soFar, trade) =>
        val (sum, count, prices) = soFar
        (sum + trade.price, count + trade.quantity, trade.price +: prices)
      } map { x =>
        val (sum, length, prices) = x
        AveragePrice(symbol, sum / length, prices)
      }
    }.flatten
  }.flatten
}

Page 69: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Agenda

Why functional programming?

Simplifying collection processing

Eliminating NullPointerExceptions

Simplifying concurrency with Futures and Rx Observables

Tackling big data problems with functional programming

Page 70: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Let’s imagine that you want to count word frequencies

Page 71: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Scala Word Count

val frequency : Map[String, Int] =
  Source.fromFile("gettysburgaddress.txt").getLines()
    .flatMap { _.split(" ") }.toList    // Map
    .groupBy(identity)
    .mapValues(_.length)                // Reduce

frequency("THE") should be(11)
frequency("LIBERTY") should be(1)
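The same word count can be sketched with Java 8 streams. It takes the lines as a list rather than reading a file so that it stays self-contained; WordCount and frequency are names chosen for this example, and skipping empty tokens is an addition that the Scala version does not make.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical Java 8 counterpart of the Scala word count.
final class WordCount {

    static Map<String, Long> frequency(List<String> lines) {
        return lines.stream()
                // "Map" step: split every line into words and flatten
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // Drop empty tokens produced by repeated spaces
                .filter(word -> !word.isEmpty())
                // "Reduce" step: group identical words and count each group
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```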

Page 72: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

But how to scale to a cluster of machines?

Page 73: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Apache Hadoop

Open-source ecosystem for reliable, scalable, distributed computing

Hadoop Distributed File System (HDFS)

Efficiently stores very large amounts of data

Files are partitioned and replicated across multiple machines

Hadoop MapReduce

Batch processing system

Provides plumbing for writing distributed jobs

Handles failures

And, much, much more...

Page 74: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Overview of MapReduce

[Diagram: input data is fed to several Mappers, each of which emits (K,V) pairs. The shuffle groups the pairs by key, so that each Reducer receives (K1,V, ....), (K2,V, ....) or (K3,V, ....) and emits (K,V) pairs as the output data.]

Page 75: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

MapReduce Word count - mapper

class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  // context.write() can throw these checked exceptions
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
      word.set(tokenizer.nextToken());
      context.write(word, one);
    }
  }
}

Four score and seven years ⇒ (“Four”, 1), (“score”, 1), (“and”, 1), (“seven”, 1), ...

http://wiki.apache.org/hadoop/WordCount

Page 76: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Hadoop then shuffles the key-value pairs...

Page 77: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

MapReduce Word count - reducer

class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

  // context.write() can throw these checked exceptions
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    context.write(key, new IntWritable(sum));
  }
}

(“the”, (1, 1, 1, 1, 1, 1, ...)) ⇒ (“the”, 11)

http://wiki.apache.org/hadoop/WordCount
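To see how the three phases fit together without a cluster, the following sketch simulates map, shuffle and reduce in memory using only the Java standard library. It does not use any Hadoop API; the class MiniMapReduce and all of its methods are hypothetical and exist only to illustrate the data flow.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the MapReduce word count.
final class MiniMapReduce {

    // Map phase: one line in, a list of (word, 1) pairs out.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in sequence over the given lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> result.put(word, reduce(ones)));
        return result;
    }
}
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network, but the shape of the computation is the same.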

Page 78: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

About MapReduce

Very simple programming abstraction yet incredibly powerful

By chaining together multiple map/reduce jobs you can process very large amounts of data in interesting ways

e.g. Apache Mahout for machine learning

But

Mappers and Reducers = verbose code

Development is challenging, e.g. unit testing is difficult

It’s disk-based, batch processing ⇒ slow

Page 79: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Scalding: Scala DSL for MapReduce

class WordCountJob(args : Args) extends Job(args) {
  TextLine( args("input") )
    .flatMap('line -> 'word) { line : String => tokenize(line) }
    .groupBy('word) { _.size }
    .write( Tsv( args("output") ) )

  def tokenize(text : String) : Array[String] = {
    text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "")
      .split("\\s+")
  }
}

https://github.com/twitter/scalding

Expressive and unit testable

Each row is a map of named fields

Page 80: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Apache Spark

Created at UC Berkeley and now part of the Hadoop ecosystem

Key abstraction = Resilient Distributed Datasets (RDD)

Collection that is partitioned across cluster members

Operations are parallelized

Created from either a collection or a Hadoop supported datasource - HDFS, S3 etc

Can be cached in-memory for super-fast performance

Can be replicated for fault-tolerance

Scala, Java, and Python APIs

http://spark.apache.org

Page 81: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Spark Word Count

val sc = new SparkContext(...)

sc.textFile("s3n://mybucket/...")
  .flatMap { _.split(" ")}
  .groupBy(identity)
  .mapValues(_.length)
  .toArray.toMap

Expressive, unit testable and very fast

Very similar to Scala collection code!!

Page 82: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Summary

Functional programming enables the elegant expression of good ideas in a wide variety of domains

map(), flatMap() and reduce() are remarkably versatile higher-order functions

Use FP and OOP together

Java 8 has taken a good first step towards supporting FP

Go write some functional code!

Page 83: Map, flatmap and reduce are your new best friends (javaone, svcc)

@crichardson

Questions?

@crichardson [email protected]

http://plainoldobjects.com