monoids monoids everywhere
TRANSCRIPT
![Page 1: Monoids monoids everywhere](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f030961a28ab0b168b4607/html5/thumbnails/1.jpg)
Tetra Data Blitz
10/1/2015
![Page 2: Monoids monoids everywhere](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f030961a28ab0b168b4607/html5/thumbnails/2.jpg)
Monoids Monoids Everywhere
in ~5 minutes
Kevin Faro
![Page 3: Monoids monoids everywhere](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f030961a28ab0b168b4607/html5/thumbnails/3.jpg)
http://s2.quickmeme.com/img/44/44b0bd758f8ee5c81362923f0d5c8e017c9ddf623925e60c29a4c015b89fbb45.jpg
![Page 4: Monoids monoids everywhere](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f030961a28ab0b168b4607/html5/thumbnails/4.jpg)
Oh, that wasn’t clear enough?
An operation is considered a monoid if:
1. it is associative
   a. (a●b)●c=a●(b●c)
2. it has an identity element
   a. e●a=a●e=a
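The two laws above can be spot-checked in a few lines of plain Scala. This is only a sketch: `Monoid`, `MonoidLaws`, `isAssociative` and `hasIdentity` are made-up names for illustration, not Algebird's API, and checking a handful of sample values is not a proof.

```scala
// Hypothetical, library-free monoid interface (not Algebird's API).
trait Monoid[A] {
  def empty: A                 // the identity element e
  def combine(x: A, y: A): A   // the operation ●
}

object MonoidLaws {
  // Checks (a ● b) ● c == a ● (b ● c) for one triple of sample values.
  def isAssociative[A](m: Monoid[A])(a: A, b: A, c: A): Boolean =
    m.combine(m.combine(a, b), c) == m.combine(a, m.combine(b, c))

  // Checks e ● a == a ● e == a for one sample value.
  def hasIdentity[A](m: Monoid[A])(a: A): Boolean =
    m.combine(m.empty, a) == a && m.combine(a, m.empty) == a

  // String concatenation: associative, with "" as the identity.
  val stringConcat: Monoid[String] = new Monoid[String] {
    def empty: String = ""
    def combine(x: String, y: String): String = x + y
  }

  // Subtraction fits the interface but breaks the associativity law,
  // so it is not a monoid: (5 - 3) - 1 differs from 5 - (3 - 1).
  val intSubtract: Monoid[Int] = new Monoid[Int] {
    def empty: Int = 0
    def combine(x: Int, y: Int): Int = x - y
  }
}
```

The subtraction instance is there as a counterexample: the interface alone guarantees nothing, the laws do.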
![Page 5: Monoids monoids everywhere](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f030961a28ab0b168b4607/html5/thumbnails/5.jpg)
Examples
● Addition
  ○ associative: (1+2)+3=1+(2+3)=6
  ○ identity: 0+1=1+0=1
● Multiplication
  ○ associative: (1*2)*3=1*(2*3)=6
  ○ identity: 1*2=2*1=2
● Min
  ○ you get the idea ...
● Max
● Set Union
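As a rough sketch, the five examples above can be written against one small interface. The names `Monoid`, `Instances`, `instance` and `combineAll` are assumptions made for this illustration and are not taken from Algebird. For Min and Max over `Int`, the identities are assumed to be `Int.MaxValue` and `Int.MinValue`.

```scala
// Hypothetical, library-free monoid interface (not Algebird's API).
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

object Instances {
  // Small helper to build a monoid from an identity and an operation.
  def instance[A](e: A)(op: (A, A) => A): Monoid[A] = new Monoid[A] {
    def empty: A = e
    def combine(x: A, y: A): A = op(x, y)
  }

  val addition: Monoid[Int]       = instance(0)(_ + _)
  val multiplication: Monoid[Int] = instance(1)(_ * _)
  // For Int, the largest value is the identity for min, the smallest for max.
  val min: Monoid[Int]            = instance(Int.MaxValue)((x, y) => math.min(x, y))
  val max: Monoid[Int]            = instance(Int.MinValue)((x, y) => math.max(x, y))
  // The empty set is the identity for union.
  def setUnion[T]: Monoid[Set[T]] = instance(Set.empty[T])((a, b) => a.union(b))

  // Folds a whole sequence with the monoid; an empty input yields the identity.
  def combineAll[A](xs: Seq[A])(m: Monoid[A]): A =
    xs.foldLeft(m.empty)(m.combine)
}
```

Because every instance carries its own identity, `combineAll` needs no special case for empty input.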
![Page 6: Monoids monoids everywhere](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f030961a28ab0b168b4607/html5/thumbnails/6.jpg)
Let’s take a look at algebird
http://www.michael-noll.com/blog/2013/12/02/twitter-algebird-monoid-monad-for-large-scala-data-analytics/
![Page 7: Monoids monoids everywhere](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f030961a28ab0b168b4607/html5/thumbnails/7.jpg)
https://izbicki.me/img/uploads/2013/05/fry-300x225.jpg
![Page 8: Monoids monoids everywhere](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f030961a28ab0b168b4607/html5/thumbnails/8.jpg)
Why is this so awesome?!?!
● Divide and Conquer
● Parallelization
● Incrementalism
Sound Familiar?
● map/REDUCE
  ○ perfect for the reduce phase
  ○ see Scalding: expenses.groupBy('shoppingLocation) { _.sum[Double]('cost -> 'totalCost) }
● Streaming
  ○ perfect for maintaining running calculations on streams of data (Storm, …)
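The three benefits follow from associativity and the identity element: data can be split into chunks, each chunk reduced on its own, and the partial results merged in order. A minimal single-threaded sketch in plain Scala, with hypothetical names (`SplitAndMerge`, `reduceInChunks`, `runningTotals`) and no Scalding or Storm involved:

```scala
object SplitAndMerge {
  // Baseline: one sequential left fold over all the data.
  def reduceAll[A](xs: Seq[A])(zero: A)(op: (A, A) => A): A =
    xs.foldLeft(zero)(op)

  // Divide and conquer: reduce each chunk independently (separate workers
  // could do this part), then merge the partial results in order.
  // Gives the same answer as reduceAll only because op is associative
  // and zero is its identity.
  def reduceInChunks[A](xs: Seq[A], chunkSize: Int)(zero: A)(op: (A, A) => A): A = {
    require(chunkSize > 0, "chunkSize must be positive")
    val partials = xs.grouped(chunkSize).map(chunk => chunk.foldLeft(zero)(op)).toList
    partials.foldLeft(zero)(op)
  }

  // Incrementalism: each new element is folded into the previous total,
  // so earlier data never has to be revisited.
  def runningTotals[A](stream: Seq[A])(zero: A)(op: (A, A) => A): Seq[A] =
    stream.scanLeft(zero)(op).tail
}
```

For example, `SplitAndMerge.reduceInChunks(1 to 10, 3)(0)(_ + _)` and `SplitAndMerge.reduceAll(1 to 10)(0)(_ + _)` both give 55, whatever the chunk size.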
![Page 9: Monoids monoids everywhere](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f030961a28ab0b168b4607/html5/thumbnails/9.jpg)
Approximate Data Structures
● HyperLogLog
  ○ an algorithm for the count-distinct problem, approximating the number of distinct elements in a Set.
● Count-min Sketch
  ○ a probabilistic data structure that provides an approximate frequency table.
● MinHash
  ○ estimates how similar two sets are (approximate Jaccard Similarity)
● Bloom filter
  ○ a probabilistic data structure that is used to test whether an element is a member of a Set
  ○ can answer definitely No or maybe Yes
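These structures belong in a monoid talk because each one can be merged associatively. As an illustration only, here is a toy Bloom filter whose merge is a bitwise OR with the empty bit set as identity. The object name `ToyBloom`, the sizes, and the use of seeded `MurmurHash3.stringHash` as the hash family are all assumptions of this sketch. It is not how Algebird implements its Bloom filter.

```scala
import scala.collection.immutable.BitSet
import scala.util.hashing.MurmurHash3

object ToyBloom {
  val NumBits   = 1024  // assumed filter size, not tuned
  val NumHashes = 3     // assumed number of hash functions, not tuned

  // Bit positions for one item: one seeded hash per "hash function".
  private def positions(item: String): Seq[Int] =
    (0 until NumHashes).map { seed =>
      java.lang.Math.floorMod(MurmurHash3.stringHash(item, seed), NumBits)
    }

  // Identity element: a filter with no bits set.
  val empty: BitSet = BitSet.empty

  // A filter holding exactly one item.
  def single(item: String): BitSet =
    positions(item).foldLeft(BitSet.empty)((bits, p) => bits + p)

  // The monoid operation: bitwise OR, which is associative.
  def merge(a: BitSet, b: BitSet): BitSet = a | b

  // false means "definitely no"; true only means "maybe yes".
  def mightContain(filter: BitSet, item: String): Boolean =
    positions(item).forall(filter.contains)
}
```

Filters built on separate machines can be OR-ed together later, which is what makes the structure fit the reduce phase.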
![Page 10: Monoids monoids everywhere](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f030961a28ab0b168b4607/html5/thumbnails/10.jpg)
Examples
● HyperLogLog
  ○ How many unique Twitter handles tweeted @justinbieber in the past month?
● Count-min Sketch
  ○ What are the frequencies of the hashtags in those tweets?
● MinHash
  ○ How similar are the followers of @justinbieber (~70M) to the followers of @katyperry (~76M)?
● Bloom filter
  ○ Did Kevin tweet to @justinbieber in the past month? Maybe yes. Must be a false positive, can you really trust a Bloom filter?!?!?
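The follower-similarity question can be sketched with a toy MinHash, where a signature keeps the minimum hash value per hash function and two signatures merge by element-wise min. Everything here is an assumption for illustration: the name `ToyMinHash`, the 64 hash functions, and the use of seeded `MurmurHash3.stringHash` as the hash family. It does not reproduce Algebird's MinHasher.

```scala
import scala.util.hashing.MurmurHash3

object ToyMinHash {
  val NumHashes = 64  // assumed signature length, not tuned

  // Identity element: every slot holds the largest possible hash value.
  val empty: Vector[Int] = Vector.fill(NumHashes)(Int.MaxValue)

  // Signature of a one-element set: one seeded hash per slot.
  def single(item: String): Vector[Int] =
    Vector.tabulate(NumHashes)(seed => MurmurHash3.stringHash(item, seed))

  // The monoid operation: element-wise min, which is associative.
  def merge(a: Vector[Int], b: Vector[Int]): Vector[Int] =
    a.zip(b).map { case (x, y) => math.min(x, y) }

  // Signature of a whole set, built by merging single-item signatures.
  def signature(items: Set[String]): Vector[Int] =
    items.foldLeft(empty)((sig, item) => merge(sig, single(item)))

  // Fraction of slots that agree: an estimate of the Jaccard similarity.
  def similarity(a: Vector[Int], b: Vector[Int]): Double =
    a.zip(b).count { case (x, y) => x == y }.toDouble / NumHashes
}
```

Merging the signatures of two sets gives exactly the signature of their union, so follower lists can be summarized in pieces and combined afterwards.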
![Page 11: Monoids monoids everywhere](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f030961a28ab0b168b4607/html5/thumbnails/11.jpg)
How did that get in there?
![Page 12: Monoids monoids everywhere](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f030961a28ab0b168b4607/html5/thumbnails/12.jpg)
https://highlyscalable.files.wordpress.com/2012/04/probabilistic-sizes.png
This is better than Spanx™!
![Page 13: Monoids monoids everywhere](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f030961a28ab0b168b4607/html5/thumbnails/13.jpg)
Thanks Twitter
https://github.com/twitter/algebird*
* Sorry, Algebird doesn’t have a cool logo. Don’t blame me, blame Twitter!
![Page 15: Monoids monoids everywhere](https://reader031.vdocuments.net/reader031/viewer/2022030310/58f030961a28ab0b168b4607/html5/thumbnails/15.jpg)
Need more?
● http://www.michael-noll.com/blog/2013/12/02/twitter-algebird-monoid-monad-for-large-scala-data-analytics/
● https://github.com/twitter/algebird/wiki/Learning-Algebird-Monoids-with-REPL
● https://github.com/twitter/algebird
● https://github.com/twitter/scalding
● https://github.com/twitter/summingbird
● https://github.com/twitter/algebird/wiki/Abstract-algebra-definitions