open source development for students.isabel-drost.de/hadoop/slides/christoph.pdf · mar., 10th...
TRANSCRIPT
![Page 1: Open Source development for students.isabel-drost.de/hadoop/slides/christoph.pdf · Mar., 10th 2010: Hadoop* Get Together in Berlin – Bob Schulze (eCircle/ Munich): Database and](https://reader034.vdocuments.net/reader034/viewer/2022050523/5fa6ddde045af646ba0ceb76/html5/thumbnails/1.jpg)
Open Source development for students.
Why should I work on free software?
By Inaz, http://www.flickr.com/photos/inaz/454059437
Isabel Drost
Nighttime: Co-founder of Apache Mahout.
Organizer of the Berlin Hadoop Get Together.
Member of the ComDev PMC.
Daytime: Software developer.
Hello...
HPI students.
Agenda
• The Apache Software Foundation.
• Apache Mahout.
• Reasons and ways to get started.
• Invitation.
What?
Apache Software Foundation
Community over code.
Meritocracy.
Open communication.
NOT:
GitHub, Google Code, SourceForge.
How?
Behind the scenes.
Community development
GSoC
Mentoring
University relations
How?
Open source collaboration tools are good for you.
Mahout
A sub-project of Lucene
January 3, 2006 by Matt Callow, http://www.flickr.com/photos/blackcustard/81680010
News aggregation
Today: Read newspapers, blogs, Twitter, RSS feeds.
Wish: Aggregate sources and track emerging topics.
September 10, 2008 by Alex Barth, http://www.flickr.com/photos/a-barth/2846621384
Go to cinema
Today: IMDB, zitty, movie review pages, Twitter, blogs, ask friends.
Wish: Reviews, sentiment detection, recommendations.
March 22, 2008 by Crystian Cruz, http://www.flickr.com/photos/crystiancruz/2353895708
Machine learning – what's that?
Image by John Leech, from: The Comic History of Rome by Gilbert Abbott à Beckett. Bradbury, Evans & Co, London, 1850s
Archimedes taking a Warm Bath
Archimedes' model of nature
June 25, 2008 by chase-me, http://www.flickr.com/photos/sasy/2609508999
An SVM's model of nature
The challenge
• Large amounts of data.
• Structured and unstructured data.
• Diverse tasks.
Mission
Provide scalable data mining algorithms.
• Commercially friendly license.
• Scalable to large amounts of data.
• Well documented.
• Healthy community.
• Targeted to developers.
What does Mahout have to offer?
Discover groups of items
• Group items by similarity.
• Examples:
– Group news articles by topic.
– Find developers with similar interests.
Discover groups of similar items
• Canopy.
• k-Means.
• Fuzzy k-Means.
• Dirichlet based.
• Others upcoming.
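To make the idea behind k-Means concrete, here is a minimal single-machine sketch of the classic assign-then-update loop (Lloyd's algorithm). It is not Mahout's API and not its Hadoop-based implementation; the function name `kmeans`, the sample points, and the starting centroids are all made up for illustration.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def kmeans(points, centroids, iterations=10):
    """Plain k-means on tuples of floats, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

On data that no longer fits on one machine, the assignment step is the part that parallelizes naturally, since each point can be assigned independently of the others.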
Discover groups of similar items
Identify dominant topics
• Given a dataset of texts, identify main topics.
• Examples:
– Dominant topics in set of mails.
– Identify news message categories.
Algorithms: Parallel LDA
Assign items to defined categories.
• Given pre-defined categories, assign items to them.
• Examples:
– Spam mail classification.
– Discovery of images depicting humans.
By freezelight, http://www.flickr.com/photos/63056612@N00/155554663/
Assign items to defined categories.
• Naïve Bayes.
• Complementary naïve Bayes.
• Random forests.
• Others upcoming.
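As a rough illustration of the first entry, the sketch below trains a tiny multinomial naïve Bayes classifier with add-one smoothing on a handful of invented mails. It is a single-machine toy, not Mahout's classifier API; the helper names `train` and `classify` and the training texts are assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import log


def train(documents):
    """documents: list of (label, text) pairs. Returns the counts needed for scoring."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for label, text in documents:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-probability for the given text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # log P(label) + sum of log P(word | label), with add-one smoothing
        score = log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            score += log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training mails, invented for this sketch.
model = train([
    ("spam", "cheap pills buy now"),
    ("spam", "buy cheap watches now"),
    ("ham", "meeting notes for the mahout release"),
    ("ham", "patch review for the clustering code"),
])
```

Training only counts words per label, which is why this family of classifiers is a natural fit for large data sets: the counts can be computed in parallel and summed.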
Assign items to defined categories
• Examples based on “standard” datasets:
• 20 Newsgroups: http://cwiki.apache.org/MAHOUT/twentynewsgroups.html
• Wikipedia: http://cwiki.apache.org/MAHOUT/wikipediabayesexample.html
Recommendation mining.
• Recommend items to users.
• Examples:
– Find books related to the book I am buying.
– Find movies I might like.
Recommending places
Recommending people
Recommendation mining.
• Integrated Taste.
• Mature Java library.
• Java-based, web service / HTTP bindings.
• Batch mode based on EC2 and Hadoop.
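The "find movies I might like" task can be sketched with a very small user-based collaborative filter: score each unseen item by how similar its fans are to the target user. This is an illustrative toy in Python, not the Taste API; the function names, the Jaccard similarity choice, and the user and film names are assumptions for this example.

```python
def jaccard(a, b):
    """Similarity of two sets: size of the overlap divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(user, likes, top_n=3):
    """Rank items the user has not seen by the summed similarity of users who liked them."""
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        similarity = jaccard(likes[user], items)
        for item in items - likes[user]:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties are broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


# Hypothetical preference data, invented for this sketch.
likes = {
    "alice": {"film_a", "film_b", "film_c"},
    "bob": {"film_a", "film_b", "film_d"},
    "carol": {"film_c", "film_e"},
    "dave": {"film_e", "film_d", "film_a"},
}
suggestions = recommend("alice", likes)
```

Here `film_d` ranks first for alice because both bob (the most similar user) and dave liked it.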
Frequent pattern mining
• Given groups of items, find commonly co-occurring items.
• Examples:
– In shopping carts find items bought together.
– In query logs find queries issued in one session.
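The shopping-cart example can be sketched by counting how often each pair of items appears in the same basket and keeping the pairs that reach a minimum support. This is plain pair counting on one machine, chosen for brevity; it is not presented as the algorithm Mahout uses. The function name `frequent_pairs` and the baskets are invented for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return the item pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # set() ignores repeated items in a basket; sorted() makes
        # (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical shopping carts, invented for this sketch.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "coffee"],
    ["bread", "butter", "coffee"],
]
pairs = frequent_pairs(baskets, min_support=3)
```

Counting all pairs grows quadratically with basket size, which is one reason dedicated frequent-pattern algorithms exist for large data sets.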
By crypto, http://www.flickr.com/photos/crypto/3201254932/sizes/l/
By libraryman, http://www.flickr.com/photos/libraryman/78337046/sizes/l/
By quinnanya, http://www.flickr.com/photos/quinnanya/2806883231/
Upcoming
• More algorithms.
• Optimization of existing implementations.
• More examples.
• Release 0.3
Jumpstart your project with proven code.
January 8, 2008 by dreizehn28, http://www.flickr.com/photos/1328/2176949559
Discuss ideas and problems online.
November 16, 2005 by [phil h], http://www.flickr.com/photos/hi-phi/64055296
Become part of the community.
[email protected]@lucene.apache.org
Interest in solving hard problems.
Being part of a lively community.
Engineering best practices.
Bug reports, patches, features.
Documentation, code, examples.
Image by: Patrick McEvoy
June 7/8th: Berlin Buzzwords 2010
Store, Search, Scale
Lucene, Sphinx, Hadoop, Business Intelligence, NoSQL, HBase, Scalability, Cloud Computing, Distributed computing, Solr, CouchDB, MongoDB
Isabel Drost, Jan Lehnardt, newthinking store, Simon Willnauer
Mar., 10th 2010: Hadoop* Get Together in Berlin
– Bob Schulze (eCircle/ Munich): Database and Table Design Tips with HBase
– Dragan Milosevic (zanox/ Berlin): Product Search and Reporting powered by Hadoop
– Chris Male (JTeam/ Amsterdam): Spatial Search
http://upcoming.yahoo.com/event/5280014/
* UIMA, HBase, Lucene, Solr, Katta, Mahout, CouchDB, Pig, Hive, Cassandra, Cascading, JAQL, ... talks welcome as well.
Why?
Why should I waste my time doing stuff for free?
Work on what you want...
when you want.
http://www.flickr.com/photos/abnelgonzalez/2058764760/
Share and discuss with peers.
Discuss ideas and problems online.
November 16, 2005 by [phil h], http://www.flickr.com/photos/hi-phi/64055296
Learn from the best.
http://www.flickr.com/photos/mg315/381296439/
Soft Skills.
http://www.flickr.com/photos/ajawin/3587215356/
Make work visible and re-usable.
http://www.flickr.com/photos/telstar/2916051841/
Get started
Turn users into developers.
GSoC
ComDev