bigfoot: big data for every organization
DESCRIPTION
Everybody wants to do big data analytics these days: storage is cheap and data is plentiful; best of all, software in the Hadoop ecosystem is free both as in speech and as in beer. If you are not Facebook or Amazon, however, you are not likely to put your precious data in the systems of cloud providers you may not trust; on the other hand, running your own small or medium cluster can be prohibitive, since deploying, tuning and maintaining it requires a lot of effort and specialized expertise. BigFoot aims to simplify the data scientist's life by making existing big data software easier to deploy and tune, so that data scientists can focus on their job: getting insight from data. BigFoot contributes to OpenStack: we made it possible to deploy virtualized Spark clusters, enabling analytics-as-a-service using fast in-memory computation. HFSP, our scheduler for Hadoop MapReduce, gives priority to smaller jobs, so that large batch jobs do not harm user productivity by slowing down quicker data exploration jobs. Interestingly, HFSP achieves this without penalizing large jobs. We also contribute to the Apache Pig high-level analytics language: we propose patches that strongly enhance performance when computing aggregations on multi-dimensional data.
TRANSCRIPT
BigFoot: Big Data For Every Organization
Matteo Dell’Amico
Open World Forum 2014, Paris
About BigFoot
BigFoot Goals
Big Data For Every Organization
Automatic & self-tuned deployment for private clouds
Optimization on all layers
Scalable machine learning (time-series analysis, forecasting, clustering…)
Optimizations for big data frameworks
Interactive queries on raw data
Contribute to the Free Software community
The BigFoot Architecture
My Presentation
Scheduling
HFSP: a new Hadoop scheduler
Schedsim: a playground to simulate new schedulers
OpenStack
Apache Spark on demand
Work in progress: VM placement optimizations
Scheduling in Hadoop
“Fair” Sharing vs. Size-Based
[Figure: two plots of cluster usage (%) against time (s) for job 1, job 2 and job 3. First plot: time axis marked at 10, 15, 37.5, 42.5 and 50 s. Second plot: time axis marked at 10, 20, 30 and 50 s.]
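The comparison between “fair” sharing and size-based scheduling can be illustrated with a small simulation. This is only a sketch: the job sizes and arrival times are assumptions, chosen so that the results match the time-axis values of the two plots (10, 15, 37.5, 42.5, 50 and 10, 20, 30, 50), and the size-based policy here is a plain "smallest remaining work first" rule used as a stand-in for the idea, not HFSP's actual algorithm. All names (`Job`, `simulate`, `fair`, `size_based`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    arrival: float  # arrival time, in seconds
    size: float     # work, in seconds of the whole cluster


# Assumed workload (not stated on the slides).
JOBS = [Job("job 1", 0, 30), Job("job 2", 10, 10), Job("job 3", 15, 10)]


def fair(remaining):
    """Every running job gets an equal share of the cluster."""
    return {name: 1 / len(remaining) for name in remaining}


def size_based(remaining):
    """The job with the least remaining work gets the whole cluster."""
    smallest = min(remaining, key=remaining.get)
    return {name: 1.0 if name == smallest else 0.0 for name in remaining}


def simulate(jobs, share):
    """Return {job name: completion time} under the sharing policy `share`."""
    pending = sorted(jobs, key=lambda j: j.arrival)
    remaining, done, now = {}, {}, 0.0
    while pending or remaining:
        if not remaining:
            now = max(now, pending[0].arrival)  # idle until next arrival
        while pending and pending[0].arrival <= now:
            job = pending.pop(0)
            remaining[job.name] = job.size
        rates = share(remaining)
        # Advance to the next event: a job finishing or a job arriving.
        to_finish = min(remaining[n] / r for n, r in rates.items() if r > 0)
        to_arrival = pending[0].arrival - now if pending else float("inf")
        step = min(to_finish, to_arrival)
        for name, rate in rates.items():
            remaining[name] -= rate * step
        now += step
        for name in [n for n, work in remaining.items() if work <= 1e-9]:
            done[name] = round(now, 6)
            del remaining[name]
    return done


def mean_response_time(jobs, done):
    """Average time between a job's arrival and its completion."""
    return sum(done[j.name] - j.arrival for j in jobs) / len(jobs)
```

Under these assumptions, `simulate(JOBS, fair)` completes job 2 at 37.5 s, job 3 at 42.5 s and job 1 at 50 s, while `simulate(JOBS, size_based)` completes job 2 at 20 s, job 3 at 30 s and job 1 still at 50 s. The mean response time drops from 35 s to 25 s, and the large job is not delayed.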
HFSP: Size-Based Scheduling For Hadoop
Consistently better than Fair Scheduler (and others…)
The more loaded the system, the bigger the difference
We estimate job sizes: it works!
Download from https://github.com/bigfootproject/hfsp
PSBS – Practical Size-Based Scheduler
[Figure: two panels, “Existing Schedulers” and “PSBS: Our proposal”]
Plotting scheduler response time
blue: better than traditional “fair scheduler”; red: worse
Paper: http://arxiv.org/abs/1410.6122
Simulator: https://github.com/bigfootproject/schedsim
OpenStack
OpenStack Sahara
Hadoop On-Demand
Choose number and size of machines
Choose Hadoop version
Voilà, a cluster in your datacenter!
Analytics-As-A-Service
Compile your Jar
Choose number and size of machines, etc., as before
A cluster appears, does your analytics, and vanishes
Spark On Sahara
Spark Is Cool
A project started by the Berkeley AMP Lab
Fast: in-memory computing
Easy: concise code in Scala or Python
What We Did
Spark has been available on Sahara since May
OpenStack Scheduling
Work In Progress
OpenStack Scheduler
Places virtual machines one at a time
Allows hand-defined filters
Tries to place VMs on least loaded hosts
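The behaviour listed above (one VM at a time, hand-defined filters, preference for the least loaded host) can be sketched as follows. This is a toy model, not the OpenStack API: `Host`, `place_vm` and `place_cluster` are hypothetical names, and "least loaded" is assumed here to mean "most free RAM".

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    total_ram_mb: int
    used_ram_mb: int = 0

    @property
    def free_ram_mb(self):
        return self.total_ram_mb - self.used_ram_mb


def place_vm(hosts, ram_mb, filters=()):
    """Place a single VM; return the chosen host, or None if none fits."""
    # Filtering: keep hosts with enough free RAM that pass every
    # hand-defined filter (each filter is a predicate on a host).
    candidates = [
        h for h in hosts
        if h.free_ram_mb >= ram_mb and all(f(h) for f in filters)
    ]
    if not candidates:
        return None
    # Pick the least loaded candidate (assumption: most free RAM).
    best = max(candidates, key=lambda h: h.free_ram_mb)
    best.used_ram_mb += ram_mb
    return best


def place_cluster(hosts, vm_sizes_mb, filters=()):
    """Place the VMs of a cluster one at a time, each independently."""
    return [place_vm(hosts, ram, filters) for ram in vm_sizes_mb]
```

With three hypothetical hosts `a` (8192 MB), `b` (8192 MB) and `c` (4096 MB), placing four 2048 MB VMs yields hosts `a`, `b`, `a`, `b`: each decision looks only at current load and ignores which VMs belong to the same cluster or where the data lives.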
What We Want To Do
Place a whole cluster, not just one VM at a time!
VMs that talk a lot to each other: place them close
Place them also close to data!
But not too many together: we don’t want to overload drives
Parting Words
Conclusion
Thank You!
These slides: http://bit.ly/bigfoot_owf14
Web: http://bigfootproject.eu
Twitter: @bigfoot_project
Github: http://github.com/bigfootproject/
Bitbucket: bitbucket.org/bigfootproject/