Hydra
Chris Birchall
2014/2/17
M3 Tech Talk #m3dev
What is it?
https://github.com/addthis/hydra
● Hadoop-style distrib processing framework, optimised for trees
● The Big Idea: data processing = building and navigating tree data structures
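To make the "Big Idea" concrete, here is a minimal sketch in plain Python. It does not use Hydra's API or job config format: the `Node` class, `add_record`, `query`, and the sample records are all hypothetical, and only illustrate processing data by building a tree and then navigating it.

```python
from collections import defaultdict


class Node:
    """A tree node that counts how many records passed through it."""

    def __init__(self):
        self.hits = 0
        self.children = defaultdict(Node)


def add_record(root, path):
    """Walk (and create) one branch per record, bumping counters along the way."""
    node = root
    node.hits += 1
    for key in path:
        node = node.children[key]
        node.hits += 1


def query(root, path):
    """Navigate the tree; return the hit count at `path`, or 0 if absent."""
    node = root
    for key in path:
        if key not in node.children:
            return 0
        node = node.children[key]
    return node.hits


# Hypothetical records: (date, HTTP status) pairs
records = [
    ("2014-02-17", "200"),
    ("2014-02-17", "404"),
    ("2014-02-17", "200"),
    ("2014-02-18", "200"),
]

root = Node()
for date, status in records:
    add_record(root, [date, status])

print(query(root, ["2014-02-17"]))         # hits on that date
print(query(root, ["2014-02-17", "200"]))  # hits on that date with status 200
```

Building the tree is the "job"; answering a question is a walk down one branch.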
Components
● Spawn: Job control (+ UI)
  ○ (think JobTracker, in Hadoop-speak)
● Minion: task runner
  ○ (think TaskTracker)
● QueryMaster + QueryWorker
● Meshy: Distrib filesystem
  ○ (think read-only HDFS)
● Zookeeper, RabbitMQ
Getting started (OSX)
# Prerequisites
brew install rabbitmq maven coreutils wget

# Check this works without a passphrase
ssh localhost

# Check that the GNU coreutils cmds
# (grm, gcp, gln, gmv) are on your PATH

# Clone & build
git clone https://github.com/addthis/hydra.git
cd hydra
mvn package
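The PATH check for the GNU coreutils commands can be scripted. The sketch below is one way to do it with Python's standard library; the helper name `missing_commands` is made up for illustration, and the command list is just the four names from the slide.

```python
import shutil

# GNU coreutils commands named on the slide (Homebrew installs them with a "g" prefix)
REQUIRED = ["grm", "gcp", "gln", "gmv"]


def missing_commands(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands(REQUIRED)
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All GNU coreutils commands found")
```

If anything is reported missing, re-check the `brew install coreutils` step before building.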
Getting started (2)
# Start local stack
hydra-uber/bin/local-stack.sh start
hydra-uber/bin/local-stack.sh start # yes, twice!
hydra-uber/bin/local-stack.sh seed

# UI should now be running
open http://localhost:5052
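The stack can take a moment to come up, so it may help to wait for the UI port before opening the browser. This is a rough sketch, not part of Hydra: `wait_for_port` is a hypothetical helper, and the only value taken from the slide is port 5052.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is capped at `interval` seconds
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("localhost", 5052)` after the `seed` step returns `True` as soon as something is listening on the UI port. It only checks that the port accepts connections, not that the UI itself is healthy.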
Hello world
# Sample job definition file available at
hydra-uber/local/sample/self-gen-tree.json

# Click ‘Create’, copy-paste the job config,
# save the job and click ‘Kick’ to run it.

# Click the ‘Q’ button to open the query UI
# and see the resulting data.
Analysing text files
# Tips:
#
# “files” source is broken. Use “mesh2”.
#
# Docs are out of date. Read the source code!

# Mesh filesystem root is here:
hydra-local/streams/

# Here’s an example job config I used to parse some TSV-formatted Apache logs
https://gist.github.com/cb372/9046464
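For comparison, this is roughly what the same kind of analysis looks like on a single machine in plain Python. It does not reproduce the gist's job config: the three-column layout (timestamp, request path, status) and the sample lines are assumptions made up for this sketch.

```python
import csv
import io
from collections import Counter

# Hypothetical TSV log lines: timestamp, request path, HTTP status
SAMPLE = (
    "2014-02-17T10:00:00\t/index.html\t200\n"
    "2014-02-17T10:00:05\t/missing\t404\n"
    "2014-02-18T09:30:00\t/index.html\t200\n"
)


def count_by_date_and_status(lines):
    """Count requests per (date, status) pair from tab-separated log lines."""
    counts = Counter()
    for timestamp, _path, status in csv.reader(lines, delimiter="\t"):
        date = timestamp.split("T")[0]
        counts[(date, status)] += 1
    return counts


counts = count_by_date_and_status(io.StringIO(SAMPLE))
for (date, status), n in sorted(counts.items()):
    print(date, status, n)
```

Each `(date, status)` key here plays the role of a two-level branch in a Hydra tree.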
Conclusions
● If you have Small Data, use grep, awk, sort, uniq
● If you have Big Data, use Hadoop
● If you really like trees, use Hydra ;)