hydra


Upload: chris-birchall

Post on 18-Jun-2015


TRANSCRIPT

Page 1: Hydra

Hydra
Chris Birchall
2014/2/17
M3 Tech Talk #m3dev

Page 2: Hydra

What is it?

https://github.com/addthis/hydra

● Hadoop-style distributed processing framework, optimised for trees

● The Big Idea: data processing = building and navigating tree data structures
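To make the "processing = building and navigating trees" idea concrete, here is a minimal sketch in plain shell and awk. It is not Hydra code and does not use Hydra's job config format. The input layout (TSV lines of date and page name) and the function names build_tree and query_node are assumptions made up for illustration.

```shell
# Hypothetical input on stdin: TSV lines of "<date>\t<page>".
# "Building the tree": count hits for each date node and each date/page node.
build_tree() {
  awk -F '\t' '
    { root[$1]++; leaf[$1 "/" $2]++ }
    END {
      for (d in root) print d "\t" root[d]
      for (k in leaf) print k "\t" leaf[k]
    }' | LC_ALL=C sort
}

# "Navigating the tree": look up the count stored at one node path.
# usage: query_node <tree file> <node path>
query_node() {
  awk -F '\t' -v node="$2" '$1 == node { print $2 }' "$1"
}
```

For example, after `build_tree < hits.tsv > tree.tsv` (file names are placeholders), `query_node tree.tsv 2014-02-17/index` prints the hit count stored at that node.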

Page 3: Hydra

Components

● Spawn: job control (+ UI)
  ○ (think JobTracker, in Hadoop-speak)

● Minion: task runner
  ○ (think TaskTracker)

● QueryMaster + QueryWorker

● Meshy: distributed filesystem
  ○ (think read-only HDFS)

● Zookeeper, RabbitMQ

Page 4: Hydra

Getting started (OSX)

# Prerequisites
brew install rabbitmq maven coreutils wget

# Check this works without a passphrase
ssh localhost

# Check that the GNU coreutils cmds
# (grm, gcp, gln, gmv) are on your PATH

# Clone & build
git clone https://github.com/addthis/hydra.git
cd hydra
mvn package
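The two manual checks above can be scripted. This is a small sketch, not part of Hydra: the function names missing_cmds and can_ssh_without_prompt are made up here. It relies on `command -v` to test whether a command is on the PATH, and on ssh's `BatchMode=yes` option, which makes ssh fail instead of prompting for a passphrase or password.

```shell
# Print the name of every given command that is NOT on the PATH.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Succeed only if "ssh localhost" works without any interactive prompt.
can_ssh_without_prompt() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true
}

# Usage:
#   missing_cmds grm gcp gln gmv     # prints nothing if all are installed
#   can_ssh_without_prompt || echo "ssh localhost needs a passphrase or is unavailable"
```

If `missing_cmds` prints any names, re-check the `brew install coreutils` step and your PATH before building.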

Page 5: Hydra

Getting started (2)

# Start local stack
hydra-uber/bin/local-stack.sh start
hydra-uber/bin/local-stack.sh start # yes, twice!
hydra-uber/bin/local-stack.sh seed

# UI should now be running
open http://localhost:5052

Page 6: Hydra

Hello world

# Sample job definition file available at
hydra-uber/local/sample/self-gen-tree.json

# Click ‘Create’, copy-paste the job config,
# save the job and click ‘Kick’ to run it.

# Click the ‘Q’ button to open the query UI
# and see the resulting data.

Page 7: Hydra

Analysing text files

# Tips:
#
# “files” source is broken. Use “mesh2”.
#
# Docs are out of date. Read the source code!

# Mesh filesystem root is here:
hydra-local/streams/

# Here’s an example job config I used to parse some TSV-formatted Apache logs
https://gist.github.com/cb372/9046464

Page 8: Hydra

Conclusions

● If you have Small Data, use grep, awk, sort, uniq

● If you have Big Data, use Hadoop

● If you really like trees, use Hydra ;)