@IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

go.indeed.com/IndeedEngTalks

Upload: indeedeng

Post on 17-Dec-2014

DESCRIPTION

Video available at: https://www.youtube.com/watch?v=z4JTjUp3NC0

To scale the building of decision trees on large amounts of Indeed job search data, we created a system called Imhotep. In addition to being a crucial tool for building these machine learning models, Imhotep has proven to be applicable to many different analytics problems. The core of Imhotep is a distributed system that manages the parallel execution of queries across a set of time-sharded inverted indices. This talk covers Imhotep’s primitive operations that allow us to build decision trees, drill into data, build graphs, and even execute SQL-like queries in IQL (Imhotep Query Language). We will also discuss what makes Imhotep fast, highly available, and fault tolerant.

TRANSCRIPT

Page 2: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Imhotep Large Scale Analytics and Machine Learning at Indeed

Page 3: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Jeff Plaisance, Engineering Manager

Page 4: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

I help people get jobs.

Page 5: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Indeed is a Search Engine for Jobs

Page 6: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed
Page 7: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Indeed is a data driven organization

Page 8: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Indeed is a data driven organization

Data driven organizations need great tools

Page 9: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

What does Imhotep allow you to do?

● Decision Tree Building
● Analytics

Page 11: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Indeed’s Analytics Philosophy

Analytics systems should be:
1. Interactive
2. Not Sampled
3. Not Approximate

Page 12: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Imhotep answers questions

What was the weekly average query time in the last quarter from people doing the query “software”?

Page 13: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Imhotep answers questions

What percent of jobsearch results pages are for page 2 and beyond?

Page 14: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Imhotep answers questions

What are the 5 most common queries in each country?

Page 15: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Total Job Searches From 2014-03-09 to 2014-03-23

?

Page 16: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed
Page 17: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Query

Page 18: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Query Location

Page 19: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Query Location

Impression

Page 20: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Document

query: “indeed software engineer”
location: “austin”
impressions: 10
clicks: 2
time: 2014-03-17T12:00:00

Page 21: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Shard

0 1 2 3 4

5 6 7 8 9

10 11 12 13 14

Page 23: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Server

2014/03/02   2014/03/09   2014/03/11
Documents    Documents    Documents

2014/03/12   2014/03/22   2014/03/24
Documents    Documents    Documents

Page 25: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Cluster

2014-03-02

Server A

2014-03-03

Server B

2014-03-04

Server C

Page 26: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Cluster

2014-03-02 2014-03-03

Server B

2014-03-04

Server C

Server A

Page 27: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Cluster

2014-03-02 2014-03-03

Server B

2014-03-04

Server C

Client

Session

Server A

Page 28: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Total Job Searches From 2014-03-09 to 2014-03-23

secret

Page 29: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Total Job Searches From 2014-03-09 to 2014-03-23 Per Day

2014-03-09 2014-03-16 2014-03-23

Page 30: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Metrics

● 64 bit integers
● Exactly one value per doc
● Random access by doc id

Page 31: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Metrics

● Time
● Clicks
● Impressions
● Revenue
● … or anything else that is a number

Page 32: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Groups

● Documents are placed into numbered groups
● Every document starts in group 1
● Group 0 means “filtered out”

Page 33: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Groups

● Groups are stateful and scoped to a session
● Regroup operations update group for each doc in shard

Page 34: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Metric Regroup

● Iterate over doc_id->metric lookup
● Set group to (value - start) / bucket_width
● Useful for making graphs (buckets on x-axis)

[Diagram: the range from start to end is divided into buckets 1 2 3 4 5, each one bucket width wide]
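
A minimal sketch of the metric regroup step, assuming the metric is a plain Python list indexed by doc id. The function name `metric_regroup`, the `+ 1` offset (so bucket numbers start at 1 and group 0 stays reserved for filtered-out docs) and the treatment of out-of-range values are assumptions made for illustration; the slide only gives the division.

```python
def metric_regroup(groups, metric, start, end, bucket_width):
    """Assign each doc in a non-zero group to a bucket based on its metric value.

    groups[doc_id] is the current group and metric[doc_id] the metric value.
    Buckets are numbered from 1; values outside [start, end) are sent to
    group 0 (an assumption made for this sketch).
    """
    new_groups = []
    for doc_id, group in enumerate(groups):
        value = metric[doc_id]
        if group == 0 or not (start <= value < end):
            new_groups.append(0)  # filtered out
        else:
            new_groups.append((value - start) // bucket_width + 1)
    return new_groups


# Five docs, all starting in group 1, bucketed on a metric with width 10.
groups = metric_regroup([1, 1, 1, 1, 1], [3, 12, 25, 49, 60],
                        start=0, end=50, bucket_width=10)
print(groups)
```

Each doc is visited once, so the cost is linear in the number of docs in the shard.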

Page 35: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Get Group Stats

● For each group, sums a metric for all docs in that group

Page 36: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Bucket By Day

1. Regroup on time metric
2. Get Group Stats for count metric (always 1)
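
The two steps can be sketched as follows. This is only an illustration: the helper names, the use of second-resolution timestamps and the 1-based bucket numbering are assumptions, not Imhotep's actual API.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400


def get_group_stats(groups, metric):
    """Sum a metric over the docs in each non-zero group."""
    stats = defaultdict(int)
    for doc_id, group in enumerate(groups):
        if group != 0:
            stats[group] += metric[doc_id]
    return dict(stats)


def bucket_by_day(time_metric, start):
    """Regroup on the time metric (one bucket per day), then count docs per group."""
    groups = [(t - start) // SECONDS_PER_DAY + 1 for t in time_metric]
    count_metric = [1] * len(time_metric)  # the "always 1" count metric
    return get_group_stats(groups, count_metric)


# Timestamps in seconds, relative to an arbitrary start of 0.
times = [10, 500, 86400, 90000, 172805, 86401]
per_day = bucket_by_day(times, start=0)
print(per_day)
```

The result maps each day bucket to the number of docs (searches) that fall in it, which is the series a per-day graph would plot.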

Page 37: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Total Job Searches From 2014-03-09 to 2014-03-23 Per Day

2014-03-09 2014-03-16 2014-03-23

Page 38: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Total and US Job Searches From 2014-03-09 to 2014-03-23 Per Day

2014-03-09 2014-03-16 2014-03-23

Page 39: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Inverted Indexes

Page 40: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Inverted Index

● Like index in the back of a book
● words = terms, page numbers = doc ids
● Term list is sorted
● Doc list for each term is sorted

Page 41: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Standard Index

doc id  query     country  impressions  clicks
0       software  Canada   10           1
1       blank     Canada   10           0
2       sales     US       5            0
3       software  US       8            1
4       blank     US       10           1

Page 42: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Constructing an Inverted Index

        query                   country      impressions  clicks
doc id  blank  sales  software  Canada  US   5   8   10   0   1
0                     ✔         ✔                    ✔        ✔
1       ✔                       ✔                    ✔    ✔
2              ✔                        ✔    ✔            ✔
3                     ✔                 ✔        ✔            ✔
4       ✔                               ✔            ✔        ✔

Page 43: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Constructing an Inverted Index

field        term      0  1  2  3  4
query        blank        ✔        ✔
             sales           ✔
             software  ✔        ✔
country      Canada    ✔  ✔
             US              ✔  ✔  ✔
impressions  5               ✔
             8                  ✔
             10        ✔  ✔        ✔
clicks       0            ✔  ✔
             1         ✔        ✔  ✔

Page 44: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Inverted Index

field        term      doc list
query        blank     1, 4
             sales     2
             software  0, 3
country      Canada    0, 1
             US        2, 3, 4
impressions  5         2
             8         3
             10        0, 1, 4
clicks       0         1, 2
             1         0, 3, 4
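
The construction shown on these slides can be reproduced in a few lines. This is a sketch using plain dictionaries; `build_inverted_index` is a hypothetical helper, not part of Imhotep or Lucene.

```python
from collections import defaultdict

# The five documents from the "Standard Index" slide.
docs = [
    {"query": "software", "country": "Canada", "impressions": 10, "clicks": 1},
    {"query": "blank", "country": "Canada", "impressions": 10, "clicks": 0},
    {"query": "sales", "country": "US", "impressions": 5, "clicks": 0},
    {"query": "software", "country": "US", "impressions": 8, "clicks": 1},
    {"query": "blank", "country": "US", "impressions": 10, "clicks": 1},
]


def build_inverted_index(docs):
    """Map field -> term -> sorted list of doc ids containing that term."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, doc in enumerate(docs):  # ascending doc ids keep doc lists sorted
        for field, term in doc.items():
            index[field][term].append(doc_id)
    # Sort the term list of each field, as on the slides.
    return {field: dict(sorted(terms.items())) for field, terms in index.items()}


index = build_inverted_index(docs)
print(index["country"])
```

Because docs are visited in increasing doc id order, every doc list comes out sorted without an extra sorting pass.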

Page 45: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Inverted Indexes

Allow you to:
● Quickly find all documents containing a term
● Intersect several terms to perform boolean queries
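
Because doc lists are sorted, a boolean AND can be done with a linear two-pointer walk. The sketch below is a generic illustration of that idea, not the code Imhotep or Lucene actually runs.

```python
def intersect(a, b):
    """Intersect two sorted doc id lists with a linear two-pointer merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Doc lists from the example index: query:software AND country:US.
software = [0, 3]
us = [2, 3, 4]
print(intersect(software, us))
```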

Page 46: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Lucene

● Open source inverted index implementation
● Reasonably fast
● Widely used, well tested

Page 47: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Global and US Job Searches From 2014-03-09 to 2014-03-23 Per Day

2014-03-09 2014-03-16 2014-03-23

Page 48: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Searches in the US only

field        term      doc list
query        blank     1, 4
             sales     2
             software  0, 3
country      Canada    0, 1
             US        2, 3, 4
impressions  5         2
             8         3
             10        0, 1, 4
clicks       0         1, 2
             1         0, 3, 4

Page 50: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Searches in the US only

field    term    doc list
country  Canada  0, 1
         US      2, 3, 4

Page 51: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Searches in the US only

Query Regroup
● Regroup all docs which do not match a boolean query to group zero

field    term    doc list
country  Canada  0, 1
         US      2, 3, 4
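
A minimal sketch of query regroup for the single-term query country:US. The function name and the list-based group array are assumptions for illustration.

```python
def query_regroup(groups, matching_doc_ids):
    """Move every doc that does not match the query to group 0."""
    matching = set(matching_doc_ids)
    return [g if doc_id in matching else 0 for doc_id, g in enumerate(groups)]


groups = [1, 1, 1, 1, 1]   # every doc starts in group 1
us_docs = [2, 3, 4]        # doc list for country:US
groups = query_regroup(groups, us_docs)
print(groups)
```

Docs 0 and 1 (Canada) end up in group 0, so later group stats only see the US searches.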

Page 52: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Term Regroup

Splits docs in a group into one of two new groups based on presence/absence of a term

[Diagram: group 1 splits into groups 2 and 3, one for country:US and one for everything else]

Page 53: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Multiterm Regroup

Generalization of term regroup to N terms and N+1 new groups

[Diagram: group 1 splits into groups 2, 3, 4 and 5, one each for country:US, country:CA and country:FR, plus one for everything else]
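
A sketch of multiterm regroup, with term regroup as the N = 1 case. The function signature, the order in which new group numbers are handed out, and the example doc lists are all hypothetical.

```python
def multiterm_regroup(groups, target_group, term_doc_lists, first_new_group):
    """Split target_group into N+1 groups: one per term, plus "everything else".

    term_doc_lists holds one doc id list per term. Terms are assumed to be
    mutually exclusive within a doc for this sketch.
    """
    doc_to_group = {}
    for offset, doc_ids in enumerate(term_doc_lists):
        for doc_id in doc_ids:
            doc_to_group[doc_id] = first_new_group + offset
    else_group = first_new_group + len(term_doc_lists)
    return [
        doc_to_group.get(doc_id, else_group) if g == target_group else g
        for doc_id, g in enumerate(groups)
    ]


# Six docs in group 1; made-up doc lists for country:US, country:CA, country:FR.
groups = multiterm_regroup([1] * 6, 1, [[0, 3], [1], [4]], first_new_group=2)
print(groups)

# Term regroup is the same operation with a single term.
two_way = multiterm_regroup([1, 1, 1], 1, [[1]], first_new_group=2)
print(two_way)
```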

Page 54: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Total and US Job Searches From 2014-03-09 to 2014-03-23 Per Day

2014-03-09 2014-03-16 2014-03-23

Page 55: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Inverted Index Compression

Size of Organic Dataset for last 5 months
● Original: 102 TB
● Inverted: 51 TB

Page 56: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Inverted Index Optimizations

● Compressed data structures
  ○ Better use of RAM and processor cache
  ○ Better use of memory bandwidth
  ○ Increased CPU usage and time
● Micro optimizations matter!

Page 57: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Delta / Varint Encoding

● Doc id lists are sorted
● Delta between a doc id and the previous doc id is sufficient
● Deltas are usually small integers
● Use fewer bits for small integers and more bits for large integers

Page 59: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Delta Encoding

field  term     doc list
query  nursing  34, 86, 247, 301, 674, 714
       deltas   34, 52, 161, 54, 373, 40
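
The deltas on this slide can be checked with a short sketch. The helper names are illustrative only.

```python
from itertools import accumulate


def delta_encode(doc_ids):
    """Replace each doc id with its difference from the previous one."""
    return [cur - prev for prev, cur in zip([0] + doc_ids[:-1], doc_ids)]


def delta_decode(deltas):
    """Recover the original sorted doc ids with a running sum."""
    return list(accumulate(deltas))


nursing = [34, 86, 247, 301, 674, 714]
deltas = delta_encode(nursing)
print(deltas)
print(delta_decode(deltas) == nursing)
```

The first doc id is stored as a delta from 0, so decoding needs no extra state.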

Page 60: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Small Integer Compression

● Golomb/Rice
● Varint
● Binary Packing
● PForDelta

Page 61: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Small Integer Compression

● Golomb/Rice
● Varint
● Bit Packing
● PForDelta

Page 62: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Varint Encoding

9838

Page 76: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

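Varint Encoding

9838 as a 32 bit integer:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0

Byte 1: 1 1 1 0 1 1 1 0 (lowest 7 bits 1101110, top bit 1 because more bytes follow)

Byte 2: 0 1 0 0 1 1 0 0 (next 7 bits 1001100, top bit 0 because this is the last byte)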
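
The byte-at-a-time encoding walked through above can be written as a small sketch. It assumes the common layout shown on the slides (7 payload bits per byte, least significant group first, top bit as a continuation flag); the function names are illustrative.

```python
def varint_encode(n):
    """Encode a non-negative int, 7 bits per byte, least significant group first.

    The top bit of each byte is a continuation flag: 1 means more bytes follow.
    """
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(0x80 | low7)
        else:
            out.append(low7)
            return bytes(out)


def varint_decode(data, pos=0):
    """Decode one varint starting at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # a clear top bit marks the last byte
            return value, pos


encoded = varint_encode(9838)
print([format(b, "08b") for b in encoded])
print(varint_decode(encoded))
```

The decode loop branches on every byte, which is the cost the later vectorized decoder is designed to avoid.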

Page 77: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Inverted Index Compression

Size of Organic Dataset for last 5 months
● Original: 102 TB
● Inverted: 51 TB
● Delta / Varint: 17 TB

Page 78: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Flamdex

● Two files per field (terms/docs)
● Can add fields without rebuilding index
● Faster varint decoding
● No TF or positions (or wasted time decoding them)

Page 79: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Varints

Pros:
● Compression
● Can fit more of index in RAM
● Higher information throughput per byte read from disk

Page 80: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Varints

Cons:
● Decodes one byte at a time
● Lots of branch mispredictions
● Not fast to decode

Page 84: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Vectorized Varint Decoding

01001010 11001000 01110001 01001110

10011011 01101010 10110101 00010111

01110110 10001101 10110011 11000001

pmovmskb: Extract top bit of each byte

010010100111

Lookup in 4096 entry lookup table

Page 85: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

010010100111

Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of longest varint in bytes
● number of bytes to consume
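
A scalar sketch of what the mask and one lookup-table entry encode, using the 12 example bytes from the preceding slides. `top_bit_mask` only emulates the effect of pmovmskb (the real instruction places the first byte's bit in the lowest bit of the result; here the mask is printed first byte first, as on the slide), and `layout` is a hypothetical helper that derives an entry on the fly instead of reading it from a precomputed table.

```python
def top_bit_mask(block):
    """Return the top (continuation) bit of each byte as a string, first byte first."""
    return "".join("1" if b & 0x80 else "0" for b in block)


def layout(mask):
    """Derive (sizes, offsets, bytes_consumed) for the complete varints in a mask."""
    sizes, offsets = [], []
    start = 0
    for i, bit in enumerate(mask):
        if bit == "0":  # a clear top bit ends the current varint
            offsets.append(start)
            sizes.append(i - start + 1)
            start = i + 1
    return sizes, offsets, start


block = bytes(int(b, 2) for b in (
    "01001010 11001000 01110001 01001110 "
    "10011011 01101010 10110101 00010111 "
    "01110110 10001101 10110011 11000001").split())

mask = top_bit_mask(block)
sizes, offsets, consumed = layout(mask)
print(mask, sizes, offsets, consumed)

# One entry per possible 12 bit mask gives the 4096 entry table.
table = {m: layout(format(m, "012b")) for m in range(4096)}
print(len(table))
```

For the example mask this yields six complete varints in the first nine bytes; the trailing three bytes all have their top bit set and are left for the next block.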

Page 94: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

010010100111

Decoding options for:
● up to twelve 1 byte varints
● six 1-2 byte varints
● four 1-3 byte varints
● two 1-5 byte varints

Page 95: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Vectorized Varint Decoding

● Decode six 1-2 byte varints in parallel

● Need to pad out all 1 byte varints to 2 bytes

pshufb: Intel SSSE3 instruction to shuffle bytes

Page 96: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Vectorized Varint Decoding

01001010 11001000 01110001 01001110

10011011 01101010 10110101 00010111

01110110 10001101 10110011 11000001

Decode 6 varints from 9 bytes

Page 98: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Vectorized Varint Decoding

01001010 00000000 11001000 01110001

01001110 00000000 10011011 01101010

10110101 00010111 01110110 00000000

Pad out 1 byte ints to 2 bytes

Page 100: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Vectorized Varint Decoding

00000000 01001010 01110001 11001000

00000000 01001110 01101010 10011011

00010111 10110101 00000000 01110110

Reverse bytes in 2 byte varints

Page 102: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Vectorized Varint Decoding

00000000 01001010 01110001 01001000

00000000 01001110 01101010 00011011

00010111 00110101 00000000 01110110

Mask out the leading 1’s (the continuation bits, shown in purple on the slide)

Page 104: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Vectorized Varint Decoding

00000000 01001010 00111000 11001000

00000000 01001110 00110101 00011011

00001011 10110101 00000000 01110110

Shift top bytes of each varint 1 bit right (mask/shift/or)
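
The four steps (pad, reverse, mask, shift) can be traced with a scalar sketch over the nine example bytes. A real SIMD implementation would apply each step to all six 16 bit lanes at once using instructions such as pshufb; the loop below only mirrors the data flow, and the sizes and offsets are hard-coded as a lookup-table entry would supply them.

```python
def decode_six(block, sizes, offsets):
    """Decode 1-2 byte varints by following the slide's four steps, lane by lane."""
    values = []
    for size, off in zip(sizes, offsets):
        # 1. Pad a 1 byte varint out to 2 bytes.
        lo = block[off]
        hi = block[off + 1] if size == 2 else 0
        # 2. Reverse the bytes so the most significant byte comes first.
        lane = (hi << 8) | lo
        # 3. Mask out the continuation bit of the low byte.
        lane &= 0xFF7F
        # 4. Shift the top byte one bit right so its 7 bits sit directly
        #    above the low byte's 7 bits (mask / shift / or).
        lane = ((lane & 0xFF00) >> 1) | (lane & 0x00FF)
        values.append(lane)
    return values


block = bytes([0b01001010, 0b11001000, 0b01110001, 0b01001110, 0b10011011,
               0b01101010, 0b10110101, 0b00010111, 0b01110110])
values = decode_six(block, sizes=[1, 2, 1, 2, 2, 1], offsets=[0, 1, 3, 4, 6, 8])
print([format(v, "016b") for v in values])
print(values)
```

The printed bit patterns match the final frame of the slides, for example 0011100011001000 for the second varint.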

Page 105: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Vectorized Varint Decoding

00000000 01001010 00111000 11001000

00000000 01001110 00110101 00011011

00001011 10110101 00000000 01110110

● ~10 instructions
● No branches
● Less than 2 instructions per varint

Page 107: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Vectorized Varint Decoding

00000000 01001010 00111000 11001000

00000000 01001110 00110101 00011011

00001011 10110101 00000000 01110110

● Imhotep spends ~40% of its CPU time decoding varints
● Vectorized decoder ~3-5x faster
  ○ Decompresses at 1.5 GB per second
  ○ ~2x overall system performance

Page 108: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Top 5 Locations

Page 109: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Term Stats

atlanta 49

austin 14

boston 25

chicago 28

dallas 13

houston 36

new york 68

san francisco 54

Page 111: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Term Stats Iterator

● For each term in a field, sum metrics across all docs containing that term

● How do we compute this across many machines?
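
On a single shard, the term stats iteration can be sketched as below. This is an assumption-laden simplification: it sums one metric per term and ignores groups, and `term_stats` is a hypothetical name.

```python
def term_stats(field_index, metric):
    """Yield (term, sum of metric over docs containing the term) in term order."""
    for term in sorted(field_index):
        yield term, sum(metric[doc_id] for doc_id in field_index[term])


# Country field and clicks metric from the small example index.
country = {"Canada": [0, 1], "US": [2, 3, 4]}
clicks = [1, 0, 0, 1, 1]
stats = list(term_stats(country, clicks))
print(stats)
```

Emitting terms in sorted order is what makes it possible to merge the outputs of many shards as streams.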

Page 112: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 5     dallas 8     chicago 9
boston 12    chicago 19   boston 13
austin 3     austin 4     austin 7
atlanta 16   atlanta 12   atlanta 21

Pages 113-136 (animation frames): the three term-sorted lists are merged one term at a time, and the stats of equal terms are summed, giving atlanta 49, austin 14, boston 25, chicago 28 and dallas 13.

Page 137: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 13

chicago 28

boston 25

austin 14

atlanta 49
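
The merge shown in this animation is a k-way merge of term-sorted streams in which equal terms are combined. A minimal sketch using the standard library, with the three example lists written in ascending term order:

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_term_stats(*streams):
    """Merge term-sorted (term, stat) streams, summing the stats of equal terms."""
    merged = heapq.merge(*streams, key=itemgetter(0))
    for term, entries in groupby(merged, key=itemgetter(0)):
        yield term, sum(stat for _, stat in entries)


a = [("atlanta", 16), ("austin", 3), ("boston", 12), ("dallas", 5)]
b = [("atlanta", 12), ("austin", 4), ("chicago", 19), ("dallas", 8)]
c = [("atlanta", 21), ("austin", 7), ("boston", 13), ("chicago", 9)]
result = list(merge_term_stats(a, b, c))
print(result)
```

`heapq.merge` is lazy, so the merged output can be streamed without holding all inputs in memory.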

Page 138: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Term Stats 1-6

TS 1 TS 2 TS 3 TS 4 TS 5 TS 6

Page 139: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

TS 1-6 TS 7-12 TS 13-18

Page 140: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

TS 1-6 TS 7-12 TS 13-18

Term Stats 1-18

Page 141: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Amdahl’s Law

● The speedup of a program using multiple processors is limited by the time needed for the sequential fraction of the program
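
Stated as a formula, with $p$ the fraction of the work that can be parallelized and $n$ the number of processors (the symbols are chosen here for illustration):

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

For example, with $p = 0.9$ and $n = 10$ the speedup is $1 / (0.1 + 0.09) \approx 5.3$, and no number of processors can push it past $10$.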

Page 142: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Amdahl’s Law

● Sequential part of FTGS (the field/term/group stats iteration) is the last step in the merge

● Can we distribute some part of the final merge?

Page 143: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Hash Partition + Interleave

● Send all stats for each unique term to the same thread based on a hash of the term

● Interleave merged terms
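
A single-process sketch of hash partition plus interleave. The partitions are processed in a plain loop here, whereas the slides describe one thread per partition; `zlib.crc32` stands in for whatever hash function the real system uses, and the function name is hypothetical.

```python
import heapq
import zlib
from collections import defaultdict


def partition_and_merge(streams, num_threads):
    """Hash-partition (term, stat) pairs by term, sum within each partition,
    then interleave the sorted partitions into one term-sorted result."""
    partitions = [defaultdict(int) for _ in range(num_threads)]
    for stream in streams:
        for term, stat in stream:
            # All stats for a given term land in the same partition.
            worker = zlib.crc32(term.encode("utf-8")) % num_threads
            partitions[worker][term] += stat
    sorted_parts = [sorted(p.items()) for p in partitions]
    # Each term lives in exactly one partition, so interleaving is a plain merge.
    return list(heapq.merge(*sorted_parts))


a = [("atlanta", 16), ("austin", 3), ("boston", 12), ("dallas", 5)]
b = [("atlanta", 12), ("austin", 4), ("chicago", 19), ("dallas", 8)]
c = [("atlanta", 21), ("austin", 7), ("boston", 13), ("chicago", 9)]
result = partition_and_merge([a, b, c], num_threads=2)
print(result)
```

Since no term is shared between partitions, the final interleave never has to add stats together; the summing work is what gets spread across threads.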

Page 144: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

TS 1-6 TS 7-12 TS 13-18

Term Stats 1-18

Page 145: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Shard Distribution

Page 146: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 5     dallas 8     chicago 9
boston 12    chicago 19   boston 13
austin 3     austin 4     austin 7
atlanta 16   atlanta 12   atlanta 21

Page 147: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 5

boston 12

austin 3

atlanta 16

dallas 8

chicago 19

austin 4

atlanta 12

chicago 9

boston 13

austin 7

atlanta 21

Page 148: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 5

boston 12

austin 3

atlanta 16

dallas 8

chicago 19

austin 4

atlanta 12

chicago 9

boston 13

austin 7

atlanta 21

Page 149: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 5

boston 12

austin 3

atlanta 16

dallas 8

chicago 19

austin 4

atlanta 12

chicago 9

boston 13

austin 7

atlanta 21

Page 150: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 5

boston 12

austin 3

atlanta 16

dallas 8

chicago 19

austin 4

atlanta 12

chicago 9

boston 13

austin 7

atlanta 21

Page 151: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 5boston 12austin 3

atlanta 16

dallas 8chicago 19

austin 4

atlanta 12

chicago 9

boston 13austin 7

atlanta 21

Page 152: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 5

boston 12

atlanta 16

dallas 8

atlanta 12

boston 13

atlanta 21

Page 153: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 5

boston 12

atlanta 16

dallas 8

atlanta 12

boston 13

atlanta 21

Page 154: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 5

boston 12

atlanta 16dallas 8

atlanta 12boston 13

atlanta 21

atlanta 49

Page 155: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 5

boston 12 dallas 8 boston 13

boston 25atlanta 49

Page 156: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 5 dallas 8

dallas 13boston 25

atlanta 49

Page 157: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 13boston 25

atlanta 49

Page 158: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 13

boston 25

atlanta 49

chicago 28

austin 14

Page 160: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 13

boston 25

atlanta 49 chicago 28

austin 14

Page 161: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

atlanta 49

dallas 13

boston 25

atlanta 49 chicago 28

austin 14

Page 162: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

atlanta 49

dallas 13

boston 25

atlanta 49

chicago 28

austin 14

Page 163: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 13

boston 25

atlanta 49

chicago 28

austin 14

Page 164: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

austin 14 atlanta 49

dallas 13

boston 25

chicago 28

austin 14

Page 165: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

austin 14

atlanta 49

dallas 13

boston 25

chicago 28

austin 14

Page 166: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

chicago 28

dallas 13

boston 25

austin 14

atlanta 49

Page 167: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

boston 25 austin 14

atlanta 49

chicago 28

dallas 13

boston 25

Page 168: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

boston 25

austin 14

atlanta 49

chicago 28

dallas 13

boston 25

Page 169: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 13

boston 25

austin 14

atlanta 49

chicago 28

Page 170: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

chicago 28 boston 25

austin 14

atlanta 49

dallas 13 chicago 28

Page 171: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

chicago 28

boston 25

austin 14

atlanta 49

dallas 13 chicago 28

Page 172: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

chicago 28

boston 25

austin 14

atlanta 49

dallas 13

Page 173: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 13

dallas 13 chicago 28

boston 25

austin 14

atlanta 49

Page 175: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

dallas 13

chicago 28

boston 25

austin 14

atlanta 49
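The frames above appear to walk through a stream merge: (term, count) pairs arrive in arbitrary order, are split by a hash of the term, are summed within each partition, and the sorted partitions are then interleaved into one ordered stream. The following is a minimal Python sketch of that reading, not Imhotep's actual code. The function names, the use of `crc32` as the partitioning hash, and the choice of two partitions are all assumptions made for illustration; the input pairs are the ones shown on the slides.

```python
import heapq
import zlib
from collections import defaultdict

# (term, count) pairs in the order they appear on the slides.
STREAM = [
    ("dallas", 5), ("boston", 12), ("austin", 3), ("atlanta", 16),
    ("dallas", 8), ("chicago", 19), ("austin", 4), ("atlanta", 12),
    ("chicago", 9), ("boston", 13), ("austin", 7), ("atlanta", 21),
]


def partition_of(term, num_partitions):
    # crc32 is an arbitrary stand-in; any stable hash of the term would do.
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def merge_streams(pairs, num_partitions=2):
    # Step 1: hash partition, so every occurrence of a term lands in one bucket.
    buckets = [defaultdict(int) for _ in range(num_partitions)]
    for term, count in pairs:
        buckets[partition_of(term, num_partitions)][term] += count
    # Step 2: each bucket is independent of the others; sort it by term.
    sorted_runs = [sorted(bucket.items()) for bucket in buckets]
    # Step 3: interleave the sorted runs into a single stream ordered by term.
    return list(heapq.merge(*sorted_runs))


merged = merge_streams(STREAM)
print(merged)
```

Because a term can only ever appear in one partition, the partitions can be summed without coordination, and the final interleave never has to combine counts.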

Page 176: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Shard Distribution

● Lots of datasets for different event types
● Each dataset is split into one shard per (hour/day)
● Each shard has 2 replicas for fault tolerance
● How do we assign shards to machines?

Page 177: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Shard Distribution Considerations

● Space
● Load
● Hot Spots
● Adding/Removing machines

Page 178: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Homogeneous vs. Heterogeneous Systems

● Must decide how or if you will handle heterogeneous hardware

● Cannot balance for both space and load on heterogeneous hardware

Page 179: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

1 TB

3 TB

Homogeneous vs. Heterogeneous

Page 180: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Homogeneous vs. Heterogeneous

12 shards, 50% capacity used

4 shards, 50% capacity used

Page 181: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Homogeneous vs. Heterogeneous

12 shards, 50% capacity used

4 shards, 50% capacity used

read hotspot

Page 182: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Homogeneous vs. Heterogeneous

8 shards, 33% capacity used

8 shards, 100% capacity used

wasted space
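The numbers on these slides can be checked with a few lines of arithmetic. This sketch assumes 16 equal-size shards of 0.125 TB each, a size that is not stated on the slides but is implied by "4 shards, 50% capacity used" on a 1 TB machine. The function name is illustrative.

```python
SHARD_TB = 0.125  # assumed shard size, implied by 4 shards filling half of 1 TB


def capacity_used(num_shards, disk_tb):
    """Fraction of a disk occupied by num_shards equal-size shards."""
    return num_shards * SHARD_TB / disk_tb


# Balance by space: both disks are half full, but the 3 TB machine holds
# three times as many shards and so serves three times the reads.
space_balanced = {"1 TB": capacity_used(4, 1.0), "3 TB": capacity_used(12, 3.0)}

# Balance by load: 8 shards each, so reads are even, but the 1 TB machine
# is full while two thirds of the 3 TB machine sits empty.
load_balanced = {"1 TB": capacity_used(8, 1.0), "3 TB": capacity_used(8, 3.0)}

print(space_balanced)
print(load_balanced)
```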

Page 183: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Hot Spots

When accessing any subset of a dataset, evenly spread the load across CPUs, drives, network cards

Page 184: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Hot Spots

When accessing any subset of a dataset, evenly spread the load across CPUs, drives, network cards

This is hard

Page 185: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Hot Spots

Maybe random is good enough?

Page 186: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Hot Spots

Maybe random is good enough?

On average about 10% more data read from the most loaded machine than the least

Page 187: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Two Choice Randomized Load Balancing

● 2 replicas of each shard to choose from
● Greedily choose the machine that currently has the least load from this client

Page 188: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Two Choice Randomized Load Balancing

● 2 replicas of each shard to choose from
● Greedily choose the machine that currently has the least load from this client
● On average about 1% more data read from the most loaded machine than the least
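A small simulation makes the difference between the two strategies concrete. This is a sketch under stated assumptions, not Indeed's client code: load is counted in shards rather than bytes read, and the cluster size, shard count, random replica placement, and seed are arbitrary, so it should not be expected to reproduce the 10% and 1% figures quoted on the slides.

```python
import random


def assign_random(replicas, rng):
    """Pick one of each shard's two replicas uniformly at random."""
    return [rng.choice(pair) for pair in replicas]


def assign_two_choice(replicas, num_machines):
    """For each shard, greedily pick the replica whose machine has less load so far."""
    load = [0] * num_machines
    chosen = []
    for a, b in replicas:
        machine = a if load[a] <= load[b] else b
        load[machine] += 1
        chosen.append(machine)
    return chosen


def spread(chosen, num_machines):
    """Shards on the most loaded machine minus shards on the least loaded one."""
    load = [0] * num_machines
    for machine in chosen:
        load[machine] += 1
    return max(load) - min(load)


rng = random.Random(0)  # fixed seed so the run is repeatable
NUM_MACHINES, NUM_SHARDS = 20, 2000
# Each shard lives on two distinct machines; the placement here is random.
replicas = [tuple(rng.sample(range(NUM_MACHINES), 2)) for _ in range(NUM_SHARDS)]

random_pick = assign_random(replicas, rng)
two_choice_pick = assign_two_choice(replicas, NUM_MACHINES)
print("random spread:", spread(random_pick, NUM_MACHINES))
print("two-choice spread:", spread(two_choice_pick, NUM_MACHINES))
```

The greedy rule only needs the load this client has already handed out, so no global view of the cluster is required.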

Page 189: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Rendezvous Hashing

● Assignment of a shard to machines determined only by the machines that exist in the cluster

● Hash all pairs of shard ID and machine ID and pick the largest two

Page 190: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Rendezvous Hashing

Shard ID: organic.2014-03-02T06:00:00

H(Shard ID + m1) = 0.592624
H(Shard ID + m2) = 0.294647
H(Shard ID + m3) = 0.736681
H(Shard ID + m4) = 0.647578
H(Shard ID + m5) = 0.835598
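With the hash values above, the two largest belong to m5 and m3, so those two machines hold the shard. The sketch below shows one way the scheme could be written in Python. The `score` and `assign` names, the use of MD5, and the way the shard ID and machine ID are joined are assumptions for illustration; the slides do not say which hash function Imhotep uses.

```python
import hashlib


def score(shard_id, machine_id):
    """Map a (shard, machine) pair to a float in [0, 1). MD5 is an arbitrary choice."""
    digest = hashlib.md5(f"{shard_id}+{machine_id}".encode("utf-8")).digest()
    # Keep 53 bits so the value is exactly representable as a float below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def assign(shard_id, machines, replicas=2):
    """Return the machines with the largest scores for this shard."""
    ranked = sorted(machines, key=lambda m: score(shard_id, m), reverse=True)
    return ranked[:replicas]


# Hash values copied from the slide; the two largest belong to m5 and m3.
slide_scores = {"m1": 0.592624, "m2": 0.294647, "m3": 0.736681,
                "m4": 0.647578, "m5": 0.835598}
slide_pick = sorted(slide_scores, key=slide_scores.get, reverse=True)[:2]

machines = ["m1", "m2", "m3", "m4", "m5"]
shard = "organic.2014-03-02T06:00:00"
before = assign(shard, machines)

# Removing a machine that does not hold the shard leaves the assignment unchanged.
removed = next(m for m in machines if m not in before)
after = assign(shard, [m for m in machines if m != removed])

# Adding a machine either changes nothing or moves a replica onto the new machine.
grown = assign(shard, machines + ["m6"])
print(slide_pick, before, after, grown)
```

Every client that knows the machine list computes the same answer, which is why no assignment table has to be stored or synchronized.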

Page 191: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Rendezvous Hashing

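[Figure: number line from 0 to 1 with the five machines placed at their hash values; from highest to lowest: m5, m3, m4, m1, m2]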

Page 194: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Rendezvous Hashing

● No coordination required - deterministic algorithm used to determine assignment

● No centralized storage for shard to machine assignment

Page 195: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Rendezvous Hashing

Page 205: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Expected max hash for a shard is

Rendezvous Hashing

Page 206: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Expected max hash for a shard is

Probability that new machine will get shard

Rendezvous Hashing
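The formulas on these two slides did not survive extraction. Under the assumption that the n machine hashes for a shard behave like independent uniform values on [0, 1], the standard results would be the following; this is a reconstruction from that assumption, not a copy of the slide.

```latex
% Assumption: h_1, ..., h_n are i.i.d. uniform on [0, 1], one per machine.
\[
  \mathbb{E}\bigl[\max(h_1, \dots, h_n)\bigr]
    = \int_0^1 x \cdot n x^{n-1} \, dx
    = \frac{n}{n+1}
\]
% A new machine takes the top spot when its hash exceeds the current maximum.
\[
  P(\text{new machine has the largest hash})
    = 1 - \frac{n}{n+1}
    = \frac{1}{n+1},
  \qquad
  P(\text{new machine is in the top two}) = \frac{2}{n+1}
\]
```

In other words, adding one machine to a cluster of n would be expected to move only about a 1/(n+1) share of each replica set, which is the minimum needed to give the new machine its fair share.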

Page 207: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Imhotep answers questions

What was the weekly average query time in the last quarter from people doing the query “software”?

Page 208: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

1. Query Regroup on query:software
2. Metric Regroup on time, width 7 days
3. Get Group Stats on query time and count, divide after summing
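The three steps can be mimicked on a handful of in-memory documents. This is a toy model, not the Imhotep API: the function names, the field names, the sample values, and the convention that group 0 means "filtered out" are assumptions made for illustration, and the metric regroup here assumes all surviving documents start in a single group.

```python
from collections import defaultdict

# Toy documents; field names and values are invented for illustration.
DOCS = [
    {"query": "software", "day": 0, "querytime_ms": 100},
    {"query": "software", "day": 3, "querytime_ms": 200},
    {"query": "nurse", "day": 4, "querytime_ms": 900},
    {"query": "software", "day": 8, "querytime_ms": 400},
    {"query": "software", "day": 13, "querytime_ms": 600},
]


def query_regroup(docs, groups, field, term):
    """Keep matching documents in their group; move the rest to group 0."""
    return [g if doc[field] == term else 0 for doc, g in zip(docs, groups)]


def metric_regroup(docs, groups, field, width):
    """Give each surviving document a group number based on field // width."""
    return [0 if g == 0 else 1 + doc[field] // width for doc, g in zip(docs, groups)]


def get_group_stats(docs, groups, metric):
    """Sum a metric per group, ignoring group 0."""
    totals = defaultdict(int)
    for doc, g in zip(docs, groups):
        if g != 0:
            totals[g] += metric(doc)
    return dict(totals)


groups = [1] * len(DOCS)                                   # everything starts in group 1
groups = query_regroup(DOCS, groups, "query", "software")  # step 1
groups = metric_regroup(DOCS, groups, "day", 7)            # step 2: one group per week
time_sum = get_group_stats(DOCS, groups, lambda d: d["querytime_ms"])  # step 3
counts = get_group_stats(DOCS, groups, lambda d: 1)
weekly_avg = {g: time_sum[g] / counts[g] for g in time_sum}  # divide after summing
print(weekly_avg)
```

Summing query time and count separately and dividing at the end matters in a distributed setting, because per-shard sums can be added together while per-shard averages cannot.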

Page 209: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Ramses

Page 210: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Imhotep answers questions

What percent of jobsearch results pages are for page 2 and beyond?

Page 211: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

1. Get Group Stats on count
2. Query Regroup on “-page:1”
3. Get Group Stats on count
4. Divide -page:1 count by total count

Page 212: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Ramses

Page 213: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Imhotep answers questions

What are the 5 most common queries in each country?

Page 214: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

1. Multiterm Regroup on all values of country
2. Term Group Stats Iteration on query
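The effect of these two steps can be illustrated with plain in-memory counting. This sketch stands in for the index iteration and is not how Imhotep executes it: the function name and the sample events are invented, and k is set to 2 so the toy data produces a visible cutoff.

```python
from collections import Counter, defaultdict

# Toy (country, query) events; values are invented for illustration.
EVENTS = [
    ("us", "software"), ("us", "nurse"), ("us", "software"),
    ("gb", "driver"), ("us", "software"), ("gb", "driver"),
    ("us", "nurse"), ("gb", "nurse"), ("us", "driver"),
]


def top_queries_by_country(events, k=5):
    # Step 1: one group per distinct country value (the multiterm regroup).
    per_country = defaultdict(Counter)
    # Step 2: accumulate a count for every (group, query term) pair.
    for country, query in events:
        per_country[country][query] += 1
    # Keep only the k most frequent queries within each group.
    return {c: counts.most_common(k) for c, counts in per_country.items()}


top = top_queries_by_country(EVENTS, k=2)
print(top)
```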

Page 215: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

IQL

select count()

from jobsearch

‘2014-01-01’

‘2014-03-26’

group by country, query[5]

Page 216: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

IQL

select count()

from jobsearch

‘2014-01-01’

‘2014-03-26’

group by country, query[5]

Metrics

Page 217: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

select count()

from jobsearch

‘2014-01-01’

‘2014-03-26’

group by country, query[5]

IQL

Dataset

Page 218: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

select count()

from jobsearch

‘2014-01-01’

‘2014-03-26’

group by country, query[5]

IQL

Regroup

Page 219: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

select count()

from jobsearch

‘2014-01-01’

‘2014-03-26’

group by country, query[5]

IQL

Term Group Stats

Page 220: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Imhotep

Large Scale Analytics and Machine Learning

Page 221: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Imhotep

Large Scale Analytics and Machine Learning

● Varint Decoding: High Performance Vector Instructions

● Stream Merging: Hash Partition + Interleave

● Shard Distribution: Rendezvous Hashing

Page 222: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

We’re Open Sourcing Imhotep

Page 223: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

How You Can Use Imhotep

Data Ingestion
● TSV Uploader
● Hadoop

Data Access
● Imhotep Primitives
● IQL

Page 224: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Next @IndeedEng Talk

Large Scale Interactive Analytics with Imhotep

Tom Bergman, Product Manager
Zak Cocos, Manager of Marketing Sciences

April 30, 2014

http://engineering.indeed.com/talks

Page 225: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Q&A

Page 226: @IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

More Questions?

David James
