Scalable Data Models with Elasticsearch Elasticsearch Meetup | Amsterdam | April 7, 2016 Maarten Roosendaal & Anne Veling

Page 1: Scalable Data Models with Elasticsearch

Scalable Data Models with Elasticsearch

Elasticsearch Meetup | Amsterdam | April 7, 2016
Maarten Roosendaal & Anne Veling

Page 2: Scalable Data Models with Elasticsearch

Introduction
• Anne Veling
  – Elasticsearch consultancy and custom training
  – Performance and Stability Troubleshooting
  – Software Architect, Team Lead

Page 3: Scalable Data Models with Elasticsearch

bol.com challenge
• Hierarchical data model, multiple levels
• High volume
  – searches
  – data changes
• Complex query requirements
  – Both Product and Offer fields in query
  – Facet on both levels

Page 4: Scalable Data Models with Elasticsearch

Products and Offers

faster indexing

faster searching

Page 6: Scalable Data Models with Elasticsearch

Test Data Creation
• Node.js script creating random data
  – Product
    • Title: two random nouns from noun list
    • Category: pick one out of 26 nouns
    • Half have no offer, half between 1-4
  – Offer
    • Random price between 1-20
    • Seller: pick one out of 10k
• Stream in memory, flush out to disk in 3 flavors
  – Each flavor keeping its own bulk size of 100k
  – For 1M, 10M and 100M products
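The generation rules above can be sketched in a few lines. This is a minimal Python illustration, not the Node.js script from the talk: the noun list, the function names and the seed are hypothetical, and the short noun list stands in for the real one (the slide mentions 26 category nouns).

```python
import random

# Hypothetical stand-ins: the talk's noun list and seller pool are not shown.
NOUNS = ["lunchroom", "representative", "crime", "garden", "window", "river"]
SELLER_COUNT = 10_000  # "pick one out of 10k"


def make_product(index, rng):
    """Build one product with a two-noun title, a category and 0-4 offers."""
    offers = []
    if rng.random() < 0.5:  # roughly half of the products get 1 to 4 offers
        for _ in range(rng.randint(1, 4)):
            offers.append({
                "seller": f"seller{rng.randint(1, SELLER_COUNT)}",
                "price": rng.randint(1, 20),  # random price between 1 and 20
            })
    return {
        "id": f"product{index}",
        "title": " ".join(rng.choice(NOUNS) for _ in range(2)),
        "category": rng.choice(NOUNS),
        "offers": offers,
    }


def generate(count, seed=42):
    """Yield products one at a time so large sets need not fit in memory."""
    rng = random.Random(seed)
    for i in range(count):
        yield make_product(i, rng)
```

Seeding the generator keeps the three flavors comparable, since each can be derived from the same product stream.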

Page 7: Scalable Data Models with Elasticsearch

Document
{
  "seller": "seller1203",
  "price": 7,
  "stock": 2,
  "deliveryCode": 1,
  "product": {
    "id": "product95826",
    "familyId": "family56744",
    "title": "lunchroom representative",
    "category": "crime"
  }
}

Page 8: Scalable Data Models with Elasticsearch

Nested

Page 9: Scalable Data Models with Elasticsearch

Nested
{
  "_id": "product95826",
  "familyId": "family56744",
  "title": "lunchroom representative",
  "category": "crime",
  "offers": [
    {
      "seller": "seller1203",
      "price": 7,
      "stock": 2,
      "deliveryCode": 1
    }
  ]
}

Page 10: Scalable Data Models with Elasticsearch

Parent/Child
{
  "_id": "product95826",
  "familyId": "family56744",
  "title": "lunchroom representative",
  "category": "crime"
}

{
  "_parent": "product95826",
  "seller": "seller1203",
  "price": 7,
  "stock": 2,
  "deliveryCode": 1
}
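To make the relation between the three flavors concrete, here is a small Python sketch that derives the Document and Parent/Child shapes from one Nested product. The function names are hypothetical. The shape of a Document-flavor entry for a product without offers is not shown on the slides, so the product-only document below is an assumption. As on the slides, `_id` and `_parent` are written inline, although Elasticsearch treats them as metadata rather than source fields.

```python
def split_product(product):
    """Separate a nested-flavor product into id, product fields and offers."""
    fields = {k: v for k, v in product.items() if k not in ("_id", "offers")}
    return product["_id"], fields, product.get("offers", [])


def to_document_flavor(product):
    """One document per offer, with the product fields repeated in each."""
    product_id, fields, offers = split_product(product)
    embedded = {"id": product_id, **fields}
    if not offers:
        # Assumption: an offerless product still yields one product-only doc.
        return [{"product": embedded}]
    return [{**offer, "product": embedded} for offer in offers]


def to_parent_child_flavor(product):
    """One parent document plus one child document per offer."""
    product_id, fields, offers = split_product(product)
    parent = {"_id": product_id, **fields}
    children = [{"_parent": product_id, **offer} for offer in offers]
    return parent, children
```

Applied to the Nested example above, `to_document_flavor` yields the Document example and `to_parent_child_flavor` yields the Parent/Child pair.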

Page 11: Scalable Data Models with Elasticsearch

Getting it there
• Zipped data files
  – 1M: 86 MB
  – 10M: 860 MB
  – 100M: 8.6 GB

Page 12: Scalable Data Models with Elasticsearch

Indexing?

Page 14: Scalable Data Models with Elasticsearch

Indexing
• 1M product set, local naive
  – 80s Document
  – 41s Nested
  – 64s Parent/Child
• ES index bottleneck:
  – Your source system and latency

It can slurp it up faster than you can serve it
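For reference, the Bulk API expects newline-delimited JSON: one action line, then one source line per document. The helper below is a minimal sketch of building such a payload in Python. The function name is hypothetical, and depending on the Elasticsearch version a `_type` may also have to be given in the action line or in the request URL.

```python
import json


def bulk_body(index_name, docs):
    """Serialize documents into a newline-delimited Bulk API payload."""
    lines = []
    for doc in docs:
        source = dict(doc)                # copy, so the caller's dict is untouched
        doc_id = source.pop("_id", None)  # the id travels in the action line
        action = {"index": {"_index": index_name}}
        if doc_id is not None:
            action["index"]["_id"] = doc_id
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    if not lines:
        return ""
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string can be sent as the body of a `POST /_bulk` request.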

Page 15: Scalable Data Models with Elasticsearch

Let’s take a break

Page 16: Scalable Data Models with Elasticsearch

Use Cases

Use Case A
• Product search: word in title
• Order by: relevance
• Display for top N products: product fields, cheapest offer fields
• Aggregate on:
  – Category
  – ∀ Offer SellerId
  – ∀ Offer Price
  – ∀ Offer DeliveryCode

Use Case B
• Product search: word in title, ∃ DeliveryCode = 0
• Order by: relevance
• Display for top N products: product fields, correct cheapest offer fields
• Aggregate on:
  – Category
  – ∀ Correct Offers SellerId
  – ∀ Correct Offers Price
  – ∀ Correct Offers DeliveryCode

Use Case C
• Product search: word in title, ∃ Price < P
• Order by: (lowest) price
• Display for top N products: product fields, cheapest offer fields
• Aggregate on:
  – Category
  – ∀ Correct Offers SellerId
  – ∀ Correct Offers Price
  – ∀ Correct Offers DeliveryCode

Legend: • Product • Offer
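As an illustration, the search part of Use Case B against the nested flavor could look roughly like the request body below, built as a Python dict. This is a sketch rather than the query used in the talk: the function name is hypothetical, and it assumes `offers` is mapped as a `nested` field with `deliveryCode` stored as a number.

```python
def use_case_b_query(word, delivery_code=0):
    """Products whose title matches `word` and that have at least one
    offer with the given delivery code (the "exists" condition)."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": word}}],
                "filter": [{
                    "nested": {
                        "path": "offers",
                        "query": {
                            "term": {"offers.deliveryCode": delivery_code}
                        },
                    }
                }],
            }
        }
    }
```

Keeping the offer condition in `filter` leaves relevance scoring to the title match, which fits the "order by relevance" requirement.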

Page 17: Scalable Data Models with Elasticsearch

Use Cases
D: query B, roll up by family
• Families (with products with offers)
  – with product.title:lunchroom
  – filter by product.offer.deliveryCode:tomorrow

Page 18: Scalable Data Models with Elasticsearch

Searching for a lunchroom

How hard can it be?

Page 19: Scalable Data Models with Elasticsearch

Let’s search

POST /boltest1m_doc/_search -> 3046
{
  "query": {
    "term": { "product.title": { "value": "lunchroom" } }
  }
}

POST /boltest1m_nested/_search -> 2026
{
  "query": {
    "term": { "title": { "value": "lunchroom" } }
  }
}

POST /boltest1m_parentchild/_search -> 2022
{
  "query": {
    "has_parent": {
      "parent_type": "product",
      "query": {
        "term": { "title": { "value": "lunchroom" } }
      }
    }
  }
}

Page 21: Scalable Data Models with Elasticsearch

Elasticsearch docs (and Lucene docs)

Product with    Doc    Nested    Parent/Child
no offer        1      1 (1)     1
1 offer         1      1 (2)     2
2 offers        2      1 (3)     3
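The counts in this table follow a simple pattern, written out below as a small Python function. The function and key names are hypothetical; the numbers are the ones in the table, with the Lucene document count for the nested flavor in its own key.

```python
def document_counts(offer_count):
    """Documents stored for one product with `offer_count` offers."""
    return {
        # One document per offer, but at least one for an offerless product.
        "doc": max(1, offer_count),
        # One Elasticsearch document for the whole product...
        "nested": 1,
        # ...backed by one Lucene document per offer plus one for the product.
        "nested_lucene": 1 + offer_count,
        # One parent document plus one child document per offer.
        "parent_child": 1 + offer_count,
    }
```

This is one reason the same term query returns different hit totals per flavor: each index counts a different kind of document.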

Page 22: Scalable Data Models with Elasticsearch

Real Queries
• Add Details, Sorting
• Product Facets
  – Category
• Offer Facets
  – Seller ID
  – Price Buckets
  – Delivery Code

Compare the numbers…
Explain the differences...
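For the nested flavor, the product and offer facets listed here could be requested along these lines. This is a sketch under assumptions, not the talk's actual query: the aggregation names, the function name and the price bucket boundaries are made up, and `offers` is assumed to be mapped as a `nested` field with non-analyzed `category` and `seller` values.

```python
def facet_aggregations(price_ranges=None):
    """Category facet on the product level, offer facets on the nested level."""
    if price_ranges is None:
        # Hypothetical price buckets; the talk does not list the real ones.
        price_ranges = [{"to": 5}, {"from": 5, "to": 10}, {"from": 10}]
    return {
        "aggs": {
            "category": {"terms": {"field": "category"}},
            "offers": {
                "nested": {"path": "offers"},
                "aggs": {
                    "seller": {"terms": {"field": "offers.seller"}},
                    "price": {
                        "range": {
                            "field": "offers.price",
                            "ranges": price_ranges,
                        }
                    },
                    "delivery_code": {
                        "terms": {"field": "offers.deliveryCode"}
                    },
                },
            },
        }
    }
```

The returned dict can be merged into a search body next to the `query` section.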

Page 23: Scalable Data Models with Elasticsearch

A: Doc

Page 24: Scalable Data Models with Elasticsearch

A: Nested

Page 25: Scalable Data Models with Elasticsearch

A: Parent/Child

Page 27: Scalable Data Models with Elasticsearch

Query Tips
• Use aggregations
  – Cardinality
  – top_hits ♥ (with top_score)
• Smart Grouping & Field Collapsing
• Slooooow 😢
  – inner_hits
• Don’t forget post-filtering or result page lookup
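A rough idea of how the first tip applies to the Document flavor, where every hit is an offer: a `cardinality` aggregation counts distinct products, and a `terms` aggregation with a `top_hits` sub-aggregation picks the cheapest offer per product. The sketch below only builds the request body. The aggregation names and the function name are hypothetical, `product.id` is assumed to be a non-analyzed field, and ordering the buckets by top score is not covered.

```python
def grouped_by_product(page_size=10):
    """Request body that rolls offer documents up to product level."""
    return {
        "size": 0,  # only the aggregations are of interest
        "aggs": {
            # Approximate number of distinct products among the matching offers.
            "product_count": {"cardinality": {"field": "product.id"}},
            # One bucket per product, each carrying its cheapest offer.
            "products": {
                "terms": {"field": "product.id", "size": page_size},
                "aggs": {
                    "cheapest_offer": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"price": {"order": "asc"}}],
                        }
                    }
                },
            },
        },
    }
```

Note that `cardinality` returns an approximate count, which is usually acceptable for a "N products found" display.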

Page 28: Scalable Data Models with Elasticsearch

Ice Cream Bounty

for making top_hits aggregation fast

Page 29: Scalable Data Models with Elasticsearch

Testing

Page 30: Scalable Data Models with Elasticsearch

Results

[Chart: "1m tun 30102015 32 GB new queries"; x-axis: use cases a, b, c, d; y-axis: 0 to 200; series: doc, nested, parentchild]

[Chart: "10m tun 30102015 32 GB new queries"; x-axis: use cases a, b, c, d; y-axis: 0 to 3500; series: doc, nested, parentchild]

Page 31: Scalable Data Models with Elasticsearch

Conclusions
• Parent/Child has limitations
  – Combining cross-level queries with aggregations in one go
• Doc not as fast as we’d expected
  – Because we needed top_hits aggregation
• Elasticsearch scales predictably

Page 32: Scalable Data Models with Elasticsearch

Conclusions
• For us, nested was the best solution
• What is yours?
• What are you searching for?
  – What are the rows?
  – What are the facets about?

Page 33: Scalable Data Models with Elasticsearch

Lessons Learned
• Testing the scalability of your data model
  – Fast iterations early on
  – Valuable insight in indexing and search requirements
• Data Modeling is hard
  – Do it early
  – Make it fun

Page 34: Scalable Data Models with Elasticsearch

Tech Lessons Learned
• Don’t forget to tune the ES cluster
  – Configure memory ;)
• If the last line of a bulk file has no \n, it gets ignored!
  – Count the differences
• 100k bulk files with .000 suffixes ought to be enough for everyone, right?
• Do not underestimate Sneakernet
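The missing-newline lesson is easy to guard against before uploading. Below is a minimal Python sketch with hypothetical helper names; it assumes an index-only bulk payload in which every document takes exactly two lines (action and source).

```python
def ensure_trailing_newline(payload):
    """Append the final newline the Bulk API needs, if it is missing."""
    return payload if payload.endswith("\n") else payload + "\n"


def expected_document_count(payload):
    """Number of action/source pairs in an index-only bulk payload."""
    lines = [line for line in payload.split("\n") if line.strip()]
    return len(lines) // 2
```

Comparing `expected_document_count` with the document count of the index after loading is one way to "count the differences".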

Page 35: Scalable Data Models with Elasticsearch

Thank You

@anneveling [email protected]