Search at SoundCloud Tomasz Elendt Don’t be Lost in Music

Upload: tomasz-elendt

Post on 11-Jan-2017

TRANSCRIPT

Page 1: Search at SoundCloud

Search at SoundCloud

Tomasz Elendt

Don’t be Lost in Music

Page 2: Search at SoundCloud

SOUNDCLOUD

Page 3: Search at SoundCloud

Eng Manager Designer PM Analyst

Max Eileen Jörn Warren

Janette Tomasz Chris Denis

Engineer

Search Team

Engineer Engineer Analyst

Data Scientist

Özgür

? You!

Page 4: Search at SoundCloud

If it exists on SoundCloud, it can be found through Search.

Team Mission

Page 5: Search at SoundCloud

A bit of history

Page 6: Search at SoundCloud

● October 2008 - launch of soundcloud.com

● 2010 ??? - add search

● 2012 - new search architecture

Page 8: Search at SoundCloud

https://www.thoughtworks.com/de/insights/blog/bff-soundcloud

Microservices

Page 9: Search at SoundCloud

Read path

[Diagram. Labels: api-web, api-mobile (owned); api-partners, public-api (public); dispatcher; clusters own1, own2, pub1, pub2]
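
The read-path diagram suggests a dispatcher that sends owned clients to the own1/own2 clusters and public clients to pub1/pub2. A minimal sketch of that routing idea follows. The grouping of clients, the round-robin choice between the two clusters of a group, and the names `Dispatcher`, `CLUSTERS`, and `CLIENT_GROUP` are all assumptions for illustration; the slides do not say how the real dispatcher picks a cluster.

```python
from itertools import cycle

# Assumed grouping, read off the diagram labels (not confirmed by the slides).
CLUSTERS = {"owned": ["own1", "own2"], "public": ["pub1", "pub2"]}
CLIENT_GROUP = {
    "api-web": "owned",
    "api-mobile": "owned",
    "api-partners": "public",
    "public-api": "public",
}


class Dispatcher:
    """Routes a client to one of the clusters reserved for its traffic group."""

    def __init__(self):
        # One independent round-robin rotation per traffic group (hypothetical policy).
        self._rotation = {group: cycle(names) for group, names in CLUSTERS.items()}

    def route(self, client):
        group = CLIENT_GROUP[client]  # raises KeyError for unknown clients
        return next(self._rotation[group])


dispatcher = Dispatcher()
for client in ["api-web", "api-mobile", "public-api"]:
    print(client, "->", dispatcher.route(client))
```

Keeping the two groups on separate rotations means a spike in public traffic never consumes capacity on the owned clusters, which appears to be the point of splitting them.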

Page 10: Search at SoundCloud

Write path

[Diagram. Labels: DB slave, microservices, bulk feeder, stream feeder, query feeder, indexer, bulk API, poke API; clusters own1, own2, pub1, pub2]

Page 11: Search at SoundCloud

Scoring

SoundCloud activity graph

● user uploaded track

● user created playlist

● user liked track

● user liked playlist

● user reposted track

● user reposted playlist

● user follows user

● track is part of a playlist

● no plays

Combine query similarity score with document’s static DiscoRank score
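
The slide says the query similarity score is combined with a static DiscoRank score but does not give the formula. The sketch below shows one common way to blend a query-dependent score with a query-independent one. The multiplicative form, the `log1p` dampening, the `weight` parameter, and the function name `combined_score` are assumptions, not the Scorer plugin's actual logic.

```python
import math


def combined_score(similarity, discorank, weight=0.5):
    """Blend a query-dependent similarity with a static popularity score.

    Hypothetical formula: the static score is log-dampened so that very
    popular documents cannot completely drown out textual relevance.
    """
    if similarity < 0 or discorank < 0:
        raise ValueError("scores must be non-negative")
    return similarity * (1.0 + weight * math.log1p(discorank))


# Illustrative, made-up documents: (id, similarity, static score)
docs = [("obscure-exact-match", 1.2, 0.0), ("popular-near-match", 1.0, 100.0)]
ranked = sorted(docs, key=lambda d: combined_score(d[1], d[2]), reverse=True)
print([doc_id for doc_id, _, _ in ranked])
```

With a static score of zero the result is just the similarity, so `weight` controls how much popularity can reorder results.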

Page 12: Search at SoundCloud

Scoring

[Diagram. Labels: HDFS, Discovery ranking job(s), HDFS, WebHDFS, Elasticsearch, Scorer plugin; arrows: serialized events, event rollups, update ready!]

Page 14: Search at SoundCloud

Let’s talk about numbers

Page 15: Search at SoundCloud

Traffic growth

>3,000 QPS at peak

Page 16: Search at SoundCloud

Catalog size

https://www.elastic.co/use-cases/soundcloud

Page 18: Search at SoundCloud

Catalog size

>500M documents indexed (total)

Spread over 5 shards

4 clusters: two for owned clients and two for public traffic

>70 nodes (total)

2 shards and ~50GB per node
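
The figures on this slide can be cross-checked with some quick arithmetic. The sketch below treats ">70" and "~50GB" as exact values, so the results are rough lower bounds, and it assumes the four clusters are equally sized, which the slides do not state.

```python
# Figures taken from the slide; ">70" and "~50GB" are treated as exact here.
NODES = 70
GB_PER_NODE = 50
SHARDS_PER_NODE = 2
CLUSTERS = 4

total_gb = NODES * GB_PER_NODE                 # index data across all clusters
shard_copies = NODES * SHARDS_PER_NODE         # shard copies (primaries + replicas)
avg_shard_gb = GB_PER_NODE / SHARDS_PER_NODE   # average size of one shard copy
nodes_per_cluster = NODES / CLUSTERS           # assumes equally sized clusters

print(f"total on disk: ~{total_gb / 1000:.1f} TB")
print(f"shard copies: {shard_copies}, ~{avg_shard_gb:.0f} GB each")
print(f"nodes per cluster: ~{nodes_per_cluster:.1f}")
```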

Page 19: Search at SoundCloud

Scaling & optimizations

Page 20: Search at SoundCloud

Vertical scaling: scaling up, bigger nodes

Horizontal scaling: scaling out, more nodes

Page 21: Search at SoundCloud

Vertical scaling: scaling up, bigger nodes

● Before you allocate more shards (or merge them into bigger ones), check that there’s enough memory for OS cache.

● Don’t cross 32GB heap.

Horizontal scaling: scaling out, more nodes

● More things to run == more things to fail

● Makes maintenance longer/harder (e.g. longer replication)

● You may mix both strategies
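
The two vertical-scaling checks (leave memory for the OS cache, keep the heap under 32GB) can be sketched as a small planning helper. The 32GB limit matters because a JVM heap below roughly that size can use compressed object pointers. The "half of RAM for heap" default, the 31GB ceiling, and the function name `plan_node_memory` are assumptions based on common Elasticsearch guidance, not on the slides.

```python
HEAP_CEILING_GB = 31  # stay safely below 32GB so the JVM keeps compressed pointers


def plan_node_memory(ram_gb, index_gb_on_node, heap_fraction=0.5):
    """Split a node's RAM into JVM heap and OS page cache (hypothetical helper).

    Returns the planned heap size, what is left for the OS cache, and whether
    the node's index data would fit entirely in that cache.
    """
    heap_gb = min(ram_gb * heap_fraction, HEAP_CEILING_GB)
    os_cache_gb = ram_gb - heap_gb
    return {
        "heap_gb": heap_gb,
        "os_cache_gb": os_cache_gb,
        "index_fits_in_cache": os_cache_gb >= index_gb_on_node,
    }


# A node holding ~50GB of index data, on two hypothetical machine sizes.
for ram in (64, 128):
    print(ram, plan_node_memory(ram_gb=ram, index_gb_on_node=50))
```

If the index does not fit in the remaining cache, adding shards to the node is likely to push reads to disk, which is the situation the first bullet warns about.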

Page 22: Search at SoundCloud

Separate clusters story

Page 23: Search at SoundCloud

Premium Content

{
  "bool": {
    "should": [
      {
        "nested": {
          "path": "policies",
          "filter": {
            "bool": {
              "must": [
                { "term": { "geo_country_code": "<COUNTRY_CODE>" } },
                { "term": { "monetization_model": "SUB_HIGH_TIER" } }
              ],
              "must_not": { "term": { "policy": "BLOCK" } }
            }
          }
        }
      },
      {
        "bool": {
          "must": {
            "nested": {
              "path": "policies",
              "filter": {
                "bool": {
                  "must": [
                    { "term": { "geo_country_code": "--" } },
                    { "term": { "monetization_model": "SUB_HIGH_TIER" } }
                  ],
                  "must_not": { "term": { "policy": "BLOCK" } }
                }
              }
            }
          },
          "must_not": {
            "nested": {
              "path": "policies",
              "filter": { "term": { "geo_country_code": "<COUNTRY_CODE>" } }
            }
          }
        }
      }
    ],
    "_cache": true,
    "_cache_key": "<COUNTRY_CODE>:SUB_HIGH_TIER"
  }
}

It’s complicated!
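
The filter on this slide repeats the same nested clause twice, once for the user's country and once for the "--" code. A small builder makes that structure easier to see. The reading that "--" is a default policy applied only when no country-specific policy exists is an interpretation of the query, and the names `premium_content_filter` and `allowed_in` are hypothetical.

```python
import json


def premium_content_filter(country_code, tier="SUB_HIGH_TIER"):
    """Build the premium-content filter from the slide for one country and tier."""

    def allowed_in(code):
        # A policy for `code` and `tier` exists and is not a BLOCK policy.
        return {
            "nested": {
                "path": "policies",
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"geo_country_code": code}},
                            {"term": {"monetization_model": tier}},
                        ],
                        "must_not": {"term": {"policy": "BLOCK"}},
                    }
                },
            }
        }

    # Any policy at all that mentions this country.
    has_country_policy = {
        "nested": {
            "path": "policies",
            "filter": {"term": {"geo_country_code": country_code}},
        }
    }

    return {
        "bool": {
            "should": [
                allowed_in(country_code),
                # Fall back to "--" only when the country has no policy of its own.
                {"bool": {"must": allowed_in("--"), "must_not": has_country_policy}},
            ],
            "_cache": True,
            "_cache_key": f"{country_code}:{tier}",
        }
    }


print(json.dumps(premium_content_filter("DE"), indent=2))
```

Because the cache key is derived from the country and tier, every distinct pair produces its own cached filter, which is one way such a query can put pressure on the filter cache.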

Page 24: Search at SoundCloud

Memory constraints

Scorer model

Field data

Filter cache

Scorer model (in filesystem)

Field data

Filter cache

31GB

18GB

Page 25: Search at SoundCloud

Monitoring and Alerting

Page 26: Search at SoundCloud

Prometheus

Page 27: Search at SoundCloud

ALERT SearchHighResponseTime_Owned
  IF histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{ job = "search-dispatcher-owned" }[1m])) by (job, le)) > 0.5
  FOR 5m
  LABELS {
    service = "search-dispatcher",
    severity = "critical",
  }
  ANNOTATIONS {
    summary = "Slow search {{$labels.job}}",
    description = "95th percentile of response time of {{$labels.job}} is {{$value}}s. We expect it to be below 500ms.",
    runbook = "http://<redacted>/runbooks/search/#searchhighresponsetime",
  }

Alerting rules
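
The alerting rule relies on `histogram_quantile`, which estimates a percentile from cumulative histogram buckets. The sketch below is a simplified re-implementation of that estimate (linear interpolation inside the bucket that contains the target rank), written only to show what the 0.95 in the rule means. It is not Prometheus code, and the bucket boundaries and counts are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) buckets.

    Simplified sketch: assumes 0 < q <= 1, that the lowest bucket starts
    at 0, and that the last bucket has an upper bound of +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the open-ended bucket: report the
                # highest finite bound instead of interpolating to infinity.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Made-up latency buckets in seconds: (le, cumulative request count)
buckets = [(0.1, 50), (0.25, 80), (0.5, 94), (1.0, 99), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)
print(f"p95 = {p95:.2f}s, alert condition met: {p95 > 0.5}")
```

The estimate is only as fine-grained as the bucket boundaries, so it helps to have a bucket edge at or near the alert threshold (0.5s here).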

Page 28: Search at SoundCloud

KPIs and A/B testing

Page 29: Search at SoundCloud

A/B testing

Page 30: Search at SoundCloud

Engagement Rate - % of searches that produce an engagement or a listen
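
As a concrete reading of this definition, the sketch below computes engagement rate next to click-through rate from a list of search events. The `SearchEvent` fields and the idea that a like, repost, or follow counts as an "engagement" are assumptions for illustration; the slides do not describe the event schema.

```python
from dataclasses import dataclass


@dataclass
class SearchEvent:
    """One search and what the user did with its results (hypothetical schema)."""
    query: str
    clicked: bool = False
    engaged: bool = False   # e.g. like, repost, follow
    listened: bool = False


def engagement_rate(searches):
    """Percentage of searches that produced an engagement or a listen."""
    if not searches:
        return 0.0
    hits = sum(1 for s in searches if s.engaged or s.listened)
    return 100.0 * hits / len(searches)


def click_through_rate(searches):
    """Percentage of searches with at least one click on a result."""
    if not searches:
        return 0.0
    return 100.0 * sum(1 for s in searches if s.clicked) / len(searches)


events = [
    SearchEvent("daft punk", clicked=True, listened=True),
    SearchEvent("daft pnuk", clicked=True),
    SearchEvent("asdf"),
    SearchEvent("boiler room", clicked=True, engaged=True),
]
print(f"CTR: {click_through_rate(events):.0f}%")
print(f"Engagement rate: {engagement_rate(events):.0f}%")
```

The second event shows why the two metrics differ: a click that leads nowhere counts toward click-through rate but not toward engagement rate.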

Page 31: Search at SoundCloud

Engagement rate

[Chart: click-through rate and engagement rate]

Page 32: Search at SoundCloud

Other KPIs

● Click-through rate

● Volume of Search Usage

● Clicks at Position One

● Listening Time

● Time to respond

● Time to First Click

● Queries Before First Click in Session

Page 33: Search at SoundCloud

A/B tests

Sample questions:

● Does increasing the minimum number of matching tokens (minimum_should_match) increase engagement?

● How about boosting exact matches?

● Should we increase or decrease the importance of DiscoRank?
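
Questions like these come down to comparing a rate (for example engagement rate) between a control and a variant. One standard way to judge whether a difference is more than noise is a two-proportion z-test, sketched below. The choice of test, the function name, and the sample numbers are assumptions; the slides do not say which statistics the team uses.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value); z > 0 means variant B has the higher rate.
    """
    pooled = (success_a + success_b) / (total_a + total_b)
    if pooled in (0.0, 1.0):
        raise ValueError("no variation in outcomes, test is undefined")
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (success_b / total_b - success_a / total_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value


# Made-up example: engaged searches out of all searches per bucket.
z, p = two_proportion_z_test(success_a=4000, total_a=10000,
                             success_b=4200, total_b=10000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value only says the difference is unlikely to be chance; whether a change such as a higher minimum_should_match is worth shipping still depends on the other KPIs.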

Page 34: Search at SoundCloud

Thank you