TRANSCRIPT
Search at SoundCloud
Tomasz Elendt
Don’t be Lost in Music
SOUNDCLOUD
Search Team
● Max (Eng Manager), Eileen (Designer), Jörn (PM), Warren (Analyst)
● Engineers: Janette, Tomasz, Chris, Denis
● Data Scientist: Özgür
● You? (open position)
Team Mission
If it exists on SoundCloud, it can be found through Search.
A bit of history
● October 2008 - launch of soundcloud.com
● 2010 ??? - add search
● 2012 - new search architecture
New Search Architecture
https://youtu.be/qI584upmYTY
https://developers.soundcloud.com/blog/architecture-behind-our-new-search-and-explore-experience
https://www.thoughtworks.com/de/insights/blog/bff-soundcloud
Microservices
Read path
(diagram: API clients route through a dispatcher to four Elasticsearch clusters)
● Clients: api-web, api-mobile, api-partners, public-api
● The dispatcher routes each request to one of the clusters
● Clusters: own1, own2 (owned traffic), pub1, pub2 (public traffic)
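A minimal sketch of the dispatch logic implied by the diagram above. The cluster and client names come from the slide; the client-to-pool mapping and the random balancing are assumptions for illustration, not SoundCloud's actual code:

```python
import random

# Clusters from the slide: own1/own2 serve owned (logged-in) traffic,
# pub1/pub2 serve public traffic.
CLUSTERS = {
    "owned": ["own1", "own2"],
    "public": ["pub1", "pub2"],
}

def pick_cluster(client: str) -> str:
    """Route a request from an API client to one of the dedicated clusters."""
    # Assumption: api-web and api-mobile carry SoundCloud's own (owned)
    # traffic; api-partners and public-api are treated as public.
    pool = "owned" if client in ("api-web", "api-mobile") else "public"
    return random.choice(CLUSTERS[pool])
```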
Write path
(diagram: data flows from sources through feeders into the indexer and on to the clusters own1, own2, pub1, pub2)
● Sources: DB slave, microservices
● Feeders: bulk feeder, stream feeder, query feeder
● Indexer APIs: bulk API, poke API
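A toy sketch of the two indexer entry points named on the slide. The component names (bulk API, poke API) are from the slide; the in-memory "index" and the exact semantics are illustrative assumptions:

```python
from typing import Callable, Iterable

class Indexer:
    """Stand-in for the indexer that feeds the Elasticsearch clusters."""

    def __init__(self) -> None:
        self.index: dict[int, dict] = {}  # doc_id -> document

    def bulk(self, docs: Iterable[dict]) -> None:
        # Bulk API: used by the bulk feeder when replaying the full
        # catalog, e.g. from a DB slave.
        for doc in docs:
            self.index[doc["id"]] = doc

    def poke(self, doc_id: int, fetch: Callable[[int], dict]) -> None:
        # Poke API (assumed semantics): a feeder only says *what* changed;
        # the indexer pulls the fresh document from the owning microservice.
        self.index[doc_id] = fetch(doc_id)
```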
Scoring
SoundCloud activity graph
● user uploaded track
● user created playlist
● user liked track
● user liked playlist
● user reposted track
● user reposted playlist
● user follows user
● track is part of a playlist
● no plays
Combine query similarity score with document’s static DiscoRank score
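The slides don't say how the two scores are combined; one common approach is a multiplicative blend with a tunable exponent on the static score. This is illustrative only, not SoundCloud's actual formula:

```python
import math

def final_score(similarity: float, discorank: float, weight: float = 1.0) -> float:
    """Blend a per-query similarity score with a static DiscoRank score.

    `weight` controls how much the static score influences the result
    (0.0 ignores DiscoRank entirely; 1.0 is a plain product).
    """
    return similarity * math.pow(discorank, weight)
```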
Scoring
HDFSDiscovery
ranking job(s) HDFS
Elasticsearch
Scorer plugin
WebHDFS
serialized eventsevent rollups
update ready!
DiscoRank
http://www.slideshare.net/utstikkar/discorank-optimizing-discoverability-on-soundcloud
https://youtu.be/AHUUppXcP_g
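The linked talk describes DiscoRank as a PageRank-style score computed over the activity graph above. A minimal power-iteration sketch under that assumption (toy graph encoding and damping factor are illustrative, not the production algorithm):

```python
def discorank(edges: list[tuple[int, int]], n: int,
              damping: float = 0.85, iters: int = 50) -> list[float]:
    """Power iteration for a PageRank-style score over an activity graph.

    edges: (src, dst) activity edges, e.g. user -> track for a like.
    n:     number of nodes. Returns scores summing to ~1.
    """
    out_deg = [0] * n
    for s, _ in edges:
        out_deg[s] += 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        # teleport mass, shared uniformly
        new = [(1.0 - damping) / n] * n
        for s, d in edges:
            new[d] += damping * rank[s] / out_deg[s]
        # nodes with no outgoing edges redistribute their mass uniformly
        dangling = damping * sum(rank[i] for i in range(n) if out_deg[i] == 0)
        rank = [r + dangling / n for r in new]
    return rank
```

Nodes that receive many activity edges (tracks that are liked, reposted, playlisted) end up with a higher static score.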
Let’s talk about numbers
Traffic growth
>3,000 QPS at peak
Catalog size
https://www.elastic.co/use-cases/soundcloud
● >500M documents indexed (total)
● Spread over 5 shards, 4 clusters (two for owned clients and two for public traffic)
● >70 nodes (total), 2 shards and ~50GB per node
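The figures above can be cross-checked with back-of-envelope arithmetic (all inputs are from the slide; the derived totals are rough averages, not official numbers):

```python
nodes = 70
shards_per_node = 2
gb_per_node = 50
docs = 500_000_000

total_shard_copies = nodes * shards_per_node  # 140 shard copies across clusters
total_storage_gb = nodes * gb_per_node        # 3,500 GB, i.e. ~3.5 TB of index data
docs_per_node = docs // nodes                 # roughly 7M documents per node
```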
Scaling & optimizations
Vertical scaling: scaling up, bigger nodes
Horizontal scaling: scaling out, more nodes
● Before you allocate more shards (or merge them into bigger ones), check that there’s enough memory for the OS cache.
● Don’t cross the 32GB heap boundary (above it the JVM loses compressed object pointers).
● More things to run == more things to fail
● Makes maintenance longer/harder (e.g. longer replication)
● You may mix both strategies
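The first two bullets can be folded into a quick sanity check before resizing nodes. The "leave about half of RAM to the page cache" heuristic is a common rule of thumb assumed here, not a number from the talk:

```python
def heap_ok(ram_gb: float, heap_gb: float) -> bool:
    """Rule-of-thumb check for an Elasticsearch node's heap size:
    stay under the ~32GB compressed-oops boundary, and leave at
    least half of RAM to the OS page cache for Lucene segment files."""
    return heap_gb < 32 and heap_gb <= ram_gb / 2
```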
Separate clusters story
Premium Content
{
  "bool": {
    "should": [
      {
        "nested": {
          "path": "policies",
          "filter": {
            "bool": {
              "must": [
                { "term": { "geo_country_code": "<COUNTRY_CODE>" } },
                { "term": { "monetization_model": "SUB_HIGH_TIER" } }
              ],
              "must_not": { "term": { "policy": "BLOCK" } }
            }
          }
        }
      },
      {
        "bool": {
          "must": {
            "nested": {
              "path": "policies",
              "filter": {
                "bool": {
                  "must": [
                    { "term": { "geo_country_code": "--" } },
                    { "term": { "monetization_model": "SUB_HIGH_TIER" } }
                  ],
                  "must_not": { "term": { "policy": "BLOCK" } }
                }
              }
            }
          },
          "must_not": {
            "nested": {
              "path": "policies",
              "filter": { "term": { "geo_country_code": "<COUNTRY_CODE>" } }
            }
          }
        }
      }
    ],
    "_cache": true,
    "_cache_key": "<COUNTRY_CODE>:SUB_HIGH_TIER"
  }
}
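Since the filter is templated per country, a small generator helper makes the structure easier to see: either a country-specific policy matches, or the worldwide ("--") policy matches and no country override exists. The explicit cache key lets all requests from the same country reuse one cached filter. This helper is an illustrative sketch, not SoundCloud's code:

```python
def premium_filter(country_code: str, tier: str = "SUB_HIGH_TIER") -> dict:
    """Build the nested policies filter for one country, with a stable cache key."""

    def policy_match(geo: str) -> dict:
        # Policy for one geo that monetizes at the given tier and isn't blocked.
        return {
            "nested": {
                "path": "policies",
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"geo_country_code": geo}},
                            {"term": {"monetization_model": tier}},
                        ],
                        "must_not": {"term": {"policy": "BLOCK"}},
                    }
                },
            }
        }

    return {
        "bool": {
            "should": [
                policy_match(country_code),  # country-specific policy
                {   # worldwide policy, unless a country override exists
                    "bool": {
                        "must": policy_match("--"),
                        "must_not": {
                            "nested": {
                                "path": "policies",
                                "filter": {"term": {"geo_country_code": country_code}},
                            }
                        },
                    }
                },
            ],
            "_cache": True,
            "_cache_key": f"{country_code}:{tier}",
        }
    }
```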
It’s complicated!
Memory constraints
Before (31GB heap): scorer model, field data, and filter cache all compete for heap space
After (18GB heap): field data and filter cache on the heap; scorer model moved to the filesystem
Monitoring and Alerting
Prometheus
ALERT SearchHighResponseTime_Owned
  IF histogram_quantile(0.95,
       sum(rate(http_request_duration_seconds_bucket{ job = "search-dispatcher-owned" }[1m])) by (job, le)
     ) > 0.5
  FOR 5m
  LABELS {
    service = "search-dispatcher",
    severity = "critical",
  }
  ANNOTATIONS {
    summary = "Slow search {{$labels.job}}",
    description = "95th percentile of response time of {{$labels.job}} is {{$value}}s. We expect it to be below 500ms.",
    runbook = "http://<redacted>/runbooks/search/#searchhighresponsetime",
  }
Alerting rules
KPIs and A/B testing
A/B testing
Engagement Rate - % of searches that produce an engagement or a listen
(chart: click-through rate vs. engagement rate over time)

Other KPIs
● Volume of Search Usage
● Clicks at Position One
● Listening Time
● Time to respond
● Time to First Click
● Queries Before First Click in Session
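A toy illustration of how the engagement-rate KPI defined above could be computed from search logs. The log schema (`engaged` flag per search) is a made-up assumption for the sketch:

```python
def engagement_rate(searches: list[dict]) -> float:
    """% of searches that produced an engagement or a listen.

    searches: records like {"query": "...", "engaged": bool}
    (hypothetical schema, for illustration only).
    """
    if not searches:
        return 0.0
    engaged = sum(1 for s in searches if s["engaged"])
    return 100.0 * engaged / len(searches)
```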
A/B tests
Sample questions:
● Does increasing the minimum number of matching tokens
(minimum_should_match) increase engagement?
● How about boosting exact matches?
● Should we increase or decrease the importance of DiscoRank?
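Each such question boils down to comparing engagement rate between a control and a treatment bucket. A standard way to judge the difference, shown here as a sketch with made-up numbers (the talk doesn't specify the statistical method), is a two-proportion z-test:

```python
import math

def z_score(engaged_a: int, total_a: int, engaged_b: int, total_b: int) -> float:
    """Two-proportion z-test: did treatment (b) move engagement vs. control (a)?"""
    p_a, p_b = engaged_a / total_a, engaged_b / total_b
    p = (engaged_a + engaged_b) / (total_a + total_b)  # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / total_a + 1 / total_b))
    return (p_b - p_a) / se

# |z| > 1.96 corresponds to p < 0.05 (two-sided).
```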
Thank you