TRANSCRIPT
Search at SoundCloud
Tomasz Elendt
Don’t be Lost in Music
SOUNDCLOUD
Search Team
● Max (Eng Manager), Eileen (Designer), Jörn (PM), Warren (Analyst)
● Engineers: Janette, Tomasz, Chris, Denis
● Data Scientist: Özgür
● You? (open position)
Team Mission
If it exists on SoundCloud, it can be found through Search.
A bit of history
● October 2008 - launch of soundcloud.com
● 2010 ??? - add search
● 2012 - new search architecture
New Search Architecture
https://youtu.be/qI584upmYTY
https://developers.soundcloud.com/blog/architecture-behind-our-new-search-and-explore-experience
https://www.thoughtworks.com/de/insights/blog/bff-soundcloud
Microservices
Read path
(diagram: API clients route through a dispatcher to four Elasticsearch clusters)
● Clients: api-web, api-mobile, api-partners, public-api
● The dispatcher routes each request to one of the clusters
● Clusters: own1, own2 (owned traffic), pub1, pub2 (public traffic)
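A minimal sketch of the dispatch logic implied by the diagram above. The cluster and client names come from the slide; the client-to-pool mapping and the random balancing are assumptions for illustration, not SoundCloud's actual code:

```python
import random

# Clusters from the slide: own1/own2 serve owned (logged-in) traffic,
# pub1/pub2 serve public traffic.
CLUSTERS = {
    "owned": ["own1", "own2"],
    "public": ["pub1", "pub2"],
}

def pick_cluster(client: str) -> str:
    """Route a request from an API client to one of the dedicated clusters."""
    # Assumption: api-web and api-mobile carry SoundCloud's own (owned)
    # traffic; api-partners and public-api are treated as public.
    pool = "owned" if client in ("api-web", "api-mobile") else "public"
    return random.choice(CLUSTERS[pool])
```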
Write path
(diagram: data flows from sources through feeders into the indexer and on to the clusters own1, own2, pub1, pub2)
● Sources: DB slave, microservices
● Feeders: bulk feeder, stream feeder, query feeder
● Indexer APIs: bulk API, poke API
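A toy sketch of the two indexer entry points named on the slide. The component names (bulk API, poke API) are from the slide; the in-memory "index" and the exact semantics are illustrative assumptions:

```python
from typing import Callable, Iterable

class Indexer:
    """Stand-in for the indexer that feeds the Elasticsearch clusters."""

    def __init__(self) -> None:
        self.index: dict[int, dict] = {}  # doc_id -> document

    def bulk(self, docs: Iterable[dict]) -> None:
        # Bulk API: used by the bulk feeder when replaying the full
        # catalog, e.g. from a DB slave.
        for doc in docs:
            self.index[doc["id"]] = doc

    def poke(self, doc_id: int, fetch: Callable[[int], dict]) -> None:
        # Poke API (assumed semantics): a feeder only says *what* changed;
        # the indexer pulls the fresh document from the owning microservice.
        self.index[doc_id] = fetch(doc_id)
```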
Scoring
SoundCloud activity graph
● user uploaded track
● user created playlist
● user liked track
● user liked playlist
● user reposted track
● user reposted playlist
● user follows user
● track is part of a playlist
● no plays
Combine query similarity score with document’s static DiscoRank score
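The slides don't say how the two scores are combined; one common approach is a multiplicative blend with a tunable exponent on the static score. This is illustrative only, not SoundCloud's actual formula:

```python
import math

def final_score(similarity: float, discorank: float, weight: float = 1.0) -> float:
    """Blend a per-query similarity score with a static DiscoRank score.

    `weight` controls how much the static score influences the result
    (0.0 ignores DiscoRank entirely; 1.0 is a plain product).
    """
    return similarity * math.pow(discorank, weight)
```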
Scoring
HDFSDiscovery
ranking job(s) HDFS
Elasticsearch
Scorer plugin
WebHDFS
serialized eventsevent rollups
update ready!
DiscoRank
http://www.slideshare.net/utstikkar/discorank-optimizing-discoverability-on-soundcloud
https://youtu.be/AHUUppXcP_g
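The linked talk describes DiscoRank as a PageRank-style score computed over the activity graph above. A minimal power-iteration sketch under that assumption (toy graph encoding and damping factor are illustrative, not the production algorithm):

```python
def discorank(edges: list[tuple[int, int]], n: int,
              damping: float = 0.85, iters: int = 50) -> list[float]:
    """Power iteration for a PageRank-style score over an activity graph.

    edges: (src, dst) activity edges, e.g. user -> track for a like.
    n:     number of nodes. Returns scores summing to ~1.
    """
    out_deg = [0] * n
    for s, _ in edges:
        out_deg[s] += 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        # teleport mass, shared uniformly
        new = [(1.0 - damping) / n] * n
        for s, d in edges:
            new[d] += damping * rank[s] / out_deg[s]
        # nodes with no outgoing edges redistribute their mass uniformly
        dangling = damping * sum(rank[i] for i in range(n) if out_deg[i] == 0)
        rank = [r + dangling / n for r in new]
    return rank
```

Nodes that receive many activity edges (tracks that are liked, reposted, playlisted) end up with a higher static score.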
Let’s talk about numbers
Traffic growth
>3,000 QPS at peak
Catalog size
https://www.elastic.co/use-cases/soundcloud
● >500M documents indexed (total)
● Spread over 5 shards, 4 clusters (two for owned clients and two for public traffic)
● >70 nodes (total), 2 shards and ~50GB per node
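The figures above can be cross-checked with back-of-envelope arithmetic (all inputs are from the slide; the derived totals are rough averages, not official numbers):

```python
nodes = 70
shards_per_node = 2
gb_per_node = 50
docs = 500_000_000

total_shard_copies = nodes * shards_per_node  # 140 shard copies across clusters
total_storage_gb = nodes * gb_per_node        # 3,500 GB, i.e. ~3.5 TB of index data
docs_per_node = docs // nodes                 # roughly 7M documents per node
```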
Scaling & optimizations
Vertical scaling: scaling up, bigger nodes
Horizontal scaling: scaling out, more nodes
● Before you allocate more shards (or merge them into bigger ones), check that there’s enough memory for the OS cache.
● Don’t cross the 32GB heap boundary (above it the JVM loses compressed object pointers).
● More things to run == more things to fail
● Makes maintenance longer/harder (e.g. longer replication)
● You may mix both strategies
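The first two bullets can be folded into a quick sanity check before resizing nodes. The "leave about half of RAM to the page cache" heuristic is a common rule of thumb assumed here, not a number from the talk:

```python
def heap_ok(ram_gb: float, heap_gb: float) -> bool:
    """Rule-of-thumb check for an Elasticsearch node's heap size:
    stay under the ~32GB compressed-oops boundary, and leave at
    least half of RAM to the OS page cache for Lucene segment files."""
    return heap_gb < 32 and heap_gb <= ram_gb / 2
```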
Separate clusters story
Premium Content
{
  "bool": {
    "should": [
      {
        "nested": {
          "path": "policies",
          "filter": {
            "bool": {
              "must": [
                { "term": { "geo_country_code": "<COUNTRY_CODE>" } },
                { "term": { "monetization_model": "SUB_HIGH_TIER" } }
              ],
              "must_not": { "term": { "policy": "BLOCK" } }
            }
          }
        }
      },
      {
        "bool": {
          "must": {
            "nested": {
              "path": "policies",
              "filter": {
                "bool": {
                  "must": [
                    { "term": { "geo_country_code": "--" } },
                    { "term": { "monetization_model": "SUB_HIGH_TIER" } }
                  ],
                  "must_not": { "term": { "policy": "BLOCK" } }
                }
              }
            }
          },
          "must_not": {
            "nested": {
              "path": "policies",
              "filter": { "term": { "geo_country_code": "<COUNTRY_CODE>" } }
            }
          }
        }
      }
    ],
    "_cache": true,
    "_cache_key": "<COUNTRY_CODE>:SUB_HIGH_TIER"
  }
}
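Since the filter is templated per country, a small generator helper makes the structure easier to see: either a country-specific policy matches, or the worldwide ("--") policy matches and no country override exists. The explicit cache key lets all requests from the same country reuse one cached filter. This helper is an illustrative sketch, not SoundCloud's code:

```python
def premium_filter(country_code: str, tier: str = "SUB_HIGH_TIER") -> dict:
    """Build the nested policies filter for one country, with a stable cache key."""

    def policy_match(geo: str) -> dict:
        # Policy for one geo that monetizes at the given tier and isn't blocked.
        return {
            "nested": {
                "path": "policies",
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"geo_country_code": geo}},
                            {"term": {"monetization_model": tier}},
                        ],
                        "must_not": {"term": {"policy": "BLOCK"}},
                    }
                },
            }
        }

    return {
        "bool": {
            "should": [
                policy_match(country_code),  # country-specific policy
                {   # worldwide policy, unless a country override exists
                    "bool": {
                        "must": policy_match("--"),
                        "must_not": {
                            "nested": {
                                "path": "policies",
                                "filter": {"term": {"geo_country_code": country_code}},
                            }
                        },
                    }
                },
            ],
            "_cache": True,
            "_cache_key": f"{country_code}:{tier}",
        }
    }
```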
It’s complicated!
Memory constraints
Before (31GB heap): scorer model, field data, and filter cache all compete for heap space
After (18GB heap): field data and filter cache on the heap; scorer model moved to the filesystem
Monitoring and Alerting
Prometheus
ALERT SearchHighResponseTime_Owned
  IF histogram_quantile(0.95,
       sum(rate(http_request_duration_seconds_bucket{ job = "search-dispatcher-owned" }[1m])) by (job, le)
     ) > 0.5
  FOR 5m
  LABELS {
    service = "search-dispatcher",
    severity = "critical",
  }
  ANNOTATIONS {
    summary = "Slow search {{$labels.job}}",
    description = "95th percentile of response time of {{$labels.job}} is {{$value}}s. We expect it to be below 500ms.",
    runbook = "http://<redacted>/runbooks/search/#searchhighresponsetime",
  }
Alerting rules
KPIs and A/B testing
A/B testing
Engagement Rate - % of searches that produce an engagement or a listen
(chart: click-through rate vs. engagement rate over time)

Other KPIs
● Volume of Search Usage
● Clicks at Position One
● Listening Time
● Time to respond
● Time to First Click
● Queries Before First Click in Session
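A toy illustration of how the engagement-rate KPI defined above could be computed from search logs. The log schema (`engaged` flag per search) is a made-up assumption for the sketch:

```python
def engagement_rate(searches: list[dict]) -> float:
    """% of searches that produced an engagement or a listen.

    searches: records like {"query": "...", "engaged": bool}
    (hypothetical schema, for illustration only).
    """
    if not searches:
        return 0.0
    engaged = sum(1 for s in searches if s["engaged"])
    return 100.0 * engaged / len(searches)
```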
A/B tests
Sample questions:
● Does increasing the minimum number of matching tokens
(minimum_should_match) increase engagement?
● How about boosting exact matches?
● Should we increase or decrease the importance of DiscoRank?
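Each such question boils down to comparing engagement rate between a control and a treatment bucket. A standard way to judge the difference, shown here as a sketch with made-up numbers (the talk doesn't specify the statistical method), is a two-proportion z-test:

```python
import math

def z_score(engaged_a: int, total_a: int, engaged_b: int, total_b: int) -> float:
    """Two-proportion z-test: did treatment (b) move engagement vs. control (a)?"""
    p_a, p_b = engaged_a / total_a, engaged_b / total_b
    p = (engaged_a + engaged_b) / (total_a + total_b)  # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / total_a + 1 / total_b))
    return (p_b - p_a) / se

# |z| > 1.96 corresponds to p < 0.05 (two-sided).
```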
Thank you