03/01/2017 - meetup: elasticsearch seattle user group
TRANSCRIPT
Empowering Users with Elastic
RealSelf Engineering Team, Rodrigo Nunes (Sr. SWE)
Vision Statement
“RealSelf is the world’s most trusted online destination for improving your body, face and smile.”
“We empower you to make smart decisions that lead to great experiences.”
LAMP
Architecture
AWS, PHP, Yii, Microservices
● Ruby Services: search, stream (events)
● Recommendation engine (Java)
● Authorization, Authentication (Scala)
● Stock (website): PHP, Yii, Angular, Twig, fragments
● MySQL/RDS, MongoDB, Elastic 2.x
Architecture (Whaat ?)
[Architecture diagram. Components: Stock (WS), Fanout (Events), Kraken (Data Service, Hydration), search-service, Activity Daemon (several instances), Charon (AAA), Recommender. Stores and messaging: MySQL/Yii, MongoDB, RabbitMQ, ES, Redis.]
Content
● Reviews
● Providers
● Procedures
● Questions and Answers: certified doctors answering
● Photos: doctor galleries, users
● more
Content
● Blogs, comments, practices, doctor profiles, ...
● Heavily textual data, unstructured, attached to procedures
Content
● 4 clusters
● 18 indices in the main cluster
● 20 GB of data
Cluster Facts
● VPC shielded, AWS with cloud plugin (more on that later: ELB, Lambdas, metrics, ops awesome stuff)
● Index templates, hierarchical
● Scripts (Groovy): indexed [no inline scripting, SecurityManager tightening]
● Synonyms and stopwords (domain specific, topic contraction-expansion)
● All deployed via a Distelli app from GitHub (more on that later, too)
● Search-service (Ruby) concentrating access (evolution pre-migration 1.5 -> 2.3.3)
● Official ES Ruby client [HTTP]
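The index template and synonym/stopword bullets above can be sketched as a template body. This is a minimal illustration in ES 2.x syntax, not the team's actual template: the filter names, the `doctor` type, and the sample synonyms are assumptions. Only the sub-field names `with_synonyms` and `stopwords_removed` come from the ranking query shown later in the talk.

```python
import json


def build_index_template(pattern, synonyms, stopwords):
    """Build an ES 2.x-style index template body (illustrative only)."""
    return {
        "template": pattern,  # index name pattern the template applies to
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter names; the real ones are not in the talk.
                    "domain_synonyms": {"type": "synonym", "synonyms": synonyms},
                    "domain_stopwords": {"type": "stop", "stopwords": stopwords},
                },
                "analyzer": {
                    "with_synonyms": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "domain_synonyms"],
                    },
                    "stopwords_removed": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "domain_stopwords"],
                    },
                },
            }
        },
        "mappings": {
            "doctor": {  # hypothetical type name
                "properties": {
                    "name": {
                        "type": "string",
                        "fields": {
                            "stopwords_removed": {
                                "type": "string",
                                "analyzer": "stopwords_removed",
                            }
                        },
                    },
                    "specialty": {
                        "type": "string",
                        "fields": {
                            "with_synonyms": {
                                "type": "string",
                                "analyzer": "with_synonyms",
                            }
                        },
                    },
                }
            }
        },
    }


# Sample synonyms and stopwords are made up for the example.
template = build_index_template(
    "dr-*", ["nose job, rhinoplasty"], ["dr", "md"]
)
print(json.dumps(template, indent=2))
```

Keeping the analyzers in a template means every index matching the pattern picks up the same domain vocabulary without per-index setup.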
Private GH repo
Cluster Facts
● 8 nodes
● Dedicated master (3), client (2, under ELB), 3 data (m4.xlarge, 16 GB RAM)
● EBS attached storage, SSDs (mostly)
● Cloudwatch/Lambda (monitoring) + Marvel
● Java client (TransportClient on the recommender cluster)
● Splunk, NewRelic
id   name               specialty               location
123  Dr. Who, MD        Plastic Surgeon         Atlanta,GA
456  Foo Williams, MD   Dentist                 New York,NY
777  Batman Robin,PC    Facial Plastic Surgeon  Miami,FL
778  Dr. Superman R.    Dermatologist           Seattle,WA
Learning about content
● MySQL DB query dumps, denormalized, bulk indexed
● ES 1.3
● Aggregations, viz
Kibana !
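The "denormalized, bulk indexed" step can be sketched as turning flat rows into a body for the bulk API, which expects one action line followed by one source line per document, newline-delimited. The rows reuse the sample table from the slides; the index and type names are assumptions.

```python
import json

# Sample rows from the slide's table (already denormalized: one flat record each).
ROWS = [
    {"id": 123, "name": "Dr. Who, MD", "specialty": "Plastic Surgeon", "location": "Atlanta,GA"},
    {"id": 456, "name": "Foo Williams, MD", "specialty": "Dentist", "location": "New York,NY"},
    {"id": 777, "name": "Batman Robin,PC", "specialty": "Facial Plastic Surgeon", "location": "Miami,FL"},
    {"id": 778, "name": "Dr. Superman R.", "specialty": "Dermatologist", "location": "Seattle,WA"},
]


def to_bulk_body(rows, index, doc_type):
    """Return a newline-delimited bulk request body for the given rows."""
    lines = []
    for row in rows:
        # Action line: index this document under its database id.
        action = {"index": {"_index": index, "_type": doc_type, "_id": row["id"]}}
        lines.append(json.dumps(action))
        # Source line: every column except the id.
        source = {key: value for key, value in row.items() if key != "id"}
        lines.append(json.dumps(source))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# "doctors" and "doctor" are hypothetical index and type names.
body = to_bulk_body(ROWS, "doctors", "doctor")
print(body)
```

The resulting string would be sent as the body of a `POST _bulk` request; sending it is left out so the sketch runs without a cluster.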
Understanding Content
Simple dashboards to slice the content, maps, charts
[SIMPLE DEMO]
Geo
● Users, doctors, practice locations…
● Connect users to providers (leads)
● Provide meaningful relevant information…
...FAST
Geo
● Personalized to the user, procedure
● Life-changing decision, little info
● Build a safe, reliable source of information
(...FAST)
Geo
● MySQL and distance sorting
● Results falling inside a certain area
● Include recency factors, trending content, etc
...FAST ?
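To make "distance sorting" and "results falling inside a certain area" concrete, here is a small sketch of the computation outside any database: a haversine distance, a radius filter, and a sort. It is not the MySQL query the team used. The coordinates are approximate city centers and the place list is illustrative.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def nearby_sorted(origin, places, radius_mi):
    """Return (name, distance) pairs within radius_mi, nearest first."""
    lat0, lon0 = origin
    with_distance = [
        (name, haversine_miles(lat0, lon0, lat, lon))
        for name, lat, lon in places
    ]
    inside = [pair for pair in with_distance if pair[1] <= radius_mi]
    return sorted(inside, key=lambda pair: pair[1])


# Illustrative practice locations (name, lat, lon).
PLACES = [
    ("Miami", 25.7617, -80.1918),
    ("Tacoma", 47.2529, -122.4443),
    ("Seattle", 47.6062, -122.3321),
]

for name, miles in nearby_sorted((47.6062, -122.3321), PLACES, 100):
    print(name, round(miles, 1))
```

Every candidate row needs a distance computed before sorting, which is part of why this gets slow once recency and trending factors are layered on top.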
Geo
Take me back to heaven...
Ranking
function_score:
GET /review/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "botox",
          "fields": [ "topics", "title" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "missing": 0
      },
      "boost_mode": "sum"
    }
  }
}
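A worked version of the arithmetic in the query above may help. With `modifier: log1p`, `field_value_factor` yields log10(1 + factor × votes), where `factor` defaults to 1, and `boost_mode: sum` adds that to the text relevance score. The relevance scores and vote counts below are made up.

```python
import math


def field_value_factor_log1p(value, factor=1.0, missing=0.0):
    """log1p modifier: common logarithm of (1 + factor * value)."""
    effective = missing if value is None else value
    return math.log10(1 + factor * effective)


def final_score(query_score, votes):
    """boost_mode 'sum': add the function value to the query score."""
    return query_score + field_value_factor_log1p(votes)


# Hypothetical reviews: (text relevance score, votes).
print(final_score(2.0, 9))      # 9 votes add log10(10) = 1.0
print(final_score(2.0, 999))    # 999 votes add log10(1000) = 3.0
print(final_score(2.0, None))   # missing votes fall back to 0 and add 0.0
```

The logarithm dampens popularity: going from 9 to 999 votes adds only two points, so a heavily voted review cannot completely drown out a better text match.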
But I want pop-u-lar things...
Ranking
POST drsearch,rs-practice-location/_search
{
  "from": 0,
  "size": 100,
  "query": {
    "function_score": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "isActive": true } },
            {
              "geo_distance_range": {
                "from": "0mi",
                "to": "100mi",
                "location": { "lat": 47.0, "lon": -122.0 }
              }
            },
            {
              "query": {
                "multi_match": {
                  "fields": [
                    "name.stopwords_removed",
                    "specialty.with_synonyms",
                    "topics.with_synonyms"
                  ],
                  "fuzziness": 0,
                  "query": "botox",
                  "operator": "and"
                }
              }
            }
          ]
        }
      },
      "score_mode": "sum",
      "functions": [
        {
          "exp": {
            "location": {
              "origin": "47.0, -122.0",
              "scale": "100mi",
              "offset": "25mi",
              "decay": 0.33
            }
          }
        },
        {
          "script_score": {
            "lang": "groovy",
            "script_id": "best_match_star_stats"
          }
        },
        {
          "script_score": {
            "lang": "groovy",
            "script_id": "dr_ranking_score",
            "params": { "min": 0, "max": 50, "photo_multiplier": 3 }
          }
        },
        {
          "script_score": {
            "lang": "groovy",
            "script_id": "get_reviews_count",
            "params": { "min": 2, "max": 13910 }
          },
          "weight": 0.01
        }
      ]
    }
  }
}
By pop-u-lar I also mean close to me, reputable, recently reviewed, with good ratings for hair implants….
[keep...scrolling…]
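The `exp` function in the query above is the "close to me" part of the score. As a sketch of what it computes, the exponential decay is exp(λ · max(0, distance − offset)) with λ = ln(decay) / scale. The function below uses the slide's parameters (scale 100mi, offset 25mi, decay 0.33); the sample distances are made up.

```python
import math


def exp_decay(distance, scale, offset, decay):
    """Exponential decay score for a distance from the origin."""
    lam = math.log(decay) / scale  # negative, since 0 < decay < 1
    return math.exp(lam * max(0.0, distance - offset))


# Parameters from the slide's query; distances in miles.
for miles in (10, 25, 75, 125):
    print(miles, round(exp_decay(miles, scale=100, offset=25, decay=0.33), 3))
```

Anything within the 25-mile offset scores 1.0, and a practice 125 miles away (offset plus one scale) scores exactly the decay value, 0.33. With `score_mode: sum` this value is added to the outputs of the three script functions.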
Ranking
By pop-u-lar I also mean close to me, reputable, recently reviewed, with good ratings for hair implants….
[SENSE (Chrome) EXAMPLE]
Ranking
By pop-u-lar I also mean close to me, reputable, recently reviewed, with good ratings for hair transplants….
[Website Example]
Doctor finder
● Autocompletion, suggesters (completion types in the mapping)
○ Payload with metadata (image links, URL, info)
● Scripting for sort
● “Best matches” and fast-evolving sort/score strategies
● Inline data (denormalization)
● A/B test friendly
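The autocompletion bullet can be sketched with the ES 2.x completion suggester, where payloads are enabled in the mapping and stored alongside each suggestion. The field name `suggest`, the suggestion name `doctor_suggest`, the type name, and the payload contents are assumptions for illustration.

```python
import json


def completion_mapping(doc_type):
    """Mapping with a completion field that carries payloads (ES 2.x style)."""
    return {
        doc_type: {
            "properties": {
                "name": {"type": "string"},
                "suggest": {"type": "completion", "payloads": True},
            }
        }
    }


def suggest_doc(name, inputs, payload):
    """Document body: inputs to match on, display output, and metadata payload."""
    return {
        "name": name,
        "suggest": {"input": inputs, "output": name, "payload": payload},
    }


def suggest_request(prefix, size=5):
    """Body for the _suggest endpoint: complete the typed prefix."""
    return {
        "doctor_suggest": {
            "text": prefix,
            "completion": {"field": "suggest", "size": size},
        }
    }


doc = suggest_doc(
    "Dr. Who, MD",
    ["Dr. Who", "Who"],
    {"url": "/dr/123", "image": "/img/123.jpg"},  # made-up metadata
)
print(json.dumps(completion_mapping("doctor")))
print(json.dumps(doc))
print(json.dumps(suggest_request("wh")))
```

Because the payload comes back with each suggestion, the dropdown can show an image and link without a second lookup.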
Customizing views
Before and after gallery demo:
https://www.realself.com/rhinoplasty/before-and-after-photos#page=1&tags=
Behind the scenes
● Users land on the doctor gallery (before and after) for a specific procedure
● Lat,Lon -> GPS, permission (HTML5), IP-based
● Query: image index, no images stored. Hydration.
○ Doctor location (practice), ratings
○ Votes/Views/Likes
○ Recency (lower boost)
○ Decay function, associated with the treatment
○ Indexed scripts, easily modifiable, A/B testing, adjustments
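"No images stored. Hydration." means the index returns ranked ids and another service fills in the full records. A minimal sketch of that step follows; the in-memory `STORE` and `fetch_by_ids` stand in for the real data service, whose API is not described in the talk.

```python
# Hypothetical record store keyed by document id.
STORE = {
    "1": {"id": "1", "doctor": "Dr. Who, MD", "image_url": "/img/1.jpg"},
    "3": {"id": "3", "doctor": "Dr. Superman R.", "image_url": "/img/3.jpg"},
}


def fetch_by_ids(ids):
    """Stand-in for a batched data-service lookup: returns {id: record}."""
    return {doc_id: STORE[doc_id] for doc_id in ids if doc_id in STORE}


def hydrate(hits, fetch):
    """Attach full records to search hits, preserving the ranked order."""
    records = fetch([hit["_id"] for hit in hits])  # one batched call
    hydrated = []
    for hit in hits:
        record = records.get(hit["_id"])
        if record is None:
            continue  # indexed but no longer in the store: skip it
        hydrated.append(dict(record, score=hit["_score"]))
    return hydrated


# Hits as they might come back from the image index, best first.
hits = [
    {"_id": "3", "_score": 2.5},
    {"_id": "2", "_score": 1.7},
    {"_id": "1", "_score": 1.1},
]
for item in hydrate(hits, fetch_by_ids):
    print(item["id"], item["doctor"], item["score"])
```

The search order is kept, and an id missing from the store is dropped instead of failing the page. The extra round trip is the cost that "Eliminate hydration" later aims to remove.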
(Not) Everything is Awesome
● Deep Pagination
● > 10K images in some galleries
● Caching...lat,long
○ Scrolling
○ Shard-by-shard sorting
○ Doc values (index time)
○ Offset + page size
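The "offset + page size" approach and its limit can be sketched as follows. From ES 2.1 on, `index.max_result_window` defaults to 10,000, so a request whose `from + size` exceeds it is rejected. The talk does not say this setting was the specific pain point, so treat the connection to the ">10K images" bullet as an assumption.

```python
MAX_RESULT_WINDOW = 10000  # default index.max_result_window in ES 2.1+


def page_params(page, page_size, max_window=MAX_RESULT_WINDOW):
    """Translate a 1-based page number into from/size, guarding the window."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    if offset + page_size > max_window:
        # Past this point a scroll or a narrower query is needed instead.
        raise ValueError("from + size exceeds the result window")
    return {"from": offset, "size": page_size}


print(page_params(1, 100))    # first page
print(page_params(100, 100))  # last page that still fits in the window
```

Even inside the window, each shard has to collect and sort `from + size` hits before the coordinating node merges them, so late pages cost far more than early ones.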
Recommendations
● Recommender Service
● Java, SpringBoot
● Apache Mahout, CF, BigQuery (GA Data on visitation), Redis
● Item based on user historical data (URLs, widget)
○ YMAL widgets
○ Mobile onboarding (treatment interests)
○ Feeds
Recommendations
● ES based recommendations
● MLT query filter
○ Multiple criteria !
○ Stay on topic or diversify
○ Use a sample doc, synthetic or text
GET review/_search
{
  "query": {
    "filtered": {
      "query": {
        "more_like_this": {
          "fields": [ "title", "text", "topics" ],
          "like_text": "rhinoplasty pain",
          "min_term_freq": 1,
          "max_query_terms": 12
        }
      }
    }
  }
}
Recommendations
● ES based recommendations
● MLT query filter
○ Multiple criteria !
○ Stay on topic or diversify
○ Use a sample doc, synthetic or text
GET review,question/_search
{
  "query": {
    "filtered": {
      "query": {
        "more_like_this": {
          "fields": [ "title", "description" ],
          "like": [
            { "_index": "review", "_type": "review", "_id": "3120" },
            "painful"
          ],
          "min_term_freq": 1,
          "max_query_terms": 12
        }
      }
    }
  }
}
Recommendations
● Transport client
● Separate cluster (from search)
○ Caching (TTL 5d, Redis)
○ User events generated on view data
○ Populate user feed (site/mobile app)
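The "Caching (TTL 5d, Redis)" bullet can be sketched as a read-through cache in front of the recommendation query. The class below is an in-memory stand-in for Redis so the example runs on its own. The key format and the `compute` callback are assumptions.

```python
import time

FIVE_DAYS_SECONDS = 5 * 24 * 60 * 60  # TTL of 5 days, as on the slide


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl_seconds=FIVE_DAYS_SECONDS, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)


def cached_recommendations(doc_id, cache, compute):
    """Return cached recommendations, computing and storing them on a miss."""
    key = "mlt:%s" % doc_id  # hypothetical key format
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = compute(doc_id)  # e.g. run the more_like_this query
    cache.set(key, result)
    return result


cache = TTLCache()
print(cached_recommendations("3120", cache, lambda doc_id: ["r-1", "r-2"]))
```

A five-day TTL trades freshness for load: the similar-content set for a given review changes slowly, so repeated views can skip the query entirely.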
Stories for another day
● Replacing GSS (Google) with our custom search engine
● Image similarity for B&A gallery
● Spotlights (display ads)
Improvements
● Caching: reduce granularity to known locations
● Leverage Routing for speed (treatment, city)
● Eliminate hydration
● Real-time event-based Indexing (Stream+RabbitMQ/Logstash)
● Stress testing, relevance tweaks
● 5.X (Lucene 6 - dimensional points, geo improvements, reindex from remote, more, more…)
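One way to read "Caching: reduce granularity to known locations" is to snap each user's raw coordinates to the nearest known location before building the cache key, so nearby users share an entry. The sketch below assumes that reading. The location list, key format, and treatment name are illustrative.

```python
import math

# Hypothetical list of known locations (approximate city centers).
KNOWN_LOCATIONS = {
    "seattle-wa": (47.6062, -122.3321),
    "miami-fl": (25.7617, -80.1918),
    "atlanta-ga": (33.7490, -84.3880),
    "new-york-ny": (40.7128, -74.0060),
}


def snap_to_known_location(lat, lon, known=KNOWN_LOCATIONS):
    """Return the name of the known location nearest to (lat, lon)."""

    def approx_sq_dist(point):
        # Equirectangular approximation: good enough to rank candidates.
        plat, plon = point
        dlat = lat - plat
        dlon = (lon - plon) * math.cos(math.radians((lat + plat) / 2))
        return dlat * dlat + dlon * dlon

    return min(known, key=lambda name: approx_sq_dist(known[name]))


def cache_key(treatment, lat, lon):
    """Cache key shared by all users snapped to the same location."""
    return "gallery:%s:%s" % (treatment, snap_to_known_location(lat, lon))


# Two users a few miles apart end up with the same key.
print(cache_key("rhinoplasty", 47.61, -122.33))
print(cache_key("rhinoplasty", 47.25, -122.44))
```

Raw lat/lon pairs almost never repeat, so caching on them rarely hits. Coarser keys raise the hit rate at the cost of slightly less precise distance ranking.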
Elastic{ON} ’17
● See you all there !
April - meetup on the East side !
Looking for speakers, hosts. Contact us via the meetup page or email
me directly: [email protected]
And...yes...
...WE ARE HIRING ! www.realself.com/jobs
ENGINEERING
Site Reliability Engineer
Software Development Manager
In the Ops side of the op
https://www.slideshare.net/EdAnderson9/operational-elastic-72746640