03/01/2017 - meetup: elasticsearch seattle user group

Empowering Users with Elastic RealSelf Engineering Team, Rodrigo Nunes (Sr.SWE)

Upload: rodrigo-nunes

Post on 11-Apr-2017


Page 1: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Empowering Users with Elastic

RealSelf Engineering Team, Rodrigo Nunes (Sr. SWE)

Page 2: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Vision Statement

“RealSelf is the world’s most trusted online destination for improving your body, face and smile.”

“We empower you to make smart decisions that lead to great experiences.”

Page 3: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

LAMP

Architecture

AWS, PHP, Yii, Microservices

● Ruby Services: search, stream (events)

● Recommendation engine (Java)

● Authorization, Authentication (Scala)

● Stock (website): PHP, Yii, Angular, Twig, fragments

● MySQL/RDS, MongoDB, Elastic 2.x

Page 4: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Architecture (Whaat ?)

[Architecture diagram. Components shown: Stock (WS), Fanout (Events), Kraken (Data Service, Hydration), search-service, Activity Daemon (several instances), Charon (AAA), Recommender, MySQL/Yii, MongoDB, RabbitMQ, ES, Redis]

Page 5: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Content

● Reviews: Providers, Procedures

● Questions and Answers: Certified doctors answering

● Photos: Doctor galleries, Users

Page 6: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

More Content

● Blogs, comments, practices, dr profiles, ...

● Heavily textual data, unstructured, attached to procedures

Page 7: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Content

● 4 clusters

● 18 indices in the main cluster

● 20 GB of data

Page 8: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Cluster Facts

● VPC shielded, AWS with cloud plugin (more on that later: ELB, Lambdas, metrics, ops awesome stuff)

● Index templates, hierarchical

● Scripts (Groovy), indexed [no inline scripting, SecurityManager tightening]

● Synonyms and stopwords (domain specific, topic contraction-expansion)

● All deployed via a Distelli app from GitHub (more on that later, too)

● Search-service (Ruby) concentrating access (evolution pre-migration 1.5 -> 2.3.3)

● Official ES Ruby client [HTTP]

Private GH repo
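The synonyms and stopwords bullet can be sketched as index analysis settings. This is a minimal sketch, not the team's actual configuration: the filter names, the analyzer names (borrowed from the `with_synonyms` and `stopwords_removed` sub-fields that appear in the ranking query later in the deck), and the sample synonym and stopword entries are all assumptions for illustration.

```python
def build_analysis_settings(synonyms, stopwords):
    """Build index settings with a synonym analyzer and a stopword analyzer.

    All names below (domain_synonyms, domain_stopwords, with_synonyms,
    stopwords_removed) are hypothetical, chosen for this sketch.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Expands or contracts domain terms at analysis time.
                    "domain_synonyms": {
                        "type": "synonym",
                        "synonyms": list(synonyms),
                    },
                    # Drops domain-specific noise words.
                    "domain_stopwords": {
                        "type": "stop",
                        "stopwords": list(stopwords),
                    },
                },
                "analyzer": {
                    "with_synonyms": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "domain_synonyms"],
                    },
                    "stopwords_removed": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "domain_stopwords"],
                    },
                },
            }
        }
    }


# Hypothetical entries, only to show the expected shapes.
settings = build_analysis_settings(
    synonyms=["nose job => rhinoplasty"],
    stopwords=["dr", "md"],
)
```

A body like this could be placed in an index template so every new index picks up the same analyzers.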

Page 9: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Cluster Facts

● 8 nodes

● Dedicated master (3), client (2, under ELB), 3 data (m4.xlarge, 16 GB RAM)

● EBS attached storage, SSDs (mostly)

● Cloudwatch/Lambda (monitoring) + Marvel

● Java client (TransportClient on the recommender cluster)

● Splunk, NewRelic

Page 10: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Learning about content

● MySQL DB query dumps, denormalized, bulk indexed

● ES 1.3

● Aggregations, viz

id   name              specialty               location
123  Dr. Who, MD       Plastic Surgeon         Atlanta,GA
456  Foo Williams, MD  Dentist                 New York,NY
777  Batman Robin,PC   Facial Plastic Surgeon  Miami,FL
778  Dr. Superman R.   Dermatologist           Seattle,WA
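As a rough illustration of the "denormalized, bulk indexed" step, the sketch below turns already-flattened rows into a request body for the Elasticsearch `_bulk` endpoint, which expects one action line followed by one source line per document, each newline-terminated. The index name, type name, and the helper itself are assumptions; the slides do not show the actual loader.

```python
import json


def to_bulk_body(rows, index, doc_type):
    """Serialize rows (dicts with an "id" key) into a _bulk request body."""
    lines = []
    for row in rows:
        # Action line: which index/type/id the next line should be stored under.
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(row["id"])}}
        lines.append(json.dumps(action))
        # Source line: the denormalized document itself.
        lines.append(json.dumps(row))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


rows = [
    {"id": 123, "name": "Dr. Who, MD", "specialty": "Plastic Surgeon", "location": "Atlanta,GA"},
    {"id": 456, "name": "Foo Williams, MD", "specialty": "Dentist", "location": "New York,NY"},
]
# "providers" and "provider" are hypothetical index and type names.
body = to_bulk_body(rows, index="providers", doc_type="provider")
```

The resulting string can be sent with any HTTP client as the body of `POST /_bulk`.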

Page 11: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Kibana !

Understanding Content

Simple dashboards to slice the content, maps, charts

[SIMPLE DEMO]

Page 12: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Geo

● Users, doctors, practice locations…

● Connect users to providers (leads)

● Provide meaningful relevant information…

...FAST

Page 13: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Geo

● Personalized to the user, procedure

● Life-changing decision, little info

● Build a safe, reliable source of information

(...FAST)

Page 14: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Geo

● MySQL and distance sorting

● Results falling inside a certain area

● Include recency factors, trending content, etc

...FAST ?

Page 15: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Geo

Take me back to heaven...

Page 16: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Ranking

Function_score:

GET /review/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "botox",
          "fields": [ "topics", "title" ]
        }
      },
      "field_value_factor": {
        "field": "votes",
        "modifier": "log1p",
        "missing": 0
      },
      "boost_mode": "sum"
    }
  }
}
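To make the effect of the `field_value_factor` clause above concrete, here is a small sketch of the arithmetic it implies: the `log1p` modifier takes the common logarithm of one plus the field value, and `boost_mode: sum` adds that to the text-match score. The function names are made up for this sketch, and the relevance scores in the example are invented numbers, not values from the cluster.

```python
import math


def field_value_factor_log1p(value, factor=1.0, missing=0.0):
    """Mimic field_value_factor with modifier "log1p": log10(1 + factor * value)."""
    v = missing if value is None else value
    return math.log10(1.0 + factor * v)


def combined_score(query_score, votes):
    """boost_mode "sum": add the function value to the query's relevance score."""
    return query_score + field_value_factor_log1p(votes)


# Invented relevance scores: a slightly weaker text match with many votes
# can overtake a stronger match that has none.
popular = combined_score(query_score=1.8, votes=99)   # 1.8 + log10(100)
unvoted = combined_score(query_score=2.5, votes=None)  # 2.5 + log10(1)
```

Because the logarithm flattens quickly, going from 0 to 9 votes adds as much as going from 9 to 99, which keeps very popular reviews from drowning out text relevance.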

But I want pop-u-lar things...

Page 17: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Ranking

POST drsearch,rs-practice-location/_search
{
  "from": 0,
  "size": 100,
  "query": {
    "function_score": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "isActive": true } },
            {
              "geo_distance_range": {
                "from": "0mi",
                "to": "100mi",
                "location": { "lat": 47.0, "lon": -122.0 }
              }
            },
            {
              "query": {
                "multi_match": {
                  "fields": [
                    "name.stopwords_removed",
                    "specialty.with_synonyms",
                    "topics.with_synonyms"
                  ],
                  "fuzziness": 0,
                  "query": "botox",
                  "operator": "and"
                }
              }
            }
          ]
        }
      },
      "score_mode": "sum",
      "functions": [
        {
          "exp": {
            "location": {
              "origin": "47.0, -122.0",
              "scale": "100mi",
              "offset": "25mi",
              "decay": 0.33
            }
          }
        },
        {
          "script_score": {
            "lang": "groovy",
            "script_id": "best_match_star_stats"
          }
        },
        {
          "script_score": {
            "lang": "groovy",
            "script_id": "dr_ranking_score",
            "params": { "min": 0, "max": 50, "photo_multiplier": 3 }
          }
        },
        {
          "script_score": {
            "lang": "groovy",
            "script_id": "get_reviews_count",
            "params": { "min": 2, "max": 13910 }
          },
          "weight": 0.01
        }
      ]
    }
  }
}

By pop-u-lar I also mean close to me, reputable, recently reviewed, with good ratings for hair implants….

[keep...scrolling…]
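The `exp` entry in the query above is a distance decay function. The sketch below reproduces the documented formula for exponential decay, using the slide's parameters (scale 100mi, offset 25mi, decay 0.33), to show how much of the score survives at a given distance. Distances are plain numbers in miles here; the helper name is an assumption, and the indexed Groovy scripts are not modeled because their contents are not shown.

```python
import math


def exp_decay(distance, scale, offset=0.0, decay=0.5):
    """Exponential decay: exp(lambda * max(0, distance - offset)),
    with lambda = ln(decay) / scale."""
    lam = math.log(decay) / scale
    return math.exp(lam * max(0.0, distance - offset))


# Parameters taken from the slide's query.
SCALE_MI, OFFSET_MI, DECAY = 100.0, 25.0, 0.33

inside_offset = exp_decay(10.0, SCALE_MI, OFFSET_MI, DECAY)    # no penalty within 25 miles
at_scale = exp_decay(125.0, SCALE_MI, OFFSET_MI, DECAY)        # offset + scale
twice_scale = exp_decay(225.0, SCALE_MI, OFFSET_MI, DECAY)     # offset + 2 * scale
```

In words: providers within 25 miles keep the full geo score, a provider 125 miles away keeps 33% of it, and each further 100 miles multiplies it by 0.33 again.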

Page 18: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Ranking

By pop-u-lar I also mean close to me, reputable, recently reviewed, with good ratings for hair implants….

[SENSE (Chrome) EXAMPLE]

Page 19: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Ranking

By pop-u-lar I also mean close to me, reputable, recently reviewed, with good ratings for hair transplants….

[Website Example]

Page 20: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Finder

Drfinder demo

https://www.realself.com/find

Page 21: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Doctor finder

● Autocompletion, suggesters (completion types in the mapping)

○ Payload with metadata (image links, URL, info)

● Scripting for sort

● “Best matches” and fast-evolving sort/score strategies

● Inline data (denormalization)

● A/B test friendly

Page 22: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Customizing views

Before and after gallery demo:

https://www.realself.com/rhinoplasty/before-and-after-photos#page=1&tags=

Page 23: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Behind the scenes

● Users land on the doctor gallery (before and after) for a specific procedure

● Lat,Lon -> GPS, permission (HTML5), IP-based

● Query: image index, no images stored. Hydration.

○ Doctor location (practice), ratings

○ Votes/Views/Likes

○ Recency (lower boost)

○ Decay function, associated with the treatment

○ Indexed scripts, easily modifiable, A/B testing, adjustments

Page 24: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

(Not) Everything is Awesome

● Deep Pagination

● > 10K images in some galleries

● Caching...lat,long

○ Scrolling

○ Shard-by-shard sorting

○ Doc values (index time)

○ Offset + page size
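Why offset + page size hurts on deep pages can be shown with simple arithmetic: for a `from`/`size` request, every shard has to produce its own top `from + size` hits, and the coordinating node then merges all of them before discarding everything except the requested page. The sketch below only counts documents; the shard count of 5 is an assumption (the 2.x default), not a number from the slides.

```python
def docs_collected(from_, size, shards):
    """Count hits that must be sorted for one from/size search request."""
    per_shard = from_ + size          # each shard builds a queue this long
    return {
        "per_shard": per_shard,
        "coordinator": per_shard * shards,  # merged on the coordinating node
    }


# Page 1 versus a page deep inside a gallery with more than 10K images.
first_page = docs_collected(from_=0, size=100, shards=5)
deep_page = docs_collected(from_=10000, size=100, shards=5)
```

Both requests return 100 hits, but the deep page sorts roughly a hundred times more entries, which is the cost that scrolling or caching is meant to avoid.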

Page 25: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Recommendations

● Recommender Service

● Java, SpringBoot

● Apache Mahout, CF, BigQuery (GA Data on visitation), Redis

● Item based on user historical data (URLs, widget)

○ YMAL widgets

○ Mobile onboarding (treatment interests)

○ Feeds

Page 26: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Recommendations

● ES based recommendations

● MLT query filter

○ Multiple criteria !

○ Stay on topic or diversify

○ Use a sample doc, synthetic or text

GET review/_search
{
  "query": {
    "filtered": {
      "query": {
        "more_like_this": {
          "fields": [ "title", "text", "topics" ],
          "like_text": "rhinoplasty pain",
          "min_term_freq": 1,
          "max_query_terms": 12
        }
      }
    }
  }
}

Page 27: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Recommendations

● ES based recommendations

● MLT query filter

○ Multiple criteria !

○ Stay on topic or diversify

○ Use a sample doc, synthetic or text

GET review,question/_search
{
  "query": {
    "filtered": {
      "query": {
        "more_like_this": {
          "fields": [ "title", "description" ],
          "like": [
            { "_index": "review", "_type": "review", "_id": "3120" },
            "painful"
          ],
          "min_term_freq": 1,
          "max_query_terms": 12
        }
      }
    }
  }
}
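The "sample doc, synthetic or text" idea amounts to mixing document references and free text in the `like` array. A minimal sketch of a request-body builder follows; the function name and argument shapes are assumptions, and it emits `more_like_this` directly under `query` rather than inside the `filtered` wrapper used on the slide, since that wrapper carries no filter there.

```python
def mlt_query(fields, like_docs=(), like_texts=(), min_term_freq=1, max_query_terms=12):
    """Build a more_like_this search body.

    like_docs:  iterable of (index, type, id) tuples pointing at existing docs.
    like_texts: iterable of free-text strings to steer or diversify the match.
    """
    like = [
        {"_index": index, "_type": doc_type, "_id": str(doc_id)}
        for index, doc_type, doc_id in like_docs
    ]
    like.extend(like_texts)
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": like,
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


# Same inputs as the slide: one existing review plus the word "painful".
body = mlt_query(
    fields=["title", "description"],
    like_docs=[("review", "review", 3120)],
    like_texts=["painful"],
)
```

Passing only `like_docs` keeps results on topic, while adding extra `like_texts` is one way to nudge them toward a related theme.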

Page 28: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Recommendations

● Transport client

● Separate cluster (from search)

○ Caching (TTL 5d, Redis)

○ User events generated on view data

○ Populate user feed (site/mobile app)

GET review,question/_search
{
  "query": {
    "filtered": {
      "query": {
        "more_like_this": {
          "fields": [ "title", "description" ],
          "like": [
            { "_index": "review", "_type": "review", "_id": "3120" },
            "painful"
          ],
          "min_term_freq": 1,
          "max_query_terms": 12
        }
      }
    }
  }
}
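The caching bullet (TTL 5d, Redis) describes a read-through cache in front of the recommendation query. The sketch below shows that pattern with an in-memory dictionary standing in for Redis so it runs without a server; the class, the function names, and the `fetch` callback are all hypothetical, and the real service is a Java application, not Python.

```python
import time

FIVE_DAYS = 5 * 24 * 60 * 60  # TTL from the slide, in seconds


class TTLCache:
    """In-memory stand-in for a Redis key with an expiry."""

    def __init__(self, ttl_seconds=FIVE_DAYS, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller falls through to the query.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)


def recommendations_for(doc_id, cache, fetch):
    """Return cached recommendations, or run fetch(doc_id) and cache the result."""
    cached = cache.get(doc_id)
    if cached is not None:
        return cached
    result = fetch(doc_id)  # e.g. the more_like_this search shown above
    cache.set(doc_id, result)
    return result
```

With this shape, the Elasticsearch cluster is only queried once per document per TTL window, and the user feed is populated from the cached list in between.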

Page 29: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Stories for another day

● Replacing GSS (Google) with our custom search engine

● Image similarity for B&A gallery

● Spotlights (display ads)

Page 30: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Improvements

● Caching: reduce granularity to known locations

● Leverage Routing for speed (treatment, city)

● Eliminate hydration

● Real-time event-based Indexing (Stream+RabbitMQ/Logstash)

● Stress testing, relevance tweaks

● 5.X (Lucene 6 - dimensional points, geo improvements, reindex from remote, more, more…)
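One possible reading of "reduce granularity to known locations" is that raw lat/lon pairs make poor cache keys because almost every user has a different one. The sketch below coarsens coordinates to a fixed grid before building the key, so nearby users share a cache entry. The grid size of one decimal place and the key layout are assumptions; snapping to a list of known cities would be an equally valid interpretation.

```python
def coarse(value, decimals=1):
    """Round a coordinate and normalize -0.0 to 0.0 so keys stay stable."""
    return round(value, decimals) + 0.0


def cache_key(procedure, lat, lon, decimals=1):
    """Build a cache key that is shared by all users in the same grid cell."""
    return "{}:{:.{d}f}:{:.{d}f}".format(
        procedure, coarse(lat, decimals), coarse(lon, decimals), d=decimals
    )


# Two nearby points (illustrative coordinates) collapse to the same key.
key_a = cache_key("botox", 47.6062, -122.3321)
key_b = cache_key("botox", 47.6201, -122.3493)
```

The same coarse key (or the treatment and city it stands for) could also serve as a routing value, which is the next improvement on the list.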

Page 31: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

Elastic{ON} ‘17

● See you all there !

April - meetup on the East side !

Looking for speakers, hosts. Contact us via the meetup page or email me directly: [email protected]

Page 32: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

And...yes...

...WE ARE HIRING ! www.realself.com/jobs

ENGINEERING

Site Reliability Engineer

Software Development Manager

Page 33: 03/01/2017 - Meetup: Elasticsearch Seattle User Group

In the Ops side of the op

https://www.slideshare.net/EdAnderson9/operational-elastic-72746640