Dzone Webinar: Search Patterns with Amazon CloudSearch

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc. Search Patterns Jon Handler, Amazon CloudSearch Solution Architect

Upload: michael-bohlig

Post on 13-Jan-2015

Category:

Technology


DESCRIPTION

This webinar is based on the Dzone Refcard (http://refcardz.dzone.com/refcardz/search-patterns) and provides patterns for integrating cloud-based search with a variety of applications. Examples of these patterns are demonstrated using Amazon CloudSearch to abstract away the complexities of deploying and administering your own search servers, but the principles apply to other search systems as well.

TRANSCRIPT

Page 1: Dzone Webinar: Search Patterns with Amazon CloudSearch

Search Patterns

Jon Handler, Amazon CloudSearch Solution Architect

Page 2: Dzone Webinar: Search Patterns with Amazon CloudSearch

Agenda

• Amazon CloudSearch Basics
• Searching in the Cloud
• Ranking
• Location-Based Search
• Faceting
• Mixed Data Sources
• Performance

Page 3: Dzone Webinar: Search Patterns with Amazon CloudSearch

Patterns

• Title-Body Search
• Social Search Patterns
• Mobile Search Patterns
• eCommerce Patterns

Page 4: Dzone Webinar: Search Patterns with Amazon CloudSearch

Page 5: Dzone Webinar: Search Patterns with Amazon CloudSearch

AMAZON CLOUDSEARCH

Page 6: Dzone Webinar: Search Patterns with Amazon CloudSearch

Search, In The Cloud

Page 7: Dzone Webinar: Search Patterns with Amazon CloudSearch

Page 8: Dzone Webinar: Search Patterns with Amazon CloudSearch

The Cloud is Elastic

[Diagram: a grid of search instances, each holding one index partition (1 to n) and one copy (1 to n). Axis labels: DATA (document quantity and size) and TRAFFIC (search request volume and complexity).]

Page 9: Dzone Webinar: Search Patterns with Amazon CloudSearch

SEARCHING IN THE CLOUD

Page 10: Dzone Webinar: Search Patterns with Amazon CloudSearch

CloudSearch Batches

    { "type": "add",
      "id": "tt0076759",
      "fields": {
        "title": "Star Wars",
        "director": "Lucas, George",
        "year": 1977,
        "genre": ["Action","Adventure","Fantasy","Sci-Fi"],
        "actor": ["Ford, Harrison","Fisher, Carrie","Hamill, Mark",
                  "Jones, James Earl","Guinness, Alec",...]
      }
    },

Page 11: Dzone Webinar: Search Patterns with Amazon CloudSearch

Bootstrapping Data

[Diagram: bootstrapping data flow. Components shown: Source System, Processing Script, Queuing, Batching, Amazon EC2 (twice), Amazon SQS, Amazon CloudSearch.]
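The batching stage of this flow can be sketched in a few lines of Python. This is an illustration only, not the presenter's code: the helper names `to_add_operation`, `encode`, and `make_batches` are hypothetical, and `max_bytes` is a placeholder limit chosen for the example rather than a documented service value. The operation layout mirrors the "add" record on the CloudSearch Batches slide.

```python
import json

def to_add_operation(doc_id, fields):
    """Wrap one source record in the "add" shape shown on the batch slide."""
    return {"type": "add", "id": doc_id, "fields": fields}

def encode(obj):
    """Compact JSON encoding (ASCII-only output, so characters equal bytes)."""
    return json.dumps(obj, separators=(",", ":"))

def make_batches(operations, max_bytes=1_000_000):
    """Group operations into JSON arrays that each stay within max_bytes.

    max_bytes is an assumed, illustrative limit. A single operation larger
    than the limit still gets a batch of its own.
    """
    batches, current, size = [], [], 2  # 2 bytes for the enclosing "[]"
    for op in operations:
        op_size = len(encode(op)) + 1  # +1 for the separating comma
        if current and size + op_size > max_bytes:
            batches.append(encode(current))
            current, size = [], 2
        current.append(op)
        size += op_size
    if current:
        batches.append(encode(current))
    return batches
```

Each returned string is one upload body. Putting a queue between the processing script and the batcher, as the slide's components suggest, lets the two run at different speeds.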

Page 12: Dzone Webinar: Search Patterns with Amazon CloudSearch

Configuring for Search

• Text fields for individual word search
    • User-generated and external text: titles, descriptions
• Literal fields for exact matches
    • Application-generated text like facets
• Integer fields for range searching and ranking

Page 13: Dzone Webinar: Search Patterns with Amazon CloudSearch

Sending Queries

http(s)://<endpoint>/2011-02-01/search?

• Simple searches
    • q=<text>
• Filtering
    • bq=(and title:'iron man' genre:'Action')
• Filtering with integer ranges
    • bq=(and 'iron man' year:..2010)
• Geo filtering
    • bq=(and 'iron man' latitude:12700..12900 longitude:5700..5800)
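Because these expressions contain spaces, quotes, and parentheses, they need URL encoding before they go on the wire. The following Python sketch shows one way to assemble such a request URL. The helper names `search_url` and `and_filter` and the endpoint hostname are hypothetical; only the path and the `q`/`bq` parameter names come from the slide.

```python
from urllib.parse import urlencode

API_VERSION = "2011-02-01"  # version segment as shown on the slide

def and_filter(*clauses):
    """Join boolean-query clauses into an (and ...) expression."""
    return "(and %s)" % " ".join(clauses)

def search_url(endpoint, **params):
    """Build a search URL; urlencode escapes quotes, spaces, and parentheses."""
    return "https://%s/%s/search?%s" % (endpoint, API_VERSION, urlencode(params))

# Hypothetical endpoint name, used only for illustration.
url = search_url(
    "search-movies.example.com",
    bq=and_filter("'iron man'", "year:..2010"),
)
```

Keeping clause construction separate from URL encoding means the filters can be read and tested as plain strings.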

Page 14: Dzone Webinar: Search Patterns with Amazon CloudSearch

Search Results

    { "rank": "-text_relevance",
      "match-expr": "(label 'star wars')",
      "hits": {
        "found": 7,
        "start": 0,
        "hit": [
          {"id": "tt1185834"},
          {"id": "tt0076759"},
          {"id": "tt0086190"},
          {"id": "tt0120915"},
          {"id": "tt0121765"},
          {"id": "tt0080684"},
          {"id": "tt0121766"}
        ]
      }
      ...
    }
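A response of this shape carries only document IDs, so the application typically looks up display data in its own store. A minimal Python sketch of pulling out the total and the IDs follows; the function name `hit_ids` is hypothetical, and the code assumes only the `hits`, `found`, and `hit` keys visible in the response above.

```python
import json

def hit_ids(response_text):
    """Return (total_found, [document ids]) from a search response body."""
    hits = json.loads(response_text)["hits"]
    return hits["found"], [hit["id"] for hit in hits["hit"]]
```

`found` is the total number of matches, while `hit` holds only the current page, which begins at offset `start`.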

Page 15: Dzone Webinar: Search Patterns with Amazon CloudSearch

Updating CloudSearch

[Diagram: update flow. Components shown: Users, Web Server, Amazon S3, DynamoDB, Amazon RDS, Amazon SQS, Amazon EC2 (twice), Update Processor, Amazon CloudSearch.]

Page 16: Dzone Webinar: Search Patterns with Amazon CloudSearch

BASIC RANKING

Page 17: Dzone Webinar: Search Patterns with Amazon CloudSearch

Page 18: Dzone Webinar: Search Patterns with Amazon CloudSearch

Customizing Ranking

• text_relevance and cs.text_relevance
• Rank expressions
    • Compute a score for each document
    • &rank=<function>
• Defined in the console
• Defined at query-time
    • &q='iron-man'&rank-recency=text_relevance + year&rank=recency

Page 19: Dzone Webinar: Search Patterns with Amazon CloudSearch

Document Structure

Movie

title

description

user_rating

likes

release_date

latitude

longitude

Page 20: Dzone Webinar: Search Patterns with Amazon CloudSearch

Field Weighting

Page 21: Dzone Webinar: Search Patterns with Amazon CloudSearch

Field Weighting

• Adjust relative importance of fields
• &rank-title_boost=cs.text_relevance({"weights":{"title":4.0}, "default_weight":1})

Page 22: Dzone Webinar: Search Patterns with Amazon CloudSearch

Popularity

Page 23: Dzone Webinar: Search Patterns with Amazon CloudSearch

Popularity

• Convert floating point to integer
• Weight by the number of ranks
• rank-pop=(user_rating - 2) * log10(number_user_ranks) * 10 + metascore * 3
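To see how this expression behaves, it helps to evaluate it outside the search service. The Python sketch below transcribes the formula directly. The function name and the sample values are made up for illustration, and the parameter names simply follow the terms in the slide's expression.

```python
import math

def popularity(user_rating, number_user_ranks, metascore):
    """Evaluate the slide's popularity expression for one document."""
    return (user_rating - 2) * math.log10(number_user_ranks) * 10 + metascore * 3

# Same rating, different numbers of raters (made-up values).
few = popularity(user_rating=8, number_user_ranks=10, metascore=90)
many = popularity(user_rating=8, number_user_ranks=1000, metascore=90)
```

The `log10` term rewards a rating backed by many raters but dampens it: going from 10 to 1,000 raters triples the rating contribution rather than multiplying it by 100. Subtracting 2 means a rating below 2 counts against the document.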

Page 24: Dzone Webinar: Search Patterns with Amazon CloudSearch

Freshness

Page 25: Dzone Webinar: Search Patterns with Amazon CloudSearch

Freshness

• Exponential decay function
• &rank-decay=200*Math.exp(-0.1*days_ago)

$r = c\,e^{-\lambda t}$
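The decay constants determine how quickly a document loses its freshness boost. The Python sketch below evaluates the same curve with the slide's values (c = 200, λ = 0.1, t in days) and derives the half-life. The function names are hypothetical.

```python
import math

def freshness(days_ago, c=200.0, decay=0.1):
    """Exponential decay r = c * e^(-decay * t), with t measured in days."""
    return c * math.exp(-decay * days_ago)

def half_life(decay=0.1):
    """Days until the freshness score drops to half its starting value."""
    return math.log(2) / decay
```

With λ = 0.1 the score halves roughly every 6.9 days. A smaller λ keeps older documents competitive for longer.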

Page 26: Dzone Webinar: Search Patterns with Amazon CloudSearch

Rank Expressions: Combined

• &rank-combined=1.0 * title + 0.5 * popularity + 0.3 * freshness
• &rank=combined
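The combined expression is a weighted sum, so the weights only make sense once the component scores are on comparable scales. A short Python sketch of the same arithmetic follows. The weights come from the slide; the function name and the component scores are invented for the example.

```python
WEIGHTS = {"title": 1.0, "popularity": 0.5, "freshness": 0.3}

def combined(scores, weights=WEIGHTS):
    """Weighted sum of the named component scores, as in rank-combined."""
    return sum(weight * scores[name] for name, weight in weights.items())

# Invented component scores for two documents.
older_hit = {"title": 100, "popularity": 50, "freshness": 10}
newer_hit = {"title": 60, "popularity": 50, "freshness": 200}
ranked = sorted([("older", older_hit), ("newer", newer_hit)],
                key=lambda item: combined(item[1]), reverse=True)
```

Here the newer document wins (145 against 128) even though its title match is weaker, because freshness can reach 200 in the decay expression. Checking sample documents this way is a cheap test of a weight set before trying it on live traffic.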

Page 27: Dzone Webinar: Search Patterns with Amazon CloudSearch

Page 28: Dzone Webinar: Search Patterns with Amazon CloudSearch

LOCATION-BASED SEARCH

Page 29: Dzone Webinar: Search Patterns with Amazon CloudSearch

Mobile Experience

[Screenshot: mobile app search for "Iron Man", with tabs Movies, Search, Social, Account, Nearby. Results shown:]

• Iron Man (2008): When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.
• Iron Man 2 (2010): Tony Stark has declared himself Iron Man and installed world peace... or so he thinks. He soon realizes that not only is there a mad man...
• Iron Man 3 (2013): When Tony Stark's world is torn apart by a formidable terrorist called the Mandarin, he starts an odyssey of rebuilding and retribution.
• The Man With The Iron Fists (2012): On the hunt for a fabled treasure of gold, a band of warriors, assassins, and a rogue British soldier descend upon a village in feudal China, where a humble blacksmith...

Page 30: Dzone Webinar: Search Patterns with Amazon CloudSearch

Encoding Location

• Latitude and longitude expressed as integers

Movie

title

description

user_rating

likes

release_date

latitude

longitude

Page 31: Dzone Webinar: Search Patterns with Amazon CloudSearch

Bounding Box Search

• Latitude min/max
• Longitude min/max

bq=(and 'theater' latitude:12700..12900 longitude:5700..5800)
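The slides do not say how degrees become integers. One scheme that happens to reproduce the numbers in this query is to shift each coordinate to be non-negative and keep two decimal places: (latitude + 90) × 100 and (longitude + 180) × 100. The Python sketch below uses that scheme purely as an assumption, and all function names are hypothetical.

```python
def encode(degrees, offset, scale=100):
    """Shift and scale a coordinate into a non-negative integer (assumed scheme)."""
    return int(round((degrees + offset) * scale))

def encode_lat(lat):
    return encode(lat, 90)    # -90..90 degrees maps to 0..18000

def encode_lon(lon):
    return encode(lon, 180)   # -180..180 degrees maps to 0..36000

def bounding_box(lat, lon, lat_half, lon_half):
    """Range clauses for a box centred on (lat, lon); half-widths in degrees.

    No clamping at the poles and no wraparound at the antimeridian.
    """
    return "latitude:%d..%d longitude:%d..%d" % (
        encode_lat(lat - lat_half), encode_lat(lat + lat_half),
        encode_lon(lon - lon_half), encode_lon(lon + lon_half),
    )

clause = bounding_box(38.0, -122.5, lat_half=1.0, lon_half=0.5)
query = "(and 'theater' %s)" % clause
```

Under this assumption, two decimal places give a latitude resolution of roughly 1.1 km. A larger `scale` buys precision at the cost of larger integers.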

Page 32: Dzone Webinar: Search Patterns with Amazon CloudSearch

Location Sort

• Cartesian distance function
• &rank-geo=sqrt(pow(latitude - lat, 2) + pow(longitude - lon, 2))
• &rank=-geo

$\sqrt{(\mathrm{lat} - \mathrm{lat}_{user})^2 + (\mathrm{lon} - \mathrm{lon}_{user})^2}$
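The same distance calculation can be run client-side to sanity-check result ordering. The Python sketch below is an illustration with hypothetical function names and made-up documents. `latitude` and `longitude` are the document's integer-encoded fields, and `lat` and `lon` are the user's position in the same encoding.

```python
import math

def geo_rank(latitude, longitude, lat, lon):
    """Planar distance between a document's encoded position and the user's."""
    return math.sqrt(math.pow(latitude - lat, 2) + math.pow(longitude - lon, 2))

def nearest_first(docs, lat, lon):
    """Sort documents by ascending distance from the user."""
    return sorted(docs, key=lambda d: geo_rank(d["latitude"], d["longitude"], lat, lon))

theaters = [
    {"id": "far", "latitude": 12900, "longitude": 5800},
    {"id": "near", "latitude": 12701, "longitude": 5700},
]
ordered = nearest_first(theaters, lat=12700, lon=5700)
```

This treats degrees as a flat grid, ignoring the way a degree of longitude shrinks away from the equator. That is usually acceptable for ordering results inside a small bounding box, but it is not a true distance.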

Page 33: Dzone Webinar: Search Patterns with Amazon CloudSearch

FACETING

Page 34: Dzone Webinar: Search Patterns with Amazon CloudSearch

Facets

Page 35: Dzone Webinar: Search Patterns with Amazon CloudSearch

Facets

Page 36: Dzone Webinar: Search Patterns with Amazon CloudSearch

Simple Faceting: Document

Movie

title

description

genre

Page 37: Dzone Webinar: Search Patterns with Amazon CloudSearch

Simple Faceting: Configuration

Page 38: Dzone Webinar: Search Patterns with Amazon CloudSearch

Simple Faceting: Query

q=iron+man&facet=genre

    {"rank": "-text_relevance",
     "match-expr": "(label 'star wars')",
     "hits": {"found": 7, "start": 0, "hit": [] },
     "facets": {
       "genre": {
         "constraints": [
           {"value": "Family", "count": 62},
           {"value": "Action/Adventure", "count": 21},
           {"value": "Drama", "count": 5 },

Page 39: Dzone Webinar: Search Patterns with Amazon CloudSearch

Simple Faceting: UI

    <div class='facet'>
        <ul class='facet_list'>
            <?php
                // Each constraint carries a facet value and its document count.
                $genres = $resultsObj->facets->genre->constraints;
                for ($i = 0; $i < count($genres); $i++) {
                    $curGenre = $genres[$i]->value;
                    $curCount = $genres[$i]->count;
            ?>
            <li class='facet_item'>
                <div class='facet_name'><?=$curGenre?></div>
                <div class='facet_count'><?=$curCount?></div>
            </li>
            <?php } ?>
        </ul>
    </div>

Page 40: Dzone Webinar: Search Patterns with Amazon CloudSearch

Facets

Page 41: Dzone Webinar: Search Patterns with Amazon CloudSearch

Document

• title: Lincoln
• description: ...
• oscar1: Awards
• oscar2: Awards/Best Actor
• oscar3: Awards/Best Actor/Daniel Day Lewis

Movie fields: title, description, oscar1, oscar2, oscar3

Page 42: Dzone Webinar: Search Patterns with Amazon CloudSearch

Query

&q=lincoln&facet=oscar1,oscar2,oscar3

    {"rank": "-text_relevance",
     "hits": {...},
     "facets": {
       "oscar1": {
         "constraints": [
           {"value": "Awards", "count": 23},
           {"value": "Nominations", "count": 124}]},
       "oscar2": {
         "constraints": [
           {"value": "Awards/Best Actor", "count": 6},
           {"value": "Awards/Best Actress", "count": 3}...]},
       "oscar3": {
         "constraints": [
           {"value": "Awards/Best Actor/Daniel Day Lewis", "count": 1},
           {"value": "Awards/Best Actor/Denzel Washington", "count": 2}...]},

Page 43: Dzone Webinar: Search Patterns with Amazon CloudSearch

Drilldown

• bq=oscar1:'Awards'
• bq=oscar2:'Awards/Best Actor'
• bq=oscar3:'Awards/Best Actor/Daniel Day Lewis'
• bq=(and 'star' oscar2:'Awards/Best Actor')
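The hierarchical pattern has two halves: at indexing time each level of the path is written to its own field, and at query time a click on a facet value becomes a filter on the matching field. The Python sketch below covers both. The helper names `facet_levels` and `drilldown` are hypothetical, while the `oscar1` to `oscar3` field naming follows the slides.

```python
def facet_levels(path, prefix="oscar", sep="/"):
    """Expand 'A/B/C' into one field per level: prefix1='A', prefix2='A/B', ..."""
    parts = path.split(sep)
    return {"%s%d" % (prefix, depth): sep.join(parts[:depth])
            for depth in range(1, len(parts) + 1)}

def drilldown(field, value, text=None):
    """Filter clause for a clicked facet value, optionally combined with text.

    Values containing a single quote are not escaped in this sketch.
    """
    clause = "%s:'%s'" % (field, value)
    return "(and '%s' %s)" % (text, clause) if text else clause

fields = facet_levels("Awards/Best Actor/Daniel Day Lewis")
bq = drilldown("oscar2", "Awards/Best Actor", text="star")
```

Because every level stores its full path prefix, "Best Actor" under "Awards" stays distinct from "Best Actor" under "Nominations".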

Page 44: Dzone Webinar: Search Patterns with Amazon CloudSearch

MIXED DATA SOURCES

Page 45: Dzone Webinar: Search Patterns with Amazon CloudSearch

Document

Showtime: type, title, theater_name, city, latitude, longitude
Movie: type, title, description, user_rating, likes, release_date
Review: type, title, movie_name, author, url, body

Page 46: Dzone Webinar: Search Patterns with Amazon CloudSearch

Heterogeneous Data

Page 47: Dzone Webinar: Search Patterns with Amazon CloudSearch

Multi Domain

Page 48: Dzone Webinar: Search Patterns with Amazon CloudSearch

Trade-offs

• Multiple domain
    • Independent configuration
    • Independent scale
• Single domain
    • Simpler
    • Lower cost
    • bq=(and 'iron man' type:'movie')

Page 49: Dzone Webinar: Search Patterns with Amazon CloudSearch

TUNING

Page 50: Dzone Webinar: Search Patterns with Amazon CloudSearch

What to Track

• User queries
• Responses
• Response times
• Click positions

Page 51: Dzone Webinar: Search Patterns with Amazon CloudSearch

Tuning Relevance

• Return relevance values
• Check no-result queries
• Check most common results
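Checking no-result queries amounts to a frequency count over the search log. The Python sketch below assumes a hypothetical log format, a list of dicts with `query` and `found` keys, which is not something the slides specify.

```python
from collections import Counter

def no_result_queries(log, top=10):
    """Most frequent queries that returned zero hits, as (query, count) pairs."""
    misses = Counter(entry["query"] for entry in log if entry["found"] == 0)
    return misses.most_common(top)

# Made-up log entries for illustration.
log = [
    {"query": "iron man", "found": 4},
    {"query": "irn man", "found": 0},
    {"query": "irn man", "found": 0},
    {"query": "xyzzy", "found": 0},
]
worst = no_result_queries(log)
```

Frequent misses such as a recurring misspelling are candidates for synonyms, spelling correction, or new content.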

Page 52: Dzone Webinar: Search Patterns with Amazon CloudSearch

Tuning Performance

• Identify consistently slow queries
• Tend towards text matching
• Cache slow queries when possible
• Benchmark with JMeter or Siege
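Caching slow queries can be as simple as a time-limited dictionary in front of the search call. The Python sketch below is one possible shape, not a recommendation from the webinar: the class name `QueryCache`, the 60-second default, and the `fetch` callback are all assumptions.

```python
import time

class QueryCache:
    """Tiny in-memory cache with a time-to-live, keyed by the full query string."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock    # injectable so tests can control time
        self.entries = {}     # query string -> (expiry time, response)

    def get(self, query, fetch):
        """Return a cached response, calling fetch(query) only on a miss or expiry."""
        now = self.clock()
        cached = self.entries.get(query)
        if cached and cached[0] > now:
            return cached[1]
        response = fetch(query)
        self.entries[query] = (now + self.ttl, response)
        return response
```

The time-to-live bounds how stale a cached result can be, so it should be chosen with the index update frequency in mind. Entries are never evicted here, so a production version would also need a size limit.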

Page 53: Dzone Webinar: Search Patterns with Amazon CloudSearch

Q&A

Page 54: Dzone Webinar: Search Patterns with Amazon CloudSearch

Resources

• Amazon CloudSearch Overview Page http://aws.amazon.com/cloudsearch/
    • Developer Guide
    • FAQs, Articles
    • Community Forum
    • Tutorial
• Free 30-day trial
• Contact: [email protected]