Build a Scalable Search Engine With the New Amazon CloudSearch

Upload: amazon-web-services

Post on 06-May-2015


DESCRIPTION

Amazon CloudSearch is a fully managed service that makes it easy to set up, operate, and scale a search solution for your website or application. Traditional search solutions are complex, and they take significant time, resources, and money to operate and maintain. Amazon CloudSearch significantly lowers the cost of a search solution and makes it easy to set up a search system that can change with the needs of the business. This session provides an overview of Amazon CloudSearch, including recently launched search and admin features, discusses popular use cases, and shares best practices for building scalable search solutions for your websites and applications.

TRANSCRIPT

Page 1: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Build a Scalable Search Engine With the New Amazon CloudSearch

Page 2: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Agenda

• What Search Engines Do

• Amazon CloudSearch Introduction

• Building With CloudSearch

Page 3: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

What Search Engines Do

Page 4: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Search Engines Connect Us To Data

Page 5: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Documents

Page 6: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Representation of a Document

Field          Value

id             tt0371746
title          Iron Man
description    When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.
director       Jon Favreau
actors         Robert Downey Jr., Gwyneth Paltrow, Terrence Howard
...
rating         7.9
release_date   2008-05-02T00:00:00Z

Page 7: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Data Types

Doubles

Dates

Signed Integers

Text

Literal

Page 8: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Geo

• Latlon data type

• Region search

• Distance sort

• Supports mobile
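The distance sort mentioned above can be illustrated with a great-circle calculation. This is a minimal Python sketch, not CloudSearch's own implementation: the place names and coordinates are made up for the example, and the haversine formula is one common way to compute distance between two latitude/longitude pairs.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def sort_by_distance(origin, places):
    """Return place names ordered from nearest to farthest from origin."""
    lat, lon = origin
    return sorted(places, key=lambda name: haversine_km(lat, lon, *places[name]))

# Hypothetical data: a user in Seattle and two candidate results.
user = (47.6062, -122.3321)
places = {"New York": (40.7128, -74.0060), "Portland": (45.5152, -122.6784)}
print(sort_by_distance(user, places))  # ['Portland', 'New York']
```

A mobile client would typically send the device location as the origin, which is presumably what the "Supports mobile" bullet refers to.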

Page 9: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Text Processing (Normalization)

• Tokenization (parsing)

• Downcasing

• Stemming

• Stopword removal

• Synonym Addition

Before: When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.

After: when wealth industrial tony stark force build armor suit after life threaten incident ultimate decide use technology fight against evil
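The normalization steps on this slide can be sketched as a small pipeline. The Python below is a toy illustration only: the stopword set and suffix rules are invented for the example and are far cruder than a real analyzer, so its output will not match the slide's stemmed text word for word. Synonym addition is left out.

```python
import re

# Tiny illustrative resources; real analyzers use larger, language-specific ones.
STOPWORDS = {"a", "an", "the", "is", "to", "he", "its", "in", "of"}
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Split text into alphanumeric tokens; hyphens and punctuation separate tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def normalize(text):
    """Tokenize, downcase, drop stopwords, then stem."""
    tokens = [t.lower() for t in tokenize(text)]
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [stem(t) for t in tokens]

print(normalize("Building the armored suits"))  # ['build', 'armor', 'suit']
```

The same normalization has to be applied to queries as to documents, otherwise a search for "suits" would never meet the indexed term "suit".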

Page 10: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Indexing

Term    Documents (Posting List)

Iron    The Man in the Iron Mask
        Iron Man 2
        Iron Man
        The Iron Giant
        The Iron Lady
        ...

Man     Rain Man
        The Man in the Moon
        Iron Man 2
        The Lawnmower Man
        The Third Man
        Iron Man
        ...
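A posting list like the one on this slide can be built in a few lines. The following Python sketch assumes whitespace tokenization and lowercasing only, which is simpler than a real indexing pipeline; the titles are the ones shown on the slide.

```python
from collections import defaultdict

TITLES = [
    "The Man in the Iron Mask",
    "Iron Man 2",
    "Iron Man",
    "The Iron Giant",
    "The Iron Lady",
    "Rain Man",
    "The Man in the Moon",
    "The Lawnmower Man",
    "The Third Man",
]

def build_index(titles):
    """Map each lowercased term to the list of titles that contain it."""
    index = defaultdict(list)
    for title in titles:
        # set() ensures a title is listed once per term even if the term repeats.
        for term in set(title.lower().split()):
            index[term].append(title)
    return index

index = build_index(TITLES)
print(index["iron"])
```

Looking up a term is then a single dictionary access, regardless of how many documents are indexed.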

Page 11: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Matching

Postings for "Iron":
  The Man in the Iron Mask
  Iron Man 2
  Iron Man
  The Iron Giant
  The Iron Lady

Postings for "Man":
  Rain Man
  The Man in the Moon
  Iron Man 2
  The Lawnmower Man
  The Third Man
  Iron Man

Matches (documents in both lists):
  Iron Man 2
  Iron Man
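Matching a multi-term query amounts to intersecting the posting lists of its terms. The Python sketch below uses the two posting lists exactly as they appear on the slides; the function name `match_all` is made up for the example.

```python
def match_all(index, query):
    """Return the titles that appear in the posting list of every query term."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)

# Posting lists as shown on the slides (both are truncated there).
index = {
    "iron": ["The Man in the Iron Mask", "Iron Man 2", "Iron Man",
             "The Iron Giant", "The Iron Lady"],
    "man": ["Rain Man", "The Man in the Moon", "Iron Man 2",
            "The Lawnmower Man", "The Third Man", "Iron Man"],
}

print(sorted(match_all(index, "iron man")))  # ['Iron Man', 'Iron Man 2']
```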

Page 12: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Ranking and Relevance

• The meat of the search engine

• TF-IDF – uniqueness and presence

• Additional Criteria

– Measures of document value (e.g. rating)

– Observed user behavior

– Freshness
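As a rough illustration of the TF-IDF idea (terms that are frequent in a document but rare across the collection count for more), here is a small Python sketch. It uses one of several common TF-IDF variants and is not the formula CloudSearch itself applies; the three titles are a hypothetical mini-collection.

```python
import math
from collections import Counter

def tf_idf_scores(query, docs):
    """Score each document by summing tf * idf over the query terms."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for other in tokenized if term in other)  # document frequency
            if df == 0 or not tokens:
                continue
            tf = counts[term] / len(tokens)       # presence within this document
            idf = math.log(n_docs / df) + 1.0     # rarity across the collection
            score += tf * idf
        scores.append(score)
    return scores

docs = ["Iron Man", "The Man in the Iron Mask", "Rain Man"]
scores = tf_idf_scores("iron man", docs)
ranked = [doc for _, doc in sorted(zip(scores, docs), reverse=True)]
print(ranked)  # ['Iron Man', 'Rain Man', 'The Man in the Iron Mask']
```

The additional criteria listed on the slide (rating, user behavior, freshness) would be blended into this baseline score rather than replace it.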

Page 13: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Summary

• Search makes data accessible

• Search documents gather information about one search target

• Inverted (reverse) indices provide the basis of text-to-text matching

• Relevance brings the best matches

Page 14: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Amazon CloudSearch

Page 15: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Building a Search service

• Build your own

– Extend datastores and build custom relevance engine

• Open Source

– Apache Solr, ElasticSearch

• Legacy Enterprise Search

– FAST, Autonomy, Endeca

Page 16: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Challenges with building a Search service

• COMPLEX: Requires extensive search expertise

• COSTLY: High upfront expenditure

• SLOW: Long time to market. Slows innovation

• UNDIFFERENTIATED: Operational overhead that doesn’t add value to core product

Page 17: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Where CloudSearch fits in the picture

Amazon CloudSearch is a fully managed search service in the cloud that makes it easy to set up, operate, and scale a search solution for your website or application

Benefits similar to other AWS managed services

• Easy to set up and operate (Console, SDK, CLT)

• Pay as you go

• No need to guess capacity

• Experiment fast with low risk

• Go Global in minutes

Page 18: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Building With CloudSearch

Page 19: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Create a Domain

Page 20: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Upload Data

Page 21: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Document Upload

http(s)://< document service endpoint >/2013-01-01/documents/batch

Accept: application/json

Content-Length: 1176

Content-Type: application/json

Host: doc.imdb-movies-rr2f34ofg56xneuemujamut52i.us-east-1.cloudsearch.amazonaws.com

[ { "type" : "add", "id" : "tt0371746", "fields" : { "directors" : [ "Jon Favreau" ], "release_date" : "2008-04-14T00:00:00Z", "rating" : 7.9, "genres" : [ "Action", "Adventure", "Sci-Fi" ], "image_url" : "http://ia.media-imdb.com/images/M/MV5BMTczNTI2ODUwOF5BMl5BanBnXkFtZTcwMTU0NTIzMw@@._V1_SX400_.jpg", "plot" : "When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.", "title" : "Iron Man", "rank" : 171, "running_time_secs" : 7560, "actors" : [ "Robert Downey Jr.", "Gwyneth Paltrow", "Terrence Howard" ], "year" : 2008 } },

{ "type" : "delete", "id" : "tt0434409" } ]
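A batch like the one on this slide could be assembled and posted from Python roughly as follows. This sketch assumes the 2013-01-01 batch format, in which each operation carries a "type" of "add" or "delete", an "id", and (for adds) a "fields" object. The helper names are made up, and `upload_batch` performs a real network call, so it is defined here but not invoked.

```python
import json
import urllib.request

def build_batch(adds, delete_ids):
    """Build a document batch: one "add" per document, one "delete" per id."""
    batch = [{"type": "add", "id": doc_id, "fields": fields}
             for doc_id, fields in adds.items()]
    batch += [{"type": "delete", "id": doc_id} for doc_id in delete_ids]
    return batch

def upload_batch(doc_endpoint, batch):
    """POST the batch to the domain's document service endpoint."""
    request = urllib.request.Request(
        f"https://{doc_endpoint}/2013-01-01/documents/batch",
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

batch = build_batch(
    {"tt0371746": {"title": "Iron Man", "year": 2008, "rating": 7.9}},
    ["tt0434409"],
)
print(json.dumps(batch, indent=2))
```

Grouping many operations into one batch keeps the number of HTTP requests low, which matters when loading a large catalog.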

Page 22: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Simple Queries

Movies > Sci-Fi/Fantasy > 2008 to 2010 > Downey > "Iron"

Page 23: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Simple Queries

http(s)://<search endpoint>/2013-01-01/search?q=iron+man

{"id": "tt0371746",

"highlights": {

"plot": "When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.",

"title": "Iron Man"} },

{"id": "tt1866249",

"highlights": {

"plot": "A man in an iron lung who wishes to lose his virginity contacts a professional sex surrogate with the help of his therapist and priest.",

"title": "The Sessions" } },

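A simple query is just a URL with a `q` parameter, so building one safely is mostly a matter of URL encoding. The Python sketch below uses a placeholder endpoint (`search-example...` is not a real domain endpoint) and a hypothetical helper name.

```python
from urllib.parse import urlencode

def search_url(endpoint, query, **params):
    """Build a 2013-01-01 search URL; extra keyword arguments become query parameters."""
    pairs = {"q": query, **params}
    return f"https://{endpoint}/2013-01-01/search?{urlencode(pairs)}"

# Placeholder endpoint for illustration only.
print(search_url("search-example.us-east-1.cloudsearch.amazonaws.com", "iron man"))
# https://search-example.us-east-1.cloudsearch.amazonaws.com/2013-01-01/search?q=iron+man
```

`urlencode` turns the space into `+`, matching the `q=iron+man` form shown on the slide.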
Page 24: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Complex Queries

Movies > Sci-Fi/Fantasy > 2008 to 2010 > Downey > "Iron"

Page 25: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Complex Queries

/search?q=(and 'iron' genres:'Sci-Fi/Fantasy' actors:'downey' year:[2008,2010] category:'Movies')&q.parser=structured&q.options={fields:['title^2','plot^0.5']}

{"id": "tt0371746",

"fields": {

"title": "Iron Man",

"year": "2008"

}},

{"id": "tt1228705",

"fields": {

"title": "Iron Man 2",

"year": "2010"

}}
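The structured query on this slide can be generated from the user's selections rather than written by hand. The Python sketch below is one possible approach with a hypothetical helper; it does not escape quotes inside values, which a production version would need to do. The `q.options` string is copied from the slide.

```python
from urllib.parse import urlencode

def structured_query(text, **filters):
    """Combine a text term and field filters into an (and ...) expression."""
    clauses = [f"'{text}'"]
    for field, value in filters.items():
        if isinstance(value, tuple):  # (low, high) becomes an inclusive range
            clauses.append(f"{field}:[{value[0]},{value[1]}]")
        else:
            clauses.append(f"{field}:'{value}'")
    return "(and " + " ".join(clauses) + ")"

q = structured_query("iron", genres="Sci-Fi/Fantasy", actors="downey",
                     year=(2008, 2010), category="Movies")
print(q)
# (and 'iron' genres:'Sci-Fi/Fantasy' actors:'downey' year:[2008,2010] category:'Movies')

# URL-encode the full parameter set before appending it to /search?
params = urlencode({
    "q": q,
    "q.parser": "structured",
    "q.options": "{fields:['title^2','plot^0.5']}",
})
```

The `title^2` and `plot^0.5` weights mean a match in the title counts for more than a match in the plot.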

Page 26: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Faceting

Movies > Sci-Fi/Fantasy > 2008 to 2010 > Downey > "Iron"

Page 27: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Feature Detail: Faceting

/search?q=iron+man&facet.genres={}

{"status": {...},"hits": {...},

"facets": {"genres": {

"buckets": [

{"value": "Action", "count": 62},

{"value": "Sci-Fi/Fantasy", "count": 25},

{"value": "Comedy", "count": 2},

{"value": "History", "count": 1},...
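On the client side, the facet buckets are typically turned into the counts shown next to each filter. The Python sketch below parses a response shaped like the one on this slide; the sample text closes the JSON that the slide truncates and leaves "status" and "hits" empty.

```python
import json

# Sample response shaped like the slide's, with the elided parts left empty.
response_text = """
{"status": {}, "hits": {},
 "facets": {"genres": {"buckets": [
   {"value": "Action", "count": 62},
   {"value": "Sci-Fi/Fantasy", "count": 25},
   {"value": "Comedy", "count": 2},
   {"value": "History", "count": 1}]}}}
"""

def facet_counts(response, field):
    """Return {value: count} for one facet field, or {} if the facet is absent."""
    buckets = response.get("facets", {}).get(field, {}).get("buckets", [])
    return {bucket["value"]: bucket["count"] for bucket in buckets}

counts = facet_counts(json.loads(response_text), "genres")
print(counts["Action"])  # 62
```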

Page 28: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Adjustable Ranking

Movies > Sci-Fi/Fantasy > 2008 to 2010 > Downey > "Iron"

Page 29: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Expressions

• Baseline TF-IDF function provides textual relevance

• Expressions use field sources or other expressions

• Allows customization per-user or per-query
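An expression can be defined in the search request itself and then used for sorting, which is how per-query customization works in practice. The Python sketch below assumes the 2013-01-01 request parameters `expr.NAME` and `sort`, and the built-in `_score` relevance value; the expression name `rated` and the formula `_score*rating` are invented for the example.

```python
from urllib.parse import urlencode

def ranked_search_params(query, name, expression):
    """Define a per-request expression and sort by it, highest value first."""
    return {"q": query, f"expr.{name}": expression, "sort": f"{name} desc"}

# Hypothetical expression: boost textual relevance by the document's rating field.
params = ranked_search_params("iron man", "rated", "_score*rating")
print(urlencode(params))
```

Because the expression travels with the request, different users or pages can rank the same index differently without reindexing.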

Page 30: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Highlighting

Movies > Sci-Fi/Fantasy > 2008 to 2010 > Downey > "Iron"

Page 31: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Feature Detail: Highlighting

/search?q=iron+man&highlight.plot={"format":"text"}

{"status": {"rid": "8Pq/88woCwrstGQ=","time-ms": 48},

"hits": {"found": 9,"start": 0,

"hit": [{

"id": "tt1228705",

"fields": {

"title": "Iron Man 2"

},

"highlights": {

"plot": "With the world now aware of his identity as *Iron* *Man*, Tony Stark must contend..."

} }, . . .
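With the "text" format, matched terms come back wrapped in asterisks, so a client usually converts them to its own markup. The Python sketch below works on a response shaped like the slide's (with the truncated JSON closed); the helper names and the choice of `<b>` tags are assumptions for illustration.

```python
import json
import re

response_text = """
{"status": {"rid": "8Pq/88woCwrstGQ=", "time-ms": 48},
 "hits": {"found": 9, "start": 0, "hit": [
   {"id": "tt1228705",
    "fields": {"title": "Iron Man 2"},
    "highlights": {"plot": "With the world now aware of his identity as *Iron* *Man*, Tony Stark must contend..."}}]}}
"""

def plot_snippets(response):
    """Yield (document id, highlighted plot) for each hit with a plot highlight."""
    for hit in response["hits"]["hit"]:
        snippet = hit.get("highlights", {}).get("plot")
        if snippet is not None:
            yield hit["id"], snippet

def emphasize(snippet):
    """Replace *term* markers with <b>term</b> for display."""
    return re.sub(r"\*([^*]+)\*", r"<b>\1</b>", snippet)

snippets = list(plot_snippets(json.loads(response_text)))
print(emphasize(snippets[0][1]))
```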

Page 32: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Movies > Sci-Fi/Fantasy > 2008 to 2010 > Downey > "Iron"

Page 33: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Feature Detail: Suggestions

http://<endpoint>/2013-01-01/suggest?q=ir&suggester=title_sug

{"status": {"rid": "t7mti80oAQrstGQ=","time-ms": 3},

"suggest": {"query": "ir", "found": 5,

"suggestions": [

{"suggestion":"Iron Man Three","score": 0,

"id": "tt0371746"},

{ "suggestion": "Iron Man", "score": 0,

"id": "tt1228705"},

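An autocomplete box would call the suggest endpoint on each keystroke and show the returned strings. The Python sketch below builds the request URL and extracts the suggestion text; the `size` parameter and helper names are assumptions beyond what the slide shows, and the sample response reuses the slide's two entries with the truncated JSON closed.

```python
import json
from urllib.parse import urlencode

def suggest_url(endpoint, prefix, suggester, size=5):
    """Build a 2013-01-01 suggest URL for the given prefix and suggester."""
    params = urlencode({"q": prefix, "suggester": suggester, "size": size})
    return f"http://{endpoint}/2013-01-01/suggest?{params}"

def suggestion_texts(response):
    """Return the suggested strings in the order the service ranked them."""
    return [item["suggestion"] for item in response["suggest"]["suggestions"]]

sample = json.loads("""
{"status": {"rid": "t7mti80oAQrstGQ=", "time-ms": 3},
 "suggest": {"query": "ir", "found": 5, "suggestions": [
   {"suggestion": "Iron Man Three", "score": 0, "id": "tt0371746"},
   {"suggestion": "Iron Man", "score": 0, "id": "tt1228705"}]}}
""")

print(suggest_url("example.com", "ir", "title_sug"))
print(suggestion_texts(sample))  # ['Iron Man Three', 'Iron Man']
```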
Page 34: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Feature Detail: Availability Options

Page 35: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Feature Detail: Scaling Options

Page 36: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Feature Detail: IAM Integration

Configuration API Only

{

"Version":"2012-10-17",

"Statement": [

{ "Effect": "Allow",

"Action": ["cloudsearch:*"],

"Resource": "arn:aws:cloudsearch:us-east-1:111122223333:domain/imdb-movies" },

{ "Effect": "Deny",

"Action": ["cloudsearch:DeleteDomain"],

"Resource": "arn:aws:cloudsearch:us-east-1:111122223333:domain/imdb-movies" }

]

}
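The policy above allows every CloudSearch action on the domain and then explicitly denies DeleteDomain; in IAM an explicit Deny overrides any Allow. The Python sketch below is a deliberately simplified model of that rule, not the real IAM evaluation logic: it matches resources by exact string and ignores conditions, principals, and case-insensitive action names.

```python
from fnmatch import fnmatchcase

DOMAIN_ARN = "arn:aws:cloudsearch:us-east-1:111122223333:domain/imdb-movies"

POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["cloudsearch:*"], "Resource": DOMAIN_ARN},
        {"Effect": "Deny", "Action": ["cloudsearch:DeleteDomain"], "Resource": DOMAIN_ARN},
    ],
}

def is_allowed(policy, action, resource):
    """Toy evaluation: an explicit Deny wins; otherwise at least one Allow is needed."""
    allowed = False
    for statement in policy["Statement"]:
        if statement["Resource"] != resource:
            continue
        if any(fnmatchcase(action, pattern) for pattern in statement["Action"]):
            if statement["Effect"] == "Deny":
                return False
            allowed = True
    return allowed

print(is_allowed(POLICY, "cloudsearch:DescribeDomains", DOMAIN_ARN))  # True
print(is_allowed(POLICY, "cloudsearch:DeleteDomain", DOMAIN_ARN))     # False
```

The effect is an administrator who can configure the domain freely but cannot delete it by accident.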

Page 37: AWS Webcast - Build a Scalable Search Engine with the New Amazon CloudSearch

Closing Thoughts

• Content discovery goes hand in hand with content. Search is everywhere!

• CloudSearch is a fully managed, easy-to-use, cost-effective search service

• Get the powerful search features found in open source engines (Apache Solr) combined with value-added AWS features (easy setup, on-demand pricing, auto scaling, Multi-AZ, global availability)