
Page 1: Clustering as presented at UX Poland 2013

Copyright © President & Fellows of Harvard College.

Ravi Mynampaty

Categorizing Your Search Queries to Improve Findability

Page 2: Clustering as presented at UX Poland 2013

About this talk…

Case study on how we are improving search and browse by performing clustering exercises on search query data

Not rocket science

High-level overview

You can follow this method, with your own insights and tweaks

You can kick this off next week at your work

Page 3: Clustering as presented at UX Poland 2013

Inspired by…

• Chapters 8 & 9

• The power of incrementalism

Page 4: Clustering as presented at UX Poland 2013

What is clustering?

A process for organizing and analyzing search log data that:

Is repeatable, low-cost, scalable, simple

Yields actionable results

Supports constant incremental improvement to search

Page 5: Clustering as presented at UX Poland 2013

What’s clustering good for?

Ensure results for high-frequency queries

Improve metadata and taxonomy

Inform and validate decision making in site IA

Inform editorial/curatorial activities

Provide feedback for search suggestions

o Autosuggest, synonym lists, no-hits page suggestions

But more on this later...

Page 6: Clustering as presented at UX Poland 2013

So how do I cluster search queries?

A simple set of steps

Create query report

Cluster queries

Determine # queries to analyze

Analyze clusters

Draw conclusions and ACT

Page 7: Clustering as presented at UX Poland 2013

Step 1: Create a query report

We started with the site with the most traffic

• Upper-bound limit

• One year’s data by quarter

• Cut off tail at frequency < 10
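A minimal sketch of the report step, assuming the search log has already been exported as a plain list of query strings. The function name `build_query_report` and the toy log are illustrative only; the talk does not say what tool produced the report.

```python
from collections import Counter

def build_query_report(raw_queries, min_frequency=10):
    """Count each distinct query and drop the tail below min_frequency."""
    counts = Counter(q.strip() for q in raw_queries if q.strip())
    # most_common() orders by descending count; keep only the head.
    return [(q, n) for q, n in counts.most_common() if n >= min_frequency]

# Toy log with made-up counts, not data from the talk.
log = ["innovation"] * 12 + ["leadership"] * 10 + ["zara case"] * 3
report = build_query_report(log)
print(report)
```

The default threshold of 10 mirrors the "frequency < 10" cut-off on the slide; a different site may need a different cut-off.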

Page 9: Clustering as presented at UX Poland 2013

Step 1: Create a query report

HBS Working Knowledge FY12 Use Snapshot

Overall Traffic

Page Views: 6,439,485

Visits: 3,635,746

Unique visitors: 2,734,620

On-site searches: 174,425

Views per Visit: 1.77

Local Search visit rate: 5%

Organic Search visit rate: 46%
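As a quick check on the snapshot, the derived figures can be recomputed from the raw counts. The slide does not define "Local Search visit rate"; the sketch below assumes it is roughly on-site searches divided by visits, which is only one possible reading.

```python
page_views = 6_439_485
visits = 3_635_746
on_site_searches = 174_425

views_per_visit = page_views / visits
# Assumption: treat each on-site search as belonging to a separate visit.
searches_per_100_visits = 100 * on_site_searches / visits

print(round(views_per_visit, 2))          # matches the 1.77 on the slide
print(round(searches_per_100_visits, 1))  # close to the 5% on the slide
```

Under that assumption the search rate comes out slightly below 5%, which is consistent with the rounded figure shown.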

Page 10: Clustering as presented at UX Poland 2013

Step 2: Cluster the queries

Page 11: Clustering as presented at UX Poland 2013

Step 2 (cont’d): Three levels of clustering

Level | Method | Example
--- | --- | ---
Narrow | Simple normalization | Eliminate grammatical, spelling, typo, and punctuation differences
Mid-level | Group by subject | management, finance, decision making
Broad | Group by facet | topic, name, date, content type
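The narrow level can be sketched in a few lines. This is an illustration under assumptions, not the process used in the talk: `normalize`, `narrow_cluster`, the hand-maintained `variants` map, and the counts are all hypothetical.

```python
import re
from collections import Counter

def normalize(query):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    q = re.sub(r"[^\w\s]", " ", query.lower())
    return re.sub(r"\s+", " ", q).strip()

def narrow_cluster(report, variants=None):
    """Merge the counts of queries that normalize to the same string.

    variants maps known misspellings or grammatical variants to a
    preferred form; it is maintained by hand (hypothetical here).
    """
    variants = variants or {}
    merged = Counter()
    for query, count in report:
        key = normalize(query)
        merged[variants.get(key, key)] += count
    return merged.most_common()

# Made-up counts for illustration.
report = [("Balanced Scorecard", 40), ("balanced scorecard.", 12),
          ("balanced score card", 9), ("Leadership", 30)]
clusters = narrow_cluster(report, {"balanced score card": "balanced scorecard"})
print(clusters)
```

Case and punctuation differences collapse automatically; spelling and grammatical variants still need a human-curated map.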

Page 12: Clustering as presented at UX Poland 2013

Step 2 (cont’d): Levels Tasks Enabled

Columns: Level; Improve your base for query analysis; Ensure representation of major clusters on your site; Improve Metadata/Index/Taxonomy; Improve Search Suggestions

Narrow (simple): X X X

Mid-level (group by subject): X X X

Broad (group by facet): X X

Page 13: Clustering as presented at UX Poland 2013

Step 2 (cont’d): Narrow Clustering Example

Page 14: Clustering as presented at UX Poland 2013

Step 2 (cont’d): Mid-level Example

Cluster: brand

Query | Frequency
--- | ---
branding | 245
brand | 160
brand management | 73
consumer branding | 57
global brand | 32
service brands | 24
brand image retail bank | 17
employer branding | 16
brand management professional services | 16
global branding | 13
b2b branding | 13
importance of branding | 12
brand 2002 | 12
brand equity | 11
brand image | 11
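One way to approximate this grouping is to assign a query to a subject when any of its words starts with a subject stem. The talk relied on human judgment for mid-level clusters, so the stem matching below (`subject_of`, `stems`) is a hypothetical shortcut; the query counts are the ones listed on the slide.

```python
BRAND_QUERIES = {
    "branding": 245, "brand": 160, "brand management": 73,
    "consumer branding": 57, "global brand": 32, "service brands": 24,
    "brand image retail bank": 17, "employer branding": 16,
    "brand management professional services": 16, "global branding": 13,
    "b2b branding": 13, "importance of branding": 12, "brand 2002": 12,
    "brand equity": 11, "brand image": 11,
}

def subject_of(query, stems):
    """Return the first subject whose stem starts any word of the query."""
    for subject, stem in stems.items():
        if any(word.startswith(stem) for word in query.lower().split()):
            return subject
    return None

stems = {"brand": "brand"}  # hypothetical stem list
cluster = {q: n for q, n in BRAND_QUERIES.items()
           if subject_of(q, stems) == "brand"}
total = sum(cluster.values())
print(len(cluster), total)
```

Summing the listed frequencies gives the cluster's weight, which is what later steps rank on.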

Page 15: Clustering as presented at UX Poland 2013

Step 2 (cont’d): Broad Clustering Example

Page 16: Clustering as presented at UX Poland 2013

Step 2 (cont’d): List of facets we used

Facet | Example
--- | ---
content type | case studies, cases, working papers, articles, newspaper
date | 2011, world in 2030
demographic characteristics | women, Gen Y, gender, baby boomers
event | economic crisis
format | podcast, video
geographic area | india, japan, mount everest
industry | global wine industry
job type/role | independent director, entrepreneur, ceo, phd economist
organization name | ikea, zara, toyota
person name | michael porter, kanter, sebenius
product name / brand name | ipad
product/commodity | coffee, wine, cement
topic | this covers the majority of keywords
work | faculty work, ex: publication name, title of a case
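Facet assignment can be sketched as a lookup against hand-built example lists, falling back to "topic" since that facet covers most keywords. The lists below reuse a few examples from the table; `facet_of` and the exact-match rule are assumptions made for illustration.

```python
FACET_EXAMPLES = {
    "content type": {"case studies", "cases", "working papers", "articles"},
    "format": {"podcast", "video"},
    "geographic area": {"india", "japan", "mount everest"},
    "organization name": {"ikea", "zara", "toyota"},
    "person name": {"michael porter", "kanter", "sebenius"},
}

def facet_of(query, default="topic"):
    """Return the facet whose example list contains the query exactly."""
    q = query.lower().strip()
    for facet, examples in FACET_EXAMPLES.items():
        if q in examples:
            return facet
    return default  # anything unrecognized is treated as a topic

print(facet_of("Zara"))
print(facet_of("innovation"))
```

In practice the example lists would grow as more of the query report is reviewed by hand.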

Page 17: Clustering as presented at UX Poland 2013

Step 3: Choose # of clusters to analyze

Number of Clusters Analyzed | Analyze Top Hits | Improve Metadata/Taxonomy/Index | Supply Search Suggestions
--- | --- | --- | ---
50 | X | |
150 | X | X |
300+ | X | X | X

Page 18: Clustering as presented at UX Poland 2013

A small # of clusters can cover a lot of your data

Number of top clusters | % of total queries
--- | ---
Top 20 clusters | 14
Top 30 clusters | 18
Top 50 clusters | 26
Top 100 clusters | 37
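The percentages above are cumulative shares of total query volume. A minimal sketch of that calculation, with a hypothetical helper `coverage` and made-up cluster totals:

```python
def coverage(cluster_totals, top_n):
    """Percent of all queries accounted for by the top_n largest clusters."""
    ordered = sorted(cluster_totals, reverse=True)
    return 100 * sum(ordered[:top_n]) / sum(ordered)

# Made-up cluster totals for illustration.
totals = [50, 30, 10, 5, 5]
print(coverage(totals, 2))
print(coverage(totals, 5))
```

Running this over your own cluster totals for several values of `top_n` shows how quickly the short head accumulates.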

Page 19: Clustering as presented at UX Poland 2013

Now you have your clusters…

What do you do with them?

TAKE ACTION!

Page 20: Clustering as presented at UX Poland 2013

Analyze Top (“Short Head”) Clusters

Clustering has created a condensed and reliable list of your top search queries

Are they what you thought they would be?

Does the information on your site accurately represent the top searches?

Are you fulfilling user needs?

Page 21: Clustering as presented at UX Poland 2013

Use your clusters: Improve Site Navigation

Examine the short head of clusters; basically:

For each cluster, add up the frequencies of queries

Reorder clusters by cumulative frequency, descending

Ensure top clusters are accounted for in your navigation

Use cluster topics as browse/navigation headers/footers for your website
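The add-up-and-reorder step can be sketched as follows, assuming each row already carries a cluster label. `rank_clusters` is a hypothetical name; the "brand" rows come from the mid-level example, while the "pricing" rows are invented for contrast.

```python
from collections import Counter

def rank_clusters(rows):
    """rows: (cluster, query, frequency) tuples.

    Returns (cluster, total frequency) pairs, largest total first.
    """
    totals = Counter()
    for cluster, _query, freq in rows:
        totals[cluster] += freq
    return totals.most_common()

rows = [
    ("pricing", "pricing", 120),           # made-up
    ("brand", "branding", 245),
    ("brand", "brand", 160),
    ("pricing", "pricing strategy", 90),   # made-up
    ("brand", "brand management", 73),
]
ranking = rank_clusters(rows)
print(ranking)
```

The head of this ranking is the list to compare against the site's navigation labels.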

Page 22: Clustering as presented at UX Poland 2013

WK Top Clusters

Cluster | Frequency
--- | ---
innovation | 867
balanced scorecard | 794
leadership | 570
cases | 545
social media | 508
negotiation | 470
knowledge management | 457
ethics | 448
apple | 430
corporate social responsibility | 398

Page 23: Clustering as presented at UX Poland 2013

Use your clusters: Improve Taxonomy

• Missing categories in browse taxonomy

o “Balanced Scorecard”

o “Ethics”

o “Social media”

• Second-level topics in the WK context
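Finding missing categories amounts to comparing the top clusters with the existing browse taxonomy. In this sketch the cluster list is the WK top-cluster list from the talk, but `browse_taxonomy` is hypothetical (the real WK taxonomy is not given), and treating "cases" and "apple" as non-topic facets is an assumption based on the facet list.

```python
top_clusters = [
    ("innovation", 867), ("balanced scorecard", 794), ("leadership", 570),
    ("cases", 545), ("social media", 508), ("negotiation", 470),
    ("knowledge management", 457), ("ethics", 448), ("apple", 430),
    ("corporate social responsibility", 398),
]

# Hypothetical existing topic taxonomy, for illustration only.
browse_taxonomy = {"Innovation", "Leadership", "Negotiation",
                   "Knowledge Management", "Corporate Social Responsibility"}

# Assumption: these clusters belong to other facets, not to "topic".
non_topic = {"cases", "apple"}

def missing_categories(clusters, taxonomy, skip=()):
    """Top clusters, in frequency order, that have no taxonomy category."""
    known = {t.lower() for t in taxonomy}
    return [name for name, _ in clusters
            if name not in skip and name.lower() not in known]

gaps = missing_categories(top_clusters, browse_taxonomy, skip=non_topic)
print(gaps)
```

With these assumed inputs the gaps are the same three topics named on the slide; with a real taxonomy the result would of course differ.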

Page 27: Clustering as presented at UX Poland 2013

Mid-level clustering: Informs editorial/curatorial activities

“Featured Topics”

o What topics to highlight this week/month/year

o News items to focus on

o What research guides to create

o How to formulate queries for the topics

Page 28: Clustering as presented at UX Poland 2013

How about improving search?

Clustered list provides synonyms for taxonomy

Requires human judgment and standards/guidelines for synonyms; in our case, synonyms are exact

Map to one "like term" in the search engine

Example:

Balanced Scorecard, BSC, Balanced score card

kaplan and norton -> Balanced Scorecard
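The mapping to one "like term" can be sketched as an exact-match lookup, in keeping with the note that synonyms are exact. `LIKE_TERMS`, `build_lookup`, and `rewrite` are hypothetical names; how a particular search engine is configured with such a list is not covered here.

```python
# Preferred term -> exact variants, taken from the example above.
LIKE_TERMS = {
    "balanced scorecard": ["bsc", "balanced score card", "kaplan and norton"],
}

def build_lookup(like_terms):
    """Flatten the curated list into variant -> preferred term."""
    lookup = {}
    for preferred, variants in like_terms.items():
        lookup[preferred] = preferred
        for variant in variants:
            lookup[variant] = preferred
    return lookup

def rewrite(query, lookup):
    """Return the preferred term on an exact match, else the query unchanged."""
    key = " ".join(query.lower().split())
    return lookup.get(key, query)

lookup = build_lookup(LIKE_TERMS)
print(rewrite("BSC", lookup))
print(rewrite("innovation", lookup))
```

Because matching is exact, a longer query that merely contains "BSC" is left alone; broadening that would be a deliberate editorial decision.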

Page 29: Clustering as presented at UX Poland 2013

Use your clusters: Improve no-hits page

Page 30: Clustering as presented at UX Poland 2013

Time Commitment

• 2 hours to 2 weeks

• Variables include:

• What kind of information you want to gather

• How broad or narrow you want your clusters

• How many queries you analyze

• In our case ~2 person-weeks

Page 31: Clustering as presented at UX Poland 2013

Results vs. Time Invested

Time invested | Analyze top clusters | Update Taxonomy | Create New Metadata | Determine New Search Suggestions
--- | --- | --- | --- | ---
2 Hours | X | X | |
6 Hours | X | X | X |
One Week | X | X | X | X

Page 32: Clustering as presented at UX Poland 2013

Next Steps: Autosuggest

Your top clusters probably make up a large percentage of what people are looking for

o Use them to establish/supplement auto-suggest!

Example: suggestions for “innovation”

o innovation and leadership

o disruptive innovation

o innovation management

o open innovation
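A minimal sketch of seeding auto-suggest from clustered queries: return the most frequent queries that contain what the user has typed. The function `suggest` and the counts are made up; only the suggestion strings come from the example above.

```python
def suggest(typed, query_counts, limit=4):
    """Most frequent queries containing the typed text, excluding the text itself."""
    t = typed.lower().strip()
    matches = [(q, n) for q, n in query_counts.items() if t in q and q != t]
    # Highest frequency first; break ties alphabetically for stable output.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]

# Made-up counts for illustration.
query_counts = {
    "innovation": 300, "innovation and leadership": 40,
    "disruptive innovation": 35, "innovation management": 30,
    "open innovation": 20, "leadership": 100,
}
suggestions = suggest("innovation", query_counts)
print(suggestions)
```

Substring matching is used rather than prefix matching because the example suggestions include queries such as "disruptive innovation" that do not start with the typed word.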

Page 33: Clustering as presented at UX Poland 2013

Next Steps: New Access Structures

Needed an obvious way to search podcasts

o Put in best bets for now

A lot of people searching for article titles

o Considering simple interface/approach for select field-specific search, e.g. “title”

Consider adding other facets to browse taxonomy where we have entities tagged

o “company name”, “job type/class”, etc.

Page 34: Clustering as presented at UX Poland 2013

Summary

Establish a plan/process, but be willing to tweak as you go

Keep it very simple.

Play with your data: the more we played, the better we understood what benefits could be realized by levels of clustering and effort

Tuning process/results

o Build staging/working prototypes

o Repeat process on other sites

Page 35: Clustering as presented at UX Poland 2013

Thank you! And remember…TAKE ACTION!

Kropla drąży skałę! (Polish proverb: a drop hollows out the rock.)

Questions?

[email protected]

@ravimynampaty

http://www.slideshare.net/mynampaty/