1

Using Words to Search a Thousand Images

Hierarchical Faceted Metadata in Search & Browsing

Marti Hearst
SIMS, UC Berkeley

Research funded by: NSF CAREER Grant IIS-9984741

2

Outline

• How do people search for images?
• Current approaches:
  – Spatial similarity
  – Keywords
• Our approach:
  – Hierarchical Faceted Metadata
  – Very careful UI design and testing
• Usability Study
• Conclusions

3

How do people want to search and browse images?

Ethnographic studies of people who use images intensely find:
– Finding specific objects is easy
  • Find images of the Empire State Building
– Browsing is hard, and people want to use rich descriptors.

4

Ethnographic Studies

• Garber & Grunes ’92
  – Art directors, art buyers, stock photo researchers
  – Search for appropriate images is iterative
  – After specifying and weighting criteria, searchers view retrieved images, then
    • Add restrictions
    • Change criteria
    • Redefine search
  – Concept starts out loosely defined, then becomes more refined.

5

Ethnographic Studies

• Markkula & Sormunen ’00
  – Journalists and newspaper editors
  – Choosing photos from a digital archive
    • Stressed a need for browsing
    • Searching for specific objects is trivial
    • Photos need to deal with themes, places, types of objects, views
  – Had access to a powerful interface, but it had 40 entry forms and was generally hard to use; no one used it.

6

Query Study

• Armitage & Enser ’97
  – Analyzed 1,749 queries submitted to 7 image and film archives
  – Classified queries into a 3x4 facet matrix
    • Rio Carnivals: Geo Location x Kind of Event
  – Conclude that users want to search images according to combinations of topical categories.

7

Ethnographic Study

• Ame Elliott ’02
  – Architects
• Common activities:
  – Use images for inspiration
    • Browsing during early stages of design
  – Collage making, sketching, pinning up on walls
    • This is different than illustrating PowerPoint
• Maintain sketchbooks & shoeboxes of images
  – Young professionals have ~500, older ~5k
• No formal organization scheme
  – None of 10 architects interviewed about their image collections used indexes
• Do not like to use computers to find images

8

Current Approaches to Image Search

• Using Visual “Content”
  – Extract color, texture, shape
    • QBIC (Flickner et al. ’95)
    • Blobworld (Carson et al. ’99)
    • Body Plans (Forsyth & Fleck ’00)
    • Piction: images + text (Srihari et al. ’91, ’99)
  – Two uses:
    • Show a clustered similarity space
    • Show those images similar to a selected one
  – Usability studies:
    • Rodden et al.: a series of studies
    • Clusters don’t work; showing textual labels is promising.

9

Rodden et al., CHI 2001

10


11


12

Current Approaches to Image Search

• Keyword based
  – WebSeek (Smith and Jain ’97)
  – Commercial image vendors (Corbis, Getty)
  – Commercial web image search systems
  – Museum web sites

13

A Disconnect

Why are image search systems built so differently from what people want?

– An image is worth a thousand words.
– But the converse has merit too!

14

Some Challenges

• Users don’t like new search interfaces.

• How to show lots more information without overwhelming or confusing?

15

Our Approach

• Integrate the search seamlessly into the information architecture.

• Use proper HCI methodologies.
• Use faceted metadata

16

Faceted Metadata

17

What are facets?

• Sets of categories, each of which describes a different aspect of the objects in the collection.
• Each of these can be hierarchical.
• (Not necessarily mutually exclusive nor exhaustive, but often that is a goal.)

Time/Date, Topic, Role, GeoRegion

18

Facet example: Recipes

Course: Main Course
Cooking Method: Stir-fry
Cuisine: Thai
Ingredient: Red Bell Pepper, Curry, Chicken

19

Goal: assign labels from facets

20

Motivation

Description: 19th c. paint horse; saddle and hackamore; spurs; bandana on rider; old time cowboy hat; underchin thong; flying off.

Nature > Animal > Mammal > Horse
Occupations > Cowboy
Clothing > Hats > Cowboy Hat
Media > Engraving > Wood Eng.
Location > North America > America
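One way to picture the labels on this slide is as paths, each running from a facet down through its hierarchy to a category. The following is a minimal sketch under that assumption; the tuple layout and the function names `labels_by_facet` and `matches` are illustrative and are not taken from the Flamenco code.

```python
from collections import defaultdict

# Labels for the example image, written as (facet, level 1, level 2, ...) paths.
# The tuple representation is an assumption made for illustration.
image_labels = [
    ("Nature", "Animal", "Mammal", "Horse"),
    ("Occupations", "Cowboy"),
    ("Clothing", "Hats", "Cowboy Hat"),
    ("Media", "Engraving", "Wood Eng."),
    ("Location", "North America", "America"),
]


def labels_by_facet(paths):
    """Group label paths by facet (the first element of each path)."""
    grouped = defaultdict(list)
    for path in paths:
        grouped[path[0]].append(path[1:])
    return dict(grouped)


def matches(paths, prefix):
    """Return True if any label lies at or below the category named by prefix."""
    prefix = tuple(prefix)
    return any(path[:len(prefix)] == prefix for path in paths)


print(labels_by_facet(image_labels)["Clothing"])
print(matches(image_labels, ("Nature", "Animal")))
```

Because a match only needs a shared prefix, an image labeled Horse is also found when browsing the broader Animal or Mammal categories, which is what makes the hierarchy useful for browsing.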

21

Hierarchical Faceted Metadata

• A simplification of knowledge representation

• Does not represent relationships directly

• BUT can be understood well by many people when browsing rich collections of information.

22

How to Put In an Interface?

Some Challenges:

• Users don’t like new search interfaces.

• How to show lots of information without overwhelming or confusing?

23

A Solution (The Flamenco Project)

• Use proper HCI methods.

• Organize search results according to the faceted metadata so navigation looks similar throughout

– Easy to see where to go next, where you’ve been

– Avoids empty result sets

– Integrates seamlessly with keyword search
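The point about avoiding empty result sets can be illustrated with query previews: for the current results, count how many items fall under each category of every facet not yet constrained, and offer only categories with a non-zero count. This is a minimal sketch with a hypothetical toy collection; `ITEMS`, `select`, and `query_previews` are illustrative names, not Flamenco's API, and each item carries a single flat category per facet for brevity.

```python
from collections import Counter

# Hypothetical toy collection: item id -> {facet: category}.
ITEMS = {
    1: {"Media": "Woodcut", "Location": "US", "Date": "1930s"},
    2: {"Media": "Woodcut", "Location": "Japan", "Date": "1850s"},
    3: {"Media": "Etching", "Location": "US", "Date": "1930s"},
    4: {"Media": "Woodcut", "Location": "US", "Date": "1940s"},
}


def select(items, constraints):
    """Return ids of items that satisfy every facet=category constraint."""
    return [
        item_id
        for item_id, labels in items.items()
        if all(labels.get(facet) == cat for facet, cat in constraints.items())
    ]


def query_previews(items, constraints):
    """For each unconstrained facet, count current results per category."""
    previews = {}
    for item_id in select(items, constraints):
        for facet, category in items[item_id].items():
            if facet in constraints:
                continue
            previews.setdefault(facet, Counter())[category] += 1
    return previews


previews = query_previews(ITEMS, {"Media": "Woodcut"})
for facet, counts in previews.items():
    print(facet, dict(counts))
```

Every category that appears in a preview is backed by at least one item in the current result set, so following any offered link cannot lead to an empty page.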

33

What is Tricky About This?

• It is easy to do it poorly
  – See Yahoo example
• It is hard not to be overwhelming
  – Most users prefer simplicity unless complexity really makes a difference
• It is hard to “make it flow”
  – Can it feel like “browsing the shelves”?

35

Using HCI Methods (Human-Computer Interaction)

• Identify Target Population
  – Architects, city planners
• Needs assessment.
  – Interviewed architects and conducted contextual inquiries.
• Lo-fi prototyping.
  – Showed paper prototype to 3 professional architects.
• Design / Study Round 1.
  – Simple interactive version. Users liked metadata idea.
• Design / Study Round 2.
  – Developed 4 different detailed versions; evaluated with 11 architects; results somewhat positive but many problems identified. Matrix emerged as a good idea.

36

Method (cont)

• Metadata revision.
  – Compressed and simplified the metadata hierarchies
• Design / Study Round 3.
  – New version based on results of Round 2
  – Highly positive user response
• Identified new user population/collection
  – Students and scholars of art history
  – Fine arts images
• Study Round 4
  – Compare the metadata system to a strong, representative baseline

37

Final Usability Study

• Participants & Collection
  – 32 Art History Students
  – ~35,000 images from SF Fine Arts Museum
• Study Design
  – Within-subjects
    • Each participant sees both interfaces
    • Balanced in terms of order and tasks
  – Participants assess each interface after use
  – Afterwards they compare them directly
• Data recorded in behavior logs, server logs, paper-surveys; one or two experienced testers at each trial.
• Used 9 point Likert scales.
• Session took about 1.5 hours; pay was $15/hour

38

The Baseline System

• Floogle
• Take the best of the existing keyword-based image search systems

39

43

Evaluation Quandary

• How to assess the success of browsing?
  – Timing is usually not a good indicator
  – People often spend longer when browsing is going well.
    • Not the case for directed search
  – Can look for comprehensiveness and correctness (precision and recall) …
  – … But subjective measures seem to be most important here.
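For reference, precision and recall can be computed from the set of images a participant retrieved and the set judged relevant. This is a minimal sketch using the standard definitions; the function name and the example ids are hypothetical, and the study's own scoring procedure is not described on the slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of item ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: share of retrieved items that are relevant (correctness).
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant items that were retrieved (comprehensiveness).
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 images retrieved, 3 relevant overall, 2 in common.
print(precision_recall([1, 2, 3, 4], [2, 4, 5]))
```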

44

Hypotheses

• We attempted to design tasks to test the following hypotheses:
  – Participants will experience greater search satisfaction, feel greater confidence in the results, produce higher recall, and encounter fewer dead ends using FC over Baseline
  – FC will be perceived to be more useful and flexible than Baseline
  – Participants will feel more familiar with the contents of the collection after using FC
  – Participants will use FC to create multi-faceted queries

45

Four Types of Tasks

– Unstructured (3): Search for images of interest
– Structured Task (11-14): Gather materials for an art history essay on a given topic, e.g.
  • Find all woodcuts created in the US
  • Choose the decade with the most
  • Select one of the artists in this period and show all of their woodcuts
  • Choose a subject depicted in these works and find another artist who treated the same subject in a different way.
– Structured Task (10): Compare related images
  • Find images by artists from 2 different countries that depict conflict between groups.
– Unstructured (5): Search for images of interest

46

Other Points

• Participants were NOT walked through the interfaces.

• The wording of Task 2 reflected the metadata; not the case for Task 3

• Within tasks, queries were not different in difficulty (t’s < 1.7, p > 0.05, according to post-task questions)
• Flamenco is an order of magnitude slower than Floogle on average.
  – In Task 2 users were allowed 3 more minutes in FC than in Baseline.
  – Time spent in Tasks 2 and 3 was significantly longer in FC (about 2 min more).

47

Results

• Participants felt significantly more confident they had found all relevant images using FC (Task 2: t(62)=2.18, p<.05; Task 3: t(62)=2.03, p<.05)

• Participants felt significantly more satisfied with the results (Task 2: t(62)=3.78, p<.001; Task 3: t(62)=2.03, p<.05)

• Recall scores:
  – Task 2a: In Baseline 57% of participants found all relevant results, in FC 81% found all.
  – Task 2b: In Baseline 21% found all relevant, in FC 77% found all.
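The slides do not say which test produced the t values above. The reported 62 degrees of freedom are consistent with a pooled two-sample t-test on 32 ratings per interface (32 + 32 − 2 = 62), so the sketch below assumes that test. The ratings are invented for illustration and are not the study data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical 9-point Likert ratings from 32 participants per interface.
fc_ratings = [7] * 16 + [9] * 16
baseline_ratings = [5] * 16 + [7] * 16

t, df = pooled_t(fc_ratings, baseline_ratings)
print(f"t({df}) = {t:.2f}")
```

Since the design is within-subjects, a paired test (with 31 degrees of freedom) would be the other natural choice; the pooled form is shown only because it matches the degrees of freedom reported.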

48

Post-Interface Assessments

All significant at p<.05 except simple and overwhelming

49

Perceived Uses of Interfaces

[Bar chart: What is interface useful for? Ratings on a 0.00–9.00 axis for Baseline and FC (legend also lists SHASTA and DENALI) across four items: useful for my coursework; useful for exploring an unfamiliar collection; useful for finding a particular image; useful for seeing relationships b/w images. Data labels as extracted: 6.44, 5.47, 5.91, 4.91, 7.97, 7.91, 6.64, 6.16.]

50

Post-Test Comparison

Which Interface Preferable For:             Baseline   FC
Find images of roses                           15      16
Find all works from a given period              2      30
Find pictures by 2 artists in same media        1      29

Overall Assessment:                         Baseline   FC
More useful for your tasks                      4      28
Easiest to use                                  8      23
Most flexible                                   6      24
More likely to result in dead ends             28       3
Helped you learn more                           1      31
Overall preference                              2      29

51

Facet Usage

• Facets driven largely by task content
  – Multiple facets 45% of time in structured tasks
• For unstructured tasks,
  – Artists (17%)
  – Date (15%)
  – Location (15%)
  – Others ranged from 5-12%
  – Multiple facets 19% of time
• From end game, expansion from
  – Artists (39%)
  – Media (29%)
  – Shapes (19%)

52

Qualitative Observations

• Baseline:
  – Simplicity, similarity to Google a plus
  – Also noted the usefulness of the category links
• FC:
  – Starting page “well-organized”, gave “ideas for what to search for”
  – Query previews were commented on explicitly by 9 participants
  – Commented on matrix prompting where to go next
    • 3 were confused about what the matrix shows
  – Generally liked the grouping and organizing
  – End game links seemed useful; 9 explicitly remarked positively on the guidance provided there.
  – Often get requests to use the system in future

53

Study Results Summary

• Overwhelmingly positive results for the faceted metadata interface.

• Somewhat heavy use of multiple facets.

• Strong preference over the current state of the art.

• This result not seen in similarity-based image search interfaces.

• Hypotheses are supported.

54

Implementation

• All open source code
  – MySQL database
  – Python web server (Webkit)
  – Python code
  – Lucene search engine (Java)

55

Summary and Conclusions

56

Summary

• We have addressed several interface problems:
  – How to seamlessly integrate metadata previews with search
    • Show search results in metadata context
    • “Disambiguate” search terms
  – How to show hierarchical metadata from several facets
    • The “matrix” view
    • Show one level of depth in the “matrix” view
  – How to handle large metadata categories
    • Use intermediate pages
  – How to support expanding as well as refining
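Refining and expanding can be pictured as inverse operations on a query that holds one category path per facet. The following is a minimal sketch under that assumption; the dictionary layout and the function names `refine` and `expand` are illustrative and do not come from the Flamenco code.

```python
def refine(query, facet, category):
    """Narrow the query by descending one level within the given facet."""
    updated = dict(query)
    updated[facet] = query.get(facet, ()) + (category,)
    return updated


def expand(query, facet):
    """Broaden the query by moving up one level within the given facet.

    The facet is dropped from the query once its path becomes empty.
    """
    updated = dict(query)
    path = updated.get(facet, ())[:-1]
    if path:
        updated[facet] = path
    else:
        updated.pop(facet, None)
    return updated


q = refine({}, "Location", "North America")
q = refine(q, "Location", "America")
q = refine(q, "Media", "Engraving")
print(q)
print(expand(q, "Location"))
```

Both functions return a new query rather than modifying the old one, so earlier states stay available for showing where the user has been.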

57

Summary

• Usability studies done on 3 collections:
  – Recipes: 13,000 items
  – Architecture Images: 40,000 items
  – Fine Arts Images: 35,000 items
• Conclusions:
  – Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
  – Very positive results, in contrast with studies on earlier iterations
  – Note: it seems you have to care about the contents of the collection to like the interface

58

Summary

• Validating an approach to web site search
  – Use hierarchical faceted metadata dynamically, integrated with search
• Many difficult design decisions
  – Iterating and testing was key
• Bits and pieces were there in industry
  – The approach is being picked up all over
  – There are providers (Endeca, Siderean)

59

Advantages of the Approach

• Supports different search types
  – Highly constrained known-item searches
  – Open-ended, browsing tasks
  – Can easily switch from one mode to the other midstream
  – Can both expand and refine
• Allows different people to add content without breaking things
• Can make use of standard technology

60

Some Unanswered Questions

• How to integrate with relevance feedback (more like this)?
  – Would like to use Blobworld-like features

• How to incorporate user preferences and past behavior?

• How to combine facets to reflect tasks?

61

The Flamenco Project Team

Kevin Chen
Ame Elliott
Jennifer English
Kevin Li
Rashmi Sinha
Kirsten Swearingen
Ping Yee

http://flamenco.berkeley.edu

62

Thank you!

For more information: flamenco.berkeley.edu
