Faceted search using Solr and Ontopia

Faceted search using Solr and Ontopia 2009-11-03 Geir Ove Grønmo, [email protected]

DESCRIPTION

A presentation about how Ontopia and Solr can be integrated.

TRANSCRIPT

Page 1: Faceted search using Solr and Ontopia

Faceted search using Solr and Ontopia

2009-11-03 Geir Ove Grønmo, [email protected]

Page 2: Faceted search using Solr and Ontopia

Agenda

• Short introductions to Solr and Ontopia
• What is faceted search?
• An integration of the two – a prototype
• Demos

Page 3: Faceted search using Solr and Ontopia

Apache Solr

• A search engine
  – implemented as an HTTP service on top of Apache Lucene
  – searching and indexing (no web-crawling)
  – adds support for faceted search (and more)
  – sharding and replication
  – distributed search
  – excellent interoperability (i.e. not really Java-specific)
• Next release: Solr 1.4
• Open source:
  – http://lucene.apache.org/solr/
  – Apache Licence 2.0

Page 4: Faceted search using Solr and Ontopia

Ontopia

• A Topic Maps toolkit:
  – data representation, persistence and querying
  – application development
  – written in Java
• Next release: Ontopia 5.1
• Open source:
  – http://code.google.com/p/ontopia/
  – Apache Licence 2.0

Page 5: Faceted search using Solr and Ontopia

Where the meat is...

• Solr
  – fast textual search and faceted search support
• Ontopia
  – rich semantic data and structured search
• User interface design
  – providing a useful interface to the user

Page 6: Faceted search using Solr and Ontopia

But first, what is faceted search?

• A technique for refining search results
  – integrates textual search and navigation
• Allows concept composition
  – slow + expensive + red + used + car
  – article + in English + about salmon
  – people + aged 20-30 + SQL expert
  – punk rock songs + < 1 minute + in Norwegian + released 1980-1982
• Supports exploration and learning
• Never returns zero results

Page 7: Faceted search using Solr and Ontopia
Page 8: Faceted search using Solr and Ontopia

How is it done?

• Given a starting set
  – usually all documents
  – or the result of filling in the search input box
• ...do the following:
  – count the number of hits matching each facet field
  – which fields to facet on are defined at query time
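
The counting step can be illustrated with a minimal in-memory sketch. This is not how Solr implements faceting internally; the function name `facet_counts` and the sample car documents are hypothetical and exist only to show the idea of counting hits per facet value over a starting set.

```python
from collections import Counter


def facet_counts(documents, facet_fields):
    """Count, for each facet field, how many documents carry each value."""
    counts = {field: Counter() for field in facet_fields}
    for doc in documents:
        for field in facet_fields:
            values = doc.get(field, [])
            if not isinstance(values, list):
                values = [values]  # treat a single value as a one-element list
            # A document counts at most once per distinct value.
            for value in set(values):
                counts[field][value] += 1
    return counts


# Hypothetical starting set (for example, the hits for a text query).
cars = [
    {"id": "1", "colour": "red", "condition": "used"},
    {"id": "2", "colour": "red", "condition": "new"},
    {"id": "3", "colour": "blue", "condition": "used"},
]

# The facet fields are chosen when the query is made, not when indexing.
counts = facet_counts(cars, ["colour", "condition"])
print(dict(counts["colour"]))
print(dict(counts["condition"]))
```

Because the facet fields are passed as an argument, the same starting set can be faceted on different fields from one request to the next.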

Page 9: Faceted search using Solr and Ontopia
Page 10: Faceted search using Solr and Ontopia
Page 11: Faceted search using Solr and Ontopia
Page 12: Faceted search using Solr and Ontopia

An example without faceted search

Page 13: Faceted search using Solr and Ontopia

Facet types

• Standard facets
  – a list of facet values
• Hierarchical facet values
  – taxonomy of facet values
• Range/query facets
  – dates
  – prices
  – alphabet buckets
  – intervals (lower and upper bounds)
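
Range and alphabet-bucket facets can be sketched in the same in-memory style. The helper names, the half-open interval convention and the sample songs below are assumptions made for illustration; they do not describe Solr's own range handling.

```python
def range_facet(documents, field, bounds):
    """Count documents whose numeric field falls in each [lower, upper) interval."""
    counts = {(lower, upper): 0 for lower, upper in bounds}
    for doc in documents:
        value = doc.get(field)
        if value is None:
            continue  # documents without the field fall in no bucket
        for lower, upper in bounds:
            if lower <= value < upper:
                counts[(lower, upper)] += 1
    return counts


def alphabet_facet(documents, field):
    """Count documents per upper-cased first letter of a text field."""
    counts = {}
    for doc in documents:
        text = doc.get(field, "")
        if text:
            letter = text[0].upper()
            counts[letter] = counts.get(letter, 0) + 1
    return counts


# Hypothetical sample data: song titles with durations in seconds.
songs = [
    {"title": "Alpha", "seconds": 45},
    {"title": "anthem", "seconds": 130},
    {"title": "Beta", "seconds": 59},
]

duration_buckets = range_facet(songs, "seconds", [(0, 60), (60, 180)])
letter_buckets = alphabet_facet(songs, "title")
print(duration_buckets)
print(letter_buckets)
```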

Page 14: Faceted search using Solr and Ontopia

Standard facets

Page 15: Faceted search using Solr and Ontopia

Hierarchical facet values

Note: the facets can also be hierarchical

Page 16: Faceted search using Solr and Ontopia

Alphabet buckets

Page 17: Faceted search using Solr and Ontopia

Range facets

Page 18: Faceted search using Solr and Ontopia

User interface considerations

• Single select
  – link
  – radio button
• Multi select
  – checkboxes
• Decide on which operator to use: AND/OR
  – within a facet
  – between facets
• How many facet values to display
  – given limited screen real estate
• How to provide an intuitive undo operation
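
One common operator choice is OR within a facet and AND between facets. The sketch below assumes that combination purely as an example; the names `matches` and `apply_selections` and the sample data are hypothetical. Undo then amounts to removing a value from the selection and filtering again.

```python
def matches(doc, selections):
    """OR within a facet, AND between facets."""
    for field, selected in selections.items():
        if not selected:
            continue  # nothing selected in this facet: no restriction
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        if not any(value in selected for value in values):
            return False  # this facet is not satisfied, so the AND fails
    return True


def apply_selections(documents, selections):
    """Return the documents that satisfy every facet with a selection."""
    return [doc for doc in documents if matches(doc, selections)]


# Hypothetical documents and checkbox state.
cars = [
    {"id": "1", "colour": "red", "condition": "used"},
    {"id": "2", "colour": "blue", "condition": "used"},
    {"id": "3", "colour": "green", "condition": "used"},
    {"id": "4", "colour": "red", "condition": "new"},
]
selections = {"colour": {"red", "blue"}, "condition": {"used"}}
hits = apply_selections(cars, selections)

# Undo: drop one selected value and filter again.
selections["colour"].discard("blue")
hits_after_undo = apply_selections(cars, selections)
```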

Page 19: Faceted search using Solr and Ontopia

Examples

Page 20: Faceted search using Solr and Ontopia

Scoring

• Some types of documents should be ranked higher than others
• Solr lets one boost the default score:
  – per document
  – per field
• The total score of a document depends on:
  – the boost and score of each field, adjusted by how relevant the field is relative to the actual query
  – the boost of the document
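
As a rough mental model only, the interaction between field boosts and the document boost can be written as a weighted sum. This is a deliberate simplification and not Lucene's actual scoring formula, which also involves term statistics and normalisation; the function name and the numbers are hypothetical.

```python
def total_score(field_scores, field_boosts, document_boost=1.0):
    """Toy model: document boost times the sum of boosted per-field scores.

    field_scores: how well each field matched the query (hypothetical values).
    field_boosts: multiplier per field; fields not listed default to 1.0.
    """
    weighted = sum(
        score * field_boosts.get(field, 1.0)
        for field, score in field_scores.items()
    )
    return document_boost * weighted


# A match in the title is made to count twice as much as a match in the body.
plain = total_score({"title": 0.5, "body": 0.2}, {"title": 2.0})
boosted = total_score({"title": 0.5, "body": 0.2}, {"title": 2.0}, document_boost=1.5)
```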

Page 21: Faceted search using Solr and Ontopia

Sorting

• How to sort the list of facets?
  – by relevance
• How to sort the values of each facet?
  – by number of hits
  – alphabetically
• How to sort the search result?
  – by relevance
  – alphabetically
  – by date
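
The two orderings for facet values can be sketched as follows. The function name and the counts are hypothetical. Solr offers this choice on the server side through its `facet.sort` request parameter, so client-side sorting like this is only needed when the application wants a different order than the one requested.

```python
def sort_facet_values(value_counts, by="count"):
    """Order (value, count) pairs by descending hit count or alphabetically."""
    if by == "count":
        # Ties on count are broken by value so the order is stable.
        return sorted(value_counts.items(), key=lambda item: (-item[1], item[0]))
    if by == "alpha":
        return sorted(value_counts.items(), key=lambda item: item[0].lower())
    raise ValueError("by must be 'count' or 'alpha'")


# Hypothetical hit counts for a location facet.
locations = {"Arna": 1, "Fana": 3, "Åsane": 2}
by_count = sort_facet_values(locations, by="count")
by_alpha = sort_facet_values(locations, by="alpha")
```

Note that the alphabetical order here compares code points, which is not a locale-aware collation.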

Page 22: Faceted search using Solr and Ontopia

Proposition

• “Concept composition, using faceted search, and Topic Maps is a perfect match”

Page 23: Faceted search using Solr and Ontopia

Why not use Ontopia only?

• You can, but it is not optimized for this use case
• It lets you implement faceted search
  – but it’ll be too slow
• The reasons are:
  – all the expensive processing has to happen at runtime, not at indexing time
  – involves a lot of traversal
  – relies on the underlying fulltext search engine
  – search has limited cacheability

Page 24: Faceted search using Solr and Ontopia

Trade-offs

• Considerations:
  – Search performance
  – Indexing performance
  – Consistency
• Ontopia
  – no indexing overhead
  – results always up-to-date
• Solr
  – very fast search
  – indexing overhead
  – index must be kept up-to-date regularly

Page 25: Faceted search using Solr and Ontopia

Solr – the data model

• An index contains documents
• Documents have fields
• A field can have multiple values

{ "id": "1234", "title": "Structure and Interpretation of Computer Programs", "authors": ["Harold Abelson", "Gerald Jay Sussman"] }

Page 26: Faceted search using Solr and Ontopia

Ontopia – the data model

• A topic map contains
  – topics
  – and information about them
• Identities
• Names
• Associations to other topics
• Occurrences (read: non-association properties)

Page 27: Faceted search using Solr and Ontopia

Integrating Solr and Ontopia

• Proposed solution:
  – Solr indexes constructed from Ontopia queries
  – For each document type create a query that extracts data from the topic map to fields in documents
  – Then do faceting on selected fields
• Use-case specific schema definition
  – should be project specific (to some degree)
• Perform full index or incremental reindex
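
Once the index is built, faceting on selected fields is requested per query. The sketch below builds such a request URL using Solr's standard `q`, `fq`, `facet`, `facet.field`, `facet.mincount` and `wt` parameters. The base URL, the helper name and the chosen field names are assumptions; a real deployment would use its own host and schema, and would need to escape special characters in filter values.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, text, facet_fields, filters=()):
    """Build a Solr select URL asking for facet counts on the given fields."""
    params = [
        ("q", text or "*:*"),      # *:* matches all documents
        ("facet", "true"),
        ("facet.mincount", "1"),   # leave out facet values with zero hits
        ("wt", "json"),
    ]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    # Each selected facet value becomes a filter query (no escaping done here).
    params += [("fq", f'{field}:"{value}"') for field, value in filters]
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr instance; field names follow the example index.
url = facet_query_url(
    "http://localhost:8983/solr/select",
    "barnehage",
    ["type", "lokalisering"],
    filters=[("type", "Organisasjonsenhet")],
)
print(url)
```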

Page 28: Faceted search using Solr and Ontopia

Index rule set

Page 29: Faceted search using Solr and Ontopia

Index rule: Organisasjonsenheter (organizational units)

Page 30: Faceted search using Solr and Ontopia

Query result: Organisasjonsenheter

Page 31: Faceted search using Solr and Ontopia

Solr index: Organisasjonsenhet

id | title | type | lokalisering
T1001448 | Grønnmyr barnehage | Organisasjonsenhet | Åsane
T1009449 | Sone Arna/Åsane | Organisasjonsenhet | Arna
T1009465 | Sone Fana/Ytrebygda | Organisasjonsenhet | Arna
T1009492 | Bybanekontoret | Organisasjonsenhet | Arna
T1009507 | Sone Fyllingsdalen/Laksevåg | Organisasjonsenhet | Arna
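
The step from query result to index documents can be sketched as a simple grouping. The slides do not show the exact shape of the query result, so the three-column row layout (topic id, field name, value) and the function name below are assumptions; the sample values are taken from the index rows above.

```python
def rows_to_documents(rows):
    """Group (topic id, field name, value) rows into one document per topic.

    Every field other than "id" is kept as a list, since a field can have
    multiple values. Rows are assumed not to use "id" as a field name.
    """
    documents = {}
    for topic_id, field, value in rows:
        doc = documents.setdefault(topic_id, {"id": topic_id})
        values = doc.setdefault(field, [])
        if value not in values:  # ignore duplicate rows
            values.append(value)
    return list(documents.values())


# Assumed row layout; values copied from the Organisasjonsenhet index.
rows = [
    ("T1001448", "title", "Grønnmyr barnehage"),
    ("T1001448", "type", "Organisasjonsenhet"),
    ("T1001448", "lokalisering", "Åsane"),
    ("T1009449", "title", "Sone Arna/Åsane"),
    ("T1009449", "type", "Organisasjonsenhet"),
    ("T1009449", "lokalisering", "Arna"),
]
docs = rows_to_documents(rows)
```

The resulting dictionaries have the same shape as the Solr documents described earlier and could then be posted to the index by whatever client the project uses.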

Page 32: Faceted search using Solr and Ontopia

Index rule: Artikler (articles)

Page 33: Faceted search using Solr and Ontopia

Query result: Artikler

Page 34: Faceted search using Solr and Ontopia

Solr index: Artikler

id | title | type | description | author
T1000005 | En kunstner i arbeid | Artikkel | | Kjersti Nygård
T1000010 | Slagord for Brinken barnehage. | Artikkel | Samspill og glede - det handler om å være tilstede. Slagordet sier noe om hva vi vektlegger i Brinken barnehage. | Siri Olsen
T1000016 | Slagord for Brinken barnehage. | Artikkel | Salhus barnehage er ein typisk nærmiljøbarnehage. Aktiv bruk av lokalmiljøet er ein viktig del av tilbodet. | Ingebjørg Gausemel

Page 35: Faceted search using Solr and Ontopia

Demo

• A prototype for Bergen kommune

Page 36: Faceted search using Solr and Ontopia

Ideas for the future

• Faceted search user-interface in Ontopoly
  – could be made declarative
• Incremental reindexing
  – requires tracking changes
  – usually done with a timestamp
  – implement last-modified field in Ontopoly
• Add optional fourth column for score boost?
  – a float between 0 and 1
• Ontopia extensions for interacting with Solr
  – JSP tag library
  – tolog predicates
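
The timestamp approach to incremental reindexing can be sketched as below. The `last_modified` key, the function name and the sample topics are hypothetical; the sketch only selects changed topics and does not cover deletions, which would need separate tracking.

```python
from datetime import datetime, timezone


def changed_since(topics, last_indexed):
    """Return the topics modified strictly after the previous indexing run."""
    return [topic for topic in topics if topic["last_modified"] > last_indexed]


# Hypothetical topics carrying a last-modified timestamp.
topics = [
    {"id": "T1", "last_modified": datetime(2009, 10, 30, tzinfo=timezone.utc)},
    {"id": "T2", "last_modified": datetime(2009, 11, 2, tzinfo=timezone.utc)},
]
last_run = datetime(2009, 11, 1, tzinfo=timezone.utc)

# Only these topics need their Solr documents rebuilt.
to_reindex = changed_since(topics, last_run)
```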

Page 37: Faceted search using Solr and Ontopia

More demos

• Epicurious: recipe search
  – http://www.epicurious.com/tools/searchresults?search=
• Flickr photo search with hierarchical facets
  – http://people.csail.mit.edu/dfhuynh/projects/hierarchical-facets/test.html
• A collection of faceted navigation examples:
  – http://www.flickr.com/photos/morville/collections/72157603789246885/

Page 38: Faceted search using Solr and Ontopia

More information

• 3 Quick Design Patterns for Better Faceted Search
  – http://www.thingsontop.com/3-quick-patterns-better-facet-design-889.html
• How to Make a Faceted Classification and Put It On the Web
  – http://www.miskatonic.org/library/facet-web-howto.html
• Book: Faceted Search (Synthesis Lectures on Information Concepts, Retrieval, and Services), Daniel Tunkelang

Page 39: Faceted search using Solr and Ontopia

Structured semantics-rich data...

...is easier to find when using faceted search.