semantic search with topic maps

Towards Semantic Search with Topic Maps Lars Marius Garshol <[email protected]> TMRA 2009, November 12, Leipzig

Upload: lars-marius-garshol

Post on 15-Jan-2015

DESCRIPTION

A description of a possible approach to a true semantic search based on Ontopia and Topic Maps, presented at TMRA 2009.

TRANSCRIPT

Page 1: Semantic Search with Topic Maps

Towards Semantic Search with Topic Maps

Lars Marius Garshol

<[email protected]>

TMRA 2009, November 12, Leipzig


What this talk is about

• Basically, moving from full-text search to a more semantic form of search

• if the user types “hotels Leipzig” can we do something more than look for documents containing these two words?

• for example, can we turn this into “find hotels located in Leipzig”?

• It describes some personal experiments with new approaches
  • what is described here needs more work


Two kinds of search

• Web-wide search and site-wide search
  • these two are not the same kind of search
  • the former means searching everything
  • the latter means searching in a limited domain

• This proposal only deals with site-wide search
  • making it work for web-wide search is hard
  • so we don’t do that


Two (other) kinds of search

• Natural language search
  • where users put questions to the machine, typically using something approaching complete sentences
  • users are assumed to be at least somewhat familiar with the domain

• Web-site search
  • users behave unpredictably
  • users do not necessarily know the domain
  • users are unaware of what search technology is used
  • users cannot be trained


Algorithm

• (1) Parse query into a list of tokens
  • categorize tokens as “instance”, “topic type”, “unknown”, ...

• (2) Build an interpretation from the token list
  • the interpretation is a tolog query
  • if none found, fall back to full-text

• (3) Verify interpretation against schema
  • if one is present, that is

• (4) Run chosen interpretation, present results
  • also present interpretation, so the user knows what is happening
  • allow the user to override and fall back to normal full-text search
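
The control flow of steps (2) and (4) can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the talk's actual script: the `Token` class, the `related-to` placeholder predicate, the topic identifiers and the two callables standing in for the tolog engine and the full-text index are all hypothetical. Schema verification (step 3) is left out, and the sketch only handles the simple “one topic type plus known instances” case.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Token:
    kind: str      # "T" topic type, "I" instance, "?" unrecognized
    topic_id: str  # hypothetical topic identifier (or the raw word for "?")


def build_interpretation(tokens: List[Token]) -> Optional[str]:
    """Step 2: return a tolog-style query string, or None if nothing fits."""
    types = [t for t in tokens if t.kind == "T"]
    instances = [t for t in tokens if t.kind == "I"]
    # Simplification: exactly one topic type, everything else a known instance.
    if len(types) != 1 or len(types) + len(instances) != len(tokens):
        return None
    clauses = ["instance-of($X, %s)" % types[0].topic_id]
    # related-to is a placeholder predicate, not something tolog defines.
    clauses += ["related-to($X, %s)" % i.topic_id for i in instances]
    return ", ".join(clauses) + "?"


def search(
    text: str,
    tokens: List[Token],
    run_query: Callable[[str], list],
    run_fulltext: Callable[[str], list],
) -> Tuple[str, str, list]:
    """Steps 2 and 4: return (mode, what was run, results)."""
    query = build_interpretation(tokens)
    if query is None:
        # No interpretation found: fall back to ordinary full-text search.
        return ("full-text", text, run_fulltext(text))
    # Returning the query lets the caller show the interpretation to the user.
    return ("semantic", query, run_query(query))
```

Returning the mode and the query alongside the results is what makes it possible to present the interpretation and offer the user the full-text fallback.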


Tokens

• The types of tokens are:
  • T topic type (e.g., “person”)
  • I instance topic (e.g., “Lars Marius Garshol”)
  • A association type (e.g., “employed by”)
  • ? unrecognized word (e.g., “TMRA”)

• For example, the search “hotels Leipzig” would typically be parsed into the following list of tokens:
  • T hotel, topic type
  • I Leipzig, instance of city
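
One way to produce such a token list is a greedy longest-match lookup against an index of topic names, so that multi-word names such as “Lars Marius Garshol” become a single token. The sketch below assumes a hand-written `NAME_INDEX` with made-up topic identifiers; a real implementation would look names up in the topic map. Because stemming is not handled, the plural “hotels” is simply registered as an extra name.

```python
# Hypothetical name index: lower-cased name -> (token kind, topic identifier).
NAME_INDEX = {
    "hotel": ("T", "hotel"),
    "hotels": ("T", "hotel"),  # extra name instead of stemming
    "person": ("T", "person"),
    "leipzig": ("I", "leipzig"),
    "lars marius garshol": ("I", "lmg"),
    "employed by": ("A", "employed-by"),
}


def tokenize(query, index=NAME_INDEX, max_words=4):
    """Split a query into (kind, topic) tokens, preferring the longest name."""
    words = query.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shrink it.
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in index:
                tokens.append(index[phrase])
                i += n
                break
        else:
            # No name matched at this position: unrecognized word.
            tokens.append(("?", words[i]))
            i += 1
    return tokens
```

For example, `tokenize("hotels Leipzig")` yields a `T` token for the hotel type followed by an `I` token for Leipzig, while a word with no matching name comes back as a `?` token.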


Example: a photo topic map

• I use a topic map to organize my digital photos
  • it now holds ~13,000 photos
  • online at http://www.garshol.priv.no/tmphoto/

• A web application is used for search and navigation

• I’ve added the semantic search to this application for demonstration purposes

[Figure: diagram of the photo topic map’s topic types: Photo, Person, Event, Category, Location]


Hierarchies

• In many cases, the generic “I” interpretation is too simplistic
  • none of the Sam Oh photos are marked as being taken in Canada; they are all marked as being taken in places that are contained in Canada
  • this is a very common case

• Solved by using ontology annotation
  • Kal Ahmed has published a set of PSIs for indicating hierarchical association types
  • these are used by the Ontopia tools, at least
  • these can be used to pick up hierarchical association types and extend the interpretation of “I” terms to handle them
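
The effect of treating an association type as hierarchical can be illustrated with a small transitive expansion: a search for a place also matches everything contained in it, directly or indirectly. The data below is invented for illustration, and the dictionaries stand in for associations that would really be read from the topic map.

```python
# Hypothetical data: place -> the place that directly contains it.
CONTAINED_IN = {
    "montreal": "quebec-province",
    "quebec-city": "quebec-province",
    "quebec-province": "canada",
    "oslo": "norway",
}

# Hypothetical data: photo -> the place it is marked as taken in.
PHOTO_LOCATION = {
    "photo1": "montreal",
    "photo2": "quebec-city",
    "photo3": "oslo",
}


def expand(place, contained_in=CONTAINED_IN):
    """Return the place plus everything transitively contained in it."""
    result = {place}
    changed = True
    while changed:
        changed = False
        for child, parent in contained_in.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def photos_taken_in(place):
    """Photos marked with the place itself or with any place inside it."""
    places = expand(place)
    return sorted(p for p, loc in PHOTO_LOCATION.items() if loc in places)
```

With this data, no photo is marked as taken in `canada` itself, yet `photos_taken_in("canada")` still finds the Montreal and Quebec City photos.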


Hotel Europa is in Montreal. Ste Brigitte des Saults is on the road between Montreal and Quebec City.


Verifying the interpretation

• Not all interpretations can actually produce results

• for example, “puccini tenor” does not work, because no topics are related to both

• We can actually work this out, based on the schema, because

• there is no topic type to which both composers and voice types can be related

• studying the schema will tell us this

• Studying the schema also helps us explain the interpretation to the user

[Figure: schema diagram with the labels Sam Oh, Montréal, person, photo, location]
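
The schema check can be sketched as a search for a topic type that the schema allows to be associated with the types of all instance tokens in the query. The toy schema below is invented for illustration (it is neither the photo topic map’s schema nor the opera one), and it only looks at direct associations.

```python
# Hypothetical schema: unordered pairs of topic types that may be associated.
SCHEMA = {
    frozenset(("photo", "person")),
    frozenset(("photo", "location")),
    frozenset(("opera", "composer")),
    frozenset(("role", "voice-type")),
    frozenset(("role", "opera")),
}

# Every topic type mentioned anywhere in the schema.
ALL_TYPES = set().union(*SCHEMA)


def can_relate(type_a, type_b, schema=SCHEMA):
    """True if the schema allows a direct association between the two types."""
    return frozenset((type_a, type_b)) in schema


def result_types(instance_types, schema=SCHEMA):
    """Topic types that can be related to every one of the given types."""
    return sorted(
        t for t in ALL_TYPES
        if all(can_relate(t, it, schema) for it in instance_types)
    )
```

Here `result_types(["person", "location"])` finds `photo`, so a query naming a person and a place can be read as “photos of this person taken in this place”. `result_types(["composer", "voice-type"])` is empty, which corresponds to the “puccini tenor” case: the interpretation can be rejected before any query is run.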


How to use this with your topic map

• Install the component, then search
• No configuration is necessary!
• However, for better results you may want to
  • add more names for some topics
  • mark hierarchical association types as such (should be done already)
  • mark topic types with large instance sets as such


Current implementation

• Just a Jython script using Ontopia
  • 541 lines
  • builds a set of token objects, then a set of constraint objects
  • then introspects the schema to remove hopeless constraints

• Stemming is still missing!
  • need to modify Ontopia full-text search to do this

• Run from a JSP file by means of the Jython API
  • just 10-15 lines of glue code

• Longer-term this may turn into a proper Ontopia component
  • time horizon not at all clear


Weaknesses

• No relevance ranking
  • given “beer Oslo”, all found photos are equally closely tied to “beer” and “Oslo”
  • there is nothing to rank their relevance by
  • on the other hand, all hits are definitely relevant to the query as given

• Homonym support too simplistic
  • it’s not clear that it will actually handle all cases in practice
  • a better approach would be to construct multiple interpretations and then choose between them
  • ideally the user should be allowed to override the choice

• Very closely tied to topic map structure
  • if the user uses the wrong terms, the approach does not work
  • only allows structured searches along the dimensions actually in the topic map
  • how much of an issue this is is likely to depend on the application
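
The “multiple interpretations” idea for homonyms could look roughly like the sketch below: each word maps to all topics that carry that name, every combination becomes one candidate interpretation, and the remaining candidates are kept so the user can override the choice. The candidate table and the rule of preferring the interpretation with the most hits are assumptions made for this illustration, not something the current implementation does.

```python
from itertools import product

# Hypothetical homonym table: lower-cased word -> all matching (kind, topic) pairs.
CANDIDATES = {
    "paris": [("I", "paris-france"), ("I", "paris-hilton")],
    "hotels": [("T", "hotel")],
}


def interpretations(words, candidates=CANDIDATES):
    """Return every combination of token readings for the given words."""
    options = [
        candidates.get(w.lower(), [("?", w.lower())])  # unknown words get one reading
        for w in words
    ]
    return [list(combo) for combo in product(*options)]


def choose(interps, count_results):
    """Pick the reading with the most hits; return it plus the alternatives."""
    ranked = sorted(interps, key=count_results, reverse=True)
    return ranked[0], ranked[1:]
```

The second return value of `choose` is the list of readings that were not picked, which is what a user interface would need in order to offer “did you mean ...?” style overrides.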


Do users actually query this way?

• Literature studies and log mining indicate that:
  • nearly all queries are just 1 or 2 words
  • 2-word queries tend to be either
    • the name of an entity (New York), or
    • qualified searches (Montréal city)

• Conclusion
  • this feature has to be used with caution
  • it may work best when users can be told about it
  • site feedback may encourage users to use it more

• More work is needed on this


Taking this further

• Limitations
  • so far all queries use a single variable
  • no understanding of association types
  • no understanding of occurrence types
  • no notion of ordering (first, last, biggest, smallest, ...)

• This can be implemented
  • an earlier prototype could interpret queries such as “operas based on works written by Shakespeare”
  • other elements also implementable

• However, this takes the system further away from normal user searches
  • more thinking needed on how to handle this
  • make it a semi-formal language?
  • turn it into a full natural language search component?
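
To show what moving beyond a single variable means, the sketch below chains association types through intermediate variables and emits a tolog-style query string. The predicate and topic names are hypothetical, and the output is simplified: real tolog association predicates also name the role type of each argument, which is omitted here.

```python
def chain_query(result_type, steps, target):
    """Build a multi-variable, tolog-style query string.

    steps is a list of association-type names leading from the result
    variable $V0, through intermediate variables, to the target topic.
    """
    clauses = ["instance-of($V0, %s)" % result_type]
    for i, predicate in enumerate(steps):
        is_last = i == len(steps) - 1
        # The last step ends at the named topic; earlier ones at a fresh variable.
        right = target if is_last else "$V%d" % (i + 1)
        clauses.append("%s($V%d, %s)" % (predicate, i, right))
    return "select $V0 from " + ", ".join(clauses) + "?"
```

Under these assumptions, “operas based on works written by Shakespeare” would correspond to `chain_query("opera", ["based-on", "written-by"], "shakespeare")`, which introduces a second variable for the work that sits between the opera and the author.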


Conclusion

• The system really does have a kind of semantic understanding
  • you type “beer Oslo”, and it says “I think you want photos of beer taken in Oslo”

• Easy to implement
  • no configuration necessary
  • component can be plugged into any web application based on Ontopia
  • (also easy to implement on top of other Topic Maps engines)

• Does not match current user behaviour
  • more work necessary on this

• Not as advanced as it could be
  • single-variable queries only
  • no understanding of association types
  • more work to be done on this, too