TMSync: Synchronizing topic maps

Page 1: TMSync: Synchronizing topic maps

http://www.ontopia.net/ © 2006 Ontopia AS

TMSync

Topic map-to-topic map updates

Lars Marius Garshol, CTO, Ontopia

<[email protected]>

TMRA 2006

2006-10-11

Page 2: TMSync: Synchronizing topic maps

Agenda

• Background
  – the problem
  – why TMSync is the solution

• TMSync in detail
  – what it is
  – how it works

• Applications
  – what you can do with TMSync

• Conclusion

Page 3: TMSync: Synchronizing topic maps

Background

The problem

Solving it with TMSync

Page 4: TMSync: Synchronizing topic maps

The problem

• Topic Maps hold out a promise as a great technology for data integration
  – because of merging, global identifiers, etc.

• However, dynamic sources are poorly supported at the moment
  – that is, converting once is easy, but staying in sync is hard

• A solution that only supports static integration is near-worthless
  – in practice, integrated data is nearly always going to need updating from the source
  – building a one-time conversion is easy
  – building data integration with update support is hard
  – so, suddenly data integration with Topic Maps isn’t so easy, after all

Page 5: TMSync: Synchronizing topic maps

Merging is not the solution

• Merging in Topic Maps is often thought of in terms of <mergeMap>
  – this is only useful if you are working from XTM files
  – <mergeMap> only has an effect when the XTM file is loaded
  – after that, the only way to use the <mergeMap> is to reload from scratch
  – reloading from scratch loses all changes...

• Real applications are based on databases
  – here <mergeMap> has no effect

Page 6: TMSync: Synchronizing topic maps

What TMSync is

• A simple way to update part of one topic map with part of another
  – define which part of the target topic map you want,
  – define which part of the source topic map it is the master for, and
  – the algorithm does the rest

Page 7: TMSync: Synchronizing topic maps

If the source is not a topic map

• Simply do a normal one-time conversion
  – let TMSync do the update for you

• In other words, TMSync reduces the update problem to a conversion problem

[Diagram: source.xml → convert.xslt → TMSync]

Page 8: TMSync: Synchronizing topic maps

TMSync in depth

What it is

How it works

Page 9: TMSync: Synchronizing topic maps

TMSync in mathematical terms

• A function that, given
  – a target topic map,
  – a source topic map,
  – a topic selector for the target map (a function),
  – a characteristic selector for the target map (a function),
  – a topic selector for the source map (a function),
  – a characteristic selector for the source map (a function),

• produces an updated target map
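The six inputs above can be written down as a type signature. The sketch below is only an illustration under assumptions that are not in the slides: a characteristic is modelled as a `(kind, type, value)` tuple, a topic map as a dictionary from topic identifiers to sets of characteristics, and `select_fragment` is a hypothetical helper showing what the two selectors pick out of one map.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical, simplified model (neither the Q model nor the TMDM):
# a characteristic is (kind, type, value); a topic map maps topic
# identifiers to their sets of characteristics.
Characteristic = Tuple[str, str, str]
TopicMap = Dict[str, Set[Characteristic]]
TopicSelector = Callable[[TopicMap], Set[str]]
CharacteristicSelector = Callable[[Characteristic], bool]

# The six inputs from the slide, in order; the result is the updated target.
TMSyncFunction = Callable[
    [TopicMap, TopicMap,
     TopicSelector, CharacteristicSelector,
     TopicSelector, CharacteristicSelector],
    TopicMap,
]


def select_fragment(tm: TopicMap,
                    topics: TopicSelector,
                    chars: CharacteristicSelector) -> TopicMap:
    """Return the part of tm picked out by the two selectors."""
    return {
        topic_id: {c for c in tm[topic_id] if chars(c)}
        for topic_id in topics(tm)
    }


# Made-up example data.
example = {
    "bergen": {("name", "official-name", "Bergen"),
               ("occurrence", "homepage", "http://example.org/bergen")},
    "oslo": {("name", "official-name", "Oslo")},
}
fragment = select_fragment(
    example,
    topics=lambda tm: {"bergen"},      # select one topic
    chars=lambda c: c[0] == "name",    # select only its names
)
```

Here `fragment` contains only the name of the "bergen" topic; everything else lies outside the fragment and would not be touched by a sync.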

Page 10: TMSync: Synchronizing topic maps

Mathematical specification

• Currently based on the Q model [1]
  – mainly because this was the only model in existence when I started working

• Will translate to the TMRM
  – since this is better-known, and now has a TMDM mapping

[1] Q: A Model for Topic Maps, http://www.ontopia.net/topicmaps/materials/quads.html

Page 11: TMSync: Synchronizing topic maps

The selection process

[Figure: a topic with two name and three occurrence characteristics]

Page 12: TMSync: Synchronizing topic maps

The update process

[Figure: characteristics labeled name, NAME, occurrence, and bar]

Page 13: TMSync: Synchronizing topic maps

How to configure the algorithm

• How to specify the topics
  – use a query
  – this gives great flexibility, while keeping the algorithm simple
  – it also means that we can efficiently find the set of topics to work on

• How to specify the characteristics
  – use a query, again, or
  – use a set of types, or
  – ...
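The two styles of configuration above might look as follows. This is a sketch, not TMSync's actual configuration interface: `topics_where` stands in for a real topic map query, `of_types` builds a characteristic selector from a set of types, and the `(kind, type, value)` tuples and all data are invented for the example.

```python
from typing import Callable, Dict, Set, Tuple

Characteristic = Tuple[str, str, str]   # (kind, type, value), an assumed model
TopicMap = Dict[str, Set[Characteristic]]


def topics_where(predicate: Callable[[str, Set[Characteristic]], bool]
                 ) -> Callable[[TopicMap], Set[str]]:
    """Stand-in for 'use a query': select the topics matching a predicate."""
    return lambda tm: {tid for tid, chars in tm.items() if predicate(tid, chars)}


def of_types(*type_names: str) -> Callable[[Characteristic], bool]:
    """'Use a set of types': select characteristics whose type is listed."""
    wanted = frozenset(type_names)
    return lambda c: c[1] in wanted


tm = {
    "person-1": {("name", "full-name", "A. Person"),
                 ("occurrence", "homepage", "http://example.org/p1"),
                 ("occurrence", "note", "local remark")},
    "event-1": {("name", "full-name", "Some Conference")},
}

# Topics: everything that has a homepage. Characteristics: names and homepages.
has_homepage = topics_where(
    lambda tid, chars: any(c[1] == "homepage" for c in chars))
synced_chars = of_types("full-name", "homepage")

selected_topics = has_homepage(tm)
selected_chars = {c for c in tm["person-1"] if synced_chars(c)}
```

Because both selectors are plain functions, the sync algorithm itself does not need to know whether they came from a query, a type list, or something else.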

Page 14: TMSync: Synchronizing topic maps

What the algorithm does

• For each topic in the sync’ed fragment
  – remove all sync’ed characteristics not in the source
    • except associations to non-sync’ed topics
  – add all characteristics in the source that are not in the target
  – leave the rest alone

• Remove and add topics in the same way
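The steps above can be sketched in a few lines. This is one reading of the slide, not the published algorithm. It assumes characteristics are `(kind, type, value)` tuples and that an association is stored on a topic as `("association", type, other_topic_id)`. It also assumes that a synced target topic missing from the source is dropped whole, and that a topic excluded by the target selector is left alone even if the source has it. The function name `tmsync` and all data are illustrative.

```python
from typing import Callable, Dict, Set, Tuple

Characteristic = Tuple[str, str, str]
TopicMap = Dict[str, Set[Characteristic]]
TopicSelector = Callable[[TopicMap], Set[str]]
CharSelector = Callable[[Characteristic], bool]


def tmsync(target: TopicMap, source: TopicMap,
           target_topics: TopicSelector, target_chars: CharSelector,
           source_topics: TopicSelector, source_chars: CharSelector) -> TopicMap:
    result = {tid: set(chars) for tid, chars in target.items()}  # copy the target
    synced_target = target_topics(target)
    synced_source = source_topics(source)
    synced = synced_target | synced_source

    # Assumption: synced target topics that are gone from the source are removed.
    for tid in synced_target - synced_source:
        result.pop(tid, None)

    for tid in synced_source:
        wanted = {c for c in source[tid] if source_chars(c)}
        if tid not in target:
            result[tid] = wanted          # topic is new: add it
            continue
        if tid not in synced_target:
            continue                      # excluded on the target side: leave alone
        kept = set()
        for c in result[tid]:
            to_unsynced_topic = c[0] == "association" and c[2] not in synced
            # Keep what is not synced, what the source still has, and
            # associations to topics outside the synced fragment.
            if not target_chars(c) or c in wanted or to_unsynced_topic:
                kept.add(c)
        result[tid] = kept | wanted       # add what the source has
    return result


target = {
    "a": {("name", "name", "Old"), ("occurrence", "note", "local"),
          ("association", "works-for", "x")},
    "b": {("name", "name", "B")},
    "x": {("name", "name", "X")},
}
source = {"a": {("name", "name", "New")}, "c": {("name", "name", "C")}}

updated = tmsync(
    target, source,
    target_topics=lambda tm: {t for t in tm if t in {"a", "b"}},
    target_chars=lambda c: c[0] in {"name", "association"},
    source_topics=lambda tm: set(tm),
    source_chars=lambda c: True,
)
```

In `updated`, topic "a" has the new name but keeps its local note and its association to the non-synced topic "x"; "b" is gone, "c" has been added, and "x" is untouched. The sketch does not clean up associations that point at removed topics.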

Page 15: TMSync: Synchronizing topic maps

Applications

City of Bergen

US Publisher

Page 16: TMSync: Synchronizing topic maps

The City of Bergen

[Diagram labels: LivsIT, Service, Unit, Person, City of Bergen, Norge.no]

Page 17: TMSync: Synchronizing topic maps

City of Bergen configuration

• On the source side
  – query to get all instances of “category” and “keyword”
  – accept all characteristics

• On the target side
  – query to get all instances of “category” and “keyword”
    • except those with mark-as-local associations
  – accept all characteristics except local search name and mark-as-local
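Expressed as selector functions, the configuration above might look like this. The sketch assumes characteristics are `(kind, type, value)` tuples, with type membership stored as `("instance-of", type, "")`. The identifiers `local-search-name` and `mark-as-local` are guessed spellings of the names on the slide, and the sample data is invented.

```python
from typing import Dict, Set, Tuple

Characteristic = Tuple[str, str, str]
TopicMap = Dict[str, Set[Characteristic]]

SYNCED_TYPES = {"category", "keyword"}
LOCAL_ONLY = {"local-search-name", "mark-as-local"}   # assumed identifiers


def is_synced_type(chars: Set[Characteristic]) -> bool:
    return any(c[0] == "instance-of" and c[1] in SYNCED_TYPES for c in chars)


def is_marked_local(chars: Set[Characteristic]) -> bool:
    return any(c[0] == "association" and c[1] == "mark-as-local" for c in chars)


# Source side: all categories and keywords, all characteristics.
def source_topics(tm: TopicMap) -> Set[str]:
    return {tid for tid, chars in tm.items() if is_synced_type(chars)}


def source_chars(c: Characteristic) -> bool:
    return True


# Target side: the same topics minus those marked as local, and
# every characteristic except the two local-only types.
def target_topics(tm: TopicMap) -> Set[str]:
    return {tid for tid, chars in tm.items()
            if is_synced_type(chars) and not is_marked_local(chars)}


def target_chars(c: Characteristic) -> bool:
    return c[1] not in LOCAL_ONLY


target = {
    "schools": {("instance-of", "category", ""), ("name", "name", "Schools"),
                ("name", "local-search-name", "skole")},
    "parking": {("instance-of", "category", ""),
                ("association", "mark-as-local", "local-marker")},
    "mayor": {("instance-of", "person", ""), ("name", "name", "Mayor")},
}
protected = {c for c in target["schools"] if not target_chars(c)}
```

With this data the target selector picks only "schools": "parking" is marked as local and "mayor" is not a category or keyword. The local search name on "schools" falls outside the synced characteristics, so an update from the source would not remove it.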

Page 18: TMSync: Synchronizing topic maps

Nameless US publisher

• Use an automated process to classify documents
  – documents get reclassified now and then
  – output of process is an XTM document

• If documents did not get reclassified, import would be enough
  – as it is, they use TMSync

[Diagram: classified.xtm → TMSync]
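A small worked example shows why plain import is not enough once documents get reclassified. The data is made up, and characteristics are reduced to `(type, value)` pairs for brevity: import only ever adds, so the old classification lingers, while a sync first drops the synced characteristics that the source no longer has.

```python
# Characteristics of one document in the target topic map (hypothetical data).
target = {("classified-as", "physics"), ("editor-note", "checked by hand")}

# The classifier's new output for the same document after reclassification.
source = {("classified-as", "chemistry")}


def is_synced(characteristic):
    """Only classifications are owned by the classifier."""
    return characteristic[0] == "classified-as"


# Plain import: merge the new data in. The stale classification remains.
imported = target | source

# Sync: keep what is not synced or still in the source, then add the source.
synced = {c for c in target if not is_synced(c) or c in source} | source
```

After import the document is classified as both physics and chemistry; after sync it is classified as chemistry only, and the hand-written editor note survives in both cases.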

Page 19: TMSync: Synchronizing topic maps

Conclusion

Related work

Further work

Page 20: TMSync: Synchronizing topic maps

Related work

• RDFSync
  – algorithm to synchronize two RDF graphs efficiently
  – no business case focus

• TM-Views
  – one possible way to define fragments for update

• TMRAP
  – uses TMSync for the update-topic request

Page 21: TMSync: Synchronizing topic maps

Further work

• Reformulate algorithm to TMRM instead of Q
  – this will be done in the paper submitted to the proceedings

• Improve algorithm to handle delta sets
  – that is, to only need information about what has changed in the source since the last update
  – this should not be very difficult
  – may do this for the final paper