Page 1: Jcdl2013 mklein

Extending Sitemaps for ResourceSync

JCDL 2013, July 24th, Indianapolis, IN

Martin Klein (@mart1nkle1n, martinklein0815@gmail.com)
Herbert Van de Sompel (@hvdsomp)

http://www.openarchives.org/rs/

Page 2: Jcdl2013 mklein

ResourceSync Core Team

Page 3: Jcdl2013 mklein

ResourceSync Technical Group

• JISC: Richard Jones, Graham Klyne, Stuart Lewis
• OCLC: Jeff Young
• LOCKSS: David Rosenthal
• Red Hat: Christian Sadilek
• Ex Libris Inc.: Shlomo Sanders
• Library of Congress: Kevin Ford

Page 4: Jcdl2013 mklein

Synchronize

• Web resources
  o things with a URI that can be dereferenced
• many/few
• big/small
• fast/slow

What

• Keep “in sync”
• Destination (client) follows changes at a Source (server) over time
• Keep copies on different systems the same

Page 5: Jcdl2013 mklein

Two ResourceSync Capabilities

Resource List: lists resources subject to synchronization

Change List: lists changes to resources subject to synchronization

• Allow Destinations to obtain current resources
  o Requires URI
• Allow Destinations to verify accuracy of synced content
  o Requires lastmod and fixity information
• Allow Source to include references to additional content
  o Requires inclusion of links
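The second requirement (lastmod plus fixity) can be illustrated with a small Destination-side check. This is only a sketch under stated assumptions: `SyncEntry` and `is_in_sync` are hypothetical names, not part of ResourceSync, and MD5 is assumed as the fixity algorithm simply because the examples later in this talk use it.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SyncEntry:
    """What a Source advertises about one resource (hypothetical record)."""
    uri: str
    lastmod: datetime
    md5: str  # hex digest of the resource content


def is_in_sync(entry: SyncEntry, local_bytes: bytes, local_fetched: datetime) -> bool:
    """True if the local copy matches the advertised digest and was
    fetched no earlier than the advertised last modification time."""
    digest_ok = hashlib.md5(local_bytes).hexdigest() == entry.md5
    fresh = local_fetched >= entry.lastmod
    return digest_ok and fresh


# Illustrative data only: the content and timestamps are made up.
content = b"example resource content"
entry = SyncEntry(
    uri="http://example.com/res1",
    lastmod=datetime(2013, 7, 24, 9, 0, tzinfo=timezone.utc),
    md5=hashlib.md5(content).hexdigest(),
)
fetched_later = datetime(2013, 7, 24, 10, 0, tzinfo=timezone.utc)
fetched_earlier = datetime(2013, 7, 24, 8, 0, tzinfo=timezone.utc)
```

A Destination would re-fetch a resource whenever `is_in_sync` returns False, either because the bytes differ or because the copy predates the advertised change.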

Page 6: Jcdl2013 mklein

Enter… Sitemaps

• Resource List is an inventory, and so is a Sitemap
• Low barrier to adoption
• Acknowledged by Google, Yahoo!, Bing

Page 7: Jcdl2013 mklein

Sitemap Format

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-07-24T09:00:00Z</lastmod>
  </url>
  <url>
    …
  </url>
</urlset>
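To make the format concrete, here is a minimal sketch of reading a plain Sitemap with Python's standard library. The helper name `read_sitemap` and the second entry (`res2`, without a lastmod) are assumptions added for illustration; in the Sitemap protocol `<lastmod>` is optional, so the sketch returns None when it is missing.

```python
import xml.etree.ElementTree as ET

# Prefix map used only for lookups in this sketch.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

EXAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-07-24T09:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
  </url>
</urlset>"""


def read_sitemap(xml_text):
    """Return a list of (loc, lastmod) pairs; lastmod is None when absent."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries
```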

Page 8: Jcdl2013 mklein

ResourceSync Sitemap Extensions

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  --- root level --- document info, lastmod, links
  <url>
    --- resource level --- fixity, change type, and other resource info, links
    <loc>http://example.com/res1</loc>
    <lastmod>2013-07-24T09:00:00Z</lastmod>
  </url>
  <url>
    …
  </url>
</urlset>

Page 9: Jcdl2013 mklein

Testing ResourceSync Sitemap Extensions

Series of informal experiments

1. Enhance Sitemaps with attributes and elements

2. Submit Sitemaps to Google Webmaster Tools

3. Evaluate immediate feedback

4. Check Google index

Concerns:

1. Rejection of ResourceSync documents due to

a. Added elements and attributes on root level

b. Added elements and attributes on resource level

2. Unwanted indexing of URIs from links vs. <loc>

Page 10: Jcdl2013 mklein

Sitemap Extensions Test #1

Inclusion of elements and attributes at root level to convey:
• Type of capability
• Last modification date

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:meta capability="resourcelist" modified="2013-07-24T11:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-07-24T09:00:00Z</lastmod>
  </url>
</urlset>
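A document like the one in Test #1 could be generated as follows. This is a sketch, not the tooling used for the experiments: `build_resource_list` is a hypothetical helper, and the `rs:meta` element follows the test syntax shown on this slide.

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"

# Serialize the Sitemap namespace as the default and ResourceSync as "rs".
# Note: register_namespace changes process-wide ElementTree state.
ET.register_namespace("", SM_NS)
ET.register_namespace("rs", RS_NS)


def build_resource_list(resources, modified):
    """Build a Sitemap with a root-level rs:meta element.

    resources: iterable of (loc, lastmod) string pairs
    modified:  W3C datetime string for the document itself
    """
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    ET.SubElement(
        urlset,
        f"{{{RS_NS}}}meta",
        {"capability": "resourcelist", "modified": modified},
    )
    for loc, lastmod in resources:
        url = ET.SubElement(urlset, f"{{{SM_NS}}}url")
        ET.SubElement(url, f"{{{SM_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SM_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


document = build_resource_list(
    [("http://example.com/res1", "2013-07-24T09:00:00Z")],
    modified="2013-07-24T11:00:00Z",
)
```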

Page 11: Jcdl2013 mklein

Sitemap Extensions Test #2

Inclusion of elements and attributes at resource level to convey:
• Change type
• Metadata

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod rs:change="updated">2013-07-24T09:00:00Z</lastmod>
    <rs:fixity type="md5">a2f29dklfgj9823lksdf90sfkd</rs:fixity>
    <rs:mimetype>text/html</rs:mimetype>
  </url>
</urlset>
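The resource-level additions in Test #2 can be read back alongside the core Sitemap fields, as in the sketch below. The helper names `read_entry` and `read_entries` are hypothetical, and the element names (`rs:fixity`, `rs:mimetype`, the `rs:change` attribute) are the test syntax from this slide rather than a final format. A consumer that only looks up elements in the Sitemap namespace would still find `<loc>` and `<lastmod>` unchanged.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS = "http://www.openarchives.org/rs/terms/"
NS = {"sm": SM, "rs": RS}

TEST2 = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod rs:change="updated">2013-07-24T09:00:00Z</lastmod>
    <rs:fixity type="md5">a2f29dklfgj9823lksdf90sfkd</rs:fixity>
    <rs:mimetype>text/html</rs:mimetype>
  </url>
</urlset>"""


def read_entry(url):
    """Collect core Sitemap fields and the rs: extensions of one <url>."""
    lastmod = url.find("sm:lastmod", NS)
    fixity = url.find("rs:fixity", NS)
    return {
        "loc": url.findtext("sm:loc", namespaces=NS),
        "lastmod": lastmod.text if lastmod is not None else None,
        # Namespaced attributes are keyed as "{namespace-uri}name".
        "change": lastmod.get(f"{{{RS}}}change") if lastmod is not None else None,
        "fixity": (fixity.get("type"), fixity.text) if fixity is not None else None,
        "mimetype": url.findtext("rs:mimetype", namespaces=NS),
    }


def read_entries(xml_text):
    root = ET.fromstring(xml_text)
    return [read_entry(url) for url in root.findall("sm:url", NS)]
```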

Page 12: Jcdl2013 mklein

Sitemap Extensions Test #3

Inclusion of links at root level to:
• Navigate through the framework
• Point at miscellaneous documents

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:link rel="resourcesync" href="http://example.com/capabilitylist.xml"/>
  <rs:link rel="describedby" href="http://example.com/info-about-source.xml"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-07-24T09:00:00Z</lastmod>
  </url>
</urlset>

Page 13: Jcdl2013 mklein

Sitemap Extensions Test #4

Inclusion of links at resource level to:
• Point to related resources and documents

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-07-24T09:00:00Z</lastmod>
    <rs:link rel="duplicate" href="http://mirror.example.com/res1"/>
    <rs:link rel="http://www.openarchives.org/rs/terms/patch"
             href="http://example.com/res1-json-patch"
             type="application/json-patch"/>
  </url>
</urlset>

Page 14: Jcdl2013 mklein

Results - Sitemap Extensions Test #4

As expected:

1. Child elements tolerated

2. Google indexes URI within <loc>

Unintended consequences:

3. Google indexes URIs within <rs:link>

2 & 3 together are not desired, e.g.:
• When a mirror location is provided, the URI in <rs:link> should be indexed and the URI in <loc> should not
• The URI in <rs:link> points at partial content
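The distinction at stake here, URIs that appear in `<loc>` versus URIs that appear in `<rs:link>`, can be made explicit in code. The sketch below is only an illustration of that separation applied to the Test #4 document; `uris_by_origin` is a hypothetical helper and says nothing about how any search engine actually processes Sitemaps.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

TEST4 = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-07-24T09:00:00Z</lastmod>
    <rs:link rel="duplicate" href="http://mirror.example.com/res1"/>
    <rs:link rel="http://www.openarchives.org/rs/terms/patch"
             href="http://example.com/res1-json-patch"
             type="application/json-patch"/>
  </url>
</urlset>"""


def uris_by_origin(xml_text):
    """Separate URIs found in <loc> from (rel, href) pairs found in <rs:link>."""
    root = ET.fromstring(xml_text)
    result = {"loc": [], "link": []}
    for url in root.findall("sm:url", NS):
        result["loc"].append(url.findtext("sm:loc", namespaces=NS))
        for link in url.findall("rs:link", NS):
            result["link"].append((link.get("rel"), link.get("href")))
    return result
```

A consumer that collects every URI in the document without regard to the element it came from would end up with the mirror and the patch URI next to the primary one, which is the unintended outcome described above.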

Page 16: Jcdl2013 mklein

Summary

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="resourcesync" href="http://example.com/capabilitylist.xml"/>
  <rs:md capability="changelist" modified="2013-07-24T11:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-07-24T09:00:00Z</lastmod>
    <rs:md change="updated" type="text/html" hash="md5:a2f94c567f9b370c43fb1188f1f46330"/>
    <rs:ln rel="duplicate" href="http://mirror.example.com/res1"/>
  </url>
</urlset>
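A Destination could read the summary document along the following lines. This is a sketch based only on the example shown on this slide: `read_change_list` and `parse_hash` are hypothetical helpers, and splitting the `hash` value on the first colon is an assumption drawn from the `md5:…` form used here.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}

SUMMARY = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="resourcesync" href="http://example.com/capabilitylist.xml"/>
  <rs:md capability="changelist" modified="2013-07-24T11:00:00Z"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-07-24T09:00:00Z</lastmod>
    <rs:md change="updated" type="text/html"
           hash="md5:a2f94c567f9b370c43fb1188f1f46330"/>
    <rs:ln rel="duplicate" href="http://mirror.example.com/res1"/>
  </url>
</urlset>"""


def parse_hash(value):
    """Split 'md5:abc…' into ('md5', 'abc…') at the first colon."""
    algorithm, _, digest = value.partition(":")
    return algorithm, digest


def read_change_list(xml_text):
    """Return (capability, list of per-resource change records)."""
    root = ET.fromstring(xml_text)
    doc_md = root.find("rs:md", NS)
    capability = doc_md.get("capability") if doc_md is not None else None
    changes = []
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        attrs = md.attrib if md is not None else {}
        changes.append({
            "loc": url.findtext("sm:loc", namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "change": attrs.get("change"),
            "type": attrs.get("type"),
            "hash": parse_hash(attrs["hash"]) if "hash" in attrs else None,
            "duplicates": [
                ln.get("href")
                for ln in url.findall("rs:ln", NS)
                if ln.get("rel") == "duplicate"
            ],
        })
    return capability, changes
```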

Page 17: Jcdl2013 mklein

http://www.openarchives.org/rs/

Page 18: Jcdl2013 mklein

Extending Sitemaps for ResourceSync

Martin Klein (@mart1nkle1n)
Herbert Van de Sompel (@hvdsomp)

http://www.openarchives.org/rs/

Thank you!