re-usable metadata, re-usable content
Post on 13-Apr-2017
UKOLN is supported by:
Re-usable metadata, re-usable content
Paul Walk
Technical Manager
p.walk@ukoln.ac.uk
A centre of expertise in digital information management
www.ukoln.ac.uk
harvesting, searching, syndicating
• options for metadata and content:
• the lines can be blurred
– search engines also harvest!
• your metadata may be my content
metadata content
harvestable
searchable ✓ ✓
syndicable ✓
being harvestable (1)
• Open Archives Initiative
– OAI-PMH
– repositories
– OAI-ORE
• aggregators:
• Intute Institutional Repository Search
– currently harvesting eprints metadata records from 88 institutions
– planning to explore the harvesting of metadata for:
• images
• learning objects
• other media.....
• MLA’s Discover Service
– your content is of interest to other domains
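As a rough illustration of what a harvester such as an aggregator sends to a repository, the sketch below builds OAI-PMH ListRecords request URLs. The endpoint address and the function names are assumptions made up for this example; only the protocol arguments (verb, metadataPrefix, set, from, resumptionToken) come from OAI-PMH itself.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "http://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build the first ListRecords request for a harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # restrict the harvest to one set
    if from_date:
        params["from"] = from_date    # incremental harvest since this date
    return base_url + "?" + urlencode(params)


def resume_url(base_url, token):
    """Build the follow-up request for the next page of results."""
    # A resumptionToken is an exclusive argument: only the verb accompanies it.
    return base_url + "?" + urlencode({"verb": "ListRecords", "resumptionToken": token})


first_request = list_records_url(BASE_URL, from_date="2008-01-01")
print(first_request)
```

A real harvester would fetch each URL, parse the XML response, and keep calling resume_url until the response no longer carries a resumption token.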
being harvestable (2)
• what is your metadata record actually going to point to?
– more than one item of content?
– a ‘jumping off’ page?
– is this consistent?
• what metadata format are you going to use?
– is it commonly supported?
– are you using it correctly? (you’d be surprised.....)
• where/how is your metadata going to be used?
– this is necessarily out of your control!
being searchable (1)
• exposing your content to search engines
• search engine optimisation (SEO)
– make it easy for the search engines
– have content people want
– make it eminently linkable
• Google is your friend!
– SiteMaps - describe your content in ways Google can understand
– OAI-PMH interface can be treated as a SiteMap
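To make the SiteMaps point concrete, here is a minimal sketch that writes a sitemap document in the sitemaps.org 0.9 format using only the Python standard library. The function name and the lastmod date are assumptions for illustration; the URL is the example address used later in these slides.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for a list of (url, lastmod-or-None) pairs."""
    ET.register_namespace("", SITEMAP_NS)   # serialise with a default namespace
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = loc
        if lastmod:
            ET.SubElement(url, "{%s}lastmod" % SITEMAP_NS).text = lastmod
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    # Example address from the slides; the date is made up.
    ("http://www.myserver.com/gallery/collections/pictures/image_0001.jpg", "2008-02-11"),
])
print(sitemap_xml)
```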
being searchable (2)
• Z39.50
– from the library domain
– allows the target to participate in a cross search
– very mature, very widely deployed
– not a web protocol
• SRU
– web-ified Z39.50
– ReSTful
– Common Query Language (CQL)
• SRW
– as above, but for heavier SOA/Web Services use
• OpenSearch
– piggyback on RSS/Atom
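Because SRU is plain HTTP, a search request is just a URL carrying a CQL query. The sketch below builds an SRU 1.1 searchRetrieve request; the server address, the function name, and the choice of index names in the query are assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical SRU server, used only for illustration.
SRU_BASE = "http://sru.example.org/catalogue"


def sru_search_url(base_url, cql_query, start_record=1, maximum_records=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,               # urlencode escapes the CQL for us
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


search_url = sru_search_url(SRU_BASE, 'dc.title = "metadata"')
print(search_url)
```

The same query sent over SRW would be wrapped in a SOAP envelope instead of being encoded into the URL.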
being searchable (3)
• search portals
• community portals
• institutional portals/VLEs
be syndicable, enable re-use by 3rd parties
• consider RSS (and the Atom syndication format)
– in some ways the lingua franca of Web 2.0
– machine and human friendly
– surprising how much content lends itself to this structure
• RSS 2.0 can also ‘enclose’ binary data
– syndicating podcasts
• “the coolest use of your data will be thought of by someone else”
• be mashup friendly:
– addressable content
– cool URLs
– simple formats
– aspire to APIs that need no documentation!
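The sketch below shows how little is needed to syndicate content, including an ‘enclosed’ binary file, as an RSS 2.0 feed. The feed title, addresses, file size, and function name are all made up for this example; the element and attribute names (channel, item, enclosure with url, length, type) are those of RSS 2.0.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items):
    """Return an RSS 2.0 document; each item may carry one enclosure."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        if "enclosure" in item:
            url, length, mime_type = item["enclosure"]
            # length is the file size in bytes, serialised as a string
            ET.SubElement(node, "enclosure", url=url, length=str(length), type=mime_type)
    return ET.tostring(rss, encoding="unicode")


# Hypothetical feed content, for illustration only.
feed_xml = build_feed(
    "Gallery talks",
    "http://www.myserver.com/gallery/",
    "Recorded talks about the collection",
    [{
        "title": "Talk 1",
        "link": "http://www.myserver.com/gallery/talks/1",
        "enclosure": ("http://www.myserver.com/gallery/talks/1.mp3", 1234567, "audio/mpeg"),
    }],
)
print(feed_xml)
```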
human and machine interfaces (1)
• they’re completely different....right?
• well, not necessarily
– RSS!
– OAI-PMH with a CSS stylesheet referenced from the XML
human and machine interfaces (2)
• ‘screen-scraping’ is back in fashion
• plain old semantic HTML (POSH)
• linked-data (the semantic web with a small ‘s’)
• the web of data is imminent!
future design: taking a REST from service provision
• the resource-oriented-architecture
• ReST:
– resources with cool URLs
– 4 HTTP verbs: get, put, post & delete
– CRUD for the Web (create, retrieve, update, delete)
• make everything addressable with URLs
• be cool!
– make the URLs persistent
– make them human-parsable
– e.g.
• http://www.myserver.com/gallery/collections/pictures/image_0001.jpg
– is better than:
• http://www.myserver.com/gallery.php?collection_id=7&item_id=0001
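One way to picture ‘CRUD for the Web’ is a toy in-memory store where every resource is addressed by its URL path and the four verbs do all the work. This is only a sketch: the class, the status codes chosen, and the way POST names new items are assumptions for illustration, and real services differ on how they split create and update between PUT and POST.

```python
class ResourceStore:
    """A toy in-memory store whose resources are addressed by URL path."""

    def __init__(self):
        self._resources = {}
        self._next_id = 1

    def handle(self, method, path, body=None):
        """Return a (status_code, body) pair for one request."""
        method = method.upper()
        if method == "GET":        # retrieve
            if path in self._resources:
                return 200, self._resources[path]
            return 404, None
        if method == "PUT":        # create or replace at a URL the client chose
            status = 200 if path in self._resources else 201
            self._resources[path] = body
            return status, body
        if method == "POST":       # create inside a collection; server picks the URL
            new_path = "%s/item_%04d" % (path.rstrip("/"), self._next_id)
            self._next_id += 1
            self._resources[new_path] = body
            return 201, new_path
        if method == "DELETE":     # delete
            if path in self._resources:
                del self._resources[path]
                return 204, None
            return 404, None
        return 405, None           # any other verb is not supported


store = ResourceStore()
image_path = "/gallery/collections/pictures/image_0001.jpg"
print(store.handle("PUT", image_path, "jpeg bytes"))
print(store.handle("GET", image_path))
```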
my suggestions
• use web protocols
• make content addressable - and persistently so
• reduce barriers to third-parties developing other (competing!?) UIs
– are our UIs really just ‘gateways’ to information (implying that there is a wall around that information)?
• make the machine APIs the heart of our services
– a good design principle is to use the machine API as the API used by our own user interfaces
– we just can’t know for sure all the ways in which our information services might be used
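The design principle of building our own user interface on the machine API can be sketched in a few lines. Everything here is hypothetical (the catalogue, the record, and both function names); the point is only that the HTML page is produced from the same JSON that a third party would receive.

```python
import json
from html import escape

# Hypothetical in-memory catalogue standing in for a real content store.
CATALOGUE = {
    "image_0001": {"title": "Example image", "format": "image/jpeg"},
}


def api_get_item(item_id):
    """Machine API: return a JSON document describing one item, or None."""
    record = CATALOGUE.get(item_id)
    if record is None:
        return None
    return json.dumps({"id": item_id, **record}, sort_keys=True)


def render_item_page(item_id):
    """Human UI: built only on the machine API, never on CATALOGUE directly."""
    payload = api_get_item(item_id)
    if payload is None:
        return "<p>Not found</p>"
    item = json.loads(payload)
    return "<h1>%s</h1><p>%s</p>" % (escape(item["title"]), escape(item["format"]))


print(api_get_item("image_0001"))
print(render_item_page("image_0001"))
```

Anything our own page can show is, by construction, also available to someone else's interface.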
acknowledgements
• in preparation for this presentation, I blogged about giving this presentation and asked my readers:
– “Aside from the obvious stuff like OAI-PMH, Google, RSS, what should I be talking about? Persistent identifiers? Cool URLs? Any other suggestions?”
• 6 responses - all containing great suggestions which I have incorporated into this presentation, from the following people:
– Jim Downing, Owen Stephens, Ian Ibbotson, Pete Johnston, Mike Ellis
• thanks!!
• you can read all of the comments, and find links/addresses for these people on my blog at:
– http://blog.paulwalk.net/2008/02/11/making-digitised-content-available-for-searching-and-harvesting/
comments
• Ian Ibbotson said:
– It’s very hard to engineer a consistent search user interface when half the metadata refers to the actual digital artefact, and half to a front page. It’s useful to have both links, as you can then negotiate with providers if they feel you need to go through a front page for stats and marketing....
• Pete Johnston said:
– a shift away from the “repository” towards the “collection” or “collections” (which I think is the consequence of a more “resource-oriented view”)
• Owen Stephens said:
– Integration of resources into the wider web - e.g. LoC experiment with Flickr to expose content. Many projects in this area create a new silo of material that is hidden from the wider web [...] reusable metadata as well as objects.
• Jim Downing said:
– ....making the content reusable (not a hard sell in eLearning?). Recent use of RDF and Atom in a cultural setting: Asemantics BBC aggregator
• Mike Ellis said:
– ....RSS, and possibly “programmable” RSS (for example, surfacing search results by adding query parameters to the feed address, etc)....
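Mike Ellis’s ‘programmable’ RSS idea, where search results are surfaced by adding query parameters to the feed address, might look something like this sketch. The feed address and the parameter names (q, count) are assumptions invented for the example, not part of any real service.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def feed_with_query(feed_url, **params):
    """Return feed_url with extra query parameters merged into its address."""
    parts = urlsplit(feed_url)
    query = dict(parse_qsl(parts.query))   # keep parameters already present
    query.update(params)                   # new values override old ones
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical feed address and parameter names.
search_feed = feed_with_query(
    "http://www.myserver.com/gallery/feed.rss?format=rss2",
    q="harbour",
    count=5,
)
print(search_feed)
```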
questions?