documenting metadata application profiles and vocabularies

Paul Walk

Director, Antleaf

Managing Director, Dublin Core Metadata Initiative (DCMI)

Web: http://www.paulwalk.net

Email: paul@paulwalk.net

Twitter: @paulwalk

www.antleaf.com www.dublincore.org

Sharing profiles: Documenting profiles and

vocabularies on the Web

is it more important that application profiles are machine-friendly, or user-friendly?

the specific challenge:

how to manage & publish the Dublin Core technical documentation in a more efficient & sustainable way, making it as user-friendly as possible while maintaining its machine-readability

context

• DCMI publishes important technical documentation (vocabularies, specifications, models) on the Web

• until recently, this was managed in a sophisticated bespoke system:
  • sources edited as XML files
  • maintained in a Subversion repository
  • assembled & converted with shell scripts and 'Ant'
  • transferred by FTP to a 'staging server'
  • deployed to the live server by the server admin, on request

• essentially a "closed" system

three technologies which make the difference

1. Git
  • stable, sophisticated, free version control technology which is ubiquitously supported
  • GitHub: global-scale infrastructure providing Git as a service
  • invite contribution by 'pull request'

2. Markdown
  • simple, parseable but easily readable plain text format

3. Static website generators
  • a new class of content management system where sources are managed locally and compiled into webpages which are then uploaded to a server (like we used to do it in the early 90s!)
  • supports distributed content-management via Git
  • supports long-term preservation by requiring only simple text-based formats
  • supports use of desktop authoring tools - e.g. text-editors

we are exploring how these three

technologies:

* Git/GitHub

* Markdown (with metadata “front matter”)

* static-site generators

can be harnessed together to address

our challenge

what are static site generators?

• a different kind of web-content management system, designed to publish content as static files to a bog-standard web-server

• content is processed during the publishing operation, rather than when the user requests it (although client-side JavaScript is still supported)

• a simple command-line application generates the content and serves pages

• no database - content lives in semi-structured text files
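
To make "processed during the publishing operation" concrete, here is a minimal sketch of a build step, not taken from any particular generator. The function names `build_site` and `demo`, the `content`/`public` folder names and the placeholder rendering (escaped text inside a `<pre>` element instead of real Markdown conversion) are all assumptions made for illustration.

```python
import html
import tempfile
from pathlib import Path


def build_site(source_dir: Path, output_dir: Path) -> list[Path]:
    """Compile every .md source into a static .html file, once, at publish time."""
    written = []
    for source in sorted(source_dir.rglob("*.md")):
        # Mirror the source folder hierarchy in the output folder.
        relative = source.relative_to(source_dir).with_suffix(".html")
        target = output_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        body = html.escape(source.read_text(encoding="utf-8"))
        # Placeholder rendering: a real generator would convert Markdown to HTML here.
        target.write_text(f"<html><body><pre>{body}</pre></body></html>", encoding="utf-8")
        written.append(target)
    return written


def demo() -> list[str]:
    """Build a two-page hypothetical site in a temporary folder and list the output files."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "content"
        out = Path(tmp) / "public"
        (src / "terms").mkdir(parents=True)
        (src / "index.md").write_text("Home", encoding="utf-8")
        (src / "terms" / "title.md").write_text("An example term.", encoding="utf-8")
        return [p.relative_to(out).as_posix() for p in build_site(src, out)]


if __name__ == "__main__":
    print(demo())
```

Nothing in the output folder depends on a database or on server-side code, so any web-server that can serve files can host it.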

components - standard to most systems

1. content-model

• folder hierarchy, text files

2. content pages

• (markdown, front-matter)

• blog type content is also often supported

3. templates (& themes)

• (with some level of basic scripting)

4. generator software

• typically a command-line script or application

5. configuration file

1. content-model

• text files arranged in folder

hierarchy

• folder hierarchy relates to URL path

structure

• filename relates to URL
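
The mapping from folder hierarchy and filename to URL can be sketched as follows. The exact rules differ between generators; the "pretty URL" convention used here (an `index` file stands for its folder, other files become a trailing-slash path) and the `content` root folder name are assumptions, and `url_for` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def url_for(content_path: str, content_root: str = "content") -> str:
    """Derive a site-relative URL from the location of a content file."""
    relative = PurePosixPath(content_path).relative_to(content_root)
    if relative.stem == "index":
        # An index file represents the folder that contains it.
        parent = relative.parent
        return "/" if str(parent) == "." else f"/{parent}/"
    # Any other file becomes a path segment named after the file (extension dropped).
    return f"/{relative.with_suffix('')}/"


if __name__ == "__main__":
    for path in ("content/index.md", "content/terms/title.md"):
        print(path, "->", url_for(path))
```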

2. content pages

• "front-matter" metadata
  • often in YAML format

• main body in Markdown; arbitrary HTML is also accepted where necessary
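
As a rough illustration of how a content page separates into front-matter and body, the sketch below splits a file on `---` delimiter lines and reads flat `key: value` pairs. It is deliberately not a YAML parser (nested structures and lists are ignored); a real generator would use a proper YAML library. The sample page, its field names and the helper `split_front_matter` are hypothetical.

```python
SAMPLE_PAGE = """---
title: Example Term
status: draft
---
Free-text description of the term, written in **Markdown**.
"""


def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Return (metadata, body) for a page with '---' delimited front-matter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front-matter block at all
    # Find the closing delimiter, skipping the opening one on line 0.
    end = next((i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"), None)
    if end is None:
        return {}, text  # unterminated block: treat everything as body
    metadata = {}
    for line in lines[1:end]:
        if ":" not in line or line.lstrip().startswith("#"):
            continue  # skip blank lines, comments and anything that is not 'key: value'
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body


if __name__ == "__main__":
    meta, body = split_front_matter(SAMPLE_PAGE)
    print(meta)
    print(body)
```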

3. templates

• can reference metadata (e.g. 'page title') from content page

• can re-use 'partial' templates (e.g. a common 'header' & 'footer')

• often in a common templating language such as HAML

• (example below is in Go's templating syntax)

= include partials/header.html .
div.row-fluid
  div class="col-xs-12"
    h1.page-title {{if .Draft}}[**draft**]{{end}}{{.Title}}
    h2.page-title
      i {{.Params.author}}, {{.Date.Format "Monday, January 02, 2006"}}
    = include partials/share_buttons.html .
    = include _internal/disqus.html .
= include partials/footer.html .
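
The two template ideas above (pulling metadata from the content page, and re-using partials) can be imitated with Python's standard `string.Template`. This is only an analogy to what a generator's templating language does, not the mechanism any particular generator uses; the partial names, the page fields and `render_page` are assumptions for the sketch.

```python
from html import escape
from string import Template

# Partials shared by every page (hypothetical header and footer).
PARTIALS = {
    "header": Template("<header>$site_name</header>"),
    "footer": Template("<footer>$site_name</footer>"),
}

# Page template: references the partials plus metadata from the content page.
PAGE = Template("$header\n<h1>$draft_marker$title</h1>\n<p><i>$author</i></p>\n$footer")


def render_page(page: dict, site_name: str) -> str:
    """Fill the page template from front-matter style metadata."""
    shared = {"site_name": escape(site_name)}
    return PAGE.substitute(
        header=PARTIALS["header"].substitute(shared),
        footer=PARTIALS["footer"].substitute(shared),
        draft_marker="[draft] " if page.get("draft") else "",
        title=escape(page["title"]),
        author=escape(page["author"]),
    )


if __name__ == "__main__":
    print(render_page({"title": "Example Term", "author": "A. Editor", "draft": True}, "Example Site"))
```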

4. generator software

• used to generate new content

• also used to run a local server to see how the site will look

deployment options

• SFTP

• Rsync (over SSH)

• git commit hooks (or GitHub webhooks)

• requires the site to be built on the server, so a little more infrastructure (a

simple CGI) is required
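
For the rsync-over-SSH option, a deployment script mostly amounts to assembling one command. The sketch below only builds and prints that command rather than running it; the `public` build folder and the `deploy@example.org:/var/www/site/` destination are placeholders, and `rsync_command` is a hypothetical helper. The rsync options used (`--archive`, `--compress`, `--rsh`, `--delete`) are standard ones.

```python
import shlex


def rsync_command(build_dir: str, remote: str, delete_removed: bool = True) -> list[str]:
    """Assemble an rsync invocation that mirrors the built site to a remote server."""
    command = ["rsync", "--archive", "--compress", "--rsh=ssh"]
    if delete_removed:
        # Remove files on the server that no longer exist in the local build.
        command.append("--delete")
    # A trailing slash on the source copies the folder's contents, not the folder itself.
    command += [build_dir.rstrip("/") + "/", remote]
    return command


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(shlex.join(rsync_command("public", "deploy@example.org:/var/www/site/")))
```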

436 known generators

https://staticsitegenerators.net

workflow

‘flipping’ the approach

old approach (single source file)

new approach (many source files, one per term)
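
One way to keep machine-readability under the new approach is to treat the per-term files as the source and derive a single machine-readable document from them at build time. The sketch below assumes each term's front-matter has already been read into a dictionary with a hypothetical `id` field, and uses JSON purely as an example output format; the field names and `collate_terms` are illustrative, not part of the DCMI workflow.

```python
import json


def collate_terms(term_records: list[dict]) -> str:
    """Combine per-term metadata records into one JSON document keyed by term id."""
    by_id = {}
    for record in term_records:
        term_id = record["id"]
        if term_id in by_id:
            raise ValueError(f"duplicate term id: {term_id}")
        # Store every field except the id, which becomes the key.
        by_id[term_id] = {key: value for key, value in record.items() if key != "id"}
    return json.dumps({"terms": by_id}, indent=2, sort_keys=True)


if __name__ == "__main__":
    records = [
        {"id": "title", "label": "Title"},
        {"id": "creator", "label": "Creator"},
    ]
    print(collate_terms(records))
```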

pros and cons

• old approach (source in XML file or similar)
  • pros:
    • easy to track source files (few in number)
    • easy to transform into other machine-readable formats
  • cons:
    • difficult to maintain the source - not user-friendly
    • poor support for extensive free text description

• new approach (source in Markdown+YAML)
  • pros:
    • easier for humans to read and maintain
    • good support for extensive free text description
    • easy to re-use (partially/completely)
  • cons:
    • may not suit very complex vocabularies or profiles

simplifying curation and preservation

• version control and redundancy
  • synchronised repositories & distributed version control via Git

• active curation
  • ease of access and contribution to sources via Git
  • simple & readable plain text formats (Markdown)
  • "one click" deployment

• minimal deployment infrastructure
  • standard web-server
  • text files, open formats, no database or server-side 'logic', static site generators
  • reduces broken websites

issues & challenges

1. is this still too technical for some people who may need to maintain a metadata profile or vocabulary?

2. will this approach be sophisticated enough to document the majority of candidate profiles/vocabularies?

3. can we generalise this approach to provide a useful, re-usable tool kit for others to adopt?

4. how do we handle versioning? By term, or by ‘collection’ (e.g. vocabulary or profile)?

versioning by term


Thank you!
