documenting metadata application profiles and vocabularies
TRANSCRIPT
Paul Walk
Director, Antleaf
Managing Director, Dublin Core Metadata Initiative (DCMI)
Web: http://www.paulwalk.net
Email: [email protected]
Twitter: @paulwalk
www.antleaf.com www.dublincore.org
Sharing profiles: Documenting profiles and
vocabularies on the Web
is it more important that
application profiles are
machine-friendly, or user-
friendly?
the specific challenge:
how to manage & publish the Dublin Core
technical documentation in a more
efficient & sustainable way, making it
as user-friendly as possible while
maintaining its machine-readability
context
• DCMI publishes important technical
documentation (vocabularies,
specifications, models) on the Web
• until recently, managed in a sophisticated
bespoke system:
• sources edited as XML files
• maintained in a Subversion
repository
• assembled & converted with shell
scripts and 'Ant'
• transferred by FTP to a 'staging server'
• deployed to the live server by the
server admin, on request
• essentially a "closed" system
three technologies which make the difference
1. Git
• stable, sophisticated, free version control technology which is ubiquitously
supported
• GitHub: global-scale infrastructure providing Git as a service
• invite contribution by 'pull request'
2. Markdown
• simple, parseable but easily readable plain text format
3. Static website generators
• a new class of content management system where sources are managed
locally and compiled into webpages which are then uploaded to a server
(like we used to do it in the early 90s!)
• supports distributed content-management via git
• supports long-term preservation by requiring only simple text-based
formats
• supports use of desktop authoring tools - e.g. text-editors
we are exploring how these three
technologies:
• Git/GitHub
• Markdown (with metadata “front matter”)
• static-site generators
can be harnessed together to address
our challenge
what are static site generators?
• a different kind of web-content management system, designed to publish
content as static content to a bog-standard web-server.
• content is processed during the publishing operation, rather than when the
user requests content (although client-side JavaScript is still supported)
• simple command-line application to generate content and serve pages
• no database - content in semi-structured text files
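To make the "processed during the publishing operation" point concrete, here is a minimal sketch of a build step in Python. It is not how any particular generator works: the function name `build_site`, the page skeleton and the choice to wrap the source text in a `<pre>` element (rather than render Markdown properly) are all assumptions made to keep the example self-contained.

```python
import html
from pathlib import Path

# Hypothetical page skeleton; a real generator would use proper templates.
PAGE = "<html><head><title>{title}</title></head><body><pre>{body}</pre></body></html>"


def build_site(source_dir: Path, output_dir: Path) -> list[Path]:
    """Write one static .html file for every .md file found under source_dir."""
    written = []
    for source in sorted(source_dir.rglob("*.md")):
        # Mirror the source folder hierarchy in the output folder.
        target = output_dir / source.relative_to(source_dir).with_suffix(".html")
        target.parent.mkdir(parents=True, exist_ok=True)
        # No Markdown rendering here: the text is only escaped and wrapped.
        body = html.escape(source.read_text(encoding="utf-8"))
        page = PAGE.format(title=html.escape(source.stem), body=body)
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written
```

Everything happens once, at build time; the output folder can then be served by any standard web server with no database or server-side code.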
components - standard to most systems
1. content-model
• folder hierarchy, text files
2. content pages
• (markdown, front-matter)
• blog type content is also often supported
3. templates (& themes)
• (with some level of basic scripting)
4. generator software
• typically a command-line script or application
5. configuration file
1. content-model
• text files arranged in folder
hierarchy
• folder hierarchy relates to URL path
structure
• filename relates to URL
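The relationship between folders, filenames and URLs can be sketched as a small function. The `content` root folder, the `index.md` special case and the trailing-slash URL style are assumptions for illustration; individual generators each have their own conventions.

```python
from pathlib import PurePosixPath


def url_for(source_path: str, content_root: str = "content") -> str:
    """Map a source file path to the URL path it would be published at."""
    relative = PurePosixPath(source_path).relative_to(content_root)
    if relative.name == "index.md":
        # An index file stands for its folder.
        parent = relative.parent.as_posix()
        return "/" if parent == "." else f"/{parent}/"
    # Otherwise the folder hierarchy plus the filename (minus extension) form the URL.
    return f"/{relative.with_suffix('').as_posix()}/"
```

For example, under these assumptions `url_for("content/terms/title.md")` gives `/terms/title/`.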
2. content pages
• "front-matter" metadata
• often in YAML format
• main body in Markdown, arbitrary
HTML also accepted where necessary
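As an illustration of a content page, the sketch below shows a hypothetical source file for a single term, with YAML-style front matter above a Markdown body, and a deliberately minimal way of splitting the two. The field names (`title`, `term`, `status`) are invented for this example, and the parser only understands flat `key: value` lines; a real site would use a full YAML library.

```python
# Hypothetical content page: front matter between '---' lines, Markdown below.
EXAMPLE_PAGE = """---
title: Title
term: title
status: recommended
---
A name given to the resource.

Typically, a *title* will be a name by which the resource is formally known.
"""


def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Return (metadata, body); handles only flat 'key: value' front matter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text
    metadata = {}
    for line in lines[1:end]:
        if ":" in line:
            key, _, value = line.partition(":")
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body
```

The metadata becomes available to templates and to machine-readable exports, while the body remains ordinary readable text.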
3. templates
• can reference metadata (e.g. 'page title') from content page
• can re-use 'partial' templates (e.g. a common 'header' & 'footer')
• often in a common templating language such as HAML
• (example below is in Go's templating syntax)
= include partials/header.html .
div.row-fluid
  div class="col-xs-12"
    h1.page-title {{if .Draft}}[**draft**]{{end}}{{.Title}}
    h2.page-title
      i {{.Params.author}}, {{.Date.Format "Monday, January 02, 2006"}}
    {{.Content}}
= include partials/share_buttons.html .
= include _internal/disqus.html .
= include partials/footer.html .
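The same two ideas, referencing page metadata and re-using partial templates, can be sketched in Python with the standard library's `string.Template`. This is only an analogy to the template above, not a translation of it: the partial names, the placeholders and the `render_page` function are all hypothetical.

```python
from string import Template

# Hypothetical shared partials, rendered once and re-used on every page.
PARTIALS = {
    "header": Template("<header>$site_title</header>"),
    "footer": Template("<footer>$site_title</footer>"),
}

# Hypothetical page template that references front-matter metadata.
PAGE_TEMPLATE = Template(
    "$header\n<h1>$title</h1>\n<p><i>$author</i></p>\n$content\n$footer"
)


def render_page(metadata: dict, content_html: str, site_title: str) -> str:
    """Fill the page template from metadata, page content and shared partials."""
    shared = {
        name: partial.substitute(site_title=site_title)
        for name, partial in PARTIALS.items()
    }
    return PAGE_TEMPLATE.substitute(
        title=metadata["title"],
        author=metadata.get("author", ""),
        content=content_html,
        **shared,
    )
```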
4. generator software
• used to generate new content
• also used to run a local server to see how the site will look
deployment options
• SFTP
• Rsync (over SSH)
• git commit hooks (or GitHub webhooks)
• requires the site to be built on the server, so a little more infrastructure (a
simple CGI) is required
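For the hook-based option, the work done on the server amounts to "fetch the latest sources, then rebuild". A minimal sketch of that step follows; the function name, the repository path and the build command are hypothetical, and the web-facing part (the CGI or webhook receiver, including any authentication of the request) is deliberately left out.

```python
import subprocess
from typing import Callable, Sequence


def rebuild_site(
    repo_dir: str,
    build_command: Sequence[str],
    run: Callable = subprocess.run,
) -> None:
    """Update the checked-out sources and regenerate the static site."""
    # Fetch the commits that triggered the hook; refuse anything but a fast-forward.
    run(["git", "pull", "--ff-only"], cwd=repo_dir, check=True)
    # Run the site generator; the actual command depends on the generator chosen.
    run(list(build_command), cwd=repo_dir, check=True)
```

A webhook receiver would call something like `rebuild_site("/srv/site", ["my-generator", "build"])`, where both arguments are placeholders. The `run` parameter exists only so the function can be exercised without touching a real repository.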
436 known generators
https://staticsitegenerators.net
workflow
‘flipping’ the approach
old approach (single source file)
new approach (many source files, one per term)
pros and cons
• old approach (source in XML file
or similar)
• pros:
• easy to track source files (few in
number)
• easy to transform into other
machine-readable formats
• cons:
• difficult to maintain the source -
not user-friendly
• poor support for extensive free
text description
• new approach (source in
Markdown+YAML)
• pros:
• easier for humans to read and
maintain
• good support for extensive free
text description
• easy to re-use
(partially/completely)
• cons:
• may not suit very complex
vocabularies or profiles
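One way to keep the new approach machine-readable is to regenerate a single machine-readable file from the per-term sources at build time. The sketch below assumes each term's front matter has already been parsed into a dictionary with hypothetical keys `term`, `title` and `definition`, and emits JSON; the output shape is invented for illustration and is not a DCMI format.

```python
import json


def terms_to_json(terms: list[dict]) -> str:
    """Combine per-term metadata records into one JSON document."""
    records = [
        {"id": t["term"], "label": t["title"], "definition": t["definition"]}
        for t in sorted(terms, key=lambda t: t["term"])  # stable, diff-friendly order
    ]
    return json.dumps({"terms": records}, indent=2, ensure_ascii=False)
```

The same loop could, in principle, feed a serialiser for RDF or another format, so the human-friendly sources remain the single point of maintenance.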
simplifying curation and preservation
1. version control and redundancy
• synchronised repositories & distributed version control via Git
2. active curation
• ease of access and contribution to sources via Git
• simple & readable plain text formats (Markdown)
• "one click" deployment
3. minimal deployment infrastructure
• standard web-server
• text files, open formats, no database or server-side 'logic', static site
generators
• reduces broken websites
issues & challenges
1. is this still too technical for
some people who may need
to maintain a metadata
profile or vocabulary?
2. will this approach be
sophisticated enough to
document the majority of
candidate
profiles/vocabularies?
3. can we generalise this
approach to provide a
useful, re-usable tool kit for
others to adopt?
4. how do we handle
versioning? By term, or by
‘collection’ - e.g. vocabulary
or profile?
versioning by term
Thank you!