Final paper for INFO653: Digital Libraries
Metadata in Practice:
Issues of Interoperability, Quality, and Standardization
Melissa Ormond
Drexel University
2010‐2011 Summer
Dr. Xia Lin
ABSTRACT
This paper examines the importance of metadata both in a digital library and as part of the World
Wide Web. Special focus is placed on the goals of metadata, detailed versus simple digital identification,
and embedded metadata and information retrieval. I will examine both metadata scheme
standardization such as Dublin Core, as well as the importance of interoperability and application
profiles. I will evaluate multiple metadata schemes such as Dublin Core in the World Wide Web,
automatic metadata generation such as that of the NSDL and OAI‐PMH, and author generated metadata
such as that found in DSpace. I will look at the issue of digital archiving and preservation and the
importance of metadata in ensuring both. Finally, I will examine the issue of metadata evaluation and
metrics for measurement. As technology continues to evolve, how these
resources are described and managed will take on a new importance. “The rapid changes in the means
of information access occasioned by the emergence of the World Wide Web have spawned an upheaval
in the means of describing and managing information resources. Metadata is a primary tool in this work,
and an important link in the value chain of knowledge economies” (Duval et al., 2002). The future
mission of LIS professionals will be to determine the best means by which this metadata can be
integrated.
1. INTRODUCTION
In order to fully understand both the nature of metadata and its vital importance, one must first
understand the environment in which metadata plays its most important role – the digital library. The
Dictionary for Library and Information Sciences (2004) states that a digital library is, “a library in which a
significant proportion of the resources are available in machine‐readable format (as opposed to print or
microform), accessible by means of computers…the digital content may be locally held or accessed
remotely via computer networks” (Reitz, 2004). While this definition may be suitable to the
generalization of a digital library, it gives no direction as to the function or mission of a digital library.
Candela et al., (2007) argue that a digital library can “represent the meeting point of many disciplines
and fields, including data management, information retrieval, library sciences, document management,
information systems, the web, image processing, artificial intelligence, human‐computer interaction,
and digital curation” (Candela et al., 2007). In their article, “What Is a Digital Library Anymore, Anyway?”
Lagoze et al. (2005) argue that a digital library is more than just accessibility of many disciplines. “There
seems to be a belief that a digital library is just about search and access. These functions are indeed
essential (and remain challenging), but they are just part of an information environment” (Lagoze et al., 2005).
They went on to provide an expanded view of a digital library, one that not only provides a
collection of materials relevant to the library’s mission, but furthermore provides the medium to be
both collaborative and contextual. The digital library must move past the “search and access” function
and instead work to “create a rich, asynchronous workplace in which information is shared, aggregated,
manipulated, and refined” (Lagoze et al., 2005).
2. DIGITAL V. TRADITIONAL LIBRARIES
Helping users make a conceptual association is one of the most important, yet difficult, services
of a digital library. In a physical library, the spatial arrangement of items, such as books, conveys
meaning, e.g., items associated by subject. It is commonplace to visit a traditional library and browse a
particular subject area to locate similar materials. Physical library metadata, in the form of indexes such
as the Dewey decimal classification and classification schemes such as that of the Library of Congress,
ensure spatial arrangement of these physical materials. In fact, the spatial arrangement can itself be
viewed as a form of metadata (Nurnberg et al., 1995). In digital libraries, the issues surrounding
spatial arrangement must be resolved for a fully digital environment. How can the
organization afforded by physical library metadata be translated into a digital environment where there
are no shelves to arrange related materials? This is where the use of digital metadata plays a vital role.
“While spatial arrangement of library materials is a physical library metadata element with physical
presence, other metadata with no direct physical reality must also be translated, or adapted in its
application, if it is to be used in a digital library” (Nurnberg et al., 1995).
However, this issue of spatial arrangement is not the only issue central to a digital library, and
not the only issue that is influenced by metadata. Digital libraries must also address how to digitize
items and store them online, how to include new forms of media such as images, audio, and video,
how to locate information, when to use existing technologies, and how to deal with information
overload (Nurnberg et al., 1995). While digital libraries share both some of the same issues and goals of
a physical library, they have the extra burden of determining how to “adapt the tradition of the physical
library into the digital realm” (Nurnberg et al., 1995). Besides those listed above, a central problem with
a digital library translation is the ideal of archiving and preservation. “If physical libraries primarily
contain physical data and digital libraries primarily contain digital data, then how can digital libraries
preserve and disseminate the vast amounts of existing physical data?” (Nurnberg et al., 1995). According
to Nurnberg et al., the answer is fairly straightforward. Since digital libraries cannot contain the physical
content they can instead “contain digital translation of this data” (Nurnberg et al., 1995). These succinct
surrogate records of digital data can be found in the form of digital metadata.
3. WHAT IS METADATA
Before we begin discussing the implications of metadata in the digital library environment, we
must first answer the fundamental question: what is metadata? Metadata is data that is both
machine-readable and descriptive. Digital metadata serves many purposes, including resource
discovery, management, delivery, and preservation, to name just a few. Basically, metadata is data about
data. It is information that describes content. Metadata is a traditional way, along with subject indexing
and classification, to connect subjects, topics, and documents. The purpose of metadata is to allow users
to easily find the information that they need. Metadata is needed to improve retrievability of electronic
resources and to provide sufficient and appropriate description of their content so that users can choose
among different resources that might appear similar.
In order to improve retrievability of resources, metadata must first efficiently describe the
digital resource in question. This is done in three distinct ways: first, the metadata must describe the
content of the resource, primarily what the resource is about. Secondly, the metadata must describe the
context of the resource. This includes essential questions such as who, what, where, why, and how; any
aspect associated with the creation of the resource. And finally, the metadata must describe the
structure of the resource, including form (Gill et al., 2008). The issue of retrievability comes to light once
an item has been properly described using metadata. An item with effective metadata can be properly
organized for later retrieval. However, the question must be asked: what makes effective metadata? Is
metadata only effective when all of the possible metadata element fields are completed?
3.1. DETAILED V. SIMPLE
To begin to answer the above question, let’s start with an analysis of detailed metadata
descriptions v. simple metadata descriptions. Detailed metadata descriptions, while they may improve
information retrieval, require not only trained staff to assign the metadata, but increased cost
associated with the metadata development. Detailed metadata can also lead to issues with consistent
and standard metadata; as a rule, it is typically easier to standardize simple metadata. Besides ease in
consistency, simple metadata is both less costly and provides a greater probability of interoperability.
However, simple metadata is not without its flaws, the greatest being an increased chance of false
results during information retrieval, due to less detailed and specific search parameters. Whether
simple or detailed, in the end, “the richness of metadata descriptions will be determined by policies and
best practices designated by the agency creating the metadata” (Duval et al., 2002). Designing some
form of ideal metadata, that which is standardized and allows for greater interoperability, requires a
high degree of flexibility on the part of the agency creating the metadata. However, if the history of
metadata has taught us anything, it is that there is no easy answer, or solution, to the issue of digital
content and information retrieval.
3.2 ISSUES WITH METADATA
Early forms of metadata such as MARC (Machine-Readable Cataloging) and AACR2 still
predominate in the physical library world; in the digital library world, however,
embedded metadata has become the new standard. The Dublin Core Element Set (DC) was developed in
1995 as a means to improve search engine indexing by embedding metadata elements into
web pages or encoding through the use of XML (Huddleston, 2008). The DC consists of 15 core
elements, including contributor, creator, date, and subject. Although the Dublin Core elements seem
simple enough, they are in reality very complex in nature. According to Arms (2000), “simplicity is both
the strength and the weakness of the Dublin Core. Whereas traditional cataloging rules are long and
complicated, requiring professional training to apply effectively, the Dublin Core can be described
simply. However, simplicity conflicts with precision” (Arms, 2000, Chapter 10). The perceived simplicity
of the core elements leads many to believe that DC metadata can simply be created by untrained staff,
and in fact this minimalist view was initially supported by the scheme’s designers. “The initial aim [of the Dublin
Core Element Set] was to create a single set of metadata elements for untrained people who publish
electronic materials to use in describing their work” (Arms, 2000, Chapter 10).
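As a rough illustration of this embedding (the element values here are invented for this example), Dublin Core elements can be placed in an HTML page head using the DCMI meta-tag convention:

```html
<head>
  <!-- Link the DC prefix to the Dublin Core element set -->
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
  <!-- Each DC element becomes one meta tag; values below are hypothetical -->
  <meta name="DC.title" content="A Sample Web Document">
  <meta name="DC.creator" content="Doe, Jane">
  <meta name="DC.date" content="2011-06-01">
  <meta name="DC.subject" content="metadata; digital libraries">
</head>
```

Even an untrained author can fill in a handful of such tags, which is precisely the simplicity (and, as discussed below, the imprecision) that the Dublin Core trades on.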
While this minimalist view is still held by many, some professionals feel that a more structured
metadata, one that is managed through controlled cataloging rules, is the better option. What a more
structured metadata can provide is increased standardization, precision, and interoperability, but at
what cost? Not only would the structured form be more complex in nature, therefore requiring greater
staff training, but the standardization process would be both time consuming and costly. As an example,
the IEEE‐LOM standard metadata took five years between the initial specification and the finished
product; five years of meetings, testing, and evaluations (Duval et al., 2002). Arms (2000) argues for a
proposed strategy that does not favor one option over another, but rather utilizes both. “The minimalist
option will meet the original criterion of being usable by people who have no formal training…the
structured option will be more complex, requiring fuller guidelines and a trained staff” (Arms, 2000,
Chapter 10). While this proposed strategy does seem optimal, there are still many issues to consider,
including standardized adoption of policies and cataloging rules, as well as how to manage the addition
of new elements.
One way to manage this addition is through the use of application profiles. “An
application profile is an assemblage of metadata elements selected from one or more metadata
schemas and combined in a compound schema. Application profiles provide the means to express
principles of modularity and extensibility” (Duval et al., 2002). To put this in layman’s terms, the
argument claims that it is impossible to have one single metadata format (with elements) that could
accommodate all digital library applications. What an application profile does is allow “designers to ‘mix
and match’ schemas as appropriate…while retaining interoperability with the original base schemas”
(Duval et al., 2002). In order to achieve this flexibility, application profiles must enforce constraints
on which elements may appear while defining the interrelationships between the constrained elements (Duval
et al., 2002). Ultimately, the main goal of application profiles is to increase
interoperability between metadata standards.
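A minimal sketch of this “mix and match” principle (the local namespace and its gradeLevel element are hypothetical, invented for illustration) shows how a record governed by an application profile might combine Dublin Core elements with an element drawn from a second, domain-specific schema:

```xml
<!-- Hypothetical record combining two schemas under one application profile -->
<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:local="http://example.org/terms/">
  <dc:title>Introductory Chemistry Lab Manual</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <!-- Element borrowed from a hypothetical local schema -->
  <local:gradeLevel>undergraduate</local:gradeLevel>
</record>
```

Because each element retains its source namespace, the record remains interpretable by systems that understand only the base Dublin Core schema, which is the interoperability the profile is meant to preserve.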
4. METADATA INTEROPERABILITY
Before continuing, let us first define what is meant by metadata interoperability. According to
Duval (2001), interoperability is “enabling information that originates in one context to be used in
another in ways that are as highly automated as possible” (Duval, 2001). In regards to metadata records,
this refers to mechanisms that enable two or more metadata schemes to “cross boundaries of context”
(Duval, 2001). To end users, metadata interoperability allows the searcher to cross search among
multiple repositories whose metadata can ‘speak to each other,’ thereby “preventing end users
from being locked into proprietary systems” (Duval, 2001). While the benefit of metadata
interoperability seems straightforward, achieving this interoperability is not so simple. As long as digital
libraries continue to act as standalone repositories, with their own mission, policies, and metadata
standards, how could LIS professionals ever hope to achieve metadata interoperability? While the goal
of application profiles is to alleviate some of the issues surrounding interoperability, this is not always
the case. How can we move beyond one single metadata standard without compromising
interoperability?
According to Manduca et al. (2006), whose article focused on educational digital libraries, achieving
an integrated network “requires knowledge of the breadth of resources and the breadth of user
communities” (Manduca et al., 2006). A good example of this focus on user community can be seen
through the IEEE LOM data model (Learning Object Metadata). While some professionals were focused
on the Dublin Core and W3C’s development of the Resource Description Framework (1999), a standard
for the semantic web that allows software to navigate Web content to link data, the learning community
was busy developing its own standards (Duval et al., 2002). The IEEE LOM, while similar to the Dublin
Core, contains its own elements and syntax. The basic IEEE LOM Metadata Scheme contains a hierarchy
of elements. The first level of the hierarchy consists of nine categories, including lifecycle, technical, and
classification. Each of the nine categories contains sub‐elements, a hierarchy of attributes, or data
elements, that further describe a learning object. Because of the complexity of the IEEE LOM
metadata scheme, it is necessary to utilize the services of a metadata expert, although the standard
itself lacks any definition as to whom or what is responsible for creating the metadata attributes.
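An abbreviated sketch of such a record, loosely following the shape of the IEEE LOM XML binding, might look like the following (the element values are invented, only three of the nine categories are shown, and the full standard defines many more sub-elements than appear here):

```xml
<!-- Abbreviated, hypothetical IEEE LOM record -->
<lom xmlns="http://ltsc.ieee.org/xsd/LOM">
  <general>
    <title><string language="en">Introduction to Photosynthesis</string></title>
  </general>
  <technical>
    <!-- MIME type of the learning object -->
    <format>text/html</format>
  </technical>
  <classification>
    <purpose><value>discipline</value></purpose>
  </classification>
</lom>
```

Even this thin slice hints at why a metadata expert is needed: each category nests further attributes, many with their own controlled vocabularies.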
Returning to the issue of interoperability: with the development of multiple different metadata
standards and repositories, such as the IEEE LOM, a need arose in the LIS community to find
ways in which to aggregate metadata from different repositories into one place that could be easily
searched (Ward, 2003). Out of that complex need came the development of the OAI‐PMH (Open
Archives Initiative Protocol for Metadata Harvesting). The function of the OAI-PMH is twofold: first,
administrators of digital libraries participating in the protocol (data providers) must expose Dublin Core
metadata via OAI‐PMH. Secondly, service providers take the exposed metadata and harvest the data
(Ward, 2003). This twofold action is meant to facilitate “cross‐domain resource discovery and digital
library interoperability” (Ward, 2003). What service providers discovered during the harvesting process
was that most data providers used only a select few of the Dublin Core elements, such as ‘creator’ and
‘identifiers,’ and that this lack of standardization on the part of the data provider resulted in greater
time and costs on the part of the service provider to subsequently fill in the blanks (Ward, 2003).
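To make the protocol concrete (the repository address, identifier, and record content below are hypothetical, and the response envelope is abbreviated), a service provider issues a simple HTTP request such as `http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc` and receives Dublin Core records wrapped in an OAI-PMH XML envelope:

```xml
<!-- Abbreviated, hypothetical OAI-PMH ListRecords response -->
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:item-123</identifier>
        <datestamp>2011-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Resource</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
```

A record like this one, carrying only a title and creator, illustrates exactly the sparseness Ward describes: the envelope is valid, but most of the fifteen Dublin Core elements are simply absent.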
One such digital repository that has benefited from the use of OAI‐PMH is the National Science
Digital Library (NSDL), a national network of digital libraries focused on science, technology, engineering,
and mathematics. The NSDL is funded by the National Science Foundation (NSF) and its metadata
repository is based on the Dublin Core. “Dublin Core metadata records, which include URLs to the
corresponding digital resources, are ingested into the MR via OAI‐PMH. As part of the ingest process,
these records are processed to normalize dates and various controlled vocabulary elements” (Lagoze et
al., 2005). Based on the complexity of the NSDL, it became obvious that utilizing a single standardized metadata
scheme would be virtually impossible. Because of this, the NSDL accommodates a wide variety of
metadata standards, including “a broad spectrum of metadata quality, anticipating a wide variety of
errors or inconsistencies” (Hillmann et al., 2004). However, the acceptance of inconsistencies is not the
only issue plaguing the NSDL. Further issues that became apparent from the metadata harvest included
a wide variety of data failures including missing data (blank elements), incorrect or confusing data, and a
lack of controlled vocabulary (Hillmann et al., 2004). While the use of Dublin Core and OAI‐PMH has
allowed the NSDL to provide “basic digital library services, [it] has also revealed a number of
implementation problems. The most outstanding of these relate to metadata quality and OAI‐PMH
validity, especially XML‐schema compliance” (Lagoze et al., 2005).
5. HOW TO DETERMINE QUALITY
In order for metadata to be truly effective, it must be of the highest quality; however, how does
one evaluate what constitutes high quality metadata? In their 2004 article, “The Continuum of Metadata
Quality: Defining, Expressing, Exploiting,” Bruce and Hillmann argue the difficulty, if not impossibility, of
defining standard metrics for determining metadata quality. What they do suggest is that an
examination “of the most commonly recognized characteristics of quality metadata: completeness,
accuracy, provenance, conformance to expectations, logical consistency and coherence, timeliness, and
accessibility,” can begin to help determine the necessary metrics for metadata quality (Bruce & Hillmann,
2004). While I agree with their argument, I would argue that metadata quality evaluation must start at
the beginning, with the creation of metadata. It has often been argued that most metadata today is
created by those with little to no metadata training or experience. As previously argued, the superficial
simplicity of the Dublin Core, while serving as the motivation for its development, can also serve as its
downfall. The minimalist view has led many to believe that Dublin Core metadata can be assigned by
novices, with no formalized training. Rob Huddleston’s 2008 HTML, XHTML, and CSS, a training guide for
those taking an introductory web design class, devotes only four pages to web site meta elements, such
as those of the Dublin Core. The instructions include adding two main meta elements, keywords and
description, but give little, if any, explanation of the function of metadata or of the restrictions on the
format of the two elements. In fact, the section clearly points out that since sites like Google ignore
metadata the addition of this information to the header of your website may not be worth the effort
(Huddleston, 2008). This lack of metadata importance is all the more obvious when one searches
Google.
5.1 METADATA QUALITY IN THE WORLD WIDE WEB
As an example, I performed a simple Google search for ‘Tudor History,’ which resulted in
24,000,000 results. The first result was TudorHistory.org, a comprehensive site on the Tudor monarchs,
including photos, original resources, and timelines. A quick look at the page’s source code shows
complete lack of standard metadata in the form of Dublin Core:
<HEAD>
<meta http-equiv="content-type" content="text/html;charset=ISO-8859-1">
<META NAME="GENERATOR" CONTENT="Adobe PageMill 3.0 Mac">
<TITLE>TudorHistory.org</TITLE>
<meta name="verify-v1" content="2wVKaBktTtl0TaXXJCLtl5vYgVnF6pQKA5FqOOgvXlQ=">
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript"></script>
<script type="text/javascript">_uacct = "UA-75588-1";urchinTracker();</script>
</HEAD>
Out of the first five results provided by Google, none of the sites contained any form of Dublin Core
metadata. Results from this search can be found in Appendix A.
If search sites such as Google ignore metadata elements and web designers for the most part
chose not to include these elements in the headings of their web pages, then how do we evaluate
metadata that seems to be ineffective? Or better yet, how do we convince the community, including
web designers and information professionals, that quality metadata is necessary? I would argue that the
first step would be showcasing the effectiveness of metadata through its ability to benefit information
retrieval, through digital identification and resource discovery, and its ability to provide the means for
digital archiving and preservation. Once a strong argument has been made as to the effectiveness of
metadata you can begin discussing metadata quality. According to Lagoze et al., in their 2006 article on
metadata aggregation, three skill sets are required in order to provide quality
metadata: domain experience, metadata experience, and technical experience (Lagoze et al., 2006).
While specialized training is both costly and time consuming, the added benefit seems worth the cost. If
there is a widespread acceptance of a standard for high quality metadata in the form of Dublin Core (or
another standard in the future) and this standard is adhered to, then some of the issues surrounding
metadata interoperability can also be resolved.
5.2 METADATA QUALITY IN THE NSDL
While the Dublin Core and OAI‐PMH address the important goal of interoperability, this goal has
often been threatened due to the lack of high quality metadata (Lagoze et al., 2005). An example of this
can be found in the NSDL. Although the NSDL has decided not to adhere to one metadata standard, such
as the Dublin Core, and instead accommodates a wide variety of metadata schemes, this does not protect
the NSDL from the same issues of metadata quality faced by those digital libraries that do adhere to the
Dublin Core. Hillmann et al. (2004) argue that issues such as missing or incomplete data and lack of
controlled vocabulary require NSDL staff to perform ‘safe transformations’ to a large percentage of their
data in order to allow for interoperability among the systems (Hillmann et al., 2004). These ‘safe
transformations’ included removing elements with no data, identifying possible controlled vocabulary,
and normalizing the metadata presentation (Hillmann et al., 2004). These implementation fixes required
both time and money and worked against the goal of the NSDL, which was to automate metadata
harvest and flow with minimal human intervention (Hillmann et al., 2004).
Besides missing data and a lack of controlled vocabulary, NSDL came across a different issue
during metadata harvest, namely that when not adhering to one standard metadata scheme it is very
possible that “each metadata provider may be using different criteria to assign levels of interactivity,”
meaning the subjectivity used to value metadata, based on the elements with assigned data, may be
different for each metadata provider (Hillmann et al., 2004). In this case, how can there ever be an
element of consistency during the harvest without human intervention in order to improve the quality
of the metadata? Even more so, this issue of subjectivity can be used when arguing how to evaluate the
quality of metadata. If the purpose of metadata is to aid end users in their search and retrieval, then the
context of the metadata elements should be measured to determine whether they meet this goal.
However, in the case of the NSDL, which harvests metadata adhering to a number of different schemes,
“different services require different kinds of metadata, perhaps tailored for different purposes, or with
different confidence ratings” (Hillmann et al., 2004). If each metadata provider bases their metadata
schemes around their particular end users, then how can one adequately determine metadata quality
based off of one quality standard?
6. METADATA GENERATION OPTIONS

If the NSDL serves as an example of issues arising from auto-generated metadata, metadata
assigned with little to no human interaction, what sort of issues can be found when dealing with human
generated metadata? We have already touched on the issue of cost associated with training staff.
According to Arms (2000), “cataloging and indexing are expensive when carried out by skilled
professionals. A rule of thumb is that each record costs about $50 to create and distribute” (Arms, 2000,
Chapter 10). If high quality human generated metadata schemes are both costly and time consuming to
both create and maintain, what are the alternatives, besides automatic indexing, which most argue
results in poor quality metadata? An answer to this question could be found through the concept of
author generated metadata. Granted, while the author of a work will not be trained in the usage of
metadata, they will be the most familiar with the work being indexed, as well as the user community in
which the work belongs. Could author generated metadata be the solution to both costly metadata by
professionals and low quality metadata by machines? While the “authoring of data and metadata is
hard and time consuming…and automatic generation of obvious metadata is useful and
possible…semantic metadata will in most cases need to be provided through human intervention” (Duval
et al., 2002). In order to aid in this human intervention, especially in the case of author generated
metadata, a metadata template could be provided for a collection of documents that are similar, or
aimed at the same user community.
6.1 AUTHOR GENERATED METADATA – DSPACE
Taking into account all of the evolving issues surrounding both the creation and use of
metadata, let’s take a look at an example of an institutional digital repository that stores, manages,
and provides access to institutional assets such as research papers, learning objects, and research
data, while providing author generated metadata: MIT’s DSpace. DSpace was MIT’s attempt to resolve
the issue of an overabundance of self-published materials. “As faculty and other researchers develop
research materials and scholarly publications in increasingly complex digital formats, there is a need to
collect, preserve, index and distribute them” (Smith et al., 2003). Using a qualified Dublin Core metadata
standard, DSpace provides the means “to manage these research materials and publications in a
professionally maintained repository to give them greater visibility and accessibility over time” (Smith et
al., 2003).
Using the Libraries Working Group Application Profile, there are three required Dublin Core
elements per submission ‐ title, language, and submission date; all other fields such as abstract,
keywords, and rights are optional (Smith et al., 2003). Once the metadata elements have been assigned
by the submitter they are displayed in the item record and indexed for greater searching and browsing
capabilities (Smith et al., 2003). DSpace collections have their own form of interoperability, as metadata
assigned to a record in a particular collection is indexed to allow searching in the initial collection, across
multiple collections, or across Communities (partner institutions participating in DSpace). Also, “to
further its goal of supporting interoperability with other DSpace adopters, and with other digital
repositories, preprint, and e‐print servers, the system has implemented the OAI‐PMH” (Smith et al.,
2003). In order to meet the growing needs for digital preservation, DSpace attempts to capture minimal
technical metadata, such as file format and creation date, in order to support bit preservation (Smith et
al., 2003). Along with quality procedures, servers, and backup plans, DSpace can work to ensure that
“material deposited can be delivered to future users exactly as it was originally received” (Smith et al.,
2003).
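As a sketch of what a submitter’s record might look like (the values below are invented, and the layout follows DSpace’s batch import convention of expressing an item’s qualified Dublin Core as a simple dublin_core.xml file), note how few fields a submission actually requires:

```xml
<!-- Hypothetical DSpace item metadata; three required fields plus one optional -->
<dublin_core>
  <dcvalue element="title" qualifier="none">A Sample Working Paper</dcvalue>
  <dcvalue element="language" qualifier="iso">en_US</dcvalue>
  <dcvalue element="date" qualifier="issued">2011-06-01</dcvalue>
  <!-- Optional descriptive field, frequently left blank by submitters -->
  <dcvalue element="description" qualifier="abstract">A short abstract.</dcvalue>
</dublin_core>
```

A record reduced to the first three dcvalue lines would still be accepted, which foreshadows the problem discussed next.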
If the goal of DSpace is to collect, preserve, index, and distribute materials and publications,
then the requirement for only three Dublin Core metadata elements is problematic. The fields of title,
language, and submission date provide no conceptual associations that would help a user to easily find
needed information, which is the goal of metadata. The fact that DSpace metadata indexing is done by
end‐users, and not trained indexers, could explain why only three fields are required, however, as
argued earlier, metadata records assigned by untrained individuals tend to be plagued with low quality
or incomplete data. (An example of metadata records from DSpace can be located in Appendix B.) Also,
DSpace’s preservation mission relies heavily on metadata submissions; however most of the descriptive
fields for a record that would aid in this goal of preservation are not required, and often not captured.
Though DSpace utilizes the OAI-PMH by exposing the system’s assigned Dublin Core metadata in
order to ensure that deposited items can be found in the future, the lack of descriptive metadata for
items requires additional support (Smith, et al., 2003). Community representatives would need to work
directly with DSpace user support staff to ensure sufficient metadata creation; however, without a policy
standard on which a model workflow could be based, allowing each collection to enforce the metadata
standard, this will be an uphill battle.
7. CONCLUSION
The importance of metadata in digital libraries, both as a means for information retrieval and
storage, and digital preservation and archiving, is not to be disputed. Metadata serves as part of the
foundation of librarianship. Every item in a traditional library is assigned some form of metadata,
whether through a card catalog, a location on a shelf, or classification schemes such as the Dewey
Decimal Classification or that of the Library of Congress. While metadata has a long-standing
tradition in the library environment, this tradition, or acceptance of importance, has not always
transferred over to the digital library realm. Digital metadata is plagued with a number of issues: from
poor quality to lack of interoperability, from cost concerns to the need for a metadata standard, and from
auto-generated to human-generated metadata. While there appears to be no easy solution to
the metadata problem, what is clear is that without a solution we jeopardize the overall goal and mission of
a digital library, which is both to provide resources to users and to preserve these resources for future
generations. As Arms (2000) argues, “the underlying question is not whether automated digital libraries
can rival conventional digital libraries today. They clearly cannot. The question is whether we can
conceive of a time (perhaps twenty years from now) when they will provide an acceptable substitute.”
As metadata forms the foundation of a digital library, it is time to start conceiving of when digital
metadata can fulfill this substitution.
8. REFERENCES
Arms, W. (2000). Digital Libraries, Chapter 10: Information Retrieval and Descriptive Metadata.
Arms, W. (2000). Automated Digital Libraries: How Effectively Can Computers Be Used for Skilled Professional Librarianship? D‐Lib Magazine, 6(7/8).
Bruce, T., & Hillmann, D. (2004). "The Continuum of Metadata Quality: Defining, Expressing, Exploiting." In D. Hillmann & E. Westbrooks (Eds.), Metadata in Practice. ALA Editions.
Candela, L., Castelli, D., Pagano, P., & Thanos, C. (2007). Setting the Foundation of Digital Libraries: The DELOS Manifesto. D-Lib Magazine, 13(3/4).
Duval, E. (2001). Metadata Standards: What, Who & Why. Journal of Universal Computer Science, 7(7).
Duval, E., Hodgins, W., Sutton, S., & Weibel, S. (2002). Metadata Principles and Practicalities. D‐Lib Magazine, 8(4).
Gill, T., Gilliland, A., Whalen, M., & Woodley, M. (2008). "Metadata and the Web." In Introduction to Metadata, Online Edition, Version 3.0. http://getty.edu/research/publications/electronic_publications/intrometadata/metadata.html
Hillmann, D., Dushay, N., & Phipps, J. (2004). "Improving Metadata Quality: Augmentation and Recombination." In Proceedings of the Dublin Core Metadata Conference, Shanghai, China.
Huddleston, R. (2008). HTML, XHTML, and CSS. Indianapolis, IN: Wiley Publishing, Inc.
Lagoze, C., Krafft, D., Payette, S., & Jesuroga, S. (2005) What Is a Digital Library Anymore, Anyway? D‐Lib Magazine, 11(11).
Lagoze, C., Krafft, D., Cornwell, T., Dushay, N., Eckstrom, D., & Saylor, J. (2006). "Metadata aggregation and "automated digital libraries": A retrospective on the NSDL experience." In Proceedings of the 6th ACS/IEEE‐CS Joint Conference on Digital Libraries, ACM Press.
Manduca, C., Fox, S., & Iverson, E. (2006) Digital Library as Network and Community Center. D‐Lib Magazine, 12(12).
Nürnberg, P., Furuta, R., Leggett, J., Marshall, C., & Shipman III, F. (1995). "Digital Libraries: Issues and Architecture." In Proceedings of Digital Libraries.
Reitz, J. (2004). Dictionary for Library and Information Science. Westport, Connecticut: Libraries Unlimited.
Smith, M., Barton, M., Bass, M., Branschofsky, M., McClellan, G., Stuve, D., Tansley, R., & Walker, J. (2003). DSpace: An Open Source Dynamic Digital Repository. D‐Lib Magazine, 9(1).
Ward, J. (2003). "Quantitative Analysis of Unqualified Dublin Core Metadata Element Set Usage within Data Providers Registered with the Open Archives Initiative." In Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries.
9. APPENDIX A
First five results from a Google search for 'Tudor History'
Examples of embedded metadata, all lacking use of Dublin Core
1. http://tudorhistory.org/ <HEAD> <meta http‐equiv="content‐type" content="text/html;charset=ISO‐8859‐1"> <META NAME="GENERATOR" CONTENT="Adobe PageMill 3.0 Mac"> <TITLE>TudorHistory.org</TITLE> <meta name="verify‐v1" content="2wVKaBktTtl0TaXXJCLtl5vYgVnF6pQKA5FqOOgvXlQ=" > <script src="http://www.google‐analytics.com/urchin.js" type="text/javascript"> </script> <script type="text/javascript">_uacct = "UA‐75588‐1";urchinTracker(); </script> </HEAD>
2. http://www.englishhistory.net/tudor.html
<head> <meta name="content" content="Tudor England 1485 to 1603 images biographies primary sources"> <meta name="author" content="Marilee Mongello"> <meta name="page_topic" content="Tudor dynasty Henry VII, Henry VIII, The Six Wives of Henry VIII, Elizabeth I, Mary I, Edward VI, Lady Jane Grey, 16th century England"> <meta name="GENERATOR" content="Microsoft FrontPage 5.0"> <meta name="ProgId" content="FrontPage.Editor.Document"> <meta http‐equiv="Content‐Type" content="text/html; charset=windows‐1252"> <meta http‐equiv="Content‐Language" content="en‐us"> <title>Tudor England 1485 to 1603: Table of Contents</title> <style fprolloverstyle="">A:hover {color: #0000FF; font‐weight: bold} </style> </head>
3. http://www.tudorplace.com.ar/TUDOR.htm
<head> <meta http‐equiv="Content‐Type" content="text/html; charset=iso‐8859‐1"> <meta name="GENERATOR" content="Microsoft FrontPage 4.0"> <title>TUDOR</title> <style> <!‐‐a.new { color: #CC2200; }
‐‐> </style> </head>
4. http://womenshistory.about.com/library/quiz/bltqueenquiz.htm
<HTML> <head> <title>Which Tudor Queen Are You? A Women's History Quiz</title> <meta name="keywords" content="women's history, women's studies, quiz, question of the week"> <meta name="description" content="Which queen in Tudor history are you most like?: quotations by notable women biographies of notable women tudor queen tudor period personality quiz"> <script src="pquizheadtqueen.js" type="text/javascript"> </script><!‐‐GIHEDSTRT‐‐> <meta charset="ISO‐8859‐1"> <meta http‐equiv="X‐UA‐Compatible" content="IE=edge,chrome=1"> <meta name="ROBOTS" content="NOODP"> <meta name="pd" content="Saturday, 19‐Jun‐2010 14:40:14 GMT"> <link rel="icon" href="http://0.tqn.com/f/a08.ico"> <link rel="search" type="application/opensearchdescription+xml" href="http://0.tqn.com/4g/o/os.xml" title="About.com"> <script>var ziRfw=0;zobt=" Women's History Ads";zOBT=" Ads";function zIpSS(u){zpu(0,u,280,375,"ssWin")}function zIlb(l,t,f){zT(l,'18/1Pp/wX')}</script> <link rel="stylesheet" href="http://0.tqn.com/0g/dc/s63.css" media="all"><!‐‐[if lt IE 9]><link rel="stylesheet" href="http://0.tqn.com/8g/dc/ie.css" type="text/css" media="all"><![endif]‐‐><!‐‐[if lt IE 8]><link rel="stylesheet" href="http://0.tqn.com/8g/dc/rdie.css" type="text/css" media="all"><![endif]‐‐> <meta http‐equiv="pics‐Label" content='(pics‐1.1 "http://www.icra.org/pics/vocabularyv03/" l gen true for "http://womenshistory.about.com" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0) gen true for "http://womenshistory.about.com" r (n 0 s 0 v 0 l 0 oa 0 ob 0 oc 0 od 0 oe 0 of 0 og 0 oh 0 c 0))'> </HEAD
5. http://www.elizabethi.org/links/
(Although 'dc' appears in the header below, it is used as a JavaScript variable name; these are not standard Dublin Core elements.)
<html> <head> <meta http‐equiv="Content‐Type" content="text/html; charset=iso‐8859‐1"> <meta name="Author" content="."> <meta name="Description" content="Links to sites of interest on Tudor History"> <meta name="KeyWords" content="elizabeth, queen, reign,life,tudor,history,overview,biography,"> <title>LINKS: TUDOR HISTORY (Elizabethi.org)</title> <!‐‐ ValueClick Media POP‐UNDER CODE v1.8 for elizabethi.org (4 hour) ‐‐> <script language="javascript"><!‐‐ var dc=document; var date_ob=new Date();
dc.cookie='h2=o; path=/;';var bust=date_ob.getSeconds(); if(dc.cookie.indexOf('e=llo') <= 0 && dc.cookie.indexOf('2=o') > 0){ dc.write('<scr'+'ipt language="javascript" src="http://media.fastclick.net'); dc.write('/w/pop.cgi?sid=7948&m=2&tp=2&v=1.8&c='+bust+'"></scr'+'ipt>'); date_ob.setTime(date_ob.getTime()+14400000); dc.cookie='he=llo; path=/; expires='+ date_ob.toGMTString();} // ‐‐> </script> <!‐‐ ValueClick Media POP‐UNDER CODE v1.8 for elizabethi.org ‐‐> <style TYPE="text/css"> </style> </head>
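The assessment performed manually in the examples above can be automated. The sketch below (using Python's standard html.parser; the head fragment is illustrative, in the style of the sampled pages) collects meta element names and flags any that use Dublin Core's conventional "DC."-prefixed naming:

```python
from html.parser import HTMLParser

class DCMetaScanner(HTMLParser):
    """Collect <meta name="..."> values, noting any Dublin Core names.

    Embedded Dublin Core conventionally uses names such as "DC.title"
    or "DC.creator"; none of the pages sampled above do so.
    """
    def __init__(self):
        super().__init__()
        self.meta_names = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            name = dict(attrs).get("name")
            if name:
                self.meta_names.append(name)

    def dublin_core_names(self):
        return [n for n in self.meta_names
                if n.lower().startswith("dc.")]

# Head fragment in the style of the sampled pages (no Dublin Core).
head = '''<head>
<meta name="keywords" content="tudor, history">
<meta name="description" content="Links on Tudor history">
</head>'''

scanner = DCMetaScanner()
scanner.feed(head)
print(scanner.dublin_core_names())  # [] - no Dublin Core elements found
```

Run across the five sampled pages, such a scanner would report an empty list for every one of them.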
10. APPENDIX B
*Examples of metadata records from DSpace:
Required elements completed; additional elements such as abstract, description, and keywords are also present.
Title: LOM – IEEE Learning Objective Metadata
Authors: (removed)
Keywords:
LOM ‐ IEEE
Learning Objective Metadata
LOM
Issue Date: 13‐Jul‐2011
Abstract:
A brief review of the Learning Objective Metadata Standard (LOM – IEEE). The paper
describes the purpose, function, and basic structure of the LOM‐IEEE standard, as well as
an assessment of the metadata standard as found on the Learning Resource Exchange
Portal.
Description: Prepared as Assignment #2 for Dr. Xia Lin's INFO 653 (Digital Libraries) class at Drexel
University during the Summer 2011 quarter.
URI: http://hdl.handle.net/2114/686
Appears in
Collections: Metadata Project Review (2010‐2011 Summer)
Keywords element contains only one term, 'metadata', which is very general.
Title: Review of Gateway to Educational Materials
Authors: (removed)
Keywords: Metadata
Issue Date: 18‐Apr‐2011
Abstract: A review of the Gateway to Educational Materials metadata project.
URI: http://hdl.handle.net/2114/660
Appears in Collections: Metadata Project Review (2010‐2011 Spring)
Contains most additional elements
Title: The Knife Case: Design and Construction
Authors: (removed)
Keywords: Kaufman Collection
knife cases
Issue Date: Sep‐2004
Publisher: Society of American Period Furniture Makers
Citation: x. (2004). The knife case: Design and construction. American Period Furniture, 4, 42‐57.
Abstract:
The origin, design and use of knife cses in early America are briefly explored. The design
and construction of a reproduction case, inspired by an original in the collection of
George M. and Linda H. Kaufman, are described in detail.
URI: http://hdl.handle.net/2114/129
ISSN: 1542‐0299
Appears in
Collections: Articles
No keywords element, allowing search only by title, author, or date.
Title: Museum and Cultural Collections (MOAC) with Certification Statement
Authors: (removed)
Issue Date: 20‐Apr‐2011
URI: http://hdl.handle.net/2114/667
Appears in Collections: Metadata Project Review (2010‐2011 Spring)
*As evident from the example metadata records, there is no consistency in metadata creation: each
community requires that different metadata elements be completed for submissions.
I certify that: This paper/project/exam is entirely my own work. I have not quoted the words of any other person from a printed source or a website without indicating what has been quoted and providing an appropriate citation. I have not submitted this paper / project to satisfy the requirements of any other course. Signature: Melissa Ormond Date: 8/28/2011