Cataloging & Classification Quarterly
Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t792303976

Metadata Quality in Digital Repositories: A Survey of the Current State of the Art
Jung-Ran Park, Drexel University, Philadelphia, Pennsylvania, USA

To cite this article: Park, Jung-Ran, 'Metadata Quality in Digital Repositories: A Survey of the Current State of the Art', Cataloging & Classification Quarterly, 47: 3, 213–228.
DOI: 10.1080/01639370902737240
URL: http://dx.doi.org/10.1080/01639370902737240

This article was downloaded by: [Drexel University] on 7 December 2009 (subscription number 909300635). Publisher: Routledge. Informa Ltd, registered in England and Wales, registered number 1072954; registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf

This article may be used for research, teaching and private study purposes. Any substantial or systematic reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty, express or implied, or make any representation that the contents will be complete, accurate or up to date. The accuracy of any instructions, formulae and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.



Cataloging & Classification Quarterly, 47:213–228, 2009
Copyright © Taylor & Francis Group, LLC
ISSN: 0163-9374 print / 1544-4554 online
DOI: 10.1080/01639370902737240

Metadata Quality in Digital Repositories: A Survey of the Current State of the Art

JUNG-RAN PARK
Drexel University, Philadelphia, Pennsylvania, USA

This study presents the current state of research and practice on metadata quality through focus on the functional perspective on metadata quality, measurement, and evaluation criteria coupled with mechanisms for improving metadata quality. Quality metadata reflect the degree to which the metadata in question perform the core bibliographic functions of discovery, use, provenance, currency, authentication, and administration. The functional perspective is closely tied to the criteria and measurements used for assessing metadata quality. Accuracy, completeness, and consistency are the most common criteria used in measuring metadata quality in the literature. Guidelines embedded within a Web form or template perform a valuable function in improving the quality of the metadata. Results of the study indicate a pressing need for the building of a common data model that is interoperable across digital repositories.

KEYWORDS metadata, quality control, metadata quality evaluation, completeness, accuracy, consistency, metadata guidelines, (semi) automatic metadata generation, digital repositories

INTRODUCTION

The rapid proliferation of digital repositories by libraries and other organizations calls for in-depth research on metadata quality evaluation. As evinced by information sharing through authority control for non-networked traditional bibliographic collections, successful resource discovery and exchange across ever-growing distributed digital collections demand metadata interoperability based on accurate and consistent resource description.

Received July 2008; revised September 2008; accepted October 2008.
This study is supported through a research award from the Institute of Museum and Library Services. I thank the guest editors and reviewers for their invaluable comments and suggestions. My appreciation also goes to Caimei Lu for her assistance during the preparation of this study.
Address correspondence to Jung-ran Park, Ph.D., Assistant Professor, College of Information Science and Technology, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104. E-mail: [email protected]

The goal of this study is to present the current state of research and practice on metadata quality from a broad perspective. In particular, this study focuses on the following areas: a functional perspective on metadata quality, criteria/matrices for measuring metadata quality, and mechanisms for improving metadata quality. The following study questions will guide the direction of this article:

• Where are we in terms of metadata quality evaluation?
• What are the major criteria used to measure metadata quality?
• How can we improve metadata quality?

In this article, I will present an overview of these three questions through a review of the relevant literature. Directions for future research and practice in the area of metadata quality will also be discussed.

FUNCTIONAL PERSPECTIVES ON METADATA QUALITY

The vital issues affecting metadata quality evaluation have been relatively unexplored; accordingly, very few studies have attempted to define “metadata quality.” Moen et al. delineate the challenges inherent in examining metadata quality by pointing out the lack of established conceptual and operational definitions of metadata quality.1

In examining metadata in e-print archives, Guy et al. assessed metadata quality based on functional requirements. Their definition of quality metadata centers on the “functional requirements of the system it is designed to support . . . summarized as quality is about fitness for purpose.”2 The functional requirements can be established by defining both the internal requirements related to the needs of end-users in a local setting and the external requirements related to local metadata disclosed and exposed to external service providers such as the Open Archives Initiative.

For example, if searching and browsing by publication year is listed as a functional requirement, then it is necessary to have content rules specifying the format of publication year (e.g., 05-06-2007) in order to meet the functional requirement. Otherwise, different formats (e.g., 05/06/2007 or 05-06-07) can be used; this will in turn interfere with sorting of the documents by publication year. Users will then be hampered if they choose to search and browse documents by publication year.3
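The sorting problem described above can be sketched in a few lines. Without a content rule, records sort by raw string value rather than chronology; normalizing to a single format restores a meaningful order. The normalization below is an illustrative choice that assumes day-first readings of ambiguous values, not a rule drawn from the studies cited.

```python
from datetime import datetime

# Publication dates entered without a content rule, in three different formats.
# (Assumes day-first readings; "05/06/2007" is inherently ambiguous without a rule.)
raw_dates = ["05/06/2007", "2006-11-30", "01-02-2005"]

def to_iso(value: str) -> str:
    """Normalize a handful of known date formats to ISO 8601 (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

# A raw string sort interleaves the years: 2005, 2007, 2006.
string_sorted = sorted(raw_dates)
# Normalized values sort chronologically: 2005, 2006, 2007.
chronological = sorted(to_iso(d) for d in raw_dates)
```

The mis-ordered string sort is exactly the failure the functional requirement is meant to prevent.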

NISO addresses the metadata quality problem in the context of metadata creation by machine and by non-cataloging professionals who are unfamiliar with cataloging, indexing, or vocabulary control. Instead of a direct definition of metadata quality, NISO presents six principles of what is termed “good” metadata. Good metadata, per NISO, (1) conforms to community standards; (2) supports interoperability; (3) uses authority control and content standards; (4) includes a clear statement of the conditions and terms of use; (5) supports long-term curatorship and preservation; and (6) has the qualities of good objects, including authority, authenticity, archivability, persistence, and unique identification.4

The criteria and principles articulated by NISO provide a framework of guidance for building robust digital collections. Among the aforementioned principles, support of interoperability and the use of authority control and content standards relate to semantic factors conditioning metadata quality. Lei et al. also note such semantic factors in defining high-quality metadata: such metadata accurately capture the meaning of the data. Quality metadata also attach a single semantic identifier to each entity, regardless of the different names or labels by which the entity is referred to.5
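The "single semantic identifier" idea can be made concrete with a minimal sketch. The labels and identifier scheme below are invented for illustration: every label known for an entity resolves to one identifier, so records citing different name forms still collocate.

```python
# Hypothetical label-to-identifier table; the "person:" identifier scheme is invented.
ENTITY_IDS = {
    "Mark Twain": "person:twain",
    "Twain, Mark": "person:twain",
    "Samuel L. Clemens": "person:twain",  # real name resolves to the same entity
}

def semantic_id(label: str) -> str:
    """Resolve a creator label to its single semantic identifier, if known."""
    return ENTITY_IDS.get(label.strip(), f"unresolved:{label.strip()}")

# Different name forms collocate under one identifier.
same_entity = semantic_id("Twain, Mark") == semantic_id("Samuel L. Clemens")
```

In practice this role is played by authority files and controlled vocabularies rather than a hand-built table, but the invariant is the same: one entity, one identifier.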

As presented, the quality of metadata reflects the degree to which the metadata in question perform the core bibliographic functions of discovery, use, provenance, currency, authenticity, and administration. In other words, the principal purpose of metadata is to a large degree related to that of traditional online library catalogs and databases in finding, identifying, selecting, and obtaining items.6

METADATA QUALITY EVALUATION

As stated earlier, metadata quality evaluation has been relatively unexplored; however, there is growing awareness of the essential role of metadata quality control. In this section, the author will discuss studies dealing with metadata quality evaluation at the metadata creation stage and in the context of federated collections. (Studies directly related to metadata quality measurement and mechanisms for quality improvement will be presented separately in subsequent sections.)

Metadata Creation Stage

Several studies report issues of metadata quality at the metadata creation stage. Currier et al. point in particular to the current lack of formal investigation into the metadata creation process.7 The study presents problems inherent in the metadata creation stage, such as inaccurate data entry and inconsistency of subject vocabularies, that result in adverse effects on resource discovery.

While stressing the importance of quality assurance, Barton et al. draw attention to issues centering on the creation of good-quality metadata; they argue that these issues have not received much attention, even though much work has been done in developing standardized approaches to metadata structure.8 The important role played by quality assurance, particularly at the stage of metadata creation, is also noted by Guy et al. in examining e-print archives.9

Looking at author-generated metadata quality at the metadata creation stage, Greenberg et al. examined eleven metadata records using the National Institute of Environmental Health Sciences Dublin Core schema.10 They investigated whether a simple Web form with textual guidance and selective use of features (e.g., pop-up windows, drop-down menus, scrolling lists) could assist document authors in the generation of metadata. The results of the study indicate that such a simple Web form can assist document authors in the production of metadata of acceptable quality. This is particularly promising in the sense that document authors without a cataloging background will play an increasingly important role in metadata creation, even though they generally lack metadata creation tools.11

The flexibility and complex structure of natural language allow for the representation of a concept in various ways. In natural language, mapping between word forms and meanings can be many-to-many. That is, the same meaning can be expressed by several different forms (e.g., synonyms), and the same form may designate different concepts (e.g., homonyms). In addition, the same concept can be expressed by different morpho-syntactic forms (e.g., noun, adjective, compound noun, phrase, and clause).12

These linguistic phenomena may engender confusion in the sense that different communities may use dissimilar word forms to deliver identical or similar concepts, or may use the same forms to convey different concepts. The findings of the study by Park evidence this complexity of natural language and its impact on metadata application, especially at the creation stage.13 In relation to this, Barker and Ryan also address different interpretations of terminology use by metadata authors.14

Bearing on the complexity of natural language, Heflin and Hendler stress the indispensability of cataloging professionals and human indexers in the metadata creation process: “it is difficult for machines to make determinations of this nature, even if they have access to a complete automated dictionary and thesaurus.”15 Barton et al. also argue on this issue: “. . . not all problems of metadata quality can be addressed effectively by machine solutions.”16

Federated Collections

Issues of metadata quality in relation to interoperability are especially pronounced in the context of federated collections. Some of the studies dealing with metadata quality problems look at federated collections in particular. Shreeves et al. evaluated the quality of harvested metadata at the aggregator level in order to determine how metadata quality at the local level affects the searching of a federated collection.17 By examining the metadata authoring practices of several projects funded through the Institute of Museum and Library Services, they found that the quality of metadata varied between collections of metadata records. They discuss the challenges of maintaining consistency of metadata across federated digital resources, while presenting quality control and normalization processes that may bring forth “shareable” metadata.

With specific regard to the National Science Digital Library (NSDL) collections, studies agree that there is currently no method for evaluating and integrating the results from the more than 100 collections submitted by various data providers.18 Bui and Park examine the metadata quality of the open source Metadata Repository at the NSDL.19 The lack of consistency in metadata use in the NSDL is partially due to the fact that metadata in the repository is derived from many different data providers. As well, these data providers utilize a variety of schemes other than the DC metadata scheme. For data harvesting purposes, however, all metadata schemes in the NSDL are mapped onto the DC scheme. In this mapping process, inaccurate and inconsistent mappings may occur. Such drawbacks in mapping no doubt hinder metadata interoperability even across NSDL collections.
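A toy crosswalk can show how mapping a provider's local scheme onto simple DC may silently drop information when an element has no DC target. The local field names below are invented for illustration, not drawn from any actual NSDL provider.

```python
# Hypothetical crosswalk from a provider's local field names to simple Dublin Core.
CROSSWALK = {
    "lessonTitle": "title",
    "topic": "subject",
    "author": "creator",
    "gradeLevel": "audience",
}

def to_dublin_core(record):
    """Map a local record onto DC elements; collect fields with no DC target."""
    dc, unmapped = {}, []
    for field, value in record.items():
        target = CROSSWALK.get(field)
        if target is None:
            unmapped.append(field)  # lost unless someone reviews the crosswalk
        else:
            dc.setdefault(target, []).append(value)
    return dc, unmapped

record = {"lessonTitle": "Photosynthesis", "topic": "Biology",
          "labSafetyNote": "Wear goggles"}
dc_record, lost_fields = to_dublin_core(record)
```

The `lost_fields` list is the kind of mapping residue that, unreviewed, produces the inaccurate and inconsistent mappings noted above.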

In Zeng’s study of metadata quality regarding complete use of metadata in the NSDL, she illustrates how the disuse of a particular element, such as the DC qualified element audience, can seriously diminish the impact of metadata on potential resource users.20 For instance, Zeng shows that few of the NSDL contributors used the audience element to indicate the projected grade level for the educational resources they were describing. Such an oversight may diminish the value of the metadata provided to the users of the NSDL database.

METADATA QUALITY MEASUREMENT CRITERIA

As presented in an earlier section, good metadata reflect the degree to which the metadata in question perform the core bibliographic functions of discovery, use, provenance, currency, authentication, and administration.21 Such functional perspectives are closely tied to the criteria and measurements that are used for assessing metadata quality. Even though there has been little focus on establishing frameworks for measuring metadata quality, studies have identified major criteria that can be used for assessing metadata quality.

This section presents an overview of the criteria and matrices used for metadata quality evaluation; in subsequent subsections, the author will separately discuss the core measurement criteria that have been most widely used in the literature.


To build a networked government information system, Moen et al. identified 23 evaluation criteria drawn from the literature on metadata quality.22 They provide an in-depth discussion of metadata quality issues through examination of the metadata records of 42 federal agencies using the Government Information Locator Service (GILS). In this comprehensive analysis, out of the initially identified 23 criteria, they primarily employed the following major set of criteria: completeness, accuracy, consistency, and currency.

Statistics Canada’s Quality Assurance Framework presents six dimensions of information quality: relevance, accuracy, timeliness, accessibility, interpretability, and coherence.23 Bruce and Hillmann further refine these six principles by modifying them for the library community.24 The suggested criteria concern completeness, accuracy, provenance, conformance to expectation, logical consistency, coherence, timeliness, and accessibility. These criteria are developed particularly in the context of aggregated collections.

Stvilia et al. and Gasser and Stvilia propose an Information Quality (IQ) framework by analyzing 32 quality assessment frameworks from the information quality literature.25 Their sophisticated framework consists of 21 quality dimensions comprising three categories: intrinsic, relational, and reputational IQ. Assessment of data quality along the intrinsic IQ dimensions can be examined through attributes of the objects by measuring conformance to a given standard. The criteria for intrinsic IQ are accuracy/validity, cohesiveness, complexity, semantic consistency, structural consistency, currency, informativeness, naturalness, and precision. The attributes of the intrinsic IQ dimensions are to a large degree constant and persistent.

The relational IQ dimensions concern relationships between an information object and its usage context. The criteria for relational/contextual IQ are accuracy, completeness, complexity, latency/speed, naturalness, informativeness, relevance (aboutness), precision, security, verifiability, and volatility. Reputational IQ concerns a criterion of authority centering on the reputation of an information object in a given community.

It is interesting to note the overlap between the information quality framework proposed by Gasser and Stvilia and Stvilia et al. and the framework proposed by Bruce and Hillmann. Shreeves et al. illustrate this overlap by mapping the measurement criteria as shown in Figure 1.26

It is of benefit to present Hughes’ mechanism for measuring the metadata quality of the Open Language Archives.27 The study employed two numeric indicators: a code existence score and an element absence penalty. The code existence score reflects an increase in the quality of metadata; the element absence penalty, on the other hand, reflects a decline in quality due to the absence of core metadata elements such as title, description, subject, date, and identifier (see also the list of criteria for assessing geographic metadata quality by Tolosana-Calasanz et al.28).
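Hughes' exact formulas are not reproduced here, but the two indicators can be sketched roughly as follows. The code-bearing element list and the unit weights are illustrative assumptions; only the core-element list comes from the text above.

```python
# Core elements whose absence is penalized (per the list cited above).
CORE_ELEMENTS = ("title", "description", "subject", "date", "identifier")
# Elements assumed to carry controlled codes; chosen here for illustration only.
CODE_ELEMENTS = ("language", "format")

def element_absence_penalty(record):
    """One penalty point for each missing or empty core element."""
    return sum(1 for el in CORE_ELEMENTS if not record.get(el))

def code_existence_score(record):
    """One point for each code-bearing element that is present."""
    return sum(1 for el in CODE_ELEMENTS if record.get(el))

def quality_indicator(record):
    """Toy combined indicator: existence score minus absence penalty."""
    return code_existence_score(record) - element_absence_penalty(record)

record = {"title": "Field recordings", "identifier": "oai:olac:42", "language": "en"}
```

The sample record scores positively on one code element but loses three points for missing description, subject, and date.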

FIGURE 1 Mapping between the Bruce and Hillmann Framework and the Gasser and Stvilia Framework.

Among the presented criteria and matrices, accuracy, completeness, and consistency are the most commonly used criteria for measuring metadata quality. As mentioned, Moen et al. measured accuracy, consistency, completeness, and currency in their analysis of GILS metadata records.29 Park and Bui and Park also measured accuracy, completeness, locally added elements, and consistency in relation to interoperability.30 In an evaluation study of contributor-supplied metadata using RILM Abstracts of Music Literature, Wilson utilized criteria such as correctness, appropriateness, and completeness.31 Two of these criteria, correctness and appropriateness, are also utilized by Rothenberg and Greenberg et al.32 Zeng likewise utilized similar criteria in examining the metadata quality of the NSDL.33 Thus, I will separately discuss the criteria that have been most commonly utilized: completeness, accuracy, and consistency.

Completeness

As Bruce and Hillmann note, completeness does not necessarily mean that all the metadata elements in a given metadata scheme are used.34 The completeness of a metadata description can be measured by full access capacity to individual local objects and connection to the parent local collection(s). This reflects the functional purpose of metadata in resource discovery and use.

Metadata guidelines (e.g., policies, best practices, application profiles) affect the usage of metadata elements. For instance, a mandatory or required element in one domain could be optional in another domain; this may in turn affect the degree of completeness (see also Greenberg et al. and Wilson35).
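The point that completeness is relative to local guidelines can be made concrete with a sketch; both application profiles below are hypothetical. The same record scores differently depending on which elements a profile declares mandatory.

```python
# Hypothetical application profiles with different mandatory element sets.
PROFILE_A = {"title", "creator", "date", "rights"}
PROFILE_B = {"title", "identifier"}

def completeness(record, mandatory):
    """Fraction of mandatory elements that carry a value."""
    filled = sum(1 for el in mandatory if record.get(el))
    return filled / len(mandatory)

record = {"title": "Field notes", "identifier": "oai:repo:42", "date": "2008"}
# The identical record is half complete under one profile, fully complete under the other.
score_a = completeness(record, PROFILE_A)  # only title and date are present
score_b = completeness(record, PROFILE_B)  # title and identifier both present
```

Any completeness figure reported for a repository is therefore meaningful only alongside the profile it was computed against.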

In this sense, the completeness of metadata description seems to be conditioned by characteristics of the resource type within a given domain and, specifically, by local metadata guidelines and best practices. The local guidelines are further modulated by the functional purpose of the metadata (e.g., information access and service). The characteristics of local communities (e.g., collections, the agency creating the metadata) as well as the resource itself thus seem to modulate the completeness of the metadata description. In sum, the completeness of metadata description entails several factors: the resource type (i.e., the object), its relation to the local collection(s), and the metadata guidelines.

Park evidences such factors as local collections and resource type in metadata creation across DC-based digital image collections.36 One of the surveyed collections utilized a high number (78.1%) of local metadata elements for describing image resources. These local metadata elements mostly concern provenance information such as contact information, ordering information, and acquisition. In this study, five metadata elements (i.e., subject, description, title, format, and coverage) constitute over 50% of all the DC metadata elements used.

Accuracy

Accuracy (also known as correctness) concerns the accurate description and representation of data and resource content. It also concerns accurate data input.37 Zeng describes the accuracy of metadata in terms of three elements: the correctness of the data element’s content, intellectual property, and instantiation.38

Several studies report problems in the accurate metadata description of data content, including inaccurate data entry such as misspellings and errors in the format of dates (see Currier et al. and Beall39). While discussing issues and challenges stemming from the iLumina project experiences, McClelland et al. also present mismatches in imported metadata from data providers, such as missing and incorrect data values.40

Sokvitne discussed inaccuracies in DC metadata application drawn from 20 Australian government and educational organizations.41 He found that the highest instances (58%) of duplication of the same data value occur with metadata elements such as creator, contributor, and publisher (see Park and Caplan on semantic overlap and inaccurate application of these elements42). Wilson also observed inaccurate data entry (e.g., non-authoritative forms, capitalization, punctuation, and spelling and typographical errors) as well as incorrect data values.43


Consistency

Consistency (also known as comparability or coherence) can be measured by looking at data values on the conceptual level and data format on the structural level. Conceptual/semantic consistency entails the degree to which the same data values or elements are used for delivering similar concepts in the description of a resource. Structural consistency, on the other hand, concerns the extent to which the same structure or format is used for presenting similar data attributes and elements of a resource.44 For instance, different formats for encoding the date element (e.g., YYYY-MM-DD vs. MM-DD-YYYY) may bring forth inconsistency on the structural level.
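Structural consistency of the date element, for instance, can be approximated by the share of values that follow the expected encoding. The metric below is an illustrative sketch assuming ISO 8601-style YYYY-MM-DD as the expected pattern, not a measure taken from the cited studies.

```python
import re

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")  # expected YYYY-MM-DD encoding

def structural_consistency(values):
    """Fraction of date values matching the expected YYYY-MM-DD encoding."""
    if not values:
        return 1.0
    return sum(1 for v in values if ISO_DATE.fullmatch(v)) / len(values)

dates = ["2007-06-05", "05/06/2007", "2008-01-01", "05-06-07"]
```

Here only two of four values follow the expected encoding, giving a structural consistency of 0.5 for the sample.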

Consistency especially matters in the context of the heterogeneous nature of resource types and of federated repositories. Metadata guidelines vary institution by institution and remain somewhat open to interpretation. This also affects consistency. For instance, the DC identifier can be used for a variety of data elements such as call number (e.g., LCC, DDC), image number, negative number, serial number, and photographer’s reference number.

By looking at the NSDL metadata repositories, Zeng notes that source links, identification and identifiers, descriptions of sources, and data syntax (e.g., jpeg, jpeg/image) are particularly inconsistent across repositories.45

The findings of Park’s study suggest that metadata semantics may affect consistency.46 The study indicates that semantic overlap among certain DC metadata element names and their corresponding definitions (e.g., type vs. format; source vs. relation) creates conceptual ambiguities and consequently hinders consistent metadata application. Godby et al. also demonstrate inconsistent metadata use by analyzing 400 Dublin Core records. For instance, they report that both the subject and description elements include the same data values, such as subject headings and free-text descriptions.47

MECHANISMS FOR IMPROVING METADATA QUALITY

Metadata quality assurance needs to be built into metadata creation at the outset. Poor quality of metadata from data providers negatively affects the performance of service providers as well. Metadata guidelines (e.g., best practices, application profiles) and tools for metadata creation may be effective for quality assurance, especially at the metadata creation stage. In this section, I will focus primarily on mechanisms centering on metadata guidelines and tools, and on automatic metadata generation.

Metadata Guidelines

Metadata guidelines function as an essential mechanism for metadata creation and quality control. Thus, examination of documentation practices and existing best practice guidelines is a critical issue.48


FIGURE 2 Embedded Metadata Guidelines within a System.

Metadata guidelines are typically composed of metadata element names, definitions/descriptions of metadata elements, content encoding rules, and some examples. Park and Lu examine seven locally created metadata guidelines that are used for building digital repositories.49 The data sample is drawn only from cultural heritage repositories that are built on the DC metadata scheme.

Results of the analysis of local metadata guidelines evince great divergence in the application of the DC metadata scheme. Each set of guidelines utilizes different labels and local additions and variants to the DC metadata standard to describe local digital resources. This indicates the pressing need for systematic examination of local metadata guidelines and best practices (see also Heery and Park50).

There are a variety of metadata creation tools for automatic indexing with minimum editing (see DCMI: http://dublincore.org/tools).51 For instance, DC-Dot’s Dublin Core metadata editor (http://www.ukoln.ac.uk/metadata/dcdot/) enables automatic DC metadata generation for certain data elements and allows the generated metadata to be edited.52 A promising approach to using metadata guidelines is to embed such guidelines within a system. The example in Figure 2 from UKOLN (http://www.ukoln.ac.uk/metadata/rslp/tool/) is illustrative.53

As shown, some elements of the metadata guidelines, such as field names, definitions, and encoding schemes, are embedded within a digital collection management system. This may enable catalogers or document authors to create metadata in accordance with the embedded local guidelines.
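One way to embed such guideline fragments in a creation tool can be sketched as a validation table. The fields, definitions, and encoding rules below are invented examples, not the UKOLN tool's actual configuration: the definition is shown to the metadata creator, while the encoding rule is enforced on input.

```python
import re

# Guideline fragments embedded in the entry tool; all entries are illustrative.
GUIDELINE = {
    "date": {
        "definition": "Date of creation of the resource (W3CDTF, YYYY-MM-DD).",
        "encoding": r"\d{4}-\d{2}-\d{2}",
    },
    "language": {
        "definition": "Language of the resource (ISO 639-style lowercase code).",
        "encoding": r"[a-z]{2,3}",
    },
}

def validate(field, value):
    """Accept a value only if the embedded guideline's encoding rule matches."""
    rule = GUIDELINE.get(field)
    return bool(rule and re.fullmatch(rule["encoding"], value))
```

Because the rules live beside the field definitions, updating the local guideline updates the tool's behavior at the same time, keeping documentation and enforcement from drifting apart.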


The results of the study by Greenberg et al. also indicate that a simple one-page (computer screen) Web form with textual guidance and selective use of features such as drop-down menus and pop-up windows may assist document authors in generating quality metadata.54

Automatic Metadata Generation

The enormous volume of online and digital resources makes (semi)automatic metadata generation an impending need. Some of the applications and methods specially designed for automatic metadata generation are reviewed in what follows.

The AMeGA project by Greenberg et al. aims to identify and recommend functionalities for applications supporting automatic metadata generation in the library/bibliographic control community.55 The project proposes two types of systems for automatic metadata generation: general content creation software and specialized metadata generation applications. In content creation software, automatic techniques can be employed to produce technical metadata such as date created, date modified, size (e.g., bytes), and format. Automatic production of technical metadata is promising in that it may increase the speed of metadata generation and facilitate more consistent metadata application.
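Technical metadata of this kind can often be produced from file-system properties alone, as the following sketch illustrates. The field names and the function are illustrative assumptions, not AMeGA's actual specification.

```python
# A minimal sketch of automatic technical-metadata production from file-system
# properties; field names here are illustrative, not AMeGA's specification.
import mimetypes
import os
import time

def technical_metadata(path):
    """Derive technical metadata (modification date, size, format) for a file."""
    info = os.stat(path)
    mime, _ = mimetypes.guess_type(path)
    return {
        "date.modified": time.strftime("%Y-%m-%d", time.gmtime(info.st_mtime)),
        "extent": f"{info.st_size} bytes",
        "format": mime or "application/octet-stream",
    }

# Demonstrate on a small sample file created for the purpose.
with open("sample.txt", "w") as f:
    f.write("hello")
print(technical_metadata("sample.txt"))
os.remove("sample.txt")
```

Because values such as size and MIME type are read directly from the object rather than keyed in by a person, they are produced quickly and applied consistently, which is precisely the appeal noted above.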

There are also some promising studies that exploit various methods and sources for automatic metadata generation. For instance, to facilitate high-quality metadata generation, Hatala and Forth developed a computer-aided metadata generation system that suggests the most relevant values for a particular metadata field.56 The suggested values are generated through a combination of inheritance, aggregation, content-based similarity, and ontology-based similarity. In addition to standard sources such as object content and user profiles, the system exploits metadata record assemblies, metadata repositories, domain ontologies, and inference rules as prime sources for metadata generation. The main advantage of the system is that it is independent of the metadata schema and application domain.

For learning objects, Cardinaels et al. identify four main categories of sources for metadata extraction: document content analysis, document context analysis, document usage, and composite document structure.57 In the proposed framework, learning object metadata is derived from two sources: the learning object itself and the context in which the learning object is used. The central component of the framework is the Simple Indexing Interface (SII), which consists of two major groups of classes that generate the metadata: object-based indexers and context-based indexers. Based on this framework, an automatic metadata generator was designed and implemented for the Blackboard Learning Management System.

224 J.-r. Park

Like SII, the system proposed by Meire et al., the so-called Simple Automatic Metadata Generation Interface, combines two groups of metadata generators: object-based generators and context-based generators.58 The former generates metadata values based on the object itself, whereas the latter generates metadata values based on the way the object is used. The metadata values created by the two generators are merged and any conflicts are resolved.
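The two-generator design shared by these frameworks can be sketched schematically as follows. All names, fields, and the precedence rule for conflict resolution are illustrative assumptions, not the actual interfaces of either system.

```python
# Schematic sketch (not the SII or Meire et al. API) of the two-generator
# design: object-based values come from the resource itself, context-based
# values from how it is used; a precedence rule resolves conflicts.

def object_based(resource):
    # Values derivable from the object itself, e.g. its declared title.
    return {"title": resource.get("declared_title"),
            "format": resource.get("mime")}

def context_based(usage):
    # Values inferred from use, e.g. the link text and course level under
    # which a learning object appears.
    return {"title": usage.get("link_text"),
            "audience": usage.get("course_level")}

def merge(obj_md, ctx_md, prefer="object"):
    """Merge the two generators' output, letting one side win on conflicts."""
    merged = dict(ctx_md)
    for field, value in obj_md.items():
        if value is None:
            continue
        # Conflict: both generators supplied the field; precedence decides.
        if field in merged and merged[field] and prefer != "object":
            continue
        merged[field] = value
    return {k: v for k, v in merged.items() if v}

resource = {"declared_title": "Intro to Thermodynamics",
            "mime": "application/pdf"}
usage = {"link_text": "Thermo slides", "course_level": "undergraduate"}
print(merge(object_based(resource), context_based(usage)))
```

Note that the object and the context contribute complementary fields (format versus audience) and compete only where both supply a value, such as the title.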

For learning objects in both text and video formats, Ying et al. present the IBM MAGIC system.59 The system includes various content-analytic modules for metadata generation: (1) audiovisual analysis modules that recognize semantic sound categories and identify narrators and informative text segments; (2) text analysis modules that extract title, keywords, and summary from text documents; and (3) a text categorizer that classifies a document according to a pre-generated taxonomy.
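The following toy sketch illustrates the type (2) text analysis step in the simplest possible terms: the first line of a plain-text document is taken as the title, and keywords are selected by raw term frequency against a stopword list. This is an illustrative assumption, not the MAGIC implementation, which the authors describe only at the module level.

```python
# Toy illustration (not IBM MAGIC) of a text analysis module: take the first
# line as the title and pick keywords by term frequency minus stopwords.
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def extract_text_metadata(document, n_keywords=3):
    lines = document.strip().splitlines()
    title = lines[0].strip()
    body = " ".join(lines[1:]).lower()
    terms = [t for t in re.findall(r"[a-z]+", body) if t not in STOPWORDS]
    keywords = [term for term, _ in Counter(terms).most_common(n_keywords)]
    return {"title": title, "keywords": keywords}

doc = """Metadata Quality Overview
Metadata quality depends on accuracy, consistency, and completeness.
Consistent metadata supports discovery; quality metadata supports reuse."""
print(extract_text_metadata(doc))
```

Real systems replace the frequency heuristic with trained extractors and a taxonomy-based categorizer, but the input/output shape of the module is the same: plain text in, descriptive metadata fields out.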

As shown, a variety of methods, sources, and tools have been exploited for automatic metadata generation. It is particularly noteworthy that the aforementioned studies explored automatic metadata generation not only for technical metadata (e.g., format, date) but also for descriptive metadata (e.g., subject) by utilizing classification schemes such as taxonomies and ontologies. It is also worth noting that metadata values are captured by exploiting sources not only from the document (object) itself, but also from its context, including document usage, user profiles, and metadata repositories.

CONCLUSION

Quality metadata reflect the degree to which the metadata in question perform the core bibliographic functions of discovery, use, provenance, currency, authentication, and administration. This functional perspective is closely tied to the criteria and measurements used for assessing metadata quality. Accuracy, completeness, and consistency are the most commonly used criteria for measuring metadata quality in the literature. Metadata guidelines and (semi)automatic metadata generation tools appear to be the most frequently utilized mechanisms for quality assurance.

When we compare two or more locally created guidelines (e.g., best practices, local application profiles), we face issues related to interoperability. In spite of the problems in achieving interoperability across local guidelines, these guidelines appear to be fundamental to ensuring a minimum level of consistency in resource description within a collection and across distributed digital repositories. A simplified version of metadata guidelines can be embedded within a Web form or template, in the form of pop-up windows or otherwise, to the benefit of catalogers or document authors in the creation of quality metadata.

The results of the study indicate a pressing need for building a common data model that is interoperable across libraries. Development of such a common data model demands evaluation of current metadata creation practices and examination of locally created documentation such as metadata guidelines, best practices, and application profiles.

One of the crucial mechanisms for improving metadata quality is to provide continuing education to cataloging professionals. Robertson examines the implications of metadata quality for LIS professionals; he points out the challenge that metadata quality poses to LIS by addressing the need to apply the core skills of the profession in settings outside familiar territory.60 This resonates with the need for research identifying the new competencies and skill sets required of cataloging and metadata professionals, and examining current trends in LIS curricula designed to address such needs.61

In summary, the rapidly growing body of digital repositories calls for further investigation of metadata quality. Future studies should identify the factors behind inaccurate, inconsistent, and incomplete metadata creation and application. The semantic aspects of metadata schemes demand in-depth study in relation to quality metadata generation. Development of a framework for measuring metadata quality, and of mechanisms for improving quality, are also critical areas for further study.

NOTES

1. W. E. Moen, E. L. Steward, and C. R. McClure, “The Role of Content Analysis in Evaluating Metadata for the U.S. Government Information Locator Service: Results from an Exploratory Study,” 1997, http://www.unt.edu/wmoen/publications/GILSMDContentAnalysis.htm.

2. Marieke Guy, Andy Powell, and Michael Day, “Improving the Quality of Metadata in E-print Archives,” Ariadne 38 (2004), http://www.ariadne.ac.uk/issue38/guy/.

3. Ibid.; for the utility of metadata in the context of user service, see also Diane Hillmann, Naomi Dushay, and Jon Phipps, “Improving Metadata Quality: Augmentation and Recombination” (paper presented at the International Conference on Dublin Core and Metadata Applications (DC-2004), Shanghai, China, 2004).

4. National Information Standards Organization, A Framework of Guidance for Building Good Digital Collections (Bethesda, MD: NISO Press, 2007), 61–62.

5. Y. G. Lei, M. Sabou, V. Lopez, J. H. Zhu, V. Uren, and E. Motta, “An Infrastructure for Acquiring High Quality Semantic Metadata,” in Semantic Web: Research and Applications, ed. J. Dominque (Berlin: Springer, 2006): 230–44.

6. International Federation of Library Associations and Institutions, Cataloging Section, “Functional Requirements for Bibliographic Records: Final Report,” 1998, http://www.ifla.org/VII/s13/frbr/frbr.htm.

7. S. Currier, J. Barton, R. O’Beirne, and B. Ryan, “Quality Assurance for Digital Learning Object Repositories: Issues for the Metadata Creation Process,” ALT-J Research in Learning Technology 12, no. 1 (2004): 5–20.

8. J. Barton, S. Currier, and J. Hey, “Building Quality Assurance into Metadata Creation: An Analysis Based on the Learning Objects and E-Prints Communities of Practice” (paper presented at the 2003 Dublin Core Conference: Supporting Communities of Discourse and Practice-Metadata Research and Applications, Seattle, Washington, 2003), 8.

9. Guy, Powell, and Day, “Improving the Quality of Metadata in E-print Archives.”

10. Jane Greenberg, M. C. Pattuelli, B. Parsia, and W. D. Robertson, “Author-Generated Dublin Core Metadata for Web Resources: A Baseline Study in an Organization,” Journal of Digital Information 2, no. 2 (2001): 1–10.

11. Guy, Powell, and Day, “Improving the Quality of Metadata in E-print Archives.”

12. Jung-ran Park, “Hindrances in Semantic Mapping among Metadata Schemes: A Linguistic Perspective,” Journal of Internet Cataloging 5, no. 3 (2002): 59–79. In this article, it is stressed that synonymy and polysemy pose particular challenges in semantic interoperability across heterogeneous knowledge organization schemes. See also Jung-ran Park, “Evolution of a Concept Network and Its Implications to Knowledge Representation,” Journal of Documentation 63, no. 6 (2007): 963–83.

13. Jung-ran Park, “Semantic Interoperability and Metadata Quality: An Analysis of Metadata Item Records of Digital Image Collections,” Knowledge Organization 33, no. 1 (2006): 20–34.

14. E. Barker and B. Ryan, “The Higher Level Skills for Industry Repository: Case Studies in Implementing Metadata Standards,” CETIS: Centre for Educational Technology Interoperability Standards, 2003, http://metadata.cetis.ac.uk/guides/usage_survey/cs_hlsi.pdf.

15. Jeff Heflin and James Hendler, “Semantic Interoperability on the Web” (paper presented at Extreme Markup Languages, Montreal, 2000), 2, http://www.cs.umd.edu/projects/plus/SHOE/pubs/extreme2000.pdf.

16. J. Barton, S. Currier, and J. Hey, “Building Quality Assurance into Metadata Creation: An Analysis Based on the Learning Objects and E-Prints Communities of Practice” (paper presented at the 2003 Dublin Core Conference: Supporting Communities of Discourse and Practice-Metadata Research and Applications, Seattle, Washington, 2003), 8.

17. Sarah L. Shreeves, Ellen M. Knutson, Besiki Stvilia, Carole L. Palmer, Michael B. Twidale, and Timothy W. Cole, “Is ‘Quality’ Metadata ‘Shareable’ Metadata? The Implications of Local Metadata Practices for Federated Collections,” in Proceedings of the Twelfth National Conference of the Association of College and Research Libraries, ed. H. Thompson (Chicago, IL: Association of College and Research Libraries, 2005), 223–37.

18. P. Shin, “Towards Making the NSDL Collection More Accessible Through a Testbed,” Report from the Annual NSDL Meeting (November 14–17, 2004).

19. Ann Bui and Jung-ran Park, “An Assessment of Metadata Quality: A Case Study of the National Science Digital Library Metadata Repository,” in Information Science Revisited: Approaches to Innovation, ed. H. Moukdad, CAIS/ACSI 2006: Proceedings of the 2006 annual conference of the Canadian Association for Information Science, held with the Congress of the Social Sciences and Humanities of Canada, Toronto, Ontario, June 1–3, 2006, http://www.cais-acsi.ca/proceedings/2006/bui_2006.pdf.

20. Marcia Zeng, “Metadata Quality Study for the National Science Digital Library (NSDL) Metadata Repository” (paper presented at the Research and Teaching Talk Series, Information Science and Technology, Drexel University, 2006).

21. International Federation of Library Associations and Institutions, Cataloging Section, 1998.

22. Moen, Steward, and McClure, “The Role of Content Analysis in Evaluating Metadata for the U.S. Government.”

23. Statistics Canada, Minister of Industry, Statistics Canada’s Quality Assurance Framework, 2002, http://www.statcan.ca/english/freepub/12-586-XIE/12-586-XIE02001.pdf.

24. Thomas R. Bruce and Diane Hillmann, “The Continuum of Metadata Quality: Defining, Expressing, Exploiting,” in Metadata in Practice, eds. D. Hillmann and E. L. Westbrooks (Chicago: American Library Association, 2004).

25. Les Gasser and Besiki Stvilia, “A New Framework for Information Quality,” Technical Report (Champaign: University of Illinois at Urbana-Champaign, 2001). For the information quality framework proposed by Stvilia et al., see the following: B. Stvilia, L. Gasser, M. Twidale, S. Shreeves, and T. Cole, “Metadata Quality for Federated Collections” (paper presented at the International Conference on Information Quality, ICIQ 2004, Cambridge, MA, 2004).

26. Shreeves et al., “Is ‘Quality’ Metadata ‘Shareable’ Metadata?”

27. B. Hughes, “Metadata Quality Evaluation: Experience from the Open Language Archives Community,” in Digital Libraries: International Collaboration and Cross-Fertilization, Proceedings of the 7th International Conference on Asian Digital Libraries, eds. Z. Chen et al. (Shanghai, China, 2004), 320–29.

28. R. Tolosana-Calasanz, J. A. Alvarez-Robles, J. Lacasta, J. Nogueras-Iso, P. R. Muro-Medrano, and F. J. Zarazaga-Soria, “On the Problem of Identifying the Quality of Geographic Metadata,” Research and Advanced Technology for Digital Libraries (2006), 32–43.

29. Moen, Steward, and McClure, “The Role of Content Analysis in Evaluating Metadata for the U.S. Government.”

30. Jung-ran Park, “Semantic Interoperability across Digital Image Collections: A Pilot Study on Metadata Mapping,” in Data, Information, and Knowledge in a Networked World, ed. L. Vaughan, CAIS/ACSI: Proceedings of the 2005 annual conference of the Canadian Association for Information Science, London, Ontario, Canada, June 2–4, 2005, http://www.cais-acsi.ca/proceedings/2005/park_J_2005.pdf; Bui and Park, “An Assessment of Metadata Quality.”

31. A. J. Wilson, “Toward Releasing the Metadata Bottleneck—A Baseline Evaluation of Contributor-Supplied Metadata,” Library Resources & Technical Services 51, no. 1 (2007): 16–28.

32. Greenberg et al., “Author-Generated Dublin Core Metadata for Web Resources”; Jeff Rothenberg, “Metadata to Support Data Quality and Longevity” (paper presented at the 1st IEEE Metadata Conference, Maryland, 1996).

33. Zeng, “Metadata Quality Study.”

34. Bruce and Hillmann, “The Continuum of Metadata Quality.”

35. Greenberg et al., “Author-Generated Dublin Core Metadata for Web Resources”; Wilson, “Toward Releasing the Metadata Bottleneck.”

36. Park, “Semantic Interoperability across Digital Image Collections.”

37. Gasser and Stvilia, “A New Framework for Information Quality.”

38. Zeng, “Metadata Quality Study.”

39. Currier et al., “Quality Assurance for Digital Learning Object Repositories”; Jeffrey Beall, “Metadata and Data Quality Problems in the Digital Library,” Journal of Digital Information 6, no. 3 (2005).

40. Marilyn McClelland, David McArthur, and Sarah Giersch, “Challenges for Service Providers When Importing Metadata in Digital Libraries,” D-Lib Magazine 8, no. 4 (2002).

41. Lloyd Sokvitne, “An Evaluation of the Effectiveness of Current Dublin Core Metadata for Retrieval” (paper presented at VALA: Victorian Association for Library Automation, 2000), http://www.vala.org.au/vala2000/2000pdf/Sokvitne.PDF.

42. Park, “Semantic Interoperability across Digital Image Collections”; Park, “Semantic Interoperability and Metadata Quality”; P. Caplan, Metadata Fundamentals for All Libraries (Chicago: American Library Association, 2003).

43. Wilson, “Toward Releasing the Metadata Bottleneck.”

44. Gasser and Stvilia, “A New Framework for Information Quality.”

45. Zeng, “Metadata Quality Study.”

46. Park, “Semantic Interoperability and Metadata Quality.”

47. Carol Jean Godby, Devon Smith, and Eric Childress, “Two Paths to Interoperable Metadata” (paper presented at DC-2003: Supporting Communities of Discourse and Practice—Metadata Research & Applications, Seattle, Washington, 2003).

48. See the following studies: Thomas R. Bruce and Diane Hillmann, “The Continuum of Metadata Quality: Defining, Expressing, Exploiting,” in Metadata in Practice, eds. D. Hillmann and E. L. Westbrooks (Chicago: American Library Association, 2004); Currier et al., “Quality Assurance for Digital Learning Object Repositories”; Barton, “Building Quality Assurance into Metadata Creation.”

49. Jung-ran Park and Caimei Lu, “An Analysis of Seven Metadata Creation Guidelines: Issues and Implications,” in 2008 Annual ER&L (Electronic Resources & Libraries) Conference (Atlanta, Georgia, 2008).

50. Rachel Heery, “Metadata Futures: Steps toward Semantic Interoperability,” in Metadata in Practice, eds. D. Hillmann and E. L. Westbrooks (Chicago: American Library Association, 2004); Park, “Semantic Interoperability and Metadata Quality.”

51. Dublin Core Metadata Initiative, “DCMI Tools and Software,” http://dublincore.org/tools.

52. UKOLN, “DC-Dot’s Dublin Core Metadata Editor,” http://www.ukoln.ac.uk/metadata/dcdot/.

53. UKOLN, “RSLP Collection Development,” http://www.ukoln.ac.uk/metadata/rslp/tool/.

54. Greenberg et al., “Author-Generated Dublin Core Metadata for Web Resources.”

55. Jane Greenberg, K. Spurgin, and A. Crystal, “Final Report for the AMeGA (Automatic Metadata Generation Applications) Project,” UNC School of Information and Library Science (2005).

56. Marek Hatala and Steven Forth, “A Comprehensive System for Computer-Aided Metadata Generation” (paper presented at WWW 2003, Budapest, Hungary, 2003).

57. Kris Cardinaels, Michael Meire, and Erik Duval, “Automating Metadata Generation: The Simple Indexing Interface” (paper presented at the 14th International Conference on World Wide Web, Chiba, Japan, 2005).

58. Michael Meire, Xavier Ochoa, and Erik Duval, “SAmgI: Automatic Metadata Generation V2.0” (paper presented at the World Conference on Educational Multimedia, Hypermedia and Telecommunications (ED-MEDIA), Chesapeake, VA, 2007).

59. L. Ying, D. Chitra, and F. Robert, “Creating MAGIC: System for Generating Learning Object Metadata for Instructional Content” (paper presented at the 13th Annual ACM International Conference on Multimedia, Singapore, 2005).

60. J. R. Robertson, “Metadata Quality: Implications for Library and Information Science Professionals,” Library Review 54, no. 5 (2005): 295–300.

61. Jung-ran Park and Caimei Lu, “Metadata Professionals: Roles and Competencies as Reflected in Job Announcements, 2003–2006,” Cataloging & Classification Quarterly 47, no. 2: 145–160; Jung-ran Park, Caimei Lu, and Linda Marion, “Cataloging Professionals in the Digital Environment: A Content Analysis of Job Descriptions,” Journal of the American Society for Information Science and Technology (accessed January 9, 2007), http://www3.interscience.wiley.com/journal/121629757/abstract.
