Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research Joseph T. Tennis University of Washington Evolution and Variation of Classification Systems KnoweScape Workshop March 4-5, 2015 Amsterdam




The question before us

What is the nature of the evolution* and variation among knowledge organization systems (KOS)?

Corollary questions:
Is this a simple space or a complex space?
How often does it change?
Can we engender a common vocabulary to describe this space?

*NB: Evolution can be considered a loaded term by some; that is, it could be interpreted as fitness for survival, and that is not what is intended here. I often use "change" in lieu of "evolution" to clarify this.


There are very practical reasons why we want to ask this question.

Interoperability (sometimes called alignment*)
With widespread, yet still hopeful, collaboration across cultural heritage sectors (those with rich KOS), and with further development across a range of sectors, we must understand this problem of how KOS interoperate, clarify its pressing issues, and perhaps even incorporate this into formal education.

*Alignment, in my mind, suggests more similarities than differences, and this seems presumptuous.

Digital Preservation
Digital preservation is not simply the storage of material on hard disk; it is also the system of policies and practices that guarantee digital material a usable future. In service of that goal we need to understand changes in our KOS.

Application Variations (repurposing)
By examining evolution and variety we can also better evaluate particular applications of KOS. It is one thing to study the standard, the ideal type, of the KOS, but it is another to see how different institutions, sectors, and projects install and perhaps alter that ideal type.

I have, elsewhere, called the examination of these phenomena, how KOS change over time and how we repurpose them, second-order problems [0]. The same is true for designing for KOS interoperability.

This is because, in my mind, the first-order problem is how to design the KOS ex nihilo. And in many ways we understand this problem of KOS design.


So we are left to examine this universe of KOS, how it changes, and the aspects of its variety.

Now we can frame the question, and establish what, from my perspective, we know at this point.

We can then outline ways forward both in research and development.


Outline

KOS as the Product of Problem-Solving
Design of Metadata and Indexing Languages
Metadata in the Wild
Time and Variety
Population Perspective and a Metadata Observatory


KOS as the Product of Problem-Solving

Ben Good has claimed that we are now in a Cambrian Age of KOS [1].

In this context many different folks are trying to solve the information organization problem.

Each of them has approached it from their own perspective, with their own disciplinary biases, and using tools they are familiar with (e.g., library classification, Protégé, web browser bookmarks).


This is the first reason we might take a population perspective in the study of metadata. Namely, we expect variety.

For example, there was some debate in the late 90s on whether or not ontologies were the reinvention of classification. Vickery took this up in 1997 and Soergel in 1999, with Gilchrist taking a bird’s eye view in 2003 [2, 3, 4].


As we read these accounts it becomes clear that there are differences that make a difference. And we are still discussing these concepts, from various perspectives, in the literature (cf., Barcellos Almeida, 2013 [5]).

And it is true that many think that such variety is nothing but reinvention: recasting old concepts and practices in new language. Michael Gorman is one of these folks [6, 7].

And yet, there are contexts where there is no difference made.

Are there differences made in this LOV service?

I have done some work on this problem. I will introduce it a bit later. I have called it framework analysis [8, 9, 10]. Suffice it to say here, that I believe it is useful and generous to consider the program of creating KOS as problem solving done by many folks in many contexts.


Design of Metadata and Indexing Languages – First Order KOS Work

Metadata
Machine and human readable assertions about resources.

Indexing Languages
A systematically ordered set of representations that provides access to conceptual content and indicates or establishes relationships between the terms used to denote concepts, and between natural language and the terms used to denote concepts.

Indexing languages are, in my mind, the superclass under which thesauri, classification schemes, ontologies, and taxonomies of various sorts hang. Having said that, indexing languages can be, and are, used for things other than indexing. But we’ll not take that up in this talk.* Soergel [3] offers a good starting list of functions.

*But these may be of interest when studying a wide variety of metadata and articulating fully their purposes.

Metadata, confusingly, sometimes simply refers to one subset of KOS or sometimes to the whole universe of KOS. This requires that we further clarify the form and function that we assume we find in the universe of KOS. NB: KOS in my mind is both metadata AND indexing languages.

For me, metadata is human and machine readable assertions about resources, where a resource, following the W3C definition, is anything with an identity.

Your definition may differ, and that is perhaps part of our building a common vocabulary. So let’s discuss.

However, I do not find it important to retrofit non-machine readable description into the definition of metadata. It has its own names (e.g., cataloguing).

It has been helpful in the context of Dublin Core Metadata work to distinguish between schemes and schemas. These are naïve distinctions, if you will, made of convenience, and so may be revised through more thorough research; but in this context it is helpful, I think, to distinguish between the attributes of a resource (drawn from a schema) and the values you might use to describe that resource (drawn from a scheme).

Attribute: Value
Author: Joseph T. Tennis
Subject: Evolution of KOS
Drawn from a schema: Drawn from a scheme (or not)

We may find these don’t work well in some contexts, but let’s try it out for now.
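The attribute/value distinction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Statement`, the labels `example-schema` and `example-scheme`, and the helper `uncontrolled_attributes` are hypothetical and do not come from Dublin Core or from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    """One assertion about a resource: an attribute paired with a value."""
    attribute: str                 # drawn from a schema
    value: str                     # drawn from a scheme, or free text
    schema: str                    # hypothetical label naming the schema
    scheme: Optional[str] = None   # None when the value is not drawn from a scheme


# The two statements from the slide; schema and scheme labels are placeholders.
record = [
    Statement("Author", "Joseph T. Tennis", schema="example-schema"),
    Statement("Subject", "Evolution of KOS", schema="example-schema",
              scheme="example-scheme"),
]


def uncontrolled_attributes(statements: List[Statement]) -> List[str]:
    """List the attributes whose values are not drawn from any scheme."""
    return [s.attribute for s in statements if s.scheme is None]


print(uncontrolled_attributes(record))
```

Keeping the schema and the scheme as separate fields makes the "(or not)" case explicit: a value can be valid under the schema while belonging to no scheme at all.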

Review
Metadata
Indexing Languages
KOS
Schemas
Schemes

There is a large literature on the right way to design metadata and indexing languages. There is good reason for this, and it is a useful body of literature. For one thing it is not as straightforward as one might assume to construct an indexing language.

Whether one consults the literature or not, trying to solve problems in information organization results in some form of KOS. And they are out there. Multiplying and evolving.


Metadata in the Wild


If we take away the research on the design of KOS, we are left with the literature that describes how it is implemented, maintained, and evaluated. We are also left with literature that reads KOS in particular ways.


In both of these cases we are talking about metadata in the wild. In 2005 we saw a declaration in the form of a call for papers by Jack Andersen of the then Royal School calling for, what I now term, a descriptive turn in knowledge organization research.


He said, “Much classification research, and knowledge organization research in general, has tended to be concerned with rules, principles, standards or techniques; that is, with prescriptive issues. This workshop will focus on descriptive issues,” [11].


Of course we had seen work well before this time that could be described as descriptive rather than prescriptive as well. We could cite Richardson’s bibliography from 1901 or earlier works that inventoried extant schemes [12, 13].

And Bowker and Star have been famously critical of decisions of classification as infrastructure, where professional work around changing what was there or in faithfully representing controversial topics is seen as compromise and therefore fruitful for investigation. For example, representing the full range of nurses’ work, from medical procedures to counseling, is not straightforward [14].

And finally, both Melanie Feinberg’s work and Melissa Adler’s work, while quite different, provide us ways in which we can read KOS as authored rhetorical arguments, or as institutions of dominance and power and instruments that promulgate a particular worldview, if not prejudice, respectively [15, 16].

NB: Both at Local/Global Knowledge Organization Workshop in Copenhagen in August.


And it is in this context, that we again ask the question and its corollaries.

What is the nature of the evolution and variation among knowledge organization systems (KOS)?

Corollary questions:
Is this a simple space or a complex space?
How often does it change?
Can we engender a common vocabulary to describe this space?


And it is here that we can begin to discuss what has been done and how we might go forward.


Time and Variety

Time
I think it is safe to assume that we all know that KOS change over time. We revise, edit, sunset, phoenix, and otherwise rework our schemas and schemes. I have been curious about this since 2002.

In an ISKO paper I looked at the entry for EUGENICS in the relative index of the DDC at two points in time, at edition 16 and edition 20. This simple case study was enough to demonstrate that there is sometimes dramatic change in long-lived, large indexing languages. I wanted to learn more.

For those who do not know, EUGENICS is the body of knowledge and the practice of creating better human beings through selective breeding and sterilization measures. It was once considered, by the DDC, to be a biological science. It is now a widely debunked science, but the term persists in many different contexts (even legitimate scientific ones).


It makes sense that if I was curious to see how indexing languages (schemes) change over time I could use this example and a couple of other subjects to see how things change. To that end I began data collection. This took a village, but it was fun and worth the effort.

We reviewed all editions of DDC for Eugenics and Anatomy.* We identified where in the classification we could find these subjects from 1876-2010. These were often in different places (because of the nature of DDC; variety cue!), but it showed us where cataloguers might put books on these subjects.

*Among others, like Gypsies, Algebra, Woman, Civil Disobedience, etc.


DDC 1911 Ed. 7

DDC 1979 Ed. 19

The second set of data was gathered using the Z39.50 protocol, harvesting from 572 catalogues the MARC records that both (1) used EUGENICS or ANATOMY as a first subject heading (in the 650 field of the MARC record, the subject added entry for topics) [17], and (2) used the DDC in the 082 field of the MARC record. After automatically removing duplicate records we were left with c. 927 records for EUGENICS and c. 1965 for ANATOMY.
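The two selection criteria and the duplicate removal can be sketched as follows. This is a minimal illustration, not the harvesting code used in the study: records are assumed to be already parsed into plain dictionaries, the duplicate key (title, date, class number) is an assumption, and all sample values are placeholders rather than real catalogue data.

```python
def first_subject(record):
    """Return the first 650 (topical subject) heading of a record, or None."""
    headings = record.get("650", [])
    return headings[0] if headings else None


def select_records(records, heading):
    """Keep records whose first 650 heading matches and that carry an 082 (DDC) field."""
    kept, seen = [], set()
    for rec in records:
        subject = first_subject(rec)
        if subject is None or subject.upper() != heading.upper():
            continue                      # criterion (1): first subject heading
        ddc = rec.get("082")
        if not ddc:
            continue                      # criterion (2): DDC number present
        key = (rec.get("title"), rec.get("date"), ddc)   # assumed duplicate key
        if key in seen:
            continue                      # same record harvested from another server
        seen.add(key)
        kept.append(rec)
    return kept


# Placeholder records; class numbers and titles are not real data.
sample = [
    {"650": ["Eugenics"], "082": "000.1", "title": "Title A", "date": "1920", "server": "s1"},
    {"650": ["Eugenics"], "082": "000.1", "title": "Title A", "date": "1920", "server": "s2"},
    {"650": ["Heredity", "Eugenics"], "082": "000.2", "title": "Title B", "date": "1930", "server": "s1"},
    {"650": ["Eugenics"], "title": "Title C", "date": "1940", "server": "s1"},
]

selected = select_records(sample, "EUGENICS")
print(len(selected), [r["title"] for r in selected])
```

In practice the choice of duplicate key matters a great deal with messy catalogue data; a stricter or looser key would change the record counts.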


Combining this data would give us insight into where some cataloguers were putting books on EUGENICS and ANATOMY.

A note about data, and this data specifically: it is MESSY, and we do not necessarily trust our sources. So at best this is an exploratory look at this phenomenon, and we should improve on methods of data collection and analysis.

In this dataset we have:
Date derived from LCCN
DDC class number
Date of publication
Date of publication cleaned (removing c. etc.)
Year differing between LCCN date and pub. date
Title
Server
Abridged notation present or not
Classification edition number if present
Record from Library of Congress?
Total count of identical records


DDC edition date
DDC classes possible
Discontinued classes
See alsos
Edition number
Notes


We can now line these two datasets up and explore our question about subject change over time. That is, we can see its ontogeny. Ontogeny is the totality of changes of an individual of a species from conception to full maturation.
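One way to line the two datasets up is to find, for each catalogue record, the DDC edition in force at the record's date and then check the record's class number against that edition. The sketch below assumes this reading; the edition table, the class labels `A` and `B`, and the category names returned by `compare` are hypothetical, and a record's date is only a rough proxy for when it was actually classified.

```python
from bisect import bisect_right

# Hypothetical edition table, sorted by year:
# (edition year, edition number, classes possible, discontinued classes)
EDITIONS = [
    (1900, 1, {"A"}, set()),
    (1950, 2, {"B"}, {"A"}),
]


def edition_in_force(year, editions=EDITIONS):
    """Return the latest edition published on or before `year`, or None."""
    years = [e[0] for e in editions]
    i = bisect_right(years, year) - 1
    return editions[i] if i >= 0 else None


def compare(record_year, class_number, editions=EDITIONS):
    """Describe how a record's class number relates to the edition in force."""
    edition = edition_in_force(record_year, editions)
    if edition is None:
        return "before first edition"
    _, _, possible, discontinued = edition
    if class_number in possible:
        return "listed"
    if class_number in discontinued:
        return "discontinued"
    return "unlisted"


print(compare(1920, "A"), compare(1960, "A"), compare(1960, "C"))
```

Counting these outcomes per edition year would give one simple view of how far cataloguing practice follows the scheme as it changes.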


There are many questions that can be asked of this data and I will be talking more about this tomorrow. I have some things here in appendixes if we have time. I can also provide citations.

Variety
Now we can talk about variety in this context. This is a harder problem for me, because there may be infinite ways we describe variety in KOS.

Let’s take a (potentially) simple example. What are the differences and similarities between

Descriptor Set (Mooers)
Thesauri
Classification Schemes
Schemes for Classification (Ranganathan)
Ontologies
Folksonomies?

In the past I have looked at this in two ways:
By establishing a hierarchical or nested method of comparative analysis
Through exploratory naïve linguistic expression

I have tried to establish rubrics or frameworks against which we could lay various standards of KOS. These frameworks include [18]:

Structure
Work Practices
Discourse

Elsewhere, Elin Jacob and I say, “The structure of a social tagging system, a metadata scheme, or an indexing language must be understood within the framework in which it occurs. The information organization framework itself is comprised of three distinct but interrelated components: the discourse that establishes the goals, priorities and values of the system; the work practices involved in the application and maintenance of the system; and the structure that instantiates both the discourses underlying the framework and the work practices that make it visible,” [10].

Elsewhere, Elin Jacob and I say, “For example, ontology curation (or engineering) is an information organization framework, and the Gene Ontology (GO) is a specific instance of ontology curation. The discourses revolving around GO reflect the fact that its work practices are focused on representation of the natural (or biological) world; and the structure of GO is therefore informed by this scientific and representationalist focus and the work practices and discourses that follow from that focus.” [10].

In an earlier project trying to make sense of the then popular social tagging work (folksonomies), I tried to compare that work to cataloguing in a similar way.

[Figure from [18]]

The second way I have tried to characterize similarities and differences has been with naïve linguistic expression. In this exercise, Ben Good and I were trying to see if there was a way to quantify a gold standard of indexing languages, such that through automatic inspection we could assess and modify those that were not satisfactory. I must say that I was not convinced this was the right way to go, but I was curious about what clusters would form and why when we reduced all indexing languages to a bag of terms and ran analysis over them.
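To make the bag-of-terms idea concrete, here is a minimal sketch of a few term-length and containment measures of the kind named in the charts from [1]. It is not the method of [1]: measuring length in characters, using plain substring matching for containment, and using the population formulas for standard deviation and skewness are all assumptions made for this illustration, and the function name `term_profile` is hypothetical.

```python
import statistics


def term_profile(terms):
    """Summarise a bag of terms with simple length and containment measures."""
    distinct = sorted({t.strip().lower() for t in terms if t.strip()})
    if not distinct:
        raise ValueError("no terms to profile")
    lengths = [len(t) for t in distinct]          # length in characters (assumption)
    n = len(distinct)
    mean = statistics.fmean(lengths)
    sd = statistics.pstdev(lengths)               # population standard deviation
    # Population skewness: third central moment divided by sd cubed.
    skew = sum((x - mean) ** 3 for x in lengths) / n / sd ** 3 if sd else 0.0
    # Naive substring containment between distinct terms.
    contains = sum(any(o != t and o in t for o in distinct) for t in distinct)
    contained = sum(any(o != t and t in o for o in distinct) for t in distinct)
    return {
        "distinct_terms": n,
        "mean_length": mean,
        "max_length": max(lengths),
        "min_length": min(lengths),
        "median_length": statistics.median(lengths),
        "sd_length": sd,
        "skewness_length": skew,
        "cv_length": sd / mean,
        "pct_contains_another": contains / n,
        "pct_contained_by_another": contained / n,
    }


profile = term_profile(["cell", "cell wall", "wall", "gene"])
print(profile["pct_contains_another"], profile["pct_contained_by_another"])
```

Profiles like this can be computed for each indexing language and then compared or clustered; substring matching is deliberately crude and will also match inside unrelated words.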

[Figure: excerpt from [1]]

[Figures: excerpts from [1]. Profile charts, on a scale of 0 to 1, for 21 Connotea; 16 CHEBI; 1 MeSH PrefLabels; and, side by side, 20 Bibsonomy, 21 Connotea, and 22 CiteUlike. Measures shown: % OLP uniterms; % OLP duplets; % OLP triplets; % OLP quadplus; OLP flexibility; % containsAnother; % containedByAnother; number of distinct terms; mean, max, min, and median term length; standard deviation, skewness, and coefficient of variation of term length; OLP max, mean, and median number of sub terms per term.]

I do not know if these mean anything, but I have kept collecting similar data. I have about 36 single versions of this data, including English dictionaries. And here is where the two come together: I need multiple versions to make sense of this over time.

Population Perspective and a Metadata Observatory

I have tried to demonstrate through my past research that there is sufficient reason to investigate KOS from a population perspective. We have a wide range of standards, types, and a potentially even wider range of implementations that change over time. In order for us to better understand this universe I believe we need to work toward a metadata observatory.

Like scanning the night sky for different instances of blue dwarf stars or gas giant planets, we can look for various instances of schemes and schemas. We can then see how they change over time, and how they are similar to or different from others. Currently I’m interested in Wikipedia’s category system and its nature and changes. I’m also interested in building a view of all the DDC numbers in use. There would be a lot we could see from a metadata observatory.

Page 80: Joseph T. Tennis: Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research

The question before us

What is the nature of the evolution* and variation among knowledge organization systems (KOS)? Corollary questions Is this a simple space or a complex space? How often does it change? Can we engender a common vocabulary to describe this space?

*NB: evolution can be considered a loaded term by some – that is, it could be interpreted as fit for survival, and that is not what is intended here. I often use change in lieu of evolution to clarify this.

Page 81: Joseph T. Tennis: Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research

Population Perspective and a Metadata Observatory Possible features of this observatory might be:

Real Time Metadata Feeds
Metadata Viz
Run Analysis on Metadata
Metadata Maps (geographic and conceptual)
Upload Your Metadata
Version Comparisons
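As a rough illustration of what a Version Comparisons feature might compute, here is a minimal Python sketch. It assumes (this is not from the talk) that each version of a scheme can be reduced to a mapping from notation to caption. The names `VersionDiff`, `compare_versions`, and `shares`, and the toy entries, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionDiff:
    """Result of comparing two versions of a scheme (hypothetical structure)."""
    added: frozenset      # notations found only in the newer version
    removed: frozenset    # notations found only in the older version
    relabeled: frozenset  # notations in both versions whose caption differs
    unchanged: frozenset  # notations in both versions with the same caption


def compare_versions(older: dict, newer: dict) -> VersionDiff:
    """Compare two {notation: caption} mappings using plain set operations."""
    old_keys, new_keys = set(older), set(newer)
    shared = old_keys & new_keys
    relabeled = {k for k in shared if older[k] != newer[k]}
    return VersionDiff(
        added=frozenset(new_keys - old_keys),
        removed=frozenset(old_keys - new_keys),
        relabeled=frozenset(relabeled),
        unchanged=frozenset(shared - relabeled),
    )


def shares(diff: VersionDiff) -> dict:
    """Express each category as a fraction of all notations seen in either version."""
    counts = {
        "added": len(diff.added),
        "removed": len(diff.removed),
        "relabeled": len(diff.relabeled),
        "unchanged": len(diff.unchanged),
    }
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count / total for name, count in counts.items()}


if __name__ == "__main__":
    # Toy, made-up entries; these are not taken from any real scheme.
    older = {"A1": "Topic one", "A2": "Topic two", "A3": "Topic three"}
    newer = {"A1": "Topic one", "A2": "Topic two (revised)", "A4": "Topic four"}
    diff = compare_versions(older, newer)
    print(sorted(diff.added), sorted(diff.removed), sorted(diff.relabeled))
    print(shares(diff))
```

Running the same comparison across successive pairs of versions would give one set of fractions per pair, which is one way to look at change over time. Real schemes would need more than this, for example handling of relocated or merged classes.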

Page 82: Joseph T. Tennis: Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research

Thank you
[email protected]

Joseph T. Tennis University of Washington

Evolution and Variation of Classification Systems KnoweScape Workshop March 4-5, 2015 Amsterdam

Page 83: Joseph T. Tennis: Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research

Appendix A. Time and Variety Now that we have these visualizations in our minds (perhaps), we can talk about Semantic Gravity and Collocative Integrity.

Page 84: Joseph T. Tennis: Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research

Appendix A. Time and Variety
Semantic Gravity: Cataloguer privileges collection over updated scheme (theory)
Collocative Integrity: Degree to which scheme comports with cataloguing practice

Page 85: Joseph T. Tennis: Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research

Time and Variety

Page 86: Joseph T. Tennis: Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research

Appendix A. Time and Variety

[Figure: two charts, each with a vertical axis from 0% to 100% and the series Old, Out, and In. [19]
Anatomy, by year: 1899, 1911, 1913, 1919, 1922, 1927, 1932, 1942, 1951, 1958, 1965, 1971, 1979, 1989, 1991, 2003.
Eugenics, by year: 1899, 1911, 1913, 1915, 1919, 1922, 1927, 1932, 1942, 1951, 1958, 1965, 1971, 1979, 1989, 1996, 2003.]

Page 87: Joseph T. Tennis: Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research

Appendix A. Time and Variety

[Figure: two charts, Eugenics and Anatomy, each with a vertical axis from 0% to 100% and the series Old, Out, and In, shown for the whole period 1899-2003. [19]]

Page 88: Joseph T. Tennis: Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research

References
[0] Tennis, J. T. (2010). Form, Intention, and Indexing: The Liminal and Integrated Conceptions of Work in Knowledge Organization. In Advances in Classification Research. Vol. 21. Available: http://journals.lib.washington.edu/index.php/acro/issue/archive
[1] Good, B. M. & Tennis, J. T. (2009). Term based comparison metrics for controlled and uncontrolled indexing languages. In Information Research 14(1). Available: http://www.informationr.net/ir/14-1/paper395.html
[2] Vickery, B. C. (1997). Ontologies. In Journal of Information Science 23(4): 277-286.
[3] Soergel, D. (1999). The rise of ontologies or the reinvention of classification. In JASIST 50(12): 1119-1120.
[4] Gilchrist, A. (2003). Thesauri, taxonomies and ontologies – an etymological note. In Journal of Documentation 59(1): 7-18.
[5] Barcellos Almeida, M. (2013). Revisiting Ontologies: A Necessary Clarification. In JASIST 64(8): 1682-1693.
[6] Gorman, M. (1990). A Bogus and Dismal Science; or, the Eggplant That Ate Library Schools. In American Libraries 21(5): 463-465.
[7] Gorman, M. (1999). Metadata or cataloguing? In Journal of Internet Cataloging 2: 5-22.

Page 89: Joseph T. Tennis: Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research

References
[8] Tennis, J. T. (2006). Comparative Functional Analysis of Boundary Infrastructures, Library Classification, and Social Tagging. In Information Science Revisited: Approaches to Innovation. Proceedings of the Annual Meeting of the Canadian Association for Information Science/L'Association canadienne des sciences de l'information. York University, Toronto.
[9] Tennis, J. T. (2006). Function, Purpose, Predication, and Context of Information Organization Frameworks. In Knowledge Organization for a Global Learning Society: Proceedings of the 9th International Conference for Knowledge Organization. International Society for Knowledge Organization 9th International Conference. (Vienna, Austria. Jul, 2006). Advances in Knowledge Organization vol. 10. Ergon: Wurzburg: 303-310.
[10] Tennis, J. T. and Jacob, E. K. (2008). Toward a Theory of Structure in Information Organization Frameworks. In Culture and Identity in Knowledge Organization: Proceedings of the 10th International Conference for Knowledge Organization. (Montreal, Quebec August 5-8, 2008). Advances in Knowledge Organization vol. 11. Ergon: Wurzburg: 262-268.
[11] Andersen, J. (2005). Call for papers. 16th ASIS&T SIG-CR Classification Research Workshop, 2005, “What knowledge organization does and how it does it: Critical Studies in and of Classification and Indexing.” Available: http://dhhumanist.org/Archives/Virginia/v18/0597.html

Page 90: Joseph T. Tennis: Casting Our Eyes Over the Threads of the Cataloguer’s Work: Population Perspective in Metadata Research

References
[12] Richardson, E. C. (1901). Classification: Theoretical and Practical. Scribner’s Sons.
[13] Horne, T. H. (1825). Outlines for the classification of a library; respectfully submitted to the consideration of the trustees of the British Museum. G. Woodfall.
[14] Bowker, G. and Star, S. L. (2000). Sorting Things Out: Classification and Its Consequences. MIT Press.
[15] Feinberg, M. (2011). How information systems communicate as documents: the concept of authorial voice. In Journal of Documentation 67(6): 1015-1037.
[16] Adler, M. (2015). Broker of Information, the “Nation’s Most Important Commodity”: The Library of Congress in the Neoliberal Era. In Information and Culture 50(1): 24-50.
[17] Library of Congress. (2007). 650-Subject Added Entry –Topical Term. http://www.loc.gov/marc/bibliographic/bd650.html
[18] Tennis, J. T. (2006). Social tagging and the next steps for indexing. In Advances in classification research, Vol. 17: Proceedings of the 17th ASIS&T SIG/CR Classification Research Workshop (Austin, TX, November 4, 2006), ed. Jonathan Furner and Joseph T. Tennis.
[19] Tennis, J. (2013). Collocative Integrity and Our Many Varied Subjects: What the Metric of Alignment between Classification Scheme and Indexer Tells Us About Langridge’s Theory of Indexing. NASKO, 4(1). Retrieved from http://journals.lib.washington.edu/index.php/nasko/article/view/14660