building a names backbone

Building a “names backbone” Nicky Nicolson, RBG Kew

Post on 20-Jul-2015

Page 1: Building a names backbone

Building a “names backbone”

Nicky Nicolson, RBG Kew

Page 2: Building a names backbone

A names backbone

== “an environment for the management of multiple overlapping classifications and tracking how these change over time”

Not a monolith:

• Built on a layered view of the domain – clearly separating names and taxonomy

• Names form the objective basis for higher layers

Page 3: Building a names backbone

The current situation…

Many overlapping systems, few links

Page 4: Building a names backbone

… and what we’re aiming for:

Authoritative data, reduced duplication, many more links

Page 5: Building a names backbone

Names backbone: a layered environment

Page 6: Building a names backbone

Name occurrence layer AKA “Nomen-clutter”

== any attempt at the transcription of a name.

Page 7: Building a names backbone

Names layer

Holds objective published facts about a name:

- Orthography

- Authorship

- Protologue reference

- Type citation

- Objective synonymy

Page 8: Building a names backbone

Concepts layer

Hypotheses draw names together to form concepts via heterotypic synonymy

Page 9: Building a names backbone

The (current) problem:

Most people want to operate at concept level…

Page 10: Building a names backbone

The (current) problem:

… but have to start right down at the lowest level

Page 11: Building a names backbone

The problem:

Page 12: Building a names backbone

Solving the problem…

We need to provide ways to allow people to better navigate between the layers, and better focus their efforts – e.g. build classifications using the same objective bases.

We started with a blank sheet of paper – it’s hard to get existing systems to conform to the layering that we need.

Page 13: Building a names backbone

Drawbacks of data models used to date

• conflate the storage of names and concepts

• store only a single classification

• store only the end product of a thought process, not work in progress

• are difficult to version

• are difficult to query effectively (for hierarchies etc)

Page 14: Building a names backbone

A new (graph) model

• Stores data as graphs – composed of nodes and directed relationships

• Both nodes and relationships can hold data as properties

• Supports highly interconnected data

• Supports self-referential data

• Optimised for queries on relationships
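
The property-graph idea in the bullets above can be sketched in a few lines of plain Python. This is only an illustration, not the storage engine used for the names backbone: the class names, the `relate` and `outgoing` helpers, and the `synonym_of` relationship type are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node; `properties` holds arbitrary data about it."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed link between two nodes, with its own properties."""
    source: str
    target: str
    rel_type: str
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def relate(self, source, target, rel_type, **properties):
        # Both ends must already exist; a node may also link to itself.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both nodes must exist before they are related")
        self.relationships.append(Relationship(source, target, rel_type, properties))

    def outgoing(self, node_id, rel_type=None):
        """Relationships starting at node_id, optionally filtered by type."""
        return [r for r in self.relationships
                if r.source == node_id and (rel_type is None or r.rel_type == rel_type)]


g = Graph()
g.add_node("n1", name="Aus bus Jones")
g.add_node("n2", name="Xus yus Smith")
# "synonym_of" is a hypothetical relationship type used only for this sketch.
g.relate("n1", "n2", "synonym_of", classification="WCS")
```

Because the relationship is a first-class record with its own properties, a query such as `g.outgoing("n1", "synonym_of")` can filter on the link itself rather than on the nodes.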

Page 15: Building a names backbone

Using a graph model to hold concept data: Attempt #1

Two nodes, with name + status properties, and an “accepted_as” link.

== a naïve use of the graph model: status is stored in 2 places (explicitly in the status property, implicitly by participation in the relationship)

Page 16: Building a names backbone

Using a graph model to hold concept data: Attempt #2

More strict about the separation of the nomenclatural information (the nodes) and the taxonomic information (the relationships between nodes), but the link is still very sparse…

Page 17: Building a names backbone

Using a graph model to hold concept data: Attempt #3

Add an attribute to indicate which classification asserts this subjective relationship:

Taxonomic status of a name is inferred from its participation in a subjective taxonomic relationship.
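
A minimal sketch of that inference, assuming links are held as simple tuples: the `Synonymy` record, the `status` function and the `"unplaced"` fallback are illustrative names, not part of the names backbone itself. The point is that no status is stored on the name; it is read off the links asserted by one classification.

```python
from collections import namedtuple

# A subjective link: `synonym` is treated as a synonym of `accepted`
# by the classification named in `classification`.
Synonymy = namedtuple("Synonymy", ["synonym", "accepted", "classification"])

links = [
    Synonymy("Aus bus Jones", "Xus yus Smith", "WCS"),
]


def status(name, classification, links):
    """Infer a name's status from the links one classification asserts."""
    relevant = [link for link in links if link.classification == classification]
    if any(link.synonym == name for link in relevant):
        return "synonym"
    if any(link.accepted == name for link in relevant):
        return "accepted"
    # Assumption for this sketch: a name the classification never mentions
    # has no status in that classification.
    return "unplaced"
```

The same name can therefore be a synonym in one classification and accepted or unplaced in another without any change to the name node.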

Page 18: Building a names backbone

Links become more interesting than the nodes

Expand the data held on the subjective relationship to allow it to be computationally assessed

Page 19: Building a names backbone

Multiple opinions – using the same name nodes

Reuse the name nodes to store multiple opinions using the same basic facts (name nodes)

Page 20: Building a names backbone

Relationships held

Objective, e.g.:

• Combination-basionym

• Later_homonym

• Alternative_name_for

• …

Subjective, e.g.:

• Parent_child (taxonomic placement)

• Synonym (heterotypic synonymy)

• …
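
One way to make the objective/subjective split checkable in code is sketched below. It covers only the relationship types listed above; the lower-cased identifiers, the `Kind` enum and the `validate` helper are assumptions for illustration. The rule it enforces follows from Attempt #3: a subjective relationship has to say which classification asserts it.

```python
from enum import Enum


class Kind(Enum):
    OBJECTIVE = "objective"    # follows from published nomenclatural facts
    SUBJECTIVE = "subjective"  # a taxonomic opinion asserted by a classification


# Only the relationship types named on the slide; lower-casing the
# identifiers is an illustrative convention.
RELATIONSHIP_KINDS = {
    "combination_basionym": Kind.OBJECTIVE,
    "later_homonym": Kind.OBJECTIVE,
    "alternative_name_for": Kind.OBJECTIVE,
    "parent_child": Kind.SUBJECTIVE,
    "synonym": Kind.SUBJECTIVE,
}


def validate(rel_type, properties):
    """Return the kind of a relationship, rejecting unattributed opinions."""
    kind = RELATIONSHIP_KINDS[rel_type]
    if kind is Kind.SUBJECTIVE and "classification" not in properties:
        raise ValueError(f"{rel_type} is subjective and needs a 'classification' property")
    return kind
```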

Page 21: Building a names backbone

Objective relationships “stronger” than subjective

Page 22: Building a names backbone

Supporting versioning

We keep all relationships; modifications to the data just mark relationships as no longer current.

We can always resurrect the state of the graph

== persistent identification of taxon concepts

Page 23: Building a names backbone

Versioning = name id + classification + state

We can always resurrect the state of the graph.

Versioning enables remote curation of the data

Page 25: Building a names backbone

Versioning = name id + classification + state

State 1, according to WCS:
Xus yus Smith (A)
= Aus bus Jones (S)

State 2, according to WCS:
Xus zus White (A)
= Xus yus Smith (S)
= Aus bus Jones (S)
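
The two WCS states above can be reproduced with a small sketch in which links are never deleted, only marked as no longer current. The `valid_from` / `valid_to` state counters, the `SynonymLink` record and the `retire` and `links_at` helpers are assumptions made for this example; how the names backbone actually records state is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SynonymLink:
    synonym: str
    accepted: str
    classification: str
    valid_from: int                 # state in which the link was asserted
    valid_to: Optional[int] = None  # state in which it stopped being current


def retire(link, state):
    """Mark a link as no longer current instead of deleting it."""
    link.valid_to = state


def links_at(links, classification, state):
    """Return the links that were current in the given state."""
    return [link for link in links
            if link.classification == classification
            and link.valid_from <= state
            and (link.valid_to is None or state < link.valid_to)]


# State 1: Aus bus Jones is a synonym of Xus yus Smith.
old = SynonymLink("Aus bus Jones", "Xus yus Smith", "WCS", valid_from=1)
history = [old]

# State 2: Xus zus White becomes the accepted name. The old link is
# retired, not removed, so state 1 can still be reconstructed.
retire(old, 2)
history.append(SynonymLink("Xus yus Smith", "Xus zus White", "WCS", valid_from=2))
history.append(SynonymLink("Aus bus Jones", "Xus zus White", "WCS", valid_from=2))

state1 = {(link.synonym, link.accepted) for link in links_at(history, "WCS", 1)}
state2 = {(link.synonym, link.accepted) for link in links_at(history, "WCS", 2)}
```

A client holding the triple (name id, classification, state) can ask for `links_at(history, "WCS", 1)` at any later time and get back the concept as it stood then.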

Page 26: Building a names backbone

What can be done with this kind of data model?

• Client systems can reliably connect to a version of a concept

• We can see how concepts change over time

• Researchers can query the data to compare classifications and identify areas of dispute

Longer term:

• Examine the “computed acceptance” rules used in TPL - could these be run on the relationships in the names backbone?
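
As a rough sketch of the "compare classifications and identify areas of dispute" query, assume synonymy links are available as (synonym, accepted, classification) tuples. The function name, the second classification "ListB" and the name "Cus dus Brown" are made up for the example.

```python
def disputed_names(links, classification_a, classification_b):
    """Names whose accepted name differs between two classifications.

    `links` is an iterable of (synonym, accepted, classification) tuples.
    Names that only one of the classifications mentions are not reported.
    """
    def placements(classification):
        return {syn: acc for syn, acc, cls in links if cls == classification}

    a = placements(classification_a)
    b = placements(classification_b)
    return {name: (a[name], b[name])
            for name in a.keys() & b.keys()
            if a[name] != b[name]}


links = [
    ("Aus bus Jones", "Xus yus Smith", "WCS"),
    ("Aus bus Jones", "Xus zus White", "ListB"),  # hypothetical second classification
    ("Cus dus Brown", "Xus yus Smith", "WCS"),    # hypothetical name
    ("Cus dus Brown", "Xus yus Smith", "ListB"),
]
disputes = disputed_names(links, "WCS", "ListB")
```

Both classifications share the same name records, so the comparison reduces to comparing links.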

Page 27: Building a names backbone

Building it: we first focussed on the top two layers…

Page 28: Building a names backbone

… but we need a way to manage the name occurrences

Page 29: Building a names backbone

Building the name occurrence layer:

Populating it:

• Seed it with an authoritative set of names

• Add the version history of these names – how were these names transcribed in the past?

Using it:

• Load candidate name occurrences and match them, storing metrics on the match.

Reviewing – a “data improvement” team to:

• Verify the matches, focussing on ambiguity (that which can’t be done computationally) == annotation
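
The load, match and review steps above might look roughly like this. The sketch uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure; the matching method actually used is not stated in these slides, and the `match_occurrence` function, its 0.85 threshold and the result fields are assumptions.

```python
from difflib import SequenceMatcher


def match_occurrence(occurrence, authoritative_names, threshold=0.85):
    """Match one transcribed name against a non-empty authoritative set.

    Returns the best candidate, its similarity score, and a flag saying
    whether a person should review the match.
    """
    def score(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    scored = sorted(((score(occurrence, name), name) for name in authoritative_names),
                    reverse=True)
    best_score, best_name = scored[0]
    # Ambiguous if the best match is weak, or the runner-up scores equally well.
    tie = len(scored) > 1 and scored[1][0] == best_score
    return {
        "occurrence": occurrence,
        "matched_name": best_name,
        "score": round(best_score, 3),
        "needs_review": best_score < threshold or tie,
    }


names = ["Xus yus Smith", "Xus zus White", "Aus bus Jones"]
exact = match_occurrence("Xus yus Smith", names)
typo = match_occurrence("Xus yuss Smith", names)
unknown = match_occurrence("Qqq", names)
```

Storing the score alongside the match lets the review team sort by it and spend their time on the `needs_review` cases rather than on exact matches.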

Page 30: Building a names backbone

Services: name occurrence layer

- Data input / output: DwCA

- Linking and reviewing links

- RSS feeds to indicate activity

Page 31: Building a names backbone

Services: names layer

- Data input / output: TCS

- Propose addition / edit of names

- RSS feeds to indicate activity

Page 32: Building a names backbone

Services: concepts layer

- Data input / output: TCS

- Create classifications using names

- Propose addition / edit of names to names layer

- RSS feeds

Page 33: Building a names backbone

The names backbone is an extensible environment:

• Links “name occurrences” to names

• Separates curation of names and concepts

• Supports building concepts on the same objective basis: enables sharing and reuse of foundation data.

• Allows many relationships to form concepts – supports multiple overlapping classifications

• Allows distributed curation of the concepts.