Data modeling Goal: Agree on data modeling process and ontology

Upload: leslie-morgan

Post on 16-Dec-2015

Page 1: Data modeling Goal: Agree on data modeling process and ontology

Data modeling

Goal: Agree on data modeling process and ontology

Page 2

Agenda

1. Scope
2. Provenance/Governance (briefly)
3. Identifiers
4. Guiding Principles, Terms, Concepts
5. Controlled Vocabularies

Page 3

Scope

Current model is based on PRONOM 6 and UDFR

Is there a useful distinction between “fact” and “institutional policy”?
What should be contained in the registry?

Fact: JPEG 2000 is an image compression format.

Assessment: JPEG 2000 is a well-adopted standard.

Policy: JPEG 2000 is acceptable to CDL for reformatting photographs.
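If the registry ends up holding more than one of these kinds, one option is to tag every statement with its kind so that each deployment can choose what to hold. A minimal Python sketch; the class and field names, the `asserted_by` field and the facts-only default are assumptions for illustration, not part of the model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Optional


class StatementKind(Enum):
    """The three kinds of statement distinguished on the slide."""
    FACT = "fact"
    ASSESSMENT = "assessment"
    POLICY = "policy"


@dataclass(frozen=True)
class Statement:
    subject: str                       # the format the statement is about
    text: str                          # the claim itself
    kind: StatementKind
    asserted_by: Optional[str] = None  # hypothetical: institution holding a policy


STATEMENTS = [
    Statement("JPEG 2000", "is an image compression format", StatementKind.FACT),
    Statement("JPEG 2000", "is a well-adopted standard", StatementKind.ASSESSMENT),
    Statement("JPEG 2000", "is acceptable for reformatting photographs",
              StatementKind.POLICY, asserted_by="CDL"),
]


def registry_content(statements: List[Statement],
                     include: FrozenSet[StatementKind] = frozenset({StatementKind.FACT})
                     ) -> List[Statement]:
    """Keep only the kinds of statement the registry decides to hold."""
    return [s for s in statements if s.kind in include]
```

With the default, only the fact survives; passing `frozenset(StatementKind)` keeps all three, with the policy still attributed to CDL.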

Page 4

Scope

Are there other aspects of PRONOM 7 we want to include in the registry?

Page 5

Provenance (briefly)

What is the proper granularity for provenance and technical review: per property or per aggregate entity (e.g., format, agent, document)?

Representation within the model consists of statements about the formats, rather than statements about who stated those facts (their provenance).

Provenance of the registry information itself can be managed with the Open Provenance Vocabulary, whether as reified statements or as statements about particular triples or graphs.
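The reified-statement option, at per-property granularity, can be sketched in plain Python with triples as tuples rather than a real RDF library. The `EX` predicates (`contributedBy`, `reviewStatus`) and the example.org URIs are hypothetical placeholders; in the registry the provenance terms would come from the Open Provenance Vocabulary or a similar vocabulary.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/ns#"  # hypothetical namespace, illustration only


def reify(triple: Triple, statement_id: str) -> List[Triple]:
    """Describe a triple as an rdf:Statement resource (standard RDF reification).

    Reification does not assert the triple itself; it is stored separately.
    """
    s, p, o = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    ]


def with_provenance(triple: Triple, statement_id: str,
                    provenance: Dict[str, str]) -> List[Triple]:
    """Reify one property value and attach provenance to the statement, not the format."""
    triples = reify(triple, statement_id)
    triples += [(statement_id, pred, value) for pred, value in provenance.items()]
    return triples


# Per-property granularity: one reified statement per asserted property value.
fact = ("http://udfr.org/udfr/001/f/pdf/1.7", EX + "formatName", "PDF 1.7")
graph = [fact] + with_provenance(
    fact,
    "http://example.org/statement/1",
    {EX + "contributedBy": "http://example.org/agent/42", EX + "reviewStatus": "verified"},
)
```

Here one fact costs six extra triples; attaching provenance to a named graph instead would give per-aggregate granularity at a much lower triple count.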

Page 6

Governance (briefly)

To what level of technical review should (or will) contributed information be subject, and by whom?

What are the criteria for contributor eligibility? Anonymous? Public, but known? Self-nominated, but vetted? Invited?

More food for thought (to be extended tomorrow):

Page 7

Identifiers (1)

There are multiple identifiers that are defined in the model:

1. PRONOM ID (PUID)
2. GDFR Identifier
3. UDFR Identifier
4. UDFR SystemID (internal registry ID)

Page 8

Identifiers (2)

UDFR Identifier:

• A globally unique identifier across registry instances
• A persistent identifier
• Can be ported to a persistent space at a later time
• Non-opaque
• Identical or mappable to the URI local name
• Machine-actionable

Should the UDFR identifier be opaque or transparent?

Page 9

Identifiers (3)

Node: Create a zero-padded numeric sequence for organizational node IDs (e.g., “001”) to be used within the identifier.

Format: Keep version information as it is defined, idiosyncratically, by the original format creator. Parse it to reveal the family and other useful categorizations.
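The two conventions can be sketched in Python. Both function names are hypothetical, and the label pattern in `parse_version` is an assumption that only covers PDF-style labels such as “PDF 1.7” or “PDF/A-1”; labels such as TIFF/IT/P1/SD would need their own rules. The creator's label is never rewritten; parsing only derives categories from it.

```python
import re
from typing import Optional, Tuple


def node_id(n: int) -> str:
    """Zero-padded three-digit organizational node id, e.g. 1 -> '001'."""
    if not 0 <= n <= 999:
        raise ValueError("node ids are limited to three digits")
    return f"{n:03d}"


# Assumed pattern: family letters, optional "/sub-family", optional space or
# hyphen, then whatever version token the format creator used.
_LABEL = re.compile(r"([A-Za-z]+)(?:/([A-Za-z]+))?[ -]?(.*)")


def parse_version(label: str) -> Tuple[str, Optional[str], str]:
    """Split a label such as 'PDF/A-1' into (family, sub-family, version)."""
    m = _LABEL.fullmatch(label.strip())
    if m is None:
        raise ValueError(f"unrecognized label: {label!r}")
    family, sub, version = m.groups()
    return family, sub, version
```

For example, `parse_version("PDF/X-1a:2001")` yields family “PDF”, sub-family “X” and the untouched version token “1a:2001”.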

Page 10

Identifiers (4)

UDFRID = (addressable-prefix, "/", identifier) | (addressable-prefix, "#", identifier) ;
addressable-prefix = "http://udfr.org/udfr" | ("http://n2t.net/", udfr-ezid) ;
udfr-ezid = 5 * digit ;
identifier = node-id, "/", entity-code, "/", local-id, "/", version-id ;
node-id = 3 * digit ;
entity-code = "f" | "n" ;
local-id = alpha, {alphanumeric-with-slash} ;
version-id
digit = [0-9] ;
alpha = [a-zA-Z] ;
alphanumeric = alpha | digit ;
alphanumeric-with-slash = alphanumeric | "/" ;

For example: http://udfr.org/udfr/001/f/pdf/a/1

http://udfr.org/udfr/001/f/pdf/1.7
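The UDFRID grammar can be checked with a regular expression, sketched here in Python. The slide leaves `version-id` undefined, so the pattern assumes it is one or more letters, digits or dots (enough for “1” and “1.7”); that choice and the function names are assumptions.

```python
import re

# Assumption: version-id is not defined on the slide; letters, digits and dots here.
VERSION_ID = r"[A-Za-z0-9.]+"

UDFRID_RE = re.compile(
    r"(?P<prefix>http://udfr\.org/udfr|http://n2t\.net/[0-9]{5})"  # addressable-prefix
    r"[/#]"                                                      # "/" or "#"
    r"(?P<node>[0-9]{3})/"                                       # node-id = 3 * digit
    r"(?P<entity>[fn])/"                                         # entity-code
    r"(?P<local>[A-Za-z][A-Za-z0-9/]*)/"                         # local-id (may contain "/")
    r"(?P<version>" + VERSION_ID + r")"                          # version-id (assumed)
)


def parse_udfrid(uri: str) -> dict:
    """Split a UDFRID into its grammar components, or raise ValueError."""
    m = UDFRID_RE.fullmatch(uri)
    if m is None:
        raise ValueError(f"not a UDFR identifier: {uri}")
    return m.groupdict()


def build_udfrid(node: int, entity: str, local: str, version: str) -> str:
    """Compose a udfr.org-style UDFRID and check it against the grammar."""
    uri = f"http://udfr.org/udfr/{node:03d}/{entity}/{local}/{version}"
    parse_udfrid(uri)  # raises if any component breaks the grammar
    return uri
```

Because `local-id` may itself contain slashes, backtracking makes the last path segment the version: “pdf/a/1” parses as local-id “pdf/a” and version “1”.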

Page 11

Goals and guiding principles

1. Support existing functionality and use cases
2. Reuse and map to existing ontologies where it makes sense (“linked data”)
3. Primarily be a descriptive ontology, with the goal of expanding to machine-actionable semantic representations where needed
4. Create natural partitions for modularity
5. Allow for expansion
6. Be consistent
7. Have the application be model-driven (yet domain-model-agnostic) as much as possible

Page 12

Terms

Resource: An object or element expressed in RDF. A resource is identified by a URI.

Class: Typically represents a concept; a set of individuals that may possess a set of properties or relationships.

Instance: An individual member of a class.

Property: Represents a relationship or attribute. OWL divides properties into object properties, which relate two resources, and datatype properties, which relate a resource to a literal value of some datatype.
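The four terms can be seen together in a handful of triples. A Python sketch with triples as tuples and prefixed names as plain strings; the `udfr:` terms are hypothetical examples, not the registry's actual vocabulary.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"
OWL_OBJECT_PROPERTY = "owl:ObjectProperty"
OWL_DATATYPE_PROPERTY = "owl:DatatypeProperty"

TRIPLES: List[Triple] = [
    # Classes: concepts whose members are individuals.
    ("udfr:FileFormat", RDF_TYPE, OWL_CLASS),
    ("udfr:Document", RDF_TYPE, OWL_CLASS),
    # Object property: relates two resources.
    ("udfr:hasDocumentation", RDF_TYPE, OWL_OBJECT_PROPERTY),
    # Datatype property: relates a resource to a literal value.
    ("udfr:formatName", RDF_TYPE, OWL_DATATYPE_PROPERTY),
    # Instances: individual members of the classes.
    ("udfr:pdf-1.7", RDF_TYPE, "udfr:FileFormat"),
    ("udfr:iso-32000-1", RDF_TYPE, "udfr:Document"),
    # Property assertions about one instance.
    ("udfr:pdf-1.7", "udfr:hasDocumentation", "udfr:iso-32000-1"),
    ("udfr:pdf-1.7", "udfr:formatName", '"PDF 1.7"'),
]


def instances_of(cls: str, triples: List[Triple]) -> List[str]:
    """All resources typed as members of cls, sorted."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == cls)


def property_kind(prop: str, triples: List[Triple]) -> Optional[str]:
    """Return owl:ObjectProperty or owl:DatatypeProperty for prop, if declared."""
    for s, p, o in triples:
        if s == prop and p == RDF_TYPE and o in (OWL_OBJECT_PROPERTY, OWL_DATATYPE_PROPERTY):
            return o
    return None
```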

Page 13

Conceptual Entities

SimpleBaseEntity: Contains all basic provenance/governance properties, such as:

• administrativeStatus
• baseNote
• identifier
• creationDate, modificationDate
• verificationDate, verificationStatus, verifiedBy

Page 14

Conceptual Entities

CoreEntity: Classes for which the circumstances of creation are meaningful:

• Assessment
• Document
• File
• Format: CharacterEncoding, CompressionTechnique, FileFormat
• Holding
• Identifier
• IntellectualPropertyRightsClaim
• Product: Hardware and Software Products

CoreEntity classes have additional properties relating to release information and to the agents that created them.
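The layering of SimpleBaseEntity, CoreEntity and Format maps naturally onto class inheritance. A Python sketch using dataclasses; the governance field names follow the slide (hence camelCase), while `releaseDate`, `creators` and `name` are hypothetical stand-ins for the release-information, agent and format properties, which the slides do not name.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SimpleBaseEntity:
    """Provenance/governance properties shared by every entity."""
    identifier: str
    administrativeStatus: Optional[str] = None
    baseNote: Optional[str] = None
    creationDate: Optional[date] = None
    modificationDate: Optional[date] = None
    verificationDate: Optional[date] = None
    verificationStatus: Optional[str] = None
    verifiedBy: Optional[str] = None


@dataclass
class CoreEntity(SimpleBaseEntity):
    """Entities whose circumstances of creation are meaningful."""
    releaseDate: Optional[date] = None                 # hypothetical release property
    creators: List[str] = field(default_factory=list)  # hypothetical: creating agents


@dataclass
class Format(CoreEntity):
    """Most properties live at this level so that subclasses inherit them."""
    name: Optional[str] = None  # hypothetical format-level property


class FileFormat(Format):
    """GDFR-style Format subclass; inherits every field above."""


class CharacterEncoding(Format):
    """GDFR-style Format subclass."""


class CompressionTechnique(Format):
    """GDFR-style Format subclass."""
```

A `FileFormat` therefore carries the governance fields (`verifiedBy`, `verificationStatus`, …) and the creation fields without redeclaring them.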

Page 15

Conceptual Entities

EnumeratedTypes: The class of enumerated-type classes (lists of values), as well as the GDFR facets. Examples include:

• ByteOrderType
• CompressionFamilyType
• CountryCode
• DisclosureType
• DocumentIntentType
• FormatRoleType
• LanguageCode
• MediaType
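One way to handle an enumerated type is as a closed Python `Enum` that can also be exported as a SKOS concept scheme. The `ByteOrderType` members, the `to_skos` helper and the base URI are hypothetical examples; the registry's actual lists of values are not given here.

```python
from enum import Enum
from typing import List, Tuple

Triple = Tuple[str, str, str]


class ByteOrderType(Enum):
    # Hypothetical members; the real list of values belongs to the registry.
    BIG_ENDIAN = "bigEndian"
    LITTLE_ENDIAN = "littleEndian"


def check_value(enum_cls, value: str):
    """Accept only values from the controlled list; Enum raises ValueError otherwise."""
    return enum_cls(value)


def to_skos(enum_cls, base: str = "http://example.org/enum/") -> List[Triple]:
    """Export an enumerated type as a skos:ConceptScheme with one skos:Concept per value."""
    scheme = base + enum_cls.__name__
    triples = [(scheme, "rdf:type", "skos:ConceptScheme")]
    for member in enum_cls:
        concept = f"{scheme}/{member.value}"
        triples += [
            (concept, "rdf:type", "skos:Concept"),
            (concept, "skos:inScheme", scheme),
            (concept, "skos:prefLabel", member.value),
        ]
    return triples
```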

Page 16

Conceptual Entities

Format: Use the GDFR definition of Format, which includes:

• File Format
• Character Encoding
• Compression Technique

Most properties are defined at the Format level (to be inherited by subclasses).

Should we use the GDFR definition of Format?

Page 17

Properties

Should the registry support actionable inheritance of properties?

For example, should BWF automatically inherit all properties defined for “generic” WAVE?

When should inference take place? At UI entry time?

Current relationships from GDFR (restricted, extended, …) may be difficult to formalize. Shall we just replace them with an “isDerivedFrom” property?
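If inheritance is made actionable, a single `isDerivedFrom` link is enough to compute a format's effective properties. A Python sketch assuming a hypothetical in-memory store; the property names are illustrative, and the more specific format overrides its parent. This version infers at read time; the alternative raised above is to materialize the merged properties at UI entry time.

```python
from typing import Dict, List, Optional

# Hypothetical registry data: asserted properties per format, plus an
# isDerivedFrom link from each format to its parent (None for a root).
PROPERTIES: Dict[str, Dict[str, str]] = {
    "WAVE": {"byteOrder": "littleEndian", "container": "RIFF"},
    "BWF": {"extraChunk": "bext"},
}
DERIVED_FROM: Dict[str, Optional[str]] = {"WAVE": None, "BWF": "WAVE"}


def effective_properties(fmt: str) -> Dict[str, str]:
    """Merge properties along the isDerivedFrom chain; the derived format wins."""
    chain: List[str] = []
    current: Optional[str] = fmt
    while current is not None:
        if current in chain:
            raise ValueError(f"isDerivedFrom cycle at {current}")
        chain.append(current)
        current = DERIVED_FROM.get(current)
    merged: Dict[str, str] = {}
    for name in reversed(chain):  # root first, so children override parents
        merged.update(PROPERTIES.get(name, {}))
    return merged
```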

Page 18

Controlled Vocabularies

Semantic: RDF, RDFS, OWL

Vocabulary/Thesaurus: SKOS

Metadata: DC, DCTERMS

Agents: FOAF

Provenance: OPMV (Open Provenance Model Vocabulary)

Country Codes / Language Codes

Organization IDs

MIME Types?

Governance
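These vocabularies are usually referenced by prefix. A Python sketch of a prefix map with CURIE expansion; the namespace URIs are the published ones for each vocabulary, while the prefix labels are conventional choices and the helper names are hypothetical.

```python
from typing import Dict

PREFIXES: Dict[str, str] = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "opmv": "http://purl.org/net/opmv/ns#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'skos:Concept' to a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def turtle_prefixes() -> str:
    """Emit the map as Turtle @prefix declarations."""
    return "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items())
```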

Page 19

Diagram: PDF and TIFF format families

PDF: PDF 1.0, PDF 1.1, PDF 1.2, PDF 1.3, PDF 1.4, PDF 1.5, PDF 1.6, PDF 1.7 (ISO 32000-1)

PDF/A-1, PDF/A-2, PDF/A-3

PDF/X-1a:2001, PDF/X-1a:2003, PDF/X-2:2002, PDF/X-2:2003, PDF/X-4, PDF/X-4:2010, PDF/X-5

TIFF: TIFF 4, TIFF 5, TIFF 6

TIFF class B, TIFF class G, TIFF class P, TIFF class R, TIFF class Y

TIFF/EP, DNG, GeoTIFF

TIFF/IT, TIFF/IT/P1, TIFF/IT/P2

TIFF/IT/P1/BL, TIFF/IT/P1/BP, TIFF/IT/P1/CT, TIFF/IT/P1/FP, TIFF/IT/P1/HC, TIFF/IT/P1/LW, TIFF/IT/P1/MP, TIFF/IT/P1/SD

TIFF/IT/P2/BL, TIFF/IT/P2/BP, TIFF/IT/P2/CT, TIFF/IT/P2/FP, TIFF/IT/P2/HC, TIFF/IT/P2/LW, TIFF/IT/P2/MP

Page 20

Questions / Concerns?