towards bootstrapping knowledge-based archives*

11
Towards Bootstrapping Knowledge-Based Archives* Bertram Ludäscher Richard Marciano Reagan Moore San Diego Supercomputer Center {ludaesch,marciano,moore}@sdsc .edu *Towards Self-Validating Knowledge-Based Archives, Bertram Ludäscher, Richard Marciano, Reagan Moore, 11th Workshop on Research Issues in Data Engineering (RIDE) , Heidelberg, IEEE Computer Society, April 2001

Upload: sybill-miranda

Post on 30-Dec-2015

13 views

Category:

Documents


0 download

DESCRIPTION

Towards Bootstrapping Knowledge-Based Archives*. Bertram Lud ä scher Richard Marciano Reagan Moore San Diego Supercomputer Center {ludaesch,marciano,moore}@sdsc.edu. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Towards Bootstrapping Knowledge-Based Archives*

Towards Bootstrapping Knowledge-Based Archives*

Bertram Ludäscher

Richard Marciano

Reagan Moore

San Diego Supercomputer Center

{ludaesch,marciano,moore}@sdsc.edu

*Towards Self-Validating Knowledge-Based Archives, Bertram Ludäscher, Richard Marciano, Reagan Moore, 11th Workshop on Research Issues in Data Engineering (RIDE), Heidelberg, IEEE Computer Society, April 2001

Page 2: Towards Bootstrapping Knowledge-Based Archives*

Archival Processes and Functions• Data submission/accessioning:

– loop: information producer <==> "archival engineer"

• Ingestion:– a sequence of information preserving transformations is applied to submitted

"raw data" => ingestion network

• Migration:– ... as time goes by ...

– ... migrate to new physical media, maybe data formats, information model ...

– "easy migration" <=> "good" archival format & model

• Instantiation/Access:– revive/reanimate the archive => queryable collection/database

• GOAL: preserve information! • Right!?

Page 3: Towards Bootstrapping Knowledge-Based Archives*

What is it that we try to archive??

• Information hierarchies:– data ... information ... knowledge ... (aka: the big picture!)

– instance ... schema ... model ... metamodel ... metametamodel ...

– linear syntax ... data structure ... data model ... conceptual model ...

• Static vs. dynamic information:– extensional data ... intensional/virtual/derived data (facts/rules)

– data ... functions/programs

• Managing complexity– layered approach: "protocol stack" (cf. ISO/OSI, "SemanticWeb",

communication in general) ==going up==> aggregate/abstract

Page 4: Towards Bootstrapping Knowledge-Based Archives*

OAIS (Open Archival Information System) Information Model

• info(rmation)_object ~ data_object + representation_info

• data_object ~ digital_object + physical_object

• digital_object ~ [bits]

• representation_info ~ structure_info + semantic_info

• representation_info is_interpreted_using representation_info

• an AIP (archival information package) contains content info_objects + PDI (preservation description information)

• knowledge-level extension:data objects (e.g., RTF/HTML/... formatted objects)

=wrapping/tagging=> information objects (e.g., XML docs + DTD/Schema)

=knowledge extraction/semantic annotation=> semantic/conceptual objects (e.g., declarative OO model + rules)

Page 5: Towards Bootstrapping Knowledge-Based Archives*

Ingestion Networks

• asking for "=" at the level of raw (unwrapped) data may be too strict:

=> lift to the information level; make sure information is preserved there

e.g., mapping back to HTML using XSL(T) can give the same "look and feel" as the raw data; but presentational HTML "noise" (irregularities) is removed

Transformation t is information preserving, if there is an inverse transformation t_inv, s.t., for all d in dom(t):

t_inv( t( d ) ) = d .

Page 6: Towards Bootstrapping Knowledge-Based Archives*

Ingestion Network: Senate Collection

.RTF

.HTML

.XML .OAV

.XML

.DOCsave-as

save-as

OmniMark Perl

consolidate

.TM

.XML

archive

decompose

S1

S2

S3 S4

S5S6

S7

S0

generategenerate

DIP SIP AIPLegend (stages):

Page 7: Towards Bootstrapping Knowledge-Based Archives*

From XML-Based to Knowledge-Based Archives...

• XML/collection-based archival: save data "as is" plus...– ... separate content from presentation

– ... tag your data (take a lift in the info hierarchy)

– ... use a self-describing, semistructured data format (XML)

• Knowledge-based archival: add ...– ... conceptual level information

– ... integrity constraints

– ... explanations/derivation rules: • archiving only results y=f(x) vs. archiving the rules/function "f"

(e.g. f = Florida ...)

– … knowledge representation (rules) ~ metadata on steroids ...

Page 8: Towards Bootstrapping Knowledge-Based Archives*

... to Self-Validating, Self-Instantiating Knowledge-Based Archives

• Goal: self-contained archive

• Limitations: how much context can you drag into your archive to make it self-contained?? (...Dublin Core … human history)

• Using open, infrastructure independent representations...

=> make the archive as self-contained as you can ...

… pay for …

Page 9: Towards Bootstrapping Knowledge-Based Archives*

Maximizing “Self-Containedness”

• Self-validating archives: add ...– ... "executable knowledge" (=rules)

– "helping (bugging?) the data provider"

=> add the functionality and meaning of DTD (+Schema+IC+...) validation to the AIP

=> package the validator!

• Self-instantiating archives: add ...– ... "executable ingestion process"

– “helping the archival engineer (aka archivist)”

– …here is looking over your shoulder…

=> add the functionality of database transformations to the AIP

=> package the transformers!

• BUT packaging validators and transformers increases infrastructure dependence!

Page 10: Towards Bootstrapping Knowledge-Based Archives*

Towards Bootstrapping Knowledge-Based Archives

Baron von Münchhausen, pulling himself out of the

swamp

• enable addition of semantic annotations ("knowledge") via logic rules to AIPs

• add executable specifications of semantics => AIP += KP (knowledge package, i.e., rules)

=> self-validating archive

• add executable specifications of the ingestion network => AIP += IN (ingestion network, ...more rules)

=> self-instantiating archive

=> a bootstrapping knowledge-based archive with DTD/Schema/IC validation and ingestion transformations all expressed in a declarative logic program

• from the 2do list: build a prototype (BARON) based on rule languages for domain semantics and (self-validation) and ingestion transformations (self-instantiation)

Page 11: Towards Bootstrapping Knowledge-Based Archives*

• Towards Self-Validating Knowledge-Based Archives, Bertram Ludäscher, Richard Marciano, Reagan Moore, 11th Workshop on Research Issues in Data Engineering (RIDE), Heidelberg, IEEE Computer Society, April 2001, SDSC TR-2001-1, January 18, 2001. • Knowledge-Based Persistent Archives, Reagan Moore, SDSC TR-2001-7, January 18, 2001

• The Senate Legislative Activities Collection (SLA): a Case Study Infrastructure Research to Support Preservation Strategies, Richard Marciano, Bertram Ludäscher, Reagan Moore, SDSC TR-2001-5, January 18, 2001

• Reference Model for an Open Archival Information System (OAIS), Draft Recommendation, Consultative Committee for Space Data Systems, CCSDS 650.0-R-1, May 1999.

• Digital Rosetta Stone: A Conceptual Model for Maintaining Long-term Access to Digital Documents, Alan R. Heminger, Steven B. Robertson

References