Page 1: Archivists’ Toolkit Preliminaries: Architecture, DB Leslie Myrick NYU

Archivists’ Toolkit Preliminaries: Architecture, DB

Leslie Myrick

NYU

Page 2

Possible Java Architecture

• JSP Model 2 Architecture
  – Servlet Controller
    • Handles requests, selects the View, instantiates beans
  – JSPs update the View in the browser
  – JavaBeans used to represent the object in memory; access DB using JDBC
    • manage the Model
  – JDBC connection to the data source

Page 3

Similar Use of Servlet/JSP Model in Digital Library Applications

• DSpace

• UC Berkeley’s GenX system

• CDL Preservation Repository

Page 4

JSP Model 2

• Cleanest separation of presentation and content
  – Clear delineation of roles of developers and designers
• Takes advantage of strengths of servlets and JSPs for serving dynamic content
  – JSP for presentation layer
  – Servlets for performing process-intensive tasks
    • Servlet as Controller: in charge of request processing, creating the beans or objects used by JSPs, and deciding which JSP to forward the request to
• No processing logic in JSPs: they are simply responsible for retrieving the objects or beans instantiated by servlets
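
The division of labor described on this slide can be sketched in plain Java. This is only an illustration of the control flow, not the Toolkit's code: it uses no servlet API, and the names Model2Sketch, ObjectBean, View and handle are hypothetical. In a real Model 2 application the controller would be a servlet, the views would be JSPs, and the bean would be filled from the database through JDBC.

```java
import java.util.HashMap;
import java.util.Map;

public class Model2Sketch {

    /** Model: a plain bean holding the data for one archival object. */
    static class ObjectBean {
        private final int objectId;
        private final String title;

        ObjectBean(int objectId, String title) {
            this.objectId = objectId;
            this.title = title;
        }

        int getObjectId() { return objectId; }
        String getTitle() { return title; }
    }

    /** View: only reads what the controller placed in the request scope. */
    interface View {
        String render(Map<String, Object> requestScope);
    }

    static final Map<String, View> VIEWS = new HashMap<>();
    static {
        VIEWS.put("show", scope -> {
            ObjectBean bean = (ObjectBean) scope.get("object");
            return "Object " + bean.getObjectId() + ": " + bean.getTitle();
        });
        VIEWS.put("error", scope -> "Error: " + scope.get("message"));
    }

    /** Controller: handles the request, instantiates the bean, selects the view. */
    static String handle(Map<String, String> params) {
        Map<String, Object> requestScope = new HashMap<>();
        String viewName;
        String id = params.get("objectID");
        if (id != null && id.matches("\\d{1,9}")) {
            // Assumption: a real controller would load the title through JDBC;
            // it is hard-coded here to keep the sketch self-contained.
            requestScope.put("object",
                    new ObjectBean(Integer.parseInt(id), "Series I: Documentary Material"));
            viewName = "show";
        } else {
            requestScope.put("message", "missing or invalid objectID");
            viewName = "error";
        }
        return VIEWS.get(viewName).render(requestScope);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("objectID", "3");
        System.out.println(handle(params));
    }
}
```

All branching sits in handle; the views contain no decisions of their own, which is the property the slide attributes to Model 2.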

Page 5

JSP Model 2 Architecture (diagram)

Page 6

JSP Model 1

• Bulk of processing performed by JSP
  – JSPs both process requests and draw the view

• Fine for simple applications

Page 7

JSP Model 1 Architecture (diagram)

Page 8

MySQL vs PostgreSQL

• Both ACID compliant (transaction safe); in MySQL this requires InnoDB tables
• Both support referential integrity (as of MySQL 4.x)
• MySQL faster; PostgreSQL more robust
• Finer-grained locking in PostgreSQL
  – Multi-Version Concurrency Control (MVCC) in PostgreSQL
• Want triggers? Views? Inheritance? For now go with PostgreSQL
• MySQL has built-in full-text search capability
• Ease of installation and maintenance: MySQL, hands down.

Page 9

The ACID test

• Atomicity - All elements of a given transaction take place or none do.

• Consistency - Each transaction transforms the database from one valid state to another valid state.

• Isolation - The effects of a transaction are not visible to other transactions in the system until it is complete.

• Durability - Once a transaction has been committed, its effects are permanent, even if the system crashes or a disk dies.

Page 10

Proposed DB Schema: Archaeology / Genealogy

• Ultimately based on MOA II model

• With refinements to NYU’s zeroDB schema for digital object metadata

• Torqued to describe archival objects and their digital surrogates

• Same essential hook: pure Aristotelian hierarchy

Page 11

It all comes down to object

• Pivotal entity is object nesting other objects
  – objectType can be fonds, collection, component
  – componentType can be series, file, item, accretion
• Object hierarchy maintained through:
  – objectID, parentID, nextSibID
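
How three columns can carry a whole ordered tree is easier to see in code. The sketch below rebuilds sibling order by following nextSibID links and prints an indented outline. It rests on assumptions that are not in the slides: the class and method names are hypothetical, and 0 is taken to mean "no next sibling". The sample rows follow the example records shown later in the deck.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ObjectHierarchy {

    static final int NONE = 0; // assumption: 0 marks "no next sibling"

    /** One row of the object table, reduced to the hierarchy columns plus a label. */
    static class Row {
        final int objectID, parentID, nextSibID;
        final String label;

        Row(int objectID, int parentID, int nextSibID, String label) {
            this.objectID = objectID;
            this.parentID = parentID;
            this.nextSibID = nextSibID;
            this.label = label;
        }
    }

    /** Children of parentID in sibling order, found by following nextSibID links. */
    static List<Row> childrenInOrder(Map<Integer, Row> table, int parentID) {
        List<Row> kids = new ArrayList<>();
        Set<Integer> pointedTo = new HashSet<>();
        for (Row r : table.values()) {
            if (r.parentID == parentID) {
                kids.add(r);
                pointedTo.add(r.nextSibID);
            }
        }
        // The first child is the one that no sibling points to.
        Row current = null;
        for (Row r : kids) {
            if (!pointedTo.contains(r.objectID)) {
                current = r;
                break;
            }
        }
        List<Row> ordered = new ArrayList<>();
        // The size check guards against a cycle in malformed data.
        while (current != null && ordered.size() < kids.size()) {
            ordered.add(current);
            current = table.get(current.nextSibID); // null when the chain ends
        }
        return ordered;
    }

    /** Depth-first, indented outline of everything below parentID. */
    static void outline(Map<Integer, Row> table, int parentID, int depth, StringBuilder out) {
        for (Row r : childrenInOrder(table, parentID)) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(r.objectID).append(' ').append(r.label).append('\n');
            outline(table, r.objectID, depth + 1, out);
        }
    }

    static Map<Integer, Row> sample() {
        Map<Integer, Row> t = new HashMap<>();
        Row[] rows = {
            // The next series (126) lies outside this excerpt, so its chain ends here.
            new Row(3, 2, 126, "Series I: Documentary Material"),
            new Row(4, 3, NONE, "Subseries A: Subjects"),
            new Row(5, 4, 6, "Advertising"),
            new Row(6, 4, NONE, "Art & Collecting"),
        };
        for (Row r : rows) {
            t.put(r.objectID, r);
        }
        return t;
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        outline(sample(), 2, 0, out);
        System.out.print(out);
    }
}
```

parentID alone would give the tree but not the order of components within a series; nextSibID is what preserves the arrangement of the finding aid.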

Page 12

Object Table

object
  PK   objectID
       objectTypeID
       componentTypeID
       parentID
       nextSibID
       hasChildren
  FK1  rightsID
  FK4  accessionID
  FK5  provenanceID
  FK2  physDescID
       processFinal
  FK3  physLocID

Page 13

Accession Table

accession
  PK   accessionID
       accessionTypeID
       resourceID
       recordCollectionTypeID
       collectionSurvey
       processingPlan
       processingNote
       acqinfo
       accruals
       appraisal
       abstract
       generalNote
       scopecontent
       arrangement
       accessrestrict
       preservationNote
       conservationNote
       otherfindaid
       transferFinal

Page 14

Provenance Table

provenance
  PK   provenanceID
       bioghist
       bibliography
       custodhist
       fileplan
       donorNote
       provenanceNote

Page 15

Physical Location Tables

physLoc
  PK   physLocID
  FK1  physLocLevelID
  FK2  physLocTypeID
       physLoc
       isPublic
       objectID

physLocLevel
  PK   physLocLevelID
       physLocLevel

physLocType
  PK   physLocTypeID
       physLocType

CREATE TABLE physLoc (
  physLocID int(11) NOT NULL auto_increment,
  physLocLevelID int(11) NOT NULL default '0',
  physLocTypeID int(11) NOT NULL default '0',
  physLoc varchar(128) NOT NULL default '',
  isPublic tinyint(1) unsigned NOT NULL default '0',
  PRIMARY KEY (physLocID)
);

--
-- Data for table physLocLevel
--

INSERT INTO physLocLevel (physLocLevel) VALUES ('repository');
INSERT INTO physLocLevel (physLocLevel) VALUES ('internal location');
INSERT INTO physLocLevel (physLocLevel) VALUES ('physical container');

--
-- Data for table physLocType
--

INSERT INTO physLocType (physLocType) VALUES ('accession location');
INSERT INTO physLocType (physLocType) VALUES ('processing location');
INSERT INTO physLocType (physLocType) VALUES ('shelflist location');
INSERT INTO physLocType (physLocType) VALUES ('offsite location');

Page 16

Ingest of Legacy Data from MARCXML

• Student Programmers’ Assignment

• Will probably involve JAXP/DOM

• Already undertaken: conversion of records from the Innopac iiirecord DTD to the MARC21slim schema; tape .mrc to MARCXML using MARC4J

Page 17

Ingest of Legacy Data from EAD

• Testbed creation tool
• XSLT with Java Extensions using Xalan
  – Get nextID from database
  – Extensions instantiate and increment DBID, parentID, nextSibID for each component in <dsc>
  – Write out to .sql file to dump into DB
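
The numbering step can be sketched independently of XSLT. The Java below walks a small component tree in document order, assigns objectID and parentID on the way down, and fills in nextSibID once a sibling's ID is known, which only happens after the whole preceding subtree has been numbered. It is an assumption-laden illustration, not the Xalan extension itself: EadIdAssigner, Component and sampleTree are hypothetical names, 0 stands for "no next sibling", the start value 2 is arbitrary, and the generated statements leave out componentTypeID.

```java
import java.util.ArrayList;
import java.util.List;

public class EadIdAssigner {

    static final int NONE = 0; // assumption: 0 stands for "no parent" / "no next sibling"

    /** One component of the finding aid (a <c01>, <c02>, ... element). */
    static class Component {
        final String title;
        final List<Component> children = new ArrayList<>();
        int objectID, parentID, nextSibID;

        Component(String title) { this.title = title; }

        Component add(Component child) {
            children.add(child);
            return this;
        }
    }

    private int nextID;

    /** firstFreeID plays the role of the nextID fetched from the database. */
    EadIdAssigner(int firstFreeID) { this.nextID = firstFreeID; }

    /** Numbers c and its subtree in document order, then links its children as siblings. */
    void assignIDs(Component c, int parentID) {
        c.objectID = nextID++;
        c.parentID = parentID;
        for (Component child : c.children) {
            assignIDs(child, c.objectID);
        }
        // A sibling's ID is known only after the preceding subtree has been numbered.
        int size = c.children.size();
        for (int i = 0; i < size; i++) {
            c.children.get(i).nextSibID =
                    (i + 1 < size) ? c.children.get(i + 1).objectID : NONE;
        }
    }

    /** Appends one OBJECT and one TITLE insert per component, in document order. */
    static void toSql(Component c, List<String> out) {
        out.add("INSERT INTO OBJECT (objectID, parentID, nextSibID, hasChildren) VALUES ("
                + c.objectID + "," + c.parentID + "," + c.nextSibID + ","
                + (c.children.isEmpty() ? 0 : 1) + ");");
        out.add("INSERT INTO TITLE (titleID, titleTypeID, title, objectID) VALUES (NULL,1,'"
                + c.title.replace("'", "''") + "'," + c.objectID + ");");
        for (Component child : c.children) {
            toSql(child, out);
        }
    }

    static Component sampleTree() {
        Component subA = new Component("Subseries A: Subjects")
                .add(new Component("Advertising"))
                .add(new Component("Art & Collecting"));
        Component series1 = new Component("Series I: Documentary Material").add(subA);
        Component series2 = new Component("Series II (hypothetical)");
        return new Component("Collection (hypothetical root)").add(series1).add(series2);
    }

    public static void main(String[] args) {
        Component root = sampleTree();
        new EadIdAssigner(2).assignIDs(root, NONE);
        List<String> sql = new ArrayList<>();
        toSql(root, sql);
        for (String line : sql) {
            System.out.println(line);
        }
    }
}
```

With this sample, Series I receives objectID 3 and nextSibID 7, because components 4 to 6 sit inside it. The same effect explains why a real series can point to a much higher sibling ID.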

Page 18

<xalan:component prefix="counter" elements="init incr" functions="read">
  <xalan:script lang="javaclass" src="xalan://MyCounter"/>
</xalan:component>

<xsl:template match="/">
  <counter:init name="index"/>
  <!-- rest of the root template is not shown on the slide -->
</xsl:template>

<xsl:template name="dsc">
  <xsl:for-each select="ead/archdesc/dsc">
    <!-- the dsc takes the current counter value; its c01 children point back to it -->
    <xsl:variable name="dsc-parentID"><xsl:value-of select="counter:read('index')"/></xsl:variable>
    <counter:incr name="index"/>
    <xsl:for-each select="c01">
      DBID: <xsl:value-of select="counter:read('index')"/>
      PARENTID: <xsl:value-of select="$dsc-parentID"/>
      Series: c01-<xsl:number/>
      Unittitle: <xsl:apply-templates select="did/unittitle"/>
      Abstract: <xsl:apply-templates select="did/abstract"/>
      <xsl:if test="./child::scopecontent">
        Scopecontent:
        <xsl:for-each select="scopecontent/p">
          <xsl:apply-templates select="."/>
        </xsl:for-each>
      </xsl:if>
      <!-- the slide excerpt ends here; lower levels and further counter increments are not shown -->
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>

Page 19

DBID: 3
PARENTID: 2
Series: c01-1
Unittitle: Series I: Documentary Material

DBID: 4
PARENTID: 3
Subseries: c02-1
Unittitle: Subseries A: Subjects

DBID: 5
PARENTID: 4
Subseries: c03-1
Box: 1
Folder: 1
Unittitle: Advertising
Unitdate: undated

DBID: 6
PARENTID: 4
Subseries: c03-2
Box: 1
Folder: 2-6
Unittitle: Art & Collecting
Unitdate: undated

Page 20

INSERT INTO OBJECT (objectID, parentID, nextSibID, hasChildren, componentTypeID)
VALUES (3, 2, 126, 1, 1);

INSERT INTO TITLE (titleID, titleTypeID, title, objectID)
VALUES (NULL, 1, 'Series I: Documentary Material', 3);

DBID: 3
PARENTID: 2
NEXTSIBID: 126
Series: c01-1
Unittitle: Series I: Documentary Material