Transcript

Grainne Reilly revision: June 6, 2019 original: Feb 12, 2016

Proposal for Electronic Archiving System (EAS) as Free Open Source Software

EAS at Harvard

EAS is a system that enables ingest, management and basic preservation of email and also paves the way for access to email. It provides features to identify policy and curatorial issues, e.g. rights management, events tracking etc. EAS does not address the capture of email nor does it address discovery or email delivery for end users. It focuses on the curation of email in preparation for long term preservation. The project was developed in conjunction with 3 core partners at Harvard University (Schlesinger Library, HU Archives and Countway Library) with 2 additional participants (Harvard Art Museums and GSD Loeb Library). EAS was built to fulfill the needs of the Harvard University partners and is integrated with several other Harvard University systems: AMS, Policy, Wordshack and DRS.

Archiving Email

The lifecycle

1. Collection development
   a. Pre-acquisition appraisal ✗
2. Accessioning
   a. Capture ✗
   b. Normalization ✔
3. Archival Processing
   a. Item-level processing ✔
   b. Bulk processing ✔
   c. Intellectual arrangement ✔
   d. Search capability ✔
   e. Personal/Sensitive information processing ✔
4. Preservation
   a. Packaging ✔
   b. Repository ☑
5. Online Discovery ✗
6. Access ✗

✔ - EAS supports
☑ - EAS supports via DRS2
✗ - EAS does not support

Not every institution will want to follow the entire lifecycle.

The community and the Tools

In June 2015 there was an Archiving Email Symposium hosted by the Library of Congress with over 150 attendees. Attendees included people from the Smithsonian Institution, NARA, Emory University and Stanford University, amongst others. There was interest in tools to help in preserving email. It was also apparent that there is no one tool that covers the entire lifecycle. A combination of tools may help institutions in their efforts to archive email for long term preservation. In fact many institutions used Aid4Mail and/or Emailchemy to convert email to standard mbox or eml format before using that output as input to the next tool.

Open Sourcing EAS

By open sourcing EAS it is more likely that other institutions will collaborate in making their tools interoperable with EAS. This would be advantageous to Harvard University where some EAS users have expressed an interest in the use of ePADD as a donor appraisal tool whose output might then be imported into EAS. It would also be advantageous to the community who are lacking a tool like EAS which could be used standalone.

EAS Technology

EAS is written in Java. EASi, the user interface to EAS, is a Java Struts 2 web application that runs in Tomcat. Most of the software used in EAS is open source, with the notable exceptions of the internal LTS software libraries, Oracle as the database, and Emailchemy as the software used to convert email from closed, proprietary file formats to a standard EML format. EAS does not provide an API for use by others.

• Tomcat 8
• Java 8
• Struts 2
• Ant
• Gradle
• Maven
• Oracle 12 (commercial)
• Hibernate 5.3.7
• Emailchemy embedded version (commercial)
• Mime4j 0.6
• Solr 4.10.1
• Solrj
• jQuery 1.8.2
• jQuery UI ThemeRoller
• ajax-solr
• flexigrid
• YUI Grids
• JSP
• CSS
• LTS utilities (proprietary)
• FITS 0.8
• Spring Batch 4.0.1

Since EAS may contain sensitive information, including HRCI, a security architecture was created to protect this data. This security architecture is mainly infrastructure, for example through the use of secure networks and ssh mounting of file systems.

Commented [RG1]: Security when in Docker

EAS Integration with other Harvard Systems

EAS integrates with several Harvard Library systems as shown in the diagram above.

AMS (Access Management System)

AMS is an LTS system that provides authentication and basic authorization services for library systems. AMS in turn interacts with HarvardKey and LDAP. It is a web application that makes use of cookies and browser redirects. EAS redirects users' browsers to AMS and inspects encrypted cookies that AMS creates. EAS makes use of an Access client jar in order to manage this.

Policy

Policy is used for authorization to library systems. EAS makes use of a Policy client jar that is used to perform direct database queries.

Wordshack

Wordshack is the authority control/vocabulary manager for EAS and for DRS2. Wordshack manages admin category, admin flag, email address, person, organization, software and topic terms. These terms are used throughout EAS. Interaction with Wordshack is via a RESTful API; however, for performance reasons terms are stored locally in the EAS database. EAS makes use of the client jar files provided by Wordshack for interacting with Wordshack.

DRS (Digital Repository Service)

EAS uses the DRS version 2 (DRS2) as the long term preservation repository for emails. EAS writes DRS2 specific batches to the file system when pushing items to DRS2 for long term preservation. EAS also interacts with DRS2 via a RESTful API, making use of two client jar files provided by DRS2. This RESTful API is used for several interactions with the DRS2 including:

• Collections are created in DRS2 via EASi
• Accounts are retrieved from DRS2 for use in EAS
• Billing codes are retrieved from DRS2 for use in EAS

One code base to serve us all

EAS is to continue to first and foremost serve the Harvard University community. This requires its continued use of LTS specific systems. To make EAS open source and useful to others outside of Harvard University it is necessary to disentangle EAS from other LTS systems and from commercial or proprietary software. For the initial release of EAS as OSS we are aiming for a minimum viable product: it will contain core features which will permit it to be deployed and be usable with limited functionality.

It is proposed to manage this through a dependency management build tool and configuration management. There will be one code base with one of two build versions produced: the LTS build version and the Open Source build version. The build file for the LTS version will be excluded from the open source GitHub source control repository. Internal LTS jar files should only be used in the Harvard University version of the built system and excluded from the open source dependencies. The open source version should only require open source dependencies and should result in a standalone built system. A later phase will address integration/interoperability with other tools.

EAS makes use of Emailchemy for conversion of emails to the standard EML format. This is a commercial product. It would be beneficial to refactor EAS to permit its use without this commercial product. This would facilitate packaging EAS in a Docker container since it would be a breach of license to include Emailchemy in a publicly available Docker container.

EAS also makes use of the commercial Oracle database. EAS does not make use of Oracle specific features and could be configured to also work with the open source PostgreSQL database. This would lower the barrier for adoption of EAS and also permit packaging a prepopulated demo of EAS in a Docker container.
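As a rough illustration of the Oracle/PostgreSQL switch described above, the sketch below selects Hibernate connection settings from a single configuration value. The class name `DatabaseSettings`, the vendor keys, and the host, port and database name `eas` are assumptions made for this example and are not taken from the EAS code base. The property keys are standard Hibernate settings; the dialect class names should be checked against the Hibernate version actually in use.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical helper: maps one config value to Hibernate connection settings. */
final class DatabaseSettings {

    private DatabaseSettings() { }

    /**
     * Builds Hibernate properties for "postgresql" or "oracle".
     * Host, port and database name are placeholders for illustration only.
     */
    static Properties forVendor(String vendor) {
        Properties p = new Properties();
        switch (vendor.toLowerCase(Locale.ROOT)) {
            case "postgresql":
                p.setProperty("hibernate.dialect", "org.hibernate.dialect.PostgreSQL94Dialect");
                p.setProperty("hibernate.connection.driver_class", "org.postgresql.Driver");
                p.setProperty("hibernate.connection.url", "jdbc:postgresql://localhost:5432/eas");
                break;
            case "oracle":
                p.setProperty("hibernate.dialect", "org.hibernate.dialect.Oracle12cDialect");
                p.setProperty("hibernate.connection.driver_class", "oracle.jdbc.OracleDriver");
                p.setProperty("hibernate.connection.url", "jdbc:oracle:thin:@//localhost:1521/eas");
                break;
            default:
                // Fail early on a misconfigured vendor rather than at first query.
                throw new IllegalArgumentException("Unsupported database vendor: " + vendor);
        }
        return p;
    }
}
```

Because only the dialect, driver and URL differ, the rest of the application could stay unchanged as long as it avoids vendor specific SQL, which is the property of EAS that the text relies on.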

EAS initial refactoring with Anti-corruption layers between Bounded Contexts

An Anti-corruption layer is a concept from Domain Driven Design. In the case of EAS there are several bounded contexts (authentication, access control, controlled vocabulary, collections management etc.) that could benefit from this layer, permitting future implementations to be plugged in more easily. One way of organizing the design of the Anti-corruption layer is as a combination of Facades, Adaptors and translators, along with the communication and transport mechanisms usually needed to talk between systems. Using dependency resolution and configuration management either the LTS specific or the default OSS specific implementations will be available. Configuration will be used to manage feature toggles and feature gates.
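The facade, adaptor and translator roles can be sketched for one bounded context, controlled vocabulary. Every name here (`Term`, `VocabularyService`, `ExternalVocabularyClient` and the two implementations) is hypothetical, and the external client is an in-memory stand-in rather than the real Wordshack client jar. The point is only that the rest of the application depends on the facade interface, while each adaptor translates one system's records into the internal model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Internal model of a controlled vocabulary term (hypothetical shape). */
final class Term {
    final String id;
    final String label;
    Term(String id, String label) { this.id = id; this.label = label; }
}

/** Facade: the only vocabulary interface the rest of the application sees. */
interface VocabularyService {
    Optional<Term> findTerm(String id);
}

/** Stand-in for an external client whose records use their own field names. */
final class ExternalVocabularyClient {
    private final Map<String, Map<String, String>> records = new HashMap<>();

    void put(String uri, String prefLabel) {
        Map<String, String> rec = new HashMap<>();
        rec.put("uri", uri);
        rec.put("prefLabel", prefLabel);
        records.put(uri, rec);
    }

    Map<String, String> fetch(String uri) { return records.get(uri); }
}

/** Adaptor plus translator: converts external records into the internal Term. */
final class ExternalVocabularyAdapter implements VocabularyService {
    private final ExternalVocabularyClient client;

    ExternalVocabularyAdapter(ExternalVocabularyClient client) { this.client = client; }

    @Override
    public Optional<Term> findTerm(String id) {
        Map<String, String> rec = client.fetch(id);
        if (rec == null) {
            return Optional.empty();
        }
        // Translation step: external field names never leak past this class.
        return Optional.of(new Term(rec.get("uri"), rec.get("prefLabel")));
    }
}

/** Default standalone implementation backed by an in-memory map. */
final class LocalVocabularyService implements VocabularyService {
    private final Map<String, Term> terms = new HashMap<>();

    void add(Term term) { terms.put(term.id, term); }

    @Override
    public Optional<Term> findTerm(String id) { return Optional.ofNullable(terms.get(id)); }
}
```

Which implementation is wired in would then be a build or configuration decision, so calling code would not need to change between the LTS and OSS builds.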

Example code for use of feature toggle and feature gate:

if (gateClient.isUngated("my.feature.name")) {
    doFeatureCode();
}
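One way the `gateClient` in this example might be backed by configuration is sketched below. The class name `PropertiesGateClient` and the convention that a feature is ungated when its property equals "true" are assumptions for illustration; only the `isUngated` method name comes from the example itself.

```java
import java.util.Properties;

/** Hypothetical gate client: a feature is ungated when its property is "true". */
final class PropertiesGateClient {
    private final Properties config;

    PropertiesGateClient(Properties config) { this.config = config; }

    /** Returns true only for explicitly enabled features; unknown features stay gated. */
    boolean isUngated(String featureName) {
        return Boolean.parseBoolean(config.getProperty(featureName, "false"));
    }
}
```

Defaulting unknown features to gated means a missing or misspelled key switches a feature off rather than on, which suits the requirement elsewhere in this proposal that the OSS build must not reach LTS specific systems.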

Future Possibilities

Integration with ePADD

ePADD consists of 4 modules; in order of workflow usage they are: appraisal, processing, discovery and delivery. Mbox files are fed into the appraisal module and to proceed to the next module it is necessary to export to an archive, an internal ePADD non-standard artifact. Conceptually an archive is a collection of indexed messages along with a blob store. This archive then needs to be imported to the next module and the process repeats for each module. The delivery module does provide the ability to export emails to mbox format, but it may not be lossless. The April 2016 release of ePADD is planned to permit the export of emails to mbox format from the appraisal module; again it may not be lossless. The intent of the ePADD appraisal module is for use standalone on a donor's workstation.

At Harvard University, it is desirable for donors to be able to use the appraisal module of ePADD and for curators to use the result of that processing in EAS. Some possible approaches for achieving that follow below.

With the April 2016 release the donor will be able to export the result of their processing to mbox format which could then be imported into EAS. EAS could split these mbox files into eml files itself, without the use of Emailchemy (it is easy to identify the start of each new message by the presence of the "From_" line). Some mechanism would be needed for controlling this. Alternatively the mbox could be run through some software to produce eml files which could then be imported into EAS. LTS would require that this approach be recorded via events and client agent.

As an alternative, ePADD could provide a client jar file for extracting emails from an archive into mbox or even eml format. EAS could use this to process an ePADD archive. The disadvantage of this approach is that it would only work with Java applications. The ePADD archive contains serialized objects which can only be reliably reconstituted by using the Java language to do so, which limits how portable these archives are.
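The mbox splitting idea above can be sketched in a few lines of Java. `MboxSplitter` is a hypothetical name, and this is a simplification: it treats every line beginning with "From " as a message separator and drops that separator line. Mbox variants escape body lines that begin with "From " (typically as ">From "); this sketch leaves such lines untouched and does not attempt to undo the escaping, so a real implementation would need to decide which mbox variant it supports.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splits mbox text into one string per message. */
final class MboxSplitter {

    private MboxSplitter() { }

    /**
     * Returns the messages found in the mbox text, in order.
     * Each "From " separator line starts a new message and is not copied to the output.
     */
    static List<String> split(String mbox) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        for (String line : mbox.split("\r?\n")) {
            if (line.startsWith("From ")) {
                // A separator closes the previous message, if any, and opens a new one.
                if (current != null) {
                    messages.add(current.toString());
                }
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line).append('\n');
            }
            // Lines before the first separator are ignored.
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }
}
```

Each returned string could then be written out as an individual eml file, with EAS recorded as the agent of the normalization event as the proposed work item on mbox handling suggests.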

Potential Roles

• LTS project manager for managing OSS infrastructure, framework and for moving EAS to OSS.
• HL project manager for liaising with HL community, external community and HL leadership.
• LTS developers

[Diagram labels: EAS, DRS, Discovery, Access/Delivery, Wordshack]

Open Source Software Checklist

Each entry lists the basic factor, its explanation, remarks, and the R, A, C and I assignments.

Usefulness
  Explanation: Software should be useful more or less "as is".
  Remarks: OSS version should not include LTS specific jars. OSS version should not require commercial products.
  R: LTS   A: ?   C: ?   I: ?

Interoperability
  Explanation: If the software interoperates with other software tools, the open source project should have well documented, preferably standards based, interfaces to external code - web services, class interfaces, or other points.
  Remarks: EAS needs to be refactored to provide abstraction/anti-corruption layers where alternate implementations may be plugged in.
  R: LTS   A: ?   C: ?   I: ?

License
  Explanation: The software should be released with a license statement, e.g. Apache 2, GPL, LGPL, MIT, BSD, AGPLv3.
  Remarks: Choice is limited by dependency on software with restrictive licenses. If a given dependency is optional how does that affect the license requirement?
  R: LTS   A: ?   C: OGC & HUIT CTO   I: HL

Contributor License Agreement
  Explanation: Many open source projects require this.
  R: LTS   A: ?   C: OGC & HUIT CTO   I: HL

Copyright
  Explanation: At top of each Class
  R: LTS   A: LTS   C: OGC & Provost Office & HUIT CTO   I: HL

Patent
  Explanation: Some software includes a patent in addition to the software.
  Remarks: An example is the Facebook reactjs library. Investigate HU policies governing this.
  R: LTS   A: LTS   C: OGC & Provost Office & HUIT CTO   I: HL

User Documentation
  Explanation: For use by users
  R: ?   A: ?   C: ?   I: ?

Developer Documentation
  Explanation: For use by developers
  R: LTS   A: ?   C: ?   I: ?

Code Documentation
  Explanation: Class level at a minimum
  R: LTS   A: ?   C: ?   I: ?

Source control
  Explanation: Github
  R: LTS   A: ?   C: ?   I: ?

Issue Tracking
  Explanation: Github
  Remarks: LTS uses jira. How do we synchronize Github issues with internal LTS jira issues?
  R: LTS   A: ?   C: ?   I: ?

Deployment packaging
  Explanation: Should we provide a ready to run implementation? This would enable easier adoption by others.
  Remarks: ePADD provides a packaged version ready for deployment on Windows or Mac. EAS uses some linux specific functionality. We could provide a Dockerized version configured for quick setup.
  R: LTS   A: ?   C: ?   I: ?

Demo
  Explanation: Should we provide a self-contained demo?
  Remarks: Provide a lot of examples and concentrate on having some really shiny ones to impress users/developers enough to take a closer look.

Contributions
  Explanation: Who should decide what contributions to accept?
  Remarks: Contributions include ideas.
  R: LTS?   A: HL?   C: ?   I: ?

Committers
  Explanation: Initially LTS only
  R: LTS   A: ?   C: ?   I: ?

Tests
  Explanation: Do contributors need to provide tests for contributed code: Unit tests, integration tests, functional tests?
  Remarks: Initially will only accept ideas and not code.
  R: LTS   A: ?   C: ?   I: ?

Documentation
  Explanation: What level of documentation would we require for contributed code?
  Remarks: Initially will not be accepting code.
  R: LTS   A: ?   C: ?   I: ?

Support
  Explanation: Need a forum for discussing features, technical issues etc. What forum?
  Remarks: Requires an Email list/Google group etc
  R: LTS   A: ?   C: ?   I: ?

Outreach and communications
  Explanation: What forums do we want to post on? What events do we want to present at?
  R: HL   A: ?   C: ?   I: ?

R: Responsible - who is assigned to do the work
A: Accountable - who makes the final decision and has ultimate ownership
C: Consulted - who must be consulted before a decision or action is taken
I: Informed - who must be informed that a decision has been taken
HL: Harvard Library
LTS: Library Technology Services
OGC: Office of the General Counsel

Proposed work for Open Sourcing EAS

Miscellaneous: Fields that are mandatory in EAS due to its integration with DRS2 will remain mandatory and will be populated with default values in the OSS version. Future - may make these mandatory fields configurable in future.

Item 1: Create a new build process with dependency resolution

Description: The current build process for EAS uses ANT with no dependency resolution. Alternatives are Ivy, Maven or Gradle. First choice is Gradle, second choice is Ivy. Maven is stubbornly "opinionated" and would not accommodate many of our existing LTS projects.
Update: Move to maven and docker and possibly ansible - ongoing. Need to update version of docker/docker compose. The LTS change control process has changed and is in flux due to the introduction of docker and ansible. Implementation detail - Java ServiceLoader may be used to facilitate switching implementations of services. Maven can then be used to pull in the correct implementation jar to the build.
Comments: This enables a customized build for LTS versus the OSS version. This must work with the LTS change control process. LTS proprietary jars should be excluded from the OSS build dependencies. Jars from dependent projects (e.g. hibernate) should be pulled in using dependency management during the build. Question: Fits includes ots.jar which is a proprietary LTS jar. What is the implication of this? EAS currently uses 93 jar files in addition to those used by Fits and Solr. See Grouper Build/Dependency Management for some reasoning on the choice of a build tool. This is an absolute pre-requisite for this project. It is not possible to exclude jar files without this. Build system needs to be set up in order to continue development.
Future:
Dependencies: LTS Artifactory instance should be populated with required jars
Feedback: RS - this is a technical debt project and therefore does not belong in this project but rather in a "technical debt" project
Proposed phase: Phase 1

Item 2: Abstract out authentication

Description: Abstract out authentication so that it can be configured to
  1. Use AMS for authentication
  2. Use authentication information from an XML file
  3. Facilitate plugging in of new authentication mechanism in future
Update: Switch from AMS to CAS. Do include XML/JSON - use JSON schema for validation of data.
Comments: Authentication is currently closely tied to the user's HUID, which is used throughout the system. The user's email address which AMS returns via an LDAP lookup is also used. For security reasons, the LTS version must only work with AMS and a valid HUID. The OSS version should not be configurable to use AMS and should not include the access.jar file. It should fail gracefully if misconfigured.
Future: Internal database, OAuth, Shibboleth, CAS, LDAP, Active Directory, OpenConnect
Dependencies: (1)
Feedback: AM - implement LDAP for phase 3. GR - depending on feedback from community decide on which implementation to use for phase 3.
Proposed phase: Phase 1

Item 3: Abstract out authorization

Description: Abstract out authorization so that it can be configured to
  1. Use Policy for authorization
  2. Use authorization information from an XML file
  3. Facilitate plugging in of new authorization mechanism in future
Update: Talk with IAM about making direct API calls to grouper. Need to set up grouper groups for EAS - run them by LTS Support. If IAM won't permit direct API calls - stay with using policy. Still need to use Policy for DRS depositor lookup. For OSS: use authorization information from an XML/JSON file (use JSON schema for JSON validation).
Comments: Currently the user HUID is used to look up policy information. For security reasons, the LTS version must only work with Policy and a valid HUID. The OSS version should not be configurable to use Policy. It should fail gracefully if misconfigured.
Future: Internal Database, Grouper, LDAP
Dependencies: (1) (2)
Feedback: AM - implement LDAP for phase 3. GR - depending on feedback from community decide on which implementation to use for phase 3.
Proposed phase: Phase 1

Item 4: Enable configuration to use PostgreSQL instead of Oracle

Description: EAS currently is configured to use Oracle. It makes no use of Oracle specific features and could work with PostgreSQL via minor configuration changes since EAS uses Hibernate ORM.
Update: Switch all versions to use PostgreSQL (RDS). Will involve work from Sharon's group. Will also involve input from AM/Sharon to ensure it remains at the right security level for HRCI.
Comments: Use of PostgreSQL removes a dependency on a commercial database. This eliminates constraints concerning license restrictions. Use of PostgreSQL:
  • Lowers the barrier to adopt EAS (no license to pay)
  • Permits the creation of a self contained, pre-populated EAS Demo in a Docker Container (it is a breach of the Oracle license to deploy the database in a Docker Container)
  The LTS version of EAS should continue to use Oracle for performance and operational reasons. The OSS version of EAS should be configurable to use either Oracle or PostgreSQL.
Future:
Dependencies: (1)
Feedback:
Proposed phase: Phase 1

Item 5: Abstract out Accounts (DRS owner codes) and Billing Codes

Description: Owner codes (Accounts) are stored in DRS2 and a local copy is created in the EAS database when enabled for use in EAS. Billing codes are stored in DRS2 and retrieved for use in EAS. Abstract out Accounts and Billing codes so that EAS can be configured to use:
  • Accounts and Billing Codes from DRS2
  • Accounts and Billing Codes from an XML file
  • Facilitate plugging in of other means of retrieving Accounts and Billing Codes in future
Update: OSS use XML or JSON configuration - use JSON schema for validation.
Comments: The LTS version should only work with Accounts and Billing codes from DRS2. OSS version should not be configurable to use Accounts and Billing Codes from DRS2.
Future: AM - Make it configurable to make accounts and billing codes optional.
Dependencies: (1) (2) (3)
Feedback: RS - should work out impact on time it might take to implement bullet 3 above. GR - if we are making it configurable to read this information from an XML file, then we need to create an abstraction layer anyhow. GR - regarding the future option of making this information optional, this information is mandatory in the database and Solr index because it is very important for LTS. Using dummy values from a default XML file will not inconvenience users who do not need this information. The system has not been architected for configuring optionality of database tables/fields and doing so would require significant work.
Proposed phase: Phase 1

Item 6: Abstract out DRS Collections

Description: Collections are created in DRS2 via the EAS user interface. Minimal collection information is stored in the EAS database and the EAS Solr index. Need the ability to configure EAS to create Collections:
  • In DRS2 with minimal information in EAS
  • Minimal information only in EAS
Comments: The LTS version should only work with Collections in DRS2. The OSS version should not be configurable to create Collections in DRS2.
Future: A separate project could manage collections. Librarycloud has a separate project for managing its collections which could be used as a model for a future EAS collections management project. This would require providing an API in EAS for updating the core collection information in the Solr index and the local database.
Dependencies: (1) (2) (3)
Feedback: WG - need to be able to associate items with collections. Agreed - Make EAS configurable to only require title for collection and not collect any other information on Collections. Then in future abstract out creation of collections in other systems. Reduced estimate based upon above agreement.
Proposed phase: Phase 1

Item 7: Abstract out Wordshack Terms

Description: Enable configuration of EAS to create and use controlled vocabulary terms
  • In Wordshack
  • In an XML file
  • Facilitate plugging in of other means of managing a controlled vocabulary
Update: Use updatable XML/JSON file/store (since email addresses are created during import) OR in OSS version could just create terms directly in database? Question - UI for creating terms directly in database
Comments: Wordshack terms are intricately tied into the system:
  • on the server
  • the user interface (it uses a Wordshack widget) in conjunction with a proxy servlet filter
  • in the database
  • in the Solr index
  The XML/JSON file should be kept simple for the initial release. The LTS version should only work with Wordshack terms. The phase 1 OSS version should not be configurable to work with Wordshack and should not include the wordshack client jar. The supported email clients are recorded in Wordshack as software terms.
Future: Possibly expand to support other controlled vocabularies
Dependencies: (1) (2) (3)
Feedback: RS - If Wordshack were available as open source it would mean that it could be used and so we might not need to do this work. GR - we do not want to force OSS users to use our implementation of a controlled vocabulary. Also we do not want to build in a dependency on another project being open sourced in order to open source EAS.
Proposed phase: Phase 1

Item 8: Remove Fits from OSS Version

Description: Enable configuration of EAS to remove FITS
Comments: Fits is used during import and push to DRS2. During ingest it is only required in order to get file format information. For the OSS version it can get the file format information by issuing the "file" command under linux (EAS already needs to run under linux so this introduces more non-portable code). This should still be configurable and fail gracefully if configured to use FITS in the OSS version.
Future: See 9
Dependencies: (1)
Feedback:
Proposed phase: Phase 1

Item 9: Replace EAS FitsServlet with OSS FitsServlet

Description: When EAS was implemented Fits was included in the project and implemented as a web application (similar to Solr). Since then an open source version of the FitsServlet has been developed and is almost ready for use. Once the open source version of the FitsServlet has been released this should be used by EAS instead of its own implementation.
Update: This has already been implemented, using AM's FITS docker image. However, the FITS Docker image should be made available on Docker Hub.
Comments: Use of the OSS version of the FitsServlet will make it easier to keep up to date with the latest version of Fits. This may be managed by dependency resolution. The fits.jar file will still be required by the EAS web application in order to process the output from fits during import (used in order to populate the file format information for attachments).
Future:
Dependencies: OSS version of FitsServlet must have been released
Feedback: Need to align licenses
Proposed phase: Phase 2

Item 10: Configure Disabling of Push to DRS

Description: Provide the ability to configure EAS to
  • Push to DRS
  • Disable Push to DRS
Update: For OSS version - disable push to DRS but enable package creation. OSS version must still create package. Could create batch identical to DRS batch and OSS users could manipulate it themselves to produce what they want. Change descriptors to be less DRS specific in OSS version. LTS version uses OTS which contains a lot of LTS specific constants, LTS specific validation etc. TODO - Tricia and Steve need to establish what is acceptable in the descriptors. Issues with descriptors:
  • Contain Wordshack URIs
  • Contain URNs
  • Contain drsAdmin data (schema)
  • Contain hulEventExtension (schema)
  For OSS perhaps simple descriptors should be created using jaxb and not using OTS.
Comments: Through use of the RESTful API in item 21 it will provide the ability for other projects to pull the information required in order to create a package for preservation. Item 22 would provide the ability to actually create a bag for archiving making use of the API provided by item 21. Item 11 provides for export of emails and attachments, but not metadata, as an mbox.
Future:
Dependencies: (1)
Feedback: RS - need ability to create a very simple bag. WG - need output so do need to include ability to create a preservation package. This could be an appealing deliverable for an IMLS grant. Reduced estimate based on discussions. On further discussion with RS and WG will not create a bag until know what would be useful in the bag. Need to discuss this with community during the workshop.
Proposed phase: Phase 1

Item 11: Add ability to configure EAS to not use Emailchemy

Description: This will remove a dependency on a commercial tool. By removing this dependency it will be possible to package EAS in a Docker container. It also reduces the barrier to use of EASi by removing the necessity to pay for software. EAS should fail early and gracefully if EAS has been configured to not use Emailchemy and if a user submits a packet type which can only be processed by Emailchemy.
Comments: Most of work will be around project build configuration. Do not want to result in a more onerous deployment in LTS so need to make it as automated as possible.
Future:
Dependencies: (1)
Feedback:
Proposed phase: Phase 1

Item 12: Add handling for eml files

Description: By permitting the submission of eml files in a packet users will have the option of using whatever tool they like to convert their emails to eml prior to using EAS.
Comments: Should the creator agent be a combination of eml and the tool used to convert to eml? If so it should be recorded in the controlled vocabulary as a software term.
Future:
Dependencies: (1)
Feedback:
Proposed phase: Phase 1

Item 13: Add handling for mbox files by EAS itself

Description: This would permit handling of mbox files without requiring the use of Emailchemy. Many mailboxes can be saved from email servers etc in mbox format. It appears to be relatively simple to split an mbox file into individual eml files: the start of each new message is identified by the "From_" line (use regex on /^From / lines).
Comments: EAS should be recorded as the agent in the normalization event.
Future:
Dependencies: (1)
Feedback:
Proposed phase: Phase 1

Item 14: Do thorough review of libraries used in EAS OSS version

Description: This is required in order to ensure that we are in compliance with all licenses of libraries used in EAS OSS. Part of this will be to list all the libraries used in the:
  • OSS version
  • LTS version
  This is also a required step in order to set up dependency resolution correctly.
Comments: Can dependency management also handle licenses? We may need to manually include licenses etc
Future:
Dependencies:
Feedback:
Proposed phase: Phase 1

Item 15: Do thorough cleanup of tests

Description: EAS has numerous unit and integration tests which are currently badly organized. These need to be cleaned up. With the refactoring it may make sense to introduce the use of mocks.
Comments:
Future:
Dependencies:
Feedback:
Proposed phase: Phase 2

Item 16: Make User interface changes

Description: Use feature request toggles to enable/disable LTS specific language.
Comments:
Future:
Dependencies:
Feedback:
Proposed phase: Phase 1

Item 17: Review for use of public/private/protected/package level methods

Description: The access modifiers on classes within EAS were not carefully managed. Leaving public methods which should in fact be private can lead to misuse of those methods.
Comments:
Future:
Dependencies:
Feedback:
Proposed phase: Phase 2

Item 18: Handle configuration of other jobs

Description: There are several jobs used in EAS. These would need to be configurable (using feature toggles) for the LTS version or the OSS version.
Comments: The LTS version should permit running of these jobs: Loader, Importer, DRS prearchiver, DRS postarchiver, DRS packet events archiver, account monthly statistics. The OSS version should not permit running of DRS prearchiver, DRS postarchiver, DRS packet events.
Future:
Dependencies:
Feedback:
Proposed phase: Phase 1

Item 19: Remove LTS proprietary jars

Description: The util.jar LTS proprietary jar provides functionality that is mostly now available in core Java or in open source libraries. Where possible the code should be refactored to use these implementations in order to remove reliance on LTS proprietary code. The ldap.jar LTS proprietary jar is not used. Both these jar files should be removed if possible.
Comments: Users of OSS projects need access to the source so any jar files used in the project should also be open source.
Future:
Dependencies: (1)
Feedback:
Proposed phase: Phase 1

Item 20: Implement RESTful API

Description: To make EAS more open for use it would be beneficial to create a RESTful API
Comments: This RESTful API could be used by another application to create a bag (see 21). The RESTful API could be used by another application to create and manage collections (see 6). This API must be implemented so that it may be used by external clients via REST and by EAS itself in process.
Future:
Dependencies: (1)
Feedback:
Proposed phase: Phase 3

Item 21: Implement LOC Bag creation

Description: Implement creation of a bag which makes use of the in process API from 20 above. This process should be triggered via the user interface
Comments: Use https://github.com/LibraryOfCongress/bagit-java to help in bag creation. Question: what should be in the descriptor files? METS seems to not be popular. Need feedback from the community on this. When items are successfully archived to DRS they are deleted from EAS (without generating any delete events). What should happen when a bag is created? Creation of a bag does not mean that the items have been successfully archived.
Future:
Dependencies: (18)
Feedback:
Proposed phase: Phase 3

Item 22: Package for deployment

Description: To reduce the barrier to adoption it is desirable to provide a deployable version of EAS.
Comments: EAS uses some "unix like" OS specific commands and so will not run on Windows (one reason was due to a bug in the Java File class which does not handle certain special characters in the filename). EAS could be packaged for Mac using Oracle AppBundler with hdiutil (ePADD does this). It may be best to provide it in a Docker container.
Future:
Dependencies:
Feedback:
Proposed phase: Phase 1

Proposed Roadmap

Prerequisites/phase 1

Item 1: Create a new build process with dependency resolution

Phase 1

Item 2: Abstract out authentication
Item 3: Abstract out authorization
Item 5: Abstract out Accounts (DRS owner codes) and Billing Codes
Item 6: Abstract out DRS Collections
Item 7: Abstract out Wordshack Terms
Item 8: Remove FITS from OSS version
Item 10: Configure Disabling of Push to DRS
Item 11: Add ability to configure EAS to not use Emailchemy
Item 12: Add handling for eml files
Item 13: Add handling for mbox files by EAS itself
Item 16: Make User interface changes
Item 18: Handle configuration of other jobs
Item 19: Remove LTS proprietary jars
Item 4: Enable configuration to use PostgreSQL instead of Oracle
Item 14: Do thorough review of libraries used in EAS OSS version
Item 22: Package for deployment

Phase 2

Item 15: Do thorough cleanup of tests
Item 17: Review for use of public/private/protected/package level methods
Item 9: Replace EAS FitsServlet with OSS FitsServlet

Phase 3

Item 20: Implement RESTful API
Item 21: Implement LOC Bag creation

Phase 4

Details are to be decided by the community. Interoperability is to be as loosely coupled as possible, e.g. via file interchange, RESTful APIs and the like.

• Make EAS interoperable with ePADD
• Make EAS interoperable with Bitcurator (redaction)

Resources

Wordshack https://wiki.harvard.edu/confluence/display/LibraryTechServices/SysDev+-+WordShack

Access Management System https://wiki.harvard.edu/confluence/display/LibraryTechServices/SysDev+-+Access

Policy Server https://wiki.harvard.edu/confluence/display/LibraryTechServices/SysDev+-+Policy+Server

DRS2 https://wiki.harvard.edu/confluence/display/LibraryTechServices/SysDev+-+DRS2

Emailchemy http://www.weirdkid.com/products/emailchemy/

DArcMail http://www.digitalpreservation.gov/meetings/documents/aes15/1_LC_AES_SIA_EmailandCERP_DarcMail_20150602.pdf http://siarchives.si.edu/blog/yes-we%E2%80%99re-still-talking-about-email http://www.history.ncdcr.gov/SHRAB/ar/emailpreservation/mail-account/mail-account_docs.html

Bitcurator http://www.bitcurator.net/

ePADD http://library.stanford.edu/projects/epadd https://github.com/ePADD/epadd https://github.com/ePADD/muse

Lifecycle Tools for Archival Email Stewardship (in progress) https://docs.google.com/spreadsheets/d/1V1N22xnr5e0EbDlZWx58bjYO6rkrMrYH9wGX9-CK8c4/edit?pli=1#gid=986222267

Archiving Email Symposium 2015 http://www.digitalpreservation.gov/meetings/archivingemailsymposium.html

Email related RFCs https://tools.ietf.org/html/rfc5322

Email formats http://www.digitalpreservation.gov/formats/fdd/fdd000388.shtml http://www.digitalpreservation.gov/formats/fdd/fdd000383.shtml

fits http://projects.iq.harvard.edu/fits https://github.com/harvard-lts/fits

Open Source https://wiki.harvard.edu/confluence/display/LibraryTechServices/LTS+Open+Source+Projects
Introducing the Open Source Maturity Model
Making an Open Source Project Bloom

Licenses http://choosealicense.com/licenses/ https://en.wikipedia.org/wiki/Comparison_of_free_and_open-source_software_licenses http://opensource.org/licenses/

Jar files used by EAS
access.jar (LTS proprietary)
activation.jar
antlr-2.7.6.jar
aopalliance-1.0.jar
apache-mime4j-0.6.jar
aspectjrt-1.6.8.jar
aspectjweaver-1.6.8.jar
c3p0-0.9.1.jar
cglib-nodep-2.2.jar
com.ibm.jbatch-tck-spi-1.0.jar
commons-cli-1.1.jar
commons-codec-1.6.jar
commons-collections-3.1.jar
commons-configuration-1.5.jar
commons-fileupload-1.2.1.jar
commons-httpclient-3.1.jar
commons-io-2.3.jar
commons-lang-2.4.jar
commons-lang3-3.1.jar
commons-logging-1.1.3.jar
commons-pool2-2.2.jar
dom4j-1.6.1.jar
drs2_services-dto.jar (LTS proprietary)
drs2_services-util.jar (LTS proprietary)
easi.jar
ehcache-1.5.0.jar
fits.jar
fluent-hc-4.3.5.jar
freemarker-2.3.15.jar
geronimo-stax-api_1.0_spec-1.0.1.jar
guava-15.0.jar
hibernate-jpa-2.0-api-1.0.0.Final.jar
hibernate-testing.jar
hibernate-tools.jar
hibernate3.jar
httpclient-4.3.5.jar
httpclient-cache-4.3.5.jar
httpcore-4.3.2.jar
httpmime-4.3.5.jar
javassist-3.9.0.GA.jar
jaxen-core.jar

jaxen-jdom.jar
jcl-over-slf4j-1.6.1.jar
jdom.jar
jettison-1.1.jar
jstl.jar
jta-1.1.jar
ldap.jar (LTS proprietary)
log4j-1.2.17.jar
mail.jar
mina-core-1.1.7.jar
noggit-0.5.jar
ognl-2.7.3.jar
ojdbc14.jar
oscache-2.1.jar
ots.jar (LTS proprietary)
saxpath.jar
servlet-api.jar
slf4j-api-1.7.7.jar
slf4j-log4j12-1.7.7.jar
solr-solrj-4.10.1.jar
spring-aop-3.2.3.RELEASE.jar
spring-batch-core-2.2.2.RELEASE.jar
spring-batch-infrastructure-2.2.2.RELEASE.jar
spring-batch-test-2.2.2.RELEASE.jar
spring-beans-3.2.3.RELEASE.jar
spring-context-3.2.3.RELEASE.jar
spring-context-support-3.2.3.RELEASE.jar
spring-core-3.2.3.RELEASE.jar
spring-expression-3.2.3.RELEASE.jar
spring-jdbc-3.2.0.RELEASE.jar
spring-orm-3.0.5.RELEASE.jar
spring-retry-1.0.2.RELEASE.jar
spring-test-3.2.3.RELEASE.jar
spring-tx-3.2.3.RELEASE.jar
standard.jar
stax2-api-3.0.1.jar
staxmate-2.0.0.jar
struts2-core-2.1.8.1.jar
struts2-json-plugin-2.1.8.1.jar
swarmcache-1.0RC2.jar
util.jar (LTS proprietary)
velocity-1.4.jar
velocity-tools-generic-1.1.jar
woodstox-core-lgpl-4.0.7.jar

wordshack-client.jar (LTS proprietary)
wstx-asl-3.2.7.jar
xercesImpl.jar
xml.jar
xpp3_min-1.1.4c.jar
xstream-1.3.jar
xwork-core-2.1.6.jar
zookeeper-3.4.6.jar

