An ontology-based context aware system for Selective Dissemination of Information in a digital library

Download An ontology-based context aware system for Selective Dissemination of Information in a digital library

Post on 26-Jun-2015

105 views

Category:

Education

0 download

Embed Size (px)

DESCRIPTION

ArticuloJournal of Computing; vol. 2, no. 5sers of Institutional Repositories and Digital Libraries are known by their needs for very specific information about one or more subjects. To characterize users profiles and offer them new documents and resources is one of the main challenges of today's libraries. In this paper, a Selective Dissemination of Information service is described, which proposes an Ontology-based Context Aware system for identifying user's context (research subjects, work team, areas of interest). This system enables librarians to broaden users profiles beyond the information that users have introduced by hand (such as institution, age and language). The system requires a context retrieval layer to capture user information and behavior, and an inference engine to support context inference from many information sources (selected documents and users' queries). Ver registro completo en: http://sedici.unlp.edu.ar/handle/10915/5526

TRANSCRIPT

<ul><li> 1. An Ontology-based Context Aware System for Selective Dissemination of Information in a Digital Library Marisa R. De Giusti, Gonzalo L. Villarreal, Agustn Vosou, Juan P. Martnez Abstract Users of Institutional Repositories and Digital Libraries are known by their needs for very specific information about one or more subjects. To characterize users profiles and offer them new documents and resources is one of the main challenges of today's libraries. In this paper, a Selective Dissemination of Information service is described, which proposes an Ontology-based Context Aware system for identifying user's context (research subjects, work team, areas of interest). This system enables librarians to broaden users profiles beyond the information that users have introduced by hand (such as institution, age and language). The system requires a context retrieval layer to capture user information and behavior, and an inference engine to support context inference from many information sources (selected documents and users' queries). Index terms Selective Dissemination of Information, Ontologies, Context aware, Digital Library MarisaRaquelDeGiustiiswithComisindeInvestigacionesCientficasdelaProvinciadeBuenosAires(CICPBA)andProyecto deEnlacedeBibliotecas(PrEBi).Email:marisa.degiusti@sedici.unlp.edu.ar GonzaloLujnVillarrealiswithConsejoNacionaldeInvestigacionesCientficasyTcnicas(CONICET)andProyectodeEnlacede Bibliotecas(PrEBi).Email:gonetil@sedici.unlp.edu.ar AgustnVosouiswithProyectodeEnlacedeBibliotecas(PrEBi).Email:agustinvosou@sedici.unlp.edu.ar JuanPabloMartneziswithProyectodeEnlacedeBibliotecas(PrEBi).Email:dajuam@sedici.unlp.edu.ar 1. INTRODUCTION NIVERSITIES, research centers and most teaching and researching institutions hold andproduceperiodicallyloadsofknowledge andinformation.Besidestheirownknowledge,these institutions are always looking for new means to accessdocumentsfromthesameareaskeptbackin otherresearchcenters.Thiswealthofknowledgemust beproperlyorganizedanddisseminatedinorderto maximize its use. Information and Communication Technologies(ICT)offereverydaynewandbetter tools for cataloging, storing, retrieving and distributing knowledge, and the current tendency seemstopointtoInstitutionalRepositories(IRs)or Digital Libraries (DLs), where information can be managedanduserscanaccesstoneworimproved services.Thecombinationoftheserepositorieswith mostcurrentlyactiveinitiativesforopenintellectual creationsharing suchasOpenAccessMovement (OA), Open Archives Initiative (OAI), Creative U Commons and Budapest Open Access Initiative (BOAI) results in the availability of millions of contentrecords.Onebigchallengeforrepositoriesis tofindwhichmechanismsarethemostsuitablefor obtaining,holdingandofferingtheserecordstotheir users. Incontrasttogeneralpurposesearchengines(suchas Google, Yahoo! and Bing), IRs usually offer high qualityscientificandacademicdocuments.DLsand IRs users researchers, professors and high level studentsarecharacterizedbytheirspecializationin one or more areas of knowledge, and by their permanentinterestforupdatedinformationinthose areas.UsersattentionarecapturedinDLsandIRsby differentservicesorientedtomakeiteasytopublish anddistributetheirwork,andtogainaccesstoother researchworksinsidethesamescope. The addition of new research works must be publicizedassoonaspossiblesointeresteduserscan takeadvantageofthem.Nowanewchallengearises fromthepastdefinition:howtoidentifywhichusers </li></ul><p> 2. mightbeinterestedinadeterminedwork(extended, of course, to thousands of users and millions of documents). Users'profilesarethemainsourcetofindoutwhich additions might be interested for them. A proper profilerepresentationwillenablethesystemtoobtain informationbeyondwhattheuserhasspecified.This way, it could be possible to apply inference mechanisms and comparisons against other users' profiles.Forthatreason,userprofilesmustbeseen like a complex feature rather than just a list of preferences;theremustbeconsideredawholecontext in which profiles exist and relate one to another. Alternativesincludekeyvaluerepresentations,object modelandontologies.Thisworkisfocusedontheuse ofontologiestorepresentusers profilesaspartofa context, because of its dynamic nature to extend profiles from and adapt profiles as the context changes.Advantagesandpossibilitiesgivenbythis representation are considered for a Selective Dissemination of Information (SDI) service in the Intellectual Creation and Dissemination Service (Servicio de Difusin de la Creacin Intelectual, SeDiCI), La Plata National University (UNLP) institutionalrepository. 2. SEDICI, UNLP INSTITUTIONAL REPOSITORY SeDiCIwasbornwiththepurposeofsocializethe knowledgegeneratedinallacademicareasofUNLP, aimingtogivebacktothecommunitytheeffortsput tothePublicUniversity.Thismainpurposecovers othersmorespecific,including: o toofferaservicefordigitaltheses, makingthempublictothewholelocaland internationalcommunityandgeneratinglinks amongresearchers; o to create a local culture about DL use,andtoencourageresearcherstoshare their work in a common space for all disciplines; o toincludeUNLPintootherexisting digitalresourcesnetworks. EventhoughSeDiCIwasfirstcreatedasathesesonly service,itwasalmostimmediatelyextendedtoallsort ofdigitaldocumentstosatisfyusers'needsfromthe differentSchools.Giventheheterogeneousnatureof UNLP,SeDiCIholdsavarietyofdocumentswhich includes scientific papers, pictures, musical documents, conference presentations, research projectsand,ofcourse,theses. 2.1 Inside SeDiCI: services for users SeDiCI users can be document authors, project directors,researchersorsimplyreaders.Alluserscan accessallexistingservicesinsideSeDiCI.Tomentiona fewofthem,userscancreatefoldersandputthere documentsselectedfromasearch.Therealsoexistan online chat, from which users can obtain SeDiCI administrators help on very specific information topics.Userscanalsosubscribetosearchesandthen automatically receive news about new documents added to the repository that match their query accordingtoafreetemporalscheme(every15days, everymonth,everyweek,etc);thiscanbeconsidered asainitialschemeforaSDIservice. 2.2 Outside SeDiCI: interfacing repositories around the world UNLP members and external SeDiCI users always needupdatedcontentsfrommultipledisciplines.To achievethispurpose,SeDiCIplaystheroleofOAI Service Provider over an increasing amount of externalrepositories,exceeding12millioninformation recordssofar.Ascounterpart,SeDiCIalsoplaysthe role ofData Provider, bywhich allworks created inside UNLP and stored/published by SeDiCI are offeredtoanyOAIServiceProvider. UNLPlibrariescanalsoaccessSeDiCIdocumentsvia webservices,andofferalargersetofresourcesto theirownusers,inacompletelytransparentfashion. 3. SDI IN DYNAMIC WEB ENVIRONMENTS InthefieldofDL,manysystemsforcontentbased recommendation have been designed, in which notifications are sent periodically or by request, informingusersaboutexistingresourcesaccordingto their interests. This kind of service, which try to satisfy highly specialized users with very specific needs,arebasedonapredefinedprofilecreatedinthe libraryforeachuser[1][2]. SDI is a process by which users express their informationneedseitherexplicitlyorinferredbythe system and then receive notifications through informationprovidersparticipatingintheSDI.Users profilesmaytakedifferentshapes,fromatextfree query,aSQLqueryorasetofrules. 3. ForasuccessfulSDIitisnecessaryasoliduserprofile configuration,consistingofaselectionoflanguages, documenttypes,publicationyears,countriesoforigin, authorsandanotificationmean.Besides,thesystem mustmakesurethateverytimeauserreceivesnews, he will quickly identify which recommendations correspondtowhichprofiles(giventhatusersmay havemorethanoneprofile). One of the biggest problemof SDI systems is the identificationofusersprofiles.Ingeneral,userscan specifycertainparameterssuchasPreferences,Areas of Interest and probably Subscription to Searches. Howeverthereismuchmorecontextualinformation which belongstotheuserprofile thatusersare notalwaysabletodefine,eitherbecauseofsystem limitations, complexity of the information itself, or simplybecauseusersdonotalwaysrecognizealltheir needs.Consequently itseems evident thatsystems must be extended to improve or optimize users profiles, capturing information from the context, inferring data from each profile and identifying opportunitiesbeyondusersexplicitpreferences. 4. CONTEXT-AWARE DIGITAL LIBRARIES 4.1 The Context Thereexistmanydefinitionsforcontext;mostofthem havesimilarcharacteristicsforthescopeofthiswork. The concept of context has been studied by philosophers,psychologists, linguisticsandrecently engineers.Fromthecomputersciencepointofview, contexthasbeendefinedasformal,abstractandfirst class free of representation citizens in Logic and ArtificialIntelligence;asroutinesoverasetofentities inProgrammingLanguages;andassortedsetofpairs withsomeoperationsamongtheminSystems[3]. RetoKrummenacheryThomasStrang[4]havedefined context as any information that can be used to characterizethesituationofanentity.Anentitycanbe aperson,aplaceoranobjectconsideredrelevantfor theinteractionbetweentheuserandtheapplication, includingtheuserandtheapplicationitself. InthepaperContextAwareRetrievalinWebBased Collaborations[5], authors define context as any information used to define the user environment, and they highlight the difference between Current Context(CC)andotherusercontexts. Ingeneral,contextdefinitionsincludeboththeuser and the information associated directly with each user,whichistheuserprofile.Thisisnottoodifferent inthefieldofdigitallibraries,sincethecontextofthe user is made up of a set of areas of interest (or researchareas),the user work groupmembers,all resources selectedor downloadedbythe user and evenallqueriesmadetothesystem. 4.2 Context representation through Ontologies Thereexistmanywaystorepresentthecontext: o viaanObjectModel; o usingamarkuplanguage; o withasetofkeyvaluepairs; o basedonlogic; o usinggraphics. Traditionally, the model of the context is created followingatopdownmechanism:firsttheapplication anditsfunctionalityisdefined,andthenthenecessary ontologiesforthecontextaredeveloped.Ontologies are commonly used to formalize taxonomies that representtypesandvaluesofsimpleproperties.But thereismorebehindtheontologybasedmodeling.A generic and reusable ontology will have a direct impact in the interoperability of contextaware systems,andthereforewillhaveadirectinfluenceon the speed to create, implement and integrate new applications.Awelldesignedmodelisakeyfactorto access the contextas well as to adapt tochanges, whichisverycommonindynamicsystems. IntheworkAContextModeling Survey[6], authors identify6mainrequirementsthatanycontextmodel appliedtoubiquitoussystemsmustaccomplish: o DistributedComposition:ubiquitoussystems derive directly from distributed systems. The compositionandadministration ofthe context modelanditsdatahasacleardynamismthat variesintermsoftime,networktopologyand sourceofinformation. o PartialValidation:itisdesirabletovalidate contextualknowledge bothinthestructure as wellasintheinstancelevel,evenifthemodelis not located in one single node because of its distributed nature. Given the complexity of interrelations in the context, it turns very importanttovalidateit. o Richness and quality of information: informationsourcestocharacterizeentitiesare 4. very different, but this should not affect the qualityoftheinformation. o Lack of complexity and ambiguity: in particular,iftheinformationisretrievedfrom sensorsorinferredfromotherinformationsets. The model should be able to interpolate incompletedata,onlyifthisinterpolationdoes notaffectdataquality. o Level of formality: to describe facts and relationshipsinapreciseandtraceableway. o Applicability to existing environments: a contextmodelmustallowitsuseinpreexisting computingenvironmentinfrastructure. Ontologiesareapowerfultooltospecifyconceptsand relationships. They provide formalizations to map reallife entities to computerenabled data constructions. In this sense, ontologies provide a uniformmethodologyforspecifyingmodelconcepts, subconcepts,relationships,propertiesandfacts,and all this provide the basis for context knowledge sharing and information reusing. With this information, computer applications can determine contextual compatibility, compare contextual facts, infernewfactsandevennewcontexts.Thepossibility toinfernewcontextresultsparticularlyinterestingfor thelackofcompletenessofthecontext. Althoughtheuseofontologieshasmanyadvantages, itisnotalwayseasytodifferentiatethemfromusing other methodologies. Object Oriented Models also provideclasshierarchy,andthereforetheypermitat leastalimitedformalizationofinstancesandclasses dependencymodels.Inadditiontothat,OOmodels justlikeontologybasedmodelspermittoachievethe distributed composition requirement; partial validationinOOmodelsisalsopossible,typically using a compiler at the structure level and an executionenvironmentattheinstancelevel[6].Thusit isnecessarytoanalyzetheneedsofthecomputing applicationsinvolvedineachcase.Conclusionslead toassertthatanimproveduserexperienceisgenerally basedondataprovidingfromsensorsanddifferent informationsources.Thismakesapplicationsstrongly boundtothecontextrequiretocopewithmoreand more heterogeneous data (which also counts for ambiguity, quality and contextual data validation problems).OOmodelsrequire,for instance,a low levelimplementationoftheserelationshipstoachieve interoperabilityandthereforetheynotalwaysresult properforknowledgesharinginopenanddynamic environments[7]. Thereexistmanyalternativestorepresentthecontext throughontologies.In [8]authors findnecessaryto normalize and combine knowledge coming from different domains, and they propose a highly normalizedandformalontologybasemodel.Inthe year 2003 the language CoOL (Context Ontology Language, [9]) was introduced. This language, derivedfromASC(AspectScaleContext)model,can beusedtoenablecontextsensibilityandcontextual interoperability during service discovering and execution.Theproposedarchitectureisdistributed, andhasamongitselementacorewithareasoner,able toinferconclusionsaboutthecontextbasedonan ontology defined with CoOL. This ability to infer information from preexisting data is particularly interestingforthescopeofthiswork,aswillbeseen below. FollowingasimilarlineisCONON(Wangetal.[9]). EventhoughtheideaisbasicallythesameasCoOL (knowledgereuseability,logicinference,knowledge s...</p>