IEEE Meta-Data Conference 1999 keynote, Amit Sheth

DESCRIPTION

Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," keynote given at the IEEE Meta-Data Conference, Bethesda, MD, April 6, 1999.

TRANSCRIPT

1. Bethesda, Maryland, April 6, 1999
Amit Sheth
Large Scale Distributed Information Systems Lab, University of Georgia
http://lsdis.cs.uga.edu

2. Three perspectives on GlobIS (Global Information Systems)
- Information Integration perspective: distribution, autonomy, heterogeneity (terminological, semantic, contextual)
- Information Brokering perspective: data, meta-data, information, knowledge
- Vision perspective: connectivity, computing, data

3. Evolving targets and approaches in integrating data and information (a personal perspective)
A society for ubiquitous exchange of (tradeable) information in all digital forms of representation; information anywhere, anytime, in any form.
- Generation III (1997...): ADEPT, DL-II projects, InfoQuilt
- Generation II (1990s): InfoSleuth, KMed, DL-I projects, VisualHarness, Infoscopes, HERMES, SIMS, InfoHarness, Garlic, TSIMMIS, Harvest, RUFUS, ...
- Generation I (1980s): Mermaid, Multibase, MRDSM, ADDS, DDTS, IISS, Omnibase, ...

4. Generation I
- Data recognized as a corporate resource: leverage it!
- Data predominantly in structured databases with different data models, transitioning from network and hierarchical to relational DBMSs
- Heterogeneity (system, modeling and schematic) as well as the need to support autonomy posed the main challenges; the major issues were data access and connectivity
- Information integration through a federated architecture
- Support for corporate IS applications as the primary objective; updates often required, data integrity important

5. Generation I (heterogeneity in FDBMSs)
Levels of heterogeneity, with communication spanning all of them:
- Database system (1980s): semantic heterogeneity; differences in DBMSs: data models (abstractions, constraints, query languages); system-level support (concurrency control, commit, recovery)
- Operating system (1970s): file system; naming, file types, operations; transaction support; IPC
- Hardware/system: instruction set; data representation/coding; configuration

6. Generation I (Federated Database Systems: schema architecture)
- External Schema ... External Schema
- Federated Schema (produced by schema integration)
- Export Schema ... Export Schema
- Component Schema ... Component Schema (common/canonical data model; schema translation handles model heterogeneity)
- Local Schema ... Local Schema
- Component DBS ... Component DBS
Dimensions for interoperability and integration: distribution, autonomy and heterogeneity. Goal: information sharing while preserving autonomy.

7. Generation I (characterization of schematic conflicts in multidatabase systems)
- Domain Definition Incompatibility: Naming Conflicts, Data Representation Conflicts, Data Scaling Conflicts, Data Precision Conflicts, Default Value Conflicts, Attribute Integrity Constraint Conflicts
- Entity Definition Incompatibility: Database Identifier Conflicts, Naming Conflicts, Schema Isomorphism Conflicts, Missing Data Items Conflicts
- Data Value Incompatibility: Known Inconsistency, Temporal Inconsistency, Acceptable Inconsistency
- Abstraction Level Incompatibility: Generalization Conflicts, Aggregation Conflicts
- Schematic Discrepancies: Data Value Attribute Conflicts, Entity Attribute Conflicts, Data Value Entity Conflicts
(Sheth & Kashyap; Kim & Seo)
BUT these techniques for dealing with schematic heterogeneity do not directly map to dealing with a much larger variety of heterogeneous media.

8. Generation II
- Significant improvements in computing and connectivity (standardization of protocols, public network, Internet/Web); remote data access taken as given
- Increasing diversity in data formats, with a focus on a variety of textual data and semi-structured documents
- Many more data sources and heterogeneous information sources, but not necessarily a better understanding of the data
- Use of data beyond traditional business applications: mining + warehousing, marketing, e-commerce
- Web search engines for keyword-based querying against HTML pages; attribute-based querying available in a few search systems
- Use of metadata for information access; early work on ontology support; distribution applied to metadata in some cases
- Mediator architecture for information management

9. Generation II (limited types of metadata, extractors, mappers, wrappers)
- Sources: Nexis, UPI, AP, ...; documents; data stores; global/enterprise Web repositories; digital maps; digital images; digital audio; digital videos; ...
- Extractors turn these sources into metadata
- Example request: "Find Marketing Manager positions in a company that is within 15 miles of San Francisco and whose stock price has been growing at a rate of at least 25% per year over the last three years" (Junglee, SIGMOD Record, Dec. 1997)

10. Generation II (a metadata classification: the information pyramid)
From the user (top) down to the data (bottom):
- Ontologies, classifications, domain models
- Domain-specific metadata: area, population (Census), land-cover, relief (GIS), concept descriptions from ontologies
- Domain-independent (structural) metadata: C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure, ...
- Direct content-based metadata: inverted lists, document vectors, WAIS, Glimpse, LSI
- Content-dependent metadata: size, max colors, rows, columns, ...
- Content-independent metadata: creation-date, location, type-of-sensor, ...
- Data (heterogeneous types/media)
Metadata standards: general purpose (Dublin Core, MCF); domain/industry specific: geographic (FGDC, UDK, ...), library (MARC, ...).
Move in this direction (toward the user) to tackle information overload!!

11. VisualHarness: an example

12. What's next (after comprehensive use of metadata)?
Query processing and information requests:
- NOW: traditional queries based on keywords; attribute-based queries; content-based queries
- NEXT: high-level information requests involving ontology-based, iconic, mixed-media, and media-independent information requests; user-selected ontology; use of profiles
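
The naming and data scaling conflicts of slide 7, and the per-component schema translation in slide 6's federated architecture, can be illustrated with a small Python sketch. Both component databases, their attribute names (`emp_name`, `annual_salary_usd`, `name`, `monthly_pay_kusd`) and the federated attribute names are hypothetical; the sketch only shows each export schema carrying its own mapping rules (renaming for naming conflicts, unit conversion for scaling conflicts) before rows are merged into one federated view.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical federated schema: every integrated row has exactly these attributes.
FEDERATED_ATTRIBUTES = ("employee", "salary_usd_per_year")


@dataclass
class ExportMapping:
    """Translates rows of one component database into the federated schema."""
    rename: Dict[str, str]  # federated attribute -> component attribute (naming conflicts)
    convert: Dict[str, Callable[[float], float]] = field(default_factory=dict)  # scaling conflicts

    def to_federated(self, row: Dict[str, object]) -> Dict[str, object]:
        out: Dict[str, object] = {}
        for attr in FEDERATED_ATTRIBUTES:
            value = row[self.rename[attr]]
            if attr in self.convert:
                value = self.convert[attr](value)
            out[attr] = value
        return out


# Component DB A (hypothetical): same units as the federated schema, different names.
mapping_a = ExportMapping(rename={"employee": "emp_name",
                                  "salary_usd_per_year": "annual_salary_usd"})

# Component DB B (hypothetical): pay stored in thousands of USD per month.
mapping_b = ExportMapping(
    rename={"employee": "name", "salary_usd_per_year": "monthly_pay_kusd"},
    convert={"salary_usd_per_year": lambda k_per_month: k_per_month * 1000 * 12},
)


def federated_view(sources: List[Tuple[ExportMapping, List[dict]]]) -> List[dict]:
    """Union of all component rows after translation into the federated schema."""
    return [mapping.to_federated(row) for mapping, rows in sources for row in rows]


db_a = [{"emp_name": "Lee", "annual_salary_usd": 60000}]
db_b = [{"name": "Kim", "monthly_pay_kusd": 5.5}]

for record in federated_view([(mapping_a, db_a), (mapping_b, db_b)]):
    print(record)
```

Keeping the rules with each export schema, rather than in one global converter, fits the slides' emphasis on autonomy: if a component changes its local representation, only its own mapping needs to change.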

13. GIS data representation example
Multiple heterogeneous metadata models use different tag names for the same data in the same GIS domain (Kansas State data, FGDC vs. UDK metadata models):
- Theme keywords (FGDC) / Search terms (UDK): digital line graph, hydrography, transportation, ...
- Title (FGDC) / Topic (UDK): Dakota Aquifer
- Online linkage (FGDC) / Address Id (UDK): http://gisdasc.kgs.ukans.edu/dasc/
- Direct Spatial Reference Method (FGDC) / Measuring Techniques (UDK): Vector
- Horizontal Coordinate System Definition (FGDC) / Co-ordinate System (UDK): Universal Transverse Mercator
- ...

14. Generation III
- Increasing information overload and a broader variety of information content (video content, audio clips, etc.), with an increasing amount of visual information and scientific/engineering data
- Continued standardization related to the Web for representational and metadata issues (MCF, RDF, XML)
- Changes in Web architecture; distributed computing (CORBA, Java)
- Users demand simplicity, but complexities continue to rise
- The Web is no longer just another information source; it supports decision making through data mining and information discovery, information fusion, information dissemination, knowledge creation and management, and information management complemented by cooperation between the information system and humans
- Information Brokering Architecture proposed for information management
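
Slide 13's point is that the same GIS data is described under different tag names. Below is a minimal Python sketch of a tag crosswalk; the tag pairs are the ones listed in the slide, while the `crosswalk` function, the choice to report (rather than drop) unmapped tags, and the `"Hypothetical Extra Tag"` entry are assumptions for illustration, not part of any FGDC or UDK tooling.

```python
from typing import Dict, List, Tuple

# FGDC tag -> UDK tag, as paired in the Kansas "Dakota Aquifer" example (slide 13).
FGDC_TO_UDK: Dict[str, str] = {
    "Theme keywords": "Search terms",
    "Title": "Topic",
    "Online linkage": "Address Id",
    "Direct Spatial Reference Method": "Measuring Techniques",
    "Horizontal Coordinate System Definition": "Co-ordinate System",
}
# The same correspondences read in the other direction.
UDK_TO_FGDC: Dict[str, str] = {udk: fgdc for fgdc, udk in FGDC_TO_UDK.items()}


def crosswalk(record: Dict[str, str],
              mapping: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Rename tags using `mapping`; tags with no known counterpart are returned
    separately instead of being dropped silently."""
    translated: Dict[str, str] = {}
    unmapped: List[str] = []
    for tag, value in record.items():
        if tag in mapping:
            translated[mapping[tag]] = value
        else:
            unmapped.append(tag)
    return translated, unmapped


fgdc_record = {
    "Title": "Dakota Aquifer",
    "Online linkage": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Direct Spatial Reference Method": "Vector",
    "Hypothetical Extra Tag": "not covered by the slide's mapping",
}
udk_record, leftovers = crosswalk(fgdc_record, FGDC_TO_UDK)
print(udk_record["Topic"])  # Dakota Aquifer
print(leftovers)            # ['Hypothetical Extra Tag']
```

A flat table like this only handles one-to-one renamings; tags whose meaning differs between the two models would need the ontology-level treatment the later slides argue for.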

15. Information Brokering: an enabler for the Infocosm
- Information consumers: people, programs, corporations, universities, government; they submit user requests and queries
- Information brokering: arbitration between information consumers and providers for resolving information impedance; dynamic reinterpretation of information requests for determination of relevant information services and products; dynamic creation and composition of information products
- Information providers: information systems and data repositories of newswires, corporations, universities, research labs
- Information/data overload

16. Information Brokering: three dimensions
Diagram axes: consumers / brokers / providers; vocabulary / metadata / data; semantics / structure / syntax / system.
Objective: reduce the problem of knowing the structure and semantics of data in the huge number of information sources on a global scale to understanding and navigating a significantly smaller number of domain ontologies.

17. What else can Information Brokering do?
WWW vs. WWW + Information Brokering:
- A confusing heterogeneity of media and formats (Tower of Babel) vs. domain-specific ontologies as semantic conceptual views
- Information correlation using physical (HREF) links at the extensional data level vs. information correlation using concept mappings at the intensional concept level
- Location-dependent browsing of information using physical (HREF) links vs. browsing of information using terminological relationships across ontologies
- User has to keep track of information content!! vs. a higher level of abstraction, closer to the user's view of information!!

18. Concepts, tools and techniques to support semantics
Context; semantic proximity; inter-ontological relations; media-independent information correlations; ontologies (esp. domain-specific); profiles; domain-specific metadata.
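
Slides 15 to 17 describe a broker that reinterprets a request phrased in domain-ontology terms and routes it to providers in their own vocabularies. The Python sketch below assumes a hypothetical registry in which each provider advertises which ontology concepts it covers and under which local attribute names; the provider names echo the sources in slide 23, but the registry layout, attribute names and functions are invented for illustration.

```python
from typing import Dict, Set

# Hypothetical provider registry: for each provider, the domain-ontology concepts it
# covers and the local attribute name it uses for each one.
PROVIDERS: Dict[str, Dict[str, str]] = {
    "census_db": {"population": "POP_TOTAL", "area": "LAND_AREA"},
    "tiger_line_db": {"boundary": "BLOCK_POLYGON"},
    "usgs_images": {"land-cover": "lc_class", "relief": "relief_class"},
}


def reinterpret(request: Set[str]) -> Dict[str, Dict[str, str]]:
    """Split a request phrased in ontology concepts into per-provider sub-requests,
    each phrased in that provider's own vocabulary."""
    plan: Dict[str, Dict[str, str]] = {}
    for provider, vocabulary in PROVIDERS.items():
        covered = {concept: vocabulary[concept] for concept in request if concept in vocabulary}
        if covered:
            plan[provider] = covered
    return plan


def unanswerable(request: Set[str]) -> Set[str]:
    """Concepts that no registered provider covers."""
    known = set().union(*PROVIDERS.values())  # iterating a dict yields its keys
    return request - known


print(reinterpret({"population", "relief"}))
# {'census_db': {'population': 'POP_TOTAL'}, 'usgs_images': {'relief': 'relief_class'}}
print(unanswerable({"population", "stock-price"}))  # {'stock-price'}
```

A real broker would consult ontologies and inter-ontology relationships rather than a flat table; the sketch only shows consumers working with one small shared vocabulary while the per-provider vocabularies stay behind the broker.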

19. Tools to support semantics
- Context, context, context
- Media-independent information correlations
- Multiple ontologies
- Semantic proximity (relationships between concepts within and across ontologies) using domain, context, modeling/abstraction/representation, state
- Characterizing the loss of information incurred due to differences in vocabulary
BIG challenge: identifying relationships or similarity between objects of different media, developed and managed by different persons and systems

20. Heterogeneity... is a Tower of Babel!!
Semantic heterogeneity → (metadata, ontologies, contexts) → semantic interoperability

21. The InfoQuilt Project
The InfoQuilt vision:
- Semantic interoperability between systems, sharing knowledge using multiple ontologies
- Logical correlation of information
- Media-independent information processing
Realization of the vision:
- Fully distributed, adaptable, agent-based system
- Information/knowledge management supported by collaborative processes
http://lsdis.cs.uga.edu/proj/iq/iq.html

22. InfoQuilt Project: using the Metadata REFerence link (MREF)
- MREF complements HREF, creating a logical web through media-independent, ontology- and metadata-based correlation
- It is a description of the information asset we want to retrieve
Semantic correlation using MREF:
- MREF concept: constraints, relations, attributes
- Model for logical correlation using ontological terms and metadata: IQ_Asset ontology + extension ontologies; domain ontologies
- Framework for representing MREFs: RDF
- MREF serialization (one implementation choice): XML
- Keywords; content attributes (color, scene cuts, ...)
http://lsdis.cs.uga.edu/proj/iq/iq.html

23. Domain-specific correlation example
Potential locations for a future shopping mall are identified by all regions having a population greater than 5000 and an area greater than 50 sq. ft., with urban land cover and moderate relief.
(… 5000; area > 50; region-type = block; land-cover = urban; relief = moderate) can be viewed here
- Domain-specific metadata: terms chosen from domain-specific ontologies (population, area, boundaries, regions, land cover, relief)
- ⇒ media-independent relationships between domain-specific metadata: population, area, land cover, relief
- ⇒ correlation between image and structured data at a higher, domain-specific level, as opposed to physical link-chasing in the WWW
- Sources: Census DB, TIGER/Line DB, US Geological Survey; structured data accessed via SQL, image features via image-processing routines

24. Domain-specific correlation example

25. A DL II approach for Information Brokering
Layers, from top to bottom:
- Iscape 1 ... Iscape N
- Constructing appropriate information landscapes
- Constructing additional meta-information resources
- Discovering collections of heterogeneous information and meta-information resources
- Domain-specific ontologies; domain-independent ontologies
- Images, data stores, documents, digital media
- Physical/simulation world

26. ADEPT Information Landscape Concept Prototype (a scenario for Digital Earth: learning in the context of the El Niño phenomenon)
Sample Iscape requests:
- How does El Niño affect sea animals? Look for broadcast videos of less than 2 minutes.
- How are some regions affected by El Niño? Look at East/West Pacific regions.
- What disasters have been related to El Niño?
- What storm occurrences are attributed to El Niño?
- Show reports related to El Niño that contain "Clinton".
Requests use keywords, domain-specific attributes and domain-independent attributes.
Try the Iscape concept demo.

27. Putting MREFs to work
Components: IQ_Asset ontology + extension ontologies; domain ontologies; MREF Builder (constructs new MREFs); MREF repository; MREF User; User Agent; User Profile Manager; Broker Agent; profiles.

28. Context: the lynchpin of semantics
Cricket
"For instance, if you were to use Yahoo! or Infoseek to search the web for pizza, your results would probably be hundreds of matches for the word pizza. Many of these could be pizza parlors around the world. Yet if you run the same search within NeighborNet, you will … allows you to order pizza to be delivered instead of shipped."
From a press release of FutureOne, Inc., March 24, 1999: http://home.futureone.com/about/pr/021699.asp

29. Constructing c-contexts from ontological terms
- C-context: a collection of contextual coordinates Ci (roles) and values Vi (concepts/concept descriptions)
- Database objects: DOC(Id, Title, Agency) and AGENCY(RegNo, Name, Affiliation)
- Additional knowledge: all documents stored in the database have been published by some agency ⇒ Cdef(DOC), expressed with the ontological terms DocumentConcept and AgencyConcept
Advantages:
- Use of ontologies for an intensional, domain-specific description of data
- Representation of extra information: relationships between objects not represented in the database schema
- Using terminological relationships in the ontology

30. Using c-contexts to reason about information in the database
Example: Cdef(DOC) and a query context CQ
- Reasoning with c-contexts: glb(Cdef(DOC), CQ)
- Ontological inferences: DocumentConcept, (hasOrganization, {USGS})
Challenge 1: use of multiple ontologies
Challenge 2: estimating the loss of information

31. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system
OBSERVER architecture:
- User node: user, query processor, ontology server, ontologies, mappings, data repositories
- IRM node: IRM, holding the interontology terminological relationships
- Component nodes: ontology server, ontologies, mappings, query processor, data repositories
Eduardo Mena (III98)
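
Slide 23's shopping-mall request can be sketched in Python as a join of two metadata sources on a shared region id, followed by a filter with the slide's constraints. The region ids and attribute values are invented; only the constraint values come from the slide. The shared region id stands in for the domain-level correlation the slide describes; how a real system would align Census, TIGER/Line and USGS regions is not shown.

```python
import operator
from typing import Dict, List

# Hypothetical extracted metadata keyed by region id.
# Structured attributes, e.g. from a Census-style database queried with SQL:
structured = {
    "R1": {"region-type": "block", "population": 7200, "area": 80},
    "R2": {"region-type": "block", "population": 3100, "area": 120},
    "R3": {"region-type": "block", "population": 9000, "area": 65},
}
# Attributes derived from imagery, e.g. by image-processing routines:
image_derived = {
    "R1": {"land-cover": "urban", "relief": "moderate"},
    "R2": {"land-cover": "urban", "relief": "moderate"},
    "R3": {"land-cover": "forest", "relief": "moderate"},
}

# The constraints listed in slide 23, expressed over domain-specific metadata terms.
CONSTRAINTS = [
    ("population", operator.gt, 5000),
    ("area", operator.gt, 50),
    ("region-type", operator.eq, "block"),
    ("land-cover", operator.eq, "urban"),
    ("relief", operator.eq, "moderate"),
]


def correlate(structured: Dict[str, dict], image_derived: Dict[str, dict]) -> List[str]:
    """Join the two sources on region id, then keep regions meeting every constraint."""
    matches = []
    for region in sorted(structured.keys() & image_derived.keys()):
        merged = {**structured[region], **image_derived[region]}
        if all(op(merged[attr], value) for attr, op, value in CONSTRAINTS):
            matches.append(region)
    return matches


print(correlate(structured, image_derived))  # ['R1']
```

Writing the constraints as data rather than hard-coded conditions keeps them close to the declarative form shown on the slide, so a different request only changes the `CONSTRAINTS` list.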

32. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system
Query construction: example
- Request: get title and number of pages of books written by Carl Sagan
- User ontology: WN
  [name pages] for (AND book (FILLS creator Carl Sagan))
- Target ontology: Stanford-I
- Integrated ontology: WN-Stanford-I
  [title number-of-pages] for (AND book (FILLS doc-author-name Carl Sagan))
- Ontology sites: http://www.cogsci.princeton.edu/~wn/w3wn.html and http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/

33. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system
Query construction example: re-use of knowledge, Bibliography Data ontology (Stanford-I)
- Biblio-Thing: Document, Conference, Agent
- Agent: Person, Organization; Person: Author; Organization: Publisher, University
- Document: Book, Technical-Report, Miscellaneous-Publication, Proceedings, Thesis, Periodical-Publication, Technical-Manual, Cartographic-Map, Computer-Program, Multimedia-Document, Artwork
- Book: Edited-Book; Thesis: Doctoral-Thesis, Master-Thesis; Periodical-Publication: Journal, Newspaper, Magazine
Query and ontology sites as in slide 32.
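
The integrated query in slide 32 is obtained from the user's WN query by replacing each term with its Stanford-I counterpart. A minimal Python sketch of that term-by-term rewriting follows; the mapping table lists only the correspondences visible in the slide, and the tuple-based query representation and the `translate` function are assumptions for illustration, not OBSERVER's interfaces.

```python
from typing import Dict, List, Tuple

# WN -> Stanford-I term correspondences used in the slide 32 example.
WN_TO_STANFORD_I: Dict[str, str] = {
    "name": "title",
    "pages": "number-of-pages",
    "creator": "doc-author-name",
    "book": "book",
}


def translate(projection: List[str], concept: str,
              fillers: List[Tuple[str, str]], mapping: Dict[str, str]) -> str:
    """Rewrite '[attrs] for (AND concept (FILLS role value) ...)' term by term.

    A term with no entry in `mapping` raises KeyError: that is where a real system
    would have to substitute broader or narrower terms and possibly lose information.
    """
    attrs = " ".join(mapping[a] for a in projection)
    parts = [mapping[concept]] + [f"(FILLS {mapping[role]} {value})" for role, value in fillers]
    return f"[{attrs}] for (AND {' '.join(parts)})"


user_query = (["name", "pages"], "book", [("creator", "Carl Sagan")])
print(translate(*user_query, WN_TO_STANFORD_I))
# [title number-of-pages] for (AND book (FILLS doc-author-name Carl Sagan))
```

Failing loudly on an unmapped term, instead of passing it through unchanged, makes the point at which information loss can arise explicit.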

34. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system
Query construction example: re-use of knowledge, a subset of WordNet 1.5
Concepts shown: Print-Media, Journalism, Press, Publication, Periodical, Newspaper, Magazine, Journals, Pictorial, Series, Book, Trade-Book, Brochure, TextBook, SongBook, Reference-Book, PrayerBook, CookBook, Encyclopedia, WordBook, Instruction-Book, HandBook, Directory, Annual, GuideBook, Manual, Bible, Instructions, Reference-Manual
Query and ontology sites as in slide 32.

35. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system
WN ontology and user query: query construction example as in slide 32.
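
When a user term has no synonym in the target ontology, one option is to substitute a broader term (as Plan 1 in slide 36 does with "document") or a union of narrower terms (as Plan 4 does with a UNION). The Python sketch below computes both kinds of candidates over a small is-a fragment. It uses concept names from the Stanford-I slide, but the parent/child links are an assumed reading of the diagram, and the functions are a generic illustration rather than OBSERVER's plan generator.

```python
from typing import Dict, List, Set

# Illustrative is-a fragment (child -> parent) using Stanford-I concept names;
# the exact links in the original diagram are an assumption here.
PARENT: Dict[str, str] = {
    "document": "biblio-thing",
    "thesis": "document",
    "doctoral-thesis": "thesis",
    "master-thesis": "thesis",
    "periodical-publication": "document",
    "journal": "periodical-publication",
    "newspaper": "periodical-publication",
    "magazine": "periodical-publication",
}


def hypernyms(term: str) -> List[str]:
    """Broader terms, nearest first. Substituting one returns a superset of the
    requested objects, so precision may drop while recall is preserved."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def hyponyms(term: str) -> Set[str]:
    """All narrower terms. A UNION of them returns a subset of the requested objects,
    so recall may drop unless the narrower terms fully cover the broader one."""
    children = {child for child, parent in PARENT.items() if parent == term}
    result = set(children)
    for child in children:
        result |= hyponyms(child)
    return result


print(hypernyms("journal"))        # ['periodical-publication', 'document', 'biblio-thing']
print(sorted(hyponyms("thesis")))  # ['doctoral-thesis', 'master-thesis']
```

Choosing among such candidates is what the loss estimates in the following slides are for: each substitution trades precision against recall in a different way.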

36. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system
Estimating the loss of information:
- To choose the plan with the least loss
- To present a level of confidence in the answer
- Based on intensional information (terminological difference)
- Based on extensional information (precision and recall)
Plans in the example:
- User query: (AND book (FILLS doc-author-name Carl Sagan))
- Plan 1: (AND document (FILLS doc-author-name Carl Sagan))
- Plan 2: (AND periodical-publication (FILLS doc-author-name Carl Sagan))
- Plan 3: (AND journal (FILLS doc-author-name Carl Sagan))
- Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report) (FILLS doc-author-name Carl Sagan))

37. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system
Loss of information based on intensional information:
- User query: (AND book (FILLS doc-author-name Carl Sagan))
- Plan 1: (AND document (FILLS doc-author-name Carl Sagan))
- book := (AND publication (AT-LEAST 1 ISBN))
- publication := (AND document (AT-LEAST 1 place-of-publication))
Loss: instead of books written by Carl Sagan, OBSERVER provides all the documents written by Carl Sagan (even those without an ISBN and a place of publication).

38. Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system
Example: loss for the plans
- Plan 1: (AND document (FILLS doc-author-name Carl Sagan)) [case 2]: 91.57% < (1-Loss) < 91.75%
- Plan 2: (AND periodical-publication (FILLS doc-author-name Carl Sagan)) [case 3]: 94.03% < (1-Loss) < 100%
- Plan 3: (AND journal (FILLS doc-author-name Carl Sagan)) [case 3]: 98.56% < (1-Loss) < 100%
- Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report) (FILLS doc-author-name Carl Sagan)) [case 1]: 0% < (1-Loss) < 7.22%

39. Summary
From the system level (bottom) to the knowledge level (top):
- System → Syntax, Data → Schematic → Structural, Metadata → Semantic → Knowledge
- Structured databases → Text → Semi-structured → Visual, Scientific/Eng.
- Federated DB → Federated IS → Mediator → Information Brokering, Cooperative IS → Knowledge Mgmt.

40. Agenda for research
- Interoperation not at the systems level, but at the informational and possibly the knowledge level
  - traditional database and information retrieval solutions do not suffice
  - need to understand context; measures of similarity
- Need to increase the impetus on semantic-level issues involving terminological and contextual differences, and possibly perceptual or cognitive differences
- In future, information systems and humans need to cooperate, possibly involving coordination and collaborative processes

41. Related Reading
Books:
- Information Brokering for Digital Media, Kashyap and Sheth, Kluwer, 1999 (to appear)
- Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media, Sheth and Klas (Eds.), McGraw-Hill, 1998
- Cooperative Information Systems, Papazoglou and Schlageter (Eds.), Academic Press, 1998
- Management of Heterogeneous and Autonomous Database Systems, Elmagarmid, Rusinkiewicz and Sheth (Eds.), Morgan Kaufmann, 1998
Special issues and proceedings:
- Formal Ontology in Information Systems, Guarino (Ed.), IOS Press, 1998
- Semantic Interoperability in Global Information Systems, Ouksel and Sheth, SIGMOD Record, March 1999
http://lsdis.cs.uga.edu
Acknowledgements: Vipul Kashyap, Tarcisio Lima [see publications on Metadata, Semantics, Context, InfoHarness/InfoQuilt]
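
Slide 36 names precision and recall as the extensional basis for estimating the loss of a plan. The Python sketch below assumes the full extensions are known and combines precision and recall with van Rijsbergen's E measure. That formula, the `alpha` weight and the toy extensions (`b1` to `b4`, `d1`, `d2`) are assumptions chosen for illustration; this is not how the bounds in slide 38 were derived.

```python
from typing import Set, Tuple


def precision_recall(wanted: Set[str], retrieved: Set[str]) -> Tuple[float, float]:
    """Precision and recall of a plan's answer with respect to the answer the user wanted."""
    hits = len(wanted & retrieved)
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(wanted) if wanted else 1.0
    return precision, recall


def loss(wanted: Set[str], retrieved: Set[str], alpha: float = 0.5) -> float:
    """Combine precision P and recall R as 1 - 1/(alpha/P + (1 - alpha)/R),
    i.e. van Rijsbergen's E measure; 1.0 means total loss."""
    p, r = precision_recall(wanted, retrieved)
    if p == 0 or r == 0:
        return 1.0
    return 1 - 1 / (alpha / p + (1 - alpha) / r)


# Hypothetical extensions: b1..b4 are the books by the author; d1, d2 are other documents.
books = {"b1", "b2", "b3", "b4"}
plans = {
    "broader term (superset)": books | {"d1", "d2"},
    "narrower terms (subset)": {"b1", "b2"},
}
for name, answer in plans.items():
    print(name, round(loss(books, answer), 3))
best = min(plans, key=lambda name: loss(books, plans[name]))
print("least loss:", best)
```

With `alpha = 0.5` precision and recall are weighted equally; raising `alpha` penalizes extra, unwanted answers (such as non-book documents) more heavily than missing ones.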