Achieving Semantic Interoperability at the World Bank: Designing the Information Architecture and Programmatically Processing Information

Denise Bedford
June 28, 2005

TRANSCRIPT

Page 1:

Achieving Semantic Interoperability at the World Bank
Designing the Information Architecture and Programmatically Processing Information

Denise Bedford
June 28, 2005

Page 2:

Presentation Overview
► What semantic interoperability means to us
  - Multiple dimensions – functional dimension, domain dimension
  - Multiple challenges – semantics, syntax, data structure
► Strategy for achieving semantic interoperability (SI)
  - Ensure that the tools and technologies are used in the most effective and efficient way
  - Ensure that domain experts are involved in the work
  - Maintain an open and highly flexible information environment
  - Build information quality (IQ) standards into reference structures which can be leveraged across many applications and all kinds of content
► Practice and current status of work

Page 3:

Functional Dimension
► Higher level information access functionality
  - Recommender systems, content syndication, personalization, visualization, etc. focused on user interests and content – regardless of the system context
► Enterprise search and browse
  - Across systems, across all types of content, across languages, functional requirements for enterprise search
► Enterprise publishing from any system to any other system
  - Without redundancy of content, maintaining respect for intellectual attribution, records management compliance, disclosure compliance, …
► Enterprise content creation & management
  - Implementing semantic interoperability and information quality standards at the point when an object is born digital and throughout its life cycle
► Publishing and sharing our domain-based semantic networks with others working in the same domain to support collaboration

Page 4:

Strategy
► Begin with the conceptual modeling task (Erwin models)
► Work at the attribute level (attribute reference maps, specifications)
► Identify and reconcile the syntax problems (among the biggest challenges initially)
► Address the semantic problems from a master and reference data store perspective
► Build the enterprise metadata repository and enterprise search system
► Establish governance processes at the attribute level – each attribute has a different type of behavior, steward – governance follows behavior

Page 5:

SI Practice Level
► Key to success at the practical level is understanding how to use the technologies to greatest advantage
► UML to model entities and relationships
  - You have to start with a baseline idea of entities and relationships which you refine over time
  - In addition you must have an information architecture to frame your SI solutions
  - Without the information architecture framework you’re just doing testing and exploration
► Concept and entity extraction to:
  - Form base domain vocabularies (both entity and relation vocabularies for domains)
  - Help scope and define the boundaries of the domain
  - Define pattern matching rules for some types of entities
► ‘Seeded clustering’ to understand and build semantic relationships among concepts and entities within a well-defined domain
► Categorization tools to define concept-level profiles for domains in order to programmatically classify content to domains
► Summarization and gisting technologies to support human relevance judgments and publishing

Page 6:

Smart Use of Technologies
► Sample structure
  - Oracle data classes used to represent Topic Classification scheme
    - Hierarchical taxonomy as reference source for the attribute – Topic
    - Used for Browse, Search, Content Syndication, Personalization
  - 1st challenge is to architect the hierarchy correctly
    - 3 distinct data classes, not a tree structure with inheritance
    - Allows you to use the three data classes for distinct functions across systems but still enforce relationships across the classes
  - Example: Topic, Subtopic, Subsubtopic structure in Oracle
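One way to picture "three distinct data classes with enforced relationships" is as three separate tables linked by foreign keys, rather than a single self-referencing tree. The sketch below is illustrative only: it uses Python's built-in sqlite3 module instead of Oracle, and every table name, column name, and sample value is an assumption, not the World Bank's actual schema.

```python
import sqlite3

# Hypothetical schema: three separate tables (not one self-referencing tree),
# with foreign keys enforcing the relationships across the classes.
SCHEMA = """
CREATE TABLE topic (
    topic_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE subtopic (
    subtopic_id INTEGER PRIMARY KEY,
    topic_id    INTEGER NOT NULL REFERENCES topic(topic_id),
    name        TEXT NOT NULL
);
CREATE TABLE subsubtopic (
    subsubtopic_id INTEGER PRIMARY KEY,
    subtopic_id    INTEGER NOT NULL REFERENCES subtopic(subtopic_id),
    name           TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with one made-up row per class."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO topic VALUES (1, 'Education')")
    conn.execute("INSERT INTO subtopic VALUES (10, 1, 'Primary Education')")
    conn.execute("INSERT INTO subsubtopic VALUES (100, 10, 'Teacher Training')")
    conn.commit()
    return conn


def full_path(conn, subsubtopic_id):
    """Resolve a subsubtopic to its (topic, subtopic, subsubtopic) names."""
    return conn.execute(
        """
        SELECT t.name, s.name, ss.name
        FROM subsubtopic AS ss
        JOIN subtopic AS s ON s.subtopic_id = ss.subtopic_id
        JOIN topic AS t ON t.topic_id = s.topic_id
        WHERE ss.subsubtopic_id = ?
        """,
        (subsubtopic_id,),
    ).fetchone()
```

Because each class lives in its own table, a system that only needs top-level topics (for example, for browse) can read `topic` alone, while the foreign keys still reject a subtopic that points at a non-existent topic.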

Page 7:

3 Oracle data classes

Page 8:

Topic data class

Page 9:

Subtopic data class

Page 10:

Subsubtopic data class

Page 11:

Relationships across data classes

Page 12:

Leveraging the Structure
► Each subtopic is a knowledge domain
► Each subtopic has an extensive concept level definition (1,000 – 5,000+ concepts)
► Concepts are controlled vocabularies in their raw form
► Concepts with relationships (extensive per new Z39.19 standard) comprise semantic network
► Categorization tools work with topic structure & concept definitions to categorize and index content
► The following screen shot illustrates how that same structure is embedded into Teragram profile to support categorization
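As a rough illustration of how concept definitions can drive categorization, the sketch below scores a text against a small concept profile per subtopic. It is a toy stand-in, not the Teragram profile format or its matching logic; the subtopic names, concept lists, and the `min_hits` threshold are all assumptions, and real profiles per the slide hold thousands of concepts.

```python
import re
from collections import Counter

# Hypothetical concept profiles: subtopic -> set of domain concepts.
PROFILES = {
    "Primary Education": {"primary school", "teacher", "enrollment", "curriculum"},
    "Water Supply": {"water supply", "sanitation", "pipeline", "irrigation"},
}


def categorize(text, profiles=PROFILES, min_hits=2):
    """Rank subtopics by how many profile-concept occurrences the text contains."""
    # Lowercase and collapse punctuation so multi-word concepts match as phrases.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    scores = Counter()
    for subtopic, concepts in profiles.items():
        for concept in concepts:
            pattern = rf"\b{re.escape(concept)}\b"
            scores[subtopic] += len(re.findall(pattern, normalized))
    # Keep only subtopics with enough evidence, highest score first.
    return [(name, hits) for name, hits in scores.most_common() if hits >= min_hits]
```

For example, `categorize("The project funds teacher training and raises primary school enrollment.")` assigns the text to the made-up "Primary Education" profile and leaves "Water Supply" out because it has no matching concepts.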

Page 13:

Subtopics

Domain concepts

Page 14:

Extensive operators allow us to write grammatical rules to manage typical semantic problems

Page 15:

Concept based rules engine allows us to define patterns to capture other kinds of data

Page 16:

Example of use of Authority Control to capture country names but extract ‘authorized’ version of country name

Example of use of a gazetteer + concept extraction + rules engine to support semantic interoperability
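The authority-control idea above can be sketched as a lookup from every known variant of a country name to its single authorized form. The authority file below is a hypothetical three-entry sample chosen for illustration; it is not the World Bank's gazetteer, and the function name is an assumption.

```python
import re

# Hypothetical authority file: authorized country name -> known variant forms.
AUTHORITY = {
    "Côte d'Ivoire": ["Ivory Coast", "Cote d'Ivoire"],
    "Russian Federation": ["Russia"],
    "Lao People's Democratic Republic": ["Laos", "Lao PDR"],
}

# Map every form (authorized or variant), lowercased, to the authorized name.
_LOOKUP = {}
for authorized, variants in AUTHORITY.items():
    for form in [authorized, *variants]:
        _LOOKUP[form.lower()] = authorized

# Longest forms first so a longer variant wins over a shorter one it contains.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(form) for form in sorted(_LOOKUP, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def extract_countries(text):
    """Return authorized country names in order of first appearance, without repeats."""
    found = []
    for match in _PATTERN.finditer(text):
        authorized = _LOOKUP[match.group(1).lower()]
        if authorized not in found:
            found.append(authorized)
    return found
```

Whatever variant appears in the text, only the authorized form is stored as metadata, so two systems that index the same document end up with the same country value.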

Page 17:

Use of concept extraction + rules engine to capture Loan #, Credit #, Project ID#
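Pattern rules of this kind can be approximated with regular expressions. The sketch below is not the rules-engine syntax shown on the slide, and the identifier formats it assumes (a project ID as "P" plus six digits, loan and credit numbers as four or five digits with an optional letter suffix) are hypothetical choices made for illustration.

```python
import re

# Hypothetical identifier formats, for illustration only.
_NUMBER = r"(\d{4,5}(?:-[A-Z]{2,3})?)"
RULES = {
    "project_id": re.compile(r"\bP\d{6}\b"),
    "loan_number": re.compile(r"\bLoan\s+(?:No\.|Number|#)\s*" + _NUMBER),
    "credit_number": re.compile(r"\bCredit\s+(?:No\.|Number|#)\s*" + _NUMBER),
}


def extract_identifiers(text):
    """Apply each rule to the text and collect the captured identifier values."""
    results = {}
    for label, pattern in RULES.items():
        values = []
        for match in pattern.finditer(text):
            # Rules with a capture group return just the number; others the whole match.
            values.append(match.group(1) if pattern.groups else match.group(0))
        results[label] = values
    return results
```

Anchoring the loan and credit rules on their context word ("Loan", "Credit") is what separates the two identifier types even when the numbers themselves look alike.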

Page 18:

Processed Content
► Let’s look at some examples of content which has been programmatically processed
► Topic classification, geographical region assignment, keywording examples
► Can apply this approach to any kind of content
► Enables us to build a robust metadata repository model, with strong metadata quality, to move towards SI at the functional level
► Also note that we can do this across many languages

Page 19:

Impacts & Outcomes
► Information Access impacts
  - Increased precision of search
  - Better control over recall
  - Searching like we talk
  - Exact match searching – known item searching now a reality
  - Metadata based searching now begins to resemble full-text searching but with all the advantages of structure & context, and a significant reduction in the amount of noise
► Productivity Improvements
  - Can now assign deep metadata to all kinds of content
  - Remove the human review aspect from the metadata capture
  - Reduce unit times where human review is still used
► Information Quality impacts
  - All metadata carries the information architecture with it
  - Apply quality metrics at the metadata level to eliminate need to build ‘fuzzy search architectures’ – these rarely scale or improve in performance
  - Use the technologies to identify and fix problems with our data

Page 20:

Progress To Date
► Operational in two systems – document management and library of learning
► Retrospectively processed 60,000 documents in 30 hours last weekend – dramatic improvement to access, quality and increased semantic interoperability potential
► Beginning the reprocessing of 3.7+ million documents in our records management system – adding metadata to support search, enable browse/search, capture metadata in language of the document to support cross-language searching (expected 3 month duration)
► Reprocessing web content by adding deep metadata following the records management system project
► System by system, implementing rich enterprise profile, we achieve a very high quality degree of semantic interoperability
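A quick back-of-the-envelope check relates the two volume figures above. It assumes, hypothetically, that the records management reprocessing runs at the same rate as the 60,000-document weekend run and continues around the clock; the slides do not state the hardware, schedule, or document sizes, so this is only a plausibility estimate.

```python
# Figures taken from the slide.
pilot_documents = 60_000
pilot_hours = 30
records_documents = 3_700_000

# Assumption: the larger job sustains the pilot rate, 24 hours a day.
docs_per_hour = pilot_documents / pilot_hours      # 2,000 documents per hour
hours_needed = records_documents / docs_per_hour   # 1,850 hours
days_needed = hours_needed / 24                    # about 77 days

print(f"{docs_per_hour:.0f} docs/hour, {hours_needed:.0f} hours, {days_needed:.1f} days")
```

Under those assumptions the job takes roughly two and a half months of continuous processing, which is in the same range as the expected three-month duration quoted on the slide.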