Implementing Semantic Search
DESCRIPTION
Semantic search helps business people find answers to pressing questions by wading through oceans of information to find the meaningful nuggets. In this presentation we’ll discuss how semantic search and content analysis technologies are starting to appear in the marketplace today. We’ll provide a recap of what semantic search is and what the key benefits are, then we’ll answer the following questions:
• Is semantic search a feature, an application, or an enterprise system?
• How can I add semantic search to my existing work processes?
• Will I need to replace my existing content technologies?
• What will I need to do to prepare my content for semantic search?
• Is semantic search just for documents or can I search my data too?
• Can I use semantic search to find information on the internet and other public data sources?
• Are there standards to consider?
TRANSCRIPT
![Page 1: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/1.jpg)
Implementing Semantic Search in the Enterprise
Paul Wlodarczyk
Director of Consulting Services
Earley & Associates
Amber Swope
![Page 2: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/2.jpg)
Questions we will answer today
• What is Semantic Search?
• How is Enterprise Search different from Internet Search?
• Why Semantic Enterprise Search?
• How do you implement enterprise semantic search? Examine people, process, technology, and content.
• How do I prepare my content to enable semantic search?
• What technologies are there and how do they differ?
• What can I search?
![Page 3: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/3.jpg)
What is Semantic Search?
semantic adj. Of or relating to meaning in language or communications.
• Semantic search uses language processing to assess the “meaning” of content (documents or web pages) and the “meaning” of search queries to return more relevant results (better matches in meaning).
Key concepts: Taxonomy, Named Entity, Ontology, Tag
![Page 4: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/4.jpg)
Key concept: Taxonomy
taxonomy n. A categorization scheme for content, often hierarchical. Example: the animal kingdom.
• Most often, taxonomies show “is a” relationships. Example:
  – A mammal is a vertebrate
  – A rodent is a mammal
  – A mouse is a rodent
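One way to picture an “is a” taxonomy is as a table of parent links that can be walked upward. The sketch below is only an illustration: the `IS_A` table and the helper names `ancestors` and `is_a` are assumptions made for this example, not part of any particular taxonomy tool.

```python
# Hypothetical parent links: each term maps to its broader ("is a") term.
IS_A = {
    "kangaroo rat": "rodent",
    "rodent": "mammal",
    "mammal": "vertebrate",
}

def ancestors(term):
    """Return the chain of broader terms, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain

def is_a(term, category):
    """True if `category` is the term itself or one of its broader terms."""
    return term == category or category in ancestors(term)

print(ancestors("kangaroo rat"))     # ['rodent', 'mammal', 'vertebrate']
print(is_a("rodent", "vertebrate"))  # True
```

Walking the parent links is what would let a search for “mammal” also surface content tagged only with a narrower term such as “rodent”.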
![Page 5: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/5.jpg)
Key concept: Named Entity
named entity n. A person, organization, place, thing, or event identified in a body of text. Entities are distinct from terms in that they are unambiguous.
– e.g. “Washington” is an ambiguous term that could refer to any of several entities (the first President, the city, the state, the US Government, the monument).
– A tagged named entity is unambiguous.
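To make the term-versus-entity distinction concrete, here is a deliberately naive sketch that resolves the term “Washington” to an entity by counting context cue words. The entity IDs, the cue words, and the `disambiguate` function are all hypothetical; a production system would more likely rely on a trained extractor and a name directory.

```python
# Hypothetical entity IDs and cue words, chosen only for illustration.
CANDIDATES = {
    "Washington": {
        "person:George_Washington": {"president", "general", "first"},
        "place:Washington_DC": {"city", "capital", "district"},
        "place:Washington_State": {"state", "seattle", "pacific"},
    }
}

def disambiguate(term, sentence):
    """Pick the candidate entity whose cue words overlap the sentence most.

    Returns None when the term is unknown or no cue word appears.
    """
    words = set(sentence.lower().replace(".", " ").replace(",", " ").split())
    best, best_score = None, 0
    for entity, cues in CANDIDATES.get(term, {}).items():
        score = len(cues & words)
        if score > best_score:
            best, best_score = entity, score
    return best

print(disambiguate("Washington", "Washington was the first president."))
# person:George_Washington
print(disambiguate("Washington", "Seattle is the largest city in Washington state."))
# place:Washington_State
```

The returned ID plays the role of the unambiguous tag: once it is stored with the document, later searches no longer have to guess which “Washington” was meant.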
![Page 6: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/6.jpg)
Example: Named Entities
![Page 7: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/7.jpg)
Key concept: Ontology
ontology n. A set of relationships between entities.
• Often these are in subject-predicate-object [triple] format.
• Often ontologies relate entities that exist in multiple taxonomies.
Example: A food chain is a set of relationships (predator/prey) between entities (animals, plants) that exist in different taxonomies (kingdoms). The relationships are triples:
– Rodents eat seeds of grasses.
– Foxes eat rodents.
– A kangaroo rat is a rodent.
– Rye is a grass.
Etc.
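The food-chain triples can be sketched as plain subject-predicate-object tuples with a wildcard lookup. This is a minimal sketch, not a real triple store: `TRIPLES`, `match`, and `eaten_by` are names invented for the example, “seeds of grasses” is simplified to one object string, and the inference goes only one “is a” level deep.

```python
# Subject-predicate-object tuples, simplified from the food-chain example.
TRIPLES = [
    ("rodent", "eats", "grass seed"),
    ("fox", "eats", "rodent"),
    ("kangaroo rat", "is a", "rodent"),
    ("rye", "is a", "grass"),
]

def match(subject=None, predicate=None, obj=None):
    """Return triples matching the given parts; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

def eaten_by(predator):
    """Direct prey, plus anything that 'is a' member of a prey category."""
    prey = {o for (_, _, o) in match(subject=predator, predicate="eats")}
    inferred = {s for kind in prey for (s, _, _) in match(predicate="is a", obj=kind)}
    return prey | inferred

print(sorted(eaten_by("fox")))  # ['kangaroo rat', 'rodent']
```

No triple states that foxes eat kangaroo rats; the answer comes from combining two triples, which is the kind of work an inference engine does at larger scale.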
![Page 8: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/8.jpg)
How does semantic search work?
• Assess meaning of documents
  – Identify named entities and relationships (triples) OR
  – Categorize documents to taxonomies OR
  – Score each document with a “signature” or “graph”
• “Tag” documents for meaning (categories, entities, triples, semantic signatures, graphs, etc.)
• Index the documents
• Assess meaning of search terms
• Match documents to search terms via common meaning
[Diagram: a search term and documents, each labeled “Meaning”]
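The tag, index, and match steps can be sketched end to end in a few lines. Everything here is an assumption made for illustration: “meaning” is reduced to a lookup in a hand-written `CONCEPTS` table, and the function names `meaning`, `build_index`, and `search` are hypothetical. Real products use categorizers, entity extractors, or semantic signatures in place of the lookup.

```python
# Hypothetical synonym table mapping surface terms to concept tags.
CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "invoice": "billing", "bill": "billing", "payment": "billing",
}

def meaning(text):
    """Assess 'meaning' as the set of concept tags found in the text."""
    words = text.lower().split()
    return {CONCEPTS[w] for w in words if w in CONCEPTS}

def build_index(docs):
    """Tag each document, then index document IDs by concept tag."""
    index = {}
    for doc_id, text in docs.items():
        for tag in meaning(text):
            index.setdefault(tag, set()).add(doc_id)
    return index

def search(index, query):
    """Match documents to the query via shared concept tags."""
    hits = set()
    for tag in meaning(query):
        hits |= index.get(tag, set())
    return sorted(hits)

docs = {
    "d1": "the automobile was repaired",
    "d2": "payment is due on receipt",
    "d3": "quarterly staffing plan",
}
index = build_index(docs)
print(search(index, "car"))      # ['d1']
print(search(index, "invoice"))  # ['d2']
```

The query “car” finds a document that never contains that word, because both sides were mapped to the same concept before matching. A pure keyword index would return nothing for it.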
![Page 9: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/9.jpg)
Enterprise Search vs. Web Search
| | Web Search | Enterprise Search |
|---|---|---|
| Search corpus | Every public webpage: the whole internet | Public documents in the enterprise, departmental docs, plus local docs (My Documents) |
| Context | Generic: shopping or seeking news and information | Company-specific: executing a role in a business process |
| Taxonomies / categories | Generic: Open Directory Project, Wikipedia, News, etc. | Domain-specific (customers, organization, products, technologies, processes) |
| Info security | Information is public | Information is secure with role-based access controls |
| Search algorithms | Keyword and link-based; links = relevancy; popularity = relevancy; professionally tagged | Keyword & tag-based; no links! No traffic! Inconsistent metadata tags! |
| Perfect result | Most popular content | Highest quality content |
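The role-based access controls noted for enterprise search imply that results are filtered per user before they are shown. A minimal sketch of that filtering step follows, assuming a hypothetical `ACL` table and a `trim_results` helper; real ECM systems expose their own permission models.

```python
# Hypothetical access-control list: document ID -> roles allowed to read it.
ACL = {
    "pricing-guide": {"sales", "finance"},
    "hr-policy": {"hr"},
    "product-faq": {"sales", "support", "hr", "finance"},
}

def trim_results(hits, user_roles):
    """Keep only hits the user may read; documents without an ACL entry are hidden."""
    return [doc for doc in hits if ACL.get(doc, set()) & set(user_roles)]

hits = ["pricing-guide", "hr-policy", "product-faq", "unlisted-doc"]
print(trim_results(hits, {"sales"}))  # ['pricing-guide', 'product-faq']
```

Hiding documents that have no ACL entry is a conservative default chosen for this sketch, not something the slides prescribe.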
![Page 10: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/10.jpg)
Why Semantic Enterprise Search?
• Semantic analysis can provide the context, relevancy, and consistency that are lacking in enterprise content creation and search
  – Enterprise content lacks the connectedness that internet search exploits
  – “Traffic” is not a clue to relevancy in enterprise search
  – Enterprise users do not consistently tag content with metadata
![Page 11: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/11.jpg)
Another key difference in Enterprise Search: Social Context
In enterprise search, we know a lot more about “who” is searching and “who” has authored “what”.
We understand the community a lot better in the enterprise.
![Page 12: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/12.jpg)
Roadmap for implementing semantic search
1. Implement Enterprise Content Management
2. Implement Enterprise Search
3. Layer-in semantic analysis to improve search relevancy
Semantic search isn’t a replacement for ECM and enterprise search. It’s a “sweetener.”
[Diagram: Implement ECM, then Implement Enterprise Search, then Exploit Semantic Search]
![Page 13: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/13.jpg)
ECM and Enterprise Search Roll-out
| | Strategy & Plan | Implement | Deploy | Maintain |
|---|---|---|---|---|
| People | Use cases and User Experience | Job Redesign, Communities | Training | Incentives for participation |
| Process | Content Lifecycle Analysis | Workflow, bus. rules, process redesign | Governance | Evergreen process for maintaining IA |
| Technology | Business & system req’ts, technical architecture | ECM and Search Implementation | Desktop integration (classification, search) | Social tech (ratings, tags, bookmarks) |
| Content | Content Analysis, Information Architecture, Taxonomy dev’t | Content migration | Content classification tools, search tools | Taxonomy maintenance, folksonomy |
![Page 14: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/14.jpg)
Layer-in Semantic Enterprise Search
| | Strategy & Plan | Implement | Deploy | Maintain |
|---|---|---|---|---|
| People | Use cases and User Experience | Job Redesign | Training | Incentives for participation |
| Process | Content Lifecycle Analysis | Workflow, bus. rules, process redesign | Governance | Evergreen process for maintaining IA |
| Technology | Business & system req’ts, technical architecture | ECM and Search Implementation, Semantic search implementation | Desktop integration (classification, search) | Social tech (ratings, tags, bookmarks), machine learning |
| Content | Content Analysis, Information Architecture, Taxonomy dev’t | Content migration, build triple stores, semantic training sets | Content classification tools, search tools | Taxonomy maintenance, folksonomy |

Semantic technologies play a role in content classification (from defining taxonomies and ontologies, to tagging documents, to improving search terms and hits) as well as in search and discovery.
![Page 15: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/15.jpg)
Classify, Navigate, Search, Retrieve Content within the Enterprise
[Diagram labels: Content Author; Check-in & Classify Document or Content Object; Retrieve Document or Content Object; Retrieve Unformatted Content; Retrieve Formatted Content; Retrieve Document; End User]
![Page 16: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/16.jpg)
Strategy and Plan: Key Activities
• Business Objectives: Understand the key business problems that must be solved
• People: Understand actors, roles, and use cases (who creates, who files, who searches, etc.)
• Process: Understand content lifecycle: how you create, maintain, reuse, and publish content
• Technology: Understand existing technology and new requirements for all use cases
• Content: Understand existing content, classification, policies, reuse, multichannel, etc.
![Page 17: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/17.jpg)
Strategy and Plan: Deliverables
• Business Objectives: Define the ROI in terms of the key metrics and how they will trend
• People: Actors, roles, and use cases elaborated into system and business requirements
• Process: Desired-state content lifecycle defined
• Technology: Systems architecture completed and new technology modules defined, integration points with existing technology defined
• Content: Information Architecture: How content will be structured, classified, managed, reused, and searched
![Page 18: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/18.jpg)
Strategy and Plan: Semantic Search Considerations: Technology
Semantic technologies need to be considered and evaluated as part of the technical architecture, including:
– Categorizers (for auto-tagging, clustering)
– Entity extraction
– Triple stores and inference engine
– Tag servers
– Desktop integration (expose UX into authoring and search tools)
![Page 19: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/19.jpg)
Strategy and Plan: Semantic Search Considerations: Content
• Semantic tools can aid content analysis activities including taxonomy, ontology, and name directory development
• Knowing which semantic approaches will be used for navigation, search, and retrieval (taxonomy, named entity, ontology) will inform the information architecture analysis and content classification
![Page 20: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/20.jpg)
Preparing Content for Semantic Search
Phases: Strategy & Plan, Implement, Deploy, Maintain
![Page 21: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/21.jpg)
Analyze existing content
• Know what you have
  – Number of retrievable units?
  – Size of each retrievable unit?
  – Current retrieval method?
• Understand its use
  – Who retrieves it?
  – When do they need it?
  – How do they find it?
  – How often do they need it?
• Determine the relationships between retrievable units
![Page 22: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/22.jpg)
Key Considerations
• Search Objectives
  – Who is searching for what? How do they search? How do they expect to see results? How do they rank quality and relevance?
• Content
  – Where is it? Federation? What types of documents? Security issues? Are XML or other special content types involved? Component documents or content reuse?
• User Experience (UX)
  – What is the right balance between user expectations and an effective UI design? Are you involving users in the design? How can you embed the UX into daily tools (mail, desktop, browser, CMS)?
![Page 23: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/23.jpg)
Define content structure
• Define authoring units
  – Size?
  – File format?
• Define storage units
  – Size?
  – Relationships between units?
• Define retrieval units
  – Documents
  – Components
  – Topics/chunks
![Page 24: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/24.jpg)
Classify content
• Define terms and thesauri
• Develop taxonomies
  – How many?
  – Relationship between them?
  – Where/how stored?
• Apply taxonomy values to content
  – When are values applied?
  – Who is responsible for applying/reviewing?
  – What can be automated?
• Develop ontologies (if using triples)
![Page 25: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/25.jpg)
Define metadata
• Identify what data is needed
• Define the values
  – How used?
  – Where/how stored?
• Apply metadata values to content
  – When are values applied?
  – Who is responsible for applying/reviewing?
  – What can be automated?
![Page 26: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/26.jpg)
Control content
• Identify relationship between storage, retrieval and display mechanisms
  – Same?
  – Different?
  – Relationship between them?
• Define storage strategy
  – Where is content stored?
  – Where is metadata stored?
  – Where are deliverables stored (if generated)?
  – How many repositories?
  – Who needs access to each one?
![Page 27: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/27.jpg)
Information Architecture for Semantic Search
• Information Architecture
• Structure content for retrieval
• Apply retrieval support at appropriate level
![Page 28: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/28.jpg)
What technology does semantic search implementation require?
• Semantic Tagging Technology
  – “Train” a system to auto-categorize documents; taxonomy server
  – Named entity extraction; directory server
  – Analyze against “triples”; triple stores plus inference engines
  – Augment automatic tags with user tags and refinements
• Semantic Search Technology
  – Disambiguate search terms to their meaning
  – Map “meaning” of search term to “meaning” of document
  – Refine “meaning” of search terms (clustering / similarity: “more like this”)
• Integration Technology
  – User experience for check-in, classification and NS&R (navigation, search, and retrieval)
  – Desktop integration with browsers, email, and authoring tools
  – Integration frameworks to tie semantic services with existing enterprise search and content management
![Page 29: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/29.jpg)
What can I search?
• Content in ECM
  – By using semantic tags in ECM metadata
• Content on your desktop
  – By semantically tagging and indexing
• Content on the web
  – By searching semantic metadata (e.g. RDF, linked data URIs)
• Databases
  – By using XML Data Stores to make relational data available as a “document” that can be tagged
![Page 30: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/30.jpg)
Standards
• Resource Description Framework (RDF)
  – Make statements about resources in triples format
• W3C Semantic Web Standards (“linked data”)
  – Use URIs to point to data in the web
  – Turn web pages into databases
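To show what “statements about resources in triples format” with URIs can look like, the sketch below writes a few statements in N-Triples style using only the Python standard library. The `http://example.org/` namespace and the helper names `uri`, `literal`, and `to_ntriples` are assumptions for illustration; the two `rdf-schema#` URIs are the standard RDFS `subClassOf` and `label` properties. In practice an RDF library would normally handle serialization.

```python
EX = "http://example.org/"  # hypothetical namespace for illustration
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def uri(value):
    """Wrap a URI in angle brackets, as N-Triples expects."""
    return f"<{value}>"

def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for (s, p, o) in triples)

triples = [
    (uri(EX + "KangarooRat"), uri(RDFS_SUBCLASS), uri(EX + "Rodent")),
    (uri(EX + "Fox"), uri(EX + "eats"), uri(EX + "Rodent")),
    (uri(EX + "Fox"), uri(RDFS_LABEL), literal("Fox")),
]
print(to_ntriples(triples))
```

Because every subject and predicate is a URI, statements published by different sources about the same URI can be merged, which is the idea behind “linked data”.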
![Page 31: Implementing Semantic Search](https://reader036.vdocuments.net/reader036/viewer/2022062303/5551d215b4c905922b8b5234/html5/thumbnails/31.jpg)
Recap
• Semantic search improves search relevance by matching meaning of search terms to meaning of documents
• Semantic technologies include categorizers, entity extractors, and linguistic analysis of relationships between entities (triples)
• Semantic technologies are available as plug-ins to enterprise systems, or “baked in” to enterprise systems
• Semantic search requires extra steps along the way in implementing ECM and enterprise search