Solving Real World Challenges with Enterprise Search
DESCRIPTION
Enterprise Search is complex, even in theory. But when you implement your search solution and theory meets reality, you’ll run into new challenges you have never seen before. In this session, I’ll collect the best, biggest, and most exciting challenges from my experience, including real world customer scenarios and solutions. Regardless of the SharePoint version you use (SharePoint 2010, FAST Search for SharePoint, SharePoint 2013), this session is for you if you want to prepare for these “unexpected” scenarios.
TRANSCRIPT
SHAREPOINT AND PROJECT CONFERENCE ADRIATICS 2013 ZAGREB, NOVEMBER 27-28 2013
Real World Challenges in SP Search
AGNES MOLNAR
INDEPENDENT CONSULTANT, SHAREPOINT SERVER MVP – HUNGARY
Introduction – Agnes Molnar
International SharePoint Consultant
• 10+ Years SharePoint Experience
• Information Architecture & ECM
• Search
SharePoint Server MVP
• 6 Years SharePoint Server MVP
• 5+ Years Speaking at Conferences Around the World
• Numerous Books, White Papers, Articles
Contact
• E-mail: [email protected]
• Blog: http://aghy.hu
• Twitter: @molnaragnes
Agenda
Challenges
• Requirements Gathering
• Complexity of Search
• Content Inventory and Metadata
• Security
• Sizing and Capacity Planning
• Search Analytics
Source - http://financiallyeliteblog.com/wp-content/uploads/2011/04/information-overload.jpg
Information Overload OR Filter Failure?
Enterprise Search
Search technology that your organization owns and controls
Search is easy… Find is the real challenge!
Source: http://www.domorewithsearch.com
Search as an Application
• Search is no longer the white box
• Content lives in disparate locations
• Structured and unstructured content live in different locations
• Need to aggregate content according to:
  • Process
  • Context
  • Customer
  • Goal
  • Program
  • Parameter of any of the above
Search as an Application
User – Context – Content
[Diagram labels: Context, Users, Content]
• Context: Business models & goals, corporate culture, resources
• [Where the information is used]
• Content: Document types, objects, structure, attributes, meta-information
• [How to describe the information]
• Users: Information needs, audience types, expertise, tasks
• [How to use the information]
Requirements Gathering
• Types of Content
• Types of Users
• Users’ Behavior
• Content Sources
• Metadata
• Actions to Take
• Amount of Content
• Current “Pain Points”
The Complexity of Enterprise Information
What we give to the search engine… versus what the search engine sees…
• Title: Overview of SharePoint 2013 Preview Installation and Configuration
• Author: Alex Yarrow
• Created Date: 06/21/2012
• Modified Date: 10/16/2012
• File Type: docx
• …
Explicit metadata versus implicit metadata
Sample document text:
“ABC shall provide first level technical support to all Licensed Product end users and/or Sublicensed Product customers/users. DEF will provide second level support. DEF shall provide to ABC a primary and a secondary support person to act as the primary interface with ABC’s technical and customer support team. DEF shall provide direct technical support to ABC for all uses of the DEF Software. Support level definitions and responsibilities are set forth in Exhibit C. An “SLA Failure” as defined in Exhibit C shall qualify as a Release Condition sufficient to authorize the Escrow Agent to release the Source Code to ABC pursuant to Section 7 and the Escrow Agreement.”
Explicit metadata
• Content Type = License
• Organization = ABC Company, DEF Company
• Topic = Support
Implicit metadata
• Terms found in the text: ABC, customers, customer support, customer support team, DEF, DEF software, end users, escrow agreement, escrow agent, exhibit C, licensed product, release condition, section 7, secondary support, SLA, SLA failure, software, source code, support level, sublicensed product, technical support
• Forward Index – Words per document
• Inverted Index – Documents per word
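The two index types named on this slide can be illustrated with a minimal Python sketch. The two-document corpus, the file names, and the whitespace tokenizer are assumptions made for illustration; a real search engine does far more (word breaking, stemming, entity extraction).

```python
from collections import defaultdict

# Hypothetical two-document corpus; the texts are shortened for illustration.
documents = {
    "license.docx": "ABC shall provide first level technical support",
    "sla.docx": "DEF will provide second level support",
}

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_forward_index(docs):
    """Forward index: document -> list of the words it contains."""
    return {doc_id: tokenize(text) for doc_id, text in docs.items()}

def build_inverted_index(forward_index):
    """Inverted index: word -> set of documents that contain it."""
    inverted = defaultdict(set)
    for doc_id, words in forward_index.items():
        for word in words:
            inverted[word].add(doc_id)
    return dict(inverted)

forward_index = build_forward_index(documents)
inverted_index = build_inverted_index(forward_index)

print(forward_index["sla.docx"])          # words per document
print(sorted(inverted_index["support"]))  # documents per word
```

Answering a query means looking words up in the inverted index, which is why it is the structure a query actually hits; the forward index is the intermediate, per-document view.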
The Complexity of Search
[Diagram: data sources, content sources, indexing into the local search index, federation with a remote search index, result sources, query rules, result blocks, the result set, metadata, display templates, refinement panel, hover panel]
Requirements Gathering
Information-Seeking Patterns
• “I know what I’m searching for and I know how to do that”
• “I know what I’m searching for but I don’t know how to do that”
• “I don’t know what I’m searching for”
• “Am I searching?…”
REAL WORLD EXPECTATIONS
Content Inventory
• “I have a lot of content, but I don’t know what to do with it…”
Content Inventory
• SharePoint content (2013, 2010, …)
  • Intranet
  • Department sites
  • Project sites
  • Internal KB
• File shares
  • Sales repository (RFPs, proposals, etc.)
  • Marketing documents (DMs, brochures, etc.)
• Web sites
  • Company public web site
  • Professional Know-How Web Sites (finance, IT, development, etc.)
  • Common interest (stock, management, etc.)
• Exchange Public Folders
  • Internal communication
• Business Data
  • Data from databases
• Custom connector
  • SAP data
  • CRM data
Search Federation
Crawl or Federate? – Where to get the content from?
• Crawl + Use Local Index:
  • Examples:
    • Intranet
    • Company file shares
  • Pros:
    • Full control over the index (crawl schedule, metadata included, etc.) and ranking model
    • Results can be aggregated into one result set
    • Common refiners (facets)
  • Cons:
    • Needs resources for the crawling process
    • Needs storage to store the index
• Federate:
  • Examples:
    • Professional know-how web sites (TechNet, MSDN, etc.)
    • Internet results for a specific topic (financial news, stock information, etc.)
    • 3rd party Content Management System
  • Pros:
    • Doesn’t need resources to crawl / store the index
  • Cons:
    • Live Internet connection is required
    • No control over the index
    • No control over the ranking model
    • No real aggregation with other result sources
Content Source Inventory
Name | Type | Location | Owner | Volume of Content | Frequency of Updates
Intranet | SharePoint | http://intranet | Intranet Team | 200K items | 100-300/hr
Project Sites | SharePoint | http://projects | Delivery | 200K items | 100-200/hr
Sales share | File share | \\X:\Sales | Sales | 500K docs | 300-500/hr
Marketing share | File share | \\X:\Marketing | Marketing | 200K docs | 300-500/hr
Company web site | Web site | http://mycompany.com | Marketing/Publishing Team | <100K pages | 1-10/day
Competitor’s web site | Web site | http://competitor.com | [external] | <100K pages | 1-10/day
Professional Know-How | Web site | http://www.mykb.com | [external] | <100K pages | 5-10/week
Company Announcements | Exchange Public Folder | Exchange/Public Folders/Announcements | Marketing/Internal Comm. Team | <100K items | 5-10/day
HR data | Business Data (SQL) | SQL database | HR | <100K items | 10-100/day
CRM data | Custom Connector | CRM system | Sales | 500K entries | 500-1000/hr
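An inventory like this one can feed a rough, upper-bound estimate of index size and update load. The sketch below is only an illustration under stated assumptions: every source is crawled (none federated), "<100K" is counted as 100,000, only the upper end of each range is used, and daily or weekly rates are spread evenly over 24 hours and 7 days. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ContentSource:
    name: str
    max_items: int                # upper bound of the stated volume
    max_updates_per_hour: float   # upper bound of the stated update rate

# Upper bounds taken from the inventory table (see assumptions above).
INVENTORY = [
    ContentSource("Intranet", 200_000, 300),
    ContentSource("Project Sites", 200_000, 200),
    ContentSource("Sales share", 500_000, 500),
    ContentSource("Marketing share", 200_000, 500),
    ContentSource("Company web site", 100_000, 10 / 24),
    ContentSource("Competitor's web site", 100_000, 10 / 24),
    ContentSource("Professional Know-How", 100_000, 10 / (24 * 7)),
    ContentSource("Company Announcements", 100_000, 10 / 24),
    ContentSource("HR data", 100_000, 100 / 24),
    ContentSource("CRM data", 500_000, 1000),
]

def total_items(sources):
    """Upper bound on the number of items the index has to hold."""
    return sum(s.max_items for s in sources)

def peak_updates_per_hour(sources):
    """Upper bound on changed items per hour across all sources."""
    return sum(s.max_updates_per_hour for s in sources)

def busiest(sources, n=3):
    """The n sources with the highest update rate."""
    return sorted(sources, key=lambda s: s.max_updates_per_hour, reverse=True)[:n]

print(f"Items to index (upper bound): {total_items(INVENTORY):,}")
print(f"Updates per hour (upper bound): {peak_updates_per_hour(INVENTORY):.0f}")
print("Busiest sources:", [s.name for s in busiest(INVENTORY)])
```

The busiest sources are the ones whose crawl schedule matters most for document freshness.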
Metadata in Search
• The “glue” of Search Applications
• Crawled property: metadata extracted from the documents/items during the crawl.
• Managed property: mapped to crawled properties, controlled by Search Admins, helping users perform more efficient and successful queries:
  • Refiners
  • Displayed in Search Results
  • Sorting Properties
Metadata in Search
• Crawled Properties: Author, CreatedBy, From
• Managed Property: Author
• Usage: Refiner, Display on Result Set, Display on Hover Panel, Sorting by
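The idea of several crawled properties feeding one managed property can be sketched in a few lines of Python. This is not the SharePoint API; the dictionary layout, the function name, and the "first non-empty value wins" rule are assumptions chosen to keep the illustration small.

```python
# Mapping from the slide: three crawled properties feed one managed property.
MANAGED_PROPERTY_MAPPINGS = {
    "Author": ["Author", "CreatedBy", "From"],
}

def to_managed_properties(crawled_item, mappings=MANAGED_PROPERTY_MAPPINGS):
    """Return the managed property values for one crawled item.

    The first mapped crawled property with a non-empty value wins;
    this priority rule is an assumption made for the sketch.
    """
    managed = {}
    for managed_name, crawled_names in mappings.items():
        for crawled_name in crawled_names:
            value = crawled_item.get(crawled_name)
            if value:
                managed[managed_name] = value
                break
    return managed

# Hypothetical items from three kinds of sources.
document = {"Author": "Alex Yarrow", "Title": "Overview"}
list_item = {"CreatedBy": "Alex Yarrow"}
email = {"From": "Alex Yarrow", "Subject": "Status"}

for item in (document, list_item, email):
    print(to_managed_properties(item))
```

Whatever the source calls the field, users get one Author property to refine, display, and sort by.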
Using Managed Properties
• Refinement
• Result Type & Display Template
• On Hover Panel
• In Query Rules
Security
Users can see what they have access to.
vs.
Users cannot see what they don’t have access to.
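Both statements come down to security trimming: results are filtered against the permissions stored with each indexed item. The sketch below shows the principle only; the group names, item titles, and the `IndexedItem` class are hypothetical, and a real engine evaluates far richer access control lists.

```python
from dataclasses import dataclass, field

@dataclass
class IndexedItem:
    title: str
    allowed_groups: set = field(default_factory=set)  # simplified ACL

def security_trim(results, user_groups):
    """Keep only items whose ACL shares at least one group with the user."""
    user_groups = set(user_groups)
    return [item for item in results if item.allowed_groups & user_groups]

# Hypothetical index content with simplified ACLs.
index = [
    IndexedItem("Intranet news", {"Everyone"}),
    IndexedItem("Salary review 2013", {"HR"}),
    IndexedItem("RFP for ABC Company", {"Sales", "Management"}),
]

sales_user = {"Everyone", "Sales"}
visible = security_trim(index, sales_user)
print([item.title for item in visible])
```

The trimming is only as good as the permissions on the content: an item that is over-shared at the source will be found by everyone who is allowed to see it.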
The Search Security Paradox
As Search is deployed further and further into the Enterprise, the likelihood of having a security problem increases.
Sizing and Capacity Planning
• “Sounds good, but I’m not sure if we have resources for this…”
Scaling Factors
• Content characteristics
• Search features
• Query performance
• Document freshness
• High availability
Components – Scaling cheat sheet
Component CPU Network Disk Memory
Search administration ★ ★ ★ ★
Crawling ★★ ★★★ ★★ ★★
Content processing (CPC) ★★★ ★★ ★★★
Analytics processing (APC) ★★ ★★★ ★★ ★★
Index ★★★ ★★ ★★★ ★★★
Query processing (QPC) ★ ★★ ★★
Sorting the Results – Relevance Ranking
• Requirements:
“I’d like to see ALL the relevant results.”
vs.
“I don’t want to see anything that is not relevant (to me, in this context).”
Sorting the Results – Relevance Ranking
Element | Description
Freshness | Age of a document compared to the time when the query is issued
Authority | Importance of a document determined by the links to it from other documents
Quality | Assigned importance of a document, independent of the query
Geo | Importance of geographical distance between a document’s associated latitude/longitude and a target location specified in a query
Context | Importance of matching a query in a given document field
Proximity | For multi-term queries: the shorter the distance between query terms in a document, the higher the document’s rank value
Position | The earlier a query term occurs in a field, the higher the document’s rank value
Frequency | The more frequently a query term occurs in a document, the higher the document’s rank value
Completeness | The greater the number of query terms present in the same field of a matching document, the higher the document’s rank value
Number | For multi-term queries: the more query terms matched in a document, the higher the document’s rank value
Reference: Okapi BM25, http://en.wikipedia.org/wiki/Probabilistic_relevance_model_(BM25)
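To make the referenced Okapi BM25 formula concrete, here is a minimal Python sketch of the textbook version. It covers only term frequency, rarity of a term across documents, and document length; it is not the SharePoint ranking model, which combines many more elements. The sample documents and the parameter defaults `k1=1.2` and `b=0.75` are common illustrative choices, not values from the session.

```python
import math
from collections import Counter

def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: in how many documents each query term appears.
    doc_freq = {t: sum(1 for doc in documents if t in doc) for t in query_terms}

    scores = []
    for doc in documents:
        term_counts = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_counts[term]
            if tf == 0:
                continue  # a missing term contributes nothing
            # Rare terms weigh more than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical tokenized documents.
docs = [
    "abc shall provide first level technical support".split(),
    "def will provide second level support".split(),
    "escrow agent releases source code".split(),
]
scores = bm25_scores(["technical", "support"], docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {' '.join(doc)}")
```

The first document matches both query terms and ranks highest, the second matches one, and the third matches none and scores zero.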
Search Analytics
“How to Improve the Search Experience?”
Search Analytics in SharePoint 2013
• Usage Events – As users interact with content in SharePoint, actions are captured and stored as events (click a link, press a button, view or open a document).
• Access and create experiences using data captured in the analytics database.
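As a rough illustration of building an experience on captured usage events, the sketch below counts views per item to produce a "most viewed" list. The event log, user names, event names, and URLs are hypothetical and do not reflect the schema of the SharePoint analytics database.

```python
from collections import Counter

# Hypothetical event log; event names and URLs are illustrative only.
usage_events = [
    {"user": "anna", "event": "view", "item": "http://intranet/policies.docx"},
    {"user": "bela", "event": "view", "item": "http://intranet/policies.docx"},
    {"user": "anna", "event": "click", "item": "http://projects/plan.xlsx"},
    {"user": "cili", "event": "view", "item": "http://projects/plan.xlsx"},
    {"user": "bela", "event": "view", "item": "http://intranet/policies.docx"},
]

def most_viewed(events, top_n=2):
    """Count 'view' events per item and return the most viewed items."""
    views = Counter(e["item"] for e in events if e["event"] == "view")
    return views.most_common(top_n)

for item, count in most_viewed(usage_events):
    print(count, item)
```

The same counts could drive a "popular items" list or act as one more signal when tuning relevance.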
Search Analytics – Examples
Conclusions
Questions?
http://aghy.hu
@molnaragnes
Thank you.