Content Enrichment in SharePoint Search
#SPSBE12
Steven Van de Craen
April 26th, 2014
About me
Steven Van de Craen
SharePoint enthusiast
Ventigrate
Since 2005
Overview
What is it?
A FAST history
SharePoint 2013
#demo
Practical use
#demo
WCF Routing Technology (.NET 4.0)
#demo
Wrap-up
Resources
What is it?
“Content Enrichment is about manipulating crawled content before it is added to the search index.”
Add or modify properties of crawled items
Add information from an external system
Advanced processing on raw data
A FAST history
Enterprise-class search
SharePoint 2001 Search
SharePoint 2003 Search
MOSS 2007 Search
SharePoint 2010 Search
FAST ESP
FAST Search Server 2010 for SharePoint
SharePoint 2013 Search
A FAST history
Custom pipeline extensibility
Registration via XML
“Callout” = executable
For each crawled item
Synchronous
Optimize for performance
Visual Studio Profiling Tools
Startup penalty
200 ms to process a single item × 10 million items ≈ 23 days
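The 23-day figure follows directly from the two numbers on the slide. A minimal sketch of the arithmetic, assuming items are processed strictly one after another (the function name is illustrative, not part of any SharePoint or FAST API):

```python
def total_processing_days(ms_per_item: float, item_count: int) -> float:
    """Return the total sequential processing time in days."""
    total_seconds = ms_per_item * item_count / 1000
    return total_seconds / 86_400  # 86,400 seconds in a day


# 200 ms per item, 10 million items, no parallelism assumed
days = total_processing_days(200, 10_000_000)
print(f"{days:.1f} days")
```

Halving the per-item cost halves the total, which is why the per-item startup penalty matters so much at this scale.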
SharePoint 2013
Content Enrichment web service (CEWS) callout
“Callout” = web service
Conditionally via triggers
Synchronous
Process properties or raw data
High Availability / Load Balancing
Optimize for performance
Startup penalty is minimized
Registration via PowerShell
Configuration properties (description and default value)
Endpoint: Specifies the URL of the external web service. Default: empty.
InputProperties: The managed properties that the external web service receives. Default: empty.
OutputProperties: The managed properties that the external web service returns. Default: empty.
Timeout: The time in milliseconds until the web service call times out. Depending on FailureMode, the item fails to be processed or a warning is written to the ULS log. Default: 5000 milliseconds; valid range [100, 30000].
SendRawData: Enables or disables sending raw data to the web service. Default: false.
MaxRawDataSize: The maximum size of raw data sent to the web service in kilobytes (KB). If the binary data of an item exceeds this limit, the raw data is not sent. This does not prevent the InputProperties from being sent and the OutputProperties from being received. Default: 5120 kilobytes.
FailureMode: Controls the behavior of the web service client when errors occur. When FailureMode is set to ERROR, any problem that occurs during content enrichment processing sends a failed callback for that particular item. When FailureMode is set to WARNING, the item is indexed without any modifications by the web service and a warning is written to the ULS log. Default: Error.
DebugMode: When set to true, the content enrichment client sends all managed properties to the web service without expecting any properties in return. Any configured Trigger, InputProperties, and OutputProperties are ignored. Default: false.
Trigger: A Boolean predicate that is evaluated for every crawled item. If the predicate evaluates to true, the item is sent to the web service. Otherwise, the item is passed through to the search index. Default: empty.
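The defaults and the Timeout range above can be captured in a small model to sanity-check a planned configuration before registering it. This is only a sketch in Python: the class `CewsConfig`, its field names, and the `problems` method are hypothetical and are not a SharePoint API; only the default values and the [100, 30000] range come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class CewsConfig:
    """Hypothetical stand-in for the CEWS configuration properties."""
    endpoint: str = ""
    input_properties: list = field(default_factory=list)
    output_properties: list = field(default_factory=list)
    timeout_ms: int = 5000
    send_raw_data: bool = False
    max_raw_data_size_kb: int = 5120
    failure_mode: str = "Error"
    debug_mode: bool = False
    trigger: str = ""

    def problems(self) -> list:
        """Return human-readable issues; an empty list means none found."""
        issues = []
        if not self.endpoint:
            issues.append("Endpoint is empty")
        if not 100 <= self.timeout_ms <= 30000:
            issues.append("Timeout outside [100, 30000] ms")
        if self.failure_mode.upper() not in ("ERROR", "WARNING"):
            issues.append("FailureMode must be Error or Warning")
        return issues


# The URL below is a placeholder, not a real service.
config = CewsConfig(endpoint="http://example.invalid/Enrichment.svc", timeout_ms=50)
print(config.problems())
```

Checks like these only mirror the documented ranges; the search service itself remains the authority on what it accepts.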
Trigger conditions
Determine if a callout is needed
Uses Managed Properties, Operators, Constants and Functions
Property1 > Property2
Property1 > 600
IsNull(Property2)
StartsWith(Property1, "sample") AND Property2 != 18
IsDay(Property1, 2014, 04, 26)
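To make the idea of a trigger concrete, the sketch below emulates one of the example expressions in Python. It only illustrates the "predicate per crawled item" semantics: real triggers are written in the trigger expression syntax and evaluated by the search service, and the helper names and the treatment of missing values here are assumptions.

```python
def is_null(value) -> bool:
    """Rough analogue of IsNull(Property)."""
    return value is None


def starts_with(value, prefix: str) -> bool:
    """Rough analogue of StartsWith(Property, "prefix").

    Assumption: a missing property never matches.
    """
    return value is not None and value.startswith(prefix)


def should_call_out(item: dict) -> bool:
    """Emulates: StartsWith(Property1, "sample") AND Property2 != 18"""
    return starts_with(item.get("Property1"), "sample") and item.get("Property2") != 18


crawled_items = [
    {"Property1": "sample report", "Property2": 20},  # sent to the web service
    {"Property1": "sample report", "Property2": 18},  # skipped: Property2 is 18
    {"Property1": "other", "Property2": 20},          # skipped: wrong prefix
]
for item in crawled_items:
    print(item["Property1"], item["Property2"], should_call_out(item))
```

Items for which the predicate is false go straight to the index, so a selective trigger keeps the synchronous callout off the critical path for most content.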
SOAP-based WCF service implementing IContentProcessingEnrichmentService
Microsoft.Office.Server.Search.ContentProcessingEnrichment.dll
C:\Program Files\Microsoft Office Servers\15.0\Search\Applications\External
Limitations
One WCF service endpoint per CEWS configuration, and one CEWS configuration per Search Service Application (SSA)
Raw data message limit
#demo A taste of CEWS
Practical use
OCR and data extraction
Image recognition and tagging
Barcode scanning
BBAN/IBAN number normalization
LOB data tagging/enrichment
#demo A real world example
WCF Routing Technology (.NET 4.0)
Enables development of complex routing logic, load-balancing, and fault tolerance.
Routing based on predefined or custom filters
Fault tolerance through backup endpoints
Load balancing through custom filters
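The two ideas above, filter-based routing and backup endpoints, can be illustrated with a short conceptual sketch. It is written in Python rather than WCF, where the Routing Service is set up through filters and backup endpoint lists; every name here (`pick_endpoints`, `call_with_backup`, the endpoint strings) is hypothetical.

```python
def pick_endpoints(item: dict, routes: list, default: list) -> list:
    """Return the endpoint list of the first filter that matches the item."""
    for matches, endpoints in routes:
        if matches(item):
            return endpoints
    return default


def call_with_backup(item: dict, endpoints: list, send):
    """Try each endpoint in order and fall back to the next on failure."""
    last_error = None
    for endpoint in endpoints:
        try:
            return send(endpoint, item)
        except ConnectionError as error:
            last_error = error  # remember the failure, try the backup
    raise RuntimeError("all endpoints failed") from last_error


# Hypothetical routing table: PDF items go to an OCR service with one backup.
routes = [
    (lambda item: item.get("FileExtension") == "pdf", ["ocr-primary", "ocr-backup"]),
]


def fake_send(endpoint: str, item: dict) -> str:
    """Stand-in for a service call; the primary endpoint is 'down'."""
    if endpoint == "ocr-primary":
        raise ConnectionError("primary unavailable")
    return f"{endpoint} processed {item['Title']}"


item = {"Title": "invoice", "FileExtension": "pdf"}
endpoints = pick_endpoints(item, routes, default=["generic"])
print(call_with_backup(item, endpoints, fake_send))
```

Because the search service only sees the single router endpoint, this kind of fan-out is what allows several enrichment services to sit behind one CEWS registration.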
#demo Breaking through the limit
Wrap-up
Service oriented
Raw Data and/or Managed Properties
PowerShell
Synchronous
Routing Service
Trigger Expression Syntax
Resources
Custom content processing with the Content Enrichment web service callout: http://bit.ly/1j1UEvH
How to: Use the Content Enrichment web service callout for SharePoint Server: http://bit.ly/1l3wLK3
Trigger expressions syntax in SharePoint 2013: http://bit.ly/1hVSR97
Advanced Content Enrichment in SharePoint 2013 Search: http://bit.ly/1j25Ua9
Content enrichment service scaling and aggregation: http://bit.ly/1h4HZpt
Routing Service: http://bit.ly/1jKIVAP
Message Filters: http://bit.ly/1keu0ls