Integrating Google Search Appliance with Mura CMS
An overview of integrating Google Search Appliance with Mura CMS. Presented at MuraCon 2012 by Ajay Sathuluri.
Integrating Google Search Appliance
with Mura CMS
Ajay Sathuluri (@sathuluri)
Ajay Sathuluri
Sr. Architect at ICF International
Using ColdFusion since ’98
Server Tuning, Administration, Load Testing
I like spending time with my kids and wife.
About Me
Google Search Appliance
  Configuring a Crawl
  Control Access to Content
  Configuring Database Crawl
  Collections / Front Ends
  Crawl Diagnostics
Configuring GSA with Mura CMS
  Plugin (FW/1)
  Search
  Search Results
What are we covering?
Google Search Appliance - Home
Before starting a crawl, you must configure the crawl path so that it includes only the information that you want to make available in search results.
Use the Crawl and Index > Crawl URLs page in the Admin Console to enter URLs.
URLs are case-sensitive.
Configure your network to disallow search appliance connectivity outside of your intranet.
Configuring a Crawl
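To make the case-sensitivity point concrete, here is a minimal Python sketch of a crawl-path check. It assumes plain prefix matching, which is a simplification and not the appliance's own pattern syntax; the function name, host name, and paths are hypothetical.

```python
def in_crawl_path(url, follow_prefixes, skip_prefixes=()):
    """Return True if url is under a follow prefix and under no skip prefix.

    The comparison is case-sensitive, mirroring the note that URLs are
    case-sensitive. Plain prefix matching is an assumption for illustration.
    """
    if any(url.startswith(prefix) for prefix in skip_prefixes):
        return False
    return any(url.startswith(prefix) for prefix in follow_prefixes)


# Hypothetical intranet URLs, not taken from the presentation.
FOLLOW = ["http://intranet.example.com/docs/"]
SKIP = ["http://intranet.example.com/docs/drafts/"]

print(in_crawl_path("http://intranet.example.com/docs/guide.html", FOLLOW, SKIP))
print(in_crawl_path("http://intranet.example.com/Docs/guide.html", FOLLOW, SKIP))
print(in_crawl_path("http://intranet.example.com/docs/drafts/a.html", FOLLOW, SKIP))
```

The second URL differs from the first only in the case of "Docs", so it falls outside the crawl path in this sketch.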
Google Search Appliance – Crawl URL
Demo
Configuring a Crawl
robots.txt
meta tag
no_crawl directories
Control Access to Content
robots.txt
The Google Search Appliance always obeys the rules in robots.txt, and it is not possible to override this feature.
A robots.txt file is not mandatory.
It is located in the web server's root directory.
For the search appliance to be able to access the robots.txt file, the file must be public.
It includes one or more Disallow: or Allow: rules, for example:
    User-agent: gsa-crawler
    Disallow: /personal_records/
    Disallow: /admin/
    Allow: /
    Allow: /personal_records/mypersonal.doc
Control Access to Content (2)
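One way to sanity-check robots.txt rules before a crawl is to run them through Python's standard-library parser. The sketch below uses the rules from the slide; the host name and the `allowed` helper are hypothetical, and the parser is Python's, not the appliance's.

```python
from urllib.robotparser import RobotFileParser

# Rules from the slide, one directive per line.
RULES = """\
User-agent: gsa-crawler
Disallow: /personal_records/
Disallow: /admin/
Allow: /
Allow: /personal_records/mypersonal.doc
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path):
    """Ask Python's parser whether gsa-crawler may fetch the given path."""
    # intranet.example.com is a placeholder host.
    return parser.can_fetch("gsa-crawler", "http://intranet.example.com" + path)


print(allowed("/products/index.html"))
print(allowed("/admin/login.cfm"))
```

Python's parser applies the first matching rule in file order, so its answer for /personal_records/mypersonal.doc may differ from what the appliance does with the more specific Allow line. Treat this as a rough check only.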
meta tag
Prevent the search appliance crawler (as well as other crawlers) from indexing or following links in a specific HTML page.
Embed a robots meta tag in the head of the HTML page.
The search appliance crawler obeys the index, noindex, follow, and nofollow values in meta tags.
    <meta name="robots" content="index, nofollow">
    <meta name="robots" content="noindex, nofollow">
Control Access to Content (3)
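To verify which robots directives a rendered page actually carries, a short Python sketch using the standard-library HTML parser can help. The class and function names and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "noindex, nofollow" into individual lower-case values.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page for illustration.
PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'
found = robots_directives(PAGE)
print("may index:", "noindex" not in found)
print("may follow links:", "nofollow" not in found)
```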
no_crawl directories
The Google Search Appliance does not crawl any directories named "no_crawl".
You can prevent the search appliance from crawling files and directories by:
  Creating a directory called "no_crawl".
  Putting the files and subdirectories you do not want crawled under the no_crawl directory.
Control Access to Content (4)
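The no_crawl rule can be sketched as a small Python check that looks for a directory segment named exactly "no_crawl" in a URL path. The function name and URLs are hypothetical, and this only illustrates the rule as stated on the slide.

```python
from urllib.parse import urlparse


def under_no_crawl(url):
    """Return True if any directory in the URL path is named exactly no_crawl."""
    segments = urlparse(url).path.split("/")
    # The last segment is the file name (or empty after a trailing slash),
    # so only the segments before it are treated as directories.
    return "no_crawl" in segments[:-1]


# Hypothetical URLs for illustration.
print(under_no_crawl("http://intranet.example.com/hr/no_crawl/salaries.xls"))
print(under_no_crawl("http://intranet.example.com/hr/no_crawl_notes/a.html"))
```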
Database data source information enables the search appliance to access content stored in a database.
To configure a database crawl, provide database data source information on the Crawl and Index > Databases page in the Admin Console.
After you create a new database data source, click the Sync link to start a database crawl.
Configuring Database Crawl
Google Search Appliance – Databases
A collection lets you search over a specific part of the index.
For example, you may want to create a products collection or an FAQ collection that supports searches only within the products or FAQ part of your index.
The maximum number of collections for a search appliance is 200.
Use the Crawl and Index > Collections page to create a collection: in the Collection Name text box, type a name for the new collection.
Manage a collection by:
  Editing a Collection
  Exporting and Importing a Collection Configuration
  Deleting a Collection
Collections
Google Search Appliance – Collections
A front end enables you to change the look and feel of the search and search result pages your users access.
You can customize these pages to display your organization's colors, fonts, and design. If you have multiple collections, you can make each front end appear in a different format, and have its own configuration options.
Use the Serving > Front Ends page to create a front end: in the Front End Name field, enter a name for the new front end.
Manage a front end by:
  Editing a Front End
  Deleting a Front End
Front Ends
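Collections and front ends come together in the search request. The Python sketch below builds a search URL that names both. The parameter names (q, site, client, output, start, num) are given as I understand them from the search protocol reference listed under Resources, so verify them there. The host, collection, and front end names are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(host, query, collection, front_end, start=0, num=10):
    """Build a search URL restricted to one collection and one front end."""
    params = {
        "q": query,              # the user's search terms
        "site": collection,      # collection to search (assumed parameter name)
        "client": front_end,     # front end to use (assumed parameter name)
        "output": "xml_no_dtd",  # request XML results without a DTD reference
        "start": start,          # zero-based offset of the first result
        "num": num,              # number of results per page
    }
    return "http://" + host + "/search?" + urlencode(params)


# Hypothetical appliance host, collection, and front end.
url = build_search_url("gsa.example.com", "mura cms", "products", "products_frontend")
print(url)
```

Keeping the collection and front end as arguments lets one site search several collections, each with its own look and feel.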
Google Search Appliance – Front Ends
Crawl diagnostics provide detailed information about appliance crawl status for a domain, host, directory, or URL.
Crawl Diagnostics
Google Search Appliance - Crawl Diagnostics
Google Search Appliance – Secret Recipe
"The appliance uses a sophisticated algorithm to generate the results
bla… bla ..."
Deploy Mura Plugin
Mura – Plugin
Search Code
GSA Plugin - Search
Search results code
GSA Plugin - Results
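The plugin's results code is not reproduced in this transcript. As a rough illustration of what results handling involves, here is a Python sketch that parses an XML response. The element names (GSP, RES, M, R, U, T, S) are as I understand them from the XML reference listed under Resources. The sample document, the function name, and the handling of a missing RES element are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical response, hand-written for illustration.
SAMPLE_XML = """<GSP VER="3.2">
  <Q>mura cms</Q>
  <RES SN="1" EN="2">
    <M>2</M>
    <R N="1">
      <U>http://intranet.example.com/products/a.cfm</U>
      <T>Product A</T>
      <S>Snippet for product A</S>
    </R>
    <R N="2">
      <U>http://intranet.example.com/products/b.cfm</U>
      <T>Product B</T>
      <S>Snippet for product B</S>
    </R>
  </RES>
</GSP>"""


def parse_results(xml_text):
    """Extract the estimated total and the url, title, snippet of each result."""
    root = ET.fromstring(xml_text)
    res = root.find("RES")
    if res is None:
        # Assumption: a response with no matches carries no RES element.
        return {"estimated_total": 0, "results": []}
    results = [
        {
            "url": r.findtext("U", default=""),
            "title": r.findtext("T", default=""),
            "snippet": r.findtext("S", default=""),
        }
        for r in res.findall("R")
    ]
    return {"estimated_total": int(res.findtext("M", default="0")), "results": results}


parsed = parse_results(SAMPLE_XML)
for item in parsed["results"]:
    print(item["title"], item["url"])
```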
DEMO
GSA Plugin – DEMO
Google Search Appliance – Secret Recipe
http://docs.getmura.com/
http://www.getmura.com/marketplace/apps/fw1-plugin-template/
https://developers.google.com/search-appliance/documentation/614/
https://developers.google.com/search-appliance/documentation/614/xml_reference
http://www.robotstxt.org/meta.html
http://muracms.com/forum/
Resources
Thanks to Oğuz Demirkapi for helping to prepare the presentation.
Acknowledgements
Q & A
?