Search Engine Hacking
steve at snakeoillabs dot com
1. What is SEH?
2. Tools Armoury
3. Exploiting SEH
4. Countermeasures
What is SEH?
Definition: Search Engine Hacking (SEH)
Function: noun
SEH is the malicious use of indexing technologies in order to identify, fingerprint and exploit at-risk systems, data and people.
In other words: Using Search Engines and other indexing facilities to find juicy information and 0wnable b0x3n/w4r3z/d00dz
What is SEH?
How much data are we talking about?
http://searchenginewatch.com/reports/article.php/2156481
What is SEH?
Only now there’s much more to contend with:
•IRC search engines
•BitTorrent/P2P search engines
•FTP search engines
•Flickr.com
•Blogs
•Your.application.here/search/
•Oh, and Google
But there’s more… (Whaddya mean you thought there was only Google?)
Tools Armoury
SiteDigger (http://www.foundstone.com)
•The ‘original’ Google Scanning tool (other than a web browser, of course)
•Requires a Google API Key
•Uses FSDB and GHDB
•Searches deliberately restricted
•The ‘Internet Scanner’ of SEH tools
Tools Armoury
SiteDigger
•Pros
  •Slick reporting
  •Well maintained
  •FSDB sometimes outdated, but well categorized
•Cons
  •Needs Google API key
  •Google-specific
  •Restricted searches mean stuff gets missed
•Overall
  •A good tool, ultimately crippled by restrictions
Tools Armoury
Apollo (http://worm.ccert.edu.cn/GoogleHacking/Apollo/)
•Written by Mimi & Spark of the Good Cat Studio
•No Google API key required, but still Google-only
•No restrictions on searches
•Similar functionality to SiteDigger, minus the snazzy reporting
Tools Armoury
Apollo
•Pros
  •No restrictions
  •No Google API key needed
  •Auto-updates GHDB
•Cons
  •Google-specific
  •Clunky interface
  •No direct links in results
•Overall
  •Better than SiteDigger, but needs a better reporting interface
Tools Armoury
Wikto (http://www.sensepost.com/research/wikto/)
•Port of Nikto to Windows with bells and whistles
•Google hacking functionality à la GooScan
•Needs Google API key
•Site-orientated
•Requires registration with Foundstone’s portal!!!!
Tools Armoury
Wikto
•‘BackEnd’ module imports data from Googler for use in data mining…
Tools Armoury
Wikto
•‘Wikto’ module functions as Nikto on other systems, with ability to import dirs from Googler and BackEnd
Tools Armoury
Wikto
•‘GoogleHacks’ Module provides an automated GoogleDork searching facility
Tools Armoury
Wikto
•Pros
  •Directory harvesting via Google
  •Wikto port
•Cons
  •Google API key required
  •Complicated
  •Google-specific
•Overall
  •Feels like several tools bundled into one
Tools Armoury
Athena (http://www.snakeoillabs.com)
•The ‘original’ Search Engine Hacking tool (other than a web browser, of course)
•No API key required
•Features GHDB editor and extensive logging functionality
•Not Google-specific!
•Manual tool
Tools Armoury
Athena
•Pros
  •Cool logging/note-taking functionality
  •Can edit GHDB information within Athena
  •Use datagrid or raw XML editing facilities
  •Designed for non-techies as well as power users
  •Suitable for Yahoo, AltaVista, <your search facility here>
•Cons
  •No automation
  •Tabbed browsing would be nice
•Overall
  •Unique … so far.
Exploiting SEH
It’s as easy as 1-2-3:
• Load GHDB.xml into Athena
• Select your query type (and enter any filters)
• Hit Search
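The three steps above can be sketched in a few lines of Python. This is not how Athena works internally: the XML layout of the sample database and the SEARCH_BASE endpoint are hypothetical placeholders, and the filter shown is a site: restriction so the queries stay pointed at a domain you are testing.

```python
# Sketch: load GHDB-style entries from XML and turn each into a search URL,
# mirroring "load the database, pick a query, add filters, hit search".
# ASSUMPTIONS: the XML layout and the SEARCH_BASE endpoint are hypothetical.
import urllib.parse
import xml.etree.ElementTree as ET

# Hypothetical stand-in for GHDB.xml (the real file's schema may differ)
SAMPLE_XML = """<ghdb>
  <entry category="Backup files"><query>filetype:bak</query></entry>
  <entry category="Directory listings"><query>intitle:"index of"</query></entry>
</ghdb>"""

# Hypothetical search endpoint; substitute the engine you are testing against
SEARCH_BASE = "https://search.example.com/search?q="


def load_queries(xml_text):
    """Return (category, query) pairs for every <entry> in the XML."""
    root = ET.fromstring(xml_text)
    return [(entry.get("category"), entry.findtext("query", default="").strip())
            for entry in root.iter("entry")]


def build_url(query, extra_filter=""):
    """Append an optional filter (e.g. site:example.com) and URL-encode the whole query."""
    full_query = f"{query} {extra_filter}".strip()
    return SEARCH_BASE + urllib.parse.quote_plus(full_query)


if __name__ == "__main__":
    for category, query in load_queries(SAMPLE_XML):
        print(f"{category}: {build_url(query, 'site:example.com')}")
```

Engine-specific operators such as filetype:, intitle: and site: are passed through unchanged and only URL-encoded, so the same approach works for any engine that accepts a q= style query string.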
Exploiting SEH
Thinking of buying a digital camera?
• Load Digicams.xml into Athena
• Select your camera manufacturer (and enter any filters, e.g. wedding, holiday, ‘amateur’)
• Hit Go!
Exploiting non-Google SEH
An example:
•Create a catalog in Indexing Service for the file store
•Associate the catalog with the default web site via the catalog properties
•Use the Indexing Service query object (ixsso.Query) in ASP
•Voilà! Instant search facility!
Exploiting non-Google SEH
What happens when you’re not sure what you’re indexing?
Exploiting non-Google SEH
Things to try on your own app
•.htaccess/.htpasswd stuff
  •GET POST
  •Deny from all
•IIS Indexing
  •REM (from autoexec.bat)
  •SELECT (from backup .asp and .aspx files)
•Other stuff
  •<?php
  •#!/usr/bin/perl
  •root:0:
  •.inc, .htm, .txt, .bak
  •</>
  •<div> (try other HTML tags)
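The same checks can also be run offline against your own content before a search facility ever sees it. A minimal sketch, assuming you have a local copy of the indexed directory. The marker and extension sets are a hypothetical subset of the list above, and plain substring matching is noisy ("REM " also matches "REMOVE "), so treat every hit as something to review rather than proof of a leak.

```python
# Sketch: an offline self-audit for the leakage markers listed above.
# ASSUMPTIONS: you run it over a local copy of whatever your search facility
# indexes; the marker lists are a hypothetical subset of the slide's list.
# Substring matching is noisy, so treat hits as leads, not verdicts.
import os

CONTENT_MARKERS = ["Deny from all", "<?php", "#!/usr/bin/perl", "root:0:", "REM ", "SELECT "]
RISKY_NAMES = {".htaccess", ".htpasswd"}
RISKY_EXTENSIONS = {".inc", ".txt", ".bak"}


def check_file(path, text):
    """Return the reasons (possibly none) why this file should not be searchable."""
    reasons = []
    name = os.path.basename(path).lower()
    ext = os.path.splitext(name)[1]  # "" for dot-files such as .htpasswd
    if name in RISKY_NAMES:
        reasons.append("access-control file")
    if ext in RISKY_EXTENSIONS:
        reasons.append("risky extension " + ext)
    for marker in CONTENT_MARKERS:
        if marker in text:
            reasons.append("contains " + repr(marker))
    return reasons


def scan_tree(root):
    """Walk root and yield (path, reasons) for every file with at least one hit."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            reasons = check_file(path, text)
            if reasons:
                yield path, reasons
```

Typical use is `for path, reasons in scan_tree("/path/to/indexed/copy"): print(path, reasons)`, where the path is a placeholder for your own copy of the indexed files.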
Countermeasures
Google-specific countermeasures
•Add the following to pages that should be left out of the index
  •<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
•Remove ‘snippets’ but still index the link
  •<META NAME="GOOGLEBOT" CONTENT="NOSNIPPET">
•Stop archiving (no cached copy)
  •<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
•Remove my page NOW!
  •http://services.google.com:8882/urlconsole/controller
  •http://www.google.com/remove.html
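To confirm which of these directives a page actually carries, a small parser built on Python's standard html.parser module is enough. A sketch, assuming only <meta name="robots"> and <meta name="googlebot"> tags matter for your check; robots.txt is handled separately.

```python
# Sketch: list the robots directives a page sends to Google via <meta> tags.
# ASSUMPTION: only name="robots" and name="googlebot" meta tags are examined.
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for token in attributes.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().upper())


def robots_directives(html):
    """Return the set of upper-cased directives found in robots/googlebot meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```

Self-closing tags such as `<meta ... />` are handled too, because HTMLParser routes them through handle_starttag by default. A missing "NOARCHIVE" in the returned set means Google may still keep a cached copy.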
Countermeasures
HTTP server configuration countermeasures
•Robots.txt
  •Some indexing systems obey it
  •Some don’t
•.htaccess/.htpasswd
  •Make sure it’s configured properly!
•Indexing Services
  •Make sure indexed files are held in a specific directory, not the web root!
  •Figure out what you’re indexing – you’re only indexing files with specific extensions, right?
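Robots.txt only helps against crawlers that choose to honour it. Python's standard urllib.robotparser shows the check a well-behaved crawler makes before fetching. The robots.txt content, the ExampleBot user agent and the URLs are hypothetical.

```python
# Sketch: the robots.txt check a *compliant* crawler performs before fetching.
# ASSUMPTIONS: robots.txt content, user agent and URLs are hypothetical.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /backup/
Disallow: /admin/
"""


def build_parser(text):
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/backup/db.bak"):
        verdict = "allowed" if rp.can_fetch("ExampleBot", url) else "disallowed"
        print(url, verdict)
```

Nothing forces a crawler to run this check. Disallow lines are also publicly readable, so listing a directory such as /backup/ tells anyone who reads robots.txt where to look; keeping such files out of the web root is the safer fix.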
Countermeasures
Procedural countermeasures
•Newsgroups/mailing lists
  •Use a Hushmail/Hotmail account
  •Use X-No-Archive: Yes headers in Usenet postings
  •Don’t post information about your systems, data or people (e.g. specify Solaris rather than a specific Solaris patch level)
•Check for information leakage periodically
  •Don’t use site: restrictions – you want to find all occurrences that affect you, not just the ones on your site!
•Web sites
  •Ensure that backups, test data etc. are held outside of the web root.
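For the newsgroup advice in the list above, the X-No-Archive header is just one extra line in the article headers. A sketch using Python's email.message.EmailMessage to build such an article; the sender, newsgroup and text are hypothetical, and actually posting it (via NNTP or a news client) is out of scope. The header is a request that cooperating archives honour, not an enforcement mechanism.

```python
# Sketch: build a Usenet-style article that asks archives not to keep it.
# ASSUMPTIONS: sender, newsgroup and text are hypothetical; posting is out of scope.
from email.message import EmailMessage


def build_post(sender, newsgroup, subject, body):
    """Return an article with X-No-Archive: Yes set alongside the usual headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Newsgroups"] = newsgroup
    msg["Subject"] = subject
    msg["X-No-Archive"] = "Yes"  # opt-out request, honoured only by cooperating archives
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    post = build_post("anon@example.com", "alt.example.test", "Solaris question",
                      "Seeing odd logins on a Solaris box; any pointers?")
    print(post.as_string())
```

Note that the sample body names the OS only as "Solaris", in line with the advice not to publish patch levels.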
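The periodic leakage check in the list above can be partly scripted. A sketch, assuming you collect result URLs yourself for each run (by hand or with one of the tools covered earlier); the organisation terms and keywords are hypothetical. The queries deliberately carry no site: restriction, and new_hits reports only the URLs that were not present in the previous run.

```python
# Sketch of a periodic leakage check.
# ASSUMPTIONS: organisation/keyword terms are hypothetical, and result URLs
# are gathered separately (manually or with another tool) for each run.


def leakage_queries(org_terms, sensitive_terms):
    """One query per (organisation term, sensitive term) pair, with no site: filter."""
    return [f'"{org}" {term}' for org in org_terms for term in sensitive_terms]


def new_hits(previous_urls, current_urls):
    """URLs seen in this run that were not seen in the previous run, sorted."""
    return sorted(set(current_urls) - set(previous_urls))


if __name__ == "__main__":
    for query in leakage_queries(["example.com", "Example Ltd"], ["password", "filetype:bak"]):
        print(query)
```

Keeping only the difference between runs makes a weekly or monthly check manageable: you review what is new rather than re-reading every result.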
Countermeasures
Further Info/Resources
•Info
  •Google Hacking for Penetration Testers (Johnny Long)
  •johnny.ihackstuff.com
  •www.searchlore.org
•Tools
  •SiteDigger: www.foundstone.com
  •Wikto: http://www.sensepost.com/research/wikto/
  •Apollo: http://worm.ccert.edu.cn/GoogleHacking/Apollo
  •Athena: www.snakeoillabs.com