Advanced web searching, IFEG, 3rd April 2012
DESCRIPTION
Presentation on advanced search given to the Information for Energy Group (IFEG) Spring Symposium. Hosted by the Energy Institute, London, 3rd April 2012
TRANSCRIPT
Advanced web searching
Information for Energy Group (IFEG) Spring Seminar
Energy Institute, London
3rd April 2012
Karen Blakeman
RBA Information Services
Slides are available at http://www.rba.co.uk/as/
Twitter: @karenblakeman
http://www.rba.co.uk/
This presentation is licensed under a Creative Commons Attribution 3.0 License
Energy Balance: 100 kW Hydroelectric Turbine at Mapledurham. http://ergobalance.blogspot.co.uk/2011/11/100-kw-hydroelectric-turbine-at.html
Trends in search
No longer straightforward text searching of web pages and documents
Localisation
Personalisation
Social
Mobile
11/04/23 www.rba.co.uk 2
General search tools
Google - 91% of UK market, 67% US market
Bing/Yahoo
– Yahoo now uses Bing's database and search algorithms for web, image, video and news search
Alternatives to the "big two"
– DuckDuckGo
– Blekko
– Yandex.com
Increasingly need to be aware of specialist tools
Sanity checking Google
http://disruptivesearcher.wordpress.com/2012/02/27/sanity-checking-google/
“if I hadn’t searched across more than Google for data on a small, new company that I was asked to research recently, I would have missed out on some very significant information that Google just wasn’t showing me.”
How Google started
11 November 1998, The Internet Archive www.archive.org
How was Google different?
Links (citations) a major part of ordering search results
http://www.seobook.com/learn-seo/collateral-damage.php
Where is Google now?
2001: Revenues $86,426 thousands; Net Income $10,964 thousands
2011: Revenues $37,905 millions; Net Income $9,737 millions
http://investor.google.com/financial/tables.html
2011 – 96% of revenues are from advertising
Google is mass-market and consumer-oriented. Serious researchers wanting reliable, structured search are a minuscule fraction of its customer base.
New How People Spend Their Time Online – Stephen's Lighthouse http://stephenslighthouse.com/2012/03/14/how-people-spend-their-time-online/
How Google organises and sorts information
Has a primary index of higher "quality" documents and a secondary index. Only the primary index is searched when running straightforward searches. Secondary index comes into play with more complex searches and if a small number of results are found.
“Dear Bing, We Have 10,000 Ranking Signals To Your 1,000. Love, Google” http://searchengineland.com/bing-10000-ranking-signals-google-55473
Over 200 “signals” and each may have over 50 variations
How Google ranks and organises your results
Google personalizes and tailors your results depending on your location, computer/device, browser, past searches, what you have looked at in the past, your +1s, your Google+ account, what you had for breakfast...and anything else it can find by rummaging around in your Google dashboard
To see what's in your dashboard, log in to your Google account and go to http://www.google.com/dashboard/
Also see Google personalisation: web history isn’t the only problem http://www.rba.co.uk/wordpress/2012/02/22/google-personalisation-web-history-isnt-the-only-problem/
Google to Launch Third-Party Commenting Platform http://thenextweb.com/google/2012/03/27/google-to-launch-third-party-commenting-platform-to-rival-facebook/
Bing does it all as well!
– "Adaptive search"
– links with Facebook
Sign out of accounts, clear cookies, switch off history
DuckDuckGo – no tracking, no personalisation
Private browsing/No-tracking options in browser
Use Chrome Incognito (Chrome owned by Google!)
Google's new Privacy Policy
“Our new Privacy Policy makes clear that, if you’re signed in, we may combine information you’ve provided from one service with information from other services. In short, we’ll treat you as a single user across all our products, which will mean a simpler, more intuitive Google experience.”
What I see on my screen for a search is not what you’ll see on yours.
Google totally loses the plot
For 10 days in February 2011: coots = lions
Google decides that coots are really lions
– http://www.rba.co.uk/wordpress/2011/02/12/google-decides-that-coots-are-really-lions/
Update on coots vs. lions
– http://www.rba.co.uk/wordpress/2011/02/21/update-on-coots-vs-lions/
Coots = lions
Three search tricks
These three techniques can change what Google (and other search engines) decides to give you and also the order of the results.
Repeat important search terms
– coots coots mating behaviour (found coots)
Change the order of your terms
– mating behaviour coots (found coots)
Change one of your search terms
– coots mating behaviour (found lions)
– coots courtship behaviour (found coots)
– coots mating ritual (found coots)
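The three tricks above are mechanical enough to generate in bulk. The following is a minimal Python sketch, not anything Google provides: the function name `query_variants` and its parameters are hypothetical, and it only builds the query strings. What each variant actually returns still has to be checked by running the searches.

```python
def query_variants(terms, emphasise, swap=None):
    """Return the original query plus variations using the three tricks.

    terms     -- list of search terms in their original order
    emphasise -- the important term to repeat and to move
    swap      -- optional dict mapping a term to alternative wordings
    """
    original = " ".join(terms)
    variants = [original]

    # Trick 1: repeat the important search term.
    variants.append(f"{emphasise} {original}")

    # Trick 2: change the order by moving the important term to the end.
    rest = [t for t in terms if t != emphasise]
    variants.append(" ".join(rest + [emphasise]))

    # Trick 3: change one term at a time for an alternative.
    for old, alternatives in (swap or {}).items():
        for new in alternatives:
            variants.append(" ".join(new if t == old else t for t in terms))

    return variants


for query in query_variants(
    ["coots", "mating", "behaviour"],
    emphasise="coots",
    swap={"mating": ["courtship"], "behaviour": ["ritual"]},
):
    print(query)
```

Running each generated query and comparing the first page of results is a quick way to spot when a search engine has reinterpreted one of the terms.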
Excluding pages containing words
Want to exclude pages containing a term? Place a - (minus sign) before the term
Use with care as may miss important material
Excluding lions from our bizarre coots search
coots mating behaviour -lions
gave us:
Coots=lions was an extreme example of how Google can work
We think Google was doing the following:
- assumed a typing error or was running a mobile/smartphone predictive text algorithm (coots=cats)?
- ran an automatic variation/synonym search on cats?
- used a search frequency rule and found that lions mating behaviour was requested more than cats?
Dear Google, stop messing with my search http://www.rba.co.uk/wordpress/2011/11/08/dear-google-stop-messing-with-my-search/
Google no longer looks for all of your terms in a page
See what Google sees
Hover over a result and a "preview" of the page should appear to the right together with a Cached link – this is Google's copy of the page
“When you do a multi-term query on Google (even with quoted terms), the algorithm sometimes backs-off from hard ANDing all of the terms together.......it’s clear that people will often write long queries (with anywhere from 5 to 10 terms) for which there are no results. Google will then selectively remove the terms that are the lowest frequency to give you some results (rather than none)....Soft AND is a way to reduce the overall frustration and give the searcher something to examine (and with luck, a chance to reformulate their query).”
Dan Russell
http://www.rba.co.uk/wordpress/2011/11/08/dear-google-stop-messing-with-my-search/#comments
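Dan Russell's description of "soft AND" can be illustrated with a toy example. The Python sketch below is an illustration under stated assumptions, not Google's implementation: `soft_and_search` is a hypothetical name, the "index" is just a short list of strings, and a term's frequency is taken to be the number of documents that contain it.

```python
def soft_and_search(query_terms, documents):
    """Toy 'soft AND': require every term, then back off.

    Returns (terms_actually_used, indexes_of_matching_documents).
    """
    docs = [set(d.lower().split()) for d in documents]
    terms = [t.lower() for t in query_terms]

    # Assumed frequency measure: number of documents containing the term.
    freq = {t: sum(t in d for d in docs) for t in terms}

    while terms:
        # Hard AND: a document matches only if it contains every remaining term.
        hits = [i for i, d in enumerate(docs) if all(t in d for t in terms)]
        if hits:
            return terms, hits
        # No results: drop the lowest-frequency term and try again.
        terms.remove(min(terms, key=lambda t: freq[t]))

    return [], []


pages = [
    "coots mating behaviour on lakes",
    "lions mating behaviour",
    "coots nesting",
]
used, hits = soft_and_search(["coots", "mating", "behaviour", "mapledurham"], pages)
print(used, hits)
```

In this example no page contains all four terms, so the rarest one is silently dropped. That is the behaviour the slide warns about: the results look plausible but no longer contain everything that was asked for.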
Verbatim
Forces Google to run an exact match search. Run your search first and then select Verbatim from the left hand menu on your results page
Cannot be combined with time options in the side bar
Google: Verbatim for exact match search
http://www.rba.co.uk/wordpress/2011/11/18/google-verbatim-for-exact-match-search/
Google doing its own thing can be good
Google's new(ish) social network Google Plus (Google+)
http://plus.google.com/
Google trying to force people to create a Google+ profile http://marketingland.com/google-now-forcing-all-new-users-to-create-google-enabled-accounts-3912
Search Plus Your World (SPYW), also referred to as Search+, is now available in Google.com and is the default. Gives priority to content from people in your Google+ network if you are signed in to your account.
(And the next Google killer is….Google! http://www.rba.co.uk/wordpress/2012/01/30/and-the-next-google-killer-is-google/)
Search gets Social - Resistance is Futile
Google.com [Not signed in to a Google account]
Signed in to Google account on Google.com
About 4,940,000 results (0.74 seconds) !!!
Google results side bar
These help you focus your search
"Everything" does NOT search everything
Vary depending on type of search e.g. web, news, images
Open up the "more" options to see everything
Google side bars
Images Videos News Books Blogs
Google Images
Similar images
Google Images – use an existing image
Click on the camera icon in the search box and then either enter the URL of an image or upload it
Flickr
Flickr Creative Commons http://www.flickr.com/creativecommons or advanced search screen http://www.flickr.com/search/advanced/
Images - other sources for Creative Commons and public domain images
Wikimedia Commons http://commons.wikimedia.org/ (check the licence information towards the bottom of the page e.g. http://commons.wikimedia.org/wiki/File:Thomas_Beach_by_Thomas_Beach.jpg)
MorgueFile.com http://www.morguefile.com/ - public domain
Geograph http://www.geograph.org.uk/ Creative Commons 2.0
Most of the images on US government web sites are public domain (but do check)
NASA http://www.nasa.gov/ - public domain
Google Video
Not the same as YouTube
Video
Bing Videos
YouTube
Vimeo.com
DailyMotion.com
Blinkx - http://www.blinkx.com/
Google News
Google News Archive
Silobreaker.com
Google Books
Blogs
Related searches
Translated foreign pages for a different perspective
Google suggests languages from context of search but you can choose your own
Your search is translated and the results are translated into your language
Advanced search
Use search commands or Advanced Search screen
http://www.google.co.uk/advanced_search
Problems finding information on a particular site?
Use Google's site: command
Can combine with date options in side menu
Or if you are interested in UK academic reports
Looking for a particular type of information for example statistics, research report, expert presentation?
Use the filetype: command
For statistics
world oil consumption filetype:xls
world oil consumption filetype:xlsx
world oil consumption filetype:xlsx OR filetype:xls
For government, research, industry reports
UK oil consumption forecasts filetype:pdf
For conference presentations or trying to locate an expert
renewable energy UK filetype:ppt
renewable energy UK filetype:pptx
renewable energy UK filetype:ppt site:ac.uk
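Queries that combine filetype:, site: and the minus sign are easy to mistype, so it can help to build them from their parts. The following is a small Python sketch under stated assumptions: `build_query` and `search_url` are hypothetical helper names, and the `https://www.google.co.uk/search?q=...` URL pattern is an assumption about how to hand the query to Google rather than something taken from the slides.

```python
from urllib.parse import urlencode


def build_query(terms, site=None, filetypes=(), exclude=()):
    """Assemble a search string from plain terms plus search commands."""
    parts = [terms]
    if filetypes:
        # Several file types are joined with OR, as in the statistics example.
        parts.append(" OR ".join(f"filetype:{ft}" for ft in filetypes))
    if site:
        parts.append(f"site:{site}")
    # A minus sign directly before a word excludes pages containing it.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


def search_url(query, base="https://www.google.co.uk/search"):
    """Return a URL for the query (the base URL is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"


stats = build_query("world oil consumption", filetypes=["xlsx", "xls"])
slides = build_query("renewable energy UK", site="ac.uk", filetypes=["ppt"])
print(stats)
print(slides)
print(search_url("UK oil consumption forecasts filetype:pdf"))
```

Keeping the plain terms separate from the commands makes it simple to rerun the same topic against different file types or sites.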
Numerical range search
Anything to do with numbers
Use advanced search screen
or
1st number followed by two full stops followed by 2nd number followed by unit of measurement (if applicable)
– Norway oil production forecasts 2012..2020
– Norway oil production forecasts 2012..2020 filetype:xls OR filetype:xlsx
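The range syntax can be sketched in Python as follows. This is an illustration only: `range_query` and `mentions_number_in_range` are hypothetical names, the space placed before the unit is an assumption, and the second function is a rough local approximation of what a range search is asking for, not a description of how Google matches numbers.

```python
import re


def range_query(terms, low, high, unit=None):
    """Build a query using the number..number range syntax."""
    if low > high:
        raise ValueError("low must not exceed high")
    range_part = f"{low}..{high}"
    if unit:
        # Assumption: the unit follows the second number after a space.
        range_part += f" {unit}"
    return f"{terms} {range_part}"


def mentions_number_in_range(text, low, high):
    """Return True if any whole number in the text lies within the range."""
    return any(low <= int(n) <= high for n in re.findall(r"\d+", text))


print(range_query("Norway oil production forecasts", 2012, 2020))
print(mentions_number_in_range("Output is forecast to fall by 2015", 2012, 2020))
print(mentions_number_in_range("Production peaked in 2001", 2012, 2020))
```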
Advanced commands continued
inurl: for example
inurl:"carbon capture" targets
intitle: for example
intitle:"carbon capture" targets
asterisk (*) to search for terms separated by 1-5 words (may have to use quotation marks)
solar * panels
"solar * panels"
Picks up solar PV panels, solar photovoltaic panels, solar water heating panels
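To check locally which phrases an asterisk search of this kind would cover, the wildcard can be approximated with a regular expression. This Python sketch is an approximation only: `wildcard_to_regex` is a hypothetical helper, and the 1 to 5 word gap simply follows the slide's description, since Google does not document the exact matching rules.

```python
import re


def wildcard_to_regex(phrase, min_words=1, max_words=5):
    """Turn 'solar * panels' into a regex allowing 1-5 words at each asterisk."""
    parts = [re.escape(p.strip()) for p in phrase.split("*")]
    # Each asterisk becomes: whitespace, then between min and max whole words.
    gap = rf"\s+(?:\S+\s+){{{min_words},{max_words}}}"
    return re.compile(r"\b" + gap.join(parts) + r"\b", re.IGNORECASE)


pattern = wildcard_to_regex("solar * panels")
for text in [
    "solar PV panels",
    "solar photovoltaic panels",
    "solar water heating panels",
    "solar panels",
]:
    print(text, "->", bool(pattern.search(text)))
```

The last example shows why the asterisk is not a substitute for a plain search: with at least one word required in the gap, "solar panels" on its own is not matched by this pattern.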
Synonyms
Google often looks for variations of your terms but you cannot rely on it always happening
Use the tilde ~ before a term to look for what Google considers are synonyms
– ~energy will pick up oil, fuel, gas, electricity
No information/documentation on how synonyms are created
Very general, consumer oriented rather than scientific
Can be used with Verbatim
Google alternatives - Bing and Yahoo
Yahoo now uses Bing's database and ranking
Many of the Advanced Search commands are similar to Google’s, see Search Tools Summary and Comparison http://www.rba.co.uk/search/compare.shtml
Most of the interesting developments and features are only available in the US version
Results tend to be more consumer/retail focused unless using advanced search features
Coverage not identical to Google’s - sometimes yields important unique content
Sometimes more up to date than Google
DuckDuckGo
http://duckduckgo.com/
DuckDuckGo – silly name but a neat little search tool http://www.rba.co.uk/wordpress/2011/11/07/duckduckgo-silly-name-but-a-neat-little-search-tool/
No tracking, no “filter bubble”
Commands: site:, filetype:, sort:date to sort by date (uses results from Blekko)
Syntax and keyboard shortcuts at http://duckduckgo.com/goodies.html
Yandex.com
Yandex.com advanced search
Blekko
http://blekko.com/
slashtags for sorting by date (/date), searching for images (/images) and videos (/videos)
create your own to search your specified list of sites (similar to Google Custom Search Engines)
wind turbine electricity generation /karenblakeman/renewable
“Musings about librarianship: Using Blekko to search across thousands of library sites” http://musingsaboutlibrarianship.blogspot.com/2010/11/using-blekko-to-search-across-thousands.html
Blekko
Cannot do filetype, inurl, intitle searches
Drop down menu next to page in results list for
– site search (or use /site), similar pages (or use /similar), inbound links to the page (or use /links)
Google Scholar
http://scholar.google.com/
A useful place to start your research or if you are looking for a specific paper, but there is no source list, it is not comprehensive and it omits many key scientific publications
Both peer-reviewed and un-reviewed articles, pre-prints, institutional repositories, references to books, citations
Does not use publishers’ metadata, author search unreliable, search on year of publication unreliable
Sometimes does strange things with your search terms but you can still use + before a term to force exact match search
Sometimes finds unique content
Authors encouraged to claim papers
Microsoft Academic Search
http://academic.research.microsoft.com/
Microsoft Academic Search
Problems
– coverage
– sometimes gets the author completely wrong
“Will the Real Scott Wilson Please Stand Up, Please Stand Up” http://ukwebfocus.wordpress.com/2011/09/20/will-the-real-scott-wilson-please-stand-up-please-stand-up/
M Edwards should be Martin Edwards not Maria-Benedicta Edwards
Mendeley.com
Google Public Data Explorer
http://www.google.com/publicdata/
Statistical Review of World Energy 2011 | BP http://www.bp.com/sectionbodycopy.do?categoryId=7500&contentId=7068481
Energy Export Databrowser http://mazamascience.com/OilExport/
GasTrends databrowser http://mazamascience.com/Energy/GasTrends/
Department of Energy and Climate Change http://www.decc.gov.uk/en/content/cms/statistics/energy_stats/energy_stats.aspx
RESTATS
https://restats.decc.gov.uk/cms/welcome-to-the-restats-web-site/
OFFSTATS http://www.offstats.auckland.ac.nz/
Priced industry and market research
Aggregators may not be comprehensive
– use aggregators as an index to see who is publishing on your topic, for example
http://www.marketresearch.com/
http://www.mindbranch.com/
http://www.researchandmarkets.com/
http://www.alacrastore.com/
http://www.reportbuyer.com/ (go to Energy and Utilities)
For emerging markets http://www.securities.com/
Many sites specialising in energy reports - just search on energy market research reports
LinkedIn.com
http://search.twitter.com/
Topsy.com
Icerocket.com
Paper.li - create your own newspaper
Paper.li – individual Twitterstream http://paper.li/karenblakeman
Paper.li – keyword http://paper.li/karenblakeman/1321447614
Create your own Google custom search engine
http://www.google.com/cse/
For
– regularly searched sites
– selected sites on a subject or type of organisation
Cannot include password protected sources or sites where you have to fill in a form to access the information
Information on setting up a Google Custom Search Engine (CSE)
http://www.rba.co.uk/as/GoogleCustomSearchEngines.doc
Google's blog on custom search http://googlecustomsearch.blogspot.com/
1. Think about the type of information you are looking for – news, statistics, experts
2. Get to know the options in Google's sidebar
3. Get to know the advanced search commands for example site: filetype: intitle: numeric range
4. Get to know the alternative search tools and use them!
5. Keep up to date with changes to search tools and in particular Google
Keeping up to date
Inside Search http://insidesearch.blogspot.com/
Official Google Blog http://googleblog.blogspot.com/
Google Scholar Blog http://googlescholar.blogspot.com/
Search Engine Land http://searchengineland.com/
Search Engine Watch http://searchenginewatch.com/
Boolean Black Belt-Sourcing/Recruiting http://www.booleanblackbelt.com/
Karen Blakeman’s Blog http://www.rba.co.uk/wordpress/
Phil Bradley's weblog http://philbradley.typepad.com/