Using Search Engines and Web Crawlers in Social
Science Research
Mike Thelwall
Head, Statistical Cybermetrics Research Group
University of Wolverhampton, UK
http://linkanalysis.wlv.ac.uk
RC33, August 2004
Link Analysis in Social Science Research
  Use to study web phenomena
    E.g. NGO web site interlinking
    E.g. university web site interlinking
  Use to study offline phenomena with web aspects
    E.g. scholarly communication
    E.g. the perception of news events
  The web is a free, accessible, massive data source for information about many aspects of life
What use is hyperlink data to qualitative researchers?
  Part of a mixed methodology
  Numbers to back up theories
  To obtain samples of types of Web pages for qualitative analyses
  Background information on how the Web is used
Quick example 1:
UK university interlinking, with geographic clusters indicated
Quick example 2:
Asia-Pacific university interlinking.
{Research with Alastair Smith, VUW, NZ}
Quick example 3:
Geographic interlinking trends for UK universities.
Talk overview
  A social science approach for link analysis
  Data collection with commercial search engines
  Data collection and analysis with SocSciBot
A social science approach for link analysis 1: Preliminary steps
  1. Formulate an appropriate research question, taking into account existing knowledge of web structure
  2. Conduct a pilot study
  3. Identify web pages or sites that are appropriate to address a research question
  4. Collect link data from a commercial search engine or a personal crawler, taking appropriate safeguards to ensure that the results obtained are accurate
A social science approach for link analysis 2: Validation
  5. Partially validate the link count results through correlation tests
  6. Partially validate the interpretation of the results through a link classification exercise or web author interviews
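The correlation test in step 5 could be sketched as follows. The slides do not name a specific test, so the use of Spearman's rank correlation here is an assumption, as are the function names and the numbers, which are made up purely for illustration. The idea is to check whether link counts rise and fall with some independent measure of the same sites.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with order[i].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the two rank lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: inlink counts for five sites and an independent
# (non-web) measure for the same five sites. Not real results.
inlink_counts = [120, 45, 300, 80, 10]
external_measure = [3.1, 2.0, 4.5, 2.8, 1.2]
rho = spearman(inlink_counts, external_measure)
```

A rank-based test is chosen in this sketch because web link counts are often heavily skewed; that choice is a judgment call and not something the slides prescribe.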
A social science approach for link analysis 3: Reporting
  8. Report results with an interpretation consistent with the link classification exercise
    Include either a detailed description of the classification or exemplars to illustrate the categories
  9. Report the limitations of the study and the parameters used in data collection and processing
Link data from commercial search engines
Commercial search engines can give information about the existence of links in the web Can be used for data collection Advanced interfaces are usually needed, or
special commands
Google
  Can find all links to a given web page with the link: command
  E.g. link:http://www.siswo.uva.nl/rc33/
Yahoo! site-specific searches
  Yahoo! allows searching for links between pairs of web sites/web spaces
  E.g. linkdomain:db.dk +site:ac.uk returns web pages in the ac.uk domain that link to the db.dk site
  [Diagram: a page at …ac.uk/… linking to a page at …db.dk/…]
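The two query forms above can be generated in bulk when many sites are studied. This is a minimal sketch that only builds query strings in the syntax shown on the slides; it does not contact any search engine, the function names are hypothetical, and the commands themselves date from 2004 and may no longer be supported.

```python
def google_link_query(page_url):
    """Build a link: query for pages linking to one given page."""
    return f"link:{page_url}"


def yahoo_interlink_query(target_domain, source_domain):
    """Build a query for pages in source_domain that link to target_domain."""
    return f"linkdomain:{target_domain} +site:{source_domain}"


def interlink_queries(domains):
    """Build one query per ordered (source, target) pair of distinct domains."""
    return {
        (source, target): yahoo_interlink_query(target, source)
        for source in domains
        for target in domains
        if source != target
    }


# The two examples from the slides.
page_query = google_link_query("http://www.siswo.uva.nl/rc33/")
pair_query = yahoo_interlink_query("db.dk", "ac.uk")

# Every ordered pair among a small set of domains.
queries = interlink_queries(["db.dk", "ac.uk"])
```

Ordered pairs are used because links are directional: pages in ac.uk linking to db.dk are counted separately from pages in db.dk linking to ac.uk.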
SocSciBot
  Personal crawler for link research
  Available free at socscibot.wlv.ac.uk
  Crawls sets of web sites and analyses the links between them, producing:
    Link lists
    Link counts
    Network diagrams
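To illustrate how a link list relates to link counts, here is a rough sketch that turns (source URL, target URL) pairs into counts of links between sites. It is not SocSciBot's own code or file format; the function names, the example URLs, and the rule that a "site" is the host name with any leading "www." removed are all assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urlsplit


def site_of(url):
    """Return the lower-cased host name of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def inter_site_link_counts(links):
    """Count links between different sites, ignoring links within one site."""
    counts = Counter()
    for source_url, target_url in links:
        source, target = site_of(source_url), site_of(target_url)
        if source and target and source != target:
            counts[(source, target)] += 1
    return counts


# Hypothetical link list; these URLs are placeholders, not crawl results.
link_list = [
    ("http://www.a.example/staff.html", "http://www.b.example/"),
    ("http://www.a.example/links.html", "http://b.example/research/"),
    ("http://www.b.example/index.html", "http://www.a.example/"),
    ("http://www.a.example/index.html", "http://www.a.example/about.html"),
]
counts = inter_site_link_counts(link_list)
```

The resulting (source site, target site) counts are also the weighted edges that a network diagram of the sites would draw.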
Reprise: Link Analysis in Social Science Research
  Use to study web phenomena
    E.g. NGO web site interlinking
    E.g. university web site interlinking
  Use to study offline phenomena with web aspects
    E.g. scholarly communication
    E.g. the perception of news events
  The web is a free, accessible, massive data source for information about many aspects of life
But don’t forget the need for validation!