Extracting Information from the Links in Academic Webs
Mike Thelwall
Statistical Cybermetrics Research Group
University of Wolverhampton, UK
An overview of methods and results
Contents
1. Introduction to Webometrics
2. Computer Science uses for Web links
3. Main talk: analysing university Web links
   1. Data collection
   2. Data processing
   3. Analysis
   4. Results
Part 1: Introduction to Webometrics
A new area of Information Science
infor-/biblio-/sciento-/cyber-/webo-/metrics
[Diagram relating the fields of informetrics, bibliometrics, scientometrics, webometrics and cybermetrics]
© Lennart Björneborn 2001-2002
Webometrics
• The study of quantitative aspects of the construction and use of information resources, structures and technologies on the Web, drawing on bibliometric and informetric methods (L. Björneborn's definition)
• Four main research areas of Webometric concern:
  • Web page contents
  • link structures (e.g., Web Impact Factors, cohesion of link topologies)
  • search engine performance
  • users’ information behavior (searching, browsing, encountering, etc.)
• Cybermetrics = quantitative studies of the whole Internet, i.e. chat, mailing lists, news groups, MUDs, etc., as well as the Web
© Lennart Björneborn 2001-2002
Part 2: Computer Science uses for Web links
Search engine page ranking, topic identification and similarity matching
PageRank Assumptions:
A page with many links to it is more likely to be useful than one with few links to it
The links from a page that itself is the target of many links are likely to be particularly important
Example
Y
X
X seems to be the most important page since 2 important pages link to it
Simple voting model: round 1
Node votes: 1, 1, 1, 1
Simple voting model: round 2
Node votes: 0, 1, 1.5, 1.5
Simple voting model: round 3
Node votes: 0, 0, 2, 2
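The three rounds of the simple voting model can be reproduced with a short sketch. The slide diagram did not survive extraction, so the graph used here (A links to B, B links to C and D, and C and D are leaves) is an assumption inferred from the vote values. The rule that a node splits its votes equally among its outlinks, while a leaf keeps its votes, is inferred in the same way.

```python
# Hypothetical graph inferred from the slide values: A -> B, B -> C and D.
LINKS = {"A": ["B"], "B": ["C", "D"], "C": [], "D": []}


def simple_voting_round(votes, links=LINKS):
    """One round: each node splits its votes equally among its outlinks.

    Leaf nodes (no outlinks) keep whatever votes they hold.
    """
    new_votes = {node: 0.0 for node in links}
    for node, targets in links.items():
        if targets:
            share = votes[node] / len(targets)
            for target in targets:
                new_votes[target] += share
        else:
            new_votes[node] += votes[node]
    return new_votes


votes = {node: 1.0 for node in LINKS}  # round 1: one vote per node
for round_number in (2, 3):
    votes = simple_voting_round(votes)
    print(round_number, votes)
```

Under these assumptions the votes drain out of A and B and pile up in the two leaf nodes, which matches the values on the round 2 and round 3 slides.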
Revised voting model: round 1
Node votes: 1, 1, 1, 1
• Allocate 1 vote to each node after each voting round
• Remove votes from ‘leaf’ nodes
Revised voting model: round 2
Node votes: 1, 2, 1.5, 1.5
Revised voting model: round 3
Node votes: 1, 2, 2, 2
The middle node has only one link to it, but the page that links to it does not share its votes with any other node
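The revised model can be sketched the same way. As before, the graph (A links to B, B links to C and D) is an assumption inferred from the slide values, not something stated in the talk. Each round, votes travel along links, votes held by leaf nodes are discarded, and every node then receives one fresh vote.

```python
# Hypothetical graph inferred from the slide values: A -> B, B -> C and D.
LINKS = {"A": ["B"], "B": ["C", "D"], "C": [], "D": []}


def revised_voting_round(votes, links=LINKS):
    """One round of the revised model.

    Every node starts the new round with 1 freshly allocated vote and
    adds the shares it receives along inlinks. Leaf nodes have no
    outlinks, so the votes they held are simply dropped.
    """
    new_votes = {node: 1.0 for node in links}
    for node, targets in links.items():
        for target in targets:
            new_votes[target] += votes[node] / len(targets)
    return new_votes


votes = {node: 1.0 for node in LINKS}  # round 1
for round_number in (2, 3):
    votes = revised_voting_round(votes)
    print(round_number, votes)
```

With this graph the values settle at 1, 2, 2, 2 by round 3, so the node with a single inlink ends up level with the two leaves because it receives the whole of A's vote.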
Revised voting model cycling problem
Node votes: 1, 1, 1
PageRank
• Use a proportion of the vote, redistribute the rest
• If the proportion is < 1 then no cycling will occur
• Voting can also be performed by a matrix
• Find votes from the principal left eigenvector of the matrix
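The matrix formulation can be written out as follows. This is a sketch under stated assumptions rather than notation from the talk: $v$ is a row vector of votes, $n$ is the number of pages, $d$ is the proportion of each vote passed along links, $\mathrm{out}(i)$ is the number of outlinks of page $i$, and a leaf page is assumed to spread its whole vote evenly over all pages.

```latex
M_{ij} =
\begin{cases}
\dfrac{1-d}{n} + \dfrac{d}{\mathrm{out}(i)} & \text{if page } i \text{ links to page } j,\\[6pt]
\dfrac{1-d}{n} & \text{if page } i \text{ has outlinks, but none to page } j,\\[6pt]
\dfrac{1}{n} & \text{if page } i \text{ is a leaf,}
\end{cases}
\qquad
v^{(t+1)} = v^{(t)} M,
\qquad
v^{*} = v^{*} M .
```

Every row of $M$ sums to 1, so the total number of votes is preserved from round to round, and the stable vote vector $v^{*}$ is a left eigenvector of $M$ with eigenvalue 1.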
PageRank: round 1
Node votes: 1, 1, 1, 1
• 4 votes in system: each node passes 20% of its vote along its links; the other 80% of each vote, plus the votes lost from leaf nodes, is redistributed equally (3.6 votes in total, 0.9 per node)
PageRank: round 2
Node votes: 0.9, 1.1, 1, 1
1.1 = 0.9 + 0.2 × 1
1 = 0.9 + 0.2 × 0.5 × 1
PageRank: round 3
Node votes: 0.9, 1.08, 1.01, 1.01
1.08 = 0.9 + 0.2 × 0.9
1.01 = 0.9 + 0.2 × 0.5 × 1.1
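The PageRank rounds can be checked with a small sketch of the voting scheme as the slides describe it. The graph (A links to B, B links to C and D, with C and D as leaves) is an assumption inferred from the slide arithmetic. The 20% proportion is taken from the round 1 slide. The function and variable names are illustrative only.

```python
# Hypothetical graph inferred from the slide values: A -> B, B -> C and D.
LINKS = {"A": ["B"], "B": ["C", "D"], "C": [], "D": []}
SHARE = 0.2  # proportion of each vote passed along links (from the slide)


def pagerank_round(votes, links=LINKS, share=SHARE):
    """One round of the slide's scheme.

    The pool holds the 80% of every vote that is held back, plus the 20%
    that leaf nodes cannot pass on. It is split equally among all nodes,
    and each node then adds the shares arriving along its inlinks.
    """
    pool = sum((1 - share) * v for v in votes.values())
    pool += sum(share * votes[n] for n, targets in links.items() if not targets)
    new_votes = {node: pool / len(links) for node in links}
    for node, targets in links.items():
        for target in targets:
            new_votes[target] += share * votes[node] / len(targets)
    return new_votes


votes = {node: 1.0 for node in LINKS}  # round 1
for round_number in (2, 3):
    votes = pagerank_round(votes)
    print(round_number, {n: round(v, 2) for n, v in votes.items()})
```

Under these assumptions the total stays at 4 votes in every round, and the per-node values agree with the round 2 and round 3 slides.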
PageRank summary
• The pages that get the highest PageRank are those that are linked to by many pages or by important pages
• Spammers try to exploit this by creating dummy sites to link to their main sites
Kleinberg’s HITS
• Uses link structures, like PageRank, but also uses page content to identify pages that are useful for a coherent topic on the Web
• An Authority is a page that is linked to by many other pages from the same topic
• A Hub is a page that links to many pages from the same topic
Hubs and authorities
H
A
The HITS algorithm
• Another iterative algorithm
• Each page has a hub value and an authority value
• Unlike PageRank, it is topic specific, and potentially needs to be recomputed for each user query
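The hub and authority updates can be illustrated with a minimal sketch of the standard iteration: a page's authority value is the sum of the hub values of pages linking to it, and its hub value is the sum of the authority values of the pages it links to. The talk gives no graph or parameters, so the toy link set, the page names and the iteration count here are hypothetical, and the step that selects a topic-specific set of pages for a query is left out.

```python
import math


def hits(links, iterations=20):
    """Return (hub, authority) scores for a dict of page -> linked pages."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub values of pages linking in.
        auth = {n: 0.0 for n in nodes}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub update: sum the authority values of pages linked to.
        hub = {n: sum(auth[t] for t in links.get(n, [])) for n in nodes}
        # Normalise both score sets so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


# Hypothetical topic graph: three hub-like pages, two authority-like pages.
toy_links = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hub, auth = hits(toy_links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In this toy graph a1 receives links from all three hub pages and so gets the highest authority value, while h1 and h2 tie for the highest hub value because each links to both authorities.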
Link Algorithms - Overview
• The success of HITS and PageRank indicates the importance of links as a new information source
• More needs to be known about patterns of linking
• But there is still no hard evidence that link approaches work: academic papers report unscientific experiments or inconclusive results
Small worlds
• Short cuts or ‘weak ties’ between otherwise ‘distant’ web clusters (e.g., subject domains, interest communities)
[Diagram labels: transversal link; ‘info. science’; ‘creativity research’]
© Lennart Björneborn 2001-2002
Part 3: Analysing University Link Structures
Information science approaches
Why analyse university link structures?
• Analogies with citation studies
• Ensure that the Web is efficiently used for research communication
• Identify trends in informal scholarly communication
• Suggest improvements in search tools
• Exploratory research: the Web is important and a valid object for scientific study
Methodologies: Data collection
• Web crawler
• AltaVista advanced queries, e.g. host:wlv.ac.uk AND link:albany.edu
• AllTheWeb advanced queries
• Google: does not support the same level of Boolean querying
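As an illustration of the query-based approach, the sketch below builds the advanced query string shown on the slide for every ordered pair in a list of university domains. Only the `host:` and `link:` syntax comes from the slide. The helper names and the domain list are hypothetical, and no request is sent to any search engine.

```python
from itertools import permutations


def link_query(source_domain, target_domain):
    """Query for pages hosted at source_domain that link to target_domain."""
    return f"host:{source_domain} AND link:{target_domain}"


def all_pair_queries(domains):
    """One query per ordered (source, target) pair of distinct domains."""
    return {
        (source, target): link_query(source, target)
        for source, target in permutations(domains, 2)
    }


queries = all_pair_queries(["wlv.ac.uk", "albany.edu"])
for pair, query in queries.items():
    print(pair, query)
```

Ordered pairs are used because a link from one site to another is directional, so each direction needs its own count.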
Methodologies: Data processing 1
• Link counts to target universities (inter-site links only)
• Colink counts: B and C are colinked
• Couplings: D and E are coupled
[Diagram: link graph with nodes A, B, C, D, E and F]
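Colinks and couplings can be sketched with two small set operations. Two pages are treated here as colinked when some third page links to both of them, and as coupled when both link to the same third page, by analogy with co-citation and bibliographic coupling. The arrows of the slide diagram were lost in extraction, so the edges below (A links to B and C; D and E both link to F) are an assumption chosen to match the slide's two statements.

```python
from itertools import combinations

# Hypothetical edges consistent with the slide: A -> B, A -> C, D -> F, E -> F.
LINKS = {"A": {"B", "C"}, "D": {"F"}, "E": {"F"}}


def colinked_pairs(links):
    """Pairs of pages that are both linked to by at least one common page."""
    pairs = set()
    for targets in links.values():
        pairs.update(combinations(sorted(targets), 2))
    return pairs


def coupled_pairs(links):
    """Pairs of pages that both link to at least one common page."""
    pairs = set()
    for first, second in combinations(sorted(links), 2):
        if links[first] & links[second]:
            pairs.add((first, second))
    return pairs


print(colinked_pairs(LINKS))
print(coupled_pairs(LINKS))
```

Neither relationship requires a direct link between the two pages concerned, so a pair can be colinked or coupled without being linked.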
Methodologies: Data processing 2
• Alternative Document Models
  • E.g. count links between domains (ignoring multiple links) instead of pages
[Diagram: pages P1 to P6 grouped under the domains www.wlv.ac.uk and www.albany.edu]
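The difference between counting at the page level and at the domain level can be shown with a short sketch. The two domain names come from the slide. The page URLs and the links between them are invented for illustration, since the diagram did not survive extraction.

```python
from urllib.parse import urlsplit


def domain_links(page_links):
    """Collapse page-to-page links into distinct inter-domain links.

    Links within one domain are ignored, and multiple links between the
    same pair of domains count once.
    """
    pairs = set()
    for source_url, target_url in page_links:
        source = urlsplit(source_url).hostname
        target = urlsplit(target_url).hostname
        if source != target:
            pairs.add((source, target))
    return pairs


# Hypothetical page-level links (paths are illustrative only).
page_links = [
    ("http://www.wlv.ac.uk/p1.html", "http://www.albany.edu/p4.html"),
    ("http://www.wlv.ac.uk/p2.html", "http://www.albany.edu/p5.html"),
    ("http://www.wlv.ac.uk/p3.html", "http://www.albany.edu/p5.html"),
    ("http://www.wlv.ac.uk/p1.html", "http://www.wlv.ac.uk/p2.html"),
]

print(len(page_links), "page-level links")
print(len(domain_links(page_links)), "domain-level link")
```

Here three inter-site page links collapse to a single domain-level link, so one heavily linking page cannot inflate the count on its own.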
Methodologies: Data analysis
• Statistical techniques for evaluating results
  • Correlation with known research performance measures
  • Factor analysis, Multi-Dimensional Scaling, Cluster analysis for patterns
• Simple graphical techniques
• Techniques from Communication Networks research / Geography
Results section 1 – Patterns of links between university Web sites
Results 1: Links associate with research
Counts of links to universities within a country can correlate significantly with measures of research productivity
Links to UK universities counted by domain
Results 2: Links between universities in a country can be related to geography
Results 3: Universities cluster by geographic region
This is clearest for Scotland but also for other groupings, including Manchester-based universities
Coherent clusters are difficult to extract because of overlapping trends
A pathfinder network of UK university interlinking with geographic clusters indicated
Results section 2: Links and subject areas
Results 4: Links to departments associate with research
• In the US, links to chemistry and psychology departments from other departments associate with total research impact
• No evidence of a significant geographic trend
• Disciplinary differences in the extent of interlinking: history Web use is very low
{Research with Rong Tang}
Results 5: Links for precision, colinks and couplings for recall
• For the UK academic Web, about 42% of domains connected by links alone are similar, as are about 43% of domains connected by links, colinks and couplings
• But over 100 times more domains are colinked or coupled than are directly linked
• Colinks and couplings can help the task of finding additional subject-based pages
Results 6: Most links are only loosely related to research
A random sample of links between UK university sites revealed over 90% had some connection with scholarly activity, including teaching and research.
Less than 1% were equivalent to citations
Results section 3: International academic links
Results 7: Linguistic factors in EU communication
• English is the dominant language for Web sites in the Western EU
• In a typical country, 50% of pages are in the national language(s) and 50% in English
• Non-English-speaking countries extensively interlink in English
{Research with Rong Tang}
Results 8: Can map patterns of international communication
Counts of links between Asia-Pacific universities are represented by arrow thickness.
{Research with Alastair Smith, VUW, NZ}
Results section 4: The topology of national academic Webs
Results 9: “Power laws” in the Web
Academic Webs have a topology dominated by power laws, including:
• Counts of links to pages (inlink counts)
• Counts of links from pages (outlink counts)
• Groups of interconnected pages
  • Directed component sizes
  • Undirected component sizes
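A quick way to see what a power law looks like in link data is to plot, or fit, the frequency of each inlink count on log-log axes, where a power law appears as a straight line. The sketch below estimates the slope of that line by ordinary least squares. It is a rough diagnostic rather than a rigorous fitting method, the function name is illustrative, and the inlink counts are synthetic values constructed to follow an exact inverse-square law, not data from the talk.

```python
import math
from collections import Counter


def loglog_slope(values):
    """Least-squares slope of log10(frequency) against log10(value).

    All values must be positive, because log10(0) is undefined.
    """
    freq = Counter(values)
    xs = [math.log10(k) for k in freq]
    ys = [math.log10(freq[k]) for k in freq]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Synthetic data: the number of pages with k inlinks is 10000 / k**2.
page_counts = {1: 10000, 2: 2500, 4: 625, 5: 400, 10: 100}
inlink_counts = [k for k, n in page_counts.items() for _ in range(n)]

print(round(loglog_slope(inlink_counts), 3))
```

For this synthetic sample the fitted slope is -2 up to floating-point error, which corresponds to a power-law exponent of 2. Real link data is noisier, especially in the tail, so a slope obtained this way should be read with caution.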
Results 10: Academic Web topology
A mess!
The future
Results of research leading into:
• Improved Web-related policy making
• Improved Web information retrieval algorithms
• Improved understanding of informal scholarly communication on the Web
• More effective use of the Web by scholars, e.g. via PhD training