Lehrstuhl Informatik 5 (Informationssysteme)
Prof. Dr. Matthias Jarke
I5-PK-0810-1
Manh Cuong Pham, Ralf Klamma
TeLLNet
The Structure of the Computer Science Knowledge Network
Manh Cuong Pham, Ralf Klamma
Information Systems and Database Technology
RWTH Aachen, Germany
Odense, Denmark, August 09, 2010
ASONAM 2010
Agenda
- Introduction
- SNA as a knowledge discovery method
- Data sets: DBLP and CiteSeerX
- Network visualization
- Venue ranking
- Conclusions and Outlook
Introduction
Digital libraries (in computer science)
- DBLP, ACM DL, IEEE Xplore, CiteSeerX, etc.
- Digital media for scientific knowledge conservation
  - Publications
  - Venues
- Development of research communities & research areas
- Knowledge discovery: citation analysis, usage analysis, etc.
- Digital libraries in Web 2.0: Mendeley, ResearchGate, etc.
Problems
- Structure of computer science knowledge
- Existing research fields
- The interconnection between fields
VLDB community in 2006 (DBLP)
VLDB community in 1990 (DBLP)
Motivations
Scientometrics
- Unit of analysis: journals
- Knowledge mapping: building, visualizing and analyzing the knowledge network
- Methods:
  - Citation analysis [Boyack 2005]
  - Content analysis
  - Log-data (usage data) analysis [Bollen 2009]
- Data sets:
  - Journal Citation Reports (JCR)
  - Science Citation Index (SCI)
  - Social Science Citation Index (SSCI), etc.
Problem
- Computer science conferences
Our Approach
Combination of large-scale digital libraries
- DBLP
- CiteSeerX
Citation analysis
- Bibliographic coupling at venue level (conferences, journals)
- Similarity measures
SNA as a knowledge discovery method
- Visual analytics
- Cluster analysis
- SNA measures: PageRank, betweenness, hub and authority scores, etc.
Data Sets
DBLP (http://www.informatik.uni-trier.de/~ley/db/)
- 788,259 author names
- 1,226,412 publications
- 3,490 venues (conferences, workshops, journals)
CiteSeerX (http://citeseerx.ist.psu.edu/)
- 7,385,652 publications (including publications in reference lists)
- 22,735,240 citations
- Over 4 million author names
Combination
- Canopy clustering [McCallum 2000]
- Result: 864,097 matched pairs
- On average: venues cite 2,306 times and are cited 2,037 times
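The canopy clustering step used to match DBLP and CiteSeerX records can be illustrated with a minimal sketch. It is not the authors' implementation: the token-based Jaccard distance, the thresholds `loose` and `tight`, and the toy titles are all assumptions made for illustration. The centre is also picked deterministically here, whereas canopy clustering normally picks it at random.

```python
import re

def tokens(title):
    """Lower-cased alphanumeric word set, used by the cheap distance."""
    return frozenset(re.findall(r"[a-z0-9]+", title.lower()))

def cheap_distance(a, b):
    """Jaccard distance between two token sets (0.0 means identical)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def canopies(titles, loose=0.8, tight=0.55):
    """Group title indices into canopies (hypothetical thresholds).

    Titles within `loose` of the centre join its canopy; titles within
    `tight` are additionally removed from the pool of future centres.
    """
    if tight > loose:
        raise ValueError("tight threshold must not exceed loose threshold")
    token_sets = [tokens(t) for t in titles]
    remaining = list(range(len(titles)))
    result = []
    while remaining:
        centre = remaining[0]  # deterministic pick, for reproducibility
        dist = {i: cheap_distance(token_sets[centre], token_sets[i])
                for i in remaining}
        result.append([i for i in remaining if dist[i] <= loose])
        remaining = [i for i in remaining if dist[i] > tight]
    return result

# Toy titles standing in for DBLP and CiteSeerX records (illustrative only).
titles = [
    "Efficient clustering of high-dimensional data sets",
    "efficient clustering of high dimensional data sets with application to reference matching",
    "The PageRank citation ranking",
    "PageRank citation ranking: bringing order to the web",
    "Graph drawing by force-directed placement",
]
groups = canopies(titles)
print(groups)
```

Only pairs that share a canopy would then need a more expensive, exact comparison, which is what makes the approach feasible for millions of records.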
Network Creation and Pre-processing
Knowledge network
- Aggregate bibliographic coupling counts at venue level
- Undirected graph G(V, E), where V: venues, E: edges weighted by cosine similarity

$$C_{i,j} = \frac{B_i \cdot B_j}{\lVert B_i \rVert \, \lVert B_j \rVert} = \frac{\sum_{k=1}^{n} B_{i,k} B_{j,k}}{\sqrt{\sum_{k=1}^{n} B_{i,k}^{2}} \, \sqrt{\sum_{k=1}^{n} B_{j,k}^{2}}}$$

- Threshold: $C_{i,j} \geq 0.1$
- Clustering: density-based algorithm [Newman 2004, Clauset 2004]
- Network visualization: force-directed paradigm [Fruchterman 1991]
Knowledge flow network
- Aggregate bibliographic coupling counts at venue level
- Threshold: citation counts >= 50
Domains from Microsoft Academic Search (http://academic.research.microsoft.com/)
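The cosine weighting and edge threshold of the knowledge network can be sketched as follows. The sketch assumes that each venue is described by a row of venue-level coupling counts and that edges with cosine similarity below 0.1 are dropped. The matrix `B`, the venue pairings, and the function names are invented for illustration and are not taken from the data sets.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knowledge_network(venues, B, threshold=0.1):
    """Return undirected edges {(venue_i, venue_j): cosine} at or above the threshold."""
    edges = {}
    for i, j in combinations(range(len(venues)), 2):
        c = cosine(B[i], B[j])
        if c >= threshold:
            edges[(venues[i], venues[j])] = c
    return edges

# Toy venue-level coupling counts (hypothetical numbers, symmetric matrix).
venues = ["VLDB", "SIGMOD", "CHI", "UIST"]
B = [
    [6, 4, 0, 0],
    [4, 5, 1, 0],
    [0, 1, 7, 3],
    [0, 0, 3, 4],
]
edges = knowledge_network(venues, B)
for (a, b), c in sorted(edges.items()):
    print(f"{a} - {b}: {c:.3f}")
```

In this toy matrix the weak pairs (VLDB and CHI, SIGMOD and UIST) fall below the threshold, so only three edges remain.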
Knowledge Network: the Visualization
Knowledge Network: Clustering
Interdisciplinary Venues: Top Betweenness Centrality
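As an illustration of why betweenness centrality singles out interdisciplinary venues, here is a minimal sketch of Brandes' algorithm on an unweighted, undirected toy graph. The graph and its node names are hypothetical, and the slides do not state whether edge weights were used in the actual analysis.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality for an undirected, unweighted graph.

    `adj` maps each node to a list of its neighbours (Brandes' algorithm).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                         # nodes in order of discovery
        preds = {v: [] for v in adj}       # predecessors on shortest paths
        sigma = {v: 0 for v in adj}        # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}      # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

# Two small clusters joined by a single venue (hypothetical toy graph).
graph = {
    "db_a": ["db_b", "bridge"],
    "db_b": ["db_a", "bridge"],
    "bridge": ["db_a", "db_b", "hci_a", "hci_b"],
    "hci_a": ["bridge", "hci_b"],
    "hci_b": ["bridge", "hci_a"],
}
scores = betweenness(graph)
print(max(scores, key=scores.get))
```

The venue that connects the two clusters lies on every shortest path between them, so it receives the highest score, while venues inside a cluster receive none.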
High Prestige Series: Top PageRank
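The PageRank ranking can be sketched with a plain power iteration. The sketch assumes a directed graph in which an edge points from the citing venue to the cited venue. The damping factor, the iteration count, and the toy graph are illustrative assumptions; the slides do not give the parameters used.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank; `out_links` maps every node to the nodes it cites."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank

# Hypothetical citation flow: every venue cites "core", which cites "a".
citations = {
    "a": ["core"],
    "b": ["core"],
    "c": ["core", "a"],
    "core": ["a"],
}
ranks = pagerank(citations)
print(sorted(ranks, key=ranks.get, reverse=True))
```

A venue ranks highly when it is cited by many venues, and more so when those venues are themselves highly ranked.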
Conclusions and Future Research
SNA helps to gain insight into computer science knowledge
Knowledge network in computer science
- Highly clustered; large clusters form the core of computer science research
- Research fields are interconnected
- Interdisciplinary venues
Outlook
- More digital libraries should be integrated: ACM, IEEE, CEUR-WS.org, etc.
- Usage analysis
- Dynamic analysis of the knowledge network
Questions?
http://bosch.informatik.rwth-aachen.de:5080/AERCS/