Evaluating (Scientific) Knowledge for people, documents, organizations/activities/communities

ICiS Workshop: Integrating, Representing and Reasoning over Human Knowledge, Snowbird, August 9 2010

Geoffrey Fox, [email protected]
http://www.infomall.org http://www.futuregrid.org http://pti.iu.edu/

Director, Digital Science Center, Pervasive Technology Institute
Associate Dean for Research and Graduate Studies, School of Informatics and Computing
Indiana University Bloomington
My Role

• My research would be on building Cyberinfrastructure (MapReduce/Bigtable/Clouds/Information Visualization) for “Integrating, Representing and Reasoning over Human Knowledge”
– Use FutureGrid to prototype Cloud/Grid environments
• Here I talk in my role as a frustrated journal editor and School bureaucrat responsible for advising faculty on how to get NSF grants and tenure.
Knowledge Evaluation is Important?

• Review of journal or conference papers
– Several conference management systems exist, but they don’t offer reviewing tools
• Supporting the choice of panels reviewing proposals
– And the proposal review itself
• Supporting the choice of a Program Committee for a conference
• Supporting the promotion and tenure process
– h-index appears in several referee reports
• Supporting the ranking of organizations such as journals, universities and (Computer Science) departments
• Deciding if some activity is useful, such as TeraGrid; a particular agency or agency program; a particular evaluation process (panel v. individual reviews)
• Deciding if some concept is useful, such as multidisciplinary research, theory, computing …
• Evaluation of Knowledge evaluation methodologies
“Policy Informatics”
aka “Command & Control” (military knowledge)

• In the Data-Information-Knowledge-Wisdom-Decision(Evaluation) pipeline, some steps are “dynamic” (can be redone if you save the raw data) but decisions are often “final” or “irreversible”
• We could (and, as preprints, do) publish everything, as “disks are free”, and change our evaluations
• But there is a finite amount of research funding and a finite number of tenure positions
[Slide: architecture diagram of the Raw Data → Data → Information → Knowledge → Wisdom → Decisions pipeline, showing sensor/data-interchange services (SS), filter services (fs), Storage, Compute, Filter and Discovery Clouds, other Grids, and a database; labeled “Traditional Grid with exposed services”]
Citation Analysis

• Use of Google Scholar (Publish or Perish) to analyze the contribution of individuals is well established
• #papers, #citations, h-index, hc-index (contemporary), g-index (square) …
• There is ambiguity as to the “best metric” and whether such metrics are sound at all, but in some cases perhaps the most serious problem is calculating them in an unbiased fashion
• One can probably find metrics for “Geoffrey Fox”, but it’s hard for more common names, and for example most Asian names are hard
– Google Scholar has a crude approach to refinement, by including and excluding names, e.g. include “Indiana University” or exclude “GQ Fox” (not clear where words are?)
• “Automating” is hard unless the analysis for each name is done by hand
– Even the name is nontrivial – need both “GC Fox” and “Geoffrey Fox”
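The metrics listed above are simple functions of a sorted citation list; a minimal Python sketch (it assumes the citation counts have already been correctly attributed to one author, which, as noted, is the hard part):

```python
def h_index(citations):
    """h-index: largest h such that h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

def g_index(citations):
    """g-index: largest g such that the top g papers together have
    at least g^2 citations (the "square" variant mentioned above)."""
    counts = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(counts, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g
```

For the same citation list the g-index is never smaller than the h-index, since it credits highly cited papers for their excess citations.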
Evaluating Documents

• As a journal editor, I find choosing referees (and persuading them to write a report) the hardest problem
– Especially with an increasing number of non-traditional authors
• Need to identify related work and find the authors or previous referees of these related papers
– Currently ScholarOne uses a largely useless keyword system
– Can also look at the originality of articles, examined from the overlap in text between a given article and some corpus (typically a conference paper resubmitted unchanged)
• If unfamiliar with the authors, need to identify which author of a multi-author paper is the appropriate choice, where they are now, and their contact information
– Current services (DBLP, ACM Portal, LinkedIn, Facebook) don’t tell you the necessary information
• Need tools to quantify the reliability of referees
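The text-overlap originality check mentioned above can be approximated with word shingles and Jaccard similarity; a toy sketch, not a production detector (a real system would normalize punctuation, strip markup, and index the corpus for scale):

```python
def shingles(text, k=5):
    """The set of k-word shingles (contiguous word n-grams) of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def overlap(doc_a, doc_b, k=5):
    """Jaccard similarity of the two shingle sets; a value near 1.0
    suggests a largely unchanged resubmission."""
    a, b = shingles(doc_a, k), shingles(doc_b, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Flagging would then be a threshold decision, e.g. overlap above 0.8 against any paper in the corpus triggers a manual look.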
Is High Performance Computing Useful for Improving Knowledge?

• Are papers that use TeraGrid “better” than those that don’t?
– Does TeraGrid help enhance Knowledge?
• Correlate quality and type of papers with “use of TeraGrid”
• Possibly can be done by text analysis (does the paper acknowledge TeraGrid?)
• Here we use an indirect map: TeraGrid Projects/People → Papers
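The acknowledgement-based text analysis could start as simply as a pattern scan over the paper text; a hedged sketch, where the `TG-` allocation-ID pattern is an assumption about how grants are cited rather than a documented convention:

```python
import re

# Assumed patterns: a literal "TeraGrid" mention, or an allocation ID of
# the form TG-xxxx; real acknowledgement sections vary widely.
TERAGRID_PATTERN = re.compile(r'\bteragrid\b|\bTG-\w+', re.IGNORECASE)

def acknowledges_teragrid(paper_text):
    """Crude text scan: does the paper mention TeraGrid or a TG- allocation?"""
    return bool(TERAGRID_PATTERN.search(paper_text))
```

This only classifies papers that bother to acknowledge their resources, which is why the indirect Projects/People → Papers map above is the more reliable route.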
TeraGrid Analysis I: Bollen
TeraGrid Analysis II: Bollen
TeraGrid Web of Science
Need a Freely Available Toolkit

• Firstly, current tools such as Google Scholar and CiteSeer have insufficient scope
– Google Scholar is stuck in an early stage of “perpetual beta (?alpha)” after killing Windows Live Academic
• Secondly, need to enable customization so that one can explore evaluation choices
– Current CS department rankings put Indiana in the dungeon – partly because Fox/Gannon papers are not counted, as they are not in approved journals
• Don’t want to let Thomson control Impact Factors (relevant for tenure, especially in Asia?) without scientific scrutiny
• As discussed, ScholarOne (also Thomson) is dreadful but seems to have growing adoption
• Want to explore new ideas such as evaluating TeraGrid
Tools Needed

• More accurate scientific profiles; ACM Portal says I have 3 publications; DBLP, 250; Google Scholar, 1000
– None of them tells you my contact and professional information
• Unbundled CiteSeer/Google Scholar allowing more accurate document analysis
– e.g. analyze the document at hand (as in a conference submission)
– Open decomposition into Authors, Title, Institution, emails, Abstract, Paper, citations
• Analyzers of citations and/or text to suggest referees
• Analysis of the novelty of a document
• Tool to produce an accurate h-index (etc.)
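The “open decomposition” bullet can be illustrated with a toy front-matter parser. Everything here is an assumed layout (title on the first non-blank line, an “Abstract” marker line); production tools such as CiteSeer use trained extractors instead:

```python
import re

# Simple email pattern; good enough for typical author blocks.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+')

def decompose(front_matter):
    """Toy decomposition of a paper's front matter into title, emails,
    and abstract, under the assumed layout described above."""
    lines = [l.strip() for l in front_matter.splitlines() if l.strip()]
    title = lines[0] if lines else ""
    emails = EMAIL_RE.findall(front_matter)
    abstract = ""
    for i, line in enumerate(lines):
        if line.lower().startswith("abstract"):
            abstract = " ".join(lines[i:])[len("abstract"):].strip(" :.-")
            break
    return {"title": title, "emails": emails, "abstract": abstract}
```

An “open” decomposition in the sense above would expose each of these fields separately, so that downstream tools (referee suggestion, novelty analysis, h-index calculation) can consume them independently.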
Some Research Needed

• Open analysis of concepts like Impact Factor, h-index, and indexing services
– Look at definitions and
– Possibilities of making valid deductions
• How do we evaluate “groups” (research groups, departments) as opposed to individuals?
• Can one automate the currently time-consuming manual steps?
– Identity confusion in Google Scholar
– Research profiles
• Compare the traditional ethnography approach to evaluation (do a bunch of interviews) versus a data-deluge-enabled version
• Why are Web 2.0 tools like Delicious, Facebook etc. little used in science?