Evaluation of Citation Enhanced Scholarly Databases
INFOPRO 2005 Keynote address
Dr. Peter Jacso, Professor
University of Hawaii, USA
Tokyo, November, 2005
Jacso
Japan in Science & Technology
Japan in Science & Technology publication
The birth of an idea
Eugene Garfield
Thesaurus-based and free-text searching
• Nice in library schools but not in practice
• Is it sciatica or ischialgia?
• Orthopedic or orthopaedic
• Center or centre
• Shiatsu or shiatzu
• Student or pupil
• Bad behavior or bad behaviour
What is a citation?
• Citation or reference
• Citation indexing, indexes or indices
• Citation analysis or analyzis
• Or is it analyses?
arXiv
Multi-disciplinary – discipline-focused
• WoS and Scopus largest multidisciplinary databases
• Google Scholar – it is free, and ….. ?
• Discipline-oriented databases
• arXiv - primarily physics
• NASA/ADS – astrophysics and some related
• PsycINFO - psychology
• CINAHL - nursing
• CiteSeer – computer science
• RePEc and its derivatives – economics
• SMEAL – business
Citation
collecting, parsing, indexing, matching, browsing, searching, sorting, ranking, outputting, linking
The purpose of database evaluation
• We did it with print reference sources for a long time
• Content AND Software
• Practical and financial implications ($$$$)
• Thermometers, pulsometers, blood pressure meters are not enough
• X-Ray, MRI, blood tests
• Quantitative and qualitative aspects
• Quantifiable, measurable vs. philosophical-ideal
• Can’t do it in your lunch break
• Going beyond PR-info from database publishers
CONTENT MEASURES
• Database size
• Database dimensions
• Scope
• Composition
• Source coverage
• Journal base
• The special aspects of cited references as data elements
Database Size
• Guinness Book of World Records mentality
• Biggest, Greatest, Largest, Greenest
• Fastest, Strongest, Leanest, Meanest
• Where is quality?
• Aeroflot – the biggest airline … and (one of) the worst before glasnost
• SPORTDiscus – little muscle, much flab
Database Size
WoS 1980-2005: 25.5 million records
WoS 1945-2005: 36 million records
WoS Century of Science: 37 million records
Scopus: 26 million records
Google Scholar (GS): maybe 10 million records, but mixing fine jizake with cheap wine
Database Dimensions
• Absolute size is not everything
• Biggest is not always the best(est)
• How is database A bigger than database B?
• In what shape and form?
• How is the “body” of the database built?
• Different disciplines - different preferences
Bigger horizontally (wider) vs. vertically (taller)
Number of records with cited references Scopus
[Chart: x-axis years 1972-2006; y-axis 0 to 1,200,000]
Dialog ISI subset & Scopus: number of records with cited references
[Chart: x-axis years 1972-2006; y-axis 0 to 1,200,000]
The fate of two databases
[Chart: x-axis years 1966-2004; y-axis 0 to 100,000; series: PsycINFO and MHA]
Noticeable lack of currency
Google Scholar’s innumeracy: more records for 1965-2005 than for 1955-2005??
GS big in/on Japan - 86% of all of its 1955-2005 records?
bigger between 1965 - 2005 than 1955 - 2005?
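The anomaly on these slides can be stated as a simple invariant: a date range that contains another range must return at least as many hits. The following is a minimal sketch of such a check, assuming the hit counts were noted by hand from the search interface. The function name and the counts are invented for illustration; they are not Google Scholar figures.

```python
def find_range_anomalies(hits):
    """Return (wider, narrower) pairs where the narrower date range reports more hits.

    `hits` maps (start_year, end_year) to the hit count reported for that range.
    """
    anomalies = []
    for wider, wider_count in hits.items():
        for narrower, narrower_count in hits.items():
            if wider == narrower:
                continue
            # The wider range fully contains the narrower one.
            contains = wider[0] <= narrower[0] and narrower[1] <= wider[1]
            if contains and narrower_count > wider_count:
                anomalies.append((wider, narrower))
    return anomalies


# Hypothetical counts, for illustration only.
reported = {
    (1955, 2005): 1_000,
    (1965, 2005): 1_400,  # narrower range, yet more hits: impossible
    (1975, 2005): 700,
}
print(find_range_anomalies(reported))
```

Any pair the function returns signals a counting problem in the database software, not in the data itself.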
Subject Scope
• Not static, may have evolved in the past X years
• Obvious subject dominance in Scopus at the journal level
Much more subject dominance at the article level
Apparent Presence of Arts & Humanities in WoS
Composition
Composition
Current Science
There are books & conference proceedings in Scopus (but not enhanced with cited refs)
Completeness of records
All records: 26,731,691
with keywords: 21,706,112
with abstracts: 18,538,475
with refs: 8,442,048
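The record counts on this slide are easier to compare as shares of the total: roughly 81% of records have keywords, 69% have abstracts, and 32% have cited references. A minimal sketch of that arithmetic follows; the dictionary layout and function name are assumptions made for illustration, and only the four counts come from the slide.

```python
# Counts as given on the slide.
counts = {
    "all records": 26_731_691,
    "with keywords": 21_706_112,
    "with abstracts": 18_538_475,
    "with refs": 8_442_048,
}


def completeness(counts, total_key="all records"):
    """Return each field's share of the total, in percent."""
    total = counts[total_key]
    return {
        field: 100.0 * n / total
        for field, n in counts.items()
        if field != total_key
    }


shares = completeness(counts)
for field, pct in shares.items():
    print(f"{field}: {pct:.1f}%")
```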
Completeness of records: Dialog ISI subset, total items & items with cited references
[Chart: x-axis years 1972-2006; y-axis 0 to 1,400,000]
Scopus: number of total items & number of records with cited references
[Chart: x-axis years 1972-2006; y-axis 0 to 1,400,000]
The Scientist case
The D-Lib Magazine claim
The D-Lib Magazine’s bold claim
JASIS - JASIS&T case
The Scientist case
The Scientist case
The Scientist case
The Scientist case
JASIS - JASIS&T case
JASIS - JASIS&T case
Evaluation of Citation Enhanced Scholarly Databases
Part 2.
Dr. Peter Jacso Professor
University of Hawaii at Manoa, USA
Tokyo, November, 2005
Software Issues
• Software capabilities can make or break a product
• Cited references represent a new and unusual data element
• New challenge, few (WoS, Scopus, CSA) can do it well
• For researchers, adding cited references to their paper is the bane of publishing
• No universal standard for cited reference formats
• Reference management programs support more than 700 citation style formats
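To illustrate why the lack of a universal format matters for matching, here is a minimal sketch that renders one reference in two styles and derives a style-independent match key. Everything in it is an assumption for illustration: the `Reference` fields, the two invented styles (loosely resembling an author-date style and a numeric style), and the placeholder reference itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    surname: str
    initials: str
    year: int
    title: str
    journal: str
    volume: int
    first_page: int


def author_date_style(ref):
    """Invented author-date rendering: 'Surname, A.B. (Year). Title. Journal, vol, page.'"""
    dotted = "".join(f"{c}." for c in ref.initials)
    return (f"{ref.surname}, {dotted} ({ref.year}). {ref.title}. "
            f"{ref.journal}, {ref.volume}, {ref.first_page}.")


def numeric_style(ref):
    """Invented compact rendering: 'Surname AB. Title. Journal Year;vol:page.'"""
    return (f"{ref.surname} {ref.initials}. {ref.title}. "
            f"{ref.journal} {ref.year};{ref.volume}:{ref.first_page}.")


def match_key(ref):
    """Key built from parsed fields, independent of the display style."""
    return (ref.surname.lower(), ref.year, ref.volume, ref.first_page)


# Placeholder reference, not a real publication.
ref = Reference("Example", "AB", 2001, "A placeholder title",
                "Journal of Examples", 12, 34)
print(author_date_style(ref))
print(numeric_style(ref))
print(match_key(ref))
```

The two rendered strings differ, so comparing them character by character would fail; matching on parsed fields does not depend on the style.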
ProCite
Many chances for messing up cited references in digitization
• Who can mess them up?
• Authors, editors, copy editors at publisher
• Data entry operators at A/I services
• Programmers at database aggregators
• Programmers when extracting data from publishers’ archives
• Spoiled and careless programmers when doing anything
Selected references
Notes
Google Scholar
• Autonomous citation indexing is not perfect either
• Google Scholar mightily managed to mix up many metadata elements
• Is this an article published in 2006?
• Has it really been cited 98 times already in October 2005?
• No, it’s the page number, a Hungarian postal code, or any 4-digit number
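A range check is the least a parser can do before accepting a 4-digit number as a publication year. The sketch below is an assumption-laden illustration, not a description of how any database works: the function name, the default bounds (1900 to 2005, the year of this talk), and the sample string are all invented.

```python
import re


def plausible_years(text, earliest=1900, latest=2005):
    """Return standalone 4-digit numbers in `text` that fall in a plausible year range."""
    four_digit = re.findall(r"(?<!\d)\d{4}(?!\d)", text)
    return [int(n) for n in four_digit if earliest <= int(n) <= latest]


# Invented reference string: 2006-2019 are page numbers, 1093 is a postal code.
sample = ("Example, A. A placeholder title. Journal of Examples 12(3), "
          "2006-2019. Budapest 1093, 1998.")
print(plausible_years(sample))  # prints [1998]
```

A range check alone cannot tell a page number such as 1998 from a year, so field position in the parsed reference still has to be taken into account.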
Careless data entry/OCR-ing can cripple the links
EBSCO
… or 20th century programming can serve references dead cold
as opposed to the native hot-linked WoS version
or the hot and spicy Scopus version with “cited by” (citedness) score
…as long as cited references have no misspellings
• Typos cripple the impressive “cited by” feature, the best of Scopus and CSA. It can’t undo the misspelling shown earlier, or the one PsycINFO made in the author name here: it is Jacso, not Jasco, thank you
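One way to catch a transposition like Jasco for Jacso is to compare each cited name against a list of known author names and flag near misses. This is a minimal sketch using Python's standard `difflib`; the function name, the 0.75 cutoff, and the tiny authority list are assumptions for illustration, not features of any of the databases discussed.

```python
from difflib import get_close_matches


def suggest_author(cited_name, authority_list, cutoff=0.75):
    """Return the closest known author name, or None if nothing is similar enough."""
    candidates = [name.lower() for name in authority_list]
    matches = get_close_matches(cited_name.lower(), candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical authority list of source-author surnames.
known_authors = ["jacso", "garfield", "moed"]
print(suggest_author("Jasco", known_authors))
print(suggest_author("Smith", known_authors))
```

A suggestion is only a candidate for human review; similar surnames can belong to different people.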
In the cited references they are crippled in more ways than one, but we may feel warmed up by 6 “cited by” hot links to records which cite Moed’s article in PsycINFO …
… that’s why there are relatively few, as opposed to the citedness score of 45 in Scopus shown earlier …
… and the citedness score of 55 in WoS for the cited 1995 article of Moed HF. The citedness scores of WoS and Scopus often get close for articles published since the mid-1990s, but not for earlier ones.
Remember, seeing the citedness score is still a two-step process in WoS, but WoS will likely soon include the citedness score directly within the cited reference list, as in CSA and Scopus.
Browsing of citing (source) and cited (target) author and journal names is a must. Still, only a few offer adequate browsing; Scopus offers it only for source author and source title.
Browsing/Looking up citing/cited authors & journals
Inconsistencies and inaccuracies are rampant in source journal names, as in PASCAL
WoS can spell it consistently nearly 20,000 times as source journal - quality control, order instead of disorder
In cited sources all hell breaks loose
Without browsing and defensive searching you would miss a lot
In Dialog’s version of the ISI subset, misspelled formats are not corrected
Dialog only updates (adds new records); it does not RE-LOAD (to correct old ones)
In CINAHL I have slim chances without browsing the author and cited author fields before searching. Browsing is like looking in the pool before diving, to see if there is water and how much there is.
Be savvy & browse, browse, browse if the software allows
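What a browse index offers can be sketched in a few lines: an alphabetical list of the distinct values in a field, each with its posting count, so that spelling variants show up side by side before the search is run. The function name and the cited-journal variants below are invented for illustration.

```python
from collections import Counter


def browse_index(values, prefix=""):
    """Return (value, count) pairs, alphabetically sorted, for values starting with `prefix`."""
    counts = Counter(values)
    wanted = prefix.lower()
    return sorted(
        (value, n) for value, n in counts.items()
        if value.lower().startswith(wanted)
    )


# Hypothetical cited-journal field values with spelling variants.
cited_journals = [
    "J AM SOC INFORM SCI",
    "J AM SOC INF SCI",
    "J AM SOC INFORM SCI",
    "JASIS",
    "J AMER SOC INFO SCI",
    "NATURE",
]
for value, n in browse_index(cited_journals, "j am"):
    print(f"{n:>3}  {value}")
```

Note that the variant "JASIS" falls outside the "j am" prefix, which is why defensive searching means browsing under more than one starting point.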
AND WHAT BROWSE OPTIONS does Google Scholar offer?
None
Zilch
Nada
Zero
Kotonashi
SEARCHING
• Rather limited options for cited author, cited title, cited journal
• Menu driven in WoS
• SAME (sentence) option in WoS, but …
• …. No searching in cited title in WoS
• Proximity and positional operators in Scopus
• Mostly command-driven in Advanced Mode in Scopus
• Useful but ugly prefixes in Scopus
• Good menus in EBSCO and Ovid
SEARCHING
• No truncation when searching in REFxxx ?
Result display and sorting
• Short result list for at-a-glance impression about sources, then sorting by citedness score!
Sorting & relative citedness score
• CSA can sort results but does not offer sorting by citedness
• Google Scholar used to rank the results by citedness score
• No one offers an age-adjusted citedness score, even though that would be the fairest
• A 10-year-old and a 2-year-old article have had different chances of receiving citations
• My tests showed big differences for some items in ranking by absolute vs. relative citedness score
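One simple form of age adjustment is citations per year since publication. The sketch below compares ranking by absolute and by relative citedness under that assumption; the formula, the function name, and the three records are illustrative inventions, not the method or data used in the tests mentioned above.

```python
def citations_per_year(cited_by, pub_year, as_of_year):
    """Citedness divided by the article's age in years (minimum age of 1)."""
    age = max(as_of_year - pub_year, 1)
    return cited_by / age


# Hypothetical records, for illustration only.
records = [
    {"id": "A", "pub_year": 1995, "cited_by": 40},  # 10 years old: 4.0 per year
    {"id": "B", "pub_year": 2003, "cited_by": 14},  # 2 years old: 7.0 per year
    {"id": "C", "pub_year": 2000, "cited_by": 25},  # 5 years old: 5.0 per year
]

by_absolute = sorted(records, key=lambda r: r["cited_by"], reverse=True)
by_relative = sorted(
    records,
    key=lambda r: citations_per_year(r["cited_by"], r["pub_year"], 2005),
    reverse=True,
)

print([r["id"] for r in by_absolute])
print([r["id"] for r in by_relative])
```

With these invented numbers the two rankings are exact opposites, which shows how much the choice of score can change a result list.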
The many dimensions of citedness scores
Citedness scores can be highly informative in estimating the usefulness and perceived importance of a paper, as expressed by peers in the form of citations (= links).
Major differences arise from the domains of the citing sources:
• In a journal publisher’s archive: gathered only from the digitized journals of that publisher
• At aggregators/facilitators: gathered from all databases hosted (except PsycINFO, for its not-so-splendid isolation policy)
• In self-published databases: gathered from the database itself
• In Scopus: gathered from 1996 onward from >10,000++ journals
• In WoS: gathered from 1900/1945/1980 forward from <10,000 journals
All the above assume correct identification, matching & calculation.
Enter Google Scholar – playing fast and loose with the numbers
Make it very fast and very loose
A half-page quickie interview with the author in The Scientist cited 7,380 times?
You can scroll up and down in the purportedly citing Nucleic Acids Research article looking for the name of Kraulis, The Scientist, and the title; you will not find them. Any of them.
Two articles by Kraulis, but neither is the 1993 piece in The Scientist
But what do you expect from software that cannot even do the most basic Boolean OR operation correctly?
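A correct Boolean OR has to respect two bounds from set theory: the hit count for A OR B can be no smaller than the larger of the two individual counts, and no larger than their sum. The sketch below checks reported counts against those bounds; the function name and the numbers are invented for illustration.

```python
def or_count_is_consistent(hits_a, hits_b, hits_a_or_b):
    """True if a reported OR count lies within the bounds set theory allows."""
    lower = max(hits_a, hits_b)  # the union contains each set entirely
    upper = hits_a + hits_b      # reached only when the sets do not overlap
    return lower <= hits_a_or_b <= upper


# Hypothetical hit counts.
print(or_count_is_consistent(1_200, 800, 1_500))  # True
print(or_count_is_consistent(1_200, 800, 900))    # False: fewer hits than A alone
```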
Indeed, “citation data is subtle stuff” and requires competence