Institutional classification schemes in bibliometrics

Matthias Winterhager, Bielefeld University, euroCRIS Membership Meeting, Bonn, May 13, 2013

Page 1: Institutional classification schemes in bibliometrics

Institutional classification schemes in bibliometrics

Matthias Winterhager, Bielefeld University

euroCRIS Membership Meeting, Bonn

May 13, 2013

Page 2: Institutional classification schemes in bibliometrics

Institutions: a major object of bibliometric studies

Relevant data fields in bibliometric databases: the case of Web of Science

Institutional data: ready to use?

The advent of identifiers

Processing institutional data: how to count?

Page 3: Institutional classification schemes in bibliometrics

Web of Science anno 1982

Page 4: Institutional classification schemes in bibliometrics

Hardcover times ...

Page 5: Institutional classification schemes in bibliometrics

Institutional address data: two dimensions

Page 6: Institutional classification schemes in bibliometrics

Web of Science anno 2013

Page 7: Institutional classification schemes in bibliometrics

Web of Science: Institutional Data

Page 8: Institutional classification schemes in bibliometrics

Web of Science Institutional Data

Reprint address

Authors' institutional affiliation(s) (from 2008 onwards linked to author names)

Funding agencies

Publishers

Do not expect any institutional affiliation data before 1965

Page 9: Institutional classification schemes in bibliometrics

Sample document (from Scientometrics 2008)

Page 10: Institutional classification schemes in bibliometrics

Address Data in Web of Science

Author Affiliations („work done at“):

Tech Univ Denmark, Tech Knowledge Ctr Denmark, DARC DTU Anal & Res Promot Ctr, Lyngby, Denmark

Ctr Sci & Technol Studies CEST, Bern, Switzerland

Inst Res Informat & Qual Assurance, Bonn, Germany

Reprint Address: Larsen, PO (reprint author), Marievej 10A,2, Hellerup, Denmark

Present Address („current potential“): missing
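
The two dimensions of such an address string (organisational and geographic) can be separated roughly as in the following Python sketch. It assumes the comma-separated layout seen in the sample affiliations (organisation segments first, then city, then country); the names `ParsedAddress` and `split_address` are hypothetical, and real records deviate from this layout often.

```python
from dataclasses import dataclass


@dataclass
class ParsedAddress:
    organisation: list[str]  # main organisation first, then sub-units
    city: str
    country: str


def split_address(raw: str) -> ParsedAddress:
    """Split 'Org, Sub-unit, ..., City, Country' at commas (assumed layout)."""
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    if len(parts) < 3:
        raise ValueError(f"cannot separate organisation from location: {raw!r}")
    # Assumption: the last two segments are always city and country.
    return ParsedAddress(organisation=parts[:-2], city=parts[-2], country=parts[-1])


sample = "Ctr Sci & Technol Studies CEST, Bern, Switzerland"
parsed = split_address(sample)
print(parsed.organisation, parsed.city, parsed.country)
```

The reprint address in the sample is a private postal address, so the same split would misread the street as an organisation; this is one reason why such data are not ready to use without cleaning.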

Page 11: Institutional classification schemes in bibliometrics

Which bibliometric indicators depend on address data?

Almost all:

Every indicator that is (directly or indirectly) based on distinct sets of national, organisational or geographic entities

Any normalised indicator (like observed vs. expected citation ratios) that takes into account regional (e.g. EU, country, state) instead of world averages

Any indicator on cross-national, -organizational or -institutional cooperation as measured via co-authorships
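
To illustrate the last point, here is a minimal Python sketch of one cooperation indicator: the share of papers whose addresses span more than one country. The function name `international_share` and the toy records are invented for illustration; leaving records without address data out of the denominator is one possible choice, not a standard.

```python
def international_share(papers: list[list[str]]) -> float:
    """Share of papers whose addresses span more than one country.

    Each paper is given as the list of countries found in its address
    field; papers without any address are left out of the denominator.
    """
    with_address = [set(countries) for countries in papers if countries]
    if not with_address:
        raise ValueError("no paper with address data")
    international = sum(1 for countries in with_address if len(countries) > 1)
    return international / len(with_address)


# Invented toy records, not taken from any database.
toy_papers = [
    ["Denmark", "Switzerland", "Germany"],
    ["Germany", "Germany"],
    [],  # record without address data
    ["Germany", "France"],
]
share = international_share(toy_papers)
print(f"{share:.2f}")
```

Without the country part of each address the indicator cannot be computed at all, and errors in that field change the result directly.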

Page 12: Institutional classification schemes in bibliometrics

Institutional address data: Issues (1)

A substantial number of records come without any address data (they may or may not be included in world total counts for expected ratios)

The proportion of missing address data varies by discipline (e.g. humanities) and document type

Few records with address data in “backfiles” (before 1966)

Spelling variants, misspellings, erroneous entries (examples follow)

Page 13: Institutional classification schemes in bibliometrics

Erroneous address data (1)

Page 14: Institutional classification schemes in bibliometrics

Erroneous address data (2)

Page 15: Institutional classification schemes in bibliometrics

Erroneous address data (3)

Page 16: Institutional classification schemes in bibliometrics

Uncontrolled affiliation (1)

Page 17: Institutional classification schemes in bibliometrics

Uncontrolled affiliation (2)

Page 18: Institutional classification schemes in bibliometrics

Institutional address data: Issues (2)

“Reality gap”: names for entities which

never existed (fictitious institutes)

existed, but have been split, merged, renamed or closed

Geographical and organisational aspects of an address can point in different directions; borderline cases can be complicated to assign (Max-Planck-Institutes outside Germany, EMBL, CERN, KIT, Charité Berlin)

Page 19: Institutional classification schemes in bibliometrics

Abbreviations in address records

… can have different origins: author, publisher and database producer

„Corporate and institution names may or may not be abbreviated. To be comprehensive, search for the full name of the institution … as well as the abbreviation.“

„Abbreviations for corporate and institution names used in the product database are listed below. Other address elements … may also be abbreviated.“

(from: Web of Science Help)
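
The advice to search for both forms can be sketched in Python as follows. The abbreviation pairs are only an illustrative subset inferred from the sample address on page 10; the authoritative list is the one in Web of Science Help. The function name `search_variants` is hypothetical.

```python
# Illustrative subset only, inferred from the sample address data;
# not the official abbreviation list of the database.
ABBREVIATIONS = {
    "University": "Univ",
    "Institute": "Inst",
    "Research": "Res",
    "Information": "Informat",
    "Quality": "Qual",
    "Science": "Sci",
    "Technology": "Technol",
}


def search_variants(full_name: str) -> list[str]:
    """Return the full name plus its word-by-word abbreviated form."""
    abbreviated = " ".join(ABBREVIATIONS.get(word, word) for word in full_name.split())
    # Keep both forms, because records may carry either one; drop the
    # duplicate when no word in the name has an abbreviation.
    return [full_name] if abbreviated == full_name else [full_name, abbreviated]


variants = search_variants("Bielefeld University")
print(variants)
```

A real search would also have to cover word order and language variants, which this word-by-word substitution does not attempt.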

Page 20: Institutional classification schemes in bibliometrics

Cleaning of German address data (1)

Project of the German competence centre for bibliometrics

Aim: assignment of (almost) every paper with at least one German address from Web of Science (or Scopus) to the relevant German institution(s)

Introduction of procedures to handle unstandardized, incomplete and incorrect address data

Testing different algorithms for reduction and standardization of data elements, extraction of cities and tree-structures

Use of geocoding services

Page 21: Institutional classification schemes in bibliometrics

Cleaning of German address data (2)

Maintaining a large base of institute-specific string patterns for thousands of German institutions

Assignment process still heavily based on pattern-matching procedures

Growing database of institutions (with “history”), currently ~2,000 main institutions (data from 1995-2011), mapping identifiers to external sources
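
A pattern-based assignment that respects institutional “history” could look roughly like the Python sketch below. This is not the project's code: the identifiers, regular expressions and validity years are invented for illustration, and real pattern sets are far larger.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Institution:
    inst_id: str                    # hypothetical identifier
    name: str
    pattern: re.Pattern             # institute-specific string pattern
    valid_from: int = 0             # first year the entity existed
    valid_to: Optional[int] = None  # None means it still exists


# Illustrative entries only; patterns and years are assumptions.
INSTITUTIONS = [
    Institution("DE-0001", "Bielefeld University",
                re.compile(r"\bUniv\w*\s+Bielefeld\b|\bBielefeld\s+Univ", re.I)),
    Institution("DE-0002", "University of Karlsruhe",
                re.compile(r"\bUniv\w*\s+Karlsruhe\b", re.I), valid_to=2009),
    Institution("DE-0003", "Karlsruhe Institute of Technology",
                re.compile(r"\bKIT\b|\bKarlsruhe\s+Inst\w*\s+Technol", re.I),
                valid_from=2009),
]


def assign(address: str, year: int) -> list[str]:
    """Return identifiers of all institutions whose pattern matches in that year."""
    return [
        inst.inst_id
        for inst in INSTITUTIONS
        if inst.pattern.search(address)
        and inst.valid_from <= year
        and (inst.valid_to is None or year <= inst.valid_to)
    ]


print(assign("Univ Karlsruhe, Karlsruhe, Germany", 2005))
print(assign("Univ Karlsruhe, Karlsruhe, Germany", 2011))
```

In this sketch an address that matches a pattern outside the entity's validity period yields an empty list; such cases would need a successor rule or manual review.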

Page 22: Institutional classification schemes in bibliometrics

Institutional Identifiers

Identifiers of the project are currently being mapped to “Research Explorer”, the research directory of the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) and of the German Academic Exchange Service (DAAD) in cooperation with the German Rectors' Conference (HRK).

Page 23: Institutional classification schemes in bibliometrics

Institutional Identifier (I2)

Initiatives are underway, but it will take several years to bring such standards into operation on a broad scale (as can be seen from the case of the author identifier initiative – ORCID)

Page 24: Institutional classification schemes in bibliometrics

Processing institutional data: How to count?

Challenges coming from international clinical trials and from high energy physics:

Hundreds of different addresses from thousands of authors on a single publication – how to attribute publication (and citation) counts in the right way?

Page 25: Institutional classification schemes in bibliometrics

Counting methods

Complete (C): each basic unit gets 1 credit

Complete-normalized (CN): all the basic units in a publication share 1 credit

Straight (S): the first basic unit gets 1 credit

Whole (W): each unique basic unit gets a credit of 1

Whole-normalized (WN): all the unique basic units share 1 credit

Basic units can be countries, organisations, institutes.

Normalized counts are often called “fractional”.
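
The five methods can be compared on a single publication with the Python sketch below. It reads “each basic unit” in complete counting as each occurrence of a unit in the address list, which is an interpretation of the slide wording; the function name `count_credits` and the example addresses are invented.

```python
from collections import Counter
from fractions import Fraction


def count_credits(units: list[str], method: str) -> dict[str, Fraction]:
    """Credits per basic unit for one publication.

    `units` holds the basic unit (here: country) of every address in
    byline order, repeats included.
    """
    if not units:
        raise ValueError("publication without address data")
    occurrences = Counter(units)
    if method == "C":    # every occurrence counts 1
        return {u: Fraction(n) for u, n in occurrences.items()}
    if method == "CN":   # all occurrences share 1 credit
        return {u: Fraction(n, len(units)) for u, n in occurrences.items()}
    if method == "S":    # only the first unit is credited
        return {units[0]: Fraction(1)}
    if method == "W":    # every unique unit counts 1
        return {u: Fraction(1) for u in occurrences}
    if method == "WN":   # unique units share 1 credit
        return {u: Fraction(1, len(occurrences)) for u in occurrences}
    raise ValueError(f"unknown counting method: {method!r}")


# Invented example: four addresses from three countries.
addresses = ["Denmark", "Switzerland", "Germany", "Germany"]
for m in ("C", "CN", "S", "W", "WN"):
    print(m, {u: str(c) for u, c in count_credits(addresses, m).items()})
```

For this example Germany receives 2 (C), 1/2 (CN), 0 (S), 1 (W) or 1/3 (WN), which shows how strongly the choice of method affects the result for one and the same paper.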

Page 26: Institutional classification schemes in bibliometrics

No „gold standard“ for counting

Small units may be favoured by whole counting (W), but other effects make it difficult to give a general rule here.

„40 years of publication counting have not resulted in general agreement on definitions of methods and terminology nor in any kind of standardization.“

(Larsen, P.O.: The state of the art in publication counting, Scientometrics, 77(2), 2008, 235-251)

Page 27: Institutional classification schemes in bibliometrics

Thank you for listening!