Institutional classification schemes in bibliometrics

Matthias Winterhager, Bielefeld University

euroCRIS Membership Meeting, Bonn

May 13, 2013

2

Institutions: a major object of bibliometric studies

Relevant data fields in bibliometric databases: the case of Web of Science

Institutional data: ready to use?

The advent of identifiers

Processing institutional data: how to count?

3

Web of Science anno 1982

4

Hardcover times ...

5

Institutional address data: two dimensions

6

Web of Science anno 2013

7

Web of Science: Institutional Data

8

Web of Science Institutional Data

Reprint address

Authors' institutional affiliation(s) (from 2008 onwards linked to author names)

Funding agencies

Publishers

Do not expect any institutional affiliation data before 1965

9

Sample document (from Scientometrics 2008)

10

Address Data in Web of Science

Author Affiliations („work done at“):

Tech Univ Denmark, Tech Knowledge Ctr Denmark, DARC DTU Anal & Res Promot Ctr, Lyngby, Denmark

Ctr Sci & Technol Studies CEST, Bern, Switzerland

Inst Res Informat & Qual Assurance, Bonn, Germany

Reprint Address: Larsen, PO (reprint author), Marievej 10A,2, Hellerup, Denmark

Present Address („current potential“): missing

11

Which bibliometric indicators depend on address data?

Almost all:

Every indicator that is (directly or indirectly) based on distinct sets of national, organisational or geographic entities

Any normalised indicator (such as observed vs. expected citation ratios) that uses regional (e.g. EU, country, state) instead of world averages

Any indicator of cross-national, cross-organisational or cross-institutional cooperation as measured via co-authorships
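To illustrate how directly such indicators rest on address data, here is a minimal sketch of one cooperation indicator: the share of publications with addresses from more than one country. The function name, the input layout (one list of country names per publication) and the decision to skip records without addresses are assumptions made for this example, not something taken from the slides.

```python
def international_copublication_share(publications):
    """Share of publications whose addresses span more than one country.

    `publications` is a list of lists; each inner list holds the country of
    every address on one publication (hypothetical input layout).
    Records without any address data are skipped, which is itself a
    choice that changes the result.
    """
    with_address = [set(countries) for countries in publications if countries]
    if not with_address:
        return 0.0
    international = sum(1 for countries in with_address if len(countries) > 1)
    return international / len(with_address)


# Illustrative records: one three-country paper, two purely German papers,
# and one record that carries no address data at all.
sample = [
    ["Denmark", "Switzerland", "Germany"],
    ["Germany", "Germany"],
    ["Germany"],
    [],
]
share = international_copublication_share(sample)
```

Whether the record without addresses is dropped or kept in the denominator changes the share, so that decision should be reported with the indicator.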

12

Institutional address data: Issues (1)

A substantial number of records come without any address data (they may or may not be included in the world totals used for expected ratios)

The proportion of missing address data differs by discipline (e.g. humanities) and by document type

Few records with address data in “backfiles” (before 1966)

Spelling variants, misspellings, erroneous entries (samples following)

13

Erroneous address data (1)

14

Erroneous address data (2)

15

Erroneous address data (3)

16

Uncontrolled affiliation (1)

17

Uncontrolled affiliation (2)

18

Institutional address data: Issues (2)

“Reality gap”: names for entities which

never existed (fictitious institutes)

existed, but have been split, merged, renamed or closed

Geographical and organisational aspects of an address can point in different directions; borderline cases can be complicated to assign (Max Planck Institutes outside Germany, EMBL, CERN, KIT, Charité Berlin)

19

Abbreviations in address records

… can have different origins: author, publisher and database producer

„Corporate and institution names may or may not be abbreviated. To be comprehensive, search for the full name of the institution … as well as the abbreviation.“

„Abbreviations for corporate and institution names used in the product database are listed below. Other address elements … may also be abbreviated.“

(from: Web of Science Help)

20

Cleaning of German address data (1)

Project of the German competence centre for bibliometrics

Aim: assignment of (almost) every paper with at least one German address from Web of Science (or Scopus) to the relevant German institution(s)

Introduction of procedures to handle unstandardized, incomplete and incorrect address data

Testing different algorithms for reduction and standardization of data elements, extraction of cities and tree-structures

Use of geocoding services
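The reduction, city extraction and tree-structure steps listed above can be sketched as follows. This is only an illustration: it assumes an address is a comma-separated string ordered from organisation to country, with an optional postal code in front of the city, as in the sample addresses shown earlier. The function names and the returned field names are hypothetical and do not describe the project's actual algorithms.

```python
import re
import unicodedata


def normalize(text):
    """Reduce a string: strip diacritics, collapse whitespace, lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return re.sub(r"\s+", " ", without_marks).strip().lower()


def split_address(address):
    """Split one address into organisation, subunits, city and country.

    Assumes the layout 'organisation, subunit(s)..., [postal code] city,
    country'. Returns None when there are fewer than three components,
    because the layout cannot be trusted in that case.
    """
    parts = [normalize(p) for p in address.split(",") if p.strip()]
    if len(parts) < 3:
        return None
    # Drop a leading postal code such as 'd-33501' or '8000' from the city.
    city = re.sub(r"^(?:[a-z]{1,2}-)?\d{4,5}\s+", "", parts[-2])
    return {
        "organisation": parts[0],
        "subunits": parts[1:-2],  # the lower levels of the tree
        "city": city,
        "country": parts[-1],
    }


record = split_address(
    "Univ Bielefeld, Inst Sci & Technol Studies, D-33501 Bielefeld, Germany"
)
```

A city extracted this way could then be passed to a geocoding service to check it against the organisation's known locations; that step is left out here because it depends on the service used.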

21

Cleaning of German address data (2)

Maintaining a large base of institute-specific string patterns for thousands of German institutions

Assignment process still heavily based on pattern-matching procedures

Growing database of institutions (with “history”), currently ~2,000 main institutions (data from 1995-2011), mapping identifiers to external sources
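A pattern-based assignment that respects institutional “history” might look roughly like the sketch below. Everything in it is hypothetical: the `Rule` structure, the regular expressions, the identifiers (`DE-0001` and so on) and the validity years are invented for illustration and are not taken from the project's database. The Karlsruhe rules only illustrate a merger in which an older name stops being valid and a newer one starts.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    pattern: re.Pattern       # institute-specific string pattern
    institution_id: str       # hypothetical identifier
    first_year: int           # first publication year the rule applies to
    last_year: int            # last publication year the rule applies to


# Illustrative rules only; a real rule base would hold thousands of patterns.
RULES = [
    Rule(re.compile(r"\buniv bielefeld\b|\bbielefeld univ\b"), "DE-0001", 1900, 9999),
    Rule(re.compile(r"\buniv karlsruhe\b"), "DE-0002", 1900, 2009),
    Rule(re.compile(r"\bkarlsruhe inst technol\b|\bkit\b"), "DE-0003", 2009, 9999),
]


def assign(address, publication_year):
    """Return the sorted identifiers of all rules matching the address.

    A rule only fires if the publication year lies inside its validity
    period, so renamed or merged institutions are not credited with
    papers published outside the years in which they existed.
    """
    text = address.lower()
    matches = {
        rule.institution_id
        for rule in RULES
        if rule.first_year <= publication_year <= rule.last_year
        and rule.pattern.search(text)
    }
    return sorted(matches)


old_name = assign("Univ Karlsruhe, Inst Phys, D-76128 Karlsruhe, Germany", 2005)
new_name = assign("Karlsruhe Inst Technol, D-76131 Karlsruhe, Germany", 2012)
```

Addresses that match no rule come back as an empty list. Handling those cases by hand is one way such a pattern base grows over time.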

22

Institutional Identifiers

Identifiers of the project are currently being mapped to “Research Explorer”, the research directory of the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) and of the German Academic Exchange Service (DAAD) in cooperation with the German Rectors' Conference (HRK).

23

Institutional Identifier (I2)

Initiatives are underway, but it will take several years to bring such standards into operation on a broad scale (as can be seen from the case of the author identifier initiative – ORCID)

24

Processing institutional data: How to count?

Challenges coming from international clinical trials and from high energy physics:

Hundreds of different addresses from thousands of authors on a single publication – how to attribute publication (and citation) counts in the right way?

25

Counting methods

Complete (C): each basic unit gets 1 credit

Complete-normalized (CN): all the basic units in a publication share 1 credit

Straight (S): the first basic unit gets 1 credit

Whole (W): each unique basic unit gets a credit of 1

Whole-normalized (WN): all the unique basic units share 1 credit

Basic units can be countries, organisations, institutes.

Normalized counts are often called “fractional”.
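The five counting methods can be written down compactly for a single publication. The sketch below is one reading of the definitions above, assuming that a publication is given as the list of basic units of its addresses in byline order and that a unit can occur more than once. The function name and the use of exact fractions are choices made for this example.

```python
from collections import Counter
from fractions import Fraction


def count_credits(units, method):
    """Return {basic unit: credit} for one publication.

    `units` lists the basic unit (country, organisation or institute) of
    each address in byline order; repeated units are allowed.
    `method` is one of "C", "CN", "S", "W", "WN".
    """
    if not units:
        return {}
    occurrences = Counter(units)
    if method == "C":    # every occurrence gets 1 credit
        return {u: Fraction(k) for u, k in occurrences.items()}
    if method == "CN":   # all occurrences share 1 credit
        return {u: Fraction(k, len(units)) for u, k in occurrences.items()}
    if method == "S":    # only the first unit gets 1 credit
        return {units[0]: Fraction(1)}
    if method == "W":    # every unique unit gets 1 credit
        return {u: Fraction(1) for u in occurrences}
    if method == "WN":   # all unique units share 1 credit
        return {u: Fraction(1, len(occurrences)) for u in occurrences}
    raise ValueError(f"unknown counting method: {method!r}")


# One paper with two Danish, one Swiss and one German address.
paper = ["DK", "DK", "CH", "DE"]
results = {m: count_credits(paper, m) for m in ("C", "CN", "S", "W", "WN")}
```

For this paper Denmark receives 2 credits under C, 1/2 under CN, 1 under S and W, and 1/3 under WN, so the same publication yields five different national counts.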

26

No „gold standard“ for counting

Small units may be favoured by whole counting (W) – but other effects make it difficult to give a general rule here.

„40 years of publication counting have not resulted in general agreement on definitions of methods and terminology nor in any kind of standardization.”

(Larsen, P.O.: The state of the art in publication counting, Scientometrics, 77(2), 2008, 235-251)

27

Thank you for listening!
