
CWTS Bibliometric report for the University of Oulu in the context of RAE2020

Ed Noyons, CWTS, Leiden University

University of Oulu, March 17th 2020

Outline

• Bibliometrics

• Databases

• Methodology

• In practice


Bibliometrics


Key elements

• Bibliographic database(s)

• Publication output metadata

– Authors & affiliations

– Content descriptors

– Source (journal, series)

– Cited references

– (Link from cited reference to publication)


Indicators

• Output / production

• Citedness / impact

• Co-authorship / collaboration

• Output distribution / research focus / profile


Requirements for responsible use

• Reliable infrastructure

• Up-to-date data

• Proper coverage

– Inclusive

– Objective

– Suitable for purpose

• (Manual) quality control / desktop research

• Documentation

• Proper / local interpretation


Database(s)


Multidisciplinary bibliographic data sources

• Web of Science:

– Launched in 1964 by the Institute for Scientific Information (ISI) as the Science Citation Index (SCI)

– Nowadays owned by Clarivate Analytics

– Requires subscription

• Scopus:

– Launched in 2004 by Elsevier

– Requires subscription

• Google Scholar:

– Launched in 2004 by Google

– Freely accessible

– No large-scale data access

• Dimensions:

– Launched in 2018 by Digital Science

– Free version and subscription version

• Microsoft Academic Graph:

– Launched in 2016 by Microsoft

– Freely accessible

– Large-scale data access (ODC-BY license)


Web of Science: Citation indices

• Web of Science Core Collection:

– Science Citation Index Expanded

– Social Sciences Citation Index

– Arts & Humanities Citation Index

– Emerging Sources Citation Index

– Book Citation Index

– Conference Proceedings Citation Index

• Regional Collection:

– Chinese Science Citation Database

– Russian Science Citation Index

– KCI Korean Journal Database

– SciELO Citation Index

• Specialist Collection

• Data Citation Index

• Derwent Innovations Index


Microsoft Academic Graph: Semantic search


Content selection policies

• Web of Science:

– Focus on selectivity

– Content selection by internal Editorial Development team

• Scopus:

– Focus on comprehensiveness; Scopus claims to be “the largest abstract and citation database of peer-reviewed literature”

– Content selection by Content Selection and Advisory Board

• Dimensions:

– “The database should not be selective but rather should be open to encompassing all scholarly content that is available for inclusion … The community should then be able to choose the filter that they wish to apply to explore the data according to their use case.”

• Google Scholar and Microsoft Academic:

– Strong focus on comprehensiveness (web indexing combined with data from publishers)

WoS vs Scopus: coverage by length of reference list


[Figure: WoS coverage, overlap with Scopus, and Scopus surplus]

• Coverage depends on the research field

– Publication ‘behavior’

– Communication tradition

• Coverage depends on the purpose (what is to be measured?)

• Coverage depends on the data collection process (which criteria are used?)


WoS: External coverage vs Internal coverage

• External coverage: the number of WoS publications divided by the total number of research outputs (research publications)

• Internal coverage: the average proportion of references in a WoS article that are themselves covered by WoS
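As a minimal sketch (with hypothetical identifiers and field names, not the actual CWTS data model), the two coverage measures could be computed as follows:

    # Sketch: external vs internal coverage (hypothetical data structures)
    def external_coverage(all_output_ids, wos_ids):
        # Share of a unit's research outputs that are covered by WoS
        return sum(1 for pub in all_output_ids if pub in wos_ids) / len(all_output_ids)

    def internal_coverage(wos_articles, wos_ids):
        # Average share of an article's cited references that are themselves in WoS
        shares = [sum(1 for ref in art["references"] if ref in wos_ids) / len(art["references"])
                  for art in wos_articles if art["references"]]
        return sum(shares) / len(shares)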


Internal vs external coverage

[Scatter plot: internal coverage (y-axis, 0.0–1.0) vs external coverage (x-axis, 0.0–1.0) per unit, with a reference line at internal coverage = 0.5]

• The larger the volume, the stronger the correlation

• An internal coverage of 0.5 or higher indicates that the majority of the output is estimated to be covered

Methodology


Methodological elements

• Indicators

• Normalization

• Fractional counting

• Journal indicator

• Role


Indicators: Output


Indicators: P and P[fract]

• Production P indicates the number of WoS publications (articles & reviews) in which an actor was involved (as co-author);

• P[fract] indicates the assessed contribution of an actor to the output (see the sketch below):

– Each publication is divided by the number of co-authoring organizations, e.g., a fraction of 1/3 if three organizations are involved;

– P[fract] is the sum of these fractions over the publications in which the actor is involved.
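A minimal sketch of both counts for a single actor, using made-up publication records that only list the co-authoring organizations:

    # Sketch: P and P[fract] for one organization (hypothetical records)
    publications = [
        {"orgs": {"UOulu", "OrgA", "OrgB"}},  # fraction 1/3 for UOulu
        {"orgs": {"UOulu"}},                  # fraction 1
        {"orgs": {"UOulu", "OrgC"}},          # fraction 1/2
    ]
    P = sum(1 for pub in publications if "UOulu" in pub["orgs"])
    P_fract = sum(1 / len(pub["orgs"]) for pub in publications if "UOulu" in pub["orgs"])
    print(P, round(P_fract, 2))  # 3 1.83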


P and P[fract] do not correct for the number of research FTE available


Indicators: Impact


Measuring impact by citations

• Citation practices and characteristics differ among fields;

• Each publication requires its own context to be compared;

• For this we need a proper classification structure of all the sciences.


MCS, MNCS, PP(top10%) indicators

• MCS: average number of citations per publication (P or P[fract])

• MNCS: MCS normalized by field and year

• PP(top10%): proportion of P/P[fract] belonging to the top 10% most cited per field and year

MCS and MNCS are sensitive to outliers; PP(top10%) is not.
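A minimal sketch of the three indicators on toy data; the expected citation rates and top-10% thresholds per field and year are assumed to be given (in practice they are derived from the full database):

    # Sketch: MCS, MNCS and PP(top10%) on toy data (field baselines assumed given)
    pubs = [
        {"cits": 12, "expected": 6.0, "top10_threshold": 15},
        {"cits": 3,  "expected": 4.0, "top10_threshold": 11},
        {"cits": 20, "expected": 8.0, "top10_threshold": 18},
    ]
    MCS = sum(p["cits"] for p in pubs) / len(pubs)                               # 11.67
    MNCS = sum(p["cits"] / p["expected"] for p in pubs) / len(pubs)              # 1.75
    PP_top10 = sum(p["cits"] >= p["top10_threshold"] for p in pubs) / len(pubs)  # 0.33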

Putting each publication in its proper environment: WoS classification


Comparison of citation impact requires proper normalization

• Citation practices differ among fields;

• In other words: a citation is ‘worth’ more in one field than in another.

Normalization by subject classification (journal classification)

• Web of Science;

• Journal classification (254 fields);

• Measure the impact of one paper (x in category Y in year i);

• Measure the impact of all papers in the same category (Y) and the same year (i);

• x is normalized by Y in the same year.
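As a formula (a restatement of the bullets above, not the exact CWTS definition): with c_i the number of citations of publication i and e_i the average number of citations of all publications in the same category and the same year,

    \mathrm{MNCS} = \frac{1}{N} \sum_{i=1}^{N} \frac{c_i}{e_i}

so MNCS = 1 means the set of N publications is cited exactly at the average of its fields and publication years.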

Categorization based on journals

• Advantages:

– Easy to understand

– Stable structure;

• Problems:

– ‘Objectivity’

– Traditional fields

– Publications in multidisciplinary journals (e.g., Nature, Science, PNAS, PLOS ONE);

– Subject categories often seem too coarse.

Challenge: scope of category (e.g., cardiac & cardiovascular systems)

[Figure: map of publications in this category, ranging from clinical research to basic research; color coding indicates impact (MCS)]

Alternative: publication-based classification

• A more fine-grained structure of ‘all’ sciences (WoS);

• Not based on journals;

• Publication-based clustering;

• ~4000 clusters of publications identified by network analysis of citation relations;

• Each cluster has its own average to be used for normalization of citation impact.

CWTS publication-based classification

• Citing relations among publications (WoS 2000 to date);

• Self-organized: clusters at different levels (current version)

– Lowest level of ~4,500 research areas;

• Disjoint clusters;

• Hierarchical.
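The actual classification is built with large-scale clustering methods developed at CWTS (such as the Leiden algorithm) on the full WoS citation graph; as a small illustrative stand-in only, assuming networkx is installed, clustering a toy citation network could look like this:

    # Illustrative stand-in only: community detection on a tiny citation network
    import networkx as nx
    from networkx.algorithms.community import louvain_communities

    G = nx.Graph()
    # hypothetical undirected citation links between publications
    G.add_edges_from([("p1", "p2"), ("p2", "p3"), ("p1", "p3"),
                      ("p4", "p5"), ("p5", "p6"), ("p4", "p6"),
                      ("p3", "p4")])
    clusters = louvain_communities(G, seed=42)
    print(clusters)  # expected: two clusters, {p1, p2, p3} and {p4, p5, p6}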

Publication-based classification

• Advantages

– ‘Objective’

– Independent from journals

– Dynamic

– Provides more detail

• Challenges

– Labeling

– Updates.

Map of all sciences

Each circle represents a cluster of publications; surface represents volume; distance represents relatedness (citation traffic).

Information available for each cluster

• All information covered by the publications (journals, authors, affiliations, keywords, etc.);

• Total volume (number of publications over the whole period);

• Volume per year (trend);

• Internal coverage;

• Other average statistics (e.g., number of authors, references, affiliations);

• Impact (overall and per year);

➢ All these indicators are available for any subset (e.g., a unit’s output).

Impact measurement and counting method


Counting methods

• Full (or whole) counting:

– A publication is fully assigned to each co-author

– Example: a publication co-authored by three universities is assigned to each university with a weight of 1

• Fractional counting:

– A publication is fractionally assigned to each co-author

– Example: a publication co-authored by three universities is assigned to each university with a weight of 1/3 ≈ 0.33
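A minimal sketch of the difference, for one publication co-authored by three (hypothetical) universities:

    # Sketch: full vs fractional counting of a single publication
    universities = ["Univ A", "Univ B", "Univ C"]
    full = {u: 1.0 for u in universities}                          # each university gets weight 1
    fractional = {u: 1 / len(universities) for u in universities}  # each gets 1/3
    print(sum(full.values()), sum(fractional.values()))            # 3.0 1.0
    # full counting counts the publication three times in total, fractional counting once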


Citation advantage of collaborative publications


Main issues regarding counting method and impact measurement

• Collaborative publications are cited more frequently than non-collaborative publications;

• The average number of co-authors per publication differs among fields;

• Full counting therefore yields higher citation scores than fractional counting;

• Full counting is biased in favor of certain fields of science, in particular biomedicine;

• Use fractional counting in analyses at the level of institutions or countries.


Mean Normalized Journal Score (MNJS): in between output and impact


MNCS and MNJS

[Scatter plot: MNCS[fract] (y-axis) vs MNJS[fract] (x-axis) per unit]

• Strong correlation between MNCS and MNJS

• Journal indicators are less problematic at the aggregate level than at the paper level (cf. DORA)

Role


Beyond fractional counting

• Distinguishing the different roles;

• Detailed role information is rarely available in publication (meta)data (cf. PLOS ONE);

• In this report we assessed role by corresponding-author information (lead vs other);

• The ‘other’ role may be very diverse;

• Around a 50/50 distribution.
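A sketch of how the lead/other split could be derived, assuming each publication record lists the affiliations of the corresponding author (the field names here are hypothetical):

    # Sketch: assigning a role (lead vs other) from corresponding-author data
    def role(pub, org="University of Oulu"):
        # 'lead' if the corresponding author is affiliated with the organization,
        # 'other' otherwise (assuming the organization co-authored the publication)
        return "lead" if org in pub["corresponding_affiliations"] else "other"

    pubs = [
        {"corresponding_affiliations": ["University of Oulu"]},
        {"corresponding_affiliations": ["Another University"]},
    ]
    print([role(p) for p in pubs])  # ['lead', 'other']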


Main statistics UOulu by Role

• 50/50 output

• Teams are smaller when UOulu is in the lead (P vs P[fract]);

• Teams are more international and involve industry more often when UOulu is not in the lead;

• Impact is higher when UOulu is in the other role.


Profiles and other use of results


Collaboration profile


Research profile


Other key elements

• P fractional vs P

• Internal coverage

• PP[top10%] vs MNCS

• MNCS vs MNJS

• PP[intl collab]


Trends


End


More information

• www.cwts.nl

• noyons@cwts.leidenuniv.nl

