
CWTS Bibliometric report for the University of Oulu in the context of RAE2020

Ed Noyons, CWTS, Leiden University

University of Oulu, March 17th 2020

Outline

• Bibliometrics

• Databases

• Methodology

• In practice


Bibliometrics


Key elements

• Bibliographic database(s)

• Publication output metadata

– Authors & affiliations

– Content descriptors

– Source (journal, series)

– Cited references

– (Link from cited reference to publication)


Indicators

• Output / production

• Citedness / impact

• Co-authorship / collaboration

• Output distribution / research focus / profile


Requirements for responsible use

• Reliable infrastructure

• Up-to-date data

• Proper coverage

– Inclusive

– Objective

– Suitable for purpose

• (Manual) quality control / desktop research

• Documentation

• Proper / local interpretation


Database(s)


Multidisciplinary bibliographic data sources

• Web of Science:

– Launched in 1964 by the Institute for Scientific Information (ISI) as the Science Citation Index (SCI)

– Nowadays owned by Clarivate Analytics

– Requires subscription

• Scopus:

– Launched in 2004 by Elsevier

– Requires subscription

• Google Scholar:

– Launched in 2004 by Google

– Freely accessible

– No large-scale data access

• Dimensions:

– Launched in 2018 by Digital Science

– Free version and subscription version

• Microsoft Academic Graph:

– Launched in 2016 by Microsoft

– Freely accessible

– Large-scale data access (ODC-BY license)


Web of Science: Citation indices

• Web of Science Core Collection:

– Science Citation Index Expanded

– Social Sciences Citation Index

– Arts & Humanities Citation Index

– Emerging Sources Citation Index

– Book Citation Index

– Conference Proceedings Citation Index

• Regional Collection:

– Chinese Science Citation Database

– Russian Science Citation Index

– KCI Korean Journal Database

– SciELO Citation Index

• Specialist Collection

• Data Citation Index

• Derwent Innovations Index


Microsoft Academic Graph: Semantic search


Content selection policies

• Web of Science:

– Focus on selectivity

– Content selection by internal Editorial Development team

• Scopus:

– Focus on comprehensiveness; Scopus claims to be “the largest abstract and citation database of peer-reviewed literature”

– Content selection by Content Selection and Advisory Board

• Dimensions:

– “The database should not be selective but rather should be open to encompassing all scholarly content that is available for inclusion … The community should then be able to choose the filter that they wish to apply to explore the data according to their use case.”

• Google Scholar and Microsoft Academic:

– Strong focus on comprehensiveness (web indexing combined with data from publishers)

WoS vs Scopus: coverage by length of reference list


[Figure: WoS coverage, overlap with Scopus, and Scopus surplus]

• Coverage depends on the research field

– Publication ‘behavior’

– Communication tradition

• Coverage depends on the purpose (what is to be measured?)

• Coverage depends on the data collection process (which criteria are used?)


WoS: External coverage vs Internal coverage

• External coverage: the number of WoS publications divided by the total number of research outputs (research publications)

• Internal coverage: the average proportion of references in a WoS article that are themselves covered by WoS
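As a minimal sketch (with hypothetical identifiers and field names, not the actual CWTS data model), the two coverage measures could be computed as follows:

    # Sketch: external vs internal coverage (hypothetical data structures)
    def external_coverage(all_output_ids, wos_ids):
        # Share of a unit's research outputs that are covered by WoS
        return sum(1 for pub in all_output_ids if pub in wos_ids) / len(all_output_ids)

    def internal_coverage(wos_articles, wos_ids):
        # Average share of an article's cited references that are themselves in WoS
        shares = [sum(1 for ref in art["references"] if ref in wos_ids) / len(art["references"])
                  for art in wos_articles if art["references"]]
        return sum(shares) / len(shares)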


Internal vs external coverage

[Scatter plot: internal coverage (y-axis, 0.0–1.0) vs external coverage (x-axis, 0.0–1.0) per unit, with a reference line at internal coverage = 0.5]

• The larger the volume, the stronger the correlation

• An internal coverage of 0.5 or higher indicates that the majority of the output is estimated to be covered

Methodology


Methodological elements

• Indicators

• Normalization

• Fractional counting

• Journal indicator

• Role


Indicators: Output


Indicators: P and P[fract]

• Production P indicates the number of WoS publications (articles & reviews) in which an actor was involved (as co-author);

• P[fract] indicates the assessed contribution of an actor to the output (see the sketch below):

– Each publication is divided by the number of co-authoring organizations, e.g., a fraction of 1/3 if three organizations are involved;

– P[fract] is the sum of these fractions over the publications in which the actor is involved.
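A minimal sketch of both counts for a single actor, using made-up publication records that only list the co-authoring organizations:

    # Sketch: P and P[fract] for one organization (hypothetical records)
    publications = [
        {"orgs": {"UOulu", "OrgA", "OrgB"}},  # fraction 1/3 for UOulu
        {"orgs": {"UOulu"}},                  # fraction 1
        {"orgs": {"UOulu", "OrgC"}},          # fraction 1/2
    ]
    P = sum(1 for pub in publications if "UOulu" in pub["orgs"])
    P_fract = sum(1 / len(pub["orgs"]) for pub in publications if "UOulu" in pub["orgs"])
    print(P, round(P_fract, 2))  # 3 1.83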


P and P[fract] do not correct for the number of research FTE available


Indicators: Impact


Measuring impact by citations

• Citation practices and characteristics differ among fields;

• Each publication requires its own context to be compared;

• For this we need a proper classification structure of all the sciences.


MCS, MNCS, PP(top10%) indicators

• MCS: average number of citations per publication (P or P[fract])

• MNCS: MCS normalized by field and year

• PP(top10%): proportion of P/P[fract] belonging to the top 10% most cited per field and year

MCS and MNCS are sensitive to outliers; PP(top10%) is not.
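A minimal sketch of the three indicators on toy data; the expected citation rates and top-10% thresholds per field and year are assumed to be given (in practice they are derived from the full database):

    # Sketch: MCS, MNCS and PP(top10%) on toy data (field baselines assumed given)
    pubs = [
        {"cits": 12, "expected": 6.0, "top10_threshold": 15},
        {"cits": 3,  "expected": 4.0, "top10_threshold": 11},
        {"cits": 20, "expected": 8.0, "top10_threshold": 18},
    ]
    MCS = sum(p["cits"] for p in pubs) / len(pubs)                               # 11.67
    MNCS = sum(p["cits"] / p["expected"] for p in pubs) / len(pubs)              # 1.75
    PP_top10 = sum(p["cits"] >= p["top10_threshold"] for p in pubs) / len(pubs)  # 0.33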

Putting each publication in its proper environment: WoS classification


Comparison of citation impact requires proper normalization

• Citation practices differ among fields;

• In other words: a citation is ‘worth’ more in one field than in another.

Normalization by subject classification (journal classification)

• Web of Science;

• Journal classification (254 fields);

• Measure the impact of one paper (x in category Y in year i);

• Measure the impact of all papers in the same category (Y) and the same year (i);

• x is normalized by Y in the same year.
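As a formula (a restatement of the bullets above, not the exact CWTS definition): with c_i the number of citations of publication i and e_i the average number of citations of all publications in the same category and the same year,

    \mathrm{MNCS} = \frac{1}{N} \sum_{i=1}^{N} \frac{c_i}{e_i}

so MNCS = 1 means the set of N publications is cited exactly at the average of its fields and publication years.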

Categorization based on journals

• Advantages:

– Easy to understand

– Stable structure;

• Problems:

– ‘Objectivity’

– Traditional fields

– Publications in multidisciplinary journals (e.g., Nature, Science, PNAS, PLOS ONE);

– Subject categories often seem too coarse.

Challenge: scope of category (e.g., cardiac & cardiovascular systems)

[Figure: map of publications in this category, ranging from clinical research to basic research; color coding indicates impact (MCS)]

Alternative: publication-based classification

• A more fine-grained structure of ‘all’ sciences (WoS);

• Not based on journals;

• Publication-based clustering;

• ~4000 clusters of publications identified by network analysis of citation relations;

• Each cluster has its own average to be used for normalization of citation impact.

CWTS publication-based classification

• Citing relations among publications (WoS 2000 to date);

• Self-organized: clusters at different levels (current version)

– Lowest level of ~4,500 research areas;

• Disjoint clusters;

• Hierarchical.
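The actual classification is built with large-scale clustering methods developed at CWTS (such as the Leiden algorithm) on the full WoS citation graph; as a small illustrative stand-in only, assuming networkx is installed, clustering a toy citation network could look like this:

    # Illustrative stand-in only: community detection on a tiny citation network
    import networkx as nx
    from networkx.algorithms.community import louvain_communities

    G = nx.Graph()
    # hypothetical undirected citation links between publications
    G.add_edges_from([("p1", "p2"), ("p2", "p3"), ("p1", "p3"),
                      ("p4", "p5"), ("p5", "p6"), ("p4", "p6"),
                      ("p3", "p4")])
    clusters = louvain_communities(G, seed=42)
    print(clusters)  # expected: two clusters, {p1, p2, p3} and {p4, p5, p6}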

Publication-based classification

• Advantages

– ‘Objective’

– Independent from journals

– Dynamic

– Provides more detail

• Challenges

– Labeling

– Updates.

Map of all sciences

Each circle represents a cluster of publications; surface represents volume; distance represents relatedness (citation traffic).

Information available for each cluster

• All information covered by the publications (journals, authors, affiliations, keywords, etc.);

• Total volume (number of publications over the whole period);

• Volume per year (trend);

• Internal coverage;

• Other average statistics (e.g., number of authors, references, affiliations);

• Impact (overall and per year);

➢ All these indicators are available for any subset (e.g., a unit’s output).

Impact measurement and counting method


Counting methods

• Full (or whole) counting:

– A publication is fully assigned to each co-author

– Example: a publication co-authored by three universities is assigned to each university with a weight of 1

• Fractional counting:

– A publication is fractionally assigned to each co-author

– Example: a publication co-authored by three universities is assigned to each university with a weight of 1/3 ≈ 0.33
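A minimal sketch of the difference, for one publication co-authored by three (hypothetical) universities:

    # Sketch: full vs fractional counting of a single publication
    universities = ["Univ A", "Univ B", "Univ C"]
    full = {u: 1.0 for u in universities}                          # each university gets weight 1
    fractional = {u: 1 / len(universities) for u in universities}  # each gets 1/3
    print(sum(full.values()), sum(fractional.values()))            # 3.0 1.0
    # full counting counts the publication three times in total, fractional counting once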


Citation advantage of collaborative publications


Main issues regarding counting method and impact measurement

• Collaborative publications are cited more frequently than non-collaborative publications;

• The average number of co-authors per publication differs among fields;

• Full counting therefore yields higher citation scores than fractional counting;

• Full counting is biased in favor of certain fields of science, in particular biomedicine;

• Use fractional counting in analyses at the level of institutions or countries.


Mean Normalized Journal Score (MNJS): in between output and impact


MNCS and MNJS

[Scatter plot: MNCS[fract] (y-axis) vs MNJS[fract] (x-axis) per unit]

• Strong correlation between MNCS and MNJS

• Journal indicators are less problematic at the aggregate level than at the paper level (cf. DORA)

Role


Beyond fractional counting

• Distinguishing the different roles;

• Detailed role information is rarely available in publication (meta)data (cf. PLOS ONE);

• In this report we assessed role by corresponding-author information (lead vs other);

• The ‘other’ role may be very diverse;

• Around a 50/50 distribution.
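A sketch of how the lead/other split could be derived, assuming each publication record lists the affiliations of the corresponding author (the field names here are hypothetical):

    # Sketch: assigning a role (lead vs other) from corresponding-author data
    def role(pub, org="University of Oulu"):
        # 'lead' if the corresponding author is affiliated with the organization,
        # 'other' otherwise (assuming the organization co-authored the publication)
        return "lead" if org in pub["corresponding_affiliations"] else "other"

    pubs = [
        {"corresponding_affiliations": ["University of Oulu"]},
        {"corresponding_affiliations": ["Another University"]},
    ]
    print([role(p) for p in pubs])  # ['lead', 'other']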


Main statistics UOulu by Role

• 50/50 output

• Teams are smaller when UOulu is in the lead (P vs P[fract]);

• Teams are more international and involve industry more often when UOulu is not in the lead;

• Impact is higher when UOulu is in the other role.


Profiles and other use of results


Collaboration profile


Research profile


Other key elements

• P fractional vs P

• Internal coverage

• PP[top10%] vs MNCS

• MNCS vs MNJS

• PP[intl collab]


Trends


End


More information

• www.cwts.nl

• noyons@cwts.leidenuniv.nl

