Journal Impact Factor: Science’s Misused Metric and its Limitations
Submitted by: Sana Batool
Macaulay Honors College
Course: The Purpose, Practice and Politics of Science (MHC 360)
Instructor: Dr. Harold Varmus
Date: May 16th, 2018
Introduction
Why are some journals, like Nature, Science, and Cell, better known than others? Part of the reason is their high Journal Impact Factor, abbreviated as JIF, which is by far the most discussed bibliometric indicator in the scientific community. Eugene Garfield first mentioned the idea of JIF in 1955, but it was not until 1975 that he first reported the impact factor for scientific journals. It was primarily created to help US librarians make subscription decisions using an objective quantitative method. Since its inception, JIF has gradually evolved into a tool not only for quantifying the impact of journals but also for measuring the importance of the individual articles published in them. However, JIF is far from being an appropriate measure of scientific impact, and it should not be used as a proxy for scientific quality and productivity. Evaluation committees should instead base their decisions on grants and job positions on a combination of more appropriate article-level metrics and on biographical sketches that briefly list the major accomplishments of a candidate.
Limitations of JIF
Many skeptics of JIF point out its manipulable nature, which journal editors could exploit to increase the impact factor of their journals. JIF is reported annually in the Journal Citation Reports (JCR), currently published by Clarivate Analytics. It is calculated as a mean value of citations (see Fig. 1), for which published documents are categorized as citable and non-citable items. Research and review articles are considered citable items, whereas editorials, letters to the editor, and news items are considered non-citable. In the calculation of JIF, the numerator includes citations accumulated by both citable and non-citable items, whereas the denominator, which accounts for the number of articles published, includes only the citable items. This asymmetry between the numerator and denominator inflates the JIF of journals that publish many more non-citable items than others. Moreover, review articles are compilations of previously published studies and are often heavily cited. Journal editors could therefore increase the impact factor of their journals by publishing more non-citable items and review articles.
$$\text{2018 Journal Impact Factor} = \frac{\text{Number of citations received by Journal X in 2018 for \textbf{citable and non-citable} items published in 2016--2017}}{\text{Number of \textbf{citable} items published in Journal X in 2016--2017}}$$

Fig. 1. Formula used for the calculation of JIF, reported annually in the JCR.
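To make this asymmetry concrete, below is a minimal Python sketch of the Fig. 1 formula, using hypothetical citation counts rather than real JCR data. It shows how a journal whose non-citable front matter attracts citations earns a higher JIF for the same research output.

```python
# A minimal sketch of the Fig. 1 formula with hypothetical numbers,
# illustrating how the numerator/denominator asymmetry inflates JIF.

def journal_impact_factor(citations_to_citable, citations_to_non_citable,
                          num_citable_items):
    """Citations to ALL items, divided by the count of CITABLE items only."""
    return (citations_to_citable + citations_to_non_citable) / num_citable_items

# Journal A: 100 research/review articles, no front matter.
jif_a = journal_impact_factor(300, 0, 100)

# Journal B: identical research output, plus editorials and news items
# that attract 80 extra citations but are excluded from the denominator.
jif_b = journal_impact_factor(300, 80, 100)

print(jif_a)  # 3.0
print(jif_b)  # 3.8 -- inflated purely by citations to non-citable items
```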
Another means of editorial manipulation of JIF is self-citation: authors and editors who control citation practices can use it to inflate JIF. This systematic flaw encourages authors to add unnecessary citations to their journal’s recently published articles in order to raise its JIF. An analysis published in the JCR in 2002 revealed that, across all disciplines, about 12% of the citations for a journal were, on average, self-citations. There have been reported cases of journals returning papers to authors and asking them to add more citations to articles published in those same journals.
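As a rough numerical illustration (the counts below are hypothetical; only the 12% average comes from the 2002 JCR analysis cited above), the contribution of self-citations to a journal’s JIF can be computed directly:

```python
# Hypothetical illustration of how self-citations pad the JIF numerator.
total_citations = 500   # all citations counted in the JIF numerator
self_citations = 60     # citations from the journal to its own recent papers
citable_items = 120     # the JIF denominator

self_citation_rate = self_citations / total_citations  # 0.12, the JCR average
jif_with_self = total_citations / citable_items                         # ~4.17
jif_without_self = (total_citations - self_citations) / citable_items   # ~3.67

print(self_citation_rate, jif_with_self, jif_without_self)
```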
JIF is also criticized for the brief citation window used in its calculation. As indicated by Fig. 1, JIF only takes into account citations over a two-year window, which captures only the short-term impact of articles. Dr. Rob Hyndman, Professor of Statistics at Monash University, noted in a blog post that this two-year period emphasizes recency over the longevity of scholarly documents and does not measure the real impact of articles. He gave the example of one of his papers, which was cited only 47 times in the first ten years after publication but accumulated 457 citations over the next ten years. If JIF were calculated over a longer window, this article would perhaps dramatically increase the JIF of the journal it was
published in.
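To see how much the two-year window can miss, here is a stylized Python example based on Hyndman’s paper; the decade totals (47 and 457 citations) come from his account, while the year-by-year split is hypothetical:

```python
# Stylized citation history of a "slow-burning" paper: about 47 citations
# in years 1-10 and 457 in years 11-20 (decade totals from Hyndman's
# example; the yearly breakdown below is invented for illustration).
yearly = [1, 2, 3, 4, 5, 6, 6, 6, 7, 7,            # years 1-10  (sum = 47)
          30, 35, 40, 45, 48, 50, 52, 50, 55, 52]  # years 11-20 (sum = 457)

two_year_window = sum(yearly[:2])   # citations visible to a JIF-style window
lifetime = sum(yearly)              # the paper's actual accumulated impact

print(two_year_window, lifetime)    # 3 vs. 504
print(two_year_window / lifetime)   # ~0.006 -- under 1% of impact captured
```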
Although the number of citations an article receives does provide substantial data for gauging its impact in its field, it is equally important to consider the nature of those citations, which can be positive or negative depending on whether the paper is credited or criticized. A journal would still see an increase in its JIF even if its papers were cited as examples of weak studies. For instance, a paper on arsenic-loving bacteria published in Science in 2010 has been cited over 500 times, but it has had essentially no impact in the field because its results remain irreproducible, and it is mostly cited by others to criticize the study. Dr. Stephen Curry, a structural biologist at Imperial College and a longtime critic of JIF, says that the fixation on JIF is problematic: if a single number could really quantify the value of a scholarly document, then this Science paper, which he calls a “worthless piece,” would be considered impactful.
One of the major shortcomings of JIF is that it is calculated as a mean rather than a median. This means that a minority of heavily cited papers in a journal can account for the vast majority of that journal’s citations and raise its JIF, producing a highly skewed citation distribution. For instance, an article in Nature might be considered impactful because of the journal’s high JIF yet in reality have zero citations. In a 2016 paper by Lariviere et al., the citation distribution plots of 11 different journals showed extensive overlap, demonstrating that JIFs cannot be used to measure the impact of an individual article. It has also been found that, on average, only about 28% of citable items accumulate citations equal to or greater than the impact factor of their journals, while about 70% of them fall below the reported JIF.
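The mean-versus-median problem is easy to demonstrate. The sketch below uses hypothetical citation counts for twenty papers in one journal to show the skew Lariviere et al. documented: the mean (which is what JIF reports) sits far above what a typical paper receives.

```python
from statistics import mean, median

# Hypothetical citation counts for 20 papers in one journal: most papers
# are cited a handful of times, a few are cited heavily.
citations = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3,
             3, 4, 4, 5, 6, 8, 10, 40, 120, 300]

print(mean(citations))    # 25.75 -- the JIF-style average
print(median(citations))  # 3.0   -- what the typical paper actually gets

# Fraction of papers that reach the journal's average:
share = sum(c >= mean(citations) for c in citations) / len(citations)
print(share)              # 0.15 -- only 3 of 20 papers reach the mean
```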
In an effort to improve JIF, Dr. Randy Schekman, a 2013 Nobel Laureate and editor-in-chief of eLife, visited Thomson Reuters, which previously owned the JCR, with a group of scientists a few years ago. They argued that the skewness of citation distributions makes the average value of citations meaningless, and they suggested calculating JIF as a median. Thomson Reuters responded that they had previously tried to calculate JIF as a median, but it had resulted in the same impact factor for multiple journals. Since the scientific community could not then differentiate between journals with the same JIF, Thomson Reuters abandoned this approach, saying, “we have always calculated it as a mean and that’s what we are going to continue doing.” Dr. Schekman calls this a “bogus excuse” and says that the publishers of JIF see themselves as an “Olympic clock timekeeper” that wants to avoid ties at any cost.
Critics of JIF
Dr. Schekman made a public statement after he won the Nobel Prize in 2013, announcing that his lab would boycott the so-called “luxury journals,” namely Nature, Science, and Cell. He has compared these journals to fashion designers who produce limited-edition bags, and he says that chasing journals with high JIF is a “phony obsession” that impedes the progress of science and “makes life miserable for the biomedical scientists.” Soon after his statement in 2013, Nature stopped advertising its impact factor on its website and called it a false metric.
Dr. Stefano Bertuzzi, CEO of the American Society for Microbiology (ASM), is another well-known critic of JIF and has called for the complete abandonment of the JIF system, which would redirect scientists’ attention back to the content of a paper instead of the prestige of the journal. As the Executive Director of the American Society for Cell Biology (ASCB) in 2012, he coordinated the San Francisco Declaration on Research Assessment (DORA) after discussing growing concerns over the misuse of JIF at the society’s annual meeting.
DORA is an important document that lays the foundation for the movement against JIF. It encourages scientists on hiring, promotion, and grant committees to judge papers based on their content instead of where they are published. DORA also calls on scientists to submit their manuscripts to journals that best fit the scope of their studies instead of blindly pursuing journals with high JIF. Dr. Schekman was one of the first signatories, among other eminent scientists who showed support for DORA. As of this writing, DORA has been signed by 12,047 individuals and 475 organizations.
Consequences of misapplication of JIF
Institutions and individuals are both responsible for promoting JIF in the research evaluation context, and this has multiple consequences. In some countries, the obsession with publishing in high-JIF journals is so strong that institutions have adopted cash-for-publication policies. According to a recent article, Chinese scientists can earn over $100,000 for a paper published in prestigious Western journals like Nature and Science. Such policies not only externalize the incentive to do scientific research but also promote fraud, as desperate scientists falsify and fabricate data to be able to publish in top journals. Dr. Schekman says that “such obsession has made biomedical publishing very uncomfortable and led to fraud, as the scientists feel the need to jazz up their results and sensationalize their data just to publish in luxury journals.” In some cases, journals have even asked authors to change the titles of their papers to make them more captivating. Such practices subtly push researchers to choose topics and dissemination venues based on a single number that ends up steering science across various fields.
As Dr. Curry rightly points out, young researchers are the hardest hit by the current system of evaluation. Many students may want to pursue fascinating and novel questions but are dissuaded by JIF. In a correspondence, Gomar points out that many PhD students are under constant pressure from their principal investigators to have at least one Nature, Science, or Cell publication to bolster their CVs for jobs. Top journals like Nature have an acceptance rate of only about 5%, which means few PhD students end up with a publication in these journals by the end of their doctoral program; consequently, their chances of obtaining a postdoctoral fellowship decrease, which narrows their career options. Young researchers therefore tend to focus more on where to publish than on what to publish, resulting in a loss of originality and creativity in science.
Dr. Sven Hendrix, a neuroanatomy professor in Belgium, has called impact factors the “currency of fame” in academia and emphasized that in an environment where JIF is highly valued, as in the scientific community, the only researchers who climb the ladder of success are those whose scientific endeavors are dictated by the desire to publish in top-notch journals. Dr. Bertuzzi calls this a “deeply rooted cultural issue” that not only controls the fate of young researchers but also slows the dissemination of scientific knowledge: it can take scientists several months to publish a paper in a prestigious journal. The misapplication of JIF also distorts scientific progress, because most PhD students and postdocs are attracted to labs with many Nature and Science publications and overlook labs that may do meaningful science but publish in lesser-known journals.
Persistence of the misuse of JIF
Despite the significant shortcomings of the JIF system and the numerous consequences of its misuse, it continues to be the main tool for research assessment in the scientific community. Funding agencies use it to award grants; hiring and promotion committees use it to fill academic positions at universities; tenure committees use it to grant tenure to faculty members; and principal investigators use it to hire postdoctoral and graduate students. Many have referred to this obsession as “journal mania” and “impactitis.”
“Impact factor mania” persists for a number of reasons. The scarcity of grants and jobs, combined with the increasing number of postdocs in science, forces evaluation committees to discriminate among many similar candidates and grant proposals. Accomplishing this task with a single quantitative bibliometric tool is appealing, as a single number often gives the illusion of being an objective measure.
Moreover, the complexity of science that has resulted from hyper-specialization over the past few decades has made research assessment arduous across multiple fields. JIF offers a convenient surrogate measure of quality for scientists who have to evaluate papers outside the scope of their expertise. Some critics have also attributed the continued use of JIF to laziness on the part of the committees responsible for evaluating scientific work.
Proponents of JIF
Although most of the literature discussing JIF criticizes the system and encourages either its abandonment or replacement, a small proportion of scientists think otherwise. Dr. Ludo Waltman, deputy director of the Centre for Science and Technology Studies (CWTS) at Leiden University in the Netherlands, is one such scientist who does not necessarily agree with the critics of JIF. Although he acknowledges the shortcomings of the JIF system, he does not believe that it should be abandoned. He argues that people who use JIF to compare journals for subscription purposes are indirectly comparing the papers in those journals, which makes it a valid tool for assessing the quality of papers as well. He says that critics should be consistent in the way they treat JIF: if they criticize the system, they should accept the full consequences of their criticism and reject any use of JIF, because it is impractical to separate these two ways of using Garfield’s bibliometric system.
Dr. Waltman also seems to favor JIF because of the need for heuristics for quick decision-making in a hypercompetitive field like science. He says that reading biosketches for all candidates might be too time-consuming, and JIF provides a shortcut for narrowing the application pool down to a smaller number of candidates, who can then be subjected to in-depth evaluation. Dr. Schekman refutes this argument by pointing out that reading a brief biosketch is far less work than reading letters of recommendation, and that spending a few extra minutes evaluating applications is a small cost to pay for a better and fairer assessment of scientific quality, free of the fallacious JIF system. Dr. Bertuzzi holds similar views and says, “there are no shortcuts for research assessment.”
Alternative Ways to Evaluate Scientific Papers and Authors
Since JIF is a journal-level metric, it is inappropriate to use it to assess individual articles; replacing it with article-level alternatives would therefore be more effective. Two prominent alternatives are the h-index, an author-level metric introduced by Jorge Hirsch in 2005, and the Relative Citation Ratio (RCR), an article-level metric created by an NIH working group in 2015. Such metrics reflect the number of publications of individual scientists as well as the number of citations of individual papers, allowing authors to be compared without bias toward the prestige of journals.
Just like JIF, these metrics have their own downsides. Dr. Curry describes the h-index as fundamentally flawed because it relies heavily on the quantity of papers. For instance, Dr. Harry Kroto won the Nobel Prize in Chemistry in 1996 largely on the strength of a single influential paper, but he ranks only 264th by h-index because of his relatively low number of publications. The h-index also unduly favors senior scientists, as young researchers with only a handful of papers score low no matter how impactful their papers are.
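A short implementation clarifies why this happens: the h-index is capped by an author’s paper count, so one landmark paper can never score more than 1. (A minimal sketch; real rankings use curated citation databases.)

```python
def h_index(citation_counts):
    """Largest h such that the author has h papers each cited >= h times."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# One landmark paper cited 10,000 times still yields h = 1 ...
print(h_index([10_000]))   # 1
# ... while ten modestly cited papers beat it easily.
print(h_index([12] * 10))  # 10
```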
RCR has also caught the attention of critics, many of whom have pointed out the restrictive nature of the system and raised concerns about its relevance to disciplines beyond biomedicine. A recent paper by Janssens et al. breaks down the numerator and denominator used in the calculation of RCR and shows that RCR may decrease for older publications, putting established researchers at a disadvantage. Since RCR is a fairly new metric, it warrants further research and perhaps a few adjustments to provide a more robust bibliometric indicator and, potentially, a better replacement for JIF.
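In outline, RCR benchmarks an article’s citation rate against the expected rate of its field. The simplified sketch below, which treats the field rate as a given constant rather than deriving it from co-citation data as NIH’s iCite does, illustrates Janssens et al.’s point that an older paper whose yearly citations taper off sees its RCR fall:

```python
# Simplified sketch of the Relative Citation Ratio idea -- NOT the full
# NIH iCite algorithm, which derives the field rate from co-citation data.

def relative_citation_ratio(citations, years_since_publication,
                            field_rate_per_year):
    """Article citations per year, divided by the field's expected rate."""
    article_rate = citations / years_since_publication
    return article_rate / field_rate_per_year

# 60 citations in 5 years, in a field averaging 4 citations/paper/year:
print(relative_citation_ratio(60, 5, field_rate_per_year=4.0))    # 3.0

# The same paper at 20 years with 100 citations total: its rate, and
# hence its RCR, has dropped even though total citations grew.
print(relative_citation_ratio(100, 20, field_rate_per_year=4.0))  # 1.25
```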
Dr. Curry encourages journals to display citation distributions alongside their JIF on their websites. This would reveal the full extent of the skew of these distributions and give readers a realistic view of JIF. It would serve as a countermeasure to the wider trend in science of inappropriately relying on JIF for the evaluation of researchers, and it would perhaps attenuate JIF’s influence on the assessment of scholarly documents.
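Producing such a display is straightforward. A minimal sketch (hypothetical per-article counts; matplotlib assumed available) plots a journal’s citation distribution with its mean and median marked, making the skew behind the headline JIF immediately visible:

```python
import matplotlib.pyplot as plt
from statistics import mean, median

# Hypothetical per-article citation counts for one journal's two-year window.
citations = ([0] * 15 + [1] * 20 + [2] * 18 + [3] * 12 + [4] * 8 +
             [5] * 6 + [8] * 4 + [15] * 3 + [60, 140])

plt.hist(citations, bins=range(0, 150, 5))
plt.axvline(mean(citations), linestyle="--",
            label=f"mean (JIF-like) = {mean(citations):.1f}")
plt.axvline(median(citations), linestyle=":",
            label=f"median = {median(citations):.1f}")
plt.xlabel("Citations per article")
plt.ylabel("Number of articles")
plt.legend()
plt.show()
```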
The abundance of available metrics in science can be attributed to the natural tendency of scientists to rely on numbers for research assessment. The majority of JIF’s skeptics discourage reliance on any one particular metric when deciding on grant awards or hiring researchers. “The complexity of science cannot be discerned by a single number,” says Dr. Bertuzzi. Ideally, all the papers of all candidates would be read before making decisions about grants and jobs, but given time constraints and the hyper-specialized nature of science, this is not a realistic solution.
Biographical sketches provide a useful alternative for evaluation committees assessing individual scientists. A biosketch highlights an individual’s accomplishments and briefly describes his or her major contributions to the field. Dr. Harold Varmus proposed the inclusion of the biosketch in National Institutes of Health (NIH) grant applications during his time as director of the NIH between 1993 and 1999. It replaced the traditional bibliography and enabled grant committees to familiarize themselves with the content of a scientist’s research before making decisions on grant allocations and awards. Since then, the NIH has required a biosketch from scientists who are applying for or renewing grants. Similarly, the biosketch should be made a mandatory part of every grant and job application, because a single number from any metric system cannot fully gauge the merit of scientific papers, and informed decisions can only be made when committees are acquainted with the scientific work itself.
Garfield’s bibliometric indicator, without a doubt, has significant deficiencies that render it an unfit surrogate for assessing research results and scientific productivity. It has had adverse effects on the scientific community, negatively transforming the publication industry and overemphasizing the prestige of journals in grant allocation and hiring decisions. As a consequence, the research interests of scientists in numerous fields are redirected, damaging scientific progress and discovery.
Individual articles should be evaluated based on the quality of their content, and individual scientists should be assessed based on what they publish rather than where they publish. This is possible if evaluation committees judge applications using the biographical sketches of candidates along with a combination of more appropriate article-level metrics such as the RCR or the h-index. Eradicating JIF would require a cultural change, and it may not occur overnight. Funding agencies, evaluation committees, and principal investigators can gradually uproot the JIF system and reform publication practices in science by discouraging its use as a proxy for scientific impact. They can begin by signing the San Francisco DORA.
References
Bertuzzi, S. (2018, April 27). Personal phone interview.

Curry, S. (2018, May 1). Personal Skype interview.

Schekman, R. (2018, May 6). Personal Skype interview.

Waltman, L. (2018, May 7). Personal Skype interview.

Garfield, E. (2006). The History and Meaning of the Journal Impact Factor. JAMA, 295(1), 90. doi:10.1001/jama.295.1.90

Hyndman, R. J. (2017, June 21). Why I'm not celebrating the 2016 impact factors. Retrieved from https://robjhyndman.com/hyndsight/2016-impact-factors/

Casadevall, A. (2015, October 13). Retrieved from http://mbio.asm.org/content/6/5/e01593-15.full

Liu, X., Gai, S., & Zhou, J. (2016). Journal Impact Factor: Do the Numerator and Denominator Need Correction? PLOS ONE, 11(3). doi:10.1371/journal.pone.0151414

Kiesslich, T., Weineck, S. B., & Koelblinger, D. (2016). Reasons for Journal Impact Factor Changes: Influence of Changing Source Items. PLOS ONE, 11(4). doi:10.1371/journal.pone.0154199

Lariviere, V., Kiermer, V., MacCallum, C., McNutt, M., Patterson, M., Pulverer, B., Swaminathan, S., Taylor, S., & Curry, S. (2016, September 11). A simple proposal for the publication of journal citation distributions. bioRxiv 062109. doi:10.1101/062109

Bornmann, L., Marx, W., Gasparyan, A. Y., et al. (2012). Rheumatology International, 32, 1861. doi:10.1007/s00296-011-2276-1

Curry, S. (2012, August 13). Sick of Impact Factors. Retrieved from http://occamstypewriter.org/scurry/2012/08/13/sick-of-impact-factors/

Cagan, R. (2013). The San Francisco Declaration on Research Assessment. Disease Models & Mechanisms, 6(4), 869–870. doi:10.1242/dmm.012955

Emerging Technology from the arXiv. (2017, July 12). Chinese scientists can be paid up to $165K for publishing a single paper in a top Western journal. Retrieved from https://www.technologyreview.com/s/608266/the-truth-about-chinas-cash-for-publication-policy/

Gomar, F. S. (2014, March 1). How does the journal impact factor affect the CV of PhD students? Retrieved from http://embor.embopress.org/content/15/3/207

Callaway, E. (2016, July 8). Beat it, impact factor! Publishing elite turns against controversial metric. Retrieved from https://www.nature.com/news/beat-it-impact-factor-publishing-elite-turns-against-controversial-metric-1.20224#/b1

Casadevall, A. (2014, March 18). Retrieved from http://mbio.asm.org/content/5/2/e00064-14.full

Hendrix, S. (2018, March 8). Do I need Nature or Science papers for a successful career in science? Retrieved from https://www.smartsciencecareer.com/do-i-need-nature-papers/

Lariviere, V., & Sugimoto, C. (2018, March 5). The Journal Impact Factor: A brief history, critique, and discussion of adverse effects. Retrieved from https://arxiv.org/abs/1801.08992v2

Badner, A. (2015, January 8). The Growing Big Three Boycott: Understanding why scientists are publically boycotting Cell, Nature and Science Journals. Retrieved from http://www.imsmagazine.com/the-growing-big-three-boycott-understanding-why-scientists-are-publically-boycotting-cell-nature-and-science-journals/

Curry, S. (2018, February 7). Let's move beyond the rhetoric: It's time to change how we judge research. Retrieved from https://www.nature.com/articles/d41586-018-01642-w

Waltman, L. (2016, July 11). The importance of taking a clear position in the impact factor debate. Retrieved from https://www.cwts.nl/blog?article=n-q2w2c4

Bertuzzi, S. (2016, March 17). A New and Stunning Metric from NIH Reveals the Real Nature of Scientific Impact. Retrieved from https://www.ascb.org/activation-energy/a-new-and-stunning-metric-from-nih-reveals-the-real-nature-of-scientific-impact/

Ball, P. (2012, January 6). The h-index, or the academic equivalent of the stag's antlers. Retrieved from https://www.theguardian.com/commentisfree/2012/jan/06/bad-science-h-index

Janssens, A. C., Goodman, M., Powell, K. R., & Gwinn, M. (2017). A critical evaluation of the algorithm behind the Relative Citation Ratio (RCR). PLOS Biology, 15(10). doi:10.1371/journal.pbio.2002536

Editage Insights. (2013, November 4). Why you should not use the journal impact factor to evaluate research. Retrieved from https://www.editage.com/insights/why-you-should-not-use-the-journal-impact-factor-to-evaluate-research