Software Citation and a Proposal (NSF Workshop at Harvard Medical School)
![Page 1: Software Citation and a Proposal (NSF workshop at Harvard Medical School)](https://reader035.vdocuments.net/reader035/viewer/2022062821/58a122441a28abb91b8b5c29/html5/thumbnails/1.jpg)
@jameshowison
Software in the scientific literature:
Software mentions and a provocative proposal
James Howison, Information School, University of Texas at Austin
This material is based upon work supported by the National Science Foundation under Grant No. SMA-1064209.
What does a citation do, anyway?
• Gives credit for contribution
  – A key reward that drives activity in science
  – Sits alongside publications, grants, promotions, and prizes
  – Rewards shape the kinds of artifacts produced and the kinds of collaboration
• Explains the method used
  – Citations help readers know what was done
  – Provenance
  – Replication and extension
How problematic are current practices?
• How is software mentioned in papers?
• How accessible and reusable is the software mentioned?
• How well do these mentions perform the functions of citation?
github.com/jameshowison/softcite
DOI: 10.6084/m9.figshare.1146366
Howison, J., & Bullard, J. (2015). Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature. Journal of the Association for Information Science and Technology (JASIST), doi: 10.1002/asi.23538
Sample and Method
• 90 randomly selected articles from the biology literature, published between 2000 and 2010.
• Journals stratified by Journal Impact Factor to balance coverage with influence.
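Stratified sampling of this kind can be sketched in a few lines. This is a minimal sketch, not the study's procedure: it assumes hypothetical (journal, impact factor) records, hypothetical JIF cut points that define the strata, and an equal number of articles drawn from each stratum (three strata of 30 would give 90).

```python
import random
from collections import defaultdict

def stratify(journals, cut_points):
    """Group (name, impact_factor) pairs into strata by hypothetical JIF cut points.

    cut_points must be in descending order; stratum 0 holds the journals
    at or above the first cut point (the highest-impact stratum).
    """
    strata = defaultdict(list)
    for name, jif in journals:
        # The stratum index is the number of cut points the JIF falls below.
        index = sum(1 for cut in cut_points if jif < cut)
        strata[index].append(name)
    return dict(strata)

def sample_articles(articles_by_journal, strata, per_stratum, seed=0):
    """Draw the same number of articles at random from each stratum.

    Raises ValueError if a stratum holds fewer than per_stratum articles.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    sample = []
    for index in sorted(strata):
        pool = [a for journal in strata[index] for a in articles_by_journal[journal]]
        sample.extend(rng.sample(pool, per_stratum))
    return sample
```

Equal allocation keeps low-impact journals from being swamped by the far larger output of high-impact ones, which is one way to read "balance coverage with influence".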
Content analysis scheme
Manual content analysis (3 coders; agreement checked with kappa)
1. Identify mentions
   – Read the article and locate each mention of a piece of software.
2. Identify in-text characteristics of each mention
   – Name of software? URL? Date? Version number? In bibliography? Cite to paper/manual/webpage?
3. Functions of each mention
   – Identifiable? Findable? Accessible? Source? Matches preferred citation?
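The slide reports coder agreement only as "Kappa" and does not say which statistic was used. As an illustration, here is Cohen's kappa for one pair of coders, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance. With three coders one might average the pairwise values or use Fleiss' kappa instead; both are assumptions here.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same, non-empty list of items")
    n = len(coder_a)
    # Observed agreement: share of items both coders labelled identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's label frequencies, summed over labels.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Degenerate case: both coders used one and the same label throughout.
        # Kappa is undefined here; returning 1.0 is a convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For example, two coders answering the "In bibliography?" question for four mentions as yes/yes/no/no and yes/no/no/no agree on 3 of 4 items (p_o = 0.75) against a chance level of 0.5, giving κ = 0.5.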
https://github.com/jameshowison/softcite/blob/master/data/software-citation-coding.ttl
How many mentions?
• 59 articles mentioned software; 31 did not.
• There were 286 distinct mentions of software.
• Those mentions were to 146 distinct pieces of software.
  – This includes general-purpose (e.g., Microsoft Excel) and science-specific software (e.g., DENZO, BLAST).
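A few ratios follow directly from these counts. They are computed here for orientation and are not figures reported on the slide.

```python
# Counts from the sample of 90 biology articles.
articles_total = 90
articles_with_software = 59
mentions = 286
packages = 146

share_with_software = articles_with_software / articles_total
mentions_per_article = mentions / articles_with_software  # among articles that mention software
mentions_per_package = mentions / packages

print(f"{share_with_software:.1%} of articles mention software")         # 65.6%
print(f"{mentions_per_article:.1f} mentions per software-mentioning article")  # 4.8
print(f"{mentions_per_package:.1f} mentions per distinct package")       # 2.0
```

So roughly two in three articles mention software, and a typical package is mentioned about twice in the sample.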
Types of mentions

| Mention type | Example |
|---|---|
| Cite to publication | … was calculated using biosys (Swofford & Selander 1981). |
| Cite to project name or website | … using the program Autodecay version 4.0.29 PPC (Eriksson 1998). Reference list has: ERIKSSON, T. 1998. Autodecay, vers. 4.0.29 Stockholm: Department of Botany. |
| Like instrument | … calculated by t-test using the Prism 3.0 software (GraphPad Software, San Diego, CA, USA). |
| URL in text | … freely available from http://www.cibiv.at/software/pda/ . |
| In-text name mention only | … were analyzed using MapQTL (4.0) software. |
| Not even name mentioned | … was carried out using software implemented in the Java programming language. |
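One way a coder might record a mention, combining the six mention types above with some of the in-text characteristics from the content analysis scheme, is as a small typed record. This is a hypothetical schema for illustration only; the project's actual coding scheme is the Turtle file in the softcite repository, and the two helper checks are simplified stand-ins for the "Identifiable?" and version questions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class MentionType(Enum):
    CITE_TO_PUBLICATION = "cite to publication"
    CITE_TO_PROJECT = "cite to project name or website"
    LIKE_INSTRUMENT = "like instrument"
    URL_IN_TEXT = "URL in text"
    NAME_ONLY = "in-text name mention only"
    NOT_NAMED = "not even name mentioned"

@dataclass
class CodedMention:
    """One software mention as a coder might record it (hypothetical fields)."""
    article_id: str
    mention_type: MentionType
    software_name: Optional[str] = None
    version: Optional[str] = None
    url: Optional[str] = None
    in_bibliography: bool = False

    def is_identifiable(self) -> bool:
        # Simplified rule for illustration: a named piece of software is identifiable.
        return self.software_name is not None

    def is_versioned(self) -> bool:
        # True when the mention states a version number.
        return self.version is not None
```

The Prism example above would be recorded as `CodedMention("article-1", MentionType.LIKE_INSTRUMENT, software_name="Prism", version="3.0")`, where `"article-1"` is a placeholder identifier.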
Types of Mentions
Simpler Mention Kinds
By Strata?
What sort of software is mentioned?
Proprietary software is more likely to be mentioned “like instrument”
How useful are these mentions?
Not much change across strata
Do mention types work differently?
Other findings
• Only 24% of journals had policies that mentioned software, declining across strata.
  – These policies rarely mention versions.
  – It is not clear that they are followed.
• Only 13–30% of packages make a specific request for a particular type of citation.
  – 32% of mentions didn’t follow the requested citation.
Visible citation formats as “nudge”
• Some disagreement about how important the text of a publication is:
  – Should effort focus on machine-readable metadata in publication repositories (not in the paper)?
  – Or focus on human-readable formats in the paper?
• My position is that human-readable formats will influence practice more quickly.
• Formal, well-structured formats and policies act as a “nudge” to shape how authors mention software.
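The two positions need not exclude each other: a single structured record can be rendered as the visible, human-readable entry that nudges authors. A minimal sketch; the field names and the output format are hypothetical and do not follow any particular citation standard.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SoftwareRecord:
    """Machine-readable metadata for one software release (hypothetical fields)."""
    name: str
    version: str
    authors: List[str]
    year: int
    url: str

def human_readable(record: SoftwareRecord) -> str:
    """Render a reference-list entry that keeps name, version and location visible."""
    authors = ", ".join(record.authors)
    return f"{authors} ({record.year}). {record.name} (version {record.version}). {record.url}"
```

Because the version and URL are required fields, a format like this cannot silently drop them, which is the failure seen in "like instrument" and "cite to publication" mentions.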
Software archiving
• Strong finding that many pieces of software were not findable.
  – 1 in 10 packages could not be found at all.
  – For only 1 in 20 packages could the specific version be found (a combination of missing version information and versions missing online).
• Analogous to link rot for URLs in publications (Koehler, 1999)
• Need to influence how software is archived
  – Is that a role for publishers? Escrow for non-open software?
Part 2
But what are we working to incentivize anyway?
Howison, J., & Herbsleb, J. D. (2013). Incentives and integration in scientific software production. In Proceedings of the ACM Conference on Computer Supported Cooperative Work (pp. 459–470). San Antonio, TX.
Citation and collaboration
• What is the impact on collaboration of credit-giving through citations?
• Can a citation (of any kind) incentivize an ongoing collaboration able to do the work needed to keep a piece of software scientifically functional?
• Could a standard undermine collaboration further?
Can citation incentivize maintenance?
• Software relies on other software
  – Dependencies all the way down
  – Software stacks change quickly (new opportunities, new problems, new libraries)
• Scientists seek to extend the work of others, not just re-execute it.
• Many re-implementations come from frustration with poorly maintained software
  – Software that wasn’t adjusted as its dependencies changed
  – Software that wasn’t updated with newer techniques
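"Dependencies all the way down" can be made concrete: everything that can break a piece of software is the transitive closure of its dependency graph, not just its direct dependencies. A minimal sketch over a hypothetical graph given as a dictionary from package name to its direct dependencies.

```python
from typing import Dict, List, Set

def transitive_dependencies(graph: Dict[str, List[str]], root: str) -> Set[str]:
    """Return every package that root depends on, directly or indirectly.

    Iterative depth-first traversal; packages missing from the graph are
    treated as having no dependencies of their own.
    """
    seen: Set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        package = stack.pop()
        if package in seen:
            continue  # already visited; this also stops cycles from looping forever
        seen.add(package)
        stack.extend(graph.get(package, []))
    return seen
```

In a hypothetical graph where an analysis package uses a statistics library and a plotting library, and both rely on a linear-algebra library, a change in that shared library can break the analysis even though it never imports it directly.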
A modest proposal
1. Papers have their full workflow available.
2. Workflows have regression tests running on a continuous integration system.
3. The integration system pulls all new versions of dependencies and executes the regression tests.
4. On failure (build or tests), the paper is retracted.
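Steps 3 and 4 amount to a scheduled job per paper. A minimal sketch, assuming hypothetical callables that upgrade dependencies, build the workflow, and run its regression tests; no particular CI service or API is implied, and the status strings are illustrative labels.

```python
from typing import Callable

def check_paper(
    upgrade_dependencies: Callable[[], None],
    build: Callable[[], bool],
    run_regression_tests: Callable[[], bool],
) -> str:
    """Run steps 3 and 4 of the proposal for one paper's workflow.

    All three callables are hypothetical hooks into a CI system.
    """
    # Step 3: pull the newest versions of every dependency.
    upgrade_dependencies()
    # Step 4: a failing build or failing tests triggers the sanction.
    if not build():
        return "retracted (build failed)"
    if not run_regression_tests():
        return "retracted (regression tests failed)"
    return "current"
```

Passing the hooks in as functions keeps the policy (what happens on failure) separate from the mechanics of any one build system, so the sanction in step 4 can be swapped without touching the rest.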
Howison, J. (2014). Retract bit-rotten publications: Aligning incentives for sustaining scientific software. In Working towards Sustainable Software for Science: Practice and Experiences (SuperComputing 2014 Workshop). New Orleans.
Uh …
• Retraction too strong, you say?
Ok, let’s revisit step 4:
• On failure, the paper is marked “provisionally non-extendable” and the authors have some period to fix it before it is marked as “retired”.
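The softened step 4 is a small state machine with a deadline. A sketch, assuming a hypothetical 90-day grace period; the proposal only says "some period".

```python
from datetime import date, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(days=90)  # hypothetical; the proposal leaves the length open

def paper_status(first_failure: Optional[date], today: date,
                 grace: timedelta = GRACE_PERIOD) -> str:
    """Status of a paper given the date its workflow first started failing.

    first_failure is None while the workflow builds and passes its tests.
    """
    if first_failure is None:
        return "extendable"
    if today - first_failure <= grace:
        return "provisionally non-extendable"
    return "retired"
```

Under these assumptions, a workflow that broke on 1 January is provisionally non-extendable on 1 February and retired by 1 June.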
Could others fix papers?
• Why must the original authors be the ones to fix maintenance issues?
  – Attract new resources, motivate integration.
• Re-write step 4 again:
  – On failure, the workflow is marked as “needing work”.
  – Anyone can contribute that work.
    • Those extending the work, grad students, citizen scientists
  – Anyone who succeeds is added as an author.
Added as an author??!?
• Just for fixing a bug?
Ok, fine. Let’s re-write the second half of step 4 again:
  – Anyone maintaining a workflow and returning a publication to full extendable status is:
    • Added to the paper in an acknowledgement
    • Invited to a conference, given a prize
    • Credited in a visible, public system (think GitHub profile)
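The final version of step 4 can be sketched as a record that opens on failure and credits whoever restores the workflow. A minimal sketch; the class, field, and status names are hypothetical, and "credit" here is just a list of acknowledged names rather than any real acknowledgement, prize, or profile system.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PaperWorkflow:
    """Maintenance state of one paper's workflow (hypothetical model)."""
    title: str
    status: str = "extendable"
    acknowledged: List[str] = field(default_factory=list)

    def on_ci_failure(self) -> None:
        # A failing build or test run opens the workflow to outside contributors.
        self.status = "needing work"

    def on_fix(self, contributor: str, tests_pass: bool) -> bool:
        """Accept a fix from anyone; credit them only if it restores the workflow."""
        if self.status != "needing work" or not tests_pass:
            return False
        self.status = "extendable"
        if contributor not in self.acknowledged:
            self.acknowledged.append(contributor)
        return True
```

Only a fix that actually returns the paper to extendable status earns credit, which matches the condition on the slide.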
Takeaways
• Software citation is diverse and fails to perform the functions of citation:
  – “Like instrument” and “cite to publication” citations give credit but fail to provide version information.
  – Other, informal mentions are better at versions but often fail to give credit.
• Software is frequently inaccessible.
• Collaboration is counter-motivated by publication.
• Bit-rotten papers should create opportunities to earn reputation for scientific contribution.
Extras
Software packages found