Automated Research Impact Assessment (ARIA)

Christina H. Drew (1), Kristianna G. Pettibone (1), Fallis Owen Finch, III (2), Douglas Giles (2), Paul Jordan (3)

(1) National Institute of Environmental Health Sciences, Program Analysis Branch
(2) Open Intelligence, Inc.
(3) National Institutes of Health, Office of Data Analysis Tools and Systems (OD OER)

Abstract

As federal programs are held more accountable for their research investments, the National Institute of Environmental Health Sciences (NIEHS) has developed a new method to quantify the impact of our funded research on the scientific and broader communities. A pilot version of the assessment tool was developed for NIEHS. Ideally the tool will become available to all NIH Extramural Staff. ARIA includes new statistics that science managers can use to benchmark contributions to research by funding source. This new method provides the ability to conduct automated impact analyses of federal research that can be incorporated in program evaluations. We apply ARIA to several case studies to examine the impact of NIEHS-funded research, propose a number of questions that the new method raises, and discuss strengths and weaknesses of the approach. On balance, we believe that the strengths outweigh the limitations and that ARIA represents another tool that NIH can use to describe impacts of its research investments.

Evaluation Context at NIEHS

• We get many questions about portfolios:
  • About: methods, approaches, results, impacts
  • From: program officers, Extramural Division leadership, NIEHS leadership, NIH, HHS, reporters, external stakeholders, etc.
• Logic models help us look beyond simple output metrics to think about long-term impacts. [1-3]

Logic Model: organized, project specific, informs metrics

• Inputs: resources available
• Activities: actions that use available resources
• Outputs: direct products of activities
• Impacts: benefits or changes resulting from activities, outputs
• Typically evaluations start with NIH grant programs and look prospectively for impact.
• This tool provides an automated way to start with programs we know have had high impact and look retrospectively for NIH influence.

Premise

• Technology exists at NIH (SPIRES) to automate analysis of funding sources associated with a list of references
  • Scientific Publication Information Retrieval & Evaluation System [4]
  • Crawls PubMed and matches to NIH Grants
  • Provides information to QVR, RePORTER and has its own UI
• Bibliography of an "important artifact" is an untapped resource for assessing impacts
• "Important artifact" = a document from a credible source that is plausibly connected to NIEHS/NIH research
• Artifacts include:
  • Documentation of policy/regulatory decisions
  • Clinical and treatment guidelines
  • Major decision or guidance documents
  • Reference works from authoritative sources

ARIA Process (Pilot)

User Actions:
1. Access ARIA Tool
2. Select "Enter list of References"
3. Provide Job Title
4. Enter Email
5. Add references (1 per line)
6. Hit upload button
7. Results load in job grid; status column indicates progress
8. Download file

In the Background:
1. Imports list of references
2. Extracts title, author, and year from original reference into separate fields
3. Searches title, author, and year in PubMed and looks for PMID. Three separate parsers are used to match with PubMed; the best results are used.
4. If PMID found, looks for NIH Grant #
5. Generates multi-tab MS Excel report with raw data and novel statistics about NIH project support

Raw Data Output

'Project Mappings' tab from the MS Excel output:
• Raw data designed so user can easily recalculate metrics
• Original reference provided in right column
• Indicates if key criteria are met and included in automated analysis
  • Title / author / year found
  • Published since 1980
  • PMID found
  • Analyzed by ARIA
• Shows exactly what the parsers search
• Provides PMID, Confirmed projects
• Lists potential project matches (not included in summary statistics)

ARIA's Novel Metrics of NIH Investment and Case Studies

Objective                                     Metrics
Evidence of NIH investment                    Total # and % of references that acknowledge NIH
Evidence of ICO investment                    Total # and % of references that acknowledge an ICO Project
Relative investment of ICO                    % of NIH references from ICO compared to the rest of NIH
Distribution of investment across NIH         Total # NIH/ICO projects referenced
  and ICO projects

Artifacts

We examined references for three Integrated Science Assessments (ISAs) available electronically from the Environmental Protection Agency.

Criteria for "important artifacts":
• Plausible: NIEHS reasonably expected to influence the artifact
• Credible: artifact published by a trustworthy source
• Important: makes a significant contribution to the field of environmental health science

Summary Output                                          2009 EPA          2010 EPA          2012 EPA
                                                        Particulate       Carbon            Lead (Pb)
                                                        Matter ISA [5]    Monoxide ISA [6]  ISA [7]
Total # of references submitted                         3,483             179               625
Total # of references that could not be analyzed        1,517             28                238
  Title, author or year could not be determined         2                 0                 31
  PMID could not be determined                          1,502             24                198
  Published before 1980                                 13                4                 9
Total # of references that are analyzable               1,966             151               387
Total # of references that acknowledge an NIH Grant     467               58                12
Total # of references that acknowledge an NIEHS Grant   357               16                11
% of references that acknowledge NIH funding            24% (467/1966)    38% (58/151)      3% (12/387)
% of references that acknowledge NIEHS funding          18% (357/1966)    11% (16/151)      3% (11/387)
% of NIH references from ES (NIEHS)                     76% (357/467)     28% (16/58)       92% (11/12)

Observations

• Wide range of references supported by NIH
• % NIEHS/NIH support also ranges widely
• Many references not "parsable"
  • More work needed on this, but matching to PubMed is good
  • When a reference is [not] analyzed, it is most likely "gray lit" or a book

Discussion

Questions:
• What does it mean?
• Is there a critical mass of references that are needed in order to have a credible analysis?
• Can we determine "benchmarks" for specific fields or types of artifacts?

Strengths:
• Automated: requires a fraction of the time needed for manual analysis
• Ability to examine long-term impacts
• Makes use of existing, readily available information sources
• Relatively simple to implement
• Could be available to all of NIH

Limitations:
• Not all artifacts have a bibliography (laws, policies)
• Improperly sourced references (getting better with recent NIH requirements)
• Not all journals included in PubMed
• Reference might not support the findings (e.g. retraction/rebuttals)
• Parser imperfect. For example, deeper analysis of one ARIA report [8] found that, of 129 references not analyzed by ARIA:
  • 14 (11%) were published before 1980
  • 55 (43%) were "reasonable": books, abstracts, gray literature, non-English, or a thesis, and thus not likely to be in PubMed
  • 60 (47%) were unknown errors

Future Directions

• Hoping to expand pilot to broaden access to all of NIH via SPIRES
• Metrics need vetting and discussion within NIH analysis community to assess utility and meaning of results
• Potential algorithm enhancements:
  • Filter out duplicates
  • Allow user to import a combination of references and PMIDs
  • Track iterations of requests
  • Improve parser capacity (e.g., a common error is to interpret authors as the title, preventing possible match to PubMed record)
  • We have already added a filter to the year so that letters (e.g. 2001a) are removed

References

1. Engel-Cox, J. A., B. Van Houten, et al. (2008). "Conceptual model of comprehensive research metrics for improved human health and environment." Environ Health Perspect 116(5): 583-592.
2. Liebow, E., J. Phelps, et al. (2009). "Toward the assessment of scientific and public health impacts of the National Institute of Environmental Health Sciences Extramural Asthma Research Program using available data." Environ Health Perspect 117(7): 1147-1154.
3. Orians, C., J. Abed, et al. (2009). "Scientific and Public Health Impacts of the NIEHS Extramural Asthma Research Program - Insights from Primary Data." Res Eval 18(5): 375-385.
4. Boyack, K. W. and P. Jordan (2011). "Metrics associated with NIH funding: a high-level view." Journal of the American Medical Informatics Association 18(4): 423-431.
5. EPA (2009). Integrated Science Assessment for Particulate Matter. National Center for Environmental Assessment. Research Triangle Park, NC.
6. EPA (2010). Integrated Science Assessment for Carbon Monoxide. National Center for Environmental Assessment. Research Triangle Park, NC.
7. EPA (2012). EPA Integrated Science Assessment for Lead. Research Triangle Park, NC, Environmental Protection Agency.
8. NRC (2001). Update NRC Arsenic in Drinking Water. Washington, D.C.

Acknowledgments

The authors would like to acknowledge the work of Sheila Newton and Raymond Grissom, Jr., of the NIEHS Office of Planning and Policy Evaluation, who conducted an early manual review of the EPA Ozone Regulation, resulting in the idea for this new bibliometric research method.

ICO = NIH Institute, Center or Office
NIEHS = National Institute of Environmental Health Sciences
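
Background step 2 (splitting each reference into title, author, and year) and the criteria flags on the 'Project Mappings' tab can be illustrated with a small sketch. The poster does not describe how ARIA's three parsers work, so everything here is an assumption: the regular expression only handles references shaped like "Authors (Year). Title. ..." as in the poster's own reference list, and the names ParsedReference and parse_reference are hypothetical. The one behaviour taken directly from the poster is that a letter after the year (e.g. 2001a) is dropped.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern (not ARIA's parser): Authors (Year[letter]). "Title" or Title.
REF_PATTERN = re.compile(
    r'^(?P<authors>.+?)\s*\((?P<year>\d{4})[a-z]?\)\.\s*'
    r'(?:"(?P<qtitle>[^"]+)"|(?P<title>[^.]+))'
)


@dataclass
class ParsedReference:
    original: str
    authors: Optional[str]
    year: Optional[int]
    title: Optional[str]
    pmid: Optional[str] = None  # would be filled in by a later PubMed lookup

    @property
    def fields_found(self) -> bool:
        """Criterion: title / author / year found."""
        return None not in (self.authors, self.year, self.title)

    @property
    def published_since_1980(self) -> bool:
        """Criterion: published since 1980."""
        return self.year is not None and self.year >= 1980

    @property
    def analyzed(self) -> bool:
        """Assumed rule: a reference is analyzed only if every criterion is met."""
        return self.fields_found and self.published_since_1980 and self.pmid is not None


def parse_reference(line: str) -> ParsedReference:
    """Split one reference line into authors, year, and title (best effort)."""
    m = REF_PATTERN.match(line.strip())
    if m is None:
        return ParsedReference(line, None, None, None)
    title = (m.group("qtitle") or m.group("title")).strip().rstrip(".")
    # The [a-z]? in the pattern discards a year suffix such as the "a" in 2001a.
    return ParsedReference(line, m.group("authors").strip(), int(m.group("year")), title or None)


ref = parse_reference(
    'Boyack, K. W. and P. Jordan (2011). "Metrics associated with NIH funding: '
    'a high-level view." Journal of the American Medical Informatics Association 18(4): 423-431.'
)
print(ref.authors, ref.year, ref.title, sep=" | ")
```

A reference that fails the pattern comes back with all three fields empty, which corresponds to the "Title, author or year could not be determined" row of the summary output.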

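
Background step 3 searches PubMed by title, author, and year to find a PMID. The pilot does this through SPIRES, whose interface the poster does not describe. Purely as an illustration, the sketch below builds a comparable query for NCBI's public E-utilities ESearch endpoint; the use of E-utilities, the function name pubmed_search_url, and the exact search term layout are assumptions, not part of ARIA. The code only constructs the URL and does not contact the network.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint (an assumed stand-in for SPIRES).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(title: str, first_author: str, year: int) -> str:
    """Build an ESearch URL that looks for one article by title, author, and year."""
    # [Title], [Author], and [dp] (date of publication) are PubMed search field tags.
    term = f"{title}[Title] AND {first_author}[Author] AND {year}[dp]"
    query = urlencode({"db": "pubmed", "term": term, "retmode": "json"})
    return f"{EUTILS_ESEARCH}?{query}"


url = pubmed_search_url(
    "Metrics associated with NIH funding: a high-level view", "Boyack", 2011
)
print(url)
```

Fetching this URL would return a JSON document listing any matching PMIDs; a reference with no match would fall into the "PMID could not be determined" row of the summary output.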

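
The poster notes that the raw data are designed so a user can easily recalculate the metrics. As a minimal sketch of that recalculation, the code below derives the percentage rows of the Summary Output table from the count rows. The class and function names are hypothetical, and rounding to the nearest whole percent is an assumption that happens to reproduce the published figures; the counts themselves are the ones reported for the three case studies.

```python
from dataclasses import dataclass


@dataclass
class AriaCounts:
    """Counts taken from one data column of the Summary Output table."""
    submitted: int     # total references submitted
    not_analyzed: int  # references that could not be analyzed
    nih: int           # references acknowledging an NIH grant
    niehs: int         # references acknowledging an NIEHS grant

    @property
    def analyzable(self) -> int:
        return self.submitted - self.not_analyzed


def pct(numerator: int, denominator: int) -> int:
    """Whole-number percentage, rounded half up; 0 when the denominator is 0."""
    if denominator == 0:
        return 0
    return int(numerator * 100 / denominator + 0.5)


def summary_metrics(c: AriaCounts) -> dict:
    """Recompute the three percentage rows from the raw counts."""
    return {
        "analyzable": c.analyzable,
        "pct_nih": pct(c.nih, c.analyzable),
        "pct_niehs": pct(c.niehs, c.analyzable),
        "pct_nih_from_niehs": pct(c.niehs, c.nih),
    }


# The three data columns of the Summary Output table, in the order listed there.
columns = [
    AriaCounts(submitted=3483, not_analyzed=1517, nih=467, niehs=357),
    AriaCounts(submitted=179, not_analyzed=28, nih=58, niehs=16),
    AriaCounts(submitted=625, not_analyzed=238, nih=12, niehs=11),
]
for counts in columns:
    print(summary_metrics(counts))
```

Note that the NIH and NIEHS percentages use the analyzable references as the denominator, not the references submitted, which matches the fractions printed in the table (for example 467/1966).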