Page 1: Presented at the American Evaluation Association Annual Conference October 18, 2013

Data provided by the Division of Statistical Analysis & Reporting (DSAR)/OPAC/OER Contact: [email protected] 1

Multivariate Analysis of Collaboration Patterns Among Researchers in an Epidemiological Cohort Study

Presented at the American Evaluation Association Annual Conference, October 18, 2013

Matthew Eblen, MPIA¹, Katherine Catevenis, MPH¹, Richard R. Fabsitz, PhD², Jean L. Olson, MD², Mona Puggal, MPH², Robin M. Wagner, PhD, MS¹

¹ Division of Statistical Analysis and Reporting, Office of Planning, Analysis and Communication, Office of Extramural Research, Office of the Director, National Institutes of Health

² Epidemiology Branch, Division of Cardiovascular Sciences, National Heart, Lung, and Blood Institute, National Institutes of Health

Outline

• Research Questions
• Background on NHLBI-funded Cohort Study
  – Cardiovascular Health Study
• Methods with Examples
• Analysis
• Summary
• Next Steps

Research Questions

• General Question:
  – What motivates researchers to collaborate with one another?
• Specific Questions:
  – Are certain researcher characteristics associated with an increased likelihood to collaborate?
  – If so, can we measure the relative magnitude of these characteristics?

Cardiovascular Health Study Background

• Cardiovascular Health Study (CHS): Started in 1988 to study development and progression of clinical coronary heart disease (CHD) and stroke in older adults
  – Cohort study funded by the National Heart, Lung, and Blood Institute (NHLBI)
  – Persons were recruited at 4 study field sites
  – Includes questionnaires, clinic exams, laboratory exams, and ongoing participant follow-up to identify clinical events
  – Includes occasional training events and a policy on data sharing
• Information was collected on journal articles associated with CHS and published between 1990 and June 2011
  – Publications reported by study coordinating centers, augmented through PubMed searches
  – Co-author linkages identified

Methods - I

• Builds on previous paper* applying social network analysis techniques to collaboration networks of two NHLBI-funded cohort studies, the CHS and Strong Heart Study
  – Two authors were said to have collaborated if they co-authored a publication together
  – Collaboration network was constructed using co-authorship linkages
  – Authors represented by nodes
  – Collaboration (co-authorship) caused a line to be drawn between collaborating authors
  – Network measures of density, diameter and centralization were calculated
    • Similar to traditional summary statistics

*Eblen et al., Social network analysis comparing researcher collaborations in two cardiovascular cohort studies, Research Evaluation (2012) 21 (5): 392-405. doi: 10.1093/reseval/rvs030

Example - I

Here is one possible author collaboration network.

[Figure: example author collaboration network. Legend: node = author; line = instance of collaboration between two authors.]

Density = 36% (16/45 ≈ 36% of possible collaborations have occurred). This is the probability that two randomly chosen authors in the network have collaborated.
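
The density arithmetic above can be reproduced in a few lines. This is a minimal Python sketch, not the authors' code: the function name `density` is hypothetical, and the count of 10 authors is inferred from the 45 possible pairs (10 × 9 / 2 = 45) rather than stated on the slide.

```python
def density(n_authors: int, n_ties: int) -> float:
    """Share of possible author pairs that have actually collaborated."""
    # In an undirected network, each unordered pair of authors is one possible tie.
    possible_pairs = n_authors * (n_authors - 1) // 2
    return n_ties / possible_pairs


# Example network: 10 authors (inferred) give 45 possible pairs; 16 ties were observed.
example_density = density(n_authors=10, n_ties=16)
print(f"{example_density:.0%}")  # 16/45, shown as a whole percentage
```

The same function applies to any undirected co-authorship network once the number of authors and the number of distinct collaborating pairs are known.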

Example - II

Let color represent the author’s field of science. Estimates of how likely it is two authors collaborated will improve if their fields of science are known.

e.g., if both authors are red there is a 90% chance they have collaborated (9/10 possible collaborations have occurred among red authors)

General observations about this network:
1. Red authors are more collaborative than blue authors (ten collaborations vs. seven collaborations).
2. Both red and blue authors are more likely to collaborate with an author of their own field of science than with an author of the other field (only one collaboration between blue and red).
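
The within-field and between-field rates behind these observations can be tabulated as follows. This is a hedged sketch: it assumes the network is the same 10-author, 16-tie network as in Example I, with 5 red and 5 blue authors (inferred from the 10 possible red pairs). The blue-blue count of 6 is also inferred, as seven blue collaborations minus the one red-blue tie; it is not stated on the slide.

```python
from math import comb

# Assumed group sizes: 10 possible red pairs implies 5 red authors; 10 authors in total.
n_red, n_blue = 5, 5

# Tie counts by pair type; 9 + 6 + 1 = 16 ties. The blue-blue count is inferred.
ties = {"red-red": 9, "blue-blue": 6, "red-blue": 1}

# Number of possible pairs of each type.
possible = {
    "red-red": comb(n_red, 2),
    "blue-blue": comb(n_blue, 2),
    "red-blue": n_red * n_blue,
}

rates = {pair: ties[pair] / possible[pair] for pair in ties}
for pair, rate in rates.items():
    print(f"{pair}: {ties[pair]}/{possible[pair]} = {rate:.0%}")
```

Under these assumptions, the within-field rates are far higher than the between-field rate, which is the pattern the second observation describes.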

Example - III

Let node size represent seniority (larger nodes = more seniority). The more seniority an author pair has, the more likely it is they collaborated.

Are red authors more collaborative than blue authors? Or are high seniority authors more collaborative than low seniority authors? Or both?

Exponential Random Graph Models (ERGMs) are designed to answer such questions.

Methods - II

• Used ERGMs to estimate the likelihood that two CHS authors would collaborate
  – Similar to traditional multivariate logistic regression models
  – Dependent variable is the probability that two authors will collaborate given characteristics of the authors (independent variables)
  – Isolates the contribution of one characteristic to the likelihood of collaboration while “controlling” for all other characteristics
  – ERGMs were fitted with the Statnet package in R (3.0.1)
• Also modeled the Strong Heart Study collaboration network, but results are not presented here due to time constraints
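
The logistic-regression analogy above can be written out. As a sketch, assuming the standard ERGM conditional form (the symbols θ, δ, and Y are notation introduced here, not taken from the slides), the log odds that authors i and j collaborate, holding the rest of the network fixed, is a weighted sum of how each model statistic changes when that tie is added:

```latex
\operatorname{logit} \Pr\left(Y_{ij} = 1 \mid Y_{-ij} = y_{-ij}\right)
  = \sum_{k} \theta_k \, \delta_k\left(i, j; y_{-ij}\right)
```

When every δ_k depends only on the characteristics of authors i and j, the model reduces to a logistic regression over author pairs. The slides do not state whether the fitted models also included terms that depend on other ties in the network.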

Methods - III

• CHS author characteristics included in ERGM
  – Continuous Variables
    • Publication Productivity
      – Average # of publications per year
    • Network Seniority
      – Number of years since entering the network
  – Categorical Variables
    • Role in Study:
      – PI: Funded Principal Investigator
      – Co-Investigator: Any non-PI paid staff member of study (or NHLBI staff involved in study)
      – Neither: No formal study affiliation

Methods – III (cont.)

  – Categorical Variables (cont.)
    • Primary Field of Science (FOS):
      – Authors coded to the field of science associated with the journal they published in most often
      – Authors who published equally often in journals of two fields of science were classified as “More Than One FOS”
    • Training Events
      – Data Analysis Workshop Attendance
      – NHLBI-sponsored workshops for new junior investigators in 2005 and 2007
    • Data Sharing
      – Utilization of NHLBI Data Repository Data Set
      – Annually updated de-identified data set available since 2000, which is easily distributed by NHLBI to any qualified investigator
      – Formerly known as Limited Access Data Set (LADS)

Methods - IV

• For categorical variable characteristics, two types of estimates were calculated:
  – Sociality
    • The general propensity of authors of a particular category to collaborate, regardless of with whom
      – e.g., “red” authors were more collaborative than “blue” authors
      – If authors of a given pair differ in characteristic type, a different sociality estimate applies to each author in the pair
  – Assortative Mixing
    • The propensity of authors to collaborate specifically within their own categorical type
      – e.g., “red” authors were more likely to collaborate with “red” authors than with “blue” authors
      – An assortative mixing estimate only applies to author pairs that match on a particular characteristic
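
The two kinds of categorical estimate above correspond to two kinds of pair-level statistic. The sketch below is illustrative Python, not the authors' R code: `categorical_terms` is a hypothetical helper. It resembles the `nodefactor` and `nodematch` terms of the statnet `ergm` package, but the slides do not say which terms were actually used.

```python
def categorical_terms(attr_i: str, attr_j: str, levels: list, baseline: str) -> dict:
    """Pair-level statistics for one categorical characteristic of two authors."""
    stats = {}
    for level in levels:
        if level != baseline:
            # Sociality: how many of the two authors fall in this category (0, 1, or 2).
            # The baseline category is omitted and serves as the reference group.
            stats[f"sociality[{level}]"] = int(attr_i == level) + int(attr_j == level)
        # Assortative mixing: 1 only when both authors share this category.
        stats[f"mixing[{level}]"] = int(attr_i == attr_j == level)
    return stats


# A red-blue pair contributes one red sociality count and no mixing counts.
print(categorical_terms("red", "blue", ["red", "blue"], baseline="blue"))
# A red-red pair contributes two red sociality counts and one red mixing count.
print(categorical_terms("red", "red", ["red", "blue"], baseline="blue"))
```

Each statistic is multiplied by its estimated coefficient and summed to give the log odds of collaboration for the pair, which is why a mixing estimate only affects pairs that match.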

Methods – IV (cont.)

• For continuous variable characteristics, two types of estimates were calculated:
  – Combined
    • Add together the values of both authors in the pair
    • E.g., the more combined years of seniority an author pair had, the more likely it is they collaborated
    • Similar to sociality
  – Difference
    • Subtract the values of the author pair from one another
    • E.g., the greater the difference in years of seniority between the author pair, the less likely it is they collaborated
    • Similar to assortative mixing
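
The combined and difference statistics above are simple to compute for any author pair. This is a minimal Python sketch with a hypothetical helper name and illustrative values; it assumes the difference is taken as an absolute value, which the slide implies but does not state. The terms resemble `nodecov` and `absdiff` in the statnet `ergm` package, though the slides do not name the terms used.

```python
def continuous_terms(value_i: float, value_j: float) -> dict:
    """Pair-level statistics for one continuous characteristic of two authors."""
    return {
        # Combined: sum of the two authors' values (analogous to sociality).
        "combined": value_i + value_j,
        # Difference: absolute gap between the two values (analogous to assortative mixing).
        "difference": abs(value_i - value_j),
    }


# Illustrative values: two pairs with the same combined productivity but different gaps.
uneven_pair = continuous_terms(3.0, 2.0)
even_pair = continuous_terms(2.5, 2.5)
print(uneven_pair, even_pair)
```

Because the two pairs share the same combined value, any difference in their predicted likelihood of collaborating would come from the difference term alone.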

Methods – V

• Estimates of characteristics (independent variables) shown in log odds (logit) form
  – Useful for showing the relative magnitude of each characteristic’s contribution to collaboration
• 95% confidence intervals
  – Intervals that cross zero indicate the estimate is not statistically significant
• Sociality estimates require a baseline reference group; assortative mixing estimates do not
  – Sociality considers all collaborative ties, so degrees of freedom are exhausted
  – Assortative mixing only considers collaborative ties within characteristic types, so degrees of freedom are not exhausted
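
The reading of log-odds estimates and their intervals described above can be sketched in Python. The helper name and the input numbers are hypothetical, and the interval is a normal-approximation (Wald) interval, which is an assumption: the slides do not say how the 95% intervals were computed.

```python
import math


def summarize(estimate: float, std_error: float, z: float = 1.96) -> dict:
    """Summarize one log-odds estimate with an assumed Wald 95% interval."""
    lower = estimate - z * std_error
    upper = estimate + z * std_error
    return {
        # Exponentiating a log-odds estimate gives an odds ratio.
        "odds_ratio": math.exp(estimate),
        "ci_log_odds": (lower, upper),
        # An interval that contains zero means the estimate is not statistically significant.
        "significant": not (lower <= 0.0 <= upper),
    }


# Hypothetical inputs, not values from the study.
clear_effect = summarize(estimate=0.5, std_error=0.1)
unclear_effect = summarize(estimate=0.1, std_error=0.2)
print(clear_effect["significant"], unclear_effect["significant"])
```

On the log-odds scale zero means no association, which is why the plots that follow are centered on 0.0.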

CHS Collaboration Network

[Figure: CHS collaboration network, nodes colored by author role in study]

Author Role in Study: Red = Principal Investigator; Orange = Co-Investigator; Green = Neither

# Authors = 1749

Density = 2%

CHS Collaboration Estimates: Sociality, by Study Role of Author and Field of Science

[Figure: model estimates (log odds) for each author category, with number of authors in parentheses. X-axis: -2.0 to 1.0, from less collaborative to more collaborative. Categories shown: Psychology/Psychiatry (12), Kidney Disease (57), Respiratory Medicine (12), Cancer (15), Eye Disease (14), Radiology/Imaging (19), Metabolism (18), Rehabilitation (20), Sleep (50), Nutrition (12), Neurology/Neuroscience (93), Gerontology/Aging (148), Science - general (41), Cardiovascular Disease (443), Genetics (173), More Than One FOS (242); Events: Participated in Data Sharing (130), Attended 2007 Workshop (31), Attended 2005 Workshop (18); Co-Investigator (102), Principal Investigator (17).]

Meaning: Principal and Co-Investigators were more collaborative than other investigators (baseline).

E.g., an author pair with 1 PI was about 3% more likely to collaborate than an author pair with no PIs or Co-Is, all else equal.

Meaning: Authors associated with different fields of science exhibited varying degrees of collaborativeness (baseline: Medicine/Public Health - general).

E.g., an author pair with 1 Genetics author and 1 Med/Pub Health author was about 2% more likely to collaborate than an author pair with 1 Cardiovascular Disease author and 1 Med/Pub Health author, all else equal.

Meaning: Authors who attended workshops or participated in data sharing were more collaborative.

CHS Collaboration Estimates: Assortative Mixing, by Study Role of Author and Field of Science

[Figure: model estimates (log odds) for each author category, with number of authors in parentheses. X-axis: -7.0 to 8.0, from less likely to collaborate within type to more likely to collaborate within type. Categories shown: More Than One FOS (242), Cardiovascular Disease (443), Medicine/Public Health (315), Gerontology/Aging (148), Genetics (173), Neurology/Neuroscience (93), Science - general (41), Kidney Disease (57), Rehabilitation (20), Sleep (50), Metabolism (18), Nutrition (12), Radiology/Imaging (19), Eye Disease (14), Respiratory Medicine (12), Psychology/Psychiatry (12), Cancer (15); Events: Participated in Data Sharing (130), Attended 2007 Workshop (31), Attended 2005 Workshop (18); Neither (1625), Co-Investigator (102), Principal Investigator (17).]

Message: Authors did not demonstrate a particular preference for collaborating with authors of the same role type.

Message: Authors who attended workshops or participated in data sharing were more likely to collaborate with authors that also attended workshops or did data sharing.

E.g., a data sharing participant was 21% more likely to collaborate with another data sharing participant than with a non-data sharing participant, all else equal.

Message: Authors from all fields of science tended to collaborate with authors in the same field of science, though to varying degrees.

E.g., a Neurology author was 40% more likely to collaborate with another Neurology author than with a non-Neurology author, all else equal. A Cardiovascular Disease (CD) author was 9% more likely to collaborate with another CD author than with a non-CD author, all else equal.

CHS Collaboration Estimates, by Publication Productivity and Network Seniority

[Figure: model estimates (log odds) with 95% confidence intervals. X-axis: -1.5 to 1.5, from less likely to collaborate to more likely to collaborate. Rows: Difference in Years in Network (Range: 0 - 20), Combined Years in Network (Range: 2 - 42), Difference in Pubs per Year (Range: 0 - 8.3), Combined Pubs per Year (Range: 0 - 16.6).]

Meaning: The more productive an author pair was, the more likely they were to collaborate.

E.g., a pair of authors who combined to publish 5 articles per year on average was 7% more likely to collaborate than a pair of authors who combined to publish 4 articles per year on average, all else equal.

Meaning: Productive authors tended not to collaborate with unproductive authors.

E.g., an author pair in which one author published 3 articles per year and the other published 2 articles per year was 2% less likely to collaborate than an author pair in which both authors published 2.5 articles per year, all else equal.

Meaning: The more experience an author pair had, the more likely they were to collaborate.

Meaning: Experienced authors tended not to collaborate with inexperienced authors.

Summary - I

• Publication productivity and seniority were associated with more collaboration (co-authorship) in general (sociality)
  – However, highly productive and high seniority authors preferred to collaborate with other highly productive and high seniority authors (assortative mixing)
• PIs and Co-Investigators tended to be more collaborative in general than other researchers (sociality)
  – There was no evidence that PIs and Co-Investigators preferred to collaborate exclusively with one another (no assortative mixing)
• Some fields of science were more collaborative in general than others (sociality)
  – Fields that were more peripheral in subject matter to CHS tended to be less collaborative

Summary - II

• All fields of science preferred to collaborate within their own field, though to varying degrees (assortative mixing)
  – Fields that were more peripheral in subject matter to CHS were more likely to collaborate exclusively with others in their own field
• NHLBI events designed to encourage collaboration were effective (sociality)
  – Invitees who attended workshops and who participated in NHLBI’s data sharing program tended to be more collaborative in general than similar authors who did not

Next Steps

• Greater knowledge of author characteristics would enhance the ability of ERGMs to identify the main drivers of collaboration

• This methodology could be fruitfully combined with information on which co-authorships had greater impact (e.g., citation information)
  – ERGMs could estimate the factors associated with highly cited co-authorships
  – Results could suggest potential collaborating partners with a high likelihood of producing impactful publications
  – Knowledge gained could be integrated into design of new studies, building in or encouraging characteristics that would promote collaboration

Questions?

Supplemental Slides

Strong Heart Study Background

• Strong Heart Study (SHS): Started in 1988 to estimate cardiovascular disease (CVD) mortality and morbidity, and prevalence of known and suspected CVD risk factors in American Indians
  – Includes 13 American Indian tribes and communities
    • Phoenix, Arizona
    • Southwestern Oklahoma
    • Western and central North and South Dakota
  – Required participants to be 45-74 years old at entry
  – Includes questionnaires, clinic exams, laboratory exams, and ongoing participant follow-up to identify clinical events
• Strong Heart Family Study launched in 1998, includes family members of original participants to add genetic risk factors
• Largest multi-center epidemiologic study of American Indians

SHS Collaboration Estimates, by Publication Output and Network Seniority

[Figure: model estimates (log odds) with 95% confidence intervals. X-axis: -2.00 to 2.00, from less likely to collaborate to more likely to collaborate. Rows: Difference in Years in Network (Range: 0 - 21), Combined Years in Network (Range: 2 - 44), Difference in Pubs per Year (Range: 0 - 7.4), Combined Pubs per Year (Range: 0 - 14.2).]

Meaning: The more prolific an author pair was, the more likely they were to collaborate.

E.g., a pair of authors who combined to publish 4 articles per year on average was 54% more likely to collaborate than a pair of authors who combined to publish 2 articles per year on average, all else equal.

Meaning: Prolific authors tended not to collaborate with unprolific authors.

E.g., an author pair in which one author published 3 articles per year and the other published 1 article per year was 27% less likely to collaborate than an author pair in which both authors published 2 articles per year, all else equal.

Meaning: The more experience an author pair had between them, the more likely they were to collaborate.

Meaning: Experienced authors tended not to collaborate with inexperienced authors.

SHS Collaboration Estimates: Sociality, by Study Role of Author and Field of Science (FOS)

[Figure: model estimates (log odds) for each author category, with number of authors in parentheses. X-axis: -1.50 to 1.50, from less collaborative to more collaborative. Field of Science (# of Authors): Sleep (48), Other (< 10), Rehabilitation (19), Diabetes (27), Cardiovascular Disease (83), Kidney Disease (10), Genetics (64), Nutrition (10), More Than One FOS (59). Role in Study (# of Authors): Co-Investigator (87), Principal Investigator (7).]

Meaning: Principal and Co-Investigators were more collaborative than those who were neither (baseline).

E.g., an author pair with 1 PI was about 4% more likely to collaborate than an author pair with 0 PIs, all else equal.

Meaning: Authors associated with different fields of science exhibited varying degrees of collaborativeness (baseline: Medicine/Public Health - general).

E.g., an author pair with 1 Genetics author and 1 Med/Pub Health author was about 2% more likely to collaborate than an author pair with 2 Med/Pub Health authors, all else equal.

SHS Collaboration Estimates: Assortative Mixing, by Author Role and Field of Science

[Figure: model estimates (log odds) for each author category, with number of authors in parentheses. X-axis: -5.00 to 5.00, from less likely to collaborate within type to more likely to collaborate within type. Field of Science (# of Authors): More Than One FOS (59), Cardiovascular Disease (83), Genetics (64), Diabetes (27), Kidney Disease (10), Sleep (48), Rehabilitation (19), Medicine/Public Health (213), Nutrition (10), Other (< 10). Geography: State, Institution, Region. Role in Study (# of Authors): Neither (476), Co-Investigator (87), Principal Investigator (7).]

Message: Co-Investigators were more likely to collaborate with authors outside their role type. Authors with neither role (Ns) were more likely to collaborate with other Ns.

E.g., a Co-Investigator was 2% less likely to collaborate with another Co-Investigator than with another type of investigator, all else equal.

Message: Authors were more likely to collaborate with authors in the same region or at the same institution. Geography still matters!

Message: Authors from all fields of science tended to collaborate with authors in the same field of science, though to varying degrees.

E.g., a Nutrition author was 69% more likely to collaborate with another Nutrition author than with a non-Nutrition author, all else equal. A Cardiovascular Disease (CD) author was 27% more likely to collaborate with another CD author than with a non-CD author, all else equal.