How many citations are there in the Data Citation Index?
DESCRIPTION
Presentation of a descriptive analysis of the Data Citation Index (DCI) from Thomson Reuters by Daniel Torres-Salinas, Evaristo Jiménez-Contreras and Nicolás Robinson-García at the STI Conference held in Leiden (The Netherlands), 3-5 September 2014. sti2014.cwts.nl
TRANSCRIPT
How many citations are there in
the Data Citation Index?
D Torres-Salinas, E Jiménez-Contreras & N Robinson-García
EC3 Research Group
EC3Metrics
University of Granada
19th International Conference on Science and Technology Indicators
3-5 September 2014 Leiden, The Netherlands
Outline
Rationale
Data and citations
Data Citation Index
Discussion
Rationale
“The data deluge has arrived.[…] If the rewards of the
data deluge are to be reaped, then researchers who
produce those data must share them”
Borgman, 2012
Peng, 2011
Rationale
“The ‘dirty little secret’ behind the promotion of data
sharing is that not much sharing may be taking
place”
Borgman, 2012
“The lack of recognition incentives is regarded as a
crucial and unresolved obstacle to establishing a
data sharing culture”
Piwowar et al., 2008
Data and citations
“A consistent, rigorous approach to data citation is
lacking”
Parsons et al., 2010
What do we cite?
Original study <- Piwowar et al.
Data papers <- Scientific Data
Data sets <- Data Citation Index
Data Citation Index
GENERAL DESCRIPTION
Multidisciplinary database launched in 2012
It indexes data repositories from all scientific fields
along with the citation data associated with them
Follows an evaluation and selection process at the
level of repository based on: subject, editorial content
and geographic origin and scope
Data Citation Index
PUBLICATION TYPES
Data repositories: databases comprising datasets and data studies, which store and provide access to the raw data.
Datasets: a single or coherent set of data, or a data file, provided by the repository as part of a collection, data study or experiment.
Data studies: descriptions of studies or experiments held in repositories, together with the associated data used in the study.
Data Citation Index
DATA STUDY EXAMPLES
Data Citation Index
DATA SET EXAMPLES
Data Citation Index
MATERIAL AND METHODS
Data retrieval in May-June 2013
Analysis by areas: Science, Engineering &
Technology, Social Sciences and Arts & Humanities
arXiv:1306.6584
Data Citation Index
GENERAL INDICATORS
                     All Document Types   Datasets    Data studies
Total Citations      404,211              294,051     106,895
Total Records        2,623,528            2,468,736   154,674
Uncited Records      2,311,553            2,185,062   126,428
% Uncited            88.11                88.51       81.74
Citation Average     0.15                 0.12        0.69
Standard Deviation   3.06                 0.36        9.56
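The two derived rows of the table (% Uncited and Citation Average) follow directly from the raw counts. The sketch below is only an illustration of that arithmetic, not the authors' own code: the dictionary layout and the `derive` helper are assumptions introduced here, and the figures are copied from the slide.

```python
# Raw counts copied from the GENERAL INDICATORS slide (data retrieved May-June 2013).
# The structure and names below are illustrative assumptions, not the authors' code.
indicators = {
    "All document types": {"citations": 404_211, "records": 2_623_528, "uncited": 2_311_553},
    "Datasets": {"citations": 294_051, "records": 2_468_736, "uncited": 2_185_062},
    "Data studies": {"citations": 106_895, "records": 154_674, "uncited": 126_428},
}


def derive(row):
    """Return (% uncited, citations per record), each rounded to two decimals."""
    pct_uncited = round(100 * row["uncited"] / row["records"], 2)
    citation_average = round(row["citations"] / row["records"], 2)
    return pct_uncited, citation_average


derived = {name: derive(row) for name, row in indicators.items()}

for name, (pct, avg) in derived.items():
    print(f"{name}: {pct:.2f}% uncited, {avg:.2f} citations per record")
```

The standard deviation row cannot be recomputed this way, since it needs the per-record citation counts, which the slide does not give.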
Data Citation Index
REPOSITORIES BY AREA
Engineering & Technology: 1
Science: 67
Social Sciences: 19
Humanities & Arts: 9
                           Datasets   Citations   Data studies   Citations
Engineering & Technology   1545       890         240            26
Humanities & Arts          44588      1           6847           20459
Science                    2004449    293193      114338         26189
Social Sciences            424952     7           37855          69659
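One way to read this table is to ask which record type attracts most of each area's citations. The sketch below assumes that each "Citations" column counts the citations received by the record type on its left. The variable names are hypothetical and the computation is an illustration, not part of the original analysis.

```python
# (records, citations) per area and record type, copied from the table above.
# Assumption: each "Citations" column refers to the record type immediately to its left.
by_area = {
    "Engineering & Technology": {"datasets": (1_545, 890), "data studies": (240, 26)},
    "Humanities & Arts": {"datasets": (44_588, 1), "data studies": (6_847, 20_459)},
    "Science": {"datasets": (2_004_449, 293_193), "data studies": (114_338, 26_189)},
    "Social Sciences": {"datasets": (424_952, 7), "data studies": (37_855, 69_659)},
}

dataset_share = {}   # fraction of an area's citations that go to datasets
dominant_type = {}   # record type receiving most of the area's citations

for area, types in by_area.items():
    citations = {record_type: pair[1] for record_type, pair in types.items()}
    total = sum(citations.values())
    dataset_share[area] = citations["datasets"] / total
    dominant_type[area] = max(citations, key=citations.get)

for area in by_area:
    print(f"{area}: {dataset_share[area]:.1%} of citations go to datasets "
          f"(dominant type: {dominant_type[area]})")
```

Under that reading, citations in Science and in Engineering & Technology go mostly to datasets, while in the Social Sciences and in Humanities & Arts they go almost entirely to data studies.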
Data Citation Index
RECORDS AND CITATIONS BY AREA AND TYPE
Data Citation Index
TOP 10 CATEGORIES HIGHLY CITED FOR DATASETS
[Figure: top 10 highly cited categories for datasets. Categories shown: Crystallography; Biochemistry & Mol. Biology; Genetics & Heredity; Geosciences; Physics, Atomic, Molecular; Evolutionary Biology; Cell Biology; Spectroscopy; Medical Laboratory Tech.; Nanoscience & Nanotech. Axes: citation average and standard deviation (0.00 to 1.50); % of total citations from DCI (0% to 50%). Labelled shares: 47%, 23%, 16%.]
Data Citation Index
TOP 10 CATEGORIES HIGHLY CITED FOR DATA STUDIES
[Figure: top 10 highly cited categories for data studies. Categories shown: Sociology; Demography; Economics; Business; Political Science; Biochemistry & Mol. Biology; Genetics & Heredity; Health Care Sciences; Criminology & Penology; Family Studies. Axes: citation average and standard deviation (0 to 35); % of total citations from DCI (0% to 30%). Labelled share: 30%.]
Data Citation Index
MAIN REPOSITORIES IN THE DCI, CITATIONS & RECORDS
[Figure: main repositories in the DCI, plotted by total number of citations in the Data Citation Index and total number of records indexed in the Data Citation Index. Axis ranges: 0 to 160000 and 0 to 700000. Repositories shown: MiRBase; Gene Expression; UniProt knowledgebase; Crystallography Open Database; U.S. Census Bureau TIGER; Protein Data Bank; ArrayExpress Archive; PANGEA; UK Data Archive; Inter-university Consortium for Political and Social Research; Animal QTL Database. Legend: size = total citations; pie chart = % of citations.]
Discussion
I. High rate of uncitedness (88%)
II. Biased towards Science
III. Data sets vs. Data studies (Two Cultures?)
IV. Too soon or too presumptuous?
THANK YOU
D Torres-Salinas
N Robinson-García
E Jiménez-Contreras