Post on 22-Dec-2015


© Tefko Saracevic 11

BIBLIOMETRICS

Tefko Saracevic
Rutgers University
http://www.scils.rutgers.edu/~tefko

What is bibliometrics?

“… all studies which seek to quantify processes of written communication.”
    Pritchard

“… the quantitative treatment of the properties of recorded discourse and behavior pertaining to it.”
    Fairthorne

Recorded communication (‘literature’) -> quantitative methods

Alan Pritchard 1969

Coined the term "bibliometrics": "the application of mathematics and statistical methods to books and other media of communication"

Journal of Documentation (1969) 25(4):348-349

and other related metrics …

Also used to study more than books, articles …
Scientometrics
    covering science in general, not just publications
Infometrics
    all information objects
Webmetrics or cybermetrics
    web connections, manifestations
    using bibliometric techniques to study the relationship or properties of different sites on the web

Concepts

Basic (primitive) concepts:
    1. Subject
    2. Recorded communication -> document, information object
    3. Subject literature
Bibliometrics related to:
    science of science
    sociology of science - numerical methods

Literature studies

Qualitative
    often in humanities, librarianship
Quantitative
    bibliometrics
Mixed

Reasons for quantitative studies of literature

Analysis of structure and dynamics
    search for regularities - predictions possible
Understanding of patterns
    “order out of documentary chaos”
    verification of models, assumptions
Rationale for policies & design

Why quantitative studies?

Qualitative methods often depend on assertions, ‘authoritative’ statements, anecdotal evidence
Science searches for regularities
Success of statistical methods in social sciences
Need for justification & basis for decisions
Something can be counted - irresistible

Application in ...

History of science
Sociology of science
Science policy; resource allocation
Library selection, weeding, policies
Information organization
Information management
    utilization

Historical note

Bibliometrics long precedes information science
But found intellectual home in information science
    study of a basic phenomenon - literature
It is not ‘hot’ lately, but still produces very interesting results
Branched out into web studies (web is a “literature” as well)

What is studied?

Governed by data available in documents or information resources in general - that which can be counted
author(s)
origin
    organization, country, language
source
    journal, publisher, patent …

what … more

contents
    text, parts of text, subject, classes
representation
citations
    to a document, in a document, co-citation
utilization
    circulation, various uses
links
any other quantifiable attribute

Tools

Science Citation Index
Compilation of variables from journals in a subject
Use data
Publication counts from indexes, or other data bases
Web structures, links

Variable: authors

number in a subject, field, institution, country
growth - correlation with indicators like GNP, energy etc.
productivity - e.g. Lotka’s law
collaboration - co-authorship, associated networks
dynamics - productive life, transience, epidemics
papers/author in a subject
mapping

Variable: origin

Rates of production, size, growth by country, institution, language, subject
Comparison between these
Correlation with economic & other indicators

Variable: sources

Concentration most often on journals
Growth, dynamics, numbers
    information explosion - exponential laws
    time movements, life cycles
Scatter - quantity/yield distribution
    Bradford’s law
Various distributions by subject, language, country

Variable: contents

Analysis of texts
    distribution of words – Zipf’s law
    words, phrases in various parts
    subject analysis, classification
    co-word analysis

Variable: representation

frequency of use of index terms, classes
distribution laws - key terms where?
thesaurus structure

Variable: citations

Studied a lot; many pragmatic results
    base for citation indexes, web of science, impact factors, co-citation studies etc.
Derived:
    number of references in articles
    number of citations to articles
        research front; citation classics
    bibliographic coupling

citations … more

co-citations
    author connections, subject structure, networks, maps
    centrality of authors, papers
    validation with qualitative methods
impact

Variable: utilization

frequency
    distribution of requests for sources, titles, e.g. 20/80 law
relevance judgement distributions
circulation patterns
use patterns
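The 20/80 law mentioned above says that a small share of titles tends to attract most of the requests. A minimal sketch of how one might check this on circulation data follows; the function name `top_share` and the request counts are made up for illustration and are not from the presentation.

```python
def top_share(request_counts, top_fraction=0.2):
    """Share of all requests that go to the most-requested `top_fraction` of titles."""
    counts = sorted(request_counts, reverse=True)
    # Number of titles in the top group (at least one title).
    k = max(1, round(len(counts) * top_fraction))
    return sum(counts[:k]) / sum(counts)


# Hypothetical request counts for ten titles (illustrative numbers only).
requests = [50, 30, 5, 4, 3, 3, 2, 1, 1, 1]
print(top_share(requests))  # share of requests going to the top 2 of 10 titles
```

A value near 0.8 would match the 20/80 pattern; real collections may deviate from it considerably.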

Variable: links

Development of link-based metrics
    in-links, out-links
Web structure
Web page depth; update
PageRank vs quality
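In-links and out-links are the simplest link-based counts. The sketch below assumes the link data is available as (source, target) pairs; the function name `link_counts` and the example site names are hypothetical.

```python
from collections import Counter


def link_counts(links):
    """Count in-links and out-links per page from (source, target) pairs."""
    out_links = Counter(src for src, _ in links)
    in_links = Counter(dst for _, dst in links)
    return in_links, out_links


# Hypothetical links between three sites.
links = [
    ("a.example", "b.example"),
    ("a.example", "c.example"),
    ("b.example", "c.example"),
]
in_links, out_links = link_counts(links)
print(in_links["c.example"], out_links["a.example"])  # 2 2
```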

Examples from classic studies

Comparative publications over centuries
Number of journals founded over time
Number of abstracts published over time
National share of abstracts in chemistry
National scientific size vs. economy size
Bibliographic coupling and co-citation
Web structures, links

Examples of laws & methods

Lotka’s law
Bradford’s law
Zipf’s law
Impact factor
Citation structures
Co-citation structures

Alfred J. Lotka 1926

Statistics: the frequency distribution of scientific productivity
Purpose: to "determine, if possible, the part which men of different calibre contribute to the progress of science"
Looked at Chemical Abstracts Index, then Geschichtstafeln der Physik

J. Washington Acad. Sci. 16:317-325

Lotka’s law: $x^n \cdot y = C$

The total number of authors y in a given subject, each producing x publications, is inversely proportional to some power n of x.

Where:
    x = number of publications
    y = no. of authors credited with x publications
    n = constant (equals 2 for scientific subjects)
    C = constant

inverse square law of scientific productivity
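Solving the law for y gives y = C / x^n, and setting x = 1 shows that C is the number of authors with exactly one publication. A small sketch under those assumptions follows; the function name and the figure of 100 one-paper authors are illustrative only.

```python
def lotka_expected_authors(single_paper_authors, x, n=2):
    """Expected number of authors with x publications under x**n * y = C."""
    # With x = 1 the law gives y = C, so C is the number of one-paper authors.
    return single_paper_authors / x ** n


# Hypothetical subject with 100 authors who published a single paper.
for x in range(1, 5):
    print(x, lotka_expected_authors(100, x))
```

With n = 2, each doubling of x cuts the expected number of authors to a quarter, which is the inverse square behavior the slide names.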

Lotka's Law - scientific publications

[Figure: no. of authors (vertical axis) for 1 publ., 2 publ., 3 publ., 4 publ. (horizontal axis), following $x^n \cdot y = C$]

Samuel Clement Bradford 1934, 1948

Distribution of quantity vs yield of sources of information on specific subjects
    he studied journals as sources, but applicable to other sources
    which journals produce how many articles in a subject, and how are they distributed? or
    How are articles in a subject scattered across journals?
Purpose: to develop a method for identification of the most productive journals in a subject & deal with what he called “documentary chaos”
First published in: Engineering (1934) 137:85-86, then in his book Documentation (1948)

Bradford’s law

"If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as $1 : n : n^2 : n^3$ …"

Bradford's Law of Scattering – an idealized example

No. of source journals   No. of articles per source   Total no. of articles   Zone
          1                         60                         60
          2                         35                         70             3 sources, 130 articles
          1                         30                         30
          2                         25                         50
          2                          9                         18
          4                          8                         32             9 sources, 130 articles
         10                          6                         60
          7                          5                         35
          5                          4                         20
          5                          3                         15             27 sources, 130 articles
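The zone totals in the idealized example can be checked with a short calculation. The sketch below assumes the example reads as ten rows of (number of journals, articles per journal), ordered by decreasing productivity; the function name `bradford_zones` is hypothetical, and the code keeps each row whole rather than splitting it across zones.

```python
# (number of journals, articles per journal), in decreasing productivity,
# as read from the idealized example (an assumption about the flattened table).
rows = [(1, 60), (2, 35), (1, 30), (2, 25), (2, 9),
        (4, 8), (10, 6), (7, 5), (5, 4), (5, 3)]


def bradford_zones(rows, n_zones=3):
    """Group rows into zones holding roughly equal numbers of articles."""
    total = sum(journals * articles for journals, articles in rows)
    target = total / n_zones
    zones = []
    zone_journals = zone_articles = 0
    for journals, articles in rows:
        zone_journals += journals
        zone_articles += journals * articles
        # Close the zone once it reaches the target; the last zone takes the rest.
        if zone_articles >= target and len(zones) < n_zones - 1:
            zones.append((zone_journals, zone_articles))
            zone_journals = zone_articles = 0
    zones.append((zone_journals, zone_articles))
    return zones


zones = bradford_zones(rows)
print(zones)  # [(3, 130), (9, 130), (27, 130)]
print([b[0] / a[0] for a, b in zip(zones, zones[1:])])  # [3.0, 3.0]
```

The journal counts 3, 9, 27 grow by a constant multiplier of 3 while each zone yields the same 130 articles, which is the pattern Bradford's law describes.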

Bradford's Law of Scattering – zones

[Figure: nucleus and succeeding zones]
    nucleus: 3 sources, 130 articles
    next zone: 9 sources, 130 articles
    next zone: 27 sources, 130 articles
Garfield hypothesis

George Kingsley Zipf 1935, 1949

The psycho-biology of language: an introduction to dynamic philology (1935)
Human behavior and the principle of least effort: An introduction to human ecology (1949)
Looked, among other things, at frequency distributions of words in given texts
    counted distribution in James Joyce’s Ulysses
Provided an explanation as to why the found distributions happen:
    Principle of least effort

Zipf’s law: $r \cdot f = c$

Where:
    r = rank (in terms of frequency)
    f = frequency (no. of times the given word is used in the text)
    c = constant for the given text

For a given text the rank of a word multiplied by the frequency is a constant
Works well for high frequency words, not so well for low – thus a number of modifications
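To see how close a text comes to r · f = c, one can rank its words by frequency and multiply rank by frequency. The sketch below is a rough illustration: splitting on whitespace is a simplifying assumption (punctuation is not handled), and the function name `rank_frequency` is hypothetical.

```python
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    counts = Counter(text.lower().split())
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


# Toy text; a real test needs a long text such as a whole novel.
for row in rank_frequency("a a a a b b c"):
    print(row)
```

If the law held exactly, the last column would be the same for every word; in practice it drifts, especially for low-frequency words.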

Charles F. Gosnell 1944 Obsolescence

He studied obsolescence of books in academic libraries via their use
    College Res. Libr. (1944) 5:115-125
But this was extended to study of articles via citations, and other sources
Age of citations in articles in a subject:
    half life – half of the citations are no more than x years old
    different subjects have very different half-lives
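One common way to read the half-life is as the median age of the cited references. The sketch below takes that reading as an assumption; the function name and the years are illustrative only.

```python
from statistics import median


def citation_half_life(citing_year, cited_years):
    """Median age of cited references, taken here as the citation half-life."""
    ages = [citing_year - year for year in cited_years]
    return median(ages)


# Hypothetical article from 2000 citing works from these years.
print(citation_half_life(2000, [1999, 1998, 1997, 1990]))  # 2.5
```

Pooling the reference ages of all articles in a subject, instead of one article, gives a half-life for that subject's literature.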

Curve of obsolescence

[Figure: number of users (vertical axis) against age at time of use (horizontal axis)]

Eugene Garfield 1955

Focused on scientific & scholarly communication based on citations
    Science (1955) 122:108-111
Founded Institute for Scientific Information (ISI)
    major product now ISI Web of Knowledge
Impact factor for journals, based on how much a journal is cited
Mapping of a literature in a subject
Citation indexes/web of knowledge
    MAJOR resources in bibliometric studies

Citation matrix

[Figure: an article with its three cited articles and seven citing articles]

Science Citation Index

[Figure: an article with its three cited articles and seven citing articles]

Association-of-ideas index
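A citation index turns reference lists around: instead of asking what an article cites, it answers who cites a given article. The sketch below shows that inversion on made-up identifiers; it illustrates the idea only and is not how the Science Citation Index itself is built.

```python
from collections import defaultdict


def build_citation_index(references):
    """Invert {citing: [cited, ...]} into {cited: [citing, ...]}."""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    return dict(cited_by)


# Hypothetical reference lists: C1 and C2 cite A; C2 and A cite B.
references = {"C1": ["A"], "C2": ["A", "B"], "A": ["B"]}
index = build_citation_index(references)
print(index["A"])  # ['C1', 'C2']
print(index["B"])  # ['C2', 'A']
```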

Co-citation analysis

Articles that cite the same article are likely to both be of interest to the reader of the cited article

[Figure: an article with two citing articles]
    These two articles are likely to be related
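Two pair counts are usually distinguished here. Articles that cite the same article, as in the diagram, are said to be bibliographically coupled; articles that are cited together by a later article are said to be co-cited. The sketch below computes both from made-up reference lists; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_citation_counts(references):
    """Count how often two articles appear together in one reference list."""
    pairs = Counter()
    for cited_list in references.values():
        for pair in combinations(sorted(set(cited_list)), 2):
            pairs[pair] += 1
    return pairs


def coupling_counts(references):
    """Count the references shared by each pair of citing articles."""
    pairs = Counter()
    for a, b in combinations(sorted(references), 2):
        shared = len(set(references[a]) & set(references[b]))
        if shared:
            pairs[(a, b)] = shared
    return pairs


# Hypothetical citing articles P1..P3 and their reference lists.
references = {"P1": ["X", "Y"], "P2": ["X", "Y", "Z"], "P3": ["Z"]}
print(co_citation_counts(references)[("X", "Y")])  # 2
print(coupling_counts(references)[("P1", "P2")])   # 2
```

Higher counts in either measure suggest a closer subject relationship, which is what the maps and networks mentioned earlier are drawn from.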

Impact factor (IF)

number of citations received in current year by papers published in the journal in the previous two years
divided by
number of papers published in the journal in the previous two years

IF has become over time a crucial indicator of journal quality and given ISI a monopoly position in the evaluation of journal quality
Reported in Journal Citation Reports (1976-)
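The definition above is a simple ratio. A minimal sketch with made-up numbers follows; the function and parameter names are illustrative.

```python
def impact_factor(citations_this_year, papers_previous_two_years):
    """Citations this year to the previous two years' papers, per paper."""
    if papers_previous_two_years == 0:
        raise ValueError("no papers published in the previous two years")
    return citations_this_year / papers_previous_two_years


# Hypothetical journal: 150 papers over the previous two years,
# cited 300 times in the current year.
print(impact_factor(300, 150))  # 2.0
```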

Garfield’s HistCite

“Bibliographic Analysis and Visualization Software”
Provides citation statistics & graphs for people, journals, institutions …
    various citation scores, no. of cited references in articles …
    various graphs with connections
Example: articles and authors for JASIST (and predecessor names) for 1956-2004
    includes citations to authors

Conclusion

Bibliometrics, & related scientometrics, infometrics, webmetrics provide insight into a number of properties of information objects
    some general, predictive “laws” formulated
    structures have been exposed, graphed
    myriad data collected & analyzed
A good area for research!

Sources used in making this presentation – among others

Ruth Palmquist: Bibliometrics
Donna Bair-Mundy: Boolean, bibliometrics, and beyond
J. Downie: Short set of bibliometric exercises
    http://people.lis.uiuc.edu/~jdownie/biblio/
