02 scientific information sources

UNIVERSITAT DE BARCELONA Facultat de Biblioteconomia i Documentació Metodologia de la recerca Professor: Ángel Borrego Scientific Information Sources

Upload: tahereh-dehdarirad

Post on 06-Apr-2018


8/2/2019 02 Scientific Information Sources

http://slidepdf.com/reader/full/02-scientific-information-sources 1/41

UNIVERSITAT DE BARCELONA
Facultat de Biblioteconomia i Documentació

Research Methodology (Metodologia de la recerca)

Professor: Ángel Borrego

Scientific Information Sources


Contents

• From author to reader: the scholarly communication process

• Index & Abstract (I&A) databases

• Assessment of I&A databases

• Searching I&A databases


Scholarly communication chain

Authors (scientists)

Journal Editors and Referees

Journals

I&A databases

Librarians

End users


Authors (researchers)

Scientists are the first link in the scholarly communication chain. They create new knowledge and describe it in articles, books, patents, etc.


Publishing an article

Source: Weller, 2000


Editors and publishers

• Scientific editor: an expert in the book or journal’s field who manages the review of manuscripts.

• Referee or reviewer (usually two or three): experts in the field who (blindly) evaluate the work for the editor, noting weaknesses or problems along with suggestions for improvement, and including an explicit recommendation of what to do with the manuscript (accept or reject).

• Publisher: some journals are published by non-profit scientific societies or universities; others are published by commercial publishers (Elsevier, Emerald, Springer, Wiley…) that expect revenue, especially through library subscriptions.


Scholarly journals

A periodical publication reporting new research in the form of:
 – Articles: complete descriptions of current original research findings.
 – Review articles: accumulate the results of many articles on a topic into a coherent narrative about the state of the art in that discipline.
 – Letters (not to be confused with letters to the editor) or short communications: short descriptions of important current research findings.

In 2004, Carol Tenopir (Library Journal, 2/1/2004) estimated that there were about 43,500 active academic journals.


I&A databases

• Databases are produced and/or hosted by public administrations or private companies.

• These organizations select the most important journals in a field and analyse them in order to create Index & Abstract (I&A) databases.

• These databases usually offer additional services such as user profiles, email alerts, etc.

• Hosts commercialize databases from several producers and provide users with engines to search them.


Where to search for scientific information?

• Bibliographic (I&A) databases, produced and distributed by public administrations or private companies:
 – Instituto de Estudios Documentales sobre Ciencia y Tecnología (cindoc.csic.es)
 – National Library of Medicine (www.nlm.nih.gov)
 – Dialog (dialog.com)

• Journal gateways:
 – Elsevier ScienceDirect (sciencedirect.com)
 – EmeraldInsight (emeraldinsight.com)

• Internet search engines:
 – Google Scholar (scholar.google.com)
 – Scirus (scirus.com)


Librarians

• They are intermediaries between information and end users:
 – Know the best information sources in any given field.
 – Have the ability to transform a user’s information need into a search equation that can be addressed to an automatic system.

• Tasks:
 – Exploit information sources.
 – Create new information sources.
 – Train users in the use of these sources.


Are you sure about this?


Where do scientists search for information?

Rowlands & Nicholas, 2005


Fry et al., 2009

Where???

Schonfeld & Housewright, 2009


End users

• The main users of scientific information are scientists (i.e. authors) and some professionals (doctors, for instance). Articles in scientific journals are written by scientists for scientists.

• There is also an ‘education market’, i.e. handbooks and manuals that explain the basics of each discipline for educational purposes.

• Finally, there is also a market for “popular science”, including books, journals, mass media, museums, etc.


In summary

Information is the main input and output of science.


Contents

• From author to reader: scholarly communication process

• I&A databases: concept

• Assessing I&A databases

• Searching I&A databases


Databases

• A database is “an organized collection of data, usually in digital form so that its contents can easily be accessed, managed, and updated....”

• … but you already know what a database is!


Access to scientific information

[Timeline from 1665 to 2010 (axis marks: 1665, 1840, 1960, 1980, 2000, 2010). Milestones shown: first scientific journals; print indexes; access to databases through telephone lines; databases on CD-ROM; online access via the Web.]


The database market

Year   Databases   Producers   Hosts
1980         411         269      71
1985       2,247       1,316     414
1990       3,943       1,950     645
1994       5,307       2,220     812
1997      10,000       3,400   1,800
2007      20,000        n.d.    n.d.

(n.d. = no data)

Source: Large et al., p. 46


Gale Directory of Databases 

• Volume 1: online databases

Profiles nearly 11,000 online databases made publicly available from the producer or an online service

• Volume 2: CD-ROM, DVD, etc.

Profiles more than 8,000 database products offered in portable form or through batch processing

• “In its 34th edition (2011), Gale Directory of Databases contains contact and descriptive information on nearly 19,000 databases and over 3,300 producers, online services, and vendors/distributors of database products.”


Gale Directory of Databases (2) 

• Product descriptions.

• Database producers: contact information for database producers and a list of products they produce.

• Vendors and distributors: contact information for vendors and distributors, conditions of use, and a list of products they offer.

• Geographic index: lists producers and vendors/distributors by country.

• Subject index: classifies products within 1,800 subject terms.

• Master index: lists all names in a single alphabetic sequence.

2011 edition


Gale Directory of Databases (3) 


Contents

• From author to reader: scholarly communication process

• I&A databases: concept

• Assessing I&A databases

• Searching I&A databases


Assessment criteria

• Contents:
 – Coverage, accuracy, consistency, updating

• Information retrieval:
 – Interface and search options

• Management:
 – Price, hardware and software requirements, authentication, information provided by the producer, integration with other library products, support, etc.


Database contents

• Coverage:
 – Topics, source types, chronological, geographical, languages
 – Local availability of the indexed sources

• Accuracy:
 – Grammar and typing mistakes.
 – Duplicate records.

• Consistency:
 – Formal description: names of authors and journals
 – Subject description: indexing and classification

• Updating:
 – Growth in the number of records
 – Delay between publication and the addition of records


Interface and search options

• Search page:
 – Database structure and searchable fields
 – Simple / advanced / command search
 – Operators (Boolean, proximity, wildcards, etc.)
 – Field indexes and thesaurus
 – Search in a specific database, search history, multilingual interface, etc.

• Results page:
 – Visualisation: format and number of records
 – Ranking criteria
 – Select and manage records
 – Record clustering
 – Similar records
 – Refine search
 – Information on errors (0 results).

• Record visualisation:
 – Record formats
 – Navigation between records and linked fields
 – Highlighting of search terms in records

• Additional pages: database description, structure, help, etc.


Database management

• Price and payment options

• Hardware and software requirements

• Authentication (password / IP / federated authentication)

• User manuals, online help, languages, etc.

• Integration with other library products (metasearch engines, reference management software, other databases from the same host).

• Library support

• Access (CD / online)

• And listen to your users: log analysis, surveys, observation…!


“The JISC Academic Database Assessment Tool (ADAT) aims to help libraries to make informed decisions about future subscriptions to bibliographic databases.”

http://www.jisc-adat.com


Precision and recall

• Precision:

$$\text{Precision} = \frac{\text{Relevant documents retrieved } (a)}{\text{Retrieved documents } (a + b)} \times 100$$

• Recall:

$$\text{Recall} = \frac{\text{Relevant documents retrieved } (a)}{\text{Relevant documents in the database } (a + c)} \times 100$$

                Relevant      Non-relevant   Total
Retrieved       a             b (noise)      a + b
Non-retrieved   c (silence)   d              c + d
Total           a + c         b + d          a + b + c + d
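The two formulas can be checked with a few lines of Python. This is only a sketch: the function name and the counts in the usage lines are hypothetical and do not come from the slides; a, b and c follow the contingency table above.

```python
def precision_recall(a: int, b: int, c: int) -> tuple[float, float]:
    """Return (precision, recall) as percentages.

    a = relevant documents retrieved
    b = non-relevant documents retrieved (noise)
    c = relevant documents not retrieved (silence)
    """
    retrieved = a + b   # everything the search returned
    relevant = a + c    # everything relevant in the database
    precision = 100 * a / retrieved if retrieved else 0.0
    recall = 100 * a / relevant if relevant else 0.0
    return precision, recall


# Hypothetical search: 20 records retrieved, 15 of them relevant,
# and 10 relevant records that the search missed.
p, r = precision_recall(a=15, b=5, c=10)
print(f"Precision = {p:.1f}%, Recall = {r:.1f}%")  # Precision = 75.0%, Recall = 60.0%
```

Note that d (non-relevant documents that were not retrieved) plays no part in either measure.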


Drawbacks of precision and recall

• What is a relevant document?

• We assume that relevance is binary.

• Different users may require different levels of precision and recall.

• There is an inverse relationship between precision and recall.

• Recall is just an estimate: the total number of relevant documents in the database (a + c) is rarely known.

• If the system ranks documents by relevance, then precision and recall vary as the user examines the retrieved records.


Example

• Relevant documents in the database for query Q1: D3, D5, D9, D25, D39, D44, D56, D71, D89, D123

• Retrieved documents for query Q1, ranked by relevance (relevant documents are marked with a dot):

 1. D123 •    6. D9 •     11. D38
 2. D84       7. D511     12. D48
 3. D56 •     8. D129     13. D250
 4. D6        9. D187     14. D113
 5. D8       10. D25 •    15. D3 •
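As a sketch of how precision changes while the user reads down this ranked list, the snippet below recomputes precision each time a relevant document is reached. The document identifiers are those of the example; the function name is an assumption made for illustration.

```python
# Relevant documents in the database for query Q1 (from the example)
relevant = {"D3", "D5", "D9", "D25", "D39", "D44", "D56", "D71", "D89", "D123"}

# Retrieved documents in ranked order (from the example)
ranked = ["D123", "D84", "D56", "D6", "D8", "D9", "D511", "D129",
          "D187", "D25", "D38", "D48", "D250", "D113", "D3"]


def precision_at_recall_levels(ranked, relevant):
    """Return (rank, recall %, precision %) at each relevant hit."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            recall = 100 * hits / len(relevant)
            precision = 100 * hits / rank
            points.append((rank, recall, precision))
    return points


for rank, recall, precision in precision_at_recall_levels(ranked, relevant):
    print(f"rank {rank:2d}: recall {recall:3.0f}%, precision {precision:5.1f}%")
# Precision falls from 100.0% (rank 1) to 66.7%, 50.0%, 40.0% and 33.3% (rank 15)
```

Only five of the ten relevant documents are retrieved, so recall never goes beyond 50% for this query.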


Precision at different levels of recall

[Chart: precision (vertical axis, 0 to 100) plotted against recall (horizontal axis, 0 to 100).]


Contents

• From author to reader: scholarly communication process

• I&A databases: concept

• Assessing I&A databases

• Searching I&A databases


Query process

[Diagram with two sides, System and User. System: contents, representation, organization. User: need, representation, search. The two sides meet in a match, which produces the retrieved records.]


Search options

• Truncation and wildcards

• Natural vs. controlled vocabulary

• Boolean operators

• Proximity operators

• Search limits: date, type of source, language, etc.

[Arrow alongside the list: "+ Recall" at the top, "+ Precision" at the bottom.]
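A small sketch of how two of these options pull in opposite directions: truncation broadens a search (more recall), while a date limit narrows it (more precision). The records are invented, and Python's standard `fnmatch` patterns stand in for a database's own truncation syntax, which varies from host to host.

```python
from fnmatch import fnmatchcase

# Hypothetical records: (record id, index term, publication year)
records = [
    (1, "library", 1998),
    (2, "libraries", 2006),
    (3, "librarian", 2010),
    (4, "laboratory", 2008),
]


def search(pattern, since=None):
    """Return ids whose term matches the pattern, optionally limited by year."""
    return [rid for rid, term, year in records
            if fnmatchcase(term, pattern) and (since is None or year >= since)]


print(search("library"))        # [1]        exact term only
print(search("librar*"))        # [1, 2, 3]  truncation broadens the search
print(search("librar*", 2005))  # [2, 3]     a date limit narrows it again
```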


Drawbacks of the Boolean model

• It does not matter whether the search term occurs once in the document or a hundred times.

• It does not matter whether a document complies with one or all of the requirements of an “or” search.

• Partial coincidence (for instance, complying with almost all the “and” conditions) is not taken into account.

• It is not possible to reflect the importance of each search term.

• A Boolean search just divides the database into two sets, relevant and non-relevant documents, depending on whether or not they fulfil the search conditions. All retrieved documents are supposed to be of similar relevance, so there is no mechanism to rank documents.
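The last point can be illustrated with plain set operations, which is essentially all a Boolean search does. The index below is hypothetical: each term maps to the set of documents that contain it, with no record of how often.

```python
# Hypothetical inverted index: term -> ids of the documents containing it
index = {
    "library": {1, 2, 4},
    "database": {2, 3, 4},
}

and_result = index["library"] & index["database"]  # both terms
or_result = index["library"] | index["database"]   # either term
not_result = index["library"] - index["database"]  # first term but not the second

print(sorted(and_result))  # [2, 4]
print(sorted(or_result))   # [1, 2, 3, 4]
print(sorted(not_result))  # [1]
```

Each result is an unordered set: a document is either in or out, and nothing in the model says which retrieved document should come first.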


Relevance sorting

• A simple method consists in assigning a weight to each term in each document.

• The easiest way to assign a weight to a term is to count its frequency in the document.

• The total weight of a document in reply to a query is the sum of the weights of all search terms.

• Those documents with a higher weight are ranked first.


Relevance sorting: example

             Term A   Term B   Term C   Term D
Document 1        8        6        0        3
Document 2        4        0        7        6
Document 3        3        0        4        2


Relevance sorting: example

• Retrieved documents sorted by relevance for each query:
 – A AND C: Doc. 2; Doc. 3
 – A OR C: Doc. 2; Doc. 1; Doc. 3
 – A NOT C: Doc. 1

• Improving relevance sorting:
 – Weight the frequency of each term in the database: less frequent terms are more useful to discriminate between documents.
 – Position of the search term (title, for instance).
 – Number of incoming links from other documents (in digital environments).


Pay attention to the presentation of results

• Good presentation increases the potential use of the information by users, improves their comprehension of the information, helps them to save time, and increases their satisfaction.

 – Specify the sources searched and the search strategy.
 – Summarise the results.
 – Organise the references (alphabetically, by relevance...) and present them in a standard format.
 – Pay attention to the format (headers, fonts, margins, etc.).
 – Include recommendations: full text access, relevant sources, etc.


Reading

Stone, G. 2009. Resource Discovery. In: Digital Information: Order or anarchy? London: Facet, p. 133-164.

Also available at: http://eprints.hud.ac.uk/5882/