Post on 21-Dec-2015
Searching
Overview – providing a framework
A bit of history
[email protected]; http://comminfo.rutgers.edu/~tefko/
Tefko Saracevic
Central ideas
• Searching is a complex, interactive process aimed at finding & retrieving relevant information
– this, of course, raises a number of questions as to what we mean by:
• information
• relevant
• finding
• retrieving
• interaction
• Modern searching has deep roots in historical attempts to deal with information explosion
ToC
1. Basics - definitions
2. Complexity – elements involved
3. Searching & search interaction
4. Professional changes in searching
5. A bit of history
6. Conclusions
1. Basics
A few definitions
We all know them, but sometimes we should think about them
Information
Generally
• “Information” has many meanings
– depending on context – but it is universally well understood
• it is a primitive concept – one does not have to explain it - other concepts, definitions are then built upon it
– but many definitions on the Web
From “informare” (Latin): to fashion, shape, or create, to give form to
Context of searching: several layers or strata
– Narrow: information as a property of the message (text, record, document, image …)
– Broader: as a property of cognition - affects or changes the state of a mind
– Broadest: also connected to the expansive social context or horizon, such as culture, work, task, or problem-at-hand
• We must consider information not only as “message” but in its cognitive & contextual sense
“Information is a difference that makes a difference.” Gregory Bateson
"Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?" T. S. Eliot
"With so much information now online, it is exceptionally easy to simply dive in and drown." Alfred Glossbrenner
"The stone age was marked by man's clever use of crude tools; the information age, to date, has been marked by man's crude use of clever tools." Source Unknown
Oh well…
Relevant, relevance
Generally
Merriam Webster (2005)
“Relevant: having significant and demonstrable bearing on the matter at hand.”
“Relevance: the ability (as of an information retrieval system) to retrieve material that satisfies the needs of the user.”
SYNONYMS: pertinent, useful, of utility, germane, material, applicable, appropriate …
In the context of searching: several layers or strata:
Information which is connected with a user’s (searcher’s, inquirer’s …) information need AND
– cognitive state
– as related to given task, or problem at hand
– & given affective state – motivation, intention …
• Relevance always has a connection: “to”
Finding, findability
Generally
• To realize, understand, or locate something especially by studying or observing
• To make a special effort to gather something together or summon something up
• To discover something or somebody after a search
In the context of searching: information has to be findable
Findability: (Morville, 2005)
• The quality of being locatable or navigable.
• The degree to which a particular object is easy to discover or locate.
• The degree to which a system or environment supports navigation & retrieval.
Information retrieval (IR)
Generally
“Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).”
Manning, Raghavan, & Schütze, (2008)
Other definitions on the Web
In the context of searching: Searching of & retrieval from
– abstracting & indexing databases & services
– specialized databases & sites
– search engines
– directories, portals
– digital libraries
– OPACs
– reference resources
– and the like …
• All can be labeled as IR systems – that is what they do
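As a rough illustration of what "finding material that satisfies an information need from within large collections" can mean mechanically, here is a minimal sketch. It assumes a tiny in-memory collection and a plain inverted index with Boolean AND matching; the records and the function names are made up for illustration, and real IR systems are far more elaborate.

```python
from collections import defaultdict

# Hypothetical toy collection: record id -> text (illustrative only).
RECORDS = {
    1: "searching is a complex interactive process",
    2: "information retrieval finds relevant documents",
    3: "relevant information supports the task at hand",
}


def build_index(records):
    """Map each term to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in text.lower().split():
            index[term].add(record_id)
    return index


def search_and(index, query):
    """Return sorted ids of records containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)


index = build_index(RECORDS)
print(search_and(index, "relevant information"))  # prints [2, 3]
```

Note that the sketch only matches words; it says nothing about relevance in the cognitive or contextual sense discussed earlier.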
Searching is… (repeated)
… a complex process involving interaction & feedback between and among PEOPLE, INFORMATION, & TECHNOLOGY
People
Users
• Generally:
– people who access & use an information system
• In information retrieval (IR):
– people with an information need that may be satisfied by a search of an IR system
• End users:
– people who use an IR system directly to retrieve information
Professional searchers - you
• Experts with knowledge & competencies for performing effective & efficient searches in a variety of sources, systems & media
– searches may be done on behalf of people, institutions, tasks – mediated searching
– searchers must follow ethical guidelines
Information
Content
Objects that potentially may convey information as:
– texts, documents, images, recordings, sites …
– often referred to as records
• So far the majority are texts & documents
– even images & recordings are mostly labeled (tagged) with texts
• Many systems collect them according to source – e.g. journals, areas …
Organization
Ways & means by which the objects are organized to facilitate access & searching
– vocabularies (free, controlled), indexes, fields, abstracts, summaries, classification, clustering, links, sites …
• a great many are now created automatically, e.g. terms extracted from texts
– many types of organization exist, more on the horizon
– essential for searching
Technology: two components or layers
Hardware & software
• A variety of information & communication technologies
– most importantly, includes networks
• Software: many applications available
– most are taken as given by users & searchers
Systems
• Systems that handle information objects by:
– identifying, collecting, organizing, storing, managing, providing access …
– & provide capabilities to search, retrieve, navigate, browse …
• We label them information retrieval (IR) systems
Again, the two are different things but are closely connected. Professional searchers need to know how to use both.
IN THIS COURSE WE DEAL ONLY WITH SYSTEMS!
Interaction
With information there is no such thing as not to interact (yes, a double negative)
General
• We concentrate on behavior of people in the use of information embedded in systems, services, networks, and devices
• More broadly & recently this also includes cooperative activities among dispersed people and resources
Various interactions
• Human-information interaction
• Human-human information interaction
– User-searcher interaction
• Human-computer interaction
Human-information interaction
General
• Information interaction is the process that people use to communicate & act reciprocally with an information system, particularly in relation to its content
• It is a dynamic process mostly mediated by technology
– involving feedback
– reiteration; reciprocal action
– evaluation
Context of searching
• How & why people access information is highly dependent on the context of their interaction
– this context is influenced by a range of factors such as
• the time, place, and history of interaction
• the tasks motivating the interaction
• the technical possibilities of the information systems
(from Information Interaction in Context, 2008)
Human-information interaction: Components defined
Problem, Task: Job, situation at hand; incl. demographic & other characteristics & affective states of user. All in context.
Inf. need: Somewhat nebulous & subjective concept. Mostly refers to cognitive state, gaps in knowledge (Dervin), or anomalous state of knowledge (Belkin). Instrumental - goal oriented. Uncertainty. Necessity.
Question: Verbal (written or oral) representation of the information need and/or problem at hand.
Query: Question translated into a search statement as allowed/prescribed by a system. Variations.
Search: Process of submission of a query and conduct of the search as prescribed by a system. Variations.
Response: Responses or answer(s) by the system. Could be rearranged, reformatted. Evaluation of responses.
(Diagram labels: feedback, reiteration)
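The step from question to query in the components above can be sketched in code. This is only a minimal illustration under stated assumptions: the stop-word list, the synonym table, and the function name `question_to_query` are hypothetical, and the Boolean AND/OR syntax stands in for whatever a given system actually allows or prescribes.

```python
# Hypothetical stop-word list; real systems use much longer ones.
STOP_WORDS = {
    "a", "an", "the", "of", "on", "in", "for", "to", "and",
    "what", "is", "are", "how", "do", "does",
}


def question_to_query(question, synonyms=None):
    """Translate a natural-language question into a Boolean search statement.

    Remaining words are treated as concepts and ANDed together;
    known synonyms (query variations) are ORed within a concept.
    """
    synonyms = synonyms or {}
    concepts = []
    for word in question.lower().replace("?", " ").split():
        if word in STOP_WORDS:
            continue
        variants = [word] + synonyms.get(word, [])
        if len(variants) > 1:
            concepts.append("(" + " OR ".join(variants) + ")")
        else:
            concepts.append(word)
    return " AND ".join(concepts)


question = "What is the effect of caffeine on sleep?"
print(question_to_query(question))
# prints: effect AND caffeine AND sleep
print(question_to_query(question, {"sleep": ["insomnia"]}))
# prints: effect AND caffeine AND (sleep OR insomnia)
```

The second call shows one "variation": after evaluating the first response, feedback from the user might suggest adding a synonym and reiterating the search.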
Role of searchers - you
• Not only to do the searching, but also (or in addition) to assist, lead, instruct, help a user in
– defining, specifying the problem, task at hand
• particularly in terms of informational aspects & resources
– articulating the information need – diagnosis
• guiding it from a possibly visceral need to an expressed, searchable one
– formulating of question(s) – clarifying, defining concepts
– translating into query(ies); choosing variations
– evaluating responses; eliciting feedback to steer reiteration
– guiding toward further action, resources on their own
There is much more to searching than searching
Human-human inf. interaction
General
Communication between or joint activity involving two or more people with a goal of obtaining or exchanging information
– reciprocal action
– goal directed
• Most people still get most information from other people
Context of searching
User-searcher interaction
– part of mediated searching
• searchers acting on behalf of other people or institutions
• On part of searcher involves user modeling: determining users’ inf. needs & requirements, & their characteristics as related to effective searches
– predated by the reference interview
Human-computer interaction (HCI)
General
HCI is the study & practice of interaction between people (users) and computers
– relationship between humans and computers
HCI is concerned with the design, evaluation & implementation of interactive computing systems for human use and with the study of major phenomena surrounding them. (Association for Computing Machinery, Special Interest Group on Computer-Human Interaction (SIGCHI))
Context of searching
Study & practice of using computers, particularly interfaces, in searching for & retrieval of information
– often concentrates on using particular information systems, interfaces & algorithms
– evaluation of the effectiveness & efficiency of interactions – algorithms, systems, interfaces
Mediated searching
• Of interest in librarianship for over a century
– reference became a major component of library practice
• With the advent of information & communication technology, mediated online searching became a major professional & research activity
– even mainstream of many information centers
– publications, conferences, inf. industry oriented to it
• Searching, meaning mediated searching, became a big deal
– well, we have been teaching it for decades
Information industry
• Starting in the early 1960s an information industry developed dealing with computerized abstracting & indexing services & databases available for searching
– earliest ones were government sponsored (e.g. Medline), then transformed within professional organizations (e.g. Chemical Abstracts), then private industry (e.g. Dialog)
• By the 1970s the inf. industry became strong & global
• Most databases & services from the inf. industry were oriented toward professional searchers
– who then offered searching to users in their institutions, companies, public – mediated searching
Changes due to search engines
• But search engines have radically changed the way people search for information
– mediated searching, including reference questions, has declined drastically over the last decade
– users became end users – searching for themselves
– end user searching of search engines exploded globally
• Reference questions have drastically fallen off
– between 1995 & 2006 reference transactions declined 54% in ARL libraries (source: Assoc. Research Libraries statistics)
• Mediated searching followed – done much less than a decade ago
General changes in library use
• Libraries have added a great many digital resources
– including digital resources & databases for end user searching
• As a result today's users have changed their use of libraries
– virtual use is skyrocketing while physical use is plummeting
• users don’t vote anymore with their feet but with their fingers
– electronic transactions are growing rapidly
– physically users are not in the library but library use is going up & up & up (again see ARL statistics)
• We do not have statistics on how many searches are done on databases available in libraries, but it must be a LOT!
“Many years ago, the esteemed Barbara Quint offered an estimate that Google answered as many reference queries in half an hour as all the reference librarians in the world did in 7 years.”
Abram, S. (2008), Searcher, 16(8)
I have no idea of the source of the statistics, or if they are right at all, but it seems OK
“While they [users] may be absent they are not inactive. Networked electronic resources via library portals and the Internet have provided users with benefits that go far beyond anything available when physical use was the only alternative.”
Martell, C. (2008), The Journal of Academic Librarianship, 34(5)
Oh, well…
Web & changes in inf. industry
• Web changed architecture & orientation of many databases & changed inf. industry in a big, big way
– old databases restructured significantly e.g. Web of Science
– new databases emerged - some very large e.g. Scopus
– aggregators or publishers of journals became databases for searching – e.g. EBSCOhost, Wilson
– they went with great gusto after end users
• and with it after a much bigger & different market
• Now libraries & inf. centers buy time-based licenses from databases for access by their users
• e.g. RUL provides access to close to 300 databases in every field
Changes for searchers - you
• Searchers are now also involved with licenses, library Web systems, & access provision, plus:
• New orientation & services emerged & are still being developed, refined (as already mentioned in previous lecture):
knowledge navigation - supporting the user in locating and retrieving relevant information in the global information environment
cooperative searching – with users & projects
source recommendation – acting as recommenders
source evaluation – assessing value, quality & suitability
impact investigation – search for evaluative data of use in assessing outputs & impacts of research, institutions, researchers …
user assistance and training - incl. information literacy
But no matter what you still have to master searching
Antecedents
• Europe before WWII:
– strong documentation movement
– Universal Decimal Classification, indexing of scientific literature, utilitarian integration of technology & technique toward social goals
• In the US right after WWII concern about information explosion, particularly in science
– Vannevar Bush’s classic article “As We May Think” in Atlantic Monthly in 1945 stirred imagination & funding
– problem: “the massive task of making more accessible a bewildering store of knowledge.”
– solution: use of new technology, suggested a machine named “Memex” as idealized model
• Technological imperative became a norm for solving inf. explosion problems – followed to this day
Beginnings
• National Science Foundation (NSF) Act of 1950 (& later amendments) mandated support for scientific & technical information (STI) for effective use
– from the start in the 1950s to this day NSF supports research & development in this area, including digital libraries
• now through Division of Information & Intelligent Systems (IIS)
– sparked involvement of many fields; many projects were funded
• Other government agencies got involved
– e.g. National Institutes of Health in supporting mechanization of the National Library of Medicine to Medline & now MedlinePlus
• Other governments, first in Europe, USSR, and later globally started supporting similar activities
Information as strategic resource
• Key idea in providing support for STI activities from the end of Second World War to this date:
– effective dissemination of information considered of strategic value for progress in science & technology
• Spread to all other fields & human endeavors
• Bedrock of information industry
• Searching fits right in there:
– affected importance & increase of online searching as a professional activity
– affected spread of searching to wide populace
Idea of information as strategic resource
Affected evolution of information age
– global economy's shift in focus away from the production of physical goods (as exemplified by the industrial age) and towards the manipulation of information
And information society
– in which the creation, distribution, diffusion, use, integration and manipulation of information is a significant economic, political, and cultural activity
Information science
Information retrieval
• In 1951 Calvin Mooers coined the term “information retrieval” (IR) to label a burgeoning activity
– by the mid 1950s computerized IR systems emerged & later proliferated fast in many fields, even outside of science, & globally
– among others, their searching became a professional activity
• Societies and conferences related to problems of IR and broader issues of information science proliferated globally
– e.g. the very influential 1958 International Conference on Scientific Information (with really great Proceedings)
IR research
• From the 1960s onwards Gerard Salton & his students in computer science pioneered research into advanced IR methods
– addressed technical or system side of IR
– great many good results over decades
• but it took decades before results were applied commercially
• today all vendors & search engines use them
– IR research continues to this day internationally
• particularly under TREC (Text REtrieval Conference)
• and reported by Special Interest Group on IR (SIGIR)
• Research and IR are still closely connected
– source of advances, but now also proprietary
Research (cont.)
• 1970s & 80s also saw emergence of research dealing with the human (user) side of IR
– addressed users, use of information & IR systems
– basic notions, such as relevance
• From the 1990s to the present, growing research in areas:
– interaction in IR, or human-computer interaction
– human information behavior (Wilson, 2000)
– information seeking & searching (Bates, 2002)
• Human and system side of research do not mesh well
– still & unfortunately
Onto the real world
• 1960s saw computer applications for IR blossoming
– also library automation emerged, incl. MARC (go to RUL then ERIC to retrieve the report)
• 1971: Medline, the online version of MEDLARS (National Library of Medicine), came out
• this was online way before the Internet & the Web through commercial time-sharing networks, such as Tymnet & Telenet
• Professional searching became firmly established
– grew at high rate
– most access for users was through mediated searching
– but end user searching grew slowly
Onto the real world (cont.)
• Early 1970s: Dialog and ORBIT established – large commercial online vendors
• Dialog after a number of changes in owners is still in business; ORBIT later merged with other vendors & disappeared
– they provided online access to an ever growing number of databases – became information supermarkets
– later joined by a number of other, more specialized vendors
• e.g. LexisNexis, STN, EBSCOhost, CSA, etc.
• or new giants, such as Scopus (already mentioned; link is to an overview)
• Magazines, such as Information Today & Searcher, dutifully record & comment on what is going on in the information industry & the profession
The Net
• Internet first went live in 1969 as ARPANET, an inter-university net
– in 1983 the TCP/IP protocol was adopted, free & still in use globally today – i.e. the present Internet was born
– in 1986 NSFnet was created, broadening reach significantly
– in 1995 NSF pulled out & opened the network to broad public & commercial use
• Internet infrastructure is now provided commercially
• By the 1980s it became a force
– by the 1990s it took the world
• Internet has a colorful history (from the Internet Society)
– timeline shows rapid growth & development
WWW
• In 1991 Tim Berners-Lee invented the World Wide Web – a hypermedia initiative for global information sharing
– in 1993 the first widely used graphical Web browser, Mosaic, was developed by Marc Andreessen – later to become Netscape
• it popularized the Web
• WWW became the fastest growing & spreading technology in history
• Search engines
– Yahoo launched in 1994 & Google in 1998
– affected searching enormously
– today over 3000 search engines in over 150 countries
• but a few large ones dominate in every market e.g. Baidu in China
Digital libraries
• Emerged in mid 1990s
• Since then involved
– massive research & development programs
• e.g. the National Science Digital Library (NSDL)
– massive investments by libraries
• changed the library landscape
• particularly as to access & searching
• for most libraries digital library portions of budget skyrocketed
• Brought together IR & libraries
• Today vast international presence
– many institutions in addition to libraries involved
• e.g. museums, societies, professional organizations
Digital libraries and searching
• Major resource for searchers
– large variety of texts, images, sounds digitized all over the world
• rich source of many (and many unusual) resources not found through databases or the Web
• At the same time major headache for searching
– search mechanisms not well developed & integrated
– federated searching (covering multiple databases at once) still in infancy
– e.g. RUL has close to 300 databases (see under Research resources – Indexes & databases) yet almost all have to be searched individually
– at RUL federated searching through Searchlight can be done on 8 databases only
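To make the idea of federated searching concrete, here is a minimal sketch, assuming each database can be reduced to a list of titles held in memory. The database names, the titles, and the functions `search_one` and `federated_search` are all hypothetical; this is not how Searchlight or any particular product works, only an illustration of sending one query to several sources and merging the results.

```python
# Hypothetical stand-ins for separate databases (name -> list of titles).
DATABASES = {
    "db_a": ["Relevance in information science", "Searching the Web"],
    "db_b": ["Searching the Web", "Digital libraries and searching"],
    "db_c": ["History of library automation"],
}


def search_one(titles, term):
    """Search a single database: case-insensitive substring match on titles."""
    return [title for title in titles if term.lower() in title.lower()]


def federated_search(databases, term):
    """Run the same query against every database and merge the results.

    Duplicate titles are merged; each title keeps the list of
    databases in which it was found.
    """
    merged = {}
    for name, titles in databases.items():
        for title in search_one(titles, term):
            merged.setdefault(title, []).append(name)
    return merged


for title, sources in federated_search(DATABASES, "searching").items():
    print(title, sources)
# prints:
# Searching the Web ['db_a', 'db_b']
# Digital libraries and searching ['db_b']
```

In practice the hard parts are exactly what the sketch leaves out: each database has its own query syntax, fields, and ranking, which is one reason most still have to be searched individually.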
New world for searching
• Everybody is a searcher now
– searching is a mass sport
• whoever has a computer or other communication devices also searches
– however few do it well
– even fewer can assess how well they are doing
• horror stories abound
• Search engines are constantly enlarging & refining their reach, coverage, specialization (e.g. Google Scholar)
• But still the major flaw: Web is value neutral
• diamonds & rubbish, true & untrue, good & evil are all equal
Opening for searchers - YOU (and libraries & information centers)
• New opportunities & challenges
• They are providing value added services
– and could do so even more
• Connecting in different ways with users
• Their basic worth:
TRUST – that is where ethics play a major role
PROFESSIONAL COMPETENCE – that is where your lifelong education plays a major role
• This whole course is just a beginning