Secrets of the Invisible Web Using the Web Effectively in Research and Fact-Checking Prof. Janice Castro, The Medill School of Journalism

Searching The Invisible Web

How To Boldly Go Where Google Never Goes

And Get It Right!

© Janice Castro

The World Wide Web has brought us information wealth

It seems to have just about everything

Instead of just reading our local paper, we can read papers from Israel, Jordan, India, our hometown and everywhere else. http://www.newseum.org/todaysfrontpages/

If we miss the old home team, we can follow them on the Web. http://www.jsonline.com/packer/insider/flash.asp

Instead of asking around, we can find it through search.

When we get a new prescription, we can look up possible side effects.

We can search for new restaurants, or find out when the movie starts.

We can find our destination on a map, then print out driving directions.

When someone’s in the news, we can search out all kinds of background information, or review the history of a series of events.

Anytime we’re doing research, the world is at our fingertips.

However …

In fact, we are searching only the surface of the Web.

There is a certain random feel to what we find there.

Results include all sorts of irrelevant items that simply happen to include the sequence of letters we searched.

While we may enjoy serendipitous discovery …

… too often searching the Web is like rummaging around on an untidy desk.

Interesting, maybe, but where is the primary source information?

Much of the information you would get from a good librarian never turns up in your searches.

Search engines are not objective filters.

Search engines sometimes manipulate results for pay.

Selling key words (auctions start at a penny a word).

Ad terms.

Selling placement in the listed results.

Google “AdSense” – pay per click.

Example search: roosevelt hotel seattle

Search engines follow the crowd.

If many clicks are coming to a page, it will rise in the search rankings.

The latest joke becomes more important than the original document.

Social networking as search: http://www.eurekster.com/

Find the stuff your friends would want to see.

Search engines are utilities, not professionals.

Even when they don’t compromise their findings, sometimes they can’t afford to update their results.

There’s a reason why they call it “surfing.”

The Web is not a library.

A vast retrieval system.

A network of networks.

It is not the content, but the delivery system.

More like AT&T than the Library of Congress.

Hundreds of billions of pages that can link to one another.

Infinite variety in terms of quality, origin, authorship.

No master plan for organizing it.

All sorts of things can be found on the Web …

On a San Francisco site, we found a 1972 article by a little-known reporter for Nashville’s Tennessean about a strange commune of several hundred former residents of Haight-Ashbury in San Francisco who had set up housekeeping in a rural Tennessee county.

“SUMMERTOWN, Tenn. A barn decorated with oriental rugs and bleachers made of straw has become the unlikely meeting ground between members of Stephen Gaskin's commune and a Church of Christ congregation. Every Sunday afternoon and every Monday night for the past four weeks, the minister of Sandy Hook Church of Christ has come with 30 to 40 members of his congregation and several other ministers to share "the word of God" with Gaskin and the estimated 450 members of his commune who live on a farm near here …”

The author: Al Gore.

You never know what you’re going to find…

And that’s the problem.

Because search engines are so easy to use, we tend to forget what they are sorting and what they may be missing.

This is delightful for browsers, but frustrating for researchers.

The Web is a route to most of the best sources of information.

Rest assured that most of the high-quality data you seek can be found online …

… but they cannot always be located via search engines.

This is the surface Web

Made up of what we scan with search engines (like dragging a net across the ocean surface).

Composed largely of static HTML pages linked to one another.

We browse via links; so do the search engines.

Like us, the search engines come up with different results.

We think that search engines are searching it all.

We’re wrong.

There’s much more.

Secrets of the invisible (deep) Web

Rich content databases.

Information from libraries, universities, governments, government agencies, businesses, trading houses, associations.

Dynamic pages generated on demand by databases typically are invisible to search engines.
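As a rough illustration of why a link-following spider misses pages that a database generates on demand, here is a minimal sketch in Python. Everything in it is hypothetical and not from the talk: the toy site map, the page names, and the `crawl` helper are assumptions made for the example. It treats a spider as something that can only reach a page if another page links to it.

```python
from collections import deque

# Hypothetical toy site: each static page lists the pages it links to.
STATIC_LINKS = {
    "/index.html": ["/about.html", "/search.html"],
    "/about.html": ["/index.html"],
    "/search.html": [],  # holds a search form, but no links to any results
}

# Hypothetical pages that exist only after someone types a query into the form.
DYNAMIC_PAGES = {
    "/results?q=census": "Population tables",
    "/results?q=patents": "Patent filings",
}


def crawl(start):
    """Follow links breadth-first, the way a simple spider would."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in STATIC_LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


indexed = crawl("/index.html")
missed = sorted(set(DYNAMIC_PAGES) - indexed)

print("Indexed by the spider:", sorted(indexed))
print("Never seen by the spider:", missed)
```

In this toy model the spider reaches the search form but never the results behind it, because no link points to them; a person who types a query into the form gets there at once.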

Deep Web versus Surface Web

Surface Web: Mainly static HTML pages, not always up to date.

Deep Web: Often dynamic pages generated on request.

Surface Web: Often secondary or tertiary treatments of data.

Deep Web: More likely original sources and research.

How deep is this ocean?

Expert estimates of its size vary: from 50 to more than 500 times as large as the Surface Web.

Estimate: some 200,000 information-rich Deep Web sites.

One estimate: just 60 Deep Web sites contain 40 times the total information contained in the Surface Web.

There is not a perfect separation.

Storms of public interest often stir some Deep content to the surface, before it sinks again.

What Makes These Sites Invisible?

? … Spiders run away from question marks.

Unwelcome spiders sometimes fall into spider traps.

Interactivity: Typing or judgment required to proceed.

Privacy: “Keep Out” signs, robots.txt, “No Index” metatags.

Search engines exclude the media or format.
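Three of the barriers above (question marks in the URL, robots.txt, and “No Index” metatags) can be checked mechanically. The sketch below uses only Python's standard library to show what such a check might look like. The site example.org, its robots.txt rules, the spider name `ExampleSpider`, and the `reasons_to_skip` helper are all assumptions invented for illustration; real search engines apply their own, more elaborate rules, and the question-mark rule simply follows the slide's claim.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site, example.org.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


class NoIndexFinder(HTMLParser):
    """Sets .noindex when a <meta name="robots"> tag asks not to be indexed."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        fields = dict(attrs)
        if tag == "meta" and (fields.get("name") or "").lower() == "robots":
            if "noindex" in (fields.get("content") or "").lower():
                self.noindex = True


def reasons_to_skip(url, html=""):
    """Return the reasons a cautious spider might leave this page out."""
    reasons = []

    # 1. A question mark usually means the page is generated from a query.
    if urlparse(url).query:
        reasons.append("query string in URL")

    # 2. robots.txt is the site owner's "Keep Out" sign.
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT.splitlines())
    if not robots.can_fetch("ExampleSpider", url):
        reasons.append("blocked by robots.txt")

    # 3. A "noindex" meta tag asks engines not to list the page.
    finder = NoIndexFinder()
    finder.feed(html)
    if finder.noindex:
        reasons.append("noindex meta tag")

    return reasons


print(reasons_to_skip("http://example.org/results?q=census"))
print(reasons_to_skip("http://example.org/private/report.html"))
print(reasons_to_skip("http://example.org/page.html",
                      '<meta name="robots" content="noindex, nofollow">'))
print(reasons_to_skip("http://example.org/index.html"))
```

An empty list means none of these three barriers applies; the page could still be left out for the other reasons on this slide, such as requiring typed input or using an excluded format.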

Material That Is Left Out

Formats: most PDF, Excel, PowerPoint, WordPerfect.

Media types: audio, video, Flash, Shockwave.

Any content that requires passwords for access.

Streaming media: financial, news, weather, sports.

Search engine does not optimize for it.

Deep Web Contents Include: Major Databases

The Library of Congress (http://www.loc.gov)

Thomas: everything you ever wanted to know about what’s going on in Congress, status of legislation, etc.: http://thomas.loc.gov/

Gateway to federal sites: http://www.firstgov.gov/

The federal reference center: (http://www.firstgov.gov/Topics/Reference_Shelf.shtml#statistics)

NASA (http://www.nasa.gov/home/index.html)

FedStats (http://www.fedstats.gov)

The U.S. Census Bureau: http://www.census.gov/

Specialized portals and librarians’ gifts

PubMed (http://www.ncbi.nih.gov/entrez/query.fcgi)

Librarians’ Index (http://www.lii.org) - more than 11,000 sites.

How Stuff Works: http://www.howstuffworks.com/

The List of Lists (http://www.specialissues.com/lol/) - archive collections, library catalogues, periodical databases and more.

The S.E.C.: http://www.sec.gov/

U.S. Patent and Trademark Office: http://www.uspto.gov/

Infomine: http://infomine.ucr.edu/

Want to try it out?

Thank you!

Prof. Janice Castro