Exploring the Deep Web

Peter L. Kraus

J. Willard Marriott Library – University of Utah

What is the Deep Web?

The Deep Web is the hidden part of the Web: a huge volume of content that conventional search engines cannot reach and that is, consequently, inaccessible to most users.

How big is the Deep Web?

• 550 billion documents

• 500 times the content of the surface Web

• Google has identified 1.2 billion documents

• An Internet search typically covers only about 0.03% (roughly 1/3000) of available content.
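The size figures above can be cross-checked with a little arithmetic. The sketch below assumes that every figure counts documents and that all of them come from the same estimate; the slide does not say so, and the variable names are illustrative only.

```python
# Cross-check of the slide's size figures.
# Assumption: all figures are document counts from one consistent estimate.
DEEP_WEB_DOCS = 550e9        # "550 billion documents"
DEEP_TO_SURFACE_RATIO = 500  # "500 times the content of the surface Web"

# Surface Web size implied by the two figures above
implied_surface_docs = DEEP_WEB_DOCS / DEEP_TO_SURFACE_RATIO

# The "1/3000" share expressed as a percentage
share_from_fraction = 100 / 3000

# The "0.03%" share expressed as "1 in N"
fraction_from_percent = 1 / (0.03 / 100)

print(f"Implied surface Web: {implied_surface_docs / 1e9:.1f} billion documents")
print(f"1/3000 = {share_from_fraction:.4f}%")
print(f"0.03% = 1 in {fraction_from_percent:,.0f}")
```

Dividing 550 billion by 500 implies a surface Web of about 1.1 billion documents, close to the 1.2 billion Google figure, and 1/3000 works out to about 0.033%, so the two ways of stating the search share agree to rounding.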

What’s in the Deep Web?

• Searchable databases

• Downloadable files & spreadsheets

• Image and multi-media files

• Data sets

• Various file formats such as .pdf

• Lots of government information

Why use the Deep Web?

• Higher-quality sources: selected and organized by subject experts

• Dynamic display

• Customized data sets

• Some data is visual and therefore not word-searchable

• Regular search engines miss vast resources available in the Deep Web

Why are we talking about government sites in the Deep Web?

• Governments have the mandate and the capacity to gather information that individuals cannot

• Most government information is copyright-free

• Government information is authoritative

• Governments have the financial and human resources to maintain Deep Web sites

The Web Today

• Federal government web sites occupy only about 1% of the global Web, yet they hold 85% of the Deep Web.

• These sites contain a wide diversity of files in either .html or .pdf format (reports, records, data sets, etc.), with little standardization or uniformity. The common term for this content is “grey literature”.

Definition of “Grey Literature”

• “That which is produced on all levels of government, academics, business and industry in print and electronic formats, but which is not controlled by commercial publishers”

Growth and Life of Federal Information

• On federal web sites, the amount of information grew 13-fold between 1992 and 2003

• The average life expectancy of a federal web resource was 4 months (2003)
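A 13-fold increase between 1992 and 2003 can be restated as an average annual growth rate. This is a sketch under the assumption of steady compound growth over 11 years; the slide gives only the two endpoints, so the real year-to-year pattern is unknown.

```python
import math

GROWTH_FACTOR = 13                 # "grew 13-fold"
START_YEAR, END_YEAR = 1992, 2003  # endpoints given on the slide

years = END_YEAR - START_YEAR  # 11 years

# Compound annual growth rate: factor ** (1 / years) - 1
# (assumes the same growth rate every year)
annual_rate = GROWTH_FACTOR ** (1 / years) - 1

# Years needed to double at that constant rate
doubling_years = math.log(2) / math.log(1 + annual_rate)

print(f"Average annual growth: {annual_rate:.1%}")
print(f"Doubling time: {doubling_years:.1f} years")
```

Under that assumption, the volume of federal web information grew by roughly 26% a year, doubling about every three years.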

What can libraries do?

• LOCKSS-DOCS, an archival project (BYU and UU are members)

• Cooperative efforts in specific subject areas (Western Waters Digital Library)

• Individual institutional initiatives, such as institutional repositories that reflect an institution's research productivity (much of it funded by federal grants)

Finding Naked People - Forsyth, Fleck (1996) (54 citations)

This paper demonstrates an automatic system for telling whether there are naked people present in an image. The approach combines color and texture properties to obtain a mask for skin regions, which is shown to be effective for a wide range of shades and colors of skin.

http.cs.berkeley.edu/~daf/newo2.ps.Z

Graph showing number of citations to “Finding Naked People”

Arches National Park: NASA Landsat 7, 10/3/99

searching for "University of Utah"

displaying records 1 - 25 of a total of 27

next 25 last 25

Development and Evaluation of Stitched Sandwich Panels
Larry E. Stanley; Daniel O. Adams
NASA Langley Research Center
NASA/CR-2001-211025, June 2001; 20010702

… test panels were produced initially at the University of Utah and later at NASA Langley Research Center …

http://techreports.larc.nasa.gov/ltrs/PDF/2001/cr/NASA-2001-cr211025.pdf
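The results screen above shows why such records belong to the Deep Web: they exist only as the answer to a query, delivered one page at a time ("records 1 - 25 of a total of 27"). A minimal sketch of that offset-style paging, assuming the page size of 25 shown on screen; `page_ranges` is a hypothetical helper and says nothing about how the NASA server is actually implemented.

```python
def page_ranges(total: int, page_size: int) -> list[tuple[int, int]]:
    """Return 1-based (first, last) record numbers for each result page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Each page starts page_size records after the previous one;
    # the last page is cut off at the total number of records.
    return [
        (first, min(first + page_size - 1, total))
        for first in range(1, total + 1, page_size)
    ]


# Reproduce the paging of the example query (27 records, 25 per page)
for first, last in page_ranges(total=27, page_size=25):
    print(f"displaying records {first} - {last} of a total of 27")
```

Each page is generated on request from the query, so there is no fixed page for a crawler to find unless someone links to it.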

Marriott Library, Salt Lake City, Utah, United States 9/18/2003 (TerraServer)

Utah Seismic Hazards (National Atlas)

International Deep Web Resources

• International organizations collect an amazing amount of data

• Statistical data is often best organized in database and spreadsheet formats

• Like the US Government, individual countries post data files and databases

• This information may not be available in print sources in schools and libraries

United Nations Official Documents System

• http://documents.un.org/

Why use the ODS?

• Full-text official United Nations documents (1993 to present), online and free

• Retrospective digitization in progress

• Highly relevant material for almost any international topic

• Timely and authoritative

United Nations Statistical Databases

• Value of the information:
  – Authoritative
  – Comparative
  – Time series
  – Compact

• Database topics include:
  – Commodity trade
  – Demographics
  – Disability statistics
  – Social indicators
  – Statistics on men and women

http://unstats.un.org/unsd/databases.htm

Individual Country Statistics

• http://www.census.gov/main/www/stat_int.html

Why use this kind of information?

• Aggregate statistical sources are often not as up to date as the individual countries' own data

• Individual countries are often more specific in their indicators than aggregate sources

• Information in databases, spreadsheets, and downloadable files is usually NOT searchable by web crawlers
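The point about web crawlers can be illustrated with a toy model: a crawler discovers pages only by following hyperlinks, so content that exists only as the response to a query form is never reached. A minimal sketch, assuming a tiny hypothetical site (all page names are invented for illustration); real crawlers are far more elaborate.

```python
from collections import deque

# Hypothetical site: each page lists the pages it links to.
# "search.html" holds a query form; result pages are generated only when a
# user submits a query, so no static page links to them.
SITE_LINKS = {
    "index.html": ["about.html", "search.html"],
    "about.html": ["index.html"],
    "search.html": [],  # the form submits a query; no plain links to results
}

# Pages that exist only as responses to form queries (hypothetical)
QUERY_RESULT_PAGES = {"results?q=utah", "results?q=seismic"}


def crawl(start: str) -> set[str]:
    """Breadth-first link crawl: visits only pages reachable by hyperlinks."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in SITE_LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


indexed = crawl("index.html")
print("Indexed:", sorted(indexed))
print("Missed (Deep Web):", sorted(QUERY_RESULT_PAGES - indexed))
```

The crawler indexes the search form itself but none of the result pages behind it, which is the gap the slides describe.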

For Further Information

• Marriott Library, University of Utah

801-581-8394

www.lib.utah.edu/documents

[email protected]