Hidden Universes of Information on the Internet

Rev. 05/2015 Copyright © Russ Haynal

Russ Haynal

Internet Instructor, Speaker, and Paradigm Shaker

http://navigators.com [email protected] 703-729-1757

Note: If you send me an email, put “internet training” in the e-mail's subject

abyznewslinks.com

Ensure the Internet is an asset, not a liability for your organization

Page 2

Course Outline

• Introduction to Internet Architecture

• Preparing for a search

• “Persona” issues

• Search Tools - In Depth

• Advanced Features

• Specialized Resources

• Source Evaluation

• Review / Summary

specific_page.html

Online Web page = http://navigators.com/opensource.html

Page 3

Disclaimer

• This session illustrates a wide variety of search tools, techniques and research methods

• Consult your organization’s policies to verify if these methods are approved for your types of Internet connections

Page 4

Internet Definition

Internet represents a once per thousand year event

Last such event = Gutenberg printing press

Are You Literate in Today’s Online World?

“A large collection of Inter-connected networks and computers”

“A new fundamental form of communication that will absorb other communication channels”

Page 5

Internet’s Growth

stats.html

Page 6

Number of hosts in each Domain

Top Level Domains

Domain   Hosts
net      366,592,151
com      163,634,309
edu       12,251,571
mil        2,591,408
gov        2,304,501
org        2,119,538
jp        74,461,142
de        34,904,481
br        33,691,951
it        26,136,473
cn        19,976,554
mx        17,658,991
fr        17,437,386
au        16,900,586
ru        15,122,103
nl        14,011,944
pl        13,535,863
ar        13,335,042
ca         9,004,861
uk         8,116,718
in         7,429,638
tr         7,146,979
tw         6,429,021
se         6,214,373
be         5,380,902
ch         5,241,511
co         4,721,748
fi         4,572,642
es         4,147,699
pt         4,003,039
cz         3,895,833
th         3,674,102
at         3,646,960
gr, za, no, hu, nz, ro, dk, il, ua (listed without counts)
us         2,087,768

Source: www.isc.org as of July 2013

stats.html

Page 7

Example Backbone Maps

[Maps: example backbone networks for Level 3, AT&T, Sprint, C&W, and Verizon]

isp.html

Page 8

Backbones Connecting

For a complete picture, initiate traceroutes from within several different backbones

[Diagram of interconnected networks. Elements: regional ISP #1, regional ISP #2, backbone ISP-A, backbone ISP-B, hosting data center, large organization (Enterprise LAN/WAN). Legend: Backbone ISP, Regional ISP, Exchange Point, Client, Server, Private Peering]

traceroute.html

Page 9

Exchange Point Traffic

• Notice the daily fluctuations - Analysts may want to “schedule” their research

• Traffic continues to grow rapidly in many locations

isp.html

Source: http://www.hkix.net

Page 10

How Does it Work?

• Internet started as “Packet Switching Networks” using TCP/IP (Transmission Control Protocol - Internet Protocol)

• Every Internet connection has a unique IP Address consisting of 4 numbers, each number has a range of 0-255 (e.g. 198.211.16.134)

• IP numbers are allocated through a hierarchy

– IANA → ARIN / RIPE / APNIC / LACNIC / AFRINIC → ISP / company / country

• Routers direct your packets of information along the “preferred” path

[Diagram: a mesh of interconnected routers]

Note: The next version of IP address space (IPv6) is LARGE

– 3,911,873,538,269,506,102 IP #’s per square meter of the Earth's surface

– 4,500,000,000,000,000 IP #’s for every observable star in the universe
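The "4 numbers, each 0-255" rule and the size of the two address spaces can be checked with a short sketch. It uses Python's standard `ipaddress` module; the helper name `octets` is an illustrative assumption, not part of the course material.

```python
import ipaddress

def octets(address):
    """Split a dotted IPv4 address into its four numbers (each 0-255)."""
    # IPv4Address() raises a ValueError subclass if any number is outside
    # 0-255 or if the address does not have exactly four parts.
    parsed = ipaddress.IPv4Address(address)
    return [int(part) for part in str(parsed).split(".")]

# The example address from the slide.
example = octets("198.211.16.134")

# Size of each address space: 32 bits for IPv4, 128 bits for IPv6.
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128

print(example)
print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:,}")
```

The per-square-meter and per-star figures quoted on the slide depend on assumptions about surface area, star counts, and allocation efficiency, so this sketch only reports the raw totals.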

traceroute.html

Page 11

Domain Name System

• The Domain Name System (DNS) associates alpha-numeric names with IP addresses

• Names are registered with commercial registrars such as Go Daddy or country-specific registrars

• DNS Servers are distributed throughout the Internet - They act as a set of inter-linked phone books

• You enter “www.navigators.com”, DNS servers match it to “198.171.173.51”

• Historical meaning for domain names

– .com = commercial, .net = Internet Provider, .org = non-profit

– .uk = United Kingdom, .pk = Pakistan, .ru = Russia

• Reality…. Many country domain names are for sale to ANYONE from ANYWHERE
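The "phone book" lookup described above can be sketched in a few lines. This is a minimal illustration using Python's standard `socket` module; the function names `resolve` and `tld` are assumptions made for the example, and the address a live lookup returns today may differ from the one printed on the slide.

```python
import socket

def resolve(hostname, lookup=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if the lookup fails.

    `lookup` defaults to the operating system's resolver; a stand-in can
    be passed instead so the function can be tried without a network.
    """
    try:
        return lookup(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None

def tld(hostname):
    """Return the top-level domain, e.g. 'com' for www.navigators.com."""
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()

if __name__ == "__main__":
    name = "www.navigators.com"
    print(name, "->", resolve(name), "(top-level domain:", tld(name) + ")")
```

As the last bullet warns, the top-level domain returned by `tld` says nothing reliable about where the registrant is actually located.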

domain_name.html

Page 12

Web Server / Web Site

Web site = the content

Web server = computer with server software and reliable Internet connection

Web pages = htm, html

Graphics = gif, jpg

Other files = pdf, ppt, doc, txt, exe, zip

Page 13

A more complex environment

• Internet users interact with web server

• Web server query is passed along to database

• The content of the database is only displayed TEMPORARILY in a web page that is created in response to USER-actions.

• Most database content is unreachable by search engines

[Diagram labels: User Browser, typed form, Web server, Application server, page data, Online Hosting]
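A minimal sketch of why database content stays out of reach of search engines: the page below exists only as the return value of a function that runs when a user submits a term. The table, its rows, and the function names are hypothetical sample data, and an in-memory SQLite database stands in for whatever database a real site would use.

```python
import sqlite3
from html import escape

def build_database():
    """Create a small in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (name TEXT, description TEXT)")
    conn.executemany(
        "INSERT INTO parts VALUES (?, ?)",
        [
            ("gadget", "A small mechanical device"),
            ("widget", "A generic manufactured part"),
        ],
    )
    return conn

def results_page(conn, typed_term):
    """Build the temporary HTML page returned for one typed query."""
    rows = conn.execute(
        "SELECT name, description FROM parts WHERE name LIKE ?",
        (f"%{typed_term}%",),
    ).fetchall()
    items = "".join(
        f"<li>{escape(name)}: {escape(desc)}</li>" for name, desc in rows
    )
    return f"<html><body><ul>{items}</ul></body></html>"

conn = build_database()
page = results_page(conn, "gadget")
print(page)
```

No link anywhere points at this page, so a robot that only follows hyperlinks never triggers the query and never sees the rows.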

Page 14

Accessing a Web Page

[Diagram labels: company.com, Sales, gadget.html, logo.gif]

1. Browser requests URL: http://www.company.com/sales/gadget.html

2. Connect to web server

3. Server sends gadget.html from its sales directory

4. Browser displays gadget.html, requests graphics, and eventually terminates connection to the server

5. Background communications: graphics, cookies, etc.

“Document not found”? - Try shortening the URL!
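The "try shortening the URL" tip can be sketched as a small helper that removes one path segment at a time. The function name `shortened_urls` is an assumption for illustration; the parsing uses Python's standard `urllib.parse`.

```python
from urllib.parse import urlsplit, urlunsplit

def shortened_urls(url):
    """Return progressively shorter versions of url, longest first."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    while segments:
        segments.pop()  # drop the last file or directory name
        path = "/" + "/".join(segments)
        if segments:
            path += "/"  # keep the trailing slash on directory URLs
        candidates.append(
            urlunsplit((parts.scheme, parts.netloc, path, "", ""))
        )
    return candidates

for candidate in shortened_urls("http://www.company.com/sales/gadget.html"):
    print(candidate)
```

Each candidate is a place to retry by hand when the full address returns "Document not found".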

Page 15

Course Outline

• Introduction to Internet Architecture

• Preparing for a search

• “Persona” issues

• Search Tools - In Depth

• Advanced Features

• Specialized Resources

• Source Evaluation

• Review / Summary

specific_page.html

Page 16

Introduction to “Persona”

• While viewing a web page (URL1), you click on a hyperlink to another web page (URL2)

• Your web browser sends “environment variables” to the web server

• Webmasters use this information to determine information about you and your organization (physical location, your interests, software)

[Diagram labels: Analyst, Internet Access, URL1, URL2, Web Server, Access logs, Reports, Webmaster]

As you surf the Internet, you give-off a certain persona

You should always know what websites know about you

persona.html

Page 17

Persona Details

• Your persona is communicated to every web server that you visit

• You should understand your persona before you visit any website. For example, should you visit:

– badguy.com from agency.gov?

Your persona is communicated via “environment variables” such as:

• REMOTE_HOST = This is the name associated with your IP Number

• REMOTE_ADDR = This is the IP number of your computer, or proxy. A webmaster could do a traceroute to see how you are connected

• HTTP_REFERER = This is the URL of the page you were previously viewing. Be careful on how you create web pages. For example, do you want to reveal the following?:

– http://badguy.com is listed on http://intranet.agency.gov/joe_smith/investigation_targets.html?

• Persona details may also be transmitted via JavaScript (e.g. ga.js), Java applets, and Adobe Flash
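To make the webmaster's view concrete, here is a minimal sketch of a WSGI application (Python's standard web server interface) that reports the variables named above. The function names and the "(not sent)" placeholder are assumptions for illustration; whether REMOTE_HOST is populated depends on the server configuration.

```python
PERSONA_KEYS = ("REMOTE_HOST", "REMOTE_ADDR", "HTTP_REFERER", "HTTP_USER_AGENT")

def persona_report(environ):
    """Pick out the persona-related variables the server received."""
    return {key: environ.get(key, "(not sent)") for key in PERSONA_KEYS}

def app(environ, start_response):
    """WSGI application that echoes the visitor's persona as plain text."""
    report = persona_report(environ)
    body = "\n".join(f"{k} = {v}" for k, v in report.items()).encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]

if __name__ == "__main__":
    # Serve locally; follow a link to http://localhost:8000/ to see
    # which values your own browser sends.
    from wsgiref.simple_server import make_server
    make_server("localhost", 8000, app).serve_forever()
```

Arriving by typing the address usually leaves HTTP_REFERER empty, while arriving by clicking a link usually fills it in.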

persona.html

Page 18

A Typical Scenario...

Persona: agency.gov OR town.ninja.com

[Diagram: the Analyst sends “search terms” to searchtool.com (http://searchtool.com/query=searchterms), receives hits, and then requests a page from destination.com; each site has its own webmaster]

searchtool.com webmaster knows your “search terms”

destination.com webmaster knows what “search terms” you used to find them
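How a destination webmaster could recover search terms from a referring URL can be sketched as follows. The referrer below is hypothetical and assumes a conventional query string with a parameter named `q`; the slide's `http://searchtool.com/query=searchterms` is schematic, and real search tools vary in what they pass along.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms_from_referrer(referrer, param="q"):
    """Return (search site, list of terms) recovered from a referring URL."""
    parts = urlsplit(referrer)
    values = parse_qs(parts.query).get(param, [])
    terms = values[0].split() if values else []
    return parts.netloc, terms

# Hypothetical referrer as it might appear in an access log.
site, terms = search_terms_from_referrer(
    "http://searchtool.com/search?q=internet+training"
)
print(site, terms)
```

Combined with REMOTE_ADDR or REMOTE_HOST, one log line is enough to tie an organization to the topic it was researching.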

persona.html

Page 19

Always check your Persona

• Several persona testers are listed at navigators.com/persona.html

This is a key paragraph to look for… If this is missing, then no referring URL is being passed via http_referer

Important note: This test page is most accurate when you click on a link to arrive at this page.

persona.html

Page 20

Think before you click...

• Does your connection transmit a referring URL?

• IF IT DOES... do NOT “Click” on your search results

• Clicking on this link will tell orgnet.com’s webmaster that you found them while searching for “terrorist”

Referring URL

Hover over the link to see its URL

persona.html

http://www.google.com/query=terrorist_&start=110

Page 21

Exposing a “less recognizable” persona

The “portal” Problem...

Analyst #1: uses agency.gov persona to visit “targets”

Analyst #2: uses “ninja.com” persona to visit “targets”

[Diagram: Analyst #1 (agency.gov) and Analyst #2 (ninja.com) both reach target.com from agency_portal.com/page_names]

Persona = agency.gov + referrer = portal

Persona = ninja.com + referrer = portal

Result: “ninja” persona may be recognized as “agency.gov” visitor

The “parallel visit” Problem...

[Diagram: Analyst #1 (agency.gov) and Analyst #2 (ninja.com) both visit target.com]

Even with no http_referer, a webmaster can still make the association due to high volume hits, usage patterns, software footprint, etc.

persona.html

Page 22

Course Exercises

A combination of lecture, demo, and hands-on exercises will occur for each major search tool as follows:

Lecture - I will introduce the search tool/technique (Please refrain from using your computer)

Demo - I will demonstrate the tool/technique (Please refrain from using your computer)

Individual search – You search your chosen topic

- Be an “explorer”, not a “camper”

- Add many favorites, and keep going

Student-chosen topic – You will search for the same topic throughout the course. This allows you to compare results among the various search tools / techniques.

Pick a topic you can stay with for 2 days

Page 23

Plan out your Internet Research

• Spell it Out - Define the topic, spell it out, key words, acronyms, “what” and “who”

• Strategize - Choose your approach, online resources, specific search tools

• Search - Get online, stay focused, use advanced search features

• Sift - Filter the results, follow the leads

• Save – Make bookmarks, take notes, organize results, share with co-workers

search_methodology.html

Page 24

Spell out the topic...

1. Name of topic, and what do you want to learn about the topic

__________________________________________________________________

__________________________________________________________________

2. Spell out the topic (search terms, acronyms, abbreviations)

_______________________________

_______________________________

_______________________________

_______________________________

_______________________________

_______________________________

_______________________________

_______________________________

3. Make a list of “who” might publish such information (industry association, government agency, NGO’s, user group etc.)

__________________________________________________________________

__________________________________________________________________

common, simple terms obscure, specific terms

search_methodology.html

Page 25

Follow All Good Leads in Parallel

Many users follow only one good lead at a time

[Diagram: from the Results page (linkA, linkB, linkC) the user opens only Page A, then Page 1, then Page Y]

Valuable links B&C never get explored...

[Diagram: from the Results page, linkA, linkB, and linkC are opened as Page A, Page B, and Page C, and the links on each page are then followed in turn]

• Right-click to open each link in its own browser window (or tab)

• Switch between windows = “ALT-tab”

• Switch between tabs = “CTRL-tab”

• Note: http_referer is still transmitted

• Do NOT launch multiple browsers from desktop or start-menu

multiple_browsers.html

Page 26

Course Outline

• Introduction to Internet Architecture

• Preparing for a search

• “Persona” issues

• Search Tools - In Depth

• Advanced Features

• Specialized Resources

• Source Evaluation

• Review / Summary

specific_page.html

Page 27

Overview of Search Tools

• Search Engine (Google, Bing)

– Large database – text from billions of clickable pages

• Directory (dmoz.org)

– Manually built subject tree – links to millions of web sites

• “User Pages” (Joe’s guide to widgets)

– Built by subject experts - hundreds of topic-related links

Each tool has strengths and weaknesses

Pick the right tool...

search_tool_intro.html

Page 28

Directory (www.dmoz.org, dir.yahoo.com)

[Diagram: users submit URLs & descriptions to the directory; the filer may not be a subject-expert]

• Good for early stages of search, general subjects

• Links are grouped by topic

• Pages are manually built

search_tool_intro.html

Page 29

Class Exercise – browsing a directory

• Go to www.dmoz.org

• Do NOT use the “search box”

• “Explore” for your topic by clicking through categories / sub-categories

• When you reach the “bottom” of a subject tree, right-click any useful links and choose “open new window”

• Make bookmarks of any good websites (including websites that are “close enough” to your topic)

Page 30

Searching a directory...

[Diagram: content of subject tree, from the Main Menu (“top”) through Topics and subtopics down to website entries, which link to external web pages]

• Searches the text within the directory’s own web pages

• Use search terms that would appear in:

– category titles

– web site titles

– web site’s brief description

• You are NOT searching the websites, just their brief description
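The point that a directory search covers only its own listings can be illustrated with a small sketch. The entries, field names, and the `page_text` field are all hypothetical; `page_text` stands for the content of the external website, which the directory never indexes.

```python
# Hypothetical directory listings; "page_text" is the external site's own
# content and is deliberately left out of the search.
DIRECTORY = [
    {
        "category": "Science/Astronomy",
        "title": "Backyard Telescope Guide",
        "description": "Tips for amateur stargazing",
        "page_text": "A long article about the rings of Saturn",
    },
    {
        "category": "Science/Astronomy",
        "title": "Planet Fact Sheets",
        "description": "Reference tables for each planet",
        "page_text": "Orbital periods, masses and moons",
    },
]

def search_directory(entries, term):
    """Match term against category, title and description only."""
    term = term.lower()
    hits = []
    for entry in entries:
        searchable = " ".join(
            (entry["category"], entry["title"], entry["description"])
        ).lower()
        if term in searchable:
            hits.append(entry["title"])
    return hits

print(search_directory(DIRECTORY, "stargazing"))  # found in a description
print(search_directory(DIRECTORY, "saturn"))      # only in the site itself
```

A term that appears only inside the website returns nothing, which is why the slide recommends terms likely to appear in category titles, site titles, or brief descriptions.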

search_tool_intro.html

Page 31

Class Exercise: Searching a directory

• Go to www.dmoz.org

• Type into the search box

• Enter only a few simple search terms

– name of category / name of website

– keyword from website’s brief description

• Do not just click on search results

• Instead, click on the category to see this hit and additional websites which may not have used your particular search terms

[Screenshot labels: “NO” and “Yes”]

search_tools.html

Page 32

Search Engines (google.com, bing.com)

• Search engine’s “robot” clicks through Internet, copies web pages into its database

• Supports detailed keyword searches

• Learn the features & options of each search engine

[Diagram: the search engine's Robot copies web pages from Web Servers; the Indexer builds the Indexed Database and the Cached Web pages; Your PC queries the Search Interface]

search_tool_intro.html

You must envision the target page “Use your imagination”

e.g. Try adding “resume” or “curriculum vitae” to your search terms
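The robot, indexer, and search interface described on this slide can be sketched in miniature. Everything here is a simplified assumption: the "web" is a hypothetical in-memory dictionary rather than live servers, and real engines add ranking, politeness rules, and much more.

```python
import re
from collections import defaultdict, deque

# Hypothetical miniature web: URL -> (page text, outgoing links).
WEB = {
    "http://a.example/": (
        "Welcome to the widget museum",
        ["http://a.example/staff.html"],
    ),
    "http://a.example/staff.html": (
        "Curriculum vitae of the widget curator",
        [],
    ),
    "http://b.example/": ("Unlinked page about widgets", []),
}

def robot(start_url, web):
    """Follow links from start_url and copy every reachable page."""
    copied, queue = {}, deque([start_url])
    while queue:
        url = queue.popleft()
        if url in copied or url not in web:
            continue
        text, links = web[url]
        copied[url] = text  # the "cached" copy of the page
        queue.extend(links)
    return copied

def indexer(copied):
    """Build an inverted index: word -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in copied.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

cache = robot("http://a.example/", WEB)
index = indexer(cache)
print(sorted(search(index, "curriculum vitae")))
```

The page at `http://b.example/` is never copied because nothing links to it, and a search only succeeds when the query words actually appear on the target page, which is the reason for envisioning the page before choosing terms.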

Page 33

Class Exercise: Using a search engine

• Go to google.com and bing.com

• Enter identical terms into both search engines (make sure search terms remain unchanged)

• Look through the search results

– Which gave more hits?

– Are top-ten hits the same?

• Add additional specific search terms as needed to focus the search results

• Make bookmarks of any good websites

search_tools.html

Page 34

Search Engine Comparison

• http://ranking.thumbshots.com – Compares the first sixty hits from two search engines you select

• Notice on this search for “jihad”, only 12 out of 60 hits appeared in both Google and Yahoo… Most hits are unique to each search engine

search_tools.html

• News, forums and analysis of search engines

Global Search Stats

Source: comScore qSearch

Page 35

Which have you bookmarked?

• Advanced search page can be used just as easily as basic search page

• Seeing these options might remind you to use them

[Screenshot labels: basic search, advanced search]

Key Tip: Limit your searches to PDF or Powerpoint files to quickly locate detailed content from great web sites

search_tools.html

www.google.com/advanced_search and www.google.com/preferences

Page 36

Google’s Cached Issues…

[Screenshot label: “Leads your browser to live website”]

Google stores the text of a cached webpage. The graphics, videos, etc. are still downloaded by your browser from the live website. To view a “text only” version of Google’s cache…

1) Cut and paste this text into your browser address bar:

http://webcache.googleusercontent.com/search?strip=1&q=cache:

2) Add your desired address onto the end of the above string (no space), for example:

webcache.googleusercontent.com/search?strip=1&q=cache:navigators.com/isp.html

cached.html
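The two steps above amount to simple string concatenation, sketched below. The function name is an assumption, and so is the choice to strip a leading `http://` or `https://` (the slide's example address has none). The sketch only builds the address; it does not check whether the cache service responds.

```python
CACHE_PREFIX = "http://webcache.googleusercontent.com/search?strip=1&q=cache:"

def text_only_cache_url(address):
    """Append address to the text-only cache prefix, with no space."""
    address = address.strip()
    # Assumption: drop the scheme so the result matches the slide's example.
    for scheme in ("http://", "https://"):
        if address.lower().startswith(scheme):
            address = address[len(scheme):]
    return CACHE_PREFIX + address

print(text_only_cache_url("navigators.com/isp.html"))
```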

Page 37

www.yippy.com

• Yippy examines the first couple hundred hits, and groups them together into “clouds”

• View the 10-15 hits you really want without reading through 200 mixed search results

search_tools.html

• Ixquick.com - searches multiple search tools

• Stars show number of search engines that gave site a top 10 ranking

Page 38

Web Analytics (alexa.com, urlm.com, urlm.co.uk)

• Like most toolbars and browser extensions, the Alexa toolbar “spies” on its users

• Some of the information collected via the toolbar is available for free at alexa.com

search_tools.html

This is a great way to quickly assess the popularity of a website, and audience demographics

• Enter a domain name

• Study web analytics and “related” sites

• Top sites listed by country or subject area

Page 39

“User Pages”

• Focused on a specific subject

• Developed by “experts” in that field (or a person with passion for subject)

• Often contains “the best” online resources

[Diagram labels: Info Expert, Potential weblink]

search_tool_intro.html

Page 40

Finding “User pages”

• Announced to Dmoz and other directories

• Listed at wikipedia, wikimapia

• Groups of users at forums, blogs and mailing lists

• Watch for sites labeled: “Joe’s ultimate guide to widgets”

• “User pages” often point to other “user pages”

• “Surfing Upstream” from several related sites

• Ask other researchers – there are several sites that everyone knows as “the best”

• Interactive, live communication (Chat, VOIP, virtual worlds)

search_universes.html

Page 41

Wiki ____

• A Wiki allows immediate creation and editing of pages by “anyone”

• Wikipedia.org – Encyclopedia that can be instantly edited by ANY Internet user

• Good starting point for many subjects to gain an overview of the topic

• Page can be biased from the most recent editor

• Some entries get “locked-down” due to vandalism

• old.wikimapia.org – same concept applied to maps

• “map type” google map: zoom to the right location

• “map type” “wikimapia classic” : to see comments

• To learn about the author: click on a comment box → menu → history → the user’s name → stats; clicking on the stat numbers listed then shows every place that user has added

Page 42

Blogs and Forums

• A Web Log (blog) is usually owned by one person

• Owner can post a log of their daily activities, or post ongoing comments about a topic

• Others may also be allowed to add comments onto the blog

• Wordpress and blogger are popular sites

• Forum – discussion focused on a particular topic

• Many users can participate by posting messages

• Moderators may “police” comments that are considered off-topic

• Try searching for:

• Searchterms forum post - to find a forum that discusses your topic

• Searchterms forum post replies views – to find individual threads and messages that discuss your topic

• Membership requirements are a barrier to search engine robots

• Vbulletin is a popular program used on many forums

search_universes.html

Page 43

Surfing Upstream vs. Downstream

#1 Most researchers follow the links “downstream” from an interesting page

#2 Shows pages that link towards the target (=upstream). This is an indication of the page’s “popularity” = who knows about target.com

#3 Shows pages that link to both target sites … = “user pages” for that topic

[Diagram: #1 links leading downstream from Target.com; #2 “Upstream” pages linking to Target.com; #3 “Joe’s guide to MANY targets” linking to both Target.com and Target2.net]

search_upstream.html

Page 44

Be Creative When Surfing Upstream

Example: Washington DC Tourist Sites

• Any combination of these target pages will lead you to “DC Tourism” pages, but certain pairings may also lead you to subject-specific pages

www.spymuseum.org

www.nasm.si.edu (air & space museum)

www.fordstheatre.org

www.kennedy-center.org

[Diagram labels: Theatre links, DC Tourism, Museums / Educational]

search_upstream.html

Page 45

Surfing Upstream Details

• You need to decide which scenario makes more sense, Row #1 or Row #2: who links to the home page of the entire site vs. who links to a specific webpage within the site

• A 3rd and 4th site can be added if they are popular enough

• Note: do not include “http://”

search_upstream.html

Search format at google or bing | Search results

“www.example.com” | pages that contain the text www.example.com

“www.example.com/pageA.html” | pages that contain the text of the specific page address

+“www.example1.com” +“www.example2.com” | pages that contain the text of both example site addresses. This is a great way to discover “user pages” (e.g. Joe's guide to many example-sites)
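The search formats in the table can be generated with a small helper. This is a sketch under stated assumptions: the function names are invented for illustration, plain straight quotes are used in place of the slide's typographic quotes, and search engines may change how they treat the `+` prefix.

```python
def strip_scheme(address):
    """Remove a leading http:// or https://, as the slide's note advises."""
    for scheme in ("http://", "https://"):
        if address.lower().startswith(scheme):
            return address[len(scheme):]
    return address

def upstream_query(*addresses):
    """Build a query for pages that mention every given address."""
    quoted = [f'"{strip_scheme(a)}"' for a in addresses]
    if len(quoted) == 1:
        return quoted[0]
    # Several targets: require each quoted address, as in the last row.
    return " ".join("+" + q for q in quoted)

print(upstream_query("http://www.example.com"))
print(upstream_query("www.example.com/pageA.html"))
print(upstream_query("www.example1.com", "www.example2.com"))
```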

Page 46

Searching within a site or domain name

• This technique can save you weeks of search time

• Much faster than reading through thousands of web pages from a large website

• “Use your imagination” to focus these searches

• Note: do not include “http://” or “www”

search_upstream.html

Search format at google | Search results

site:example.com | pages hosted on any kind of example.com servers (www.example.com, blog.example.com, etc). This is a quick way to assess the size/depth of a web domain

site:example.com searchterm | pages hosted at example.com which mention "searchterm"

site:ru searchterm | pages hosted on .ru servers which mention "searchterm"

site:ac.ru nuclear | pages hosted on any Russian academic (.ac.ru) servers which mention nuclear

site:iaea.org iran filetype:pdf | PDF documents hosted at iaea web servers which mention iran
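The query formats in this table can be assembled programmatically, as in the sketch below. The helper names are assumptions for illustration; the `site:` and `filetype:` operators are the ones shown on the slide, and the search address uses Google's `q` parameter.

```python
from urllib.parse import quote_plus

def site_query(domain, terms="", filetype=None):
    """Build a site-restricted query such as 'site:iaea.org iran filetype:pdf'."""
    domain = domain.strip()
    # Per the slide's note, drop "http://" and "www" before using the domain.
    for prefix in ("http://", "https://"):
        if domain.lower().startswith(prefix):
            domain = domain[len(prefix):]
    if domain.lower().startswith("www."):
        domain = domain[4:]
    parts = [f"site:{domain}"]
    if terms:
        parts.append(terms)
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)

def search_url(query):
    """Encode the query into a search address for the browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)

query = site_query("iaea.org", "iran", "pdf")
print(query)
print(search_url(query))
```

Remember that opening such an address from a work connection exposes the query to the search tool, as discussed in the persona slides.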

Page 47

Example: Iranian cell phone Company (Irancell-MTN)

Who knows about your topic?

Source | Information | Google search terms

Government | Regulations, license | site:gov.ir irancell

Industry Magazine | News, vendors, maps, management interviews | site:gsma.com iran

Construction vendor | Towers, networks | site:vendorsname.com iran

Equipment vendor | Phones, networks, press announcement | site:nokia.com iran

Employees | Resumes, job postings | resume irancell; site:linkedin.com irancell

Customers | Service issues, technology insights | Irancell forum post; site:mob.ir irancell

Topic’s own website | Marketing information, press announcement | site:irancell.ir

Investors | Ownership, disclosures | (none listed)

search_upstream.html

Page 49

The “clickable web” is TINY

• Many detailed searches are a two-step process

– find the specialized database

– then type appropriate query into that database

[Diagram: “Search Engines” reach the World Wide Web (clickable pages), blogs, forums, and multi-media (1. Initial Search). “Total online material” also includes specialized databases, email, and closed systems (2. Detailed Search) and is 1000X larger than the web.]

search_tool_specialized.html

© navigators.com

Page 50

Lists of databases

• For specific information, use a specialized search tool

– Get “deeper” results than a general search engine

• Thousands of search engines are listed

• Search engines are grouped according to the subject they cover

search_tool_specialized.html

70,000 databases

55,000 public record databases

.com

.net

• Or do your own search for the organization that would host the specialized database

Page 51

Specialized Databases

• A phone book for the entire U.S. Includes reverse look-ups

• Federal Register and much more

• Worldwide list of manufacturers

Specialized databases contain content that search engines can’t reach

search_tool_specialized.html

• Real-time tracking of ships from around the world

Page 52

• Most publicly held companies are required to file financial statements with the Securities and Exchange Commission

• These filings are online at SEC’s EDGAR database

• READ forms 10-Q and 10-K (quarterly and annual reports). These are very detailed reports about the company’s activities, plans, sales, etc.

• Seek out other business databases: financial, investment, patents, government regulatory, etc.

• Databases may be available at your library (internal or public)

Business databases can be quite useful

search_tool_specialized.html

Page 53

Many country resources are online

country_specific_content.html

Phone books

Assess popularity of resources using Alexa, or do a quick search using site: at Google

Page 55

Many countries sell their domains

• These were just some of the country domains available for sale

• “All Domains” happens to be a licensed “registrar” for these countries

• There are many additional countries that sell their domain names to “anyone”

domain_name.html

Page 56

Learn about the 2-letter code

• Visit your country’s domain name registrar

– www.iana.org/domains/root/db OR

– www.norid.no/domenenavnbaser/domreg.html

• What is the policy for getting a domain name? (citizenship, trademark, local presence, money)

– What is the cost to register a domain name?

– Are there any censorship clauses?

• Does the registrar require any proof of identity? (driver’s license, passport, business license)

• Is there a whois service? (make a bookmark)
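Before looking up a registrar, it helps to isolate the 2-letter code from an address. The sketch below is illustrative only: `country_code` and `iana_record_url` are hypothetical names, and the per-code page address under www.iana.org/domains/root/db is an assumption about how that site lays out its records.

```python
from urllib.parse import urlsplit


def country_code(url):
    """Return the 2-letter top-level domain of a URL's host, or None."""
    # urlsplit only recognizes the host when the address contains "//".
    host = urlsplit(url if "//" in url else "//" + url).hostname or ""
    tld = host.rstrip(".").rsplit(".", 1)[-1]
    return tld if len(tld) == 2 and tld.isalpha() else None


def iana_record_url(code):
    """Assumed layout of the IANA root zone database page for one code."""
    return "https://www.iana.org/domains/root/db/" + code.lower() + ".html"


if __name__ == "__main__":
    for address in ("http://www.eldia.com.ar", "www.nokia.com", "irancell.ir"):
        code = country_code(address)
        print(address, code, iana_record_url(code) if code else "(not a country code)")
```

A generic domain such as .com returns `None`, which is a reminder that the 2-letter questions above apply only to country-code domains.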

domain_name.html

Page 57

An analysis of domain name policies

http://www.norid.no/regelverk/rammer/regelverksmodeller.en.html

domain_name.html

Most countries sell their domain names to “anybody”

Page 58

Domain Names for Sale

• Only 29% of .HT domain names were registered to people with a Haitian address

• 48% of Haiti’s domain names were registered to U.S. addresses

• When you see a .ht website… is it necessarily foreign?

[Chart: Mailing address for .HT Domain Owners (United States, Haiti)]

domain_name.html

1000+ new domains!

Page 59

Source evaluation

• Pick apart the URL:

• Determine where “ownership” of the web page begins

– www.facebook.com/joesmith/info.html

– www.joesmith.com/stuff/info.html

• Browse the directories (shorten URL if necessary)

• Look at domain’s home page - Is it a web hosting site? Is “pathname” a user account?

• If the domain home page looks like the “owner” of the content, then you can move forward with whois and traceroute
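“Shortening the URL” can be sketched in a few lines of Python. This is only an illustration of the browsing step described above; the function name `shortened_urls` is hypothetical, and judging where “ownership” begins still requires looking at each page.

```python
from urllib.parse import urlsplit


def shortened_urls(url):
    """Yield progressively shorter addresses, ending at the domain home page."""
    # urlsplit only recognizes the host when the address contains "//".
    parts = urlsplit(url if "//" in url else "//" + url)
    host = parts.hostname or ""
    segments = [s for s in parts.path.split("/") if s]
    while segments:
        segments.pop()  # drop the last file or directory name
        path = "/".join(segments)
        yield host + "/" + (path + "/" if path else "")


if __name__ == "__main__":
    for candidate in shortened_urls("www.facebook.com/joesmith/info.html"):
        print(candidate)
```

For the first example address this yields the “joesmith” directory and then the facebook.com home page; the home page is a hosting service, so ownership begins at the user account rather than at the domain.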

sesseval.html

Page 60

Source Evaluation - Using WHOIS

• Domain names are “registered” at Internet registrars (global, country-specific)

• Each registrar develops its own policies

– may sell to anyone/anywhere (.com, .org, .net, .tv, .pk)

– may have strict qualification requirements (.gov, .mil, .au)

• Registrants provide “point of contact” information, for at least invoicing purposes

• Domain “point of contact” information is often available from the registrars’ database via a “WHOIS” query

• WHOIS contents may be inaccurate, although the email or postal address is usually correct so that the registrant receives the renewal invoice

whois.html

Page 61

Performing a “Whois Query”

• “whois” reveals the “owner” of a domain (searchenginewatch.com)

whois.html

Administrative contact:
Ron Doobay
HAYMARKET HOUSE 28-29 HAYMARKET
LONDON SW1Y 4RX UK
+44.2074849700 +44.2079302238
[email protected]

Technical contact:
Domain Administrator
3rd Floor Prospero House 241 Borough High Street Borough
London SE1 1GA UK
+44.2070159370 +44.2070159375
[email protected]

Created on: 1998-03-20
Expires on: 2016-03-19

Domain name servers:
NS3.INCBASE.NET 85.133.68.200
NS2.INCBASE.NET 62.140.213.136
NS1.INCBASE.NET 62.140.213.135

• Spam concerns have led to many domain names being registered via “privacy enhanced” options
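A whois query can also be made directly from a script. The sketch below assumes the plain-text WHOIS protocol (one query line sent to TCP port 43, reply read until the server closes the connection); the function names `whois_query` and `parse_fields` are hypothetical, the server name in the usage line is an assumption, and real replies vary widely in layout from one registrar to another.

```python
import socket


def whois_query(domain, server, port=43, timeout=10.0):
    """Send one WHOIS query line and return the server's raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(text):
    """Collect "Key: value" lines into a dict of lists, keyed by lower-case name.

    Lines without a colon, or with nothing after it, are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields


if __name__ == "__main__":
    # Assumption: this server answers queries for .com names.
    reply = whois_query("searchenginewatch.com", "whois.verisign-grs.com")
    for name, values in parse_fields(reply).items():
        print(name, "=", values)
```

Records registered through “privacy enhanced” options will return the privacy service's contact details rather than the owner's.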

Page 62

Traceroute

• Shows a network path between 2 machines

• Traceroute was designed to help debug network connections

• Can initiate traceroute from your workstation, or from public “traceroute servers” located throughout the Internet

• Each Internet provider has their own naming convention for their infrastructure

– Location labels: City names or 3-letter airport codes

– Exchange points (LINX, HKIX, AMS-IX)

– Infrastructure Topology (T3, FDDI, GE, SMW3)

• A website can be hosted anywhere

– Could be at the organization’s building, but may be hosted at a well-connected web hosting facility

traceroute.html

Page 63

Results of Traceroute

traceroute output from WWW.Telcom.Arizona.EDU to www.nsa.gov:
1 128.196.128.253 (128.196.128.253) 1 ms
2 192.80.43.25 (192.80.43.25) 1 ms
3 192.80.43.58 (192.80.43.58) 1 ms
4 207.250.65.133 (207.250.65.133) 5 ms
5 core-01-ge.phnx.twtelecom.net (209.234.146.45) 5 ms
6 core-02-so.lsag.twtelecom.net (168.215.53.73) 17 ms
7 tran-01-ge.lsag.twtelecom.net (168.215.54.98) 17 ms
8 POS1-1.GW3.LAX1.ALTER.NET (208.222.8.245) 17 ms
9 CL2.LAX4.ALTER.NET (152.63.52.246) 18 ms
10 TL2.LAX9.ALTER.NET (152.63.115.146) 18 ms
11 so.TL2.DCA8.ALTER.NET (152.63.3.193) 74 ms
12 so.XL2.DCA8.ALTER.NET (152.63.35.250) 74 ms
13 ATM6-0.GW3.BWI1.ALTER.NET (152.63.39.41) 76 ms
14 * * *
15 * * *

Traceroute and other online resources help reveal the dynamic architecture of the Internet

• Time-Warner and Alternet may peer at Los Angeles (hops 7-8)

• BWI = Baltimore airport code (hop 13)

traceroute.html
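Reading hop names by eye works for one trace; for many traces a small parser helps. The following Python sketch is an illustration under stated assumptions: it expects hop lines in the "number, name, (address), time ms" layout shown above, and it guesses that a hostname label made of three letters followed by digits (such as LAX1 or BWI1) is an airport-style location code. Each provider has its own naming convention, so that guess will miss or mislabel some hops. The names `parse_hop` and `location_labels` are hypothetical.

```python
import re

# hop number, host name, IPv4 address in parentheses, first round-trip time
HOP = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)\s+(\d+)\s*ms")


def parse_hop(line):
    """Return the fields of one hop line, or None for lines such as "14 * * *"."""
    match = HOP.match(line)
    if not match:
        return None
    hop, host, ip, ms = match.groups()
    return {"hop": int(hop), "host": host, "ip": ip, "ms": int(ms)}


def location_labels(host):
    """Return labels that look like a 3-letter code plus digits (heuristic)."""
    return [label for label in host.split(".")
            if re.fullmatch(r"[A-Za-z]{3}\d+", label)]


if __name__ == "__main__":
    trace = [
        "8 POS1-1.GW3.LAX1.ALTER.NET (208.222.8.245) 17 ms",
        "11 so.TL2.DCA8.ALTER.NET (152.63.3.193) 74 ms",
        "13 ATM6-0.GW3.BWI1.ALTER.NET (152.63.39.41) 76 ms",
        "14 * * *",
    ]
    for line in trace:
        hop = parse_hop(line)
        if hop:
            print(hop["hop"], hop["host"], location_labels(hop["host"]), hop["ms"], "ms")
```

A sudden jump in round-trip time between consecutive hops, as between hops 10 and 11 above, is a further hint that the path has crossed a long distance.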

Page 64

A Foreign Newspaper ???

URL = http://www.eldia.com.ar

• “.ar” implies the site is from Argentina?

• Traceroute reveals this website is physically hosted in the U.S.

traceroute from WWW.Telcom.Arizona.EDU to www.eldia.com.ar:
1 woody-netops.telcom.Arizona.EDU (128.196.128.1) 1 ms
…
8 peer-01-ge.chcg.twtelecom.net (168.215.53.194) 46 ms
…
10 r01.chcgil01.us.bb.verio.net (129.250.2.254) 48 ms
11 r02.stngva01.us.bb.verio.net (129.250.5.103) 83 ms
12 ge.r0728.stngva01.us.wh.verio.net (129.250.27.219) 81 ms
13 ge.stngva01.us.wh.verio.net (161.58.129.13) 81 ms
14 noticiasargentinas.com (161.58.165.155) 80 ms 80 ms 81 ms

• chcgil01 = Chicago, Illinois

• stngva01 = Sterling, Virginia

• wh = web hosting

Page 65

Build a web page in 5 minutes

• Launch Microsoft Word

• Type, type, type (be creative)

• File -> “Save as web page”

• Make a hyperlink:

– Highlight some text: “Insert” menu -> “Hyperlink”

– Type the complete URL (e.g. http://www.cnn.com)

• Test the page: File -> preview in browser

• Borrow a graphic: right-click the CNN logo -> “Save image” (C:\temp)

• Insert a graphic: “Insert” menu -> “Insert image” from file

• Upload the finished page: announce to Google, Bing, etc.

A webmaster requires only these 5 minutes’ worth of knowledge

developer.html

Page 67

Each search tool is different

• Each search tool has its own unique set of defaults and options

• Take the time to learn the options of each tool

– Don’t assume anything

• These tools are competing, trying to be unique

• Read the help

Page 68

Search - Review

• Stay organized in your search

– (spell, strategize, search, sift, save)

• Be conscious of the type of tool you are using (and read its help)

• The “right” search terms, placed correctly into the “right” search tool, should quickly yield “good” results

• Discover the best “user pages” and online communities for your topic - follow their leads (they have already weeded through the junk)

• Stay organized in saving your discoveries

Page 69

Search Scenario

• Create bookmark folder

• Explore topic areas at directories or Wikipedia

• Watch for “user pages”

• Are there databases or forums for the topic?

• Surf upstream to find additional “user pages”

• Save search engines for specific, obscure search terms - use advanced features (e.g. filetype:pdf)

Page 70

Several open sources can be combined to build a complete picture

• Start with a simple cable map

• Nautical charts show exact cable locations

• Satellite imagery follows cable

• Here is the cable landing station

• FCC Filings, Building Permits, etc. provide additional details:

– fcc.gov filings: “12. C&W USA states that the Apollo Cable landing stations in the United States will be located in New York and New Jersey. In New York, the cable landing station will be located in Tritec Park, Brookhaven Technology Center, Shirley, New York, at coordinates 40º 50 minutes 30 seconds north and 72º 53 minutes 4 seconds west.”

– Newspaper / Building Permit Section: “USA Apollo Cable Landing Station, Ramsay Rd. and Precision Dr., site plan-land division station, construct 25,573-square-foot one-story building to house computer equipment for a fiber optic cable landing station on one lot of a two-lot land division in Phase 1. External generators and associated above-ground vaulted diesel fuel tanks to be installed in Phase II. Cable & Wireless USA, Shirley.”

Reference: http://cryptome.org/eyeball/cable/cable-eyeball.htm
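To check the FCC coordinates against a map or satellite imagery, they usually need converting from degrees, minutes, and seconds to decimal degrees. The sketch below shows that arithmetic; the function name `dms_to_decimal` is hypothetical, and the sign convention (south and west negative) is the common mapping convention, assumed here rather than taken from the filing.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Assumed convention: south latitudes and west longitudes are negative.
    return -value if hemisphere.upper() in ("S", "W") else value


if __name__ == "__main__":
    # Coordinates quoted in the fcc.gov filing above.
    latitude = dms_to_decimal(40, 50, 30, "N")
    longitude = dms_to_decimal(72, 53, 4, "W")
    print(round(latitude, 6), round(longitude, 6))
```

For the filing's values this works out to roughly 40.8417 north and 72.8844 west (40 + 50/60 + 30/3600 and 72 + 53/60 + 4/3600).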

Page 71

web.archive.org

• Surf through previous copies of a web site

• Deleting sensitive information from today’s web server does not remove it from archive.org

• Archive.org robot collects web pages like other search engines

• Previous web page copies are not deleted

[Diagram labels: Web Servers, Robot, User Interface, Recent copy, copied web page, Archive copies, User PC]

• “document not found”? – Paste the address into archive.org

• Viewing archived web pages will cause hits to the live target website
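Pasting an address into archive.org can also be done by building the archive address directly. This Python sketch assumes the web.archive.org address layout of "/web/", then a timestamp or "*", then the original address; that layout is an assumption about the service rather than something stated in the slides, and `wayback_url` is a hypothetical name.

```python
def wayback_url(address, timestamp="*"):
    """Build a web.archive.org address for a page.

    timestamp "*" asks for the list of all captures; a string of digits
    (YYYY, YYYYMMDD, or YYYYMMDDhhmmss) asks for the capture nearest that time.
    """
    if timestamp != "*" and not (timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be '*' or 4 to 14 digits")
    return "https://web.archive.org/web/" + timestamp + "/" + address


if __name__ == "__main__":
    print(wayback_url("http://www.eldia.com.ar"))
    print(wayback_url("http://navigators.com/opensource.html", "20150501"))
```

The caution in the last bullet still applies: opening the resulting page in a browser may fetch material from the live site.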

persona_example.html

Page 72

The Future of the Internet

[Diagram: Content -> transport -> Consumer of content]

• Types of content

– Information, entertainment, business, leisure

• Content origins

– corporations, Hollywood, other people

• Content formats

– text, audio, video, interactive reality

• Transport mechanism

– Phone line (copper/fiber), coaxial cable, wireless, direct satellite, electric lines

Mergers and acquisitions are occurring horizontally and vertically

Page 73

Summary

• Internet contains a large, fragmented information space

• Search engines are limited to billions of “clickable” pages

• The best content is organized by “people without lives”

• The Internet will transcend all other communication technologies

• Change is the only constant

The Future is Clear... Master the Information Superhighway or Become Roadkill