http:// download the website to work locally tool: surf offline 1.0 create perl program to extract...


Upload: tracy-hall

Post on 18-Jan-2016


Page 1

http://www.banrep.gov.co

Download the website to work locally

Tool: Surf Offline 1.0

Create PERL program to extract website structure information and store it in Oracle

Create Oracle Schema to store data

Tools: TOAD 9.0, Oracle9i
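The slides do not show the schema itself. The following is a minimal sketch of what a document/link schema could look like, written in Python with the standard sqlite3 module as a stand-in for Oracle9i. The table names HTML_DOCUMENT and HTML_LINK come from the architecture slide; every column name and the helper functions are assumptions made for illustration.

```python
import sqlite3

# Table names come from the slides; every column below is an assumption.
SCHEMA = """
CREATE TABLE html_document (
    doc_id INTEGER PRIMARY KEY,
    url    TEXT NOT NULL UNIQUE
);
CREATE TABLE html_link (
    source_id INTEGER NOT NULL REFERENCES html_document(doc_id),
    target_id INTEGER NOT NULL REFERENCES html_document(doc_id),
    PRIMARY KEY (source_id, target_id)
);
"""


def create_schema(conn):
    """Create both tables on an open connection."""
    conn.executescript(SCHEMA)


def add_document(conn, url):
    """Insert url if it is new and return its id (hypothetical helper)."""
    conn.execute("INSERT OR IGNORE INTO html_document (url) VALUES (?)", (url,))
    row = conn.execute(
        "SELECT doc_id FROM html_document WHERE url = ?", (url,)
    ).fetchone()
    return row[0]


def add_link(conn, source_url, target_url):
    """Record one link; a repeated link is stored only once."""
    conn.execute(
        "INSERT OR IGNORE INTO html_link VALUES (?, ?)",
        (add_document(conn, source_url), add_document(conn, target_url)),
    )
```

Storing one row per link keeps the crawl output compact; the square matrix can then be derived from the link table rather than stored cell by cell.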

Tools: PERL 5.8

Create PERL programs to crawl the website and store data in Oracle

Tools: PERL 5.8

Create the Matrix with links structure

[Link matrix: rows and columns P1 P2 P3 P4, six cells set to 1]
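A minimal sketch of how a link matrix of this kind could be built from a list of (source, target) pairs. The function names are hypothetical, and the example links in the usage note are made up: they do not reproduce the cell positions of the matrix on the slide. The convention that rows are source pages and columns are target pages is also an assumption.

```python
def build_link_matrix(pages, links):
    """Return a square 0/1 matrix; cell [i][j] is 1 if pages[i] links to pages[j]."""
    index = {page: i for i, page in enumerate(pages)}
    matrix = [[0] * len(pages) for _ in pages]
    for source, target in links:
        # Links to or from pages outside the list are ignored.
        if source in index and target in index:
            matrix[index[source]][index[target]] = 1
    return matrix


def format_matrix(pages, matrix):
    """Render the matrix as tab-separated text that a spreadsheet can open."""
    rows = ["\t".join([""] + list(pages))]
    for page, row in zip(pages, matrix):
        rows.append("\t".join([page] + [str(cell) for cell in row]))
    return "\n".join(rows)
```

For example, `build_link_matrix(["P1", "P2", "P3", "P4"], [("P1", "P2"), ("P2", "P3")])` sets only the P1/P2 and P2/P3 cells, and `format_matrix` turns the result into text that can be pasted into Excel.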

Tool: Surf Offline 1.0

Excel

...

Page 2

[Link matrix: rows and columns P1 P2 P3 P4, six cells set to 1]

What kinds of things can I do with this matrix?

• Visualize the website structure

...

Page 3

Page 4

ARCHITECTURE

Source: Internet, http://www.banrep.gov.co (1000 Webpages), Surfoffline

Collect Data: structure.pl and crawler.pl (PERL 5.8), PL/SQL

Oracle: http://oracle92.is.umbc.edu:7778/isqlplus

TABLES: HTML_DOCUMENT, HTML_LINK, HTML_MATRIX

VIEWS: HTML_1_255, HTML_256_510, HTML_511_756, HTML_757_999

Different Formats: excel.pl, SQL

Visualizing: Excel, PAJEK (Graph), NETDRAW (Graph)

Examples: example1, example2, example3

Page 5

structure.pl
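The listing for structure.pl did not survive in the transcript. As an illustration of the step it is described as performing (extracting link structure from a page), here is a minimal sketch in Python rather than the PERL 5.8 used in the project. The class and function names are hypothetical, and treating only `<a href>` tags as links is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collect the absolute targets of <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html, base_url):
    """Return the link targets found in html, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Calling `extract_links(page_text, "http://www.banrep.gov.co/index.html")` would return one absolute URL per anchor on the page, ready to be stored as (source, target) pairs.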

Page 6

crawler.pl
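The listing for crawler.pl is likewise missing from the transcript. The sketch below shows one common way to crawl a site, a breadth-first traversal with a page limit, in Python rather than PERL 5.8. It is not the author's code: the function name, the `get_links` callback, and the breadth-first order are assumptions, and the default limit of 1000 simply mirrors the "1000 Webpages" figure on the architecture slide.

```python
from collections import deque


def crawl(start, get_links, max_pages=1000):
    """Breadth-first crawl from start.

    get_links(page) must return the pages that page links to.
    Returns (sorted list of visited pages, list of (source, target) links).
    """
    seen = {start}
    queue = deque([start])
    edges = []
    while queue:
        page = queue.popleft()
        for target in get_links(page):
            if target not in seen:
                if len(seen) >= max_pages:
                    continue  # page limit reached: skip links to new pages
                seen.add(target)
                queue.append(target)
            edges.append((page, target))
    return sorted(seen), edges
```

Passing the link source as a callback keeps the traversal independent of whether pages are read from the live site or from the local copy made with Surf Offline.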

Page 7

PL/SQL

Page 8

SQL Scripts to create input files with different formats
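The SQL scripts themselves are not in the transcript. As an illustration of one target format, here is a minimal sketch that writes a link list in Pajek's .net format (a `*Vertices` section followed by an `*Arcs` section). It is written in Python rather than SQL, and the function name and the use of page names as vertex labels are assumptions.

```python
def to_pajek(pages, links):
    """Return the link structure as Pajek .net text.

    pages: list of page names, numbered from 1 in the order given.
    links: iterable of (source, target) page-name pairs.
    """
    index = {page: i for i, page in enumerate(pages, start=1)}
    lines = [f"*Vertices {len(pages)}"]
    lines += [f'{i} "{page}"' for page, i in index.items()]
    lines.append("*Arcs")
    # Links that mention a page outside the vertex list are dropped.
    lines += [
        f"{index[source]} {index[target]}"
        for source, target in links
        if source in index and target in index
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file with a .net extension gives an input file that Pajek can open as a directed network.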