Programming for WWW (ICE 1338)
Lecture #4, July 2, 2004
In-Young Ko, iko .AT. icu.ac.kr
Information and Communications University (ICU)
July 2, 2004 2 Programming for WWW (Lecture#4) In-Young Ko, Information Communications University
Announcements
- Our TA
    Name: Mr. Trinh Minh Cuong
    Email: minhcuong .AT. icu.ac.kr
    Office: F641
    Office Hours: Tuesday 11-12PM, Thursday 2-4PM
- Please send the instructor your team information
- Please send the instructor your information for creating a Unix account
- Submit your homework#1 (a URL or HTML source) by tomorrow

Review of the Previous Lecture
- Cascading Style Sheets
- Web-based Information Integration
    - Examples
    - Information Mediators
    - Information Wrappers (Web Wrappers)

Contents of Today's Lecture
- Basic UNIX Commands
- More on Web-based Information Integration
- JavaScript

UNIX Operating System
- A multi-user, multi-tasking operating system
- Developed by Ken Thompson and Dennis Ritchie at Bell Labs in the early 1970s
- Success factors of UNIX
    - Written in a high-level language (C), which improves readability and portability
    - Support of primitives (system calls), which permits complex programs to be built efficiently
    - A hierarchical file system, which makes maintenance easy
    - Hiding the machine architecture from the user, which allows programs to be run on different machines
http://www.unix-systems.org/

Architecture of UNIX Systems
[Figure: layered diagram with the hardware at the center, surrounded by the kernel, then a layer of programs (sh, who, a.out, date, wc, grep, ed, vi, ld, as, comp, cpp, nroff), then cc and other application programs on the outside]

Basic UNIX Shell Commands
    cd - Changes to the named directory
    pwd - Displays the current working directory
    ls - Lists the contents of the current directory
    ls -l - Same as above, but lists more information
    mkdir - Makes a directory
    rmdir - Removes an empty directory
    cat - Concatenates files or shows a file's contents
    cp - Copies a file
    mv - Renames a file or moves it to a different directory
    rm - Removes a file
    logout - Terminates a Unix shell session
    man - Accesses manual pages
http://infohost.nmt.edu/tcc/help/unix/unix_cmd.html

Publishing Web Pages on the Server
- Copy your files to the 'public_html' directory under your home directory on the server
- Use FTP to copy your files from a local directory to the server directory
    ftp vega.icu.ac.kr        (login with your user ID)
    cd public_html
    lcd d:\myweb
    put index.html            (or: mput *.html)
    quit
- Your homepage is now accessible from http://vega.icu.ac.kr/~yourid

Connections Between Web Clients and Servers
[Figure: a Web browser and a Web server. The server listens on port 80 and accepts a connection; the browser connects and writes a request; the server processes the request and returns the result; the browser reads the result]
- A Web server is a daemon process that executes in the background waiting for some event to occur

Sockets
[Figure: the same browser/server exchange as on the previous slide (Listen on port 80, Accept, Connect, Write, Read, Process, Return), with the sockets at the two ends of the connection labeled]
- A socket is an end point for communication between two machines
- A socket is an association of a protocol, address and process to an end point of communication
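
The Listen / Accept / Read / Process / Return cycle in the figure can be sketched from the server side with java.net.ServerSocket. This is only a minimal illustration, not a real Web server: the class name TinyServer, the reply text, and the use of an ephemeral local port instead of port 80 are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;

class TinyServer {
    // Server side: accept one connection, read one line, write one reply line.
    static void serveOnce(ServerSocket listener) {
        try (Socket client = listener.accept();                          // Accept
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream()));
             PrintWriter out = new PrintWriter(client.getOutputStream())) {
            String requestLine = in.readLine();                           // Read
            out.println("You asked for: " + requestLine);                 // Process + Return
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Client side: connect, write the request line, read the reply line.
    static String request(int port, String requestLine) {
        try (Socket sk = new Socket("localhost", port);                   // Connect
             PrintWriter out = new PrintWriter(sk.getOutputStream());
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(sk.getInputStream()))) {
            out.println(requestLine);                                     // Write
            out.flush();
            return in.readLine();                                         // Read
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs both ends in one process; port 0 asks the OS for any free port.
    static String roundTrip(String requestLine) {
        try (ServerSocket listener = new ServerSocket(0)) {               // Listen
            Thread server = new Thread(() -> serveOnce(listener));
            server.start();
            String reply = request(listener.getLocalPort(), requestLine);
            server.join();
            return reply;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("GET /index.html"));
    }
}
```

Each Socket object here is one end point of the connection; the pair (address, port) plus the owning process is what the second definition above calls the association.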

Accessing Web Contents from Java Programs via Sockets

    import java.net.*;
    import java.io.*;
    ...
    // Socket creation
    Socket sk = new Socket("www.icu.ac.kr", 80);
    // Write request
    OutputStream os = sk.getOutputStream();
    PrintWriter pw = new PrintWriter(os);
    pw.println("GET /index.html");
    pw.println();
    pw.flush();
    // Read results
    InputStream is = sk.getInputStream();
    InputStreamReader ips = new InputStreamReader(is);
    BufferedReader in = new BufferedReader(ips);
    String line;
    while ((line = in.readLine()) != null) {
        System.out.println(line);
    }

Accessing Web Contents from Java Programs via URL Connections

    import java.net.*;
    import java.io.*;
    ...
    // URL object creation
    URL url = new URL("http://www.icu.ac.kr");
    // URL connection creation
    URLConnection urlc = url.openConnection();
    // Read results
    InputStream is = urlc.getInputStream();
    InputStreamReader ips = new InputStreamReader(is);
    BufferedReader in = new BufferedReader(ips);
    String line;
    while ((line = in.readLine()) != null) {
        System.out.println(line);
    }

Java String Manipulation Methods for Result Parsing
    int indexOf(String str, int fromIndex)
    int lastIndexOf(String str, int fromIndex)
    boolean startsWith(String prefix)
    boolean endsWith(String suffix)
    boolean matches(String regex)
    String[] split(String regex)
    String substring(int beginIndex, int endIndex)
    String toLowerCase()
    String toUpperCase()
http://java.sun.com/j2se/1.4.2/docs/api/index.html
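
A short sketch of how some of these methods behave on a line of HTML may help. The class name StringMethodsDemo, the helper names, and the sample strings are made up for illustration.

```java
import java.util.Arrays;

class StringMethodsDemo {
    // Returns the text between the first '>' and the last '<' of a one-line anchor element.
    static String anchorText(String line) {
        int start = line.indexOf(">", 0) + 1;
        int end = line.lastIndexOf("<", line.length());
        return line.substring(start, end);
    }

    // Splits a comma-separated keyword list, ignoring spaces around the commas.
    static String[] keywords(String csv) {
        return csv.trim().split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String line = "<a href=\"http://www.icu.ac.kr/\">ICU Home</a>";
        System.out.println(line.startsWith("<a "));           // true
        System.out.println(line.endsWith("</a>"));            // true
        System.out.println(line.matches(".*href=.*"));        // true: matches() tests the whole string
        System.out.println(anchorText(line));                 // ICU Home
        System.out.println(anchorText(line).toUpperCase());   // ICU HOME
        System.out.println(Arrays.toString(keywords("flood, earthquake ,typhoon")));
    }
}
```

Note that matches() succeeds only when the regular expression covers the entire string, which is why the pattern is wrapped in ".*" on both sides.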

Web Wrapper for Naver.com
[Figure: a Naver search result page with the title, summary, and URL of a result item labeled]

Result Parsing Strategies
- Structure-based Parsing
    - Analyzes Web pages based on tag hierarchies
    - Cannot be used for ill-formed HTML documents
- Pattern-based Parsing
    - Searches for a unique string pattern to locate a result item
    - Needs to identify such unique string patterns first

Structure-based Result Parsing

Pattern-based Result Parsing
1. Find a unique pattern to locate a result item
    e.g., "<tr><td><font" in the Naver result pages
2. Find the prefix and suffix patterns to extract an information piece (e.g., URL, title, summary) from the result item
    e.g., "a href=" to extract a URL from a result line
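
The two steps above can be sketched with indexOf and substring. The class and method names are hypothetical, and the sample page is invented for the example; real Naver markup will differ, so the patterns would need to be re-derived from an actual result page.

```java
import java.util.ArrayList;
import java.util.List;

class PatternExtractor {
    // Returns the text between the first occurrence of prefix (searching from
    // fromIndex) and the next occurrence of suffix, or null if either is missing.
    static String extract(String text, String prefix, String suffix, int fromIndex) {
        int start = text.indexOf(prefix, fromIndex);
        if (start < 0) return null;
        start += prefix.length();
        int end = text.indexOf(suffix, start);
        if (end < 0) return null;
        return text.substring(start, end);
    }

    // Collects the URL of every line that starts with the item pattern.
    static List<String> extractUrls(String page, String itemPattern) {
        List<String> urls = new ArrayList<>();
        for (String line : page.split("\n")) {
            if (!line.trim().startsWith(itemPattern)) continue;   // step 1: locate a result item
            String url = extract(line, "a href=\"", "\"", 0);     // step 2: prefix and suffix
            if (url != null) urls.add(url);
        }
        return urls;
    }

    // Invented sample page: two result rows and one unrelated row.
    static String samplePage() {
        return "<tr><td><font size=2><a href=\"http://example.org/a\">First</a></font></td></tr>\n"
             + "<tr><td>navigation row, not a result</td></tr>\n"
             + "<tr><td><font size=2><a href=\"http://example.org/b\">Second</a></font></td></tr>\n";
    }

    public static void main(String[] args) {
        System.out.println(extractUrls(samplePage(), "<tr><td><font"));
    }
}
```

The same extract helper can pull out a title or summary by passing a different prefix and suffix pair.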

Java Implementation of Web Wrapper

    public void WebWrapper(String host, String path, String query, int startIndex, int pageSize) {
        try {
            // Query translation
            String address = "http://" + host + path + "?where=webkr" + "&query=" + query +
                "&start=" + startIndex + "1" + "&display=" + pageSize;
            URL url = new URL(address);
            URLConnection urlc = url.openConnection();
            urlc.setRequestProperty("Accept", "*/*");
            urlc.setRequestProperty("User-Agent", "Mozilla/4.0");
            InputStream is = urlc.getInputStream();
            InputStreamReader ips = new InputStreamReader(is);
            BufferedReader in = new BufferedReader(ips);
            String line;
            while ((line = in.readLine()) != null) {
                // Parsing results
                //
                System.out.println(line);
                //
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
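
One detail the query translation step above leaves open is what happens when the query contains spaces or non-ASCII characters. A possible sketch, assuming the standard java.net.URLEncoder and UTF-8 encoding (the target site may expect a different charset), is shown below. The class name, method name, and the example host and path are hypothetical; the parameter names and the appended "1" follow the slide.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryTranslator {
    // Builds the search address as on the slide, but URL-encodes the query text.
    static String buildAddress(String host, String path, String query,
                               int startIndex, int pageSize) {
        try {
            return "http://" + host + path
                    + "?where=webkr"
                    + "&query=" + URLEncoder.encode(query, "UTF-8")
                    + "&start=" + startIndex + "1"   // the slide appends the digit 1 to startIndex
                    + "&display=" + pageSize;
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // search.example.com and /search are placeholders, not a real endpoint.
        System.out.println(buildAddress("search.example.com", "/search", "web wrapper", 1, 10));
    }
}
```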

Web Robots
- A Web robot is a program (agent) that collects information while following all the links on a Web page
- Web Robots = Crawlers = Spiders
- Web search engines use Web robots to collect and index Web documents
- A tag to tell Web robots not to index a page:
    <meta name="robots" content="noindex,nofollow"/>
- Crawling methods:
    - Breadth-first crawling
    - Depth-first crawling

Breadth First Crawlers
http://ibook.ics.uci.edu/Slides/39

Depth First Crawlers
http://ibook.ics.uci.edu/Slides/39
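
The difference between the two crawling methods shows up in the order in which pages are visited. The sketch below walks a small in-memory link graph instead of fetching real pages; the class name CrawlOrder and the sample site are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CrawlOrder {
    // Breadth-first: visit all pages one link away, then two links away, and so on.
    static List<String> breadthFirst(Map<String, List<String>> links, String start) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            order.add(page);
            for (String next : links.getOrDefault(page, Collections.emptyList())) {
                if (seen.add(next)) queue.add(next);   // skip pages already queued
            }
        }
        return order;
    }

    // Depth-first: follow the first link as far as possible before backing up.
    static List<String> depthFirst(Map<String, List<String>> links, String start) {
        List<String> order = new ArrayList<>();
        visit(links, start, new HashSet<>(), order);
        return order;
    }

    private static void visit(Map<String, List<String>> links, String page,
                              Set<String> seen, List<String> order) {
        if (!seen.add(page)) return;                   // already visited
        order.add(page);
        for (String next : links.getOrDefault(page, Collections.emptyList())) {
            visit(links, next, seen, order);
        }
    }

    // Hypothetical site: "/" links to /a and /b; /b links back to "/".
    static Map<String, List<String>> sampleSite() {
        Map<String, List<String>> links = new HashMap<>();
        links.put("/", Arrays.asList("/a", "/b"));
        links.put("/a", Arrays.asList("/a1", "/a2"));
        links.put("/b", Arrays.asList("/b1", "/"));
        return links;
    }

    public static void main(String[] args) {
        System.out.println(breadthFirst(sampleSite(), "/"));  // [/, /a, /b, /a1, /a2, /b1]
        System.out.println(depthFirst(sampleSite(), "/"));    // [/, /a, /a1, /a2, /b, /b1]
    }
}
```

Both versions keep a set of pages already seen, which is what stops a crawler from looping when pages link back to each other.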

Web-based Information Management Applications (Example Scenario)
Identify Recurring Disaster Areas in China, e.g. Locations of Floods
- A Web document collection about 'China disasters'
- For each map layer displayed, get the set of place names and classify the documents based on the place names
- Classify documents based on the disaster types mentioned
- Cross-product between place names and the disaster-type categories
- Plot the document clusters on the map to figure out the major flooding areas
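
The classification and cross-product steps in this scenario can be illustrated with a few lines of Java. This is only a sketch under simple assumptions: documents are plain strings, matching is a case-insensitive substring test, and the class name, document ids, and sample texts are all invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DisasterClassifier {
    // Groups document ids under every "place/disaster" pair whose two terms
    // both occur in the document text (the cross-product of the two category sets).
    static Map<String, List<String>> classify(Map<String, String> docs,
                                              List<String> places,
                                              List<String> disasters) {
        Map<String, List<String>> clusters = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            String text = doc.getValue().toLowerCase();
            for (String place : places) {
                if (!text.contains(place.toLowerCase())) continue;
                for (String disaster : disasters) {
                    if (text.contains(disaster.toLowerCase())) {
                        clusters.computeIfAbsent(place + "/" + disaster,
                                key -> new ArrayList<>()).add(doc.getKey());
                    }
                }
            }
        }
        return clusters;
    }

    // Invented sample collection standing in for the 'China disasters' documents.
    static Map<String, String> sampleDocs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "Severe flood along the river near Wuhan");
        docs.put("doc2", "Earthquake shakes Yunnan");
        docs.put("doc3", "Wuhan braces for another flood");
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(classify(sampleDocs(),
                Arrays.asList("Wuhan", "Yunnan"),
                Arrays.asList("flood", "earthquake")));
    }
}
```

Each resulting cluster (for example, all documents under Wuhan/flood) is what would then be plotted on the map in the last step.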

Web-based Information Management Applications (Example App. Design)
[Figure: application design built from the components Keyword Editor, Keyword Extractor, Search Engines, Place Name Generator, Place Name Extractor, Product Categories, and Mapping Clusters, some of them pipelined. The legend distinguishes sequential connections from pipelined connections. Annotation: Generate multiple sets of place names]

Problems in Composing Large-scale Information Management Applications
- Time-consuming to explore and test a large number of options
    - Hard to choose appropriate services for collections
    - Hard to quickly substitute and test a service within a sequence of steps
- Difficulties of capturing and reusing shared patterns of information management steps
    - Difficult to record and recurrently perform information management steps
    - Necessity of extracting abstract patterns of information management steps and reusing them
- Hard to cope with dynamic aspects of Web resources

Characteristics of Large-scale Information Management Tasks
- Incremental development of information management steps for an abstract task goal
- Recurrent executions of the steps
- Evolving requirements of users
- Shared patterns of management steps
- Collection-based information processing
- Dynamic aspects of information sources and services
- Large and growing number of component services

Improvement Goals
Rapid Composition and Reconfiguration of Large-scale Custom Applications
- Significantly reduce construction time, keeping costs low
- Enable very rapid construction/adaptation of new applications
- Provide static and run-time diagnostic tools, facilitating debugging and performance tuning tasks

JavaScript
- The goal of JavaScript is to provide programming capability at both the client and server ends of a Web connection
- Originally developed by Netscape, as LiveScript
- Became a joint venture of Netscape and Sun in 1995, renamed JavaScript
- Now standardized by the European Computer Manufacturers Association as ECMA-262 (also ISO 16262)
- User interactions with HTML documents in JavaScript use the event-driven model of computation

A Popup Window

    <html>
      <head><title>ICE1338</title>
        <style type = "text/css">
          <!--
          p { font-size: 12pt; color: blue; background-color: yellow }
          h2, h3 { font-size: 16pt; color: red; font-style: oblique }
          -->
        </style>
        <script language = "JavaScript">
          function displayDate() {
            alert("Today's date is: " + new Date() + "!!");
          }
        </script>
      </head>
      <body onLoad="displayDate()">
        <br/>
        <h2>Programming for WWW</h2>
      </body>
    </html>

JavaScript vs. Java
- Both share similar syntax
- JavaScript is a scripting language, not a programming language
- JavaScript is an interpreter-based language
- JavaScript is dynamically typed
- JavaScript does not support class-based inheritance
- JavaScript code is usually embedded in HTML documents

General Syntax of JavaScript
- Direct embedding of JavaScript code:
    <script language = "JavaScript">
      -- JavaScript script --
    </script>
- Indirect JavaScript specification:
    <script language = "JavaScript" src = "myScript.js"/>
- Identifier form: begin with a letter or underscore, followed by any number of letters, underscores, and digits
- Case sensitive
- 25 reserved words, plus future reserved words
- Comments: both // and /* ... */

Document Object Model
"A platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents"
http://www.mozilla.org/docs/dom/technote/intro/

HTML:

    <html>
      <head>
        <title>My Document</title>
      </head>
      <body>
        <h1>Header</h1>
        <p>Paragraph</p>
      </body>
    </html>

JavaScript:

    var header = document.getElementsByTagName("H1").item(0);
    header.firstChild.data = "A dynamic document";
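
Because the DOM is language-neutral, the same two steps can be written against the org.w3c.dom interfaces that ship with Java. The sketch below is an illustration under one assumption: the page is well-formed enough to be read by an XML parser (the sample document above is). The class and method names are hypothetical, and the tag name is written in lower case because XML parsing is case-sensitive.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomDemo {
    // Parses the markup, replaces the text of the first h1, and returns the new text.
    static String retitle(String xhtml, String newText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            Element header = (Element) doc.getElementsByTagName("h1").item(0);
            // Same step as header.firstChild.data = ... in the script version.
            header.getFirstChild().setNodeValue(newText);
            return header.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or update the document", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><title>My Document</title></head>"
                    + "<body><h1>Header</h1><p>Paragraph</p></body></html>";
        System.out.println(retitle(page, "A dynamic document"));
    }
}
```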

DOM Specification
http://www.w3.org/TR/DOM-Level-2-HTML/html.html

Screen Outputs
- The model for the browser display window is the Window object
- Properties:
    - window.document
    - window.screenLeft
    - window.screenTop
    - ...
- Methods:
    - alert
    - confirm
    - prompt
http://devedge.netscape.com/central/javascript/