![Page 1: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/1.jpg)
Structured Documents
Week 3
LBSC 690
Information Technology
![Page 2: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/2.jpg)
Outline
• Muddiest points
• Building the Web
• Building a better Web
![Page 3: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/3.jpg)
Muddiest Points
• Encryption
• Packet vs. circuit switching
• The TCP/IP “protocol stack”
![Page 4: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/4.jpg)
Encryption
• Secret-key systems (e.g., DES)
– Use the same key to encrypt and decrypt
• Public-key systems (e.g., PGP, PKI)
– Public key: open, for encryption
– Private key: secret, for decryption
• Digital signatures
– Encrypt with private key, decrypt with public key
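The key roles above can be sketched with textbook RSA on tiny numbers. This is only an illustration: the primes (61 and 53), the exponent 17, and the function names are assumptions chosen for the example, not part of the lecture, and keys this small provide no security.

```python
# Toy textbook RSA with tiny primes: for illustration only, never for real use.
p, q = 61, 53                  # secret primes (assumed example values)
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent: inverse of e mod phi (Python 3.8+)

def encrypt(message: int) -> int:
    """Anyone can encrypt with the public key (e, n)."""
    return pow(message, e, n)

def decrypt(ciphertext: int) -> int:
    """Only the holder of the private key (d, n) can decrypt."""
    return pow(ciphertext, d, n)

def sign(message: int) -> int:
    """Signing applies the private key."""
    return pow(message, d, n)

def verify(message: int, signature: int) -> bool:
    """Verification applies the public key."""
    return pow(signature, e, n) == message

secret = 65
assert decrypt(encrypt(secret)) == secret
assert verify(secret, sign(secret))
```

Real systems add padding and use keys thousands of bits long; the sketch only shows which key is used in which direction.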
![Page 5: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/5.jpg)
Encryption Issues
• Key length
– 128 bits balances speed and protection today
• Trust infrastructure
– How do you prevent “bait and switch”?
– Who certifies a digital signature is valid?
![Page 6: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/6.jpg)
Encrypted Applications
• Secure Shell (SSH)
– Replaces Telnet
• Secure FTP (SFTP)/Secure Copy (SCP)
– Replaces FTP
• Secure HTTP (HTTPS)
– Used for financial and other private data
• Wired Equivalent Privacy (WEP)
– Used on wireless networks
![Page 7: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/7.jpg)
Packet vs. Circuit Networks
• Telephone system (“circuit-switched”)
– Fixed connection between caller and called
– High network load results in busy signals
• Internet (“packet-switched”)
– Each transmission is routed separately
– High network load results in long delays
![Page 8: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/8.jpg)
Packet Switching
• Break long messages into short “packets”
– Keeps one user from hogging a line
• Route each packet separately
– Number them for easy reconstruction
• Request retransmission for lost packets
– Unless the first packet is lost!
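The three steps above can be sketched in a few lines. The function names and the packet size are hypothetical, and the sketch assumes the receiver already knows how many packets to expect, which real protocols have to negotiate.

```python
import random

def packetize(message: str, size: int) -> list[tuple[int, str]]:
    """Break a message into (sequence number, chunk) packets."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def missing_packets(received: list[tuple[int, str]], total: int) -> list[int]:
    """Sequence numbers the receiver should ask the sender to retransmit."""
    seen = {seq for seq, _ in received}
    return [seq for seq in range(total) if seq not in seen]

def reassemble(received: list[tuple[int, str]]) -> str:
    """Sort by sequence number to undo out-of-order arrival."""
    return "".join(chunk for _, chunk in sorted(received))

packets = packetize("Break long messages into short packets", size=8)
arrived = packets[:]
random.shuffle(arrived)        # each packet may take a different route
arrived.pop()                  # pretend one packet never arrives
to_resend = missing_packets(arrived, total=len(packets))
arrived.extend(p for p in packets if p[0] in to_resend)   # retransmission
assert reassemble(arrived) == "Break long messages into short packets"
```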
![Page 9: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/9.jpg)
The TCP/IP “Protocol Stack”
• Link layer moves bits
– Ethernet, cable modem, DSL
• Network layer moves packets
– IP
• Transport layer provides services to applications
– UDP, TCP
• Application layer uses those services
– DNS, FTP, SSH, …
![Page 10: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/10.jpg)
TCP/IP layer architecture
[Figure: two end hosts, each with Application, Transport, Network, and Link layers, joined through intermediate nodes that have only Network and Link layers. Labels: virtual network service; virtual link for end-to-end packets; virtual link for packets; link for bits.]
![Page 11: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/11.jpg)
The World-Wide Web
[Figure: My Browser, a Proxy Server (holding a local copy of the requested page), and a Remote Server connected across the Internet, with arrows labeled Send Request, Fetch Page, and Page Requested.]
![Page 12: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/12.jpg)
Web Standards
• HTML
– How to write and interpret the information
• URL
– Where to find it
• HTTP
– How to get it
![Page 13: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/13.jpg)
Uniform Resource Locator (URL)
• Uniquely identifies web pages on the WWW
• Example URL: http://www.clis.umd.edu/courses/schedules/fall2003.html
– Domain name: www.clis.umd.edu
– Directory path: /courses/schedules/
– File name: fall2003.html
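As a small sketch, the three parts of the example URL can be pulled apart with Python's standard library. The variable names are illustrative.

```python
from urllib.parse import urlsplit
import posixpath

url = "http://www.clis.umd.edu/courses/schedules/fall2003.html"
parts = urlsplit(url)

scheme = parts.scheme                               # protocol used to fetch the page
domain = parts.netloc                               # which server to contact
directory, filename = posixpath.split(parts.path)   # where on that server, and which file
```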
![Page 14: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/14.jpg)
HyperText Markup Language (HTML)
• Simple document structure language for Web
• Advantages
– Adapts easily to different display capabilities
– Widely available display software (browsers)
• Disadvantages
– Does not directly control layout
![Page 15: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/15.jpg)
Hands On: Learning HTML From Examples
• Use Internet Explorer to find a page you like
– http://www.umiacs.umd.edu/~daqingd/simplepage.html
– http://www.glue.umd.edu/~oard
• On the “View” menu select “Source”
– Opens a notepad window with the source
• Compare HTML source with the Web page
– Observe how each effect is achieved
![Page 16: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/16.jpg)
Hands On: “Adopt” a Web Page
• Modify the HTML source using notepad
– For example, change the page to yours
• Save the HTML source on your “M:” drive
– In the “File” menu, select “Save As”
– Select “All Files” and name it “test.html”
• FTP it to your ~/pub directory on WAM
– ftp wam.umd.edu
– cd ../pub/
– put test.html
• View it
– http://www.wam.umd.edu/~(yourlogin)/test.html
![Page 17: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/17.jpg)
HTML Document Structure
• “Tags” mark structure
– <html>a document</html>
– <ol>an ordered list</ol>
– <i>something in italics</i>
• Tag name in angle brackets <>
– Not case sensitive
• Open/Close pairs
– Close tag may be optional (if unambiguous)
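A short sketch with Python's built-in `html.parser` shows both points: tag names are reported in lower case regardless of how they were typed, and an omitted close tag (here `</li>`) simply never appears. The class name and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Record each open and close tag in the order the parser sees it."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

logger = TagLogger()
logger.feed("<HTML><OL><li>one<li>two</OL><I>italics</I></HTML>")
logger.close()
```

Note that this parser only reports the tags it reads; a browser goes further and infers where each unclosed `<li>` ends.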
![Page 18: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/18.jpg)
Logical Structure Tags
• Head
– Title
• Body
– Headers: <h1> <h2> <h3> <h4> <h5>
– Lists: <ol>, <ul> (can be nested)
– Paragraphs: <p>
– Definitions: <dt> <dd>
– Tables: <table> <tr> <td> </td> </tr> </table>
– Role: <cite>, <address>, <strong>, …
![Page 19: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/19.jpg)
Rendering
• Different devices have different capabilities
– Desktop
– PDA
• Rendering maps logical tags to physical layout
– Controls line wrap, size, font…
• Place the title in the page border
• Render <h1> as 24pt Times
• Render <strong> as bold
• Somewhat browser-dependent
– Internet Explorer and Netscape make different choices
![Page 20: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/20.jpg)
Physical Structure Tags
• Font
– Typeface: <font face="Arial"></font>
– Size: <font size="+1"></font>
– Color: <font color="#990000"></font>
• http://www.barasch.com/excel/colorfonts.htm
• Emphasis
– Bold: <b></b>
– Italics: <i></i>
![Page 21: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/21.jpg)
Hypertext “Anchors”
• Links make the Web a web!
• Internal anchors: somewhere on the same page
– <a href="#students">Students</a>
• Links to: <a name="students">Student Information</a>
• External anchors: to another page
– <a href="http://www.clis.umd.edu">CLIS</a>
– <a href="http://www.clis.umd.edu#students">CLIS students</a>
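To see the difference between the anchor kinds mechanically, here is a minimal sketch that sorts the anchors above into internal links, external links, and named targets. The class name `LinkSorter` is hypothetical; the parsing uses Python's standard `html.parser`.

```python
from html.parser import HTMLParser

class LinkSorter(HTMLParser):
    """Separate same-page links, links to other pages, and named targets."""
    def __init__(self):
        super().__init__()
        self.internal, self.external, self.targets = [], [], []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is not None:
            # A link that starts with "#" points into the current page.
            (self.internal if href.startswith("#") else self.external).append(href)
        if "name" in attributes:
            self.targets.append(attributes["name"])

page = '''
<a href="#students">Students</a>
<a name="students">Student Information</a>
<a href="http://www.clis.umd.edu">CLIS</a>
<a href="http://www.clis.umd.edu#students">CLIS students</a>
'''
sorter = LinkSorter()
sorter.feed(page)
sorter.close()
```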
![Page 22: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/22.jpg)
Images
• <img src="URL"> or <img src="path/file">
– <img src="http://www.clis.umd.edu/IMAGES/head.gif">
– SRC: can be a URL or path/file
– ALT: a text string
– ALIGN: position of the image
– WIDTH and HEIGHT: size of the image
• Can use as anchor:
– <a href=URL><img src=URL2></a>
• Example:
– http://www.umiacs.umd.edu/~daqingd/Image-Alignment.html
![Page 23: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/23.jpg)
Tables
<table align="center">
  <caption align="right">The caption</caption>
  <tr align="LEFT">
    <th>Header1</th>
    <th>Header2</th>
  </tr>
  <tr>
    <td>first row, first item</td>
    <td>first row, second item</td>
  </tr>
  <tr>
    <td>second row, first item</td>
    <td>second row, second item</td>
  </tr>
</table>
Example: http://www.umiacs.umd.edu/~daqingd/Simple-Table.html
![Page 24: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/24.jpg)
Frames
• Divide browser pages into separate sections
– Useful when you want to scroll separately
• Each section can display an HTML page
• Example 1: menu frame on the left side of a page
<frameset cols="10%,90%">
  <frame src="template.html">
  <frame src="images.html">
</frameset>
• Example 2:
– http://www.scms.rgu.ac.uk/research/ir/members.html
![Page 25: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/25.jpg)
Designing Web Pages
• Key design issues:
– Content: What do you want to publish?
– Style: How do you want to present it?
– Syntax: How can you achieve that presentation?
• Sources of information
– Online tutorials (Yahoo points to lots of these)
– Technical materials (e.g., the HTML 4.0 spec)
![Page 26: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/26.jpg)
Some Style Guidelines
• Design for generic browsers
– And test on every version you wish to support?
• Provide appropriate “access points”
– User needs and navigation strategies differ
• Design useful navigational aids
– A Web search may lead to the middle of a site
• Include some indication of currency
– Date of last update, “new” icons, etc.
• Indicate who is responsible for the content
– Helps readers assess authority
![Page 27: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/27.jpg)
Accessibility Guidelines
• Design for device independence
• Maintain backward compatibility
– Provide alternative pages if necessary
• Provide alternatives for aural and visual content
– Alt text for images, transcripts for audio
• Make it easy for assistive devices to work
– Combine structural markup and style sheets
– Give a title to each frame
– Use HTML tables only for tabular data
– Use markup to indicate language switching
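One of these guidelines, providing text alternatives for images, is easy to check automatically. The sketch below is a hypothetical checker (the class name and file names are invented) that lists every `<img>` lacking a non-empty `alt` attribute.

```python
from html.parser import HTMLParser

class AltChecker(HTMLParser):
    """Collect the src of every <img> that lacks a non-empty alt attribute."""
    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if not attributes.get("alt"):
                self.missing_alt.append(attributes.get("src"))

checker = AltChecker()
checker.feed('<img src="head.gif" alt="CLIS banner"><img src="spacer.gif">')
checker.close()
```

An empty `alt` can be legitimate for purely decorative images, so a result from a check like this is a prompt for review rather than a verdict.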
![Page 28: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/28.jpg)
HTML Editors
• Goal is to create Web pages, not learn HTML!
• Several are available
– In Explorer, “File” > “Edit with Front Page”
– In Netscape, “File” > “Edit Page” for Composer
• You may still need to edit the HTML file
– Some editors use browser-specific features
– Some HTML features may be missing entirely
– File names may be butchered by FTP
• Tend to use physical layout tags extensively
– Detailed control can make hand-editing difficult
![Page 29: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/29.jpg)
HTML Validators
• Syntax checking: cross-browser compatibility
– http://validator.w3.org
• Style checking: improved accessibility
– http://bobby.watchfire.com
![Page 30: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/30.jpg)
What’s Wrong with the Web?
• HTML
– Confounds structure and appearance (XML)
• HTTP
– Can’t recognize related transactions (Cookies)
• URL
– Links break when you move a file (PURL)
![Page 31: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/31.jpg)
Discussion Point: Describing the Structure of Text
• Entities
– Span
– Type/Attributes
• Relationships
– Part-whole
– Is-a
![Page 32: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/32.jpg)
What’s a Document?
• Content
• Structure
• Appearance
• Behavior
![Page 33: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/33.jpg)
History of Structured Documents
• Early standards were “typesetting languages”
– NROFF, TeX, LaTeX, SGML
• HTML was developed for the Web
– Too specialized for other uses
• Specialized standards met other needs
– Change tracking in Word, annotating manuscripts, …
• XML seeks to unify these threads
– One standard format for printing, viewing, processing
![Page 34: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/34.jpg)
Goals of XML
• Meta language
– A toolkit for designing markup languages
• Unambiguous markup
– Clear span of tags
• Separate markup from presentation
– Style info => stylesheet, so easy to change
• Be simple
![Page 35: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/35.jpg)
A Family of Standards
• Definition: DTD
– Names known types of entities with “labels”
– Defines part-whole and is-a relationships
• Markup: XML
– “Tags” regions of text with labels
• Markup: XLink
– Defines “hypertext” (and other) link relationships
• Presentation: XSL
– Specifies how each type of entity should be “rendered”
![Page 36: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/36.jpg)
XML Example
• View “The Song of Wandering Aengus”
– http://glue.umd.edu/~rba/COURSES/TECHNOLOGY/XML/DTD/
• Built from three files
– yeats01.xml
– poem01.dtd
– poem01.xsl
![Page 37: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/37.jpg)
An XML Example
<?xml version="1.0"?>
<!DOCTYPE POEM SYSTEM "poem01.dtd">
<?xml-stylesheet type="text/xsl" href="poem01.xsl"?>
<POEM>
  <TITLE>The Song of Wandering Aengus</TITLE>
  <AUTHOR>
    <FIRSTNAME>W.B.</FIRSTNAME>
    <LASTNAME>Yeats</LASTNAME>
  </AUTHOR>
  <STANZA>
    <LINE>I went out to the hazel wood,</LINE>
    <LINEIN>Because a fire was in my head,</LINEIN>
    <LINE>And cut and peeled a hazel wand,</LINE>
  </STANZA>
</POEM>
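Because the markup is unambiguous, a program can pull the pieces back out by name. The following sketch reads a copy of the poem with Python's standard `xml.etree.ElementTree`; it omits the prolog lines (XML declaration, DOCTYPE, stylesheet instruction) for brevity, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

poem_xml = """<POEM>
  <TITLE>The Song of Wandering Aengus</TITLE>
  <AUTHOR><FIRSTNAME>W.B.</FIRSTNAME><LASTNAME>Yeats</LASTNAME></AUTHOR>
  <STANZA>
    <LINE>I went out to the hazel wood,</LINE>
    <LINEIN>Because a fire was in my head,</LINEIN>
    <LINE>And cut and peeled a hazel wand,</LINE>
  </STANZA>
</POEM>"""

poem = ET.fromstring(poem_xml)
title = poem.findtext("TITLE")
author = f"{poem.findtext('AUTHOR/FIRSTNAME')} {poem.findtext('AUTHOR/LASTNAME')}"
# Each stanza child keeps its tag, so indented lines (LINEIN) stay distinguishable.
lines = [(child.tag, child.text) for child in poem.find("STANZA")]
```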
![Page 38: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/38.jpg)
Document Type Definition (DTD)
<!ELEMENT POEM ((TITLE, AUTHOR, STANZA)*) >
<!ELEMENT TITLE (#PCDATA) >
<!ELEMENT AUTHOR (FIRSTNAME, LASTNAME) >
<!ELEMENT FIRSTNAME (#PCDATA) >
<!ELEMENT LASTNAME (#PCDATA) >
<!ELEMENT STANZA (LINE | LINEIN)+ >
<!ELEMENT LINE (#PCDATA) >
<!ELEMENT LINEIN (#PCDATA) >
Notation:
#PCDATA  span of text
a,b      a followed by b
a|b      either a or b
a*       0 or more a’s
a+       1 or more a’s
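The content-model notation behaves much like a regular expression over the names of an element's children. The sketch below makes that analogy literal; it is not a DTD validator, the function names are hypothetical, and it handles only names, commas, `|`, `*`, `+`, and parentheses (not `#PCDATA`).

```python
import re

def model_to_regex(model: str) -> str:
    """Translate a simple DTD content model into a regular expression that
    matches a space-terminated sequence of child element names."""
    pattern = model.replace(" ", "").replace("(", "(?:")
    pattern = re.sub(r"[A-Za-z_][\w.-]*", lambda m: f"(?:{m.group(0)} )", pattern)
    return pattern.replace(",", "")     # "a,b" becomes plain concatenation

def allows(model: str, children: list[str]) -> bool:
    """True if the sequence of child names satisfies the content model."""
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(model_to_regex(model), sequence) is not None

assert allows("(a, b)", ["a", "b"])              # a followed by b
assert not allows("(a, b)", ["b", "a"])          # order matters
assert allows("(a | b)", ["b"])                  # either a or b
assert allows("a*", [])                          # 0 or more a's
assert not allows("a+", [])                      # 1 or more a's
assert allows("(a | b)+", ["a", "b", "a"])       # any mix, at least one
assert not allows("(a+ | b+)", ["a", "b", "a"])  # all a's or all b's, no mixing
```

The last two assertions contrast `(a | b)+` with `(a+ | b+)`, a distinction that matters for a stanza that mixes plain and indented lines.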
![Page 39: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/39.jpg)
Specifying Appearance: XSL
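<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="POEM">
    <HTML>
      <BODY BGCOLOR="#FFFFCC">
        <xsl:apply-templates/>
      </BODY>
    </HTML>
  </xsl:template>
  <xsl:template match="TITLE">
    <H1>
      <FONT COLOR="Green">
        <xsl:value-of select="."/>
      </FONT>
    </H1>
  </xsl:template>
</xsl:stylesheet>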
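The idea behind the two templates, one rendering rule per element type applied recursively, can be imitated in a few lines of Python. This is a rough analogue rather than XSLT: the `RULES` table and function names are invented for the sketch, and the fallback for elements without a rule is a simplification.

```python
import xml.etree.ElementTree as ET
from html import escape

def apply_templates(element: ET.Element) -> str:
    """Render each child with the rule registered for its tag."""
    return "".join(render(child) for child in element)

def render(element: ET.Element) -> str:
    rule = RULES.get(element.tag)
    # With no matching rule, fall back to the element's escaped text
    # (a simplification of what an XSLT processor would do).
    return rule(element) if rule else escape("".join(element.itertext()))

RULES = {
    "POEM": lambda el: f'<HTML><BODY BGCOLOR="#FFFFCC">{apply_templates(el)}</BODY></HTML>',
    "TITLE": lambda el: f'<H1><FONT COLOR="Green">{escape(el.text or "")}</FONT></H1>',
}

poem = ET.fromstring("<POEM><TITLE>The Song of Wandering Aengus</TITLE></POEM>")
html_page = render(poem)
```

Changing the appearance means editing only the rule table; the poem itself is untouched, which is the separation of markup from presentation that XSL aims for.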
![Page 40: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/40.jpg)
An XLink Example
……
<poem xmlns:xlink="http://www.w3.org/1999/xlink">
  <author xlink:href="yeatsRDFS3.xml" xlink:type="simple">W. B. Yeats</author>
  <poems>
    <poem1 xlink:href="http://www.geocities.com/Athens/5379/yeats_index.html" xlink:type="simple">The Rose</poem1>
    <poem2 xlink:href="http://www.geocities.com/Athens/5379/yeats_index.html" xlink:type="simple">The Tower</poem2>
  </poems>
</poem>
……
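Unlike HTML, where only `<a>` elements link, XLink lets any element carry a link through namespaced attributes. A minimal sketch of finding them with Python's standard library follows; the document is a shortened copy of the example and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

doc = """<poem xmlns:xlink="http://www.w3.org/1999/xlink">
  <author xlink:href="yeatsRDFS3.xml" xlink:type="simple">W. B. Yeats</author>
  <poems>
    <poem1 xlink:href="http://www.geocities.com/Athens/5379/yeats_index.html"
           xlink:type="simple">The Rose</poem1>
  </poems>
</poem>"""

root = ET.fromstring(doc)
# ElementTree spells a namespaced attribute as "{namespace-uri}localname".
links = [(el.text, el.get(f"{{{XLINK}}}href"))
         for el in root.iter()
         if el.get(f"{{{XLINK}}}type") == "simple"]
```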
![Page 41: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/41.jpg)
Some XML Applications
• Text Encoding Initiative
– For adding annotation to historical manuscripts
– http://www.tei-c.org/
• Encoded Archival Description
– To enhance automated processing of finding aids
– http://www.loc.gov/ead/
• Metadata Encoding and Transmission Standard
– Bundles descriptive and administrative metadata
– http://www.loc.gov/standards/mets/
![Page 42: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/42.jpg)
What’s Wrong with the Web?
• HTML
– Confounds structure and appearance (XML)
• HTTP
– Can’t recognize related transactions (Cookies)
• URL
– Links break when you move a file (PURL)
![Page 43: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/43.jpg)
Cookies
• Servers know users by IP address and port
– Because that’s where they send the Web pages
• Cookies preserve “state”
– Server sends data to the browser
– Browser later responds with the same data
• A unique code (server-side state)
• Information about the user (client-side state)
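The "unique code" variant (server-side state) can be sketched with Python's standard `http.cookies` module. The cookie name `session`, the user name, and both function names are assumptions made for the illustration; no real web server is involved.

```python
from http.cookies import SimpleCookie
import secrets

sessions = {}   # server-side state, keyed by the unique code

def server_first_response(user_name: str) -> str:
    """Create a session and return the value of the Set-Cookie header."""
    session_id = secrets.token_hex(8)
    sessions[session_id] = {"user": user_name}
    cookie = SimpleCookie()
    cookie["session"] = session_id
    return cookie["session"].OutputString()

def server_later_request(cookie_header: str) -> dict:
    """Look up the stored state for the code the browser sent back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return sessions[cookie["session"].value]

set_cookie = server_first_response("alice")
browser_sends = set_cookie      # the browser echoes the same data on later requests
```

With client-side state the cookie would carry the user information itself instead of a code, so the server would need no `sessions` table.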
![Page 44: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/44.jpg)
Uniform Resource Names (URN)
• Persistent URLs (www.purl.org)
– http://purl.oclc.org/OCLC/PURL/FAQ/
[Figure: My Browser sends a PURL to the PURL Server and receives a URL; it then sends that URL to the Resource Server and receives the page.]
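The indirection a PURL server provides amounts to a lookup table plus a redirect. In this sketch the table contents, the example.edu addresses, and the function name are all hypothetical.

```python
from typing import Optional

# Hypothetical PURL table: persistent name -> current location.
purl_table = {
    "/course/schedule": "http://www.example.edu/courses/schedules/fall2003.html",
}

def resolve(purl_path: str) -> tuple[int, Optional[str]]:
    """Answer like a PURL server: a 302 redirect to the current URL, or 404."""
    target = purl_table.get(purl_path)
    return (302, target) if target else (404, None)

# When the file moves, only the table entry changes; links to the PURL keep working.
purl_table["/course/schedule"] = "http://www.example.edu/archive/fall2003.html"
```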
![Page 45: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/45.jpg)
Summary
• Learning to build simple Web pages is easy
– Which is good news for the homework!
• All documents are structured documents
• XML is a flexible toolkit for building markup languages
• The key is to understand its capabilities
– XML editors can hide much of the complexity
![Page 46: Structured Documents](https://reader036.vdocuments.net/reader036/viewer/2022070404/56813b59550346895da44d6a/html5/thumbnails/46.jpg)
Before You Go!
• On a sheet of paper (no names), answer the following question:
What was the muddiest point in today’s class?