CIT 383: Administrative Scripting


Upload: sunee

Post on 06-Jan-2016



TRANSCRIPT

Page 1: CIT 383: Administrative Scripting

CIT 383: Administrative Scripting

XML

Page 2: CIT 383: Administrative Scripting

Topics

1. What is XML?

2. XML Structure

3. REXML

Page 3: CIT 383: Administrative Scripting

eXtensible Markup Language

Extensible descriptive markup language framework

– Began as a subset of Standard Generalized Markup Language (SGML).

– Designed to ensure that data remains available after the programs that originally created or read it become obsolete or unusable.

<?xml version="1.0" encoding="UTF-8"?>
<inventory>
  <book isbn="0976694042">
    <author>Chris Pine</author>
    <title>Learn to Program</title>
  </book>
</inventory>
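
As a minimal sketch of why this kind of self-describing data stays usable, the inventory document can be read back with Ruby's REXML library. This assumes `rexml` is installed (it ships with Ruby, as a bundled gem in recent versions). The names `INVENTORY_XML` and `book_summaries` are made up for this illustration.

```ruby
require 'rexml/document'

# The inventory document from the slide, held in a string for the demo.
INVENTORY_XML = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <inventory>
    <book isbn="0976694042">
      <author>Chris Pine</author>
      <title>Learn to Program</title>
    </book>
  </inventory>
XML

# Returns one "title by author (ISBN n)" string per <book> element.
def book_summaries(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '/inventory/book').map do |book|
    isbn   = book.attributes['isbn']        # attribute value
    author = book.elements['author'].text   # text of a child element
    title  = book.elements['title'].text
    "#{title} by #{author} (ISBN #{isbn})"
  end
end

puts book_summaries(INVENTORY_XML)
```

Nothing in the reader depends on the program that wrote the file; the element and attribute names carry the meaning.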

Page 4: CIT 383: Administrative Scripting

Descriptive vs Presentational

Presentational languages describe how documents should look.

<b>text</b> turns on boldface for text.

What if you want to change book titles from bold to italics?

Search and replace won’t work if items other than books are bold.

Descriptive languages focus on the meaning.

<title>xml and you</title>

Stylesheets describe how to present logical items.

Can also be used just for data storage and interchange.

A/K/A logical or structural markup languages.
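
The stylesheet idea can be sketched in a few lines of Ruby. This is an illustration only: the "stylesheet" is a plain hash invented for the example (real stylesheets would be CSS or XSLT), and `BOOK_XML` and `render` are hypothetical names.

```ruby
require 'rexml/document'

BOOK_XML = '<book><title>xml and you</title></book>'

# Wraps each child element's text in the presentational tag chosen by the
# stylesheet hash; elements without a rule are emitted as plain text.
def render(xml, stylesheet)
  parts = []
  REXML::Document.new(xml).root.elements.each do |element|
    tag = stylesheet[element.name]
    parts << (tag ? "<#{tag}>#{element.text}</#{tag}>" : element.text)
  end
  parts.join(' ')
end

puts render(BOOK_XML, { 'title' => 'b' })  # titles in bold
puts render(BOOK_XML, { 'title' => 'i' })  # titles in italics
```

Switching titles from bold to italics changes one entry in the hash and leaves the document itself untouched.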

Page 5: CIT 383: Administrative Scripting

XML-based Languages

• Ant

• Atom

• CML

• MathML

• MML

• MusicXML

• ODF

• OPML

• RDF

• SAML

• SOAP

• SVG

• VoiceXML

• WML

• XHTML

• XUL

Page 6: CIT 383: Administrative Scripting

Evolution of XML

1986 SGML standard published as ISO 8879

1987 Unicode proposal published

1991 First volume of Unicode standard

1996 XML work started

1998 XML 1.0 released as a W3C standard

2001 XML Schema language

2004 XML 1.1 released (not widely used)

2007 Unicode 5.0 published

Page 7: CIT 383: Administrative Scripting

XML Tree Structure

<todo>
  <title>Monday’s List</title>
  <item>Study for midterm</item>
  <item priority="10">Scripting Class</item>
  <item>Bathe cat</item>
</todo>

todo
  title: Monday’s List
  item: Study for midterm
  item: Scripting Class
    priority: 10
  item: Bathe cat
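
The tree view can be reproduced by walking a parsed document recursively. The sketch below uses REXML and assumes the priority is stored as an attribute on its item; `TODO_XML` and `outline` are names chosen for this example.

```ruby
require 'rexml/document'

TODO_XML = <<~XML
  <todo>
    <title>Monday's List</title>
    <item>Study for midterm</item>
    <item priority="10">Scripting Class</item>
    <item>Bathe cat</item>
  </todo>
XML

# Collects one indented line per element, with attributes listed
# one level below the element that carries them.
def outline(element, depth = 0, lines = [])
  label = element.name
  text = element.text.to_s.strip
  label += ": #{text}" unless text.empty?
  lines << ('  ' * depth) + label
  element.attributes.each_attribute do |attr|
    lines << ('  ' * (depth + 1)) + "@#{attr.name}: #{attr.value}"
  end
  element.elements.each { |child| outline(child, depth + 1, lines) }
  lines
end

puts outline(REXML::Document.new(TODO_XML).root)
```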

Page 8: CIT 383: Administrative Scripting

Elements and Attributes

An element consists of tags and contents:

<title>Learn to Program</title>

Begin and end tags are mandatory; an empty element may combine them in a single self-closing tag:

<isbn number="0976694042" />

Attributes

number="0976694042"

Elements may have zero or more attributes.

Attribute values must always be quoted.
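
A short REXML sketch of the same distinction, building an element with one attribute and one child element. The helper name `build_book` is hypothetical, and the exact quoting REXML uses when printing is left to the library.

```ruby
require 'rexml/document'

# Builds a <book> element with an isbn attribute and a <title> child.
def build_book(isbn, title)
  book = REXML::Element.new('book')
  book.add_attribute('isbn', isbn)          # attribute: name/value pair on the tag
  book.add_element('title').text = title    # child element with text content
  book
end

book = build_book('0976694042', 'Learn to Program')
puts book.to_s                                # serialized element; REXML quotes the value
puts book.attributes['isbn']                  # read the attribute back
puts book.elements['title'].text              # read the child's text
puts book.elements['title'].has_attributes?   # <title> has zero attributes
```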

Page 9: CIT 383: Administrative Scripting

Text

XML declaration specifies character encoding

<?xml version="1.0" encoding="UTF-8"?>

Encodings

– Unicode: universal character set; encodings include UTF-8 and UTF-32.

– ISO-8859: 8-bit encodings; 8859-1 is Western European.

Entities

– &#nnnn; encodes the specified Unicode character.

– &name; is a named character entity, such as

  &lt; is <
  &gt; is >
  &amp; is &

– Named entities also exist for currency symbols, fractions, Greek letters, math symbols, etc. In XML these must be declared in a DTD; only &lt;, &gt;, &amp;, &apos; and &quot; are predefined.
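
To see entities decoded in practice, here is a small sketch that assumes REXML is the parser. `element_text` is a helper name invented for the example. It relies on REXML returning the unescaped value from `Element#text`.

```ruby
require 'rexml/document'

# Parses a one-element document and returns its decoded text content.
def element_text(xml)
  REXML::Document.new(xml).root.text
end

# Named entities for the characters that are special in markup.
puts element_text('<p>1 &lt; 2 &amp; 3 &gt; 2</p>')

# Numeric character reference: 169 is the code point of the copyright sign.
puts element_text('<p>&#169; 2008</p>')
```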

Page 10: CIT 383: Administrative Scripting

XML Syntax Rules

1. There is one and only one root element.

2. Every begin tag must be matched by an end tag.

3. XML tags must be properly nested.

4. XML tags are case sensitive.

5. All attribute values must be quoted.

6. Whitespace within element content is part of the text.

7. Line endings are normalized to LF when a document is parsed.

8. HTML-style comments: <!-- comment -->

Page 11: CIT 383: Administrative Scripting

Correctness

Well-formed

– Conforms to XML syntax rules.

– A conforming parser will not parse documents that are not well-formed.

Valid

– Conforms to XML semantics rules as defined in

  • Document Type Definition (DTD)

  • XML Schema

– A validating parser will not parse invalid documents.
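
A well-formedness check can be sketched with REXML, which raises `REXML::ParseException` when it meets markup that breaks the syntax rules. The helper name `well_formed?` is an assumption of this example. It checks well-formedness only, not validity against a DTD or schema.

```ruby
require 'rexml/document'

# True if REXML can parse the string, false if it raises a parse error.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

puts well_formed?('<todo><item>Bathe cat</item></todo>')   # properly nested
puts well_formed?('<todo><item>Bathe cat</todo></item>')   # tags overlap
puts well_formed?('<todo><Item>Bathe cat</item></todo>')   # case mismatch
```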

Page 12: CIT 383: Administrative Scripting

XML Schema Languages

Document Type Definitions (DTD)

– Inherited from SGML.

– Do not support all XML features.

XML Schema

– Most commonly used.

– Schemas are XML docs.

– A/K/A WXS, XSD.

RELAX NG

– REgular LAnguage for XML Next Generation.

– XML and non-XML forms.

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Recipient" type="xs:string" />
        <xs:element name="House" type="xs:string" />
        <xs:element name="Street" type="xs:string" />
        <xs:element name="Town" type="xs:string" />
        <xs:element minOccurs="0" name="County" type="xs:string" />
        <xs:element name="PostCode" type="xs:string" />
        <xs:element name="Country">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="FR" />
              <xs:enumeration value="DE" />
              <xs:enumeration value="ES" />
              <xs:enumeration value="UK" />
              <xs:enumeration value="US" />
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
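
Because an XML Schema is itself an XML document, it can be read with an ordinary XML parser. The sketch below uses a shortened copy of the Address schema and lists the elements it declares. It only inspects the schema and does not validate anything against it. `XS_NS`, `ADDRESS_XSD` and `declared_elements` are names invented for the example.

```ruby
require 'rexml/document'

XS_NS = 'http://www.w3.org/2001/XMLSchema'

# Shortened version of the Address schema, for illustration only.
ADDRESS_XSD = <<~XSD
  <xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Address">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Recipient" type="xs:string" />
          <xs:element minOccurs="0" name="County" type="xs:string" />
          <xs:element name="PostCode" type="xs:string" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>
XSD

# Lists the name of every xs:element declaration, marking optional ones.
def declared_elements(xsd)
  names = []
  REXML::Document.new(xsd).root.each_recursive do |node|
    next unless node.name == 'element' && node.namespace == XS_NS
    name = node.attributes['name']
    optional = node.attributes['minOccurs'] == '0'
    names << (optional ? "#{name} (optional)" : name)
  end
  names
end

puts declared_elements(ADDRESS_XSD)
```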

Page 13: CIT 383: Administrative Scripting

Ruby XML Parsers

REXML: Ruby Electric XML

– Standard with the Ruby language.

– Slow on large documents.

libxml-ruby

– Ruby bindings for the GNOME libxml2 XML toolkit.

– Very fast (30X as fast as REXML).

Hpricot

– Parses XML as well as HTML.

– Fast (3-4X as fast as REXML).

– Does not check for well-formedness or validity.

Page 14: CIT 383: Administrative Scripting

Types of Parsing

Tree Parsing (DOM-like)

– Good for small documents.

– Loads entire document into memory.

– Simple API.

Stream Parsing (SAX-like)

– Good for large documents.

– User defines callback methods, passes to API.

– Parser runs callback methods on pattern match.

Page 15: CIT 383: Administrative Scripting

Tree Parsing

Loads entire XML doc into memory.

require 'rexml/document'
include REXML

input = File.new('data.xml')
doc = Document.new(input)
root = doc.root

Search document as a tree using XPath.

doc.elements.each("ch/section") do |e|
  puts e.attributes["title"]
end
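
For trying this without a data.xml file, a self-contained variant parses a string instead. The chapter document and the names `CHAPTER_XML` and `section_titles` are made up for the illustration.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the contents of data.xml.
CHAPTER_XML = <<~XML
  <ch>
    <section title="What is XML?"/>
    <section title="XML Structure"/>
    <section title="REXML"/>
  </ch>
XML

# Loads the whole document into memory, then collects the title
# attribute of every <section> directly under <ch>.
def section_titles(xml)
  doc = REXML::Document.new(xml)
  titles = []
  doc.elements.each('ch/section') { |e| titles << e.attributes['title'] }
  titles
end

puts section_titles(CHAPTER_XML)
```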

Page 16: CIT 383: Administrative Scripting

Stream Parsing

Define listener class.

class MyListener
  include REXML::StreamListener

  # Called by the parser for every start tag it encounters.
  def tag_start(*args)
    puts "start: #{args.map { |x| x.inspect }.join(',')}"
  end
end

Invoke parser (in a real script the require lines must run before the class is defined).

require 'rexml/document'
require 'rexml/streamlistener'
include REXML

listen = MyListener.new
source = File.new('data.xml')
Document.parse_stream(source, listen)
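
A runnable variation on the same idea is a listener that counts start tags instead of printing them, fed from a string so no data.xml is needed. `TagCounter` and `count_tags` are names invented for this sketch.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Stream listener that tallies how often each tag name is opened.
class TagCounter
  include REXML::StreamListener

  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Callback run by the parser for each start tag; attrs is unused here.
  def tag_start(name, attrs)
    @counts[name] += 1
  end
end

# Streams the document through the listener without building a tree.
def count_tags(xml)
  listener = TagCounter.new
  REXML::Document.parse_stream(xml, listener)
  listener.counts
end

p count_tags('<todo><title>List</title><item>a</item><item>b</item></todo>')
```

Only the running tally is kept in memory, which is why this style suits large documents.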

Page 17: CIT 383: Administrative Scripting

XPath Searches

doc.search("p")
Find all paragraph tags in document.

doc.search("/html/body//p")
Find all paragraph tags within the body tag.

doc.search("//a[@src]")
Find all anchor tags with a src attribute.

doc.search("//a[@src='google.com']")
Find all a tags with a src attribute of google.com.
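
REXML has no `search` method, but similar queries can be sketched with `REXML::XPath.match`. The sample page and the helper `count_matches` are invented for the example, and it filters on `href`, the attribute anchors usually carry.

```ruby
require 'rexml/document'

# Small hypothetical page used only to exercise the queries.
PAGE_XML = <<~XML
  <html>
    <head><title>Links</title></head>
    <body>
      <p>See <a href="http://example.com/">this page</a>.</p>
      <div><p>Or <a name="top">no link here</a>.</p></div>
    </body>
  </html>
XML

# Number of nodes in the document matching an XPath expression.
def count_matches(xml, xpath)
  REXML::XPath.match(REXML::Document.new(xml), xpath).size
end

puts count_matches(PAGE_XML, '//p')                               # all paragraphs
puts count_matches(PAGE_XML, '/html/body//p')                     # paragraphs inside body
puts count_matches(PAGE_XML, '//a[@href]')                        # anchors with an href
puts count_matches(PAGE_XML, "//a[@href='http://example.com/']")  # anchors with that href
```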
