MHE
MHE - the print2image2Internet consultants
Combined XML, SGML Issues
William J. ‘Bill’ McCalpin
MIT, LIT, CDIA, EDP
AIIM 2002 - March 6, 2002
About MHE
• MHE is the “print2image2Internet” consulting firm
• MHE’s principals have nearly 40 years of experience in electronic print streams, in taking electronic print streams to imaging systems, and now in taking legacy information to the Internet
• See http://www.mhe-consulting.com
About the Speaker
• William J. ‘Bill’ McCalpin is a principal at MHE
• Mr. McCalpin was the first - and for years the only - person in the world to have the MIT, LIT, CDIA, and EDP designations
• Mr. McCalpin serves on the AIIM Accreditation Committee and AIIM Conference Committee
About the Speaker (cont.)
• Mr. McCalpin is on the Xplor Board of Directors and is Treasurer
• Mr. McCalpin recently completed a two-year stint as Xploration Editor-in-Chief
• Mr. McCalpin is a frequent speaker at both AIIM and Xplor
What Do You Say When They Ask You,
“When Are You Going To Support XML?”
But The Real Question Is, “Why Should I Support XML?”
Agenda
• What is XML?
• What do we do in “e-Business”?
• When do you want to use XML?
• The Right Way and the Wrong Way to use XML
• The Flow of Information
• The XML Bubble
• The answer to “when” and “why”
XML And SGML
• XML is eXtensible Markup Language
• XML is a simplified subset of SGML, the Standard Generalized Markup Language, an ISO standard (ISO 8879)
• XML is “extensible” because it has no fixed set of tags: people and enterprises with common interests get together to define the tags that describe their data
XML and HTML
• HTML is a tagged language, but its tags are a fixed set of several dozen “grammatical” tags like <p> or <h1>
• XML is a tagged language, and its tags are (usually) created and agreed upon by “domains” or vertical industry segments, e.g., <account_number> or <city>
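A minimal Python sketch of the difference, using only the standard library's xml.etree.ElementTree. The customer record and its tag names are hypothetical. With XML, a program can ask for a field by name. With HTML, it can only rely on where the text happens to sit.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer record; the tag names are invented for illustration.
XML_RECORD = """<customer>
  <account_number>12345</account_number>
  <city>Richardson</city>
</customer>"""

# The same two facts marked up in HTML: <p> says "paragraph", not "city".
HTML_RECORD = "<div><p>12345</p><p>Richardson</p></div>"


def city_from_xml(doc: str) -> str:
    # The tag identifies the field, so we can ask for it by name.
    return ET.fromstring(doc).findtext("city")


def city_from_html(doc: str) -> str:
    # With HTML we can only rely on position ("the second paragraph"),
    # which silently gives the wrong answer if the layout changes.
    return ET.fromstring(doc).findall("p")[1].text


print(city_from_xml(XML_RECORD))    # Richardson
print(city_from_html(HTML_RECORD))  # Richardson
```

If the two paragraphs are swapped, the HTML lookup returns the account number instead of the city. The XML lookup is unaffected by element order.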
The ‘Document’
• A document is “an organized collection of information in time”
• A document contains information which can be understood by human or machine, and has validity at some period in time
• The information in a document can be organized in many ways - as text, bitmaps, print streams, tagged languages, etc.
The New Document
• Per this definition, the document:
– does not depend on which organization of the information is used (so long as author and recipient agree)
– does not depend on the medium (paper, film, optical, magnetic, or even parchment are all fine)
– does not have to have presentation information, because the recipient may be a machine
Three Parts of an XML ‘Document’
Tagged Data (in XML)
Presentation (in XSL or CSS)
Tag Definitions (in DTD or Schema)
The XML Document
• Data - data values bounded by XML tags
• Presentation:
– CSS - Cascading Style Sheets, as used for HTML
– XSL - formatting information written in XML
• Tag Definitions:
– DTD - Document Type Definitions, the old SGML definition method
– Schema - definitions written in XML
Data In the XML Document
• Data is the purpose of an XML document
• Each piece of data is specifically identified by a tag
• Data is organized because the tags match patterns in the DTD or Schema
• An example of data in XML:
Data Example in XML
<AUTHOR>
  <NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
  <JOBTITLE>Principal</JOBTITLE>
  <AFFILIATION>MHE</AFFILIATION>
  <ADDRESS>
    <STREET>1400 Cheyenne Dr.</STREET>
    <CITY>Richardson</CITY>
    <STATE>Texas</STATE>
    <ZIPCODE>75080</ZIPCODE>
    <EMAIL>[email protected]</EMAIL>
  </ADDRESS>
</AUTHOR>
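A short Python sketch (standard library only) of how a receiving program might consume the example above. The data is the slide's, except that the <EMAIL> element is omitted. Note that XML tag names are case-sensitive.

```python
import xml.etree.ElementTree as ET

# The author record from the slide, minus the <EMAIL> element.
AUTHOR_XML = """<AUTHOR>
  <NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
  <JOBTITLE>Principal</JOBTITLE>
  <AFFILIATION>MHE</AFFILIATION>
  <ADDRESS>
    <STREET>1400 Cheyenne Dr.</STREET>
    <CITY>Richardson</CITY>
    <STATE>Texas</STATE>
    <ZIPCODE>75080</ZIPCODE>
  </ADDRESS>
</AUTHOR>"""

author = ET.fromstring(AUTHOR_XML)

# Each value is addressed by its tag path, not by where it sits on a page.
city = author.findtext("ADDRESS/CITY")
zip_code = author.findtext("ADDRESS/ZIPCODE")

# Tag names are case-sensitive: "city" does not match <CITY>.
lowercase_lookup = author.findtext("ADDRESS/city")

print(city, zip_code)    # Richardson 75080
print(lowercase_lookup)  # None
```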
Presentation in XML
• Tags in XML have no built-in formatting (unlike HTML), so if presentation is needed, it must be explicitly defined
• CSS can be used with both HTML and XML
• XSL is itself written in XML, so it can be read by any XML parser; its transformation language, XSLT, converts XML into other formats such as HTML
• XSL example:
Presentation Example
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="AUTHOR">
    <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0"...
      <TR>
        <TD COLSPAN="2">
          <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0"...
          <FONT COLOR="#000000"><xsl:value-of select="NAME"/></FONT>
        </TD>
        ...
  </xsl:template>
</xsl:stylesheet>
Why Two Style Sheet Languages?
Style sheet language | CSS | XSL
Can be used with HTML? | Yes | No
Can be used with XML? | Yes | Yes
Transformation language? | No | Yes
Syntax | CSS | XML
DTD/Schema in XML
• The DTD is the “old” (SGML) way of defining not only what tags are valid, but their relative order, number, mandatory/optional attributes, and so on
• The Schema is a total rewrite - written in XML itself - which defines all of the above as well as possible legal values for a tag (e.g., integer, date, days of the week, etc.)
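To make the difference concrete: a DTD can only declare that an element holds character data (#PCDATA), whereas a Schema can also say what kind of value is legal. The Python sketch below hand-codes such rules for a hypothetical <order> record. The tag names, the RULES table, and the helper functions are assumptions for illustration. A real system would use a validating parser that reads the rules from the Schema itself.

```python
import xml.etree.ElementTree as ET
from datetime import date

DAYS = {"Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"}


def is_integer(text: str) -> bool:
    try:
        int(text)
        return True
    except ValueError:
        return False


def is_date(text: str) -> bool:
    # ISO 8601 calendar date such as 2002-03-06
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


# Hypothetical per-tag value rules, standing in for Schema datatypes.
RULES = {
    "quantity": is_integer,
    "due_date": is_date,
    "delivery_day": lambda text: text in DAYS,
}


def invalid_fields(xml_text: str) -> list:
    """Return the tags of child elements whose values break their rule."""
    record = ET.fromstring(xml_text)
    return [child.tag for child in record
            if child.tag in RULES and not RULES[child.tag](child.text or "")]


ORDER = """<order>
  <quantity>twelve</quantity>
  <due_date>2002-03-06</due_date>
  <delivery_day>Friday</delivery_day>
</order>"""

print(invalid_fields(ORDER))  # ['quantity']
```

Under a DTD, all three values would be equally acceptable character data. Only typed rules catch "twelve" as an invalid quantity.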
Schema Example (in Microsoft's XML-Data Reduced, or XDR, syntax, which predates the W3C XML Schema Recommendation)
<?xml version="1.0"?>
<Schema name="sample_schema" ...>
  ...
  <!-- ********** Element Types ************ -->
  <!-- *** data *** -->
  <ElementType name="author">
    <element type="name" minOccurs="1" maxOccurs="1"/>
  </ElementType>
  ...
</Schema>
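The minOccurs/maxOccurs pair in the example says that an author must contain exactly one name. Below is a hand-rolled Python check of that rule, for illustration only. The helper occurs_ok is hypothetical; real validation would be done by a schema-aware parser. Passing max_occurs=None plays the role of XDR's maxOccurs="*".

```python
import xml.etree.ElementTree as ET


def occurs_ok(parent, child_tag, min_occurs, max_occurs=None):
    """Check a minOccurs/maxOccurs-style rule; max_occurs=None means unbounded."""
    count = len(parent.findall(child_tag))
    return count >= min_occurs and (max_occurs is None or count <= max_occurs)


one_name = ET.fromstring("<author><name>Bill McCalpin</name></author>")
no_name = ET.fromstring("<author/>")
two_names = ET.fromstring("<author><name>A</name><name>B</name></author>")

# The slide's rule: exactly one <name> inside <author>.
for doc in (one_name, no_name, two_names):
    print(occurs_ok(doc, "name", 1, 1))  # True, then False, then False
```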
What is “e-Business”?
• Of course, e-Business is really just doing business using 100% electronic methods such as the Internet
• In e-Business, we do transactions or exchange information using electronic media rather than the usual paper media
• e-Business can be broken down into two parts:
– B2C
– B2B
B2C
• B2C is “Business to Consumer”
• Your business generates the information, and a consumer receives it
• The consumer is normally interested only in the data and its presentation
• Thus, in this scenario, the consumer needs only an XML document and CSS/XSL - which is more or less the same as HTML!
Important Fact #1
• When you are engaged in B2C, and the recipient is a consumer with a “thin” client, then HTML is usually sufficient
– Supplying the data in XML is usually a waste of time, because the recipient gets no additional value from the XML over HTML
– XHTML is simply HTML that follows XML syntax rules
B2B
• B2B is “Business to Business”
• Your business generates the information, and another business receives it
• Frequently, the recipient is not a person, but a software process in the business
• Thus, in this scenario, the recipient often needs only the XML data and the reference to the DTD or Schema - no presentation may be needed!
Important Fact #2
• When you are engaged in B2B, and the recipient is a software process, then XML is often the most appropriate format
– Binary data formats may be smaller, but will require more work and more maintenance
– Don’t send presentation information unless the recipient actually wants your presentation information!
When Do I Use XML?
• As we have seen, XML is best suited for the preservation of the “author’s” content
• And (X)HTML is best suited for presentation of information to an end user
• And this leads us to...
Important Fact #3
• In today’s market:
– XML is better utilized when communicating with a “thick” client - that is, most B2B in which a software process is the recipient
– (X)HTML is better utilized when communicating with a “thin” client - that is, most B2C in which an Internet browser is the recipient
• And when is this not true?
Exceptions to Fact #3
• XML can be used in B2C when the browser is used with so much Java and other local applications that the overall process resembles a thick client
• (X)HTML can be used in B2B if the recipient is just a human being rather than a software process, e.g., when information is transmitted only to be viewed
CML: Chemical Markup Language
• One of the early “vertical” implementations of XML
• The official site is http://www.xml-cml.org/
• A “better” site is http://www.ch.ic.ac.uk/chimeral/
• CML uses the trio of tagged data, Schema, and XSL
A CML XML Document
<molecule title="caffeine" id="mol_caffeine">
<formula>C8 H10 N4 O2</formula>
<string title="CAS">58-08-2</string>
...
</molecule>
CML Data
The CML Schema
<?xml version="1.0"?>
<Schema name="cml_dev_karne" xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes">
  ...
  <!-- ********** Element Types ************ -->
  <!-- *** data *** -->
  <ElementType name="molecule" content="eltOnly" model="open" order="many">
    <element type="formula" minOccurs="0" maxOccurs="*"/>
    ...
CML Schema
A CML Stylesheet
<xsl:template match="molecule">
  <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0" CELLPADDING="3" BORDERCOLOR="#CCCCFF" BGCOLOR="#EEEEFF">
    <TR>
      <TD COLSPAN="2">
        <FONT COLOR="#0000AA">Formula
        <FONT COLOR="#000000"><xsl:value-of select="formula"/></FONT></TD><TD>
        ...
CML XSL
The CML Document
• Note that each data item is tagged
• Note that each tag matches the standard Schema
• Note that the data is used to create a complex image in the browser - but not the only possible image!
caffeine.xml
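The point that the same data can yield more than one “image” can be sketched in Python. The sketch stands in for XSL with plain string formatting; it is not XSLT. It uses only the parts of the caffeine example shown on the slide. The two output formats, and the function names as_html_row and as_text_line, are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# The slide's CML fragment, without the elided parts.
CML = """<molecule title="caffeine" id="mol_caffeine">
  <formula>C8 H10 N4 O2</formula>
  <string title="CAS">58-08-2</string>
</molecule>"""

mol = ET.fromstring(CML)
name = mol.get("title")
formula = mol.findtext("formula")
cas = mol.find("string[@title='CAS']").text  # select by attribute value


def as_html_row() -> str:
    # One rendering: a table row for a browser (values escaped for HTML).
    cells = "".join(f"<td>{html.escape(v)}</td>" for v in (name, formula, cas))
    return f"<tr>{cells}</tr>"


def as_text_line() -> str:
    # Another rendering of the same data: one line for a small screen.
    return f"{name}: {formula} (CAS {cas})"


print(as_html_row())   # <tr><td>caffeine</td><td>C8 H10 N4 O2</td><td>58-08-2</td></tr>
print(as_text_line())  # caffeine: C8 H10 N4 O2 (CAS 58-08-2)
```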
A Print to XML/HTML Conversion
• A print stream contains no metadata, only data and presentation information
• Meaningful tags cannot be produced unless the meaning of the data is reverse-engineered from the print stream
• The result might be only the tagged data and the stylesheet
• Too often, the “XML” produced looks like this (it is really positional HTML and CSS):
Bad XML Example
/* text positioning information */
.ps0{position:absolute;top:533px;left:29px;width:40px;}
.ps1{position:absolute;top:533px;left:317px;width:38px;}
.ps2{position:absolute;top:533px;left:454px;width:90px;}
...
/* font properties information */
.ft1{font-weight:bold;font-size:22px;}
.ft2{font-size:17px;}
.ft3{font-size:11px;}
<!-- text starts here -->
<SPAN CLASS="ps0"><NOBR>Account Number</NOBR></SPAN>
<SPAN CLASS="ps1"><NOBR>12345</NOBR></SPAN>
<SPAN CLASS="ps2"><NOBR>Name</NOBR></SPAN>
...
bad HTML example.html
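A small Python sketch shows why this output is a dead end. To recover the account number from a fragment of the markup above, a program must know the layout convention that the value follows its label. Nothing in the markup says what “12345” is. The regular expression and the account_number helper are assumptions that only fit this fixed fragment; this is not a general HTML parser.

```python
import re

# A fragment of the positional output shown above.
POSITIONAL_HTML = """<SPAN CLASS="ps0"><NOBR>Account Number</NOBR></SPAN>
<SPAN CLASS="ps1"><NOBR>12345</NOBR></SPAN>
<SPAN CLASS="ps2"><NOBR>Name</NOBR></SPAN>"""

SPAN = re.compile(r'<SPAN CLASS="(\w+)"><NOBR>(.*?)</NOBR></SPAN>')


def account_number(page: str):
    """Guess the account number by assuming it follows its label."""
    texts = [text for _css_class, text in SPAN.findall(page)]
    try:
        return texts[texts.index("Account Number") + 1]
    except (ValueError, IndexError):
        return None  # label missing or nothing after it


print(account_number(POSITIONAL_HTML))  # 12345
```

With properly tagged XML, the same lookup would be a single query for an <account_number> element, and it would keep working when the page layout changes.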
An Image to XML Example
• Most of the information may remain untagged:
<invoice>
  <account_no>12345</account_no>
  <name>Bill McCalpin</name>
  <data>70 02 02 02 02 FE A7 47 47 48 03 F9 A7 42 27 4A 74…</data>
</invoice>
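Below is a Python sketch of such a partially tagged invoice. It assumes the image bytes are carried as Base64 text, a common choice for binary data in XML; the slide shows the bytes in hex instead. Only the tagged fields can be queried. Whatever the image shows stays opaque without OCR or similar processing. The <amount_due> lookup is hypothetical and is there to show what is missing.

```python
import base64
import xml.etree.ElementTree as ET

# The first bytes shown on the slide, standing in for a scanned image.
image_bytes = bytes.fromhex("70 02 02 02 02 FE A7 47 47 48 03 F9")

invoice = ET.Element("invoice")
ET.SubElement(invoice, "account_no").text = "12345"
ET.SubElement(invoice, "name").text = "Bill McCalpin"
# Binary data must become text to live inside an XML element.
ET.SubElement(invoice, "data").text = base64.b64encode(image_bytes).decode("ascii")

doc = ET.tostring(invoice, encoding="unicode")
parsed = ET.fromstring(doc)

print(parsed.findtext("account_no"))  # 12345
# Any amount printed on the image is inside the bytes; no tag identifies it.
print(parsed.find("amount_due"))      # None
```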
The Flow of Information
• E-Business is about the flow of information between parties as well as within the enterprise
• Traditionally, as information moves through the business process, we lose as much information as we add
• Look at how we used to treat information:
As Information Flow Used to Be
[Diagram: Generation (data + data awareness/metadata) → Composition (data + presentation information) → Distribution (toner on paper) → Archival (raster image)]
As Information Flow Used To Be
[Diagram: the composer “zaps” the data awareness (metadata), passing on only data and presentation information; the output is toner on paper, which is later scanned into the archive as bits (X’010101’)]
As Information Flow Is Today
[Diagram: Generation (data + data awareness/metadata) → Composition (data + presentation information) → Distribution, push or pull (data + presentation information + distribution metadata) → Archival in a presentation format such as PDF (data + presentation information)]
As Information Flow Is Today
[Diagram: the composer still “zaps” the data awareness (metadata); the remaining data and presentation information become text and graphics, which are transformed into web pages, emails, etc.]
As Information Flow Should Be
[Diagram: Generation (data + data awareness/metadata) → Composition (data + data awareness + presentation information) → Distribution, push or pull (data + data awareness + presentation information + distribution metadata) → Archival in XML (data + data awareness + presentation information + distribution metadata)]
As Information Flow Should Be
[Diagram: data, data awareness (metadata), and presentation information travel together as complete XML documents, from which the outputs for the user, WAP devices, web pages, the archive, and paper are produced]
Or, As In The XML Bubble...
[Diagram: inside the XML bubble, data and metadata pass from process to process; presentation is added on the way out to web pages, the archive, and cell phones, and B2B applications are also fed from the bubble]
Important Fact #4
• Use XML to delay the loss of important information
• Don’t throw away information until you commit the document to a final format which can’t support it
• In other words, keep the information in XML as long as possible
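Below is a minimal Python sketch of the idea, with a hypothetical bill and hypothetical output functions. The bill stays as tagged data, and presentation is added only at the last step, separately for each channel. Once a channel's output is produced, information is lost: the phone line below has no currency. That is exactly why the conversion should happen as late as possible.

```python
import html
import xml.etree.ElementTree as ET

# One hypothetical bill, kept as data plus metadata.
BILL = """<bill>
  <account_no>12345</account_no>
  <name>Bill McCalpin</name>
  <amount_due currency="USD">42.50</amount_due>
</bill>"""


def to_web_page(bill) -> str:
    # Browser presentation is added only at this final step.
    amount = bill.find("amount_due")
    return (f"<p>Account {html.escape(bill.findtext('account_no'))}: "
            f"{html.escape(amount.text)} {html.escape(amount.get('currency'))} due</p>")


def to_phone(bill) -> str:
    # A terse rendering for a small screen; the currency is dropped.
    return f"{bill.findtext('account_no')} due {bill.findtext('amount_due')}"


def to_b2b(bill) -> str:
    # A software recipient gets the tagged data itself, with no presentation.
    return ET.tostring(bill, encoding="unicode")


bill = ET.fromstring(BILL)
print(to_web_page(bill))  # <p>Account 12345: 42.50 USD due</p>
print(to_phone(bill))     # 12345 due 42.50
```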
[Figure, built up over several slides: enterprise processes (policy print, reports, 1:1 marketing, billing, EDI, compliance, campaign management, CRM, policies & procedures, archive, notices, new sales, HR, reprints) shown first as separate systems, then connected through a shared “XML Bubble”, with EBPP (electronic bill presentment and payment) also shown]
Today’s Billing Process + XML
[Diagram: billing extract, database, print/format, and post-process steps, with an XML application added to the process]
[Diagram: XML applications with business rules feeding several output drivers, one of them email]
Why Should I Support XML?
• I should support XML in B2B, unless the recipient wants only to view my presentation
• I should support (X)HTML in B2C, unless the recipient has a thick client which can utilize the XML (cf. Quicken and OFX)
How Should I Use XML?
• Once information is in XML, I should keep it there as long as possible
• I should use industry accepted DTDs and Schemas
• I shouldn’t even think of “well-formed” XML (syntactically correct but no DTD/Schema) as real XML, to avoid confusion
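The distinction can be sketched in Python. The standard library's ElementTree checks only well-formedness; it does not validate against a DTD or Schema. The “agreed vocabulary” check below is therefore a hypothetical stand-in for what a validating parser would do. A document with made-up tags passes the syntax check but means nothing to a trading partner.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Syntax check only: no DTD or Schema is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical vocabulary standing in for an industry DTD/Schema.
AGREED_TAGS = {"invoice", "account_no", "name"}


def uses_agreed_tags(text: str) -> bool:
    return all(element.tag in AGREED_TAGS for element in ET.fromstring(text).iter())


GOOD = "<invoice><account_no>12345</account_no></invoice>"
MADE_UP = "<invoce><acct>12345</acct></invoce>"   # well-formed, wrong tags
BROKEN = "<invoice><account_no>12345</invoice>"   # not even well-formed

for doc in (GOOD, MADE_UP, BROKEN):
    # GOOD: True True; MADE_UP: True False; BROKEN: False False
    print(is_well_formed(doc), is_well_formed(doc) and uses_agreed_tags(doc))
```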
A Final Note
• The World Wide Web Consortium (www.w3c.org) is the standards body for the generic protocols of XML, such as XML syntax itself, XSL, RDF, etc.
• Most “domain” or vertically centric XML definitions are supported by the verticals themselves, e.g., CML, GEML (Gene Expression Markup Language), etc.
A Final Note, Part Deux
• At www.xml.org, there are nearly 100 Schema/DTDs listed from 31 different industries, from AIML (Astronomical Instrument Markup Language) to RecipeML (Recipe Markup Language) – yes, XML for the kitchen.
• Also see Robin Cover’s excellent work at xml.coverpages.org/sgml-xml.html
Contact Information
William J. ‘Bill’ McCalpin MIT, LIT, CDIA, EDP
Principal, MHE
1400 Cheyenne Dr.
Richardson, Texas 75080-3921 USA
(972) 231-3660 (v)
(972) 690-4521 (f)
[email protected]
www.mhe-consulting.com