Why SGML (Retro Alert 1995)


Upload: joe-gollner

Post on 08-May-2015


Category:

Technology



DESCRIPTION

A presentation developed and delivered in 1995. It was designed to be part of a larger introduction to SGML. It is interesting today because it foregrounds many (if not all - and perhaps a few extra) of the themes being touched upon in discussions of Intelligent Content. It needed to be shared just in case someone thought that this was all new.

TRANSCRIPT

Page 1: Why SGML (Retro Alert 1995)

(1995)

Why SGML? The Need for SGML

[Diagram: tree structures with occurrence indicators (+, ?, *). Labels: document, title, Sub-title, para, list, figure; Course, Module; knowledge, information, data.]

First delivered: 1995

www.gollner.ca

Page 2: Why SGML (Retro Alert 1995)

What is SGML?

SGML stands for the Standard Generalized Markup Language

SGML is an international (ISO) standard

ISO 8879:1986 Information Processing - Text and Office Systems - Standard Generalized Markup Language (SGML)

Page 3: Why SGML (Retro Alert 1995)

What is SGML? Informal Definitions

SGML is a system- and processing-independent means of representing, creating, managing and exchanging information.

SGML is an “intelligent markup language” that protects the accessibility, usability, life expectancy and value of information.

Page 4: Why SGML (Retro Alert 1995)

Why SGML? A Meditation on a Paper Clip

The paper clip is a low-tech version of hypertext, facilitating the physical association of documents and fragments.

It is often used alongside electronic files, where such associations cannot easily be shown or enforced.

Page 5: Why SGML (Retro Alert 1995)

SGML was created to better manage documents

Publications

Training Manuals

Specifications

Documentation

Reports

Correspondence

Policies

Procedures

Standards

Plans

Directives

Commentaries

Proposals

Page 6: Why SGML (Retro Alert 1995)

Most Information is held in Documents

Document Information Database Information

10% 90%

IM Budget

Allocations 90% 10%

Page 7: Why SGML (Retro Alert 1995)

Structured Database Information

Formalized Processes

Relational Structure

Strict Definitions

Limited Access

Stable Organizational Boundaries

Limited Flexibility

Page 8: Why SGML (Retro Alert 1995)

Document Information

A Document is a meaningful organization of Information

A Document is meaningful because it is communicated between people to achieve specific goals

A Document combines multiple media types together in an organized, but not strictly predictable, form that people can use

Page 9: Why SGML (Retro Alert 1995)

Document Information Features

Multiple Dynamic Processes

Wide and Variable Access

Hierarchical Structure

Variable Definitions

Variable Organizational Boundaries

Page 10: Why SGML (Retro Alert 1995)

Document Information Conclusions

Document Information does not fit within the conventional Database paradigm

Database Information is organized according to the needs of the Computer

Document Information is organized according to the needs of the User

Few of the assumptions within the Database Paradigm apply to Documents

Page 11: Why SGML (Retro Alert 1995)

Document Management Technology Today

Page 12: Why SGML (Retro Alert 1995)

Documents and Computers

Computers help us create more paper faster

Computers help us format printed documents more efficiently and at less cost

Computers have not helped with the management consequences

Page 13: Why SGML (Retro Alert 1995)

The Document Explosion

The volume of documents is growing exponentially

The visibility of document-based transactions is increasing

The rise of the Internet and Enterprise Integration dramatically alters the potential user community of a document

Documents are becoming more complex, larger and more varied in format

Page 14: Why SGML (Retro Alert 1995)

Management Breakdown

Traditional Records Management practices and technologies cannot cope with the volume, complexity, or volatility of computer-generated documents

The typical response has been to extend the Database paradigm to document information

Given currently-used technology, the best that can be done is the “Electronic Filing Cabinet” (old tools made electronic - again)

Page 15: Why SGML (Retro Alert 1995)

What’s Wrong

Computers traditionally store documents as “objects”

Computers know very little (almost nothing) about these objects: some management information (author, version, date), little awareness of document content, less awareness of document structure

Computers can only associate some information with the objects as the objects have no inherent “intelligence”
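The “object” view described on this slide can be sketched in a few lines. This is only an illustration: the class name `StoredDocument`, its fields and the sample records are hypothetical, not taken from the presentation. It shows that the management information (author, version, date) can be queried, while the content itself is an opaque run of bytes about which the system can say nothing.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StoredDocument:
    """A document as a filing system sees it (hypothetical model)."""
    author: str
    version: int
    saved_on: date
    payload: bytes  # application-specific bytes the system cannot interpret


def find_by_author(store: list[StoredDocument], author: str) -> list[StoredDocument]:
    """Metadata is queryable only because it sits outside the payload."""
    return [doc for doc in store if doc.author == author]


# Made-up sample records; the payloads stand in for binary word processor files.
store = [
    StoredDocument("Smith", 2, date(1995, 3, 1), b"\xffWPC\x00\x01..."),
    StoredDocument("Jones", 1, date(1995, 4, 12), b"\xffWPC\x00\x02..."),
]

by_smith = find_by_author(store, "Smith")
```

A question such as “which documents contain a section on safety?” has no equivalent here, because nothing in `payload` is labelled as a section.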

Page 16: Why SGML (Retro Alert 1995)

New Technologies

Applications have evolved to redress some of these shortcomings

“Electronic Filing Cabinets” associate management information with document objects and physically control events

Full-Text Retrieval technologies have been used to access Document “Content”

Word Processors are used to infer the structure of documents based on format (styles and templates)
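Inferring structure from format amounts to a lookup from formatting to an assumed role. The sketch below is a rough illustration under stated assumptions: the style table, the role names and the sample text runs are all hypothetical, and real word processors expose styles in their own ways.

```python
# Hypothetical style table: (font, size in points, bold) -> assumed structural role.
STYLE_TO_ROLE = {
    ("Helvetica", 12, True): "chapter-title",
    ("Helvetica", 10, True): "section-title",
    ("Times", 8, False): "para",
}


def infer_role(font: str, size_pt: int, bold: bool) -> str:
    """Guess a structural role from formatting; unrecognised formatting stays unknown."""
    return STYLE_TO_ROLE.get((font, size_pt, bold), "unknown")


# Made-up text runs, each carrying only formatting, never a structural label.
runs = [
    ("Helvetica", 12, True, "Chapter Title"),
    ("Helvetica", 10, True, "Section Title"),
    ("Times", 8, False, "Body text of the first paragraph."),
    ("Helvetica", 11, True, "A heading typed at the wrong size"),
]

roles = [infer_role(font, size, bold) for font, size, bold, _text in runs]
```

The last run shows the weakness of the approach: a heading formatted slightly differently falls out of the table, so the inferred structure is only as reliable as the author's formatting discipline.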

Page 17: Why SGML (Retro Alert 1995)

Electronic Filing Cabinets

In an “Electronic Filing Cabinet” environment, management information is associated with these “objects”

Document objects that leave the sphere of control are no longer managed

[Diagram: document icons and a boundary labelled “Sphere of Control”]

Page 18: Why SGML (Retro Alert 1995)

Full-Text Retrieval

Create external indices of the textual content of a document

Various text indexing algorithms are used to support searches by word, by text string, proximity, exclusion and so on

Useful but imprecise as document volume increases

New technologies arising to improve search precision (lexicon-based, links to metadata)
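An external index of the kind this slide describes can be sketched as a positional inverted index. This is a minimal illustration, not any particular product's algorithm; the function names and the three sample documents are hypothetical. It supports search by word, by proximity and by exclusion.

```python
import re
from collections import defaultdict

# word -> {document id -> positions of the word in that document}
Index = dict[str, dict[str, list[int]]]


def build_index(docs: dict[str, str]) -> Index:
    """Build an index held outside the documents themselves."""
    index: Index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def search_word(index: Index, word: str) -> set[str]:
    """Documents containing the word."""
    return set(index.get(word.lower(), {}))


def search_near(index: Index, first: str, second: str, max_gap: int) -> set[str]:
    """Documents where the two words occur within max_gap positions of each other."""
    a = index.get(first.lower(), {})
    b = index.get(second.lower(), {})
    return {
        doc_id
        for doc_id in a.keys() & b.keys()
        if any(abs(i - j) <= max_gap for i in a[doc_id] for j in b[doc_id])
    }


def search_excluding(index: Index, wanted: str, unwanted: str) -> set[str]:
    """Documents containing one word but not the other."""
    return search_word(index, wanted) - search_word(index, unwanted)


# Made-up sample documents.
docs = {
    "memo-1": "The markup standard protects document structure.",
    "memo-2": "Structure is lost when a document is printed.",
    "memo-3": "Printed reports pile up.",
}
index = build_index(docs)
```

The index knows where words occur but not what role they play, so a hit on “structure” in a title and one in a footnote look the same. That is one reason precision degrades as volume grows.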

Page 19: Why SGML (Retro Alert 1995)

Word Processors

Evolving to include basic management information (profiles)

Evolving to include template structures (document types)

Management and structural information only accessible through Word Processor application (directly or via API)

These new Word Processing features are not generally used

Page 20: Why SGML (Retro Alert 1995)

Proprietary Documents

The basic problem is that traditional documents are produced and maintained in a proprietary and non-intelligent format

Electronic Documents are simply paper documents in a more reproducible form

Electronic Documents are printed for use

People retain and use hardcopy “files”

New Applications still assume a static environment and single format use

Page 21: Why SGML (Retro Alert 1995)

Proprietary Formats

Word Processing applications offer an enhanced implementation of the typewriter, the copy editor and the typesetter

Word Processing applications add formatting instructions to text, then execute those instructions to produce an output (operating system and printer interface)

Formatting Instructions are specific to the application that created them and the platform on which they were created

Page 22: Why SGML (Retro Alert 1995)

Procedural Markup: Processing Instructions

[Diagram: a sample page (Chapter Title, Section Title) annotated with formatting callouts: 12 pt. bold Helvetica; 10 pt. bold Helvetica; 8 pt. Times on 10 pt. leading (twice); 7 pt. Helvetica bold]

Page 23: Why SGML (Retro Alert 1995)


Proprietary Markup Typical of Word Processors

[Center][Und On]SGML[Und Off][Hrt]

[Hrt]

[Font: Helvetica 10pt]

[Indent]Introduction[Hrt]

[Hrt]

[Font: Times Roman 8pt]

[Tab]Someday [Italic On]information

[Italic Off] will be free.[Hrt]

Position

Style

Font
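To see what a program can and cannot learn from markup like the sample above, the bracketed codes can be separated from the text. The sketch below assumes codes are always written as `[Name]` with no nesting, which holds for the sample but is only an assumption about the format in general. The helper name `split_codes` is hypothetical.

```python
import re

# The sample from the slide, joined into one stream.
SAMPLE = (
    "[Center][Und On]SGML[Und Off][Hrt]"
    "[Hrt]"
    "[Font: Helvetica 10pt]"
    "[Indent]Introduction[Hrt]"
    "[Hrt]"
    "[Font: Times Roman 8pt]"
    "[Tab]Someday [Italic On]information"
    "[Italic Off] will be free.[Hrt]"
)

# Either a bracketed formatting code or a run of plain text.
TOKEN = re.compile(r"\[([^\]]+)\]|([^\[]+)")


def split_codes(stream: str) -> tuple[list[str], list[str]]:
    """Separate formatting codes from the text they decorate."""
    codes: list[str] = []
    text: list[str] = []
    for match in TOKEN.finditer(stream):
        code, chunk = match.groups()
        if code is not None:
            codes.append(code)
        else:
            text.append(chunk)
    return codes, text


codes, text = split_codes(SAMPLE)
```

Every extracted code concerns position, style or font. None of them says that “SGML” is a title or that “Introduction” is a heading, so that knowledge stays in the reader's head rather than in the file.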

Page 24: Why SGML (Retro Alert 1995)

Binary Storage Formats: Highly Proprietary and Optimized for Performance

[Slide shows a raw dump of a word processor's binary file. Apart from a “WPC” signature, font names (Courier 12pt (10cpi), CG Times (WN), Univers (WN)) and printer driver details (HP LaserJet III, HPLASIII.WRS), the contents are unreadable bytes.]

Page 25: Why SGML (Retro Alert 1995)

Proprietary Documents

Are proprietary to the originating software

Limit or obstruct cross-platform interchange

Are non-intelligent: they provide no consistent mechanism to determine document context, content, or structure, and no means to enhance automation

Support only one output rendering (print)

Will become obsolete

Information in an obsolete format is itself obsolete!

Page 26: Why SGML (Retro Alert 1995)

Portability Problems: Paper remains the format for Document Interchange

Page 27: Why SGML (Retro Alert 1995)

Low Document Intelligence: Marginal Automated Support for Business Processes

Lack of Document Intelligence prevents computers from providing effective document management or workflow support

Paper remains the working medium

[Diagram: a document icon with the labels “Approval” and “Review”]

Page 28: Why SGML (Retro Alert 1995)

Single Output Formats Create Additional Costs

[Diagram: word processor (WP) files with Proprietary Formatting yield Printed Documents; CD ROM, WWW and Database outputs are each reached through a separate Conversion ($)]

Page 29: Why SGML (Retro Alert 1995)

Obsolescence: Information must survive when Products become obsolete

Multimate

WPS Plus

Display Write

Lotus Manuscript

Lanier

Wang

Mass-11

WPS-8

CPT

Word-11

NBI Legend

Xywrite

Where are they now?

Page 30: Why SGML (Retro Alert 1995)

Summary

Traditional computing technology and management practices are failing to cope with the increasing volume of documents

Non-Intelligent, Proprietary document formatting restricts document manageability, portability, utility, quality, affordability, suitability for multi-format publishing, and longevity.

Business is therefore conducted in paper!

Page 31: Why SGML (Retro Alert 1995)

Are your information assets frozen in Proprietary Formats?