8/4/2019 Differences Between HTML and XHTML
Differences Between HTML and XHTML
This page is currently being revised. Some information is incomplete or missing.
Please note that the information in here is based upon the current spec for (X)HTML5. Some of the issues technically do not apply to previous versions of HTML.
Although HTML and XHTML appear to have similarities in their syntax, they are
significantly different in many ways.
Note: As the current WHATWG document is a draft, this section will need to
track a moving target.
Overlap Language
There is a community who find it valuable to be able to serve HTML5 documents which are also valid XML documents. They may, for example, use XML tools to generate the document, and they and others may process the document using XML tools. These
documents are served as text/html.
This language is sometimes called "polyglot". It is the overlap language of documents
which are both HTML5 documents and XML documents. Guidelines are listed below for how one can construct such a polyglot document which will work in either environment.
Besides following the well-formedness rules of XML, there are some other restrictions to
which one must adhere (for the sake of text/html documents).
This wiki web page is an example of such a document. You can parse it with an XML parser or an HTML parser.
MIME Types
Each entry below gives the feature, the HTML requirement, the XHTML requirement, and notes.

Feature: MIME type
HTML requirement: Must use text/html.
XHTML requirement: Must use an XML MIME type, such as application/xml or application/xhtml+xml.
Notes: It is the MIME type that determines what type of document you are using. Any document, including a document authored with the intention of being XHTML, served as text/html is technically an HTML document.
purposes.

Feature: Error handling
HTML requirement: HTML does not have a well-formedness constraint; no errors are fatal. Graceful error handling and recovery procedures are thoroughly defined.
XHTML requirement: Well-formedness errors are fatal.
Notes: Ensure there are no well-formedness errors.
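
The contrast in error handling can be sketched with Python's standard library. This is only an illustration: `html.parser` is a lenient tokenizer, not a conforming HTML5 parser, and the helper names `xml_is_well_formed` and `html_start_tags` are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start tags; the lenient parser never aborts on bad markup."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def xml_is_well_formed(markup: str) -> bool:
    """Return False when the XML parser hits a (fatal) well-formedness error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def html_start_tags(markup: str) -> list:
    """Tokenize markup leniently and return the start tags that were seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


# Unclosed <p> elements and a stray </b>: fatal for XML, tolerated as HTML.
broken = "<p>first<p>second</b>"
print(xml_is_well_formed(broken))
print(html_start_tags(broken))
```

A polyglot document has to pass the stricter of the two checks, so validating with an XML parser before publishing is a reasonable first step.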

Feature: Character encoding (including XML declaration, meta)
HTML requirement: The XML declaration is forbidden (treated as a bogus comment, but such style of comments are deprecated), but the meta element with a charset attribute may be used instead. If the encoding is unspecified in HTML, it should be determined through implementation-specific heuristics or fall back to a default value (Note: this section of the spec is not yet finished).
XHTML requirement: The XML declaration may be used to specify the character encoding, while meta is only allowed as case-insensitive "UTF-8" (and is ignored if included). The default character encoding for XHTML is, according to XML rules, UTF-8 or UTF-16.
Notes: If you need to include XML 1.1-only markup, if you do not wish to convert the encoding of the document to UTF-8 or UTF-16 (since use of other encodings also requires a declaration), or if you wish to define an external SYSTEM DTD in the DOCTYPE but use standalone=yes (redundant?), you must use an XML Declaration for XHTML, but this may not be allowable in the future in HTML. For future compatibility, it would be best to avoid XML 1.1-only markup, convert to UTF-8 or UTF-16 (probably UTF-8, which could allow use of a meta tag), and avoid use of a SYSTEM DTD (rendering the standalone=yes unnecessary), respectively. Do not use a meta tag, unless it is UTF-8 (and included in the first 512 bytes of the document), in which case it is probably a good idea to include it for the sake of HTML (as ) in case you cannot specify such in a content header.
See also: http://wiki.whatwg.org/wiki/FAQ#How_do_I_specify_the_character_encoding.3F
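
The "first 512 bytes" advice can be checked mechanically. The sketch below is a rough heuristic, not the encoding-detection algorithm from the spec: it assumes the declaration is written in the `charset` attribute form, and the function name `has_early_utf8_meta` is hypothetical.

```python
import re

# Assumption: only the <meta charset="..."> form is recognized here.
META_CHARSET = re.compile(
    rb"""<meta\s+charset\s*=\s*["']?utf-8["']?""", re.IGNORECASE
)


def has_early_utf8_meta(document: bytes, limit: int = 512) -> bool:
    """True if a UTF-8 meta charset declaration ends within the first `limit` bytes."""
    match = META_CHARSET.search(document)
    return match is not None and match.end() <= limit


early = (
    b'<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml">'
    b'<head><meta charset="UTF-8" /><title>t</title></head><body></body></html>'
)
late = b"<html><head>" + b" " * 600 + b'<meta charset="utf-8" /></head></html>'

print(has_early_utf8_meta(early))
print(has_early_utf8_meta(late))
```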

Feature: Namespaced elements
HTML requirement: Elements and attributes for known vocabularies (HTML, SVG and MathML) are implicitly assigned to appropriate namespaces, according to the rules specified in the parsing algorithm. Elements in the HTML, SVG, or MathML namespaces may have an xmlns attribute explicitly specified, if, and only if, it has the exact value "http://www.w3.org/1999/xhtml" (see namespace declaration, http://wiki.whatwg.org/wiki/FAQ#What_is_the_namespace_declaration.3F). The attribute has absolutely no effect. It is basically a talisman. It is allowed merely to make migration to and from XHTML mildly easier. When parsed by an HTML parser, the xmlns attribute itself ends up in no namespace. Foreign elements are also not treated as being in another namespace and will have no effect except for displaying by default as inline elements (and be aware that self-closing elements cannot be used as such since unrecognized elements will be treated as though they are non-void; thus one cannot, for example, type in HTML or it will be treated as though there is no immediate closing tag). Namespaced prefixes are not allowed on HTML elements; a prefixed xmlns attribute cannot be used even if it is defined in the XHTML namespace.
XHTML requirement: The XHTML namespace must be declared for HTML elements according to the rules defined by the Namespaces in XML specification (http://www.w3.org/TR/REC-xml-names/). Namespaces must be explicitly declared. The xmlns attribute ends up in the "http://www.w3.org/2000/xmlns/" namespace. Foreign elements can be used independently of HTML elements, as long as they are assigned to their own namespace.
Notes: Declare HTML namespaces (or other namespaces) explicitly and do not prefix XHTML elements. Do not depend on the behavior of foreign namespaced elements in an HTML setting; if you need to include these, you will probably wish to set this foreign markup via CSS to display:none. You should explicitly close (not self-close) all empty elements defined in a non-XHTML namespace, since otherwise, when used in HTML, HTML will treat them as though they have not been closed.
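
On the XML side, the effect of declaring (or omitting) the XHTML namespace can be observed with Python's `xml.etree.ElementTree`, which stores the namespace inside each tag name. This is a minimal sketch; the helper `namespace_of` is hypothetical, and the HTML parser's implicit namespace assignment is not modelled here.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Same markup, with and without the explicit namespace declaration.
declared = ET.fromstring(f'<html xmlns="{XHTML_NS}"><body/></html>')
undeclared = ET.fromstring("<html><body/></html>")


def namespace_of(element):
    """Return the namespace URI of an element, or None if it has none."""
    tag = element.tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


print(namespace_of(declared))      # the XHTML namespace URI
print(namespace_of(declared[0]))   # children inherit the default namespace
print(namespace_of(undeclared))    # None: an XML parser implies nothing
```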

Feature: Namespaced attributes on HTML elements
HTML requirement: Attributes of the form xmlns:prefix may not be used on HTML elements.
XHTML requirement: The xmlns:prefix attributes end up in the "http://www.w3.org/2000/xmlns/" namespace.
Notes: Do not use namespaced attributes on HTML elements. Do not depend on the behavior of foreign attributes in an HTML setting.

Feature: Namespace attributes on foreign elements
HTML requirement: Elements in the SVG namespace may have an xmlns attribute specified, if, and only if, it has the exact value "http://www.w3.org/2000/svg". The attribute is optional because the namespace is implied during parsing. Elements in the MathML namespace may have an xmlns attribute specified, if, and only if, it has the exact value "http://www.w3.org/1998/Math/MathML". The attribute is optional because the namespace is implied during parsing. Foreign elements may also have an xmlns:xlink attribute specified, if, and only if, it has the exact value "http://www.w3.org/1999/xlink". This attribute is optional, even if XLink attributes are used, because the namespace for XLink attributes is implied during parsing. When parsed by an HTML parser, the xmlns and xmlns:xlink attributes end up in the "http://www.w3.org/2000/xmlns/" namespace.
XHTML requirement: The SVG and MathML namespaces must be declared for SVG and MathML elements, respectively, according to the rules defined by Namespaces in XML. The xmlns and xmlns:prefix attributes end up in the "http://www.w3.org/2000/xmlns/" namespace.

Feature: XLink attributes
HTML requirement: Foreign elements may use the attributes xlink:actuate, xlink:arcrole, xlink:href, xlink:role, xlink:show, xlink:title and xlink:type. These attributes are placed in the "http://www.w3.org/1999/xlink" namespace. The prefix used must be "xlink".
XHTML requirement: XLink attributes may be specified on foreign elements using any prefix, subject to the conformance rules defined by Namespaces in XML. The XLink namespace must be declared according to the conformance rules defined by Namespaces in XML if XLink attributes are used within the document.
Notes: Do not use XLink attributes on HTML elements and do not depend on them on foreign elements, as they will not work as such in HTML. If they are being used, ensure they have the appropriate XLink namespace defined.

Feature: XML attributes
HTML requirement: Foreign elements may use the attributes xml:lang, xml:id, xml:base and xml:space. These attributes are placed in the "http://www.w3.org/XML/1998/namespace" namespace. The prefix used must be "xml". HTML elements may use the xml:lang attribute. The attribute in no namespace with no prefix and with the literal localname "xml:lang" has no effect on language processing (unlike "lang", which does). HTML elements must not use the xml:base, xml:space, or xml:id attributes.
XHTML requirement: Any element, including HTML elements, may use the attributes xml:lang, xml:id, xml:base and xml:space. These attributes are placed in the "http://www.w3.org/XML/1998/namespace" namespace. The prefix used must be "xml".
Notes: Though they can be used on foreign elements, do not use xml:base, xml:id, or xml:space on HTML elements; use both xml:lang and lang attributes whenever one is needed on HTML elements.

Feature: Attributes
HTML requirement: Names are not case sensitive. Attribute minimization is allowed (i.e. omitting the equals sign and the value).
XHTML requirement: Names are case sensitive (and lower case). Attribute minimization is not allowed.
Notes: Use lower case attribute names. Do not minimize attributes. Non-namespaced attributes not belonging to HTML will be included in the DOM tree and accessible to script and stylesheets, but it is discouraged to use these due to the potential for future naming conflicts; data- attributes can be used instead, or, if in an XML-only environment, namespaced attributes.
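
Both attribute rules can be seen in Python's standard parsers. This is an illustrative sketch only: `html.parser` is a lenient tokenizer rather than a conforming HTML5 parser, and the names `AttrCollector`, `html_attrs` and `xml_attrs` are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects (name, value) pairs from every start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


def html_attrs(markup: str) -> list:
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.attrs


def xml_attrs(markup: str):
    """Return the attribute dict, or None if the markup is not well-formed."""
    try:
        return dict(ET.fromstring(markup).attrib)
    except ET.ParseError:
        return None


# HTML: names are folded to lower case; a minimized attribute has no value.
print(html_attrs('<input DISABLED Type="checkbox">'))

# XML: minimization is a well-formedness error, and case is significant.
print(xml_attrs("<input disabled/>"))
print(xml_attrs('<input Type="a" type="b"/>'))

# The polyglot-safe spelling is accepted by both parsers.
print(xml_attrs('<input disabled="disabled" type="checkbox"/>'))
```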

Feature: Attribute values
HTML requirement: White space characters are not normalized. Unquoted attribute values are allowed. Fixed or default attribute values ...?
XHTML requirement: White space characters are normalized to single spaces (unless attribute is of CDATA type?) (see http://www.w3.org/TR/REC-xml/#AVNormalize). Unquoted attribute values are not allowed. Default attribute values could conceivably be defined with a DTD.
Notes: Create whitespace in attribute values which is already normalized (converted to single spaces). Always quote attribute values. Do not rely on defining default or fixed attribute values (or elements with exclusively element content) in a DTD (unless it matches HTML behavior).
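
The difference in white space handling can be observed with Python's standard parsers. In this sketch no DTD is present, so the XML parser treats the attribute as CDATA: each tab becomes a space, but runs of spaces are not collapsed. The helper names are hypothetical, and `html.parser` only stands in for an HTML parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Remembers the value of the last title attribute seen."""

    def __init__(self):
        super().__init__()
        self.title = None

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "title":
                self.title = value


def html_title(markup: str):
    collector = TitleCollector()
    collector.feed(markup)
    collector.close()
    return collector.title


def xml_title(markup: str):
    return ET.fromstring(markup).get("title")


markup = '<a title="one\t\ttwo"></a>'   # two literal tab characters in the value

print(repr(html_title(markup)))  # tabs preserved as written
print(repr(xml_title(markup)))   # each tab replaced by a space

# A value that is already normalized survives both parsers unchanged.
safe = '<a title="one two"></a>'
print(html_title(safe) == xml_title(safe))
```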

Feature: Space characters
HTML requirement: The space characters are defined as:
    U+0009 CHARACTER TABULATION
    U+000A LINE FEED
    U+000C FORM FEED
    U+000D CARRIAGE RETURN
    U+0020 SPACE
XHTML requirement: The space characters are defined as:
    U+0009 CHARACTER TABULATION
    U+000A LINE FEED
    U+000D CARRIAGE RETURN
    U+0020 SPACE
Notes: The difference is the inclusion of Form Feed. Form feed characters are discouraged in XML 1.1. Do not use the form feed character.
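
The two definitions differ by exactly one character, which a small Python sketch can make explicit. The second half relies on an assumption beyond the table above: Python's default XML parser implements XML 1.0, where U+000C is not a permitted character at all, so it is rejected outright rather than merely discouraged.

```python
import xml.etree.ElementTree as ET

HTML_SPACE = {"\u0009", "\u000A", "\u000C", "\u000D", "\u0020"}
XML_SPACE = {"\u0009", "\u000A", "\u000D", "\u0020"}

# Characters that count as space in HTML but not in XML.
only_in_html = HTML_SPACE - XML_SPACE
print([f"U+{ord(ch):04X}" for ch in sorted(only_in_html)])


def xml_accepts(markup: str) -> bool:
    """True if the XML 1.0 parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(xml_accepts("<p>a b</p>"))
print(xml_accepts("<p>a\u000Cb</p>"))  # form feed in content
```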

Feature: The DOCTYPE
HTML requirement: A DOCTYPE is a mostly useless, but required, header. The DOCTYPE is used during parsing to determine the parsing mode. The keywords "DOCTYPE", "PUBLIC" and "SYSTEM", and the name "html" are treated case insensitively. The system identifier "about:legacy-compat" (and the public and system identifiers for previous versions of HTML) are case sensitive. Conforming HTML documents are required to use (case insensitively) or the legacy-compat version . When using the obsolete but conforming DOCTYPEs based on the HTML 4.0 and 4.01 Strict DTDs, the system identifier is optional. The obsolete but conforming DOCTYPEs based on XHTML 1.0 Strict and XHTML 1.1 may also be specified. Use of an internal subset is forbidden. The system identifier is never dereferenced by HTML implementations.
XHTML requirement: The DOCTYPE is optional. XML rules for case sensitivity apply (everything is case sensitive). Either of the DOCTYPEs defined in HTML5 may be used, or any other custom DOCTYPE. If the public identifier is specified, the system identifier must also be specified. The obsolete status of the obsolete permitted DOCTYPEs defined for HTML does not apply to XHTML. Any DOCTYPE may be used, subject to the conformance rules defined by XML. Use of an internal subset is permitted according to the requirements of XML. Some validating XML processors may dereference the system identifier, if used, but most browsers use non-validating processors.
Notes: Use the empty DOCTYPE with no SYSTEM or PUBLIC identifiers and no use of an internal subset.

Feature: Element names
HTML requirement: Element names are case insensitive.
XHTML requirement: Element names are case sensitive and lower-case.
Notes: Only use lower-case element names (as with attributes).

Feature: Void vs. non-void elements
HTML requirement: Void elements only have a start tag; end tags must not be specified for void elements, and it is impossible for them to contain any content. A trailing slash may optionally be inserted at the end of the element's tag, immediately before the closing greater-than sign. For non-void elements (e.g., ), the trailing slash is a parsing error (ignored and thus treated as unclosed).
XHTML requirement: Void elements may use either the empty-element tag syntax (EmptyElemTag) or use a start tag immediately followed by an end tag, with no content in between. While it is possible for the element to contain content, this is non-conforming.
Notes: For void elements (e.g., ), do not include content or use a closing tag; only use a self-closing element with closing slash at the end (with a space preceding it for the sake of older browsers). For non-void elements, i.e., where content can exist (e.g., ), always use an explicit closing tag (not a self-closing tag) even if there is no content.
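
The serialization rule in the notes above can be sketched as a small helper. It is a hypothetical function, not part of any library: `polyglot_tag` assumes that `content` is already-escaped markup, and its void-element set is the list this page gives for HTML.

```python
from xml.sax.saxutils import quoteattr

# Void elements as listed on this page (in head and in body).
VOID_ELEMENTS = frozenset(
    {"base", "link", "meta", "area", "br", "col",
     "embed", "hr", "img", "input", "param"}
)


def polyglot_tag(name, attrs=None, content=""):
    """Serialize one element so HTML and XML parsers both read it the same way.

    Assumption: `content` is already well-formed, escaped markup.
    """
    name = name.lower()
    attr_text = "".join(
        f" {key.lower()}={quoteattr(value)}" for key, value in (attrs or {}).items()
    )
    if name in VOID_ELEMENTS:
        if content:
            raise ValueError(f"void element <{name}> cannot have content")
        # Self-closing form, with a space before the slash.
        return f"<{name}{attr_text} />"
    # Non-void elements always get an explicit end tag, even when empty.
    return f"<{name}{attr_text}>{content}</{name}>"


print(polyglot_tag("img", {"src": "x.png"}))
print(polyglot_tag("div"))
print(polyglot_tag("P", content="text"))
```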

Feature: Unexpected end tags
HTML requirement: Unexpected end tags (in HTML, an unexpected or can cause the start tag to be implied before it).
XHTML requirement: Unexpected end tags are well-formedness errors.
Notes: Do not add end tags unless there is an explicit and properly nested open tag before it.

Feature: End tag with attributes
HTML requirement: ?
XHTML requirement: An end tag with attributes is not allowed.
Notes: Do not use end tags with attributes.

Feature: Raw text elements

Feature: RCDATA elements

Feature: Foreign elements

Feature: Normal elements

Feature: Optional tags
HTML requirement: For some elements (see http://wiki.whatwg.org/wiki/HTML_vs._XHTML#HTML_Elements_with_Optional_Tags), the start and/or end tags are optional and are implied by certain specified conditions. For example, the end tag for the p element is implied by a subsequent p element. Omitting the end tag for other elements is a parse error and various error recovery procedures are applied appropriately.
XHTML requirement: End tags must be explicitly included for all elements, except empty elements using the EmptyElemTag syntax.
Notes: Always use end tags (or self-closing tags for void elements).

Feature: Comment syntax
HTML requirement: Comments must start with the four character sequence "" (bogus comments such as those beginning with "
GREATER-THAN SIGN ('>') character.

Feature: Processing instructions
HTML requirement: HTML does not allow processing instructions and deprecates the bogus comments which appear in their form, whether in the form .
XHTML requirement: XHTML allows the use of XML processing instructions, which are only closed by "?>".
Notes: Avoid ">" inside processing instructions (as these will close the "instruction" (comment) prematurely) (or one must strip out processing instructions entirely). Processing instructions might need to be avoided entirely in case HTML may in future disallow them completely.

Feature: CDATA sections
HTML requirement: CDATA section syntax is treated as a bogus comment. The sequence of characters "]]>" in content when it does not mark the end of a CDATA section is just regular character data.
XHTML requirement: CDATA section syntax delimits a genuine CDATA section. The sequence of characters "]]>" in content when it does not mark the end of a CDATA section is a well-formedness error.
Notes: Ensure the sequence "]]>" in content is escaped (not necessary to escape in attribute values). Do not use CDATA sections.
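
The escaping advice for "]]>" can be sketched with Python's standard library. `xml.sax.saxutils.escape` replaces `&`, `<` and `>` with entity references, which is more than strictly required but guarantees the sequence never appears literally; the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(markup: str) -> bool:
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


raw = "a ]]> b"

unsafe = f"<p>{raw}</p>"          # literal "]]>" in content
safe = f"<p>{escape(raw)}</p>"    # the ">" becomes "&gt;"

print(is_well_formed(unsafe))
print(safe)
print(ET.fromstring(safe).text == raw)  # the text round-trips unchanged
```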

Feature: Unescaped special characters
HTML requirement: Unescaped ampersands (U+0026 AMPERSAND - &, instead of &amp;) are permitted within the content of normal elements, RCDATA elements, foreign elements and attribute values where they are not considered to be ambiguous ampersands, and within raw text elements. Unescaped less than signs (U+003C LESS-THAN SIGN -
er data
The valid set of Unicode characters in XML 1.0 is limited beyond that in HTML (we need to specify this here).
Element-specific parsing
In HTML, the script and style elements are parsed as CDATA elements. (Note: the definition of CDATA differs from that in XML.) In XML, they're parsed as normal elements (which means that things that look like comments are treated as real comments, and things that look like start tags actually are start tags).
In HTML, the title and textarea elements are parsed as RCDATA elements. (Note: The definition of RCDATA differs from that in SGML and there is no RCDATA in XML.)
In HTML, if scripting is enabled, the noscript element is parsed as a CDATA element. If scripting is disabled, it's parsed as a normal element. In XHTML, the element is always parsed as a normal element, and can't really be used to stop content from being present when script is disabled.
In HTML, the iframe, noembed and noframes elements are parsed as CDATA elements. In XHTML, they are parsed as normal elements, and therefore do not stop content from being used.
In HTML, tags for certain elements, which appear out of context, are ignored. This includes caption, col, colgroup, frame, frameset, head, option, optgroup, tbody, td, tfoot, th, thead, tr.
In XHTML, table elements may contain child tr elements. In the HTML serialisation, due to backwards compatibility constraints, this is not possible (though it may be done through DOM manipulation).
The plaintext element has a special parsing requirement in HTML. (It is, however, forbidden.)
In HTML, a line feed that immediately follows a pre, listing or textarea start tag is ignored.
Much other special handling of edge cases and error conditions, not all of which is listed here, occurs in HTML. (such as?)
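
The script case from the list above can be sketched with Python's standard parsers. `html.parser` is not a conforming HTML5 parser, but it does treat script content as raw text, which is enough to show the contrast; the names `ScriptProbe` and `probe_html` are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ScriptProbe(HTMLParser):
    """Records which start tags and which text the HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def probe_html(markup: str) -> ScriptProbe:
    probe = ScriptProbe()
    probe.feed(markup)
    probe.close()
    return probe


markup = '<script>var s = "<i>x</i>";</script>'

# HTML: the <i> inside the script is plain text, so only "script" is a tag.
html_result = probe_html(markup)
print(html_result.start_tags)

# XML: the same bytes contain a real child element named "i".
xml_root = ET.fromstring(markup)
print([child.tag for child in xml_root])
```

For a polyglot document this is the reason to keep markup-like text out of inline scripts, or to move scripts into external files.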
The following are void elements in HTML (see void elements in table): In head
(base, link, meta), in body (area, br, col, embed, hr, img, input, param)
HTML Elements with Optional Tags
Element    Start Tag   End Tag
html       optional    optional
head       optional    optional
body       optional    optional
li         required    optional
dt         required    optional
dd         required    optional
p          required    optional
colgroup   optional    optional
thead      required    optional
tbody      optional    optional
tfoot      required    optional
tr         required    optional
th         required    optional
td         required    optional
rt         required    optional
rp         required    optional
optgroup   required    optional
option     required    optional
Scripts
document.write() and document.writeln() cannot be used in XHTML; they can be used in HTML.
In XHTML, the use of the innerHTML property requires that the string be a well-formed fragment of XML.
DOM APIs are case sensitive in XHTML and some are case insensitive in HTML. (This does not apply to elements which are not in the HTML namespace.)
  o Element.tagName and Node.nodeName return the value in uppercase in HTML but lower-case in XHTML (Node.localName is consistent now, as of HTML5).
  o Document.createElement() is case insensitive (the canonical form is lowercase).
  o Element.setAttributeNode() will change the attribute name to lowercase.
  o Element.setAttribute() is case insensitive (the canonical form is lowercase).
  o Document.getElementsByTagName() and Element.getElementsByTagName() are case insensitive.
  o Document.renameNode(). If the new namespace is the HTML namespace, then the new qualified name will be lowercased before the rename takes place.
In HTML, Document.createElement() will create an element in the HTML namespace. In XML (including XHTML), the namespace is defined by both DOM2 and DOM3 to be null.
  o In XHTML, browsers lack interoperability in this area. In Firefox and Safari, the namespace is dependent upon the MIME type. In Opera, it's dependent upon the root element.
XPath expressions targeted at pre-HTML5 browsers need to use the XHTML namespace for XHTML and null for HTML. (HTML5 browsers would use the XHTML namespace even in HTML.)
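
The namespace and case-sensitivity points for XHTML can be sketched outside a browser with `xml.etree.ElementTree`, which supports a limited subset of XPath. This is only an analogy to the browser DOM, and the prefix `h` is an arbitrary choice for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
NAMESPACES = {"h": XHTML_NS}  # "h" is an arbitrary prefix chosen here

doc = ET.fromstring(
    f'<html xmlns="{XHTML_NS}"><body><p>hi</p></body></html>'
)

# Without the namespace the path matches nothing in an XHTML document.
print(doc.find("body"))

# With the namespace bound to a prefix, the lookup succeeds.
body = doc.find("h:body", NAMESPACES)
print(body.tag)
print(doc.find(".//h:p", NAMESPACES).text)

# Names are case sensitive in XML, so an upper-case name does not match.
print(doc.find("h:BODY", NAMESPACES))
```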
Stylesheets
Selectors, as used in CSS, match case sensitively in XHTML, but case insensitively in HTML.
CSS requires special handling of the body element in HTML for painting
backgrounds on the canvas, which does not apply to XHTML.
For polyglot documents, use lower-case element selectors and style the html and
body elements appropriately (?).