You may not use XHTML (anymore), but when you write HTML, you may be more influenced by XHTML than you think. You are very likely writing HTML, the XHTML way.
What is the XHTML way of writing HTML, and what is the HTML way of writing HTML? Let’s have a look.
HTML, XHTML, HTML
In the 1990s, there was HTML. In the 2000s, there was XHTML. Then, in the 2010s, we switched back to HTML. That’s the simple story.
You can tell by the rough dates of the specifications, too: HTML “1” 1992, HTML 2.0 1995, HTML 3.2 1997, HTML 4.01 1999; XHTML 1.0 2000, XHTML 1.1 2001; “HTML5” 2007.
XHTML became popular when everyone believed XML and XML derivatives were the future. “XML all the things.” For HTML, this had a profound effect: we learned to write it the XHTML way.
The XHTML way of writing HTML
The XHTML way is well-documented, because XHTML 1.0 describes it in great detail in its section on “Differences with HTML 4”:
- Documents must be well-formed.
- Element and attribute names must be in lower case.
- For non-empty elements, end tags are required.
- Attribute values must always be quoted.
- Attribute minimization is not supported.
- Empty elements need to be closed.
- White space handling in attribute values is done according to XML.
- Script and style elements need CDATA sections.
- SGML exclusions are not possible.
- The elements with id and name attributes, like a, applet, form, frame, iframe, img, and map, should only use id.
- Attributes with pre-defined value sets are case-sensitive.
- Entity references as hex values must be in lowercase.
Does this look familiar? With the exception of marking CDATA content, as well as dealing with SGML exclusions, you probably follow all of these rules. All of them.
Although XHTML is dead, many of these rules have never been questioned again. Some have even been elevated to “best practices” for HTML.
That is the XHTML way of writing HTML, and its lasting impact on the field.
The HTML way of writing HTML
One way of walking us back is to negate the rules imposed by XHTML. Let’s actually do this (without the SGML part, because HTML isn’t based on SGML anymore):
- Documents may not be well-formed.
- Element and attribute names may not be in lower case.
- For non-empty elements, end tags are not always required.
- Attribute values may not always be quoted.
- Attribute minimization is supported.
- Empty elements don’t need to be closed.
- White space handling in attribute values isn’t done according to XML.
- Script and style elements don’t need CDATA sections.
- The elements with id and name attributes may not only use id.
- Attributes with pre-defined value sets are not case-sensitive.
- Entity references as hex values may not only be in lowercase.
Let’s remove the esoteric things; the things that don’t seem relevant. This includes XML whitespace handling, CDATA sections, doubling of name attribute values, the case of pre-defined value sets, and hexadecimal entity references:
- Documents may not be well-formed.
- Element and attribute names may not be in lowercase.
- For non-empty elements, end tags are not always required.
- Attribute values may not always be quoted.
- Attribute minimization is supported.
- Empty elements don’t need to be closed.
With these rules peeled away, this looks a lot less like working with XML and more like working with HTML. But we’re not done yet.
“Documents may not be well-formed” suggests that it is fine for HTML code to be invalid. It made sense for XHTML to stress well-formedness because of XML’s strict error handling. But while HTML documents work even when they contain severe syntax and well-formedness issues, it’s useful neither for the professional nor for our field to use and abuse this resilience. (I’ve argued this case before in my article, “In Critical Defense of Frontend Development.”)
The HTML way would therefore not suggest “documents may not be well-formed.” It would also be clear that not only end, but also start tags aren’t always required. Rephrasing and reordering, this is the essence:
- Start and end tags are not always required.
- Empty elements don’t need to be closed.
- Element and attribute names may be lower or upper case.
- Attribute values may not always be quoted.
- Attribute minimization is supported.
Examples
What does this look like in practice? For start and end tags, be aware that many tags are optional. A paragraph and a list, for example, are written like this in XHTML:
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
<ul>
<li>Praesent augue nisl</li>
<li>Lobortis nec bibendum ut</li>
<li>Dictum ac quam</li>
</ul>
In HTML, however, you can write them using only this code (which is valid):
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<ul>
<li>Praesent augue nisl
<li>Lobortis nec bibendum ut
<li>Dictum ac quam
</ul>
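The end tags can be left out here because the HTML spec states, element by element, when an end tag is optional. The sketch below encodes two of those rules, for p and li, as plain functions. The function names and the "#text" convention for a following text node are illustrative assumptions, and the element list mirrors the WHATWG spec at the time of writing, so check the current spec before relying on it.

```python
# Elements whose start tag implies the end of an open <p>, per the WHATWG
# "optional tags" rules (the list changes over time; check the current spec).
P_CLOSERS = {
    "address", "article", "aside", "blockquote", "details", "dialog", "div",
    "dl", "fieldset", "figcaption", "figure", "footer", "form",
    "h1", "h2", "h3", "h4", "h5", "h6", "header", "hgroup", "hr", "main",
    "menu", "nav", "ol", "p", "pre", "search", "section", "table", "ul",
}

# Parents in which a final <p> must keep its end tag.
# (Autonomous custom elements, which the spec also excludes, are not handled.)
P_KEEP_END_IN = {"a", "audio", "del", "ins", "map", "noscript", "video"}


def p_end_tag_optional(next_sibling, parent):
    """Return True if </p> may be omitted.

    next_sibling: tag name of what immediately follows the <p>, "#text" for
    a text node (hypothetical convention), or None if nothing follows.
    parent: tag name of the parent element.
    """
    if next_sibling is None:
        return parent not in P_KEEP_END_IN
    return next_sibling in P_CLOSERS


def li_end_tag_optional(next_sibling):
    """Return True if </li> may be omitted: another <li> follows, or nothing does."""
    return next_sibling is None or next_sibling == "li"
```

With this, p_end_tag_optional("ul", "body") is true, which is why the paragraph above needs no </p>, while li_end_tag_optional("div") is false: a div after an unclosed li ends up inside that li.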
Developers also learned to write void elements, like so:
<br />
This is something XHTML brought to HTML, but as the slash has no effect on void elements, you only need this:
<br>
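Because the slash is meaningless only on void elements, a minifier can drop it there and nowhere else: inside inline SVG, for instance, a trailing slash really does self-close elements such as <path />. A minimal regex-based sketch, assuming attribute values are quoted and contain no < or > characters; strip_void_slash is a hypothetical helper, not a library function:

```python
import re

# Void elements in the current WHATWG HTML spec (no content, no end tag).
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}

# A start tag ending in "/>", e.g. <br /> or <img src="a.png"/>.
# Assumes quoted attribute values without "<" or ">" inside them.
SELF_CLOSING = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>]*?)\s*/>")


def strip_void_slash(markup):
    """Drop the trailing slash from void elements only."""
    def replace(match):
        name, attrs = match.group(1), match.group(2)
        if name.lower() in VOID_ELEMENTS:
            return f"<{name}{attrs}>"
        return match.group(0)  # e.g. <path /> in SVG or <div />: leave untouched
    return SELF_CLOSING.sub(replace, markup)
```

A production minifier would use a real HTML parser rather than a regular expression; the regex only keeps the example short.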
In HTML, you can also just write everything in all caps:
<A HREF="https://css-tricks.com/">CSS-Tricks</A>
It looks like you’re yelling and you may not like it, but it’s okay to write it like this.
When you want to condense that link, HTML offers you the option to leave out certain quotes:
<A HREF=https://css-tricks.com/>CSS-Tricks</A>
As a rule of thumb, when the attribute value doesn’t contain a space or an equal sign, it’s usually fine to drop the quotes.
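For reference, the spec’s rule is a little stricter than that rule of thumb: an unquoted attribute value must not be empty and must not contain ASCII whitespace, double or single quotes, equals signs, less-than or greater-than signs, or backticks. A minimal sketch of that check, with can_drop_quotes as a hypothetical helper name:

```python
# Characters the HTML spec forbids in an unquoted attribute value:
# ASCII whitespace (space, tab, LF, FF, CR), quotes, "=", "<", ">", and "`".
FORBIDDEN_UNQUOTED = set(" \t\n\f\r\"'=<>`")


def can_drop_quotes(value):
    """Return True if value may be written as an unquoted attribute value."""
    return value != "" and not any(ch in FORBIDDEN_UNQUOTED for ch in value)
```

By this rule, https://css-tricks.com/ can go unquoted. Note that a trailing slash stays part of an unquoted value, so on a void element such as <img src=logo.png/> the src value would become logo.png/.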
Finally, HTML–HTML (not XHTML–HTML) also allows you to minimize attributes. That is, instead of marking an input element as required and read-only, like this:
<input type="text" required="required" readonly="readonly">
You can minimize the attributes:
<input type="text" required readonly>
If you take advantage not only of attribute minimization but also of the fact that text is the default for the type attribute here (there are more such unneeded attribute–value combinations), you get an example that shows HTML in all its minimal beauty:
<input required readonly>
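Taken together, minimization and default values are mechanical enough to automate. A minimal sketch of a start-tag serializer, assuming a hand-picked (incomplete) set of boolean attributes and a single known default, type="text" on input; start_tag and both lookup tables are hypothetical, not part of any library:

```python
import html

# A hand-picked subset of HTML boolean attributes (not the full list).
BOOLEAN_ATTRIBUTES = {"checked", "disabled", "multiple", "readonly", "required", "selected"}

# (element, attribute) -> default value that can be omitted.
# Only the case from the text is included; extend as needed.
DEFAULT_VALUES = {("input", "type"): "text"}


def start_tag(name, attrs):
    """Serialize a start tag with minimized boolean attributes and no default values.

    attrs is a list of (attribute, value) pairs; remaining values are
    HTML-escaped and double-quoted.
    """
    parts = [name]
    for attr, value in attrs:
        if DEFAULT_VALUES.get((name, attr)) == value:
            continue  # e.g. type="text" on <input> is the default
        if attr in BOOLEAN_ATTRIBUTES:
            parts.append(attr)  # presence alone turns the attribute on
        else:
            parts.append(f'{attr}="{html.escape(value)}"')
    return "<" + " ".join(parts) + ">"


print(start_tag("input", [("type", "text"), ("required", "required"), ("readonly", "readonly")]))
# prints <input required readonly>
```

Because a boolean attribute is on whenever it is present, whatever its value (even disabled="false" disables a control), the sketch emits the bare name regardless of value; to switch the attribute off, leave it out of attrs.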
Write HTML, the HTML way
The above isn’t a representation of where HTML was in the 90s. HTML, back then, was loaded with <table> elements for layout, packed with presentational code, largely invalid (as it still is today), and plagued by wildly varying user agent support. Yet it’s the essence of what we would have wanted to keep if XML and XHTML hadn’t come around.
If you’re open to a suggestion of what a more comprehensive, contemporary way of writing HTML could look like, I have one. (HTML is my main focus area, so I’m augmenting this with links to some of my articles.)
- Respect syntax and semantics.
- Validate your HTML, and ship only valid HTML.
- Use the options HTML gives you, as long as you do so consistently.
- Remember that element and attribute names may be lowercase or uppercase.
- Keep use of HTML to the absolute minimum.
- Remember that presentational and behavioral markup is to be handled by CSS and JavaScript instead.
- Remember that start and end tags are not always required.
- Remember that empty elements don’t need to be closed.
- Remember that some attributes have defaults that allow these attribute–value pairs to be omitted.
- Remember that attribute values may not always be quoted.
- Remember that attribute minimization is supported.
It’s not a coincidence that this resembles the three ground rules for HTML, that it works from the premise that a smaller payload leads to faster sites, and that it follows the school of minimal web development. None of this is new; our field could merely decide to rediscover it. Tooling is available, too: html-minifier is probably the most established, and it can handle all of these HTML optimizations.
You’ve learned HTML the XHTML way. HTML isn’t XHTML. Rediscover HTML, and help shape a new, modern way of writing HTML: one that acknowledges XML, but isn’t necessarily based on it.
Having started my journey during the ‘dark’ times of <marquee> and <table> as the method of laying out a website, I appreciate the nostalgia of writing HTML like HTML. Still, I wouldn’t recommend adopting all of these suggestions.
Failing to use end tags (or void elements) properly can result in errors unless you’ve purposefully told your linter to ignore them.
I would also hesitate to recommend omitting quotes because well… it doesn’t offer any benefits.
File size isn’t reduced by any meaningful amount.
You’re introducing possible fail-points in your HTML, making maintainability a potential issue.
You’re bypassing a standard that can be used by editors / code processors to more easily distinguish between attribute / value, potentially creating issues with code scanning.
And I vehemently disagree about omitting default values, as this again creates a maintainability issue (and perhaps edge-case issues in some browsers?).
<input> is not as clear as <input type="text">, and as a developer you need to remember that you may not be the only person working on a project. Saving a few bytes by omitting 12 characters isn’t useful if it causes confusion about what your HTML is actually supposed to render.
Back in ‘the day’ all-caps HTML seemed to be the standard and as I recall the argument was that it was more readable / easier to distinguish between HTML and Content.
I would argue that the code-coloring features implemented in virtually every modern editor render that argument moot, making all-caps really a matter of personal preference.
….
Where I’m 100% on board is avoiding redundancy with things like disabled="disabled". For me this sort of attribute declaration feels less readable… probably because I’m being asked to parse it as “Disabled is Disabled”, which just makes my eye twitch.
While I’d gladly use these features for minification, actually writing code requires it to be readable and predictable, so every “you can skip X” becomes “even though the syntax allows it, you cannot skip X”. Closing tokens and consistent attributes make things much easier for a human being to handle, even if machines don’t care.
Being a bit verbose in source code makes your life much easier in the future.
You keep saying HTML where you should be saying SGML. There are two ways of writing HTML: as SGML or as XML. And both are valid with HTML5.
XHTML is not “not the HTML way”, it’s just not the SGML way.
And for that matter, XML is essentially a stricter subset of SGML.
What you refer to as Linux is actually GNU + Linux, etc., etc.
Sorry, but I strongly disagree with the point that end tags are not required. Consider this case:
It’s not intuitive what the result should be.
<div> is a block element, so the <ul> will terminate before it, right? Wrong. The <div> is nested under the last. OK, well, the <strong> tag has got to terminate before it, right? Wrong! All the content onward is bold. Leaving out the end tags causes confusion and introduces problems. Conceptually it’s possible, but in practice, it should never be done.
OK, but why should we? What are the benefits?
HTML contributes very little to overall page sizes and this “modern” way of writing makes pages harder to process.
I’m old enough to have learned HTML the HTML way, switched to the XHTML way because I liked it better, but I’m open to being persuaded.
Good thinking.
However, not closing ‘s in particular causes extra spaces to be added as part of the element’s firstChild text node. This is the same issue you have when you add newlines in places with s which are styled by CSS, and you get visual spacing between elements you didn’t want. For example, an icon at the end of a label on a button will have slightly more space between the text and the icon than otherwise.
But in this case, it is completely avoidable, since omitting the closing tag causes extra spaces and newlines to be parsed as part of that element.
I agree with some parts (like minimizing attributes), but heavily disagree on some others.
Leaving out quotes for attributes, especially for URLs, is a horrible idea. Just because you can, doesn’t mean that you should. And consistency is often better than full-on minimalism.
I mean, you don’t have to use line changes in JavaScript, but it’s horrible to read if you don’t use any. Same for semicolons, I prefer to always use them; consistency over maximal minimalism.
Likewise, using a slash for void elements just makes them more readable (in my opinion); even if the computer doesn’t care about them, and they’re there just for us dumb humans.
Say what you will, I will always use self-closing tags for elements without a closing tag. My OCD disallows me from being satisfied by an opening tag and no indication of the tag closing.
Oof. I don’t wanna go back to HTML where tags are sometimes closed and sometimes not, sometimes uppercase and sometimes not. Those HTML files looked terrible and deserved to die. Just as JS is slowly being overtaken by TypeScript, I sincerely hope that the W3C will apply similar code quality mechanics to HTML.
Please don’t encourage people to write bad code just because it is possible. With build chains and HTML optimizers this is not necessary, and it just trains new web devs to not care and write code as if it’s 1990 again. Please don’t.
These are personal syntax preferences presented as “best practices”, and that theme runs through the supporting articles.
I agree with some of your preferences, and it’s good to be reminded we have multiple valid options. But the reality is they just don’t matter.
No one ever made a slow website into a fast one by leaving out closing tags. Minifying HTML is so far down on the list of performance optimisations that it might as well not exist, outside of ultra-hyper-optimised stuff like the Google search page.
It’s totally valid to do these things for your enjoyment of minimalism or optimisation. Perhaps some of them make your code more readable too.
I just think the performance argument is misleading. You’ve disguised something subjective with a veneer of objectivity.
There are also downsides to some of these practices. One is the mental overhead of remembering things you really don’t need to remember. I’d rather fill my head with poetry than memorise rules for when I can leave out quotes on HTML attributes. Frankly, who cares? Just always quote them, and move on with your life.
Again, there’s nothing wrong with caring about this because you find it interesting. We’re all geeks in different ways and let’s celebrate that! But that’s different from persuading the working developer to care. Enjoy riding your hobby horse, but preferably don’t present it as practical transport for those who just want to get to work.
And of course, you can use tools to do this for you. I don’t like the mindset that says you must know all the rules before you use the tools. Tools don’t just automate repetitive tasks, they also free us from the drudgery of learning irrelevant arcana. They let us lean on other people’s expertise — like yours!
Then again, for many sites it may be questionable whether it’s worth the complication of adding the tool, given the near-zero performance benefit of minifying HTML. It might depend on how easy it is to add to your existing setup.
Simplicity is rarely as simple as minimalists like to make it sound. Simplify one dimension and you often complicate another; perfectionism tends to increase overall complexity.
I don’t think I will ever be comfortable not using closing tags in my lists!
Great read, but I’d just give it to Pug to format everything for me.
My basic takeaway after reading this anarchic manifesto is: just because you can, doesn’t mean you should. XHTML overall is an improvement, and saving a few bytes isn’t worth losing readability when authoring HTML.
I think XHTML was too strict and put a high cost on a low gain, but it had a point. Readability counts (a lot) in any language syntax. Enforcement will not be the answer, but I think encouraging things like always closing tags and not suppressing default arguments helps (again, a lot) with the readability of the code.
Again, I don’t think enforcing a code style is the answer, but encouraging more readable code is always a good thing.
Most of them I follow, yes, except for a few:
Closing empty tags. I used to do that a lot back in 2012 because of compatibility issues.
And some things I do the XHTML way because it is easier:
All hex entity references must be lowercase. Be it color codes or IDs, it is much more manageable to keep it all lowercase.
Sorry, I cannot agree to most of this. Using XHTML syntax, I can visually check source code and see an appropriate structure. I don’t always have access to tools that validate the HTML.
For me, XHTML is more natural, because it requires beginnings and endings for most elements. In the case of empty elements, the / signals the end. It also brings some sanity to those who come from backend languages or even JavaScript, as the syntax in those languages requires beginning and ending tokens.
I will still use name attributes on occasion, as that is an attribute and not concerned with beginning and ending an element. However, CSS selectors and parts of the JavaScript API use id attributes, so that is likely why the name attribute fell out of favor, with the exception of forms.
I would argue that articles such as this one, while innocently describing the true specification for HTML5, are actually normalising bad practice.
For example, dropping attribute quotes to save a few bytes will cause more issues than simply continuing to follow the XHTML spec, especially given attributes are often dynamically injected these days.
Similarly, dropping closing tags causes untold woes when the HTML is more than a few lines or worked on by a few developers, and you’re not sure if it’s supposed to be nested or simply an error.
There were reasons everyone preferred XHTML back in the early 00’s. We shouldn’t forget those.
Unfortunately, many non-browser parsers do expect optional and closing tags. Bing, for instance, expects optional head and body tags or it could fail to read some metadata. Some link previewers also fail.
Optional closing slashes, quotes, etc. are completely unnecessary and should be picked up by typical HTML minifiers.
A lot of the XHTML points actually made sense. You don’t HAVE TO close an element in HTML but it helps to know where it ends (but people don’t really understand the concept of “elements” in HTML).
One thing influenced by XHTML that has never gone away and is so deeply, deeply entrenched in the web dev world is the typing of HTML element selectors in lowercase within the CSS. This stems from when XHTML said that within the HTML that HTML element names should probably be in lowercase but would have to be if you wanted to validate your document as XML, and then if you did that you would have to type your references to them in your stylesheet in lowercase as well or otherwise the matching of elements in the HTML to their corresponding elements in the CSS would not work. Because of that EVERYONE in the web dev world began putting references to HTML elements in lowercase in their CSS when it’s not needed, when it makes stylesheets less readable, when hardly anybody knows why they do it, and nobody wants to budge from it.
When you see:
#header UL LI A:link
It is much easier to quickly tell that UL LI A refers to HTML elements than the following does:
#header ul li a:link
Especially when sifting through tons of CSS code. And then when that confusion occurs it’s harder for developers to grasp the difference between IDs, elements, classes, etc. And then that confusion makes it easier for devs to be too okay with DIV soup.
Controversial stuff! The OCD in me rejects this
I would love to see the VS Code plugin that converts from the XHTML way of writing to the HTML way of writing.
Yes. VSCode html code correction is driving me nuts when I innately sense it’s pointless
I want to write HTML the HTML way but my IDE keeps shouting at me. ;) The default code style settings always seem to be XHTML-HTML.
I may be old fashioned (I first learned HTML in the ’90s and I made my first commission using it in the early 2000s) but I find that learning and respecting the XHTML mantra helps you be a better front end developer, because it’s less sloppy and more predictable. Just because HTML is more permissive doesn’t mean we should lower our standards to it. That’s how I feel about it, anyway.
I feel like minified attributes already became standard. But optional closing tags and for me are like semicolons in JavaScript. Yes, some people write without them and always keep an eye on the circumstances, but for me that harms the readability of the code.
I remember finding a weird edge case where a library was not generating </li> tags, and adding <!doctype html> caused the website layout I was working on to break. I just fixed the library to generate </li> tags and it solved the problem.
I like avoiding tags such as <head> because I know browsers are smart enough to add those, and I’m not making websites for bots to spam my comment forms, but in general I close my tags. If some bot breaks because of that, then that bot is badly implemented, and its developer will soon notice when it finds a website using an HTML minifier (the one cited in the post has 3.9 million downloads per week). I still use <html> because of the lang attribute.
Weirdly, I have seen more people closing self-closing tags (like </link> and </br>), which Firefox highlights as errors in view-source:, than the opposite. Well, many people also forget to close non-self-closing tags: Firefox highlights those errors too, and I have seen those a lot.
I don’t think writing “HTML the HTML way” makes it better in any way; on the other hand, I really think it makes it less safe. Using XHTML-ish rules makes it easier for non-browser parsers, linters, HTML formatters, and many other tools to work with the code.
It feels like there are two options on the table: write more declarative and readable code that works, or write smaller and more flexible code that (may) also work, and I don’t really see any gain in that.
I’ve been through this whole path and even remember being shocked at seeing that <option> could be closed!
And I agree, XHTML was a bit too constraining, but in my view it did a lot of good for HTML. It brought cleaner code, less room for interpretation, and more consistency.
Browsers are extremely tolerant but it doesn’t mean we have to push these boundaries, they bring pretty much nothing except a lighter code (which is still important but considering you ship JS libraries beside…).
Consistency is definitely the key, do what you want but stay consistent at least!
Good article. I hadn’t really given this much thought. Which is your point. :-)
One note on English syntax: “Documents may not be well-formed” is an ambiguous construction. It sounds like it’s illegal for documents to be well-formed. Maybe “Documents need not be well-formed,” or “Documents may be not-well-formed.”
Great read! The part about not closing tags reminds me a lot about those kind of people who write their JavaScript without semicolons. Yes it works, but I think it’s just barbaric ;-)
Minimalism isn’t always the best idea. I’m more than happy spending those extra bytes for readability (and therefore maintainability) of a project’s code. This applies to HTML as well as CSS and JS.
It’s possible to set up a minifier during build, so that you can keep your XHTML-HTML in the repo, and ship only clean HTML-HTML – that’s what I did and am happy ;)
There is one important consideration, though, that you might want to be aware of.
Is not, semantically, the same as
In the second case, image is, technically, a part of paragraph. It might also affect your presentation, if you are not careful.
Great! Personally, the less you write… the better!
Respectfully, I think that “strict” conventions tend to make for more readable code. Minimizing code to the point that it is non-obvious what it does might shave off a few bits, but only at great cost to legibility.
For me, XHTML makes so much more sense; not because it’s XML compatible (though that’s a huge bonus); but because it means there’s a handful of rules which you can consistently obey to have valid code. When you start saying “actually HR doesn’t need a (self-)closing tag” then you start having to remember a list of “the rules do apply to these things, but not to these things, unless you’re in this context in which case you may also need to do this…”; i.e. needless confusion and complexity rather than simple consistency.
There are some things with the XHTML approach that are bad; e.g. 'readonly="readonly"' is a tautology; why not 'readonly="true"'? Just adding 'readonly' is sort of OK (it’s less wordy, and if the default’s always false it’s clear), but again that’s inconsistent, so I’d rather take the hit of having it equal something than just having the attribute name there.
I also wonder how many hours are spent globally in HTML design discussions over each element/attribute, which could be spared by saying “let’s just follow the generic rule”, allowing time to be invested in more productive discussions about creating new functionality / adding value.
Nice overview of XHTML and HTML.
One missing element worth mentioning is the ‘hard hierarchy’ that the code structure imposes on an XHTML page.
Tags can’t cross them in xhtml :
text A
text B
[ because of rules I can’t demonstrate crossing tags by html and by xhtml with the bold tag ].
text A and text B could display “bold” in html but not in xhtml. It’s a light loss for an effective style tool.
A constraint required in your pages: the doctype. If you don’t write it, your page goes into ‘quirks mode’; it’s a great tool for all code in your browser.
While in ‘quirk mode’, nobody can prefer xhtml or html, because of the remanent display of your pages.
It’s a large open door for all coders at work, beginners or experts. It’s flexible code written and read.
You can play with html/xhtml by deleting tags at their middle and see the display and render working as fine as possible too. ( it’s permissive, not well formed , but quirk mode ).
XHTML moved far away from the SGML style and used opening and closing tags up front, so readability is made easy (because the start and end of ‘code areas’ are marked too).
single tags then required / : “” like a closure.
XHTML was at first the base for ‘XML data’ integration; at birth it was a layout for XML support, and for all XML derivatives.
Browsers today handle lots of ways to code now !
capitals letters are quiet ! REALLY !
Thanks for your work.
I will stick to the XHTML way, thank you. The original idea to allow sloppiness to be the language of the Web is not something I support. Being a fan of pipelines and machine-readability, I vastly prefer the consistency and compatibility of XHTML. About the only thing I miss from the HTML way is attribute minimization.
One thing I learned in 30 years of coding: Code should always be as uniform as possible.
Consistency is a lot easier than special rules. I don’t want to have to remember which tags need closing. I don’t want to have to look out for potentially-wrong syntax. Why would anybody want to write uppercase tags? Readability? That’s what syntax highlighting is for.
General rules mean less strain on your memory. Fewer decisions to make. Fewer mistakes.
Every time you have special rules, you have to explain those to new team members, and they will get them wrong. Every single time.
While in principle, just HTML is leaner and cleaner, the reality is it makes a bunch of assumptions. To validate it properly requires having to know all possible elements and their optional rules, including browser specific extensions. Otherwise, we’re left with the mess from back in the day when most browsers had to be very forgiving of the horribly broken HTML which was rampant at the time. Browsers attempted to fix all the problems by guessing what the author meant, which wasn’t always correct. At least with proper XML formatting, any document can be structurally validated/interpreted without knowing special rules (for better or worse). And while the use of extension elements isn’t as bad as before HTML4/5 standards, it still has to be accounted for.
Though, personally, the moment they used just “html” as a catch all DOCTYPE for the standard (and not at least “html5”), I lost faith in its standardization.
Less is more – less code = faster for front and backend.
Never made sense to me to add closing tags on lists, so I didn’t. Nor did I add the slash for br or hr tags.
Still doesn’t make sense to use strong versus b for bold; or, em versus i for italics.
There are more like this imho. Or, is it IMHO. LOL
My goodness, I got so much flak in a Discord channel because I ventured out and started not closing my tags. My code still validated perfectly. The abuse came from a hysterical crowd, so I just left. Thanks for the article. I love semantic HTML. And coming from the old days of XHTML (I’ve been away from web design for 5 years), I consciously decided now not to close tags if not needed, so as to pay more attention. Didn’t know about the quotes, will remove them too. I always validate my HTML, always.