I’m sure you’ve read the latest validation debate (yes, they are discrete, the blogosphere isn’t actually one long validation matters/doesn’t matter row :)). Ethan of Sidesh0w started it off with a post at WaSP, Keith exploded it, and 73 comments later over at Asterisk, no conclusion has been reached. Here’s my view, with a little spin-off mini-rant at the end for kicks.
First, let’s lay down some definitions. HTML can be good, XHTML can be better. I’ve got no problem with using HTML, as long as it’s done correctly. However, it’s undeniable that, although HTML won’t go away, XHTML is how future documents should be served. So we may as well jump on board now rather than wait and try to convert when it becomes essential. But anyway, if we assume that we’re using XHTML, then we can separate the term ‘validation’ into two component parts: validity against the DTD, and well-formedness.
This is where I draw the line: validation (i.e. adhering strictly to the DTD, which means not doing things like putting block-level elements inside inline elements) isn’t all that important. Well-formedness (encoding your ampersands, opening and closing elements in the right order), however, is. That’s because otherwise the document will be really difficult to parse, and, if you’re serving as application/xhtml+xml, the parser will give up and present a confusing error to the user (although I’m not necessarily in agreement that this should be the standard behaviour for XML parsers).
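To make the distinction concrete, here’s a rough sketch in Python. It assumes the standard library’s xml.etree.ElementTree, which checks well-formedness only and knows nothing about the XHTML DTD; the helper name is_well_formed and the sample markup are mine, made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False if it gives up.

    This checks well-formedness only; it does not validate against any DTD.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Ampersand encoded, elements closed in the right order.
encoded = '<p><a href="page?a=1&amp;b=2">link</a></p>'

# Raw ampersand in the attribute value: not well-formed.
raw_ampersand = '<p><a href="page?a=1&b=2">link</a></p>'

# Elements closed in the wrong order: not well-formed.
misnested = '<p><em>text</p></em>'

# Block-level element inside an inline one: invalid against the XHTML DTD,
# but perfectly well-formed, so an XML parser accepts it.
block_in_inline = '<span><div>block inside inline</div></span>'

for sample in (encoded, raw_ampersand, misnested, block_in_inline):
    print(is_well_formed(sample), sample)
```

The last sample is the interesting one: it breaks the DTD, yet the parser carries on quite happily, while the two well-formedness errors stop it dead.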
When working with HTML, we can get away with a whole plethora of errors and the parser will still present the page to the user, mostly by guessing. So here it’s less important to encode your ampersands, because it will only matter in the rarest of cases, and it won’t send the parser to an abrupt doom. But I’d still say it’s important to give the parser as many clues as possible (read: make your documents well-formed), because it’s unlikely that parsers will become more forgiving.
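For contrast, here’s the same kind of sloppy markup fed to a forgiving HTML parser. This is only a sketch: it uses Python’s html.parser as a stand-in for a browser’s tag-soup handling (real browsers do far more guessing), and LinkCollector and collect_hrefs are names I’ve invented for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, ignoring how broken the page is."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


def collect_hrefs(markup: str) -> list:
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.hrefs


# Raw ampersand, an unclosed <a>, and misnested closing tags, all at once.
sloppy = '<p><em><a href="page?a=1&b=2">link</p></em>'

# No exception is raised; the parser just hands back what it found.
print(collect_hrefs(sloppy))
```

Nothing blows up, and the link still comes out the other end, which is exactly why the errors are so easy to ignore in HTML.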
So, validation ≠ well-formedness, and validation is the less important of the two.
But (spot the mini-rant coming), why do we have to encode our ampersands at all? 99.9% of the time it’s a pain. This is true of so many of the W3C standards: they ban certain things on the grounds that in the worst possible scenario, things could go badly wrong. Why has XHTML sent as text/html been labelled ‘evil’? Because, if a user does a list of things in a certain, exact, order, things go badly wrong. We have to hack around a whole list of worst-case scenarios. We’re reaching for a kind of code-utopia which is backfiring by making decent code for the web a pain in the arse to write.
But is this a bad idea? In my opinion, no, it’s not. Maybe another solution could be reached that means the masses can code easily without the worst-case scenarios popping up, but then again maybe this isn’t possible. Maybe this is the only way to do things. In which case, think about it: why do accessibility and usability go to such extremes to make websites so easy to use, assuming virtually nothing about the user? Because, and here’s the crunch point, the web is open to anyone. Anyone has the right to use it, browse casually, and be treated with respect (which is what usability is trying for).
Extrapolating this, anyone, therefore, has the right to create code. The coder has every right to expect a perfect XHTML standard, and for everything to behave as they think it should. And while non-encoded ampersands are easy to spot (the parser tells you where they are!), realising that none of your query-string variables are getting through because you haven’t encoded ampersands would take a whole lot longer to figure out.
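Here’s one way that query-string failure could play out, sketched in Python. The assumptions are mine: I’m using html.unescape as a stand-in for a lenient decoder that treats a bare &copy as the legacy copyright entity, and a made-up parameter called copy; the exact behaviour in any given browser may well differ.

```python
from html import unescape
from urllib.parse import parse_qs


def query_after_decoding(attribute_text: str) -> dict:
    """Decode character references, then split the result as a query string."""
    return parse_qs(unescape(attribute_text))


# Ampersand encoded as the standard asks: both variables survive.
encoded = "id=1&amp;copy=2"

# Ampersand left raw: "&copy" looks like an entity to a lenient decoder,
# so the separator is swallowed and the second variable never arrives.
raw = "id=1&copy=2"

print(query_after_decoding(encoded))
print(query_after_decoding(raw))
```

No error message, no line number, just a variable that quietly goes missing on the server. That’s the kind of worst-case scenario the encoding rule exists to prevent.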
That’s why standards are so debatable.