‘Perfect’ semantics

After I read Andy Budd’s entry arguing that a total separation of presentation from content that remains fully semantic isn’t possible, my thoughts on this subject bubbled to the top of my mind once more.

Not just XHTML

Do you know what the ‘X’ in ‘XHTML’ stands for? It’s (taking the English language in a rather liberal manner) eXtensible. That means that in the future, we could add different bits and pieces to XHTML without spoiling its general structure, which is the very principle of the internet. Assuming browser support were perfect, we could add different XML tags to our document and (along with the correct namespace declarations) the page would still work, while gaining extra functionality or content. In theory this is one way we could achieve perfect semantics: if there’s no tag in XHTML that would serve our purpose, we make one up (for example, <navigation>).
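To make the namespace idea concrete, here is a minimal sketch in Python using the standard library’s xml.etree.ElementTree. The namespace URI http://example.com/ns/site and the site: prefix are made up for illustration; only the XHTML namespace URI is real. It only shows that an XML parser keeps the made-up element distinct from XHTML’s own, not that any browser would do something useful with it.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SITE_NS = "http://example.com/ns/site"  # hypothetical namespace for our made-up tags

# An XHTML document that borrows a <navigation> element from another namespace.
DOC = f"""<html xmlns="{XHTML_NS}" xmlns:site="{SITE_NS}">
  <body>
    <site:navigation>
      <a href="/">Home</a>
      <a href="/archive">Archive</a>
    </site:navigation>
    <p>Post body.</p>
  </body>
</html>"""

root = ET.fromstring(DOC)

# ElementTree writes qualified names as "{namespace-uri}localname".
nav = root.find(f".//{{{SITE_NS}}}navigation")

# The links inside are ordinary XHTML <a> elements (default namespace).
links = [a.get("href") for a in nav.findall(f"{{{XHTML_NS}}}a")]

print(nav.tag)
print(links)
```

The parser is perfectly happy with the extra element, but notice that nothing here says what <navigation> means. That gap is what the rest of this entry is about.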

But this method is flawed. For semantics to exist we need a well-developed markup language. One big RSS debate (once we get past the debate of what RSS actually stands for!) is what <description> should be used for: full posts or just excerpts. This argument would cease to exist if RSS defined the semantics of that element. The meaning of <description> would be fixed, and we would use it in that fashion and ignore the other use.

Semantics is also lost when we dip into XML. XML doesn’t carry any meaning around with its tags; we need a markup language to do that. Does <table> mean a piece of furniture or a set of rows and columns? We need an XML-based markup language to define that for us. <table> in XHTML can only mean a set of rows and columns, and if there existed a FurnitureML, <table>’s semantics under that would also be defined.
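The <table> ambiguity can be sketched in a few lines of Python. This assumes a hypothetical furniture markup language with a made-up namespace URI (http://example.com/ns/furnitureml) and a made-up legs attribute; the MEANINGS lookup is my own stand-in for "the markup language defines the meaning", not anything XML provides.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
FURNITURE_NS = "http://example.com/ns/furnitureml"  # hypothetical

# Two elements share the local name "table" but live in different namespaces.
DOC = f"""<root xmlns:h="{XHTML_NS}" xmlns:f="{FURNITURE_NS}">
  <h:table><h:tr><h:td>cell</h:td></h:tr></h:table>
  <f:table legs="4"/>
</root>"""

# Stand-in for the semantics each markup language would define.
MEANINGS = {
    XHTML_NS: "rows and columns of data",
    FURNITURE_NS: "a piece of furniture",
}

def describe_tables(xml_text):
    """Return the meaning of every <table>, decided by its namespace."""
    results = []
    for el in ET.fromstring(xml_text).iter():
        if not el.tag.startswith("{"):
            continue  # element has no namespace, so no language defines it
        ns, local = el.tag[1:].split("}", 1)
        if local == "table":
            results.append(MEANINGS.get(ns, "unknown"))
    return results

print(describe_tables(DOC))
```

XML itself only tells the two tables apart. It is the lookup keyed on the namespace, which here plays the role of the markup language’s specification, that says what each one is.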

However, ignoring the second half of the promise contained within application/xhtml+xml’s name (the ‘xml’) would be a huge blow. We should try to solve this problem rather than avoid the markup that produces it.

Giving semantics to made-up elements

So what can be done to combat this problem? How about using something similar to RDF? Instead of describing how a document is made up and how it relates structurally to other documents, we could have a document that describes the meanings of all the unknown tags within a document, like a glossary. For example, for our made-up tag <navigation>, we would insert a new key-value definition into the glossary that tells anyone who’s interested that any data inside <navigation> is a set of links used to navigate the site.
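Here is a rough sketch of what such a glossary might look like, again in Python. The <glossary> and <term> elements, the name attribute, and the <page> and <sidebar> tags are all invented for this example; no existing vocabulary is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical glossary format: one <term> per made-up tag.
GLOSSARY = """<glossary>
  <term name="navigation">A set of links used to navigate the site.</term>
</glossary>"""

# A page using two made-up tags, only one of which is in the glossary.
PAGE = """<page>
  <navigation><a href="/">Home</a></navigation>
  <sidebar>Misc</sidebar>
</page>"""

def load_glossary(xml_text):
    """Turn the glossary document into a tag-name -> definition mapping."""
    return {
        term.get("name"): (term.text or "").strip()
        for term in ET.fromstring(xml_text).iter("term")
    }

def explain(page_xml, glossary):
    """Look up each top-level element of the page; None means undefined."""
    return {el.tag: glossary.get(el.tag) for el in ET.fromstring(page_xml)}

glossary = load_glossary(GLOSSARY)
print(explain(PAGE, glossary))
```

The lookup itself is trivial for a machine. The definition it finds, though, is still a sentence written for people, so a robot learns that <navigation> has a meaning without learning what to do with it.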

But this doesn’t carry across all the benefits of semantics. Google’s ‘define:’ feature relies on the definition of <dl> being set in stone. It would be extremely hard for a robot to parse a glossary that was meant to be read by humans.