xmouse

How to: semantics

Semantics: you may have heard of it before, but you probably haven’t. Chances are, though, that if you’re a beginner web developer who reads recent articles, you have heard the slogan ‘Tables Are Evil’. That slogan is a shortened version of the slightly longer but more politically correct ‘Tables for layout reasons Are Evil’.

Huh?

Okay: you’re using the table element to lay out your pages, I’m guessing. Yeah? Using a couple of trs to simulate a presentational effect? I’ll put it simply and bluntly: using markup to create presentational effects is just plain wrong! That’s because we mark up documents in a markup language like XHTML or HTML, which describes data with tags. For example, code describes the text inside it as being code, written in a computer language. You can probably guess the meanings of del, blockquote and link too, and if you’ve coded for a bit longer, you may understand ul (unordered list), pre (preformatted text) and strong (strongly emphasised text).
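
To make that concrete, here is a small sketch that uses several of the elements named above for what they mean rather than for how they look. The sentences and list items are invented purely for illustration.

```html
<!-- Illustrative fragment: all of the text content here is made up. -->
<p>
  Type <code>print "hello"</code> to see a greeting.
  The meeting is on <del>Monday</del> Tuesday, and it is
  <strong>not optional</strong>.
</p>

<!-- blockquote marks a quotation, not an indent -->
<blockquote>
  <p>Every tag means something.</p>
</blockquote>

<!-- ul marks a list whose order does not matter -->
<ul>
  <li>Milk</li>
  <li>Bread</li>
</ul>

<!-- pre keeps spacing and line breaks exactly as typed -->
<pre>
name    score
dave    10
</pre>
```

Nothing in this fragment says how any of it should look. That is left to the browser’s defaults or to a stylesheet.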

Get it? Every tag means something. I could go on with examples, but the idea is that humans reading a marked-up document can use its data better than they could in a plain text file, because the usage and meaning of each piece of data has already been explained. If humans can do it, robots can do it too: advanced speech software can already read out web pages, using a different, emphasising voice for em sections, and so on. Another example: Google’s define feature uses the meaning of dl to pick out examples of defined terms.
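
As a rough sketch of what such software has to work with, a definition list pairs each term (dt) with its definition (dd), and em marks the stressed word in a sentence. The wording of the entries below is only an example.

```html
<!-- Illustrative fragment: the definitions are written for this example. -->
<dl>
  <dt>ul</dt>
  <dd>An unordered list.</dd>

  <dt>pre</dt>
  <dd>Preformatted text.</dd>
</dl>

<!-- em marks emphasis; a speech browser may change its voice here -->
<p>Tables are for tabular data, <em>not</em> for layout.</p>
```

A program reading this does not have to guess which text is the term and which is the definition, because the markup already says so.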

Semantics is this idea, and also the art of choosing the element with the correct meaning for a particular situation. Where most people go wrong is in picking a tag for the presentational effect it happens to produce, which gives the wrong meaning to the data inside it. Example: using blockquote to indent text. As I said before, don’t touch presentation with markup. That’s CSS’s job.
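
Here is one way the blockquote example could be handled instead: a minimal sketch, assuming HTML 4.01 Strict and an embedded stylesheet. The class name note and the 2em indent are hypothetical choices, not anything the spec requires.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Indenting with CSS</title>
  <style type="text/css">
    /* "note" is a hypothetical class name; the indent lives here, not in the markup */
    p.note { margin-left: 2em; }
  </style>
</head>
<body>
  <!-- Wrong: this text is not a quotation, so blockquote gives it the wrong meaning -->
  <blockquote>
    <p>Remember to back up your files.</p>
  </blockquote>

  <!-- Better: the markup says "paragraph", and the stylesheet does the indenting -->
  <p class="note">Remember to back up your files.</p>
</body>
</html>
```

Both versions typically render with an indent, but only the second describes the text honestly.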

(The HTML 4.01 spec contains the meanings of most of the existing elements. If you get lost, that’s the place to go.)

Why?

Semantics generally gives a better structure to your document. Look at a site created with a table-based layout, then compare it to your average CSS-based website (this one, for example). The CSS one will generally have less code (and therefore a smaller file size, which means quicker downloads and lower bandwidth costs). Not only that, its sections are well defined and separated through use of the div tag. A table-based site may have parts of the same data scattered throughout the document, or piles of meaningless junk between the real content (we call the ratio of semantically rich data to semantic junk the signal to noise ratio). A high signal to noise ratio brings a whole wealth of useful things, a high Google ranking certainly being among them!
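
A rough side-by-side sketch of the two approaches follows. The ids header, navigation and content are hypothetical names chosen for this example, and the stylesheet that would position the divs is not shown.

```html
<!-- Table-based layout: rows and cells exist only to place things on screen -->
<table>
  <tr>
    <td colspan="2">My Site</td>
  </tr>
  <tr>
    <td><a href="/">Home</a></td>
    <td>Welcome to my site.</td>
  </tr>
</table>

<!-- CSS-based layout: each section is named for what it contains -->
<div id="header">
  <h1>My Site</h1>
</div>
<div id="navigation">
  <ul>
    <li><a href="/">Home</a></li>
  </ul>
</div>
<div id="content">
  <p>Welcome to my site.</p>
</div>
```

On a page this small the two are about the same length. The saving in code tends to show up on real pages, where layout tables get nested inside each other and padded out with spacer cells.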

Also, dumb computers can understand your pages, because you’re using a data format that describes itself. Humans seem to have the innate ability to extract semantics from plain text, up to the point where we can tell what tone of voice the text was written in, so to speak. Markup languages were created so that computers can have this ability too. Ever wonder why nearly every book has a glossary? They’re useful. Semantics provides a socially defined glossary for HTML.

Semantics is debatable

Dan Cederholm’s SimpleQuiz is a very popular discussion of semantics. I mentioned that semantics is the art of using the correct element to describe the correct data. That implies human decision, and wherever the result is non-boolean (a grey area), discussion evolves. Anyway, it’s a great read, and it will set you straight on the path of semantics and show you what a fantastic tool semantics is.

But basic semantics is easy: don’t use purely presentational elements such as b or i to achieve a presentational effect; instead, use semantically rich elements that actually describe what you’re doing with the text. Using b to give extra-strong emphasis to some text? Use the strong-emphasis element, strong, instead. Using it to make something stand out as a section header? Use one of the heading elements: h1 (most important, one to a page), then h2 and h3 through to h6 (least important). I’m sure you can find more examples of ways to apply semantics.
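
The two substitutions just described might look like this in practice. The heading and sentence text are invented for the example.

```html
<!-- Presentational: b only says "make this bold" -->
<p><b>Why bother?</b></p>
<p>This step is <b>really</b> important.</p>

<!-- Semantic: h2 says "this is a section heading",
     strong says "this is strongly emphasised" -->
<h2>Why bother?</h2>
<p>This step is <strong>really</strong> important.</p>
```

Most browsers render both versions in bold by default, so the visible change is small. The difference is in what the markup tells software that reads the page.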

That’s basically all there is to semantics. There’s plenty of information online, and a quick Google search should turn up more if you need it. Also, feel free to post comments with questions, or email me if you prefer (dave{at}xmouse.ithium.net).