Semantic Intention

A recent mailing list thread had me thinking today about the old problem of where semantics exist. They aren’t hard-coded in your (X)HTML tags, for example, which probably sounds like a strange thing to say. But it’s true… to an extent. Markup languages define how you can structure documents with a lot of precision, but can only weakly enforce what the tags themselves mean.

If that’s not making sense, consider a schema. It tells you which combinations of elements and attributes constitute a valid document instance, nothing more, and certainly not whether you applied the tags you used as designed or intended. The XML spec is no more helpful. It describes the rules for elements and attributes, but not what they mean, or how to codify or extract any such information. SGML is no different, and while HTML has tried to shake off both heritages recently, it’s no less susceptible to this problem as a result.

Semantics typically only find their way into markup through prose specifications that define the purpose of tags. But few people read specifications; most fall into the trap of reading the names of elements and inferring their own semantics. It’s why you try to match names to meanings when you create a grammar, but the names alone only say so much.

The easy example of this problem is emphasis. Tell me the difference between em, strong, b and i based only on their tag names. Now throw in CSS styling for italics and bolding. Are you sure you can build a completely unambiguous usage model from these tags, one that is readily apparent to everyone else trying to use them without heavy commentary from you?

Of course, this is where specifications have to come into the picture to layer semantics on top of the names. The HTML5 spec has tried to create a meaningful model for emphasis tags, for example, but there was a simpler model for them in HTML 4.01 that rarely got implemented properly out in the wild. Try explaining to someone that just because a keyword is bolded doesn’t mean it has importance in the sense of the strong definition; the bolding only distinguishes the keyword visually.

Does that mean there’s no point in trying to semantically tag content? Of course not. There’s value for accessibility in being able to distinguish emphasis for stress or importance from emphasis that draws attention to keywords and from emphasis that is purely visual. There’s even more value in meaningful structure, and in separating primary from secondary content. Getting content tagged semantically is more than just an annoying exercise in making data architects happy.

(The misapplication of strong, em, b and i is one reason why emphasis rendering gets disabled in ATs, but if we never fix the problem the tags might as well be thrown out, since they all amount to visual styling in that case. Granted, some users will always hate hearing words stressed in an aural rendition and will opt to disable emphasis, but that’s a decision the user should always be able to make.)

Semantics are just a hard problem to solve perfectly, especially since a specification like HTML can’t possibly account for every use of its general-purpose tags. You’re often on your own in niche cases that don’t fit neatly into the defined semantics, and there is often more than one way to tag content (depending on the context, it is perfectly valid for similar structures to be represented differently).

The other problem with codifying semantics is deciding what the semantics mean when rendered. One of the confusions around em and i and strong and b is the presumption that they are always presented as italics and bolding, respectively. But that is only one interpretation of how to visually present content so tagged. How does one convey the semantics in another medium? How does a TTS engine convey emphasis over mood, for example?

To add to the confusion, a word can be intended to be stressed without any use of italics (e.g., the tagging enables aural renderings). Try explaining to someone that they should add em tags to content that is not visually emphasized to support an alternate presentation where that information is needed (e.g., scansion symbols in poetry could indicate what is stressed and unstressed, but you’d still want to tag the word that needs stressing even though it has no visual distinction). Likewise, bolding could indicate emphasis, not importance, depending on the context.

Unfortunately, there isn’t a nice, simple wrap-up to this post. HTML markup often seems deceptively simple, but you have to be constantly vigilant about whether the semantic system you’re building up in your head exists only there.
