Semantic Overload

So, does EPUB 3.0.1 adding both RDFa and Microdata attributes officially qualify it for semantic overload? Are there just too many ways to express semantics, now that epub:type, ARIA role, RDFa, microdata and even microformats can all be used?

The answer is probably yes and no, with any fault you might be inclined to find lying at the feet of the W3C where these things are ground out. One of HTML’s drawbacks has been the lack of a standardized way of expressing meaningful information about the structure and content of documents. It’s led to the current proliferation of mechanisms, each of which serves a useful function, but the sum of which invariably makes for confusion.

A common question when EPUB 3 first appeared was why it needed to add yet another mechanism to the pile in the form of the epub:type attribute. This attribute was defined for semantic inflection, or the ability to make statements about the structure of your content (e.g., this generic HTML section element represents a chapter, or this generic hyperlink is actually a reference to a footnote). But that’s sort of what the ARIA role attribute does, too, right?
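To make semantic inflection a little more concrete, here is a minimal Python sketch that lists the epub:type values found in a content document. The sample markup and the inflections helper are invented for illustration; the only pieces taken from EPUB itself are the http://www.idpf.org/2007/ops namespace and the chapter, noteref and footnote terms from the Structural Semantics vocabulary.

```python
import xml.etree.ElementTree as ET

EPUB_NS = "http://www.idpf.org/2007/ops"

# Hypothetical content document, made up for this example.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <section epub:type="chapter">
      <h1>Chapter 1</h1>
      <p>Some text.<a epub:type="noteref" href="#n1">1</a></p>
      <aside epub:type="footnote" id="n1"><p>The note.</p></aside>
    </section>
  </body>
</html>"""


def inflections(xhtml):
    """Return (element name, [epub:type tokens]) pairs in document order."""
    root = ET.fromstring(xhtml)
    attr = "{%s}type" % EPUB_NS  # ElementTree's name for epub:type
    found = []
    for el in root.iter():
        value = el.get(attr)
        if value is not None:
            local_name = el.tag.split("}", 1)[-1]  # drop the XHTML namespace
            # epub:type holds a space-separated list of terms
            found.append((local_name, value.split()))
    return found


for name, types in inflections(SAMPLE):
    print(name, types)
```

The generic section, a and aside elements say nothing about chapters or footnotes on their own; it is the attribute that carries that statement.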

Well, yes and no (get used to that answer). The ARIA role attribute was designed to identify the structures of a web page for accessibility purposes, and has a closed enumeration of values. That’s great for what it does, but the HTML5 spec goes one step further and requires that the role attribute, when used, must only have one of these ARIA-defined values. That effectively slams the door on custom semantics, such as all those defined in the EPUB Structural Semantics vocabulary.

That incompatibility was why the epub:type attribute was minted, and given a note that it was designed to be more like the generic W3C role attribute (unfortunately, you can’t swap that more general purpose attribute in and the ARIA attribute of the same name out).

You could try to argue here for microformats instead of a new attribute—and people have—but support never materialized around this approach. My own personal distaste for microformats is that they intrude into the authoring sphere when you implement them through the class attribute (granting that there will always be attributes like rel and rev that are well suited to token-based values).

Whatever original designs there might have been for the class attribute, it’s now the de facto standard for applying CSS styling, and no matter how you try to tweak microformats to be semantic and not just class groupings, I always see problems. How can you answer in any reliable way which class values are semantic indicators and which are presentational groupings, for example, especially when the semantics are intended to be extensible? The stopgaps, like prefixing semantic values to make them unique, make for a poor man’s metadata framework (and that’s not touching on the lack of a clear processing model).
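A small Python sketch of the prefixing stopgap shows where it gets awkward. The sem- prefix and the split_class_tokens helper are assumptions made up for this example, not part of any microformats specification.

```python
# Hypothetical convention: class tokens starting with "sem-" are semantic.
SEMANTIC_PREFIX = "sem-"


def split_class_tokens(class_value, prefix=SEMANTIC_PREFIX):
    """Split a class attribute value into (semantic, other) token lists."""
    semantic, other = [], []
    for token in class_value.split():
        if token.startswith(prefix):
            semantic.append(token[len(prefix):])  # strip the prefix
        else:
            other.append(token)
    return semantic, other


semantic, other = split_class_tokens("sem-chapter chapter two-column")
print(semantic)
print(other)
```

The unprefixed chapter token lands in the second list, and nothing in the markup tells a processor whether the author meant it as a styling hook or as a statement about the content. The split works only for as long as every author and every tool agrees on the prefix.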

I’m sure if you were to poll the WG members you could find manifold reasons why we didn’t go the microformats route. Probably more important from the overall format perspective than any personal complaints like mine was consistency across metadata expressions. The package document already had RDFa-like metadata, so implementing another mechanism in the content gets us back to the proliferation I started out to look at.

But that’s the interplay of epub:type, role and class/microformats in as much of a nutshell as I can put it: epub:type exists to work around the limits of ARIA role, acting like W3C role, and avoiding the ambiguity of the class attribute. The dream remains to find a solution in conjunction with the W3C that would do away with the need for epub:type, but what that will be, and when it will materialize, is anyone’s guess at this point.

Knowing the confusion around semantic inflection, it probably makes for some head scratching to think that a similar issue was introduced by adding two semantic enrichment technologies in RDFa (Lite) and Microdata in the now-wrapping-up 3.0.1 revision. The EPUB WG in fact avoided adding either during the 3.0 revision because it simply wasn’t clear if one or both would survive (or the remote chance that something new might have been spun out from them). It doesn’t seem, at this time, that either is going away soon, even if microdata’s future appears to remain tenuous.

Whether you can pick one over the other for ebooks is the question you need to answer clearly if you want to do away with either.

Search engine optimization could factor into a usage decision, but only if you’re putting content out on the web generally. Since that’s the exception, not the rule, it tends to limit microdata’s usefulness. Its main currency comes from schema.org, but schema.org too often feels like bloat (in the sense that more is included than is actually supported). I could see the CreativeWork class being useful for package metadata, but it’s realistically too limited a vocabulary once you push beyond simple identification. Since you can’t effectively use microdata with other vocabularies, and extending schema.org isn’t a simple process, I’m not sure what traction microdata/schema.org will find. Factor in that the publication metadata, at least for books, is travelling in an ONIX message for use in distribution channels, and the applications are further reduced.

RDFa Lite is probably the more likely to find adoption, since it’s suited to enabling expressions using custom vocabularies. The Advanced Hybrid/Layout working group has already proposed using the attributes with a custom vocabulary to identify components of a region of interest. I’m not sure if this vocabulary or use will survive, but it points to what I expect will be more of the norm: custom vocabularies that enable publishing-specific needs. That you can use RDFa Lite for everything you can use microdata for also points to convergence around it, as there’s nothing people hate more than being told to do essentially the same thing two different ways depending on the context (and microdata is not preferred over RDFa Lite as far as the major search engines now claim).
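For a sense of how close the two syntaxes are, here is a rough Python sketch that reads the same statement from an RDFa Lite fragment and a microdata fragment. The markup, the property values and both reader functions are invented for illustration, and the readers are nowhere near conforming processors: they assume a single, flat item with the type declared on the root element and text-only property values.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments describing the same made-up book.
RDFA = """<div xmlns="http://www.w3.org/1999/xhtml"
     vocab="http://schema.org/" typeof="Book">
  <span property="name">An Example Book</span>
  <span property="inLanguage">en</span>
</div>"""

MICRODATA = """<div xmlns="http://www.w3.org/1999/xhtml"
     itemscope="itemscope" itemtype="http://schema.org/Book">
  <span itemprop="name">An Example Book</span>
  <span itemprop="inLanguage">en</span>
</div>"""


def _properties(root, prop_attr):
    """Map each property name to the text content of its element."""
    return {
        el.get(prop_attr): "".join(el.itertext()).strip()
        for el in root.iter()
        if el.get(prop_attr) is not None
    }


def read_rdfa_lite(xhtml):
    """Simplified: resolve typeof against vocab, collect property values."""
    root = ET.fromstring(xhtml)
    item_type = root.get("vocab", "") + root.get("typeof", "")
    return item_type, _properties(root, "property")


def read_microdata(xhtml):
    """Simplified: take itemtype as given, collect itemprop values."""
    root = ET.fromstring(xhtml)
    return root.get("itemtype"), _properties(root, "itemprop")


print(read_rdfa_lite(RDFA) == read_microdata(MICRODATA))
```

Under those assumptions both fragments reduce to the same type and the same property values, which is the "same thing two different ways" problem in miniature.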

But prognosticating the future is a pointless exercise. Each may find its own niche in EPUB, or the opposite of what I just wrote could play out. Only time will tell.

To wrap up, though, although EPUB has quite a number of semantic mechanisms, it’s hard when you break them down to see which ones could be done without. While it might seem like there is overload in terms of numbers of technologies, the reality is that they all have useful interplay that allows for richer documents.

It can just feel like a lot to learn sometimes.
