Evolution of Publishing Semantics

Think you’ve finally got your bearings in the shifting world of publishing semantics? Well, get ready for another change. One of the things I’ve been working on over the last year is helping develop a specification that will bring many of the semantics defined in the EPUB Structural Semantics Vocabulary more directly into the HTML and accessibility world by integrating them with the ARIA role attribute.

A first draft of what is now called the Digital Publishing WAI-ARIA Module is available for review, but it’s still too early for use, so don’t rush out to apply it after reading this. What’s being sought at this stage is feedback, so in this post I’ll give a quick rundown of the module and how it came about, for anyone not familiar with it.

History Repeats Itself

Dealing in publishing semantics keeps making me think of a line from an old ’80s song: “Oh no, oh no, round and round we go.”

This digital publishing module represents the third time I’ve helped build a semantics vocabulary. The first was as part of the ANSI/NISO Z39.98 Authoring and Interchange specification, which I started working on around 2008 when I was still at CNIB. We needed rich semantics to produce high-quality braille, as did others participating in the work, so we ended up developing what can only be called the behemoth of all digital publishing modules. It covered everything anyone could think of using.

The key thing to note, though, is that it was developed as an aid to the internal production of formats and to facilitate the exchange of data. It wasn’t written specifically with an eye to presenting the semantics to end users, as EPUB 3 didn’t exist yet and the ARIA role attribute didn’t handle publishing semantics.

DAISY to EPUB

Right as development of the Z39.98 specification was wrapping up, work on EPUB 3 began, and one of the goals of EPUB 3 was to bring richer semantics to publications. Back in 2010-2011, though, HTML5 was still changing and shifting, and there was talk of it taking another decade to become a recommendation. I wasn’t in on the discussions that led to minting the epub:type attribute (I was still wrapping up that DAISY spec), but legend has it that the HTML folks pointed us at a namespaced attribute as the most appropriate way to extend HTML, since we needed to add semantic inflection to elements.

The ARIA role attribute was also considered at the time, but it wasn’t extensible, or at least its extensibility was highly contentious (see the defunct XHTML role attribute as an example). In the end we got extensibility, but with the baggage that using a namespaced attribute tied us even more closely to XHTML.

An attribute without a vocabulary would be pretty useless, of course, so the EPUB structure vocabulary was born from a first culling of the DAISY vocabulary. Paring it down was an attempt to find a more reasonable set of the semantics publishers commonly use. If that sounds like a lot of guesswork, well, this sort of work is always interpretive. It’s still quite a large vocabulary, though.

The focus of the semantics also shifted in bringing them into EPUB. EPUB 3 was a merging of mainstream and accessible publishing, so the vocabulary was designed to (ideally) meet two goals: enriched structure for the production side, and actionable semantics for consumption.

A few years on, it’s hard to say that the potential for semantics on the reading side has played out all that well. Reading systems largely haven’t taken advantage of the semantics to provide richer information about the text or to allow automatic skipping of content. There have been some breakthroughs, like pop-up footnotes, but the potential still remains somewhat untapped.

Part of the reason for that failure is that the reading-side potential is already covered by ARIA, if imperfectly for publishing. In the DAISY days, reading systems would have been built to work off format-specific features like epub:type, but there’s no point in a competing technology when one already exists. Who wants to build another ARIA?

EPUB to ARIA

That long story brings us to today: the integration of publishing semantics into ARIA, which began as part of the digital publishing work going on at the W3C. Looking ahead to EPUB 3.1 and beyond, specifically to the integration of EPUB with plain old non-XML HTML, the ARIA role attribute could potentially replace the XHTML-bound epub:type attribute (although that’s probably a longer-term goal, and not a certain one).

Long story short, another cull began last fall to pare the EPUB vocabulary down further, to the most essential semantics for navigating and consuming a document. The intention was to present the ARIA folks with a list of the most broadly used terms across as much of the publishing sphere as possible, with a particular initial focus on trade publishing. A prime concern was ensuring that these terms would be valuable for accessibility, and not just semantics for semantics’ sake (also called production use).

The resulting list was much tighter than either preceding vocabulary, and, in my rarely humble opinion, the chance to really sit down and review the definitions led to a much better and more easily understood vocabulary. (I’m hoping we can backport a lot of these changes to EPUB in the next revision.)

But enough of the history lesson on how we got from there to here. To the vocabulary itself…

Why the prefix?

The first thing you’re going to notice when you review the digital publishing module is that every one of the terms is prefixed with “dpub-”. It’s clunky, but that’s the web way of namespacing things, and it avoids collisions. Here’s how you would indicate that a section is a chapter, for example:

<section role="dpub-chapter">

Compare and contrast with how the same is achieved in EPUB right now:

<section epub:type="chapter">

A namespace on the term is not the most elegant solution in some ways, but I guess elegance has to take a backseat to practicality sometimes.
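In the simplest cases the role form is just the EPUB term with “dpub-” bolted on, so converting existing markup could be sketched as below. This is a minimal Python sketch under loud assumptions: the epub_type_to_role function and the DPUB_TERMS allowlist are hypothetical, the allowlist only holds terms mentioned in this post, and the one-to-one mapping is something the draft does not promise for every term.

```python
from typing import Optional

# Hypothetical allowlist: only terms mentioned in this post. The real draft
# defines its own list, and not every EPUB term has a dpub-* counterpart.
DPUB_TERMS = {"chapter", "part", "glossary", "index"}


def epub_type_to_role(epub_type: str) -> Optional[str]:
    """Map an epub:type value to dpub-* role tokens, or None if nothing maps."""
    # epub:type can hold several whitespace-separated terms; keep the ones
    # with an (assumed) dpub-* equivalent, preserving their order.
    roles = [f"dpub-{term}" for term in epub_type.split() if term in DPUB_TERMS]
    return " ".join(roles) if roles else None


print(epub_type_to_role("chapter"))     # dpub-chapter
print(epub_type_to_role("bodymatter"))  # None
```

Returning None for unmapped terms, rather than guessing, keeps the sketch honest about the gap between the two vocabularies.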

The insurmountable problem that necessitates a prefix is that digital publishing isn’t the only field ARIA roles can be expanded to, so there has to be a way to keep the same term, with different meanings in different domains, from colliding. An often-cited example is that publishing has a “part” for a collection of chapters, while the SVG module could conceivably introduce a “part” for engineering drawings. By giving up ownership of the means of implementation, we also have to accept that publishing is no longer the preferred domain, but just one piece of the whole.

One of the real drawbacks of this approach is that it makes naming clunky outside of the web. You can’t drop that “prefix” for internal production, for example, and get valid results without modifying your validator, as it’s a hardcoded part of the name. (Side note: validation of the dpub-* terms is not possible at this time anyway, since the vocabulary is still under development.)

It’s also not clear whether “dpub” is the best prefix, as people might miss that some of the terms are broadly useful outside of digital publishing (glossary, index, etc.). No one has come up with a better prefix yet, though.

Another longer-term oddity is that some of the terms defined in the module might move into ARIA core, in which case their prefixes would be dropped (for example, if there’s no perceived chance of collision, or if, in the case of a collision, the role we’ve defined would take precedence). We’d then live in a world where you have to mix and match prefixed and unprefixed roles, but that’s somewhat the case already, given that some useful roles are already defined in ARIA core.
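The mix-and-match situation is easier to picture with the role attribute’s token list. ARIA lets an author put several space-separated roles in one attribute, and a consumer uses the first one it recognizes, so a dpub-* term can be followed by a core role as a fallback. Here is a minimal Python sketch of that lookup; the effective_role function and both role sets are hypothetical stand-ins for whatever a real assistive technology knows about.

```python
from typing import Iterable, Optional


def effective_role(role_attr: str, known_roles: Iterable[str]) -> Optional[str]:
    """Return the first token in a role attribute that the consumer knows."""
    known = set(known_roles)
    # The role attribute is a whitespace-separated token list; scan it in
    # order and stop at the first recognized token.
    for token in role_attr.split():
        if token in known:
            return token
    return None


# Hypothetical consumers: one only knows a few core ARIA roles, the other
# also knows two of the draft dpub-* terms.
CORE_ONLY = {"region", "navigation", "main"}
WITH_DPUB = CORE_ONLY | {"dpub-chapter", "dpub-glossary"}

markup_role = "dpub-glossary region"  # dpub term first, core role as fallback
print(effective_role(markup_role, CORE_ONLY))  # region
print(effective_role(markup_role, WITH_DPUB))  # dpub-glossary
```

The same markup degrades to a generic region for older consumers and exposes the richer publishing role to newer ones.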

But I’m not trying to be hyper-critical of integration, only noting some of the issues that come up when you start looking at how to merge existing work.

States and Properties

If you’re at all familiar with ARIA, you know that roles are accompanied by states and properties. The good news is that you probably won’t have to dig deep into these to apply the publishing semantics. Most of the roles are simple landmarks that you can attach to your markup.

But with ARIA comes a lot of power to enhance the information available to the reader. For example, it’s been suggested that you could tell the reader which chapter they’re in, even if the number isn’t part of the heading, by using a couple of property attributes:

<section role="dpub-chapter" aria-posinset="15" aria-setsize="34">
   <h2>Frack This!</h2>
</section>

Here we’re saying that this is chapter fifteen of thirty-four. (This usage isn’t valid under the module as currently defined, which is why I’ll keep repeating the caveat not to use these roles yet.)
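To make the chapter-position idea concrete, here is a minimal Python sketch of how a reading system might turn those two properties into an announcement, using the standard library’s html.parser. The ChapterPositionParser class, the describe_chapters function, and the “Chapter N of M” wording are all hypothetical, and, per the caveat above, the draft doesn’t currently allow these properties on dpub-chapter.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class ChapterPositionParser(HTMLParser):
    """Collect (posinset, setsize) pairs from elements with a dpub-chapter role."""

    def __init__(self) -> None:
        super().__init__()
        self.positions: List[Tuple[int, int]] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        roles = (attr_map.get("role") or "").split()
        if "dpub-chapter" not in roles:
            return
        pos = attr_map.get("aria-posinset") or ""
        size = attr_map.get("aria-setsize") or ""
        # Skip chapters whose properties are missing or not plain integers.
        if pos.isdigit() and size.isdigit():
            self.positions.append((int(pos), int(size)))


def describe_chapters(html: str) -> List[str]:
    parser = ChapterPositionParser()
    parser.feed(html)
    parser.close()
    return [f"Chapter {pos} of {size}" for pos, size in parser.positions]


sample = """<section role="dpub-chapter" aria-posinset="15" aria-setsize="34">
   <h2>Frack This!</h2>
</section>"""
print(describe_chapters(sample))  # ['Chapter 15 of 34']
```

The point of the sketch is that the number never has to appear in the heading text; it comes entirely from the two properties.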

We’ll be adding more guidance to the module about the most relevant states and properties as development moves along.

Extensibility

One of the drawbacks of the ARIA role attribute is that it isn’t open to extension by just anyone, or at least it isn’t intended to be. If you want to extend the available roles, the expectation is that you work with the maintainers to develop a module.

The reason for the restriction is a prudent one. Because the role attribute has accessibility implications (i.e., it can trigger behaviours in assistive technologies), you can’t be too cavalier about which roles you insert. For example, applying the alert role will cause the text to be read to the user immediately. That’s probably not what you want if you’re just adding a sidebar.

Using the publishing roles won’t cause side effects like that. Most are based on the landmark role, which assistive technologies use to build a list of key points on the page that the user can jump to (very much like EPUB’s landmarks nav, but limited in scope to the current page).
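The landmark behaviour can be sketched the same way. The Python below walks a page in document order and builds the kind of jump list an assistive technology might offer. It’s a minimal sketch under two labelled assumptions: CORE_LANDMARKS is a hypothetical subset of ARIA’s landmark roles, and every dpub-* role is treated as a landmark, which simplifies the draft (where most, not all, of the roles are landmark-based). The LandmarkCollector class is likewise hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Hypothetical subset of core ARIA landmark roles. Treating every dpub-*
# role as a landmark is also an assumption of this sketch.
CORE_LANDMARKS = {"banner", "navigation", "main", "complementary", "contentinfo"}


class LandmarkCollector(HTMLParser):
    """Record (tag, role) for each landmark-like element in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.landmarks: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        role_attr = dict(attrs).get("role") or ""
        for token in role_attr.split():
            if token.startswith("dpub-") or token in CORE_LANDMARKS:
                self.landmarks.append((tag, token))
                break  # only the first matching token counts


page = """<nav role="navigation">Contents</nav>
<main role="main">
  <section role="dpub-chapter"><h2>Frack This!</h2>
    <aside role="complementary">A sidebar, not an alert</aside>
  </section>
  <section role="dpub-glossary">Terms</section>
</main>"""

collector = LandmarkCollector()
collector.feed(page)
collector.close()
for tag, role in collector.landmarks:
    print(f"{role} <{tag}>")
```

Note that the sidebar gets the passive complementary role rather than alert, so it simply shows up in the list instead of interrupting the reader.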

I’m getting the feeling that that’s probably more than enough for one day, especially for a module that isn’t yet ready for deployment.

Feedback

I’ll end simply by reiterating that the module is only a rough first draft at this point, so there’s still plenty of time to make changes. If you spot something that doesn’t look right, or want to suggest additions or modifications, the comment period is open until September 11. That date only guarantees consideration for the next draft, though; you can comment whenever you want as long as development is still going on.

The status section lists a variety of ways to provide feedback, but the best option is to log an issue in the GitHub tracker. That’s where all issues end up, however they come in.

