XHTML5/CSS Namespaces

Due to my lazy and haphazard reading of Twitter, I only just spotted a question about whether to use CSS namespaces or escape the colons in namespaced element/attribute names, since I have a few selectors using both in the accessibility guidelines on the IDPF site.

It seemed a bit after the fact to respond, but I wanted to give the topic some further explanation here, as anyone who can say anything meaningful about namespaces in under 140 characters deserves an award of some kind.

First, a word about namespaces and prefixes. If you think they’re impenetrably complex and evil, let me first try to dispel the idea that they’re a difficult concept. (Granted, they do get complex in some advanced implementation scenarios, and are rarely high on the fun list to implement manually.)

Namespaces, in a nutshell, simply provide a means of identifying which elements/attributes are from which grammar, as XML was designed to allow composition of documents by reuse of specialized grammars like MathML and SVG.

(If you’re gearing up to argue there are only a few grammars and they aren’t likely to collide so let’s do away with namespaces, HTML5 already did that for the web. I’m not out to argue the merits and failures of that approach today, only explain their use in XHTML5, so back to the topic at hand…)

A namespace is just a unique identifier in the form of a URI. The namespace name, as this URI is called, is not required to point to anything on the web — it’s not required to be a functioning URL — although people often expect to find a schema or specification at the location. Its only purpose is to identify.

Namespace names are defined using the xmlns namespace declaration pseudo-attribute (“pseudo” as in it’s not a real attribute you can manipulate through the DOM or via XSLT, even though it looks like one). You see this “attribute” every time you create an XHTML page, for example:

<html xmlns="http://www.w3.org/1999/xhtml">

Here it says the root html element and all its descendants are in the XHTML namespace by default. If any of those descendants define their own namespace, then the rule of closest proximity kicks in, and all descendants of that element are in the new namespace until another is encountered, and so on:

<html xmlns="http://www.w3.org/1999/xhtml">
   <head>...</head>
   <body>
      <p>Still in the XHTML namespace here</p>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
         <!-- changed to MathML inside here -->
      </math>
      <p>Back to XHTML here</p>
   </body>
</html>

Another feature of namespace declarations is that you can define a prefix for all the elements in the namespace. The prefix you choose doesn’t make a whit of difference to a machine, but does make it easier for humans to read the markup and quickly determine which tags are in which namespace.

We could declare a prefix for all HTML elements like this:

<htm:html xmlns:htm="http://www.w3.org/1999/xhtml">

We still use the xmlns pseudo-attribute to declare the namespace, but the prefix is now declared after the colon that follows it. The htm: prefix will now have to be used on every HTML element for your document to be valid.

As a result, prefixes are typically only useful for embedded grammars, as it’s incredibly cumbersome to add a prefix to every element in the default grammar. You wouldn’t want to add htm: to each element’s opening and closing tag as you’re hand authoring a web page, that’s for sure:

<htm:html xmlns:htm="http://www.w3.org/1999/xhtml">
   <htm:head>
      ...
   </htm:head>
   <htm:body>
      ...
   </htm:body>
</htm:html>

It also makes it that much more difficult to find the changes in namespace if every element has a prefix. (Prefixing every element does have its benefits in content management systems where content components may have no “default” grammar, for example, or where content recomposition into new forms is common.)

One of the great confusions about prefixes is that they must mean something to the element/attribute name itself, being attached to them and all, when in fact they don’t. With one special exception — the xml:* attributes defined in the XML specification — prefixes are not fixed in stone, they’re just a shorthand. What is important is the namespace name you’ve mapped the prefix to.

All of these are exactly the same, in other words:

<mathml:math xmlns:mathml="http://www.w3.org/1998/Math/MathML">

<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">

<m:math xmlns:m="http://www.w3.org/1998/Math/MathML">

<foo:math xmlns:foo="http://www.w3.org/1998/Math/MathML">

<svg:math xmlns:svg="http://www.w3.org/1998/Math/MathML">

<math xmlns="http://www.w3.org/1998/Math/MathML">

That’s right, exactly the same. The prefix is not a real part of the element name, so when reading namespaced elements the first thing to do is mentally strip everything up to and including the colon. In each case we have a math element, and we can see from each of the prefix declarations that all of these math elements are from the same MathML namespace, even the silly one using an svg prefix.

But here’s the catch: if you can use any prefix you want for an element, how do you write effective CSS that can match all the cases?

To many people’s surprise, namespaces have had CSS support pretty much since XHTML first rolled around. The mechanism just took a long time to get formalized as a W3C standard.

When there wasn’t universal support, and with so much XHTML being served as text/html and not being correctly treated as XML, the result was that in order to write a selector to match namespaced elements you typically had to include the prefix in the selector by escaping the colon, like this:

mathml\:math { border: 1px solid blue; }

(Colons indicate the start of CSS pseudo-classes and pseudo-elements, which is why they must be escaped.)

The above selector matches only one tag in your markup, however: <mathml:math>. It will not also match <m:math>, even if both are mapped to the same namespace. What you get by escaping the colon is a literal match of the element name as written in the markup, completely ignoring namespaces.

And this is where namespaces used to cause fits, because so little content was delivered properly. If I use this escape syntax, I have to write a rule for every possible prefix in order to create a style sheet that can be reused across all my content. Using the previous MathML as an example, it might lead to all these declarations being needed:

mathml\:math,
math\:math,
m\:math,
foo\:math,
math { border: 1px solid blue; }

A ridiculous approach to namespaces, right? Of course, but it was never intended to be done this way.

CSS Namespaces to the rescue!

This specification formalized the syntax for declaring namespaces in CSS, and matching namespaced elements, so you can focus on styling instead of every permutation of prefix and element name.

To declare the default XHTML namespace, you add an @namespace rule like this:

@namespace "http://www.w3.org/1999/xhtml";

To declare additional namespace prefixes, you add further @namespace rules, with the prefix placed between the keyword and the namespace name:

@namespace m "http://www.w3.org/1998/Math/MathML";

You can now use the prefix m for MathML elements regardless of the actual prefix used in the content file. That’s right, this prefix will match all of the MathML variations above, including without a prefix, just like you’d expect since they’re all the same.

When it comes to matching elements, the namespace specification makes one notable variation from how you traditionally use prefixes: a pipe character is used to separate the prefix from the element name (again, to avoid collisions with pseudo-classes and pseudo-elements). For example, to match any math root element you would write:

m|math { border: 1px solid blue; }
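Putting the two declarations together, a complete style sheet might look like the sketch below. It assumes the standard XHTML and MathML namespace names; the p rule and its declaration are only illustrative. Two details are easy to trip over: @namespace rules have to come after any @charset and @import rules and before every other rule, and the namespace name is compared as a plain string, so it must match the URI in the content character for character.

```css
/* Default namespace: unprefixed element selectors below now match XHTML only */
@namespace "http://www.w3.org/1999/xhtml";

/* Prefix for MathML; the content can use m:, mathml:, foo: or no prefix at all */
@namespace m "http://www.w3.org/1998/Math/MathML";

/* Matches p elements in the XHTML namespace */
p { margin: 1em 0; }

/* Matches the MathML math element, whatever prefix the markup uses */
m|math { border: 1px solid blue; }
```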

And that’s namespace matching in CSS.

But guess what else: namespaces aren’t required to style namespaced elements in XHTML! Everything I just wrote defines how to write precise CSS rules, but if you don’t mind painting in broad strokes you can leave out namespaces entirely. Were you wondering why your CSS styles for XHTML elements have always worked without your ever having declared a default namespace in your CSS before?

So long as your style sheet does not define a default namespace, the CSS 3 Selectors specification defines that any element selector without a prefix matches that element name in any namespace.

Or less jargony, if my namespace-less style sheet contains just this one rule:

math { border: 1px solid blue; }

It is the equivalent of matching:

*|math { border: 1px solid blue; }

Where the asterisk is shorthand for any namespace. All the unprefixed selectors you write for HTML elements are applied the same way. You potentially open your content up to styling you don’t want by relying on this match-all pattern, but I’ll return to where this can be useful in a moment.
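To make the difference concrete, here is a sketch of how the same type selector behaves once a default namespace is declared. The border colours are placeholders; the matching behaviour in the comments follows the Selectors and CSS Namespaces specifications.

```css
@namespace "http://www.w3.org/1999/xhtml";
@namespace m "http://www.w3.org/1998/Math/MathML";

/* With a default namespace declared, this means "math in the XHTML namespace",
   so it no longer matches MathML math elements */
math { border: 1px solid red; }

/* Explicit match-all: a math element in any namespace, or in none */
*|math { border: 1px solid blue; }

/* Only the math element in the MathML namespace */
m|math { border: 1px solid green; }

/* Only a math element that is in no namespace at all */
|math { border: 1px solid gray; }
```

Remove the first @namespace rule and the plain math selector goes back to behaving like *|math.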

To start wrapping this up, though, that has to be enough of the story of why you have two syntaxes seemingly doing the same thing. One is a literal HTML-based approach to handling prefixed elements, and the other a proper namespace-based XHTML one.

What you’ll find with all modern browsers that render application/xhtml+xml is that the colon-escaping syntax is not supported, so you can’t just pick and choose which method you prefer (just like how namespaces aren’t supported in text/html renderings).

The question then is why would you ever want to use the colon-escaped syntax in EPUB 3 content, and answers are hard to come by. A reading system incorrectly rendering your content as text/html is about the only possibility, but how much concern does that warrant?

You’re going to have to follow all of the polyglot markup rules if you want real cross-compatibility between XHTML and HTML renderings, and you’ll probably rely on the magical *| handling for CSS as much as you can to avoid namespace complications. Prefixes and colon escaping might then only come into play if you plan to leverage the epub:type attribute for styling. You don’t get the same match-all assistance for attributes, so this will not work:

section[type~='introduction'] { ... }

You must define the namespace and use a prefix:

section[epub|type~='introduction'] { ... }
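The prefix in that selector has to be declared in the style sheet first. A minimal sketch, assuming the EPUB namespace name http://www.idpf.org/2007/ops (the one bound to the epub prefix in EPUB 3 content documents); the declarations are only illustrative:

```css
@namespace epub "http://www.idpf.org/2007/ops";

/* Matches a section whose epub:type list contains "introduction" */
section[epub|type~='introduction'] { page-break-before: always; }

/* Attributes can still be matched in any namespace, but only if you ask explicitly */
section[*|type~='introduction'] { page-break-before: always; }
```

A default namespace declaration never applies to attribute selectors, which is why the unprefixed form only matches a type attribute in no namespace. Note also that a selector using an undeclared prefix is invalid, so without the @namespace rule the whole rule is dropped.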

Which finally gets us to the reason I included both mechanisms in the guidelines site. Although I also wrap the site up as a distributable EPUB, the web version was being served as text/html until I jerry-rigged the right application/xhtml+xml headers through PHP, and that was causing the CSS problems I noted above when styling on the epub:type attribute. I included both selectors to cover both cases, and then left them in once I got things working, as they coexist harmlessly.
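A combined rule covering both renderings might look like the following sketch. It is illustrative rather than the actual rule from the guidelines site, and the declaration is a placeholder.

```css
@namespace epub "http://www.idpf.org/2007/ops";

/* First selector: XML rendering, matches type in the EPUB namespace
   whatever prefix the markup uses.
   Second selector: text/html rendering, matches an attribute whose
   name is literally "epub:type". */
section[epub|type~='introduction'],
section[epub\:type~='introduction'] {
   font-style: italic;
}
```

The @namespace rule has to stay in place for both renderings: if the epub prefix were undeclared, the first selector would be invalid and would take the whole group down with it.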

That CSS isn’t there because it’s something I would normally consider useful for an EPUB 3 file…

Long story, eh?

UPDATE:

I’ve revised the epub:type attribute selectors in this post after reading Romain’s salient point below that I’d written a weak selector. For anyone not versed in the nuances of attribute value matching, the class attribute is special in that it is natively treated as a space-separated list of values when you use the dot notation. That’s why you can have a class like this:

<p class="style1 style2 style3">

and write a selector like this to match any p tag with the style2 class:

p.style2 { ... }

When matching against other attributes, however, you have to account for whether the attribute contains a single value, a space-separated list or a hyphen-separated list.

For example, my original epub:type selector in the blog post was written like this:

section[epub|type='introduction']

But as Romain noted, this will only match this exact markup:

<section epub:type="introduction">

It would not match this tagging with two semantics:

<section epub:type="introduction frontmatter">

As = is a literal test, I would have to rewrite the CSS selector like this:

section[epub|type='introduction frontmatter']

But, as you might imagine, if I switch the semantics around so that frontmatter is first, I need yet another selector. It’s just tempting fate to use equality tests on attributes that are defined to allow space-separated lists of values, such as epub:type.

To match any value in a space-separated list, you have to use the ~= operator:

section[epub|type~='introduction']

This syntax now matches introduction regardless of whether it is the only value or is one of a space-separated list.

The ~= syntax is what the dot notation simplifies for class, so the example at the start is exactly equivalent to this:

p[class~='style2'] { ... }

The other selector type not touched on is |=, which matches a value that is either exactly the string you give or begins with that string followed by a hyphen. I’ve never found myself using it, but the selectors specification makes a good case that it can be used to match on language subcodes. You can decide whether you want to let that little piece of knowledge slip quietly off from short term memory into oblivion.
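For the record, a language match with |= looks like this sketch (the declaration is a placeholder):

```css
/* Matches lang="en", lang="en-US" and lang="en-GB", but not lang="english" */
p[lang|='en'] { color: navy; }
```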


  1. Romain Deltour

    Great post Matt!

    If I was to nitpick (as you know I often do :p), and even if it’s not the point of the article, I’d suggest to use the whitespace-separated list syntax of attribute value selectors for any style based on epub:type. For example:

    section[epub|type~='introduction'] { ... }

    There’s low chance that CSS will ever have selectors that could cover EPUB’s prefix declaration mechanism (for epub:type vocabulary associations), but at least the syntax above provides support for multi-valued epub:types.


    1. matt.garrish

      Thanks, Romain! Great point, as it’s easily forgotten that there are semantics like front/body/backmatter that can be used to complement others, and which will break a simple equality test.


    2. Jorge

      > The question then is why would you ever want to use the colon-escaped syntax in an EPUB 3 content, and answers are hard to come by. A reading system incorrectly rendering your content as text/html is about the only possibility, but how much concern does that warrant?

      I find that paragraph somewhat astonishing considering that what follows is essentially an explanation of how CSS namespaces are incompatible with the web. Who would want the same content to be interchangeably valid for the web and an ebook, right? I find it far from a situation “hard to come by”. I mean, you had the problem yourself when creating the accessibility guidelines, content which is as appropriate on the web as packaged in an ebook (by the way, whenever you work on an EPUB ebook, use the extension .xhtml for your XHTML files, so that your browser loads local files as application/xhtml+xml—i.e. renders them with the strictness and rules of XML, which is what ereaders will do, instead of plain HTML— and thus supports CSS namespaces along with its | prefixing syntax).

      Considering that the whole point of EPUB3 switching exclusively to XHTML5 was embracing the web, I personally think they blew it by then deciding to use an incompatible technology for the structured vocabulary specification.

      By the way, very interesting article.


      1. matt.garrish

        Right, but you don’t get a choice in terms of which media type you want to render an EPUB 3 as.

        I was only pointing out there that colon escaping does nothing in EPUB 3 content when it is rendered as designed, and if you’re only making EPUB 3s for rendering on compliant EPUB 3 reading systems, it’s completely useless to add colon-escaped equivalents. I still can’t think of any use for it in that context, if those qualifications make that statement clearer.

        And in my defense, I did say you need to consider all the polyglot markup guidelines if you want content/style sheets that can work for both renderings. If you want compatibility with a non-XML rendering, you’re certainly making a fair point in that you should either stay away from re-using the epub:type attribute for styling or also include an escaped equivalent.

        I won’t pick up on the bait, though. My only goal is to explain what we have, not what might be better!

        (And the guidelines site is dynamic beyond just correcting the headers, which I had to do because I have no control over the IDPF servers. The headers and footers for the web are templated, the checkpoints are repurposed from a single source, etc. I have a perl script that builds the static pages from the PHP for the distributable EPUB. Agree those built pages should have .xhtml extensions, but I hadn’t even noticed it was spitting out .html. Thanks for pointing that out, nonetheless!)

