One of the cool new additions to EPUB 3 is the ability to include schema.org metadata. Although, in truth, it’s actually two new additions in one… or is that one new addition in two?
The obvious way to include schema.org metadata is in XHTML content documents now that both RDFa and microdata attributes are in the soon-to-be-final 3.0.1 update. If you’re wondering why you would want to do this, think about your book being opened up to indexing by a search engine. Rich metadata in the source sure isn’t going to harm your chances of improving your discoverability.
The less obvious place where schema.org properties can be used is in the package document metadata. The 3.0.1 revision saw the prefix ‘schema:’ reserved for adding this metadata, meaning you don’t have to declare it in a prefix attribute on the package element every time you want to use it.
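To make the difference concrete, here is a minimal sketch of a package document that mixes a declared prefix with the reserved one. The ex: prefix, its URL, and both property values are made up for illustration, and the required identifier, title, language, and modified metadata are left out to keep it short:

```xml
<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="pub-id"
         prefix="ex: http://example.com/vocab/#">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <!-- ex: is a hypothetical vocabulary, so it has to be declared in the prefix attribute above -->
    <meta property="ex:shelf">reference</meta>

    <!-- schema: is reserved as of 3.0.1, so no declaration is needed -->
    <meta property="schema:typicalAgeRange">9-11</meta>
  </metadata>
</package>
```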
So what are the cases that might prompt you to include schema.org metadata in the package? Well, the primary one that led to the prefix being added was to be able to express accessibility and educational metadata. The LRMI project already has a set of properties in schema.org, and the a11y metadata project is about to see four properties added with a few still under discussion and hopefully to be added not long after. I won’t go into details of those now, as they’re both on the radar for EDUPUB, so there will be more info coming (I know because it’s on my plate), but here’s a quick example of how you could indicate that the publication includes captions for video content:
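A minimal sketch of such a declaration, assuming the accessibilityFeature property and its captions value from the a11y metadata proposal:

```xml
<!-- placed inside the package document's metadata element -->
<meta property="schema:accessibilityFeature">captions</meta>
```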
It’s as simple as that when it comes to using schema.org properties now. Well, almost…
(If the mechanics of metadata make your head spin, best to stop reading now. I’m thinking a new accessibility hazard is needed to warn people my posts may cause delirium and fatigue.)
There are some limitations to the metadata you can express in the package document, which is what I’ll be looking at in this post. For one, EPUB has a quirky form of metadata that’s sort of like a really reduced form of RDFa but limited to property expressions. There’s no equivalent to the typeof/itemtype attributes, for example, in part because all primary metadata expressions apply to the publication, and without nesting of meta tags there is no reason to set a new type. That’s really a pain when you get into the kind of structured metadata many schema.org properties take, and it also makes it ambiguous what item type you’re taking properties from.
So where does that leave you in terms of how to use schema.org metadata? That part’s a little unclear. The “publication” can be interpreted a number of different ways in schema.org. My personal take is to treat the package metadata section like an instance of the CreativeWork type and use whatever properties are allowed in it to make statements. Reaching for the Book type might seem more natural, but as EPUB is intended to be usable across publishing formats, it’s a little limiting. You certainly won’t do any harm if you limit yourself to the Book properties, since Book is a subtype of CreativeWork. (You can find the whole list of available derivations of CreativeWork on its definition page or in the full hierarchical list.)
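As a rough sketch of that approach, the fragment below treats the metadata section as if it were a single CreativeWork and attaches a few of that type’s properties to the publication. The property choices and values are only illustrative, and the required Dublin Core elements are omitted:

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- each statement is read as applying to the publication as a whole -->
  <meta property="schema:genre">Technical writing</meta>
  <meta property="schema:keywords">EPUB, metadata, accessibility</meta>
  <meta property="schema:isFamilyFriendly">true</meta>
</metadata>
```

Nothing in the markup says these come from CreativeWork rather than some other type that happens to share the property names; that reading is left to whoever processes the metadata.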
It does kind of stink that you can’t say what item type you intended to use, though. And it does kind of stink that the EPUB model sort of encourages treating all of schema.org like a buffet. But after a while you get used to the smell.
The inability to nest metadata is more of a kick in the old you-know-whats, at least to a metadata purist like me, as you lose the ability to create rich expressions. It’s not the end of the world in a lot of cases — schema.org is pretty lax about how you express metadata — but you’re going to have to use string values where a schema.org property prefers a subtype. Take the author property, for example. If you were to express this in HTML, it might look like this:
<span property="author" typeof="Person"><span property="givenName">Matt</span> <span property="familyName">Garrish</span></span>
The above is just scratching the surface of what you can say about someone using the Person type. In the EPUB package, on the other hand, all you can define is a property and value, so there’s no way to indicate that the value is a subtype or to provide any context to the name:
<meta property="schema:author">Matt Garrish</meta>
As I said, this simplicity isn’t the end of the world for basic property expressions, but it does get to be a nuisance when you do want to express structured metadata.
But I’m not trying to suggest that schema.org metadata is arriving as a replacement for the DCMES elements. It’s more of a complement at this time, since those elements are so ingrained. It does leave me lamenting that schema.org was a couple of years too late arriving, though, or we might have had an alternative.
It would certainly be interesting to see whether the CreativeWork metadata could replace the metadata EPUB has right now. It would take a major new revision that puts RDFa Lite attributes into the package metadata (another development a couple of years too late) and allows nesting, but the horror that is hacking meaning onto the DCMES elements has to end some day.
For now, adding the prefix in 3.0.1 addresses a key need for a11y and educational metadata, so I can live with the ambiguities that come along with grafting the mechanics onto the EPUB metadata.