Okay, so I’ve been working on the problem of how to express complex schema.org types in EPUB over the holidays as part of putting together an integration guide for the LRMI and accessibility properties, so I figured I’d share a few discoveries.
One consideration I don’t think gets enough airing is that the package document link element is not an equivalent of the HTML link element. They look an awful lot alike in terms of the attributes they can take, but the EPUB element is strictly for attaching resources to the publication. Those resources could be used in the rendering of the publication, but do not have to be supported by reading systems. Metadata records are the obvious example of a resource that could be useful, but equally could be ignored, and most of the relationships define record formats.
The HTML link element, on the other hand, is not as strict. You can use it to associate resources, most notably CSS style sheets, but you can also use it to establish more general relationships between resources. When adding metadata, you can also use it for controlled vocabularies such as the BookFormatType enumeration.
It’s really tempting to translate HTML usage into the package document, but you have to resist the urge. For example, you’d probably want to tag the LRMI isBasedOnUrl property using a link (shown here only to highlight that this is bad practice):
<link rel="schema:isBasedOnUrl" href="http://example.com/Probability-and-Statistics-book.html"/>
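But to be valid to EPUB you have to include the URL as a text string in a meta tag, like this:
<meta property="schema:isBasedOnUrl">http://example.com/Probability-and-Statistics-book.html</meta>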
It kind of makes me cry to look at, but that's just the way it is.
Since the package document metadata doesn't implement RDFa, doesn't allow nesting and requires non-empty meta tags, my first impression was that it would be a headache to implement any property that doesn't accept some form of basic text value (i.e., any property whose expected type is another schema.org type).
I have to backtrack from that initial analysis, as the headache is actually no worse than modifying any property using the refines attribute. The only open headache I have is what to make the value of the property that contains the subexpressions.
But to back up a bit, educationalAlignment is a prime example of the kind of complex property I'm talking about. Its expected type is an instance of the schema.org AlignmentObject, which is basically a container for a set of sub-properties that express details of the alignment. Here's an example from the LRMI wiki on the W3C WebSchemas site:
<li itemprop="educationalAlignment" itemscope itemtype="http://schema.org/AlignmentObject">
  <meta itemprop="alignmentType" content="teaches"/>
  <span itemprop="targetName"><a itemprop="targetUrl" href="http://asn.jesandco.org/resources/S11435AF">Determine whether two events are mutually exclusive and whether two events are independent.</a></span>
</li>
As you can see, there are three subproperties being expressed: alignmentType, targetName and targetUrl. (I'm not concerned that targetUrl is nested inside targetName in this markup; all three will be treated as siblings.)
As I mentioned already, the refines attribute was implemented exactly to allow subexpressions to be defined, so looking at this example the obvious solution is to drop the nesting and just attach the subproperties to educationalAlignment. And that's the general idea, so you'd probably start out with a quick reformulation like this:
<meta property="schema:educationalAlignment" id="ea01"></meta>
<meta refines="#ea01" property="schema:alignmentType">teaches</meta>
<meta refines="#ea01" property="schema:targetUrl">http://asn.jesandco.org/resources/S11435AF</meta>
<meta refines="#ea01" property="schema:targetName">Determine whether two events are mutually exclusive and whether two events are independent.</meta>
The obvious problem remains that meta tags can't be empty, so the initial educationalAlignment declaration is invalid. While it's easy to suggest all kinds of hacks to put in, trying to come up with a predictable and meaningful solution is a little harder.
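To make the empty-value problem concrete, here's a minimal Python sketch of the kind of check a validator might run. It is not how any real EPUB checker is implemented; the function name find_empty_metas and the metadata wrapper around the sample are my own assumptions for illustration, and it only looks at meta elements that carry a property attribute.

```python
import xml.etree.ElementTree as ET

# Namespace of the EPUB package document.
OPF_NS = "http://www.idpf.org/2007/opf"

# The quick reformulation from above, wrapped in a metadata element
# (the wrapper is added here only so the fragment parses on its own).
SAMPLE = """\
<metadata xmlns="http://www.idpf.org/2007/opf">
  <meta property="schema:educationalAlignment" id="ea01"></meta>
  <meta refines="#ea01" property="schema:alignmentType">teaches</meta>
  <meta refines="#ea01" property="schema:targetUrl">http://asn.jesandco.org/resources/S11435AF</meta>
  <meta refines="#ea01" property="schema:targetName">Determine whether two events are mutually exclusive and whether two events are independent.</meta>
</metadata>
"""

def find_empty_metas(metadata_xml):
    """Return the property names of meta elements that have no text value."""
    root = ET.fromstring(metadata_xml)
    empty = []
    for meta in root.iter(f"{{{OPF_NS}}}meta"):
        prop = meta.get("property")
        # Hypothetical check: a property-bearing meta needs a non-blank value.
        if prop is not None and not (meta.text or "").strip():
            empty.append(prop)
    return empty

print(find_empty_metas(SAMPLE))  # ['schema:educationalAlignment']
```

Only the container property is flagged; the three refining properties already carry text values.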
Here are the two rules I came up with to work around the problem:
- EPUB must not allow the kind of value laxity tolerated generally by search engines. If a property expects another type as its value, that type must be used (e.g., you can't just drop a string value in, like people seem to do for the schema.org author property). This rule will ensure that reading system developers can predict what the values of properties will be, and which can be ignored, so there will never be a question of whether a value is real or a placeholder.
- The value of these properties must be the name of the item type being used. Again, this makes for predictability and also allows you to indicate what type you're using. In the above case the property only expects an AlignmentObject, but it will be useful generally where options are possible (e.g., Organization). Remember, the rules have to be generally applicable to any schema.org metadata, not just the educational and accessibility properties.
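As a rough illustration of how the two rules could be applied mechanically, here's a small Python sketch that writes out a complex property as a flat set of meta elements. The function name complex_property_to_meta, the id "au01" and the organization name "Example Press" are all hypothetical; the sketch also assumes every subproperty is a plain text value and uses the schema: prefix throughout.

```python
from xml.sax.saxutils import escape, quoteattr

def complex_property_to_meta(prop, item_type, subprops, meta_id):
    """Serialize one complex schema.org property as flat EPUB meta elements.

    Rule 1: the property must be given a type, never a bare string.
    Rule 2: the text value of the container meta is the item type name.
    """
    if not item_type:
        raise ValueError("a complex property must name its item type")
    lines = [
        f"<meta property={quoteattr('schema:' + prop)} "
        f"id={quoteattr(meta_id)}>{escape(item_type)}</meta>"
    ]
    # Each subproperty is attached to the container through refines.
    for name, value in subprops.items():
        lines.append(
            f"<meta refines={quoteattr('#' + meta_id)} "
            f"property={quoteattr('schema:' + name)}>{escape(value)}</meta>"
        )
    return lines

# Hypothetical author expressed as an Organization rather than a string.
for line in complex_property_to_meta(
    "author", "Organization", {"name": "Example Press"}, "au01"
):
    print(line)
```

Because the type name is always the container's value, a reader of the output can tell an Organization author from a Person author without guessing from the subproperties.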
Applying these rules to the earlier example, then, we just have to add the type to the empty property:
<meta property="schema:educationalAlignment" id="ea01">AlignmentObject</meta>
<meta refines="#ea01" property="schema:alignmentType">teaches</meta>
<meta refines="#ea01" property="schema:targetUrl">http://asn.jesandco.org/resources/S11435AF</meta>
<meta refines="#ea01" property="schema:targetName">Determine whether two events are mutually exclusive and whether two events are independent.</meta>
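For a sense of what this buys a reading system, here's a minimal Python sketch that folds the flat meta elements back into a nested structure. It is only an illustration under some loud assumptions: the function names and the "@type" key are my own, any meta that has refinements is assumed to follow the type-name rule, and repeated subproperties would simply overwrite each other.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# The corrected example, wrapped in a metadata element so it parses alone.
SAMPLE = """\
<metadata xmlns="http://www.idpf.org/2007/opf">
  <meta property="schema:educationalAlignment" id="ea01">AlignmentObject</meta>
  <meta refines="#ea01" property="schema:alignmentType">teaches</meta>
  <meta refines="#ea01" property="schema:targetUrl">http://asn.jesandco.org/resources/S11435AF</meta>
  <meta refines="#ea01" property="schema:targetName">Determine whether two events are mutually exclusive and whether two events are independent.</meta>
</metadata>
"""

def local_name(prop):
    """Strip the prefix from a property such as 'schema:targetUrl'."""
    return prop.split(":", 1)[-1]

def resolve_complex_properties(metadata_xml):
    """Rebuild nested items from meta elements linked by refines."""
    root = ET.fromstring(metadata_xml)
    metas = list(root.iter(f"{{{OPF_NS}}}meta"))

    # Group the refining meta elements by the id they point at.
    refinements = {}
    for meta in metas:
        target = meta.get("refines")
        if target and target.startswith("#"):
            refinements.setdefault(target[1:], []).append(meta)

    resolved = {}
    for meta in metas:
        if meta.get("refines") is not None:
            continue  # subproperties are handled through their container
        text = (meta.text or "").strip()
        children = refinements.get(meta.get("id"), [])
        if children:
            # Assumption: the container's text value names the item type.
            value = {"@type": text}
            for child in children:
                name = local_name(child.get("property", ""))
                value[name] = (child.text or "").strip()
        else:
            value = text
        resolved[local_name(meta.get("property", ""))] = value
    return resolved

alignment = resolve_complex_properties(SAMPLE)["educationalAlignment"]
print(alignment["@type"])          # AlignmentObject
print(alignment["alignmentType"])  # teaches
```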
I haven't yet found a downside to this approach, but if someone spots a potential gotcha please feel free to point it out in the comments...