I’m sure at some point you’ve downloaded a preview of an ebook from your ebookstore of choice. Sometimes the content is utterly useless in terms of deciding whether you want to purchase the book — a bunch of front matter splatter. Is reading the copyrights, preface, dedication, etc. really useful? What if there’s an index at the back you’d like to look at? What if you’re the content creator who wants the reader to have access to that index?
Of course, sometimes you get useful previews, but I’m trying to highlight the problem of previews being in the hands of the vendor: you can never be completely sure what will be in them. Unless the publisher runs the ebookstore, it’s not their choice what you see.
The EPUB Previews specification seeks to flip that paradigm back around so that the content creator is the one who decides what goes in the preview. The specification is not yet a recommendation, but should be sometime (hopefully early) this year. As it’s one of the more stable specifications the working group has published, I thought I’d give it a quick run-through, but caveat emptor if you try to jump the gun on it becoming a recommendation.
The specification actually defines two ways that you can create previews. The one you might expect is to create a standalone publication containing only the preview content, or what is called a preview publication. The other method is to identify the preview content within the package document, allowing the ebookstore or reading system to generate the preview (whether as a standalone publication in the case of an ebookstore, or by limiting access only to the specified preview content in the case of a reading system). Not surprisingly, these are referred to as embedded previews in the spec.
We’ll dig into each of these preview types for the rest of the post…
There’s not a lot surprising about standalone preview publications. As you’d expect, a preview publication is essentially a stripped-down version of the full publication, containing only the content you want to give readers for free.
Being a preview doesn’t mean the publication will come DRM-free, only that reading systems should allow full access to its content.
So if a preview is just a minimal publication, why a spec to define it, right? Well, there are a few necessary requirements that the spec standardizes even for these most basic of previews:
- First and foremost, it standardizes how you identify a preview publication. You have to add a dc:type element with the value “preview” to the package metadata:
<dc:type>preview</dc:type>
- It also recommends identifying the parent publication using the dc:source element. For example, the ISBN of the parent could be specified like this:
<dc:source>urn:isbn:9782367932095</dc:source>
A related recommendation is to not assign the parent identifier to the preview (i.e., don’t use the same ISBN in a dc:identifier in both). As a preview is typically not considered a distinct work, it’s actually recommended not to assign it an ISBN at all.
- And finally, the specification standardizes how to provide a link to obtain the parent publication, using the link element with the “acquire” relationship value. A link to the ebookstore page where the reader can buy the full publication could be included as follows:
<link href="http://example.org/book/9781448103706" rel="acquire" type="text/html" />
If you wanted to point to an OPDS catalogue entry, you’d similarly add a link like this:
<link href="http://example.org/book/9781448103706.atom" rel="acquire" type="application/atom+xml;type=entry;profile=opds-catalog" />
The takeaway from the above is that identification of a standalone preview publication is primarily a metadata issue; there are no restrictions or requirements on the content itself. A complete set of preview-specific metadata can be as small as this:
<package …>
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:type>preview</dc:type>
    <dc:source>urn:isbn:9782367932095</dc:source>
    <link href="http://example.org/book/9781448103706" rel="acquire" type="text/html" />
    <link href="http://example.org/book/9781448103706.atom" rel="acquire" type="application/atom+xml;type=entry;profile=opds-catalog" />
    …
  </metadata>
</package>
But that’s not to suggest that there aren’t any content issues you’ll have to consider when putting together a preview. The primary one is linking to the unavailable content, which we’ll look at next.
One of the open questions about previews — both standalone and the embedded ones we’ll look at next — is what to do with the table of contents, specifically linking to the content that’s not available as part of the preview. You typically want to give the reader a full table of contents with the preview, even if the bulk of the content is not accessible, if your goal is to convince them there’s plenty of other content worth paying for.
Like many things you’ll find in EPUB, how best to do that is debatable, and that debate is still open for previews. The one thing you absolutely cannot do is strip the a tags from the table of contents entries, as the result is invalid and epubcheck will barf errors at you.
The currently proposed solutions to this problem are:
- Include a generic page with a message that the content is only available in the full publication, and modify all links to content not in the preview to point to it.
- Remove the href attribute from all a tags that point to content not included in the preview.
I’m partial to the latter approach because it potentially makes clearer what content is in the preview. If you look at the table of contents and only find a few active links, it’s obvious what you have access to. With a dummy page, all the links will appear active, so it will be harder for readers to determine which parts they can read. (Of course, many will just flip through the content provided and not look at the table of contents, but they don’t count!)
The open question about stripping the href attributes is what reading systems will do: are they going to explode? I ask that facetiously, but it’s a valid question that still needs real-world testing. Removing the href attribute is perfectly valid under the navigation document requirements, but whether developers have accounted for links without href attributes is another matter. It might be safer to provide a dummy page.
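To make the two options concrete, here is a minimal sketch of a table of contents for a preview that contains only a preface and first chapter. The file names, the chapter titles, and the unavailable.xhtml dummy page are all hypothetical, and in practice you would pick one option and apply it to every unavailable entry rather than mixing them as shown here.

```xml
<nav xmlns:epub="http://www.idpf.org/2007/ops" epub:type="toc">
  <h1>Contents</h1>
  <ol>
    <!-- Included in the preview: these links stay live -->
    <li><a href="preface.xhtml">Preface</a></li>
    <li><a href="chapter01.xhtml">Chapter 1</a></li>

    <!-- Option 1: repoint the link at a generic "only in the full publication" page
         (unavailable.xhtml is a hypothetical file name) -->
    <li><a href="unavailable.xhtml">Chapter 2</a></li>

    <!-- Option 2: keep the a element but drop its href attribute -->
    <li><a>Chapter 3</a></li>
  </ol>
</nav>
```

With the second option, only the preface and first chapter appear as active links, which is the visual distinction argued for above. How a given reading system renders an a element without an href is exactly the part that still needs testing.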
You’ll also need to handle links from the viewable content to unavailable content. I’d again probably opt to remove the href and style the links grey to give a visual cue that they are there but not active. From an accessibility perspective, it might be misleading to convey potential linking only to sighted readers, but balancing that out, I can’t imagine AT users being happy with a whole bunch of links that don’t go to real content.
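As a rough illustration of that approach, a content document in the preview might look something like the following. The class name "unavailable", the colour, and the chapter being referenced are assumptions made for the example, not anything the specification defines.

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Chapter 1</title>
    <style>
      /* Grey out placeholder links so they read as inactive
         (the class name is hypothetical) */
      a.unavailable { color: gray; }
    </style>
  </head>
  <body>
    <p>This technique is covered in more depth in
      <!-- href removed: the target chapter is not part of the preview -->
      <a class="unavailable">Chapter 5</a>.</p>
  </body>
</html>
```

In a real publication the rule would more likely live in the shared style sheet than in an inline style element; it is inlined here only to keep the sketch self-contained.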
Embedded previews are a lot more interesting than standalone previews, at least if you like markup solutions that can integrate with an existing publication. Don’t get discouraged if they sound kind of complex to create, as I’ll come back to why it’s not actually the case at the end.
The metadata for embedded previews is largely the same as what we just looked at for standalone publications, but where it’s expressed differs, as embedded previews are defined in the new package document collection element. (FYI, the collection element is basically a generic means of defining new package features without having to overload the file with more and more new elements.)
The key differences from a standalone preview are: 1) the collection’s role attribute defines that it contains a preview (you don’t specify a dc:type at the package metadata level, as the entire publication is not a preview); and 2) you can omit specifying a dc:source since the preview is embedded in its parent publication (if a standalone preview is generated from an embedded preview, the program generating it would be responsible for adding the proper metadata).
<collection role="preview"> … </collection>
The one consistency is that you provide a link to where to purchase the book in the package metadata, but I’m not going to repeat the same info I already detailed above. I’m lazy that way.
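For orientation, here is a skeleton of how those pieces might sit together in the package document. It is a sketch only: it reuses the acquisition link from the standalone example, elides everything else, and assumes the collection follows the spine as laid out in the EPUB 3.0.1 drafts.

```xml
<package …>
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    …
    <!-- No dc:type or dc:source here: the publication as a whole is not a preview -->
    <link href="http://example.org/book/9781448103706" rel="acquire" type="text/html" />
  </metadata>
  <manifest> … </manifest>
  <spine> … </spine>
  <!-- The embedded preview is declared after the regular package content -->
  <collection role="preview">
    …
  </collection>
</package>
```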
Anyway, it’s great that we have an empty preview collection element now, but you still have to tell the reading system/vendor which part(s) of the publication are previewable for it to be useful. There are two steps to doing this, which basically amount to mimicking the EPUB manifest and spine.
The first requirement is to create the manifest of resources necessary to render all the preview content, which is done by embedding a collection with the role of “manifest”. In it, you use link elements to point to the location of each resource.
For example, an embedded preview consisting of a preface and first chapter might also list a supplementary style sheet and images as follows:
<collection role="preview">
  <collection role="manifest">
    <link href="css/epub.css" media-type="text/css" />
    <link href="xhtml/nav.xhtml" media-type="application/xhtml+xml" />
    <link href="xhtml/preface.xhtml" media-type="application/xhtml+xml" />
    <link href="xhtml/chapter01.xhtml" media-type="application/xhtml+xml" />
    <link href="images/c01-img01.jpg" media-type="image/jpeg" />
    <link href="images/c01-img02.jpg" media-type="image/jpeg" />
  </collection>
  …
</collection>
Note that the publication’s navigation document is included as-is; you do not modify it yourself. The reading system (or vendor) is the one who processes its links, which means you have no control over whether they are removed and/or a dummy page inserted (or what the text of that dummy page says). I’m not sure if the lack of author control is a deficiency in the spec or not, but it sort of feels like one. I might have to raise that when work resumes.
The manifest probably sounds like a pain in the you-know-what to create, because it appears at first glance to just duplicate a part of the package manifest, but the fact that it’s only a part is exactly the reason for its existence. If you want to generate a standalone publication from the embedded information, or just want to extract the preview from the larger publication, you need to know all the resources required in the rendering. The package manifest is no help, as it lists everything. To extract only the necessary files, without the burden of processing each content document to find out what it references, someone has to create the list. And that someone is going to be you if you’re a content creator.
The next step is to define a spine, or reading order, for the preview content documents, which is also done using link elements. These links are direct children of the preview collection and follow the manifest collection.
Although the previous manifest contained six resources, only two are actual content documents that are to be rendered: the preface and first chapter. Here then is the full embedded preview, placing these two documents in the preview spine:
<package …>
  …
  <collection role="preview">
    <collection role="manifest">
      <link href="css/epub.css" media-type="text/css" />
      <link href="xhtml/nav.xhtml" media-type="application/xhtml+xml" />
      <link href="xhtml/preface.xhtml" media-type="application/xhtml+xml" />
      <link href="xhtml/chapter01.xhtml" media-type="application/xhtml+xml" />
      <link href="images/c01-img01.jpg" media-type="image/jpeg" />
      <link href="images/c01-img02.jpg" media-type="image/jpeg" />
    </collection>
    <link href="xhtml/preface.xhtml" />
    <link href="xhtml/chapter01.xhtml" />
  </collection>
</package>
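To show where this information ends up, here is a sketch of the package document a vendor’s tooling might generate when turning that embedded preview into a standalone preview publication. Everything not present in the examples above is an assumption made for illustration: the ids, the placeholder UUID, the title, the language, and the modification date are hypothetical, and the image file names are assumed to end in .jpg to match their image/jpeg media type.

```xml
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <!-- Placeholder identifier for the preview; deliberately not the parent's ISBN -->
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-4000-8000-000000000000</dc:identifier>
    <dc:title>Example Book (Preview)</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2014-01-01T00:00:00Z</meta>
    <!-- Preview metadata added by the generating program -->
    <dc:type>preview</dc:type>
    <dc:source>urn:isbn:9782367932095</dc:source>
    <link href="http://example.org/book/9781448103706" rel="acquire" type="text/html" />
  </metadata>
  <!-- One item per link in the preview's manifest collection -->
  <manifest>
    <item id="css" href="css/epub.css" media-type="text/css" />
    <item id="nav" href="xhtml/nav.xhtml" media-type="application/xhtml+xml" properties="nav" />
    <item id="preface" href="xhtml/preface.xhtml" media-type="application/xhtml+xml" />
    <item id="chapter01" href="xhtml/chapter01.xhtml" media-type="application/xhtml+xml" />
    <item id="c01-img01" href="images/c01-img01.jpg" media-type="image/jpeg" />
    <item id="c01-img02" href="images/c01-img02.jpg" media-type="image/jpeg" />
  </manifest>
  <!-- One itemref per link in the preview's reading order -->
  <spine>
    <itemref idref="preface" />
    <itemref idref="chapter01" />
  </spine>
</package>
```

The mapping is mechanical, which is the point of the embedded manifest: nothing in the content documents has to be parsed to work out which files belong in the container.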
I’ll briefly note here that it’s not a requirement to give access to an entire content document. If you specify a fragment identifier on the link, that indicates that the reader is only allowed to access the content up to, but not including, that point.
For example, if we only wanted to allow access to the first few paragraphs of each chapter, we could add the id “preview-end” to the paragraph where access ends and then include links like these:
<collection role="preview">
  <collection role="manifest">
    …
  </collection>
  <link href="xhtml/chapter01.xhtml#preview-end" />
  <link href="xhtml/chapter02.xhtml#preview-end" />
  …
</collection>
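On the content side, the matching markup in chapter01.xhtml could be as simple as the following sketch. The paragraph text is invented; only the “preview-end” id comes from the example above.

```xml
<section xmlns="http://www.w3.org/1999/xhtml">
  <h1>Chapter 1</h1>
  <!-- Readable in the preview -->
  <p>First paragraph of the chapter.</p>
  <p>Second paragraph of the chapter.</p>
  <!-- Access ends here: this paragraph and everything after it are excluded -->
  <p id="preview-end">Third paragraph of the chapter.</p>
  <p>Fourth paragraph of the chapter.</p>
</section>
```

Because access runs up to but not including the identified element, the id goes on the first paragraph you want to withhold, not the last one you want to show.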
One bit I haven’t shown is that you can also include a metadata element in the preview collection, but as no metadata is required at the preview level, there’s no spec-related reason to do so. Detailing completely optional elements falls into my optional list of things to do. Metadata is obtained directly from the package metadata section, since it’s generally not selective to specific documents, which is why the acquisition link stays there.
But to wrap this section up, I’ll quickly return to my promise to explain why creating an embedded preview is not so bad as it might seem. If you think about creating the collection manifest and spine equivalents, is it really any more complex than taking a publication and stripping it down? You can copy and paste manifest and spine items from the publication and just tweak them to be links.
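A quick before-and-after sketch of that tweak, using the first chapter as the example (the id “ch01” is hypothetical):

```xml
<!-- Package manifest and spine entries, as they already exist in the publication -->
<item id="ch01" href="xhtml/chapter01.xhtml" media-type="application/xhtml+xml" />
<itemref idref="ch01" />

<!-- The same document expressed as preview links -->
<!-- In the manifest collection: rename item to link and drop the id -->
<link href="xhtml/chapter01.xhtml" media-type="application/xhtml+xml" />
<!-- In the preview reading order: replace the idref with the item's href -->
<link href="xhtml/chapter01.xhtml" />
```

The only step that is not a straight rename is the spine entry, where the idref has to be resolved back to the path of the item it points to.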
You also don’t have to worry about all the manual work to make the standalone publication valid, like removing manifest and spine items and tweaking the table of contents and links. With an embedded preview, it’s the vendor or reading system that has to do all that work. Granted, you lose control over the final product, but that may not be a pressing concern so much as getting the content you want to the reader.
Choosing a Preview Type
To be honest, it’s probably not worth answering this question at this time, but what the hey. You’re not going to be able to make previews as outlined above until both the 3.0.1 and previews specs are finalized anyway, as epubcheck errors are going to get in your way.
That said, if I were to predict a future for previews, I’d expect that content creators will be asked to provide embedded previews with their publications, but vendors will generate standalone previews for distribution to reading systems, sending the full publication only after the content has been purchased. I can’t really see embedded previews living outside a vendor ecosystem, at any rate, for reasons including:
- Bandwidth: Do you really want readers downloading the full publication if potentially only a small percentage of those who sample the ebook will buy it? In the case of a simple headings-and-text novel with no embedded fonts or other resources that would bloat the container size, this consideration is probably moot. If you have a rich multimedia publication, with all the content embedded in the container, it’s not a decision to take lightly.
- DRM: Although it’s not stated that embedded previews can only exist within a restricted EPUB, it doesn’t make a lot of sense to include a preview in unlocked content. You might not be the one applying the DRM (the vendor would likely be the one doing that), but an open EPUB is an open EPUB to reading systems. How the reading system determines when to unlock the content is left to vendors to best determine within their ecosystems, so it’s unlikely that DRM’ed ebooks will be floating around waiting for some random vendor to unlock them.
Preview publications are more likely to fill the void for content creators who want to get their publications out to the reading public independent of a vendor ebookstore. You can make such things right now, of course, but links to buy the content have to be embedded in the content. The preview spec’s addition of acquisition links will provide greater flexibility, especially since standardizing them together with the preview identifier will allow reading systems to recognize previews and present the purchase options to the reader.
But I see that this quick post has turned into a weighty tome, so I’ll cut myself off at this point. The last thing I’ll say is that as the specification is still only a working draft, there’s plenty of time to make comments and requests if you find it lacking. Feedback is best directed at the Google issue tracker, as always.