Although support for text and audio synchronization in EPUB reading systems still isn’t complete or perfect, we’ve been learning more about the optimal ways to generate the content.
I keep having conversations about overlays and then keep forgetting to write down what the issues and resolutions are, so I decided to make that the topic for the day.
This post will probably be a bit of a hodgepodge of ideas, so you’ll just have to bear with me.
Reflowable and Fixed Layout
Content dynamically paginated by the reading system… author-defined fixed layout pages… who cares when we’re just talking about synchronizing prerecorded audio with the text, right?
Wrong. Your choice of content flow is actually more significant than you might first think, at least at this stage.
For one, if you choose to do a plain old reflowing publication, your media overlays aren’t going to work in iBooks. It only handles media overlays when they’re associated with fixed layout documents, so your users won’t have access to the audio. Maybe that’s not a concern for you, but it is an impediment to getting accessible books in the mainstream.
That said, there is a sneaky advantage in not dealing with reflowable documents. Not that I expect Apple had designs on sneakiness, but it makes an easy segue into the next problem: dynamic page turns.
It’s easy and predictable for a reading system to keep the current page synchronized with the text/audio playback when every document represents a page. That’s fixed layouts in a nutshell. When all the synchronization points have been played, it means the current page is finished and it’s time to move on to the next.
By not creating/supporting reflowable text, you never have to think about the problem of what happens when the page boundaries are fluid. If your content is synchronized to the paragraph, for example, what happens when the first half of the paragraph is at the bottom of one page and the latter half at the top of the next, particularly when that next page is not visible?
Naturally enough, anyone following the text with the audio will be stuck listening as the narration continues on to the unseen content, as the page turn won’t occur until the next paragraph is reached.
The problem doesn’t go away if you synchronize to the sentence, either; it’s just mitigated somewhat by the shorter length of a sentence. Only word-level synchronization avoids this problem entirely (which is why you don’t find it with text-to-speech playback).
You can’t blame the reading systems for this potential oddity of media overlays, either. While it might look like a reading system bug when playing, pouring resources into heuristic tests to try and guess the right point to flip the page is unrealistic (e.g., average guess of seconds per word times the number of words showing and flip the page when the resulting time has elapsed).
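To make that parenthetical concrete, here is a minimal Python sketch of the seconds-per-word guess. The function names and the sample numbers are made up for illustration, and it assumes narration pace is uniform across the paragraph, which is exactly the assumption that makes the heuristic unreliable in practice.

```python
def average_seconds_per_word(clip_duration: float, total_words: int) -> float:
    """Estimate narration pace from one synchronized clip (hypothetical helper)."""
    if total_words <= 0 or clip_duration <= 0:
        raise ValueError("clip_duration and total_words must be positive")
    return clip_duration / total_words


def estimated_flip_delay(visible_words: int, seconds_per_word: float) -> float:
    """Guess how many seconds to wait before turning the page.

    visible_words is the number of words of the split paragraph that are
    still showing on the current page. The result is only a guess: real
    narration speeds up, slows down, and pauses.
    """
    if visible_words < 0 or seconds_per_word <= 0:
        raise ValueError("visible_words must be >= 0 and seconds_per_word > 0")
    return visible_words * seconds_per_word


# A 120-word paragraph narrated in 60 seconds, with 45 words on this page.
pace = average_seconds_per_word(60.0, 120)
delay = estimated_flip_delay(45, pace)
```

Even in this tidy case the reading system would also need to know how many words are actually showing, which changes with every font size and viewport adjustment.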
It really shouldn’t be incumbent on reading systems to make up for a content design decision, but it will affect your decision making if you want to produce full text and audio synchronized books. The more granular you can get the better, but as I outlined in the book it can come at a cost.
To finish the earlier thought, there’s simply no win if you want full text and full audio synchronization and playback for a reflowable book in iBooks, at least at this time. If you’re only looking at structured audio (see the next section), you might want to consider putting the headings into fixed layout documents. But a hack is always a hack…
The first thing to note is that EPUB 3 does not have an equivalent of DAISY NCX-only audio files. Audio has to be synchronized to text in EPUB, and media overlays are attached to content documents, so there must be at least one content document in the spine (no referencing SMIL files in the spine, like DAISY 3 did).
What that means is that you have to include the publication headings, and synchronize to them to create the most minimal of EPUB audio books.
The first thought you might have at this point is to use the EPUB navigation document as your sole spine entry, and synchronize the audio to the entries.
A conceptual problem with this approach, however, is that the navigation document is a list of links with “table of contents” as its heading. Since the user will have to enter the first document before activating overlays playback, discovering the table of contents and realizing that playback is synchronized to its entries is not exactly intuitive.
If you drop in and out of playback to move around, it’s also going to be confusing to find yourself in a list instead of being able to move by heading shortcuts, but I’m not sure if that’s a real concern or not. The key point is more that navigating lists and navigating headings aren’t the same thing.
It’s consequently recommended that you include proper HTML headings in the content documents and synchronize to them.
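As a rough illustration of what heading-only synchronization could look like, the following Python sketch builds a minimal media overlay with one par per heading. The file names, fragment ids and clip times are hypothetical, and the helper is my own invention rather than part of any toolkit; it only relies on the standard library’s ElementTree.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"


def build_heading_overlay(content_doc, audio_file, headings):
    """Return a minimal overlay document as a string.

    headings is a list of (fragment_id, clip_begin, clip_end) tuples,
    with times in seconds. All names here are illustrative assumptions.
    """
    ET.register_namespace("", SMIL_NS)  # serialize SMIL as the default namespace
    smil = ET.Element(f"{{{SMIL_NS}}}smil", {"version": "3.0"})
    body = ET.SubElement(smil, f"{{{SMIL_NS}}}body")
    for index, (fragment_id, begin, end) in enumerate(headings, start=1):
        par = ET.SubElement(body, f"{{{SMIL_NS}}}par", {"id": f"par{index}"})
        # The text element points at the heading's id in the content document.
        ET.SubElement(par, f"{{{SMIL_NS}}}text",
                      {"src": f"{content_doc}#{fragment_id}"})
        # The audio element identifies the clip that plays for that heading.
        ET.SubElement(par, f"{{{SMIL_NS}}}audio",
                      {"src": audio_file,
                       "clipBegin": f"{begin}s",
                       "clipEnd": f"{end}s"})
    return ET.tostring(smil, encoding="unicode")


overlay_xml = build_heading_overlay(
    "chapter1.xhtml",
    "audio/chapter1.mp3",
    [("ch1-title", 0.0, 3.5), ("ch1-sec1", 3.5, 6.0)],
)
```

The content document still has to carry real h1 to h6 elements with matching ids, and the overlay has to be referenced from the package manifest; neither step is shown here.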
File Size Optimization
If you’re used to DAISY production, including the entire content of a publication in a single file might not seem out of the ordinary, but you really don’t want to perpetuate this habit into EPUB 3.
Reading systems don’t generally handle loading an entire book in one go, particularly ones running on low-power devices. You can overwhelm the reading system memory, which is why EPUB content is broken up by major section and assembled as a single work using the spine.
The other problem with having all the content in one file is that you’ll have to generate one massive media overlay file, since only one media overlay can be attached to any given content document. That’s another performance hit in itself, especially when added on top of the memory necessary for rendering the content.
Follow the content chunking recommendations for EPUB when creating media overlays. Break your publication up by chapter, section or whatever major structural feature makes sense. By extension, your media overlays will be broken up into similarly smaller chunks.
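If your production tool spits out one long list of synchronization points, splitting it by content document is straightforward. This is only a sketch under my own assumptions: the tuple layout, the function names and the convention of naming each overlay after its content document are all hypothetical.

```python
from collections import defaultdict


def group_sync_points_by_document(sync_points):
    """Split a flat list of sync points into one list per content document.

    Each sync point is (text_src, audio_src, clip_begin, clip_end), where
    text_src looks like 'chapter1.xhtml#p1'. This layout is an assumption.
    """
    overlays = defaultdict(list)
    for text_src, audio_src, clip_begin, clip_end in sync_points:
        document = text_src.split("#", 1)[0]  # drop the fragment identifier
        overlays[document].append((text_src, audio_src, clip_begin, clip_end))
    return dict(overlays)


def overlay_filename(document):
    """Name the overlay after its content document (a made-up convention)."""
    stem = document.rsplit(".", 1)[0]
    return f"{stem}.smil"


points = [
    ("chapter1.xhtml#h1", "audio/ch1.mp3", 0.0, 2.0),
    ("chapter1.xhtml#p1", "audio/ch1.mp3", 2.0, 9.5),
    ("chapter2.xhtml#h1", "audio/ch2.mp3", 0.0, 1.8),
]
grouped = group_sync_points_by_document(points)
names = [overlay_filename(doc) for doc in grouped]
```

Each group then becomes its own small overlay file, attached to its own content document, instead of one massive file for the whole book.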
That’s all I wanted to put down for now, so I’ll keep this post shorter than some of the recent ones and cut off here. If you have any feedback to offer on your own attempts at accessible EPUBs using overlays, particularly coming from DAISY production, feel free to drop it in the comments.