I have another long-standing interest in text-to-speech rendering from my time at CNIB, where the two main outputs we were generating were XML for braille full-text production and synthetically voiced DAISY 2.02 back matter components.
The reason we were TTS’ing back matter was that spending time reading indexes and bibliographies is an enormous waste of human resources: it’s a drag on getting books out to readers and would result in a precipitous drop in total output.
Very few people ever read the back matter, too, at least in general circulating libraries like ours. TTS meant that we didn’t have to deny readers information that otherwise would have been omitted.
But to the point of this post: when I first saw the enhancements in EPUB 3 to improve text-to-speech playback, and with them a means of distributing high-quality text for rendering on the client side, I had stars in my eyes. Here was a way to bring high-quality voicing without huge audio downloads. But two-plus years on, how close are we to realizing the potential?