If you don’t follow IDPF specification development closely — I can forgive you if you don’t, though I can’t imagine why not — a new baby is on its way: indexes. The first working draft is now available on the IDPF site, and, all going well, it will become a recommendation probably around the same time as the 3.0.1 revision specifications.
I’m not going to write about the indexes spec today, though; well, at least I’ll be trying not to. More on my mind is the general misunderstanding of indexing in a digital world, one where engineers equate an art with a simple science. Indexing is not just keyword listing, after all, despite that being a pervasive belief in the ebook world. Re-read that last sentence a few times before going on, if you need to. It’s important to get.
I don’t know where the idea that keyword matching could stand in for a good index came from, or why people invest effort in the belief that keyword-matching algorithms are going to make for fantastic automated indexes, but such are the oddities of life. I guess the approach makes sense if you don’t think too deeply about an index, and assume it’s nothing more than a list of locations where keywords are used. (I guess.)
But I suppose you have to consider the expectations of the people automating indexes, and their exposure to indexes. My experience has been that people in the sciences tend to see an index as boiling down to keyword matching, because the prose is very literal. If a function or construct is mentioned in a computer science book, it is mentioned by name. If it is mentioned, that spot is probably a reasonably accurate point of reference. And so keyword matching becomes indexing.
But that logic only holds, if at all, for a very limited range of works. The humanities are a completely different beast. Keyword matching often leads you along a path of frustration, as the concepts and ideas being conveyed may make minimal or no use of the term they’re classified under. Sometimes one term might be standing in for another, but unless you know that, you’re reading the entire index to figure out what you might have missed.
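To make the gap concrete, here is a rough sketch of what I mean. Everything in it is made up for illustration: the three passages, the `keyword_index` and `lookup` helpers, and the tiny hand-built index are all my own assumptions, not anything from the spec or from a real indexing tool. It only shows how a literal word match can find the one place a term appears and still miss the passage that is actually about it.

```python
import re
from collections import defaultdict

# Hypothetical passages, invented purely for illustration.
passages = {
    1: "The function malloc reserves a block of memory on the heap.",
    2: "She walked the empty house for weeks, setting two places at the table out of habit.",
    3: "Grief, the critics agree, is the novel's central theme.",
}


def keyword_index(passages):
    """Naive 'index': map each lowercased word to the passages containing it."""
    index = defaultdict(set)
    for number, text in passages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(number)
    return {word: sorted(numbers) for word, numbers in index.items()}


auto_index = keyword_index(passages)

# What a human indexer might record instead (again, an assumed example):
# passage 2 is about grief even though the word never appears in it,
# and a reader looking under "mourning" is pointed to "grief".
curated_index = {"grief": [2, 3]}
cross_references = {"mourning": "grief"}


def lookup(term):
    """Resolve a term through the cross-references, then the curated index."""
    term = cross_references.get(term, term)
    return curated_index.get(term, [])


print(auto_index.get("malloc"))    # literal prose: keyword matching does fine
print(auto_index.get("grief"))     # finds only the passage that names the term
print(auto_index.get("mourning"))  # no literal match at all
print(lookup("mourning"))          # the curated index follows the cross-reference
```

The literal lookup for the technical term works, which is exactly why the approach feels sufficient to people who mostly read that kind of prose. The other lookups are where it falls down, and no amount of tuning the word matcher fixes it, because the knowledge that passage 2 belongs under that heading came from someone reading the passage.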
Indexing in these works is always more art than science when done well. It requires someone sitting down with the work who can ferret out and map the concepts the author is expressing, weeding out the important from the useless. It requires someone who can see the underlying correlations rather than just adding a glorified text search for the reader.
Fortunately, the new EPUB indexes spec will help bring indexing back to the indexers, the true masters of the art. Indexes should be a tool for reading system developers to exploit, to open up discovery and navigation of the content in a whole new way, made by the pros, not a hack pieced together from bits and bytes.
But I’ll stop before I get too far into rambling and hyping the spec. I just wanted to make note that it’s nice to see an overlooked part of the rapid conversion to digital finally getting its due…
UPDATE: The working draft was posted this morning, so I’ve updated the first para to include a link.