Kevin Kelly and Book Scanning

It being near the end of the year, I find myself in retrospective mode, so I’ve got an excuse not to be very topical in reviewing Scan This Book! by Kevin Kelly in the New York Times, written back in May this year. I’ve just finished reading it (it’s that long – doesn’t the NYT have editors?) and I can’t resist a pop.

Kelly says some interesting things about the future of digitised books. For example:

“Turning inked letters into electronic dots that can be read on a screen is simply the first essential step in creating this new library. The real magic will come in the second act, as each word in each book is cross-linked, clustered, cited, extracted, indexed, analyzed, annotated, remixed, reassembled and woven deeper into the culture than ever before.”

There are some similarities between what he says and my own little imaginings in my essay here on Webtorqe, written a couple of weeks before Kelly’s article. I came at them from a different angle, though. I don’t see his “universal library” as an old-world library with better indexing; I see it as the basis for new human societies based not on geographic location but on commonality of information and interest. The key to this is the arrival of true de-centralisation, the rise of collaborative and other types of information filtering, and the fact that humans have a limited capacity to process and act on so much input.

More nuttily perhaps, I also see information ubiquity as the raw material that will eventually produce the next stage of life on earth: that of intelligent machines. Humans will pass the baton of intelligence to machines by moving that intelligence into the digital realm. I’m with Vernor Vinge on this one. When that happens, the human era will be over, and we will be as significant to the future of ourselves and our planet as animals are to humans today. Of course, that’s hopefully a long way off yet, but we are inching towards it, and in creating systems that are designed to manage information (like RDF and semantic interfaces) we pave the way for it.

So much for rabid futurology. Despite the title of Kelly’s article, scanning actually represents a significant barrier on the road to his digitised, interwoven, cultural revolution. He doesn’t seem to be the only person to overlook a rather basic problem with scanning books (or rather, using OCR to scan them, which is, I assume, what he means): the errors.

When a scanner picks up the shape of a letter, and the software recognises that as an ASCII character code and records not the shape but the code, there is a lot of margin for error. An “eye” can look like an “ell”, a zero like an “oh”, etc. Sure, technology can improve this, but in order to get to Kelly’s utopia, are we happy to pull our entire literary heritage through an automated procedure in the hope that nothing significant is lost in doing so? If not, then the best hope we have is Project Gutenberg’s Distributed Proofreaders (currently doing about 130 books a month). Otherwise, future generations may wonder why the founding fathers wrote “We the people… in odour to form a more perfect onion…”
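To put a rough shape on that margin for error, here is a minimal Python sketch. It is an illustration only, not how any real OCR engine works: the confusion table, the "swap every second confusable character" rule and the function names are all my own assumptions, chosen to keep the example deterministic. It swaps look-alike characters, measures the resulting error rate, and estimates how many wrong characters a given accuracy would leave in a book-length text.

```python
# Hypothetical table of look-alike glyphs (an assumption for illustration,
# not taken from any real OCR engine).
CONFUSIONS = {"l": "I", "I": "l", "0": "O", "O": "0"}


def simulate_ocr(text, every=2):
    """Swap every `every`-th confusable character for its look-alike."""
    out = []
    seen = 0  # how many confusable characters we have met so far
    for ch in text:
        if ch in CONFUSIONS:
            seen += 1
            if seen % every == 0:
                out.append(CONFUSIONS[ch])
                continue
        out.append(ch)
    return "".join(out)


def char_error_rate(original, scanned):
    """Fraction of positions that differ (substitutions only)."""
    if len(original) != len(scanned):
        raise ValueError("texts must be the same length")
    if not original:
        return 0.0
    diffs = sum(a != b for a, b in zip(original, scanned))
    return diffs / len(original)


def expected_errors(num_chars, accuracy):
    """Expected number of wrong characters at a given per-character accuracy."""
    return round(num_chars * (1 - accuracy))


if __name__ == "__main__":
    source = "Illinois 1000"
    scanned = simulate_ocr(source)
    print(scanned)
    print(char_error_rate(source, scanned))
    # A 500,000-character book at an assumed 99.9% accuracy:
    print(expected_errors(500_000, 0.999))
```

Even at an assumed 99.9% per-character accuracy, a 500,000-character book would still carry around 500 wrong characters, which is why human proofreading of the kind Distributed Proofreaders does remains relevant.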

To the best of my knowledge, the recent Microsoft and Google partnerships with the British Library and the Bodleian aren’t OCR efforts – they’re simply making pictures of the books. Hardly conducive to Kelly’s vision.