A Problem With Visualising Data

Data visualisation (“dataviz” or, more broadly, “infoviz”) appears to serve two main purposes. The first is to show data to people who are not analysts or experts, so that they can understand some or all of what has already been identified in that data. The assumption here is that raw tables, or even bunches of charts or diagrams, don’t easily reveal what’s going on. An example of this would be Tufte’s favourite graphic, which summarises a large number of what would otherwise be rather uninspiring figures about temperature, troop numbers and the positions of rivers on a route.

The second purpose is to help analysts and experts discover things in raw data that would be difficult to find by other means. An example of this (perhaps, because I’m not an expert in the domain) might be PrognoSim, which visualises the effect of medical interventions on patients.

However, it seems to me that the vast majority of demonstrations of data visualisation are in the context of showing something that has already been discovered by other means; that is, they serve the first purpose I set out above. That’s of course a valuable thing to do in many cases (well, unless you’re talking about the work of David McCandless). But isn’t the really exciting role of infoviz to do with helping us discover things in large datasets that would otherwise be difficult or impossible to uncover?

Other than in the field of business information dashboards, it seems next to impossible to find clear examples of experts using infoviz (other than standard charts and graphs) to discover things. Where are the case studies of using visualisations to uncover new, surprising and actionable information? Mischa Weiss-Lijn (who has a PhD in this stuff) showed me some research into this question, which mostly confirms my suspicion that infoviz is largely absent from the analysis of complex statistics (for boring old reasons of practicality, mostly).

When we look at the second mode of use, the utility of a visualisation is completely dependent on how easy it is to decode. Seth Godin says he hates Tufte’s favourite “March to Moscow” because he can’t understand it within 3 seconds (his blog post on this is now oddly un-googlable). Meanwhile, everyone loves David McCandless.

All of which makes me a little depressed. Does this mean that in 100 years’ time we’ll still be stuck with pie charts and line graphs?