Tag Clouds: The Final Word

Regular readers of Webtorque will know that I’ve droned on about tag clouds several times. Here I go again, but this time, it’s final. I promise. It comes out of a brief discussion at work this week about what we each thought of tag clouds, which was a good opportunity to summarise my own view of them – and over a nice cheese sandwich, as it happened.

Tag clouds do one very specific task very well, but they are also hideously misused, to the point of utter meaninglessness, in a great many contexts. While I don’t think there was any researched intention behind their first use as we know them today, it turns out they are extremely good at giving a semantic summary of a large body of text. As such they offer a level of abstraction above the traditional synopsis, and this can be valuable in the right context.

Perhaps the canonical example of a useful tag cloud is the one for comparing State of the Union addresses, as demonstrated by IBM’s Visual Communication Lab research into data visualisation types. As with all good infoviz devices, the cognitive cost of the cloud is very low (accessibility and dyslexia issues aside), since here there is no confusion about where the words are from. It’s also very effective: ask what the predominant themes of each address were and you’d get an answer within about a second. That compares with about 10 seconds for a 200-word synopsis, and perhaps 20 minutes to read the whole thing. If you needed to choose which speech to use as an example of a tub-thumper out of 50 addresses, you could do a lot worse than to use tag cloud summaries to winnow down some candidates very quickly.

So, bearing in mind the “instant” nature of the value of tag clouds, we could formulate the characteristics of a good one as follows:

  • The user must not have to think about how it’s constructed (ie where the words are coming from).
  • It must operate in a context where summarising a lot of data is of high value.
  • The meaning of the relative sizes/colours/position of the words must be self-evident (frequency of occurrence being the safest bet).
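
For the curious, the “frequency of occurrence” rule is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not how any particular site or tool builds its clouds: the function name `tag_cloud`, the tiny stop-word list and the 12 to 36 point size range are all my own assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept very short for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "we", "our"}


def tag_cloud(text, top_n=10, min_size=12, max_size=36):
    """Return (word, count, font_size) tuples for the top_n most frequent words."""
    # Lower-case the text, split it into words and drop the stop words.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    counts = Counter(words).most_common(top_n)
    if not counts:
        return []

    highest = counts[0][1]
    lowest = counts[-1][1]
    spread = highest - lowest

    cloud = []
    for word, count in counts:
        # Linear scaling: a word's size depends on its frequency and nothing else,
        # so the reader never has to guess what "bigger" means.
        scale = (count - lowest) / spread if spread else 1.0
        cloud.append((word, count, round(min_size + scale * (max_size - min_size))))
    return cloud


if __name__ == "__main__":
    for word, count, size in tag_cloud("jobs jobs jobs economy economy war"):
        print(f"{word}: seen {count} times, shown at {size}pt")
```

The point of the linear scale is the third rule above: size is tied to frequency and nothing else, so there is only one thing the reader could take “bigger” to mean.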

One of the best real-world examples of the above that I’ve found was on a (sadly defunct) film review site. Each film had been reviewed and tagged by users, with those tags represented as a small cloud against each film entry. So, as you scrolled down looking at each film, words like “romantic”, “action”, “fun” and “shlock” jumped out at you, allowing a very quick decision about whether to investigate the film further (eg by reading the reviews or the plot summary).

Examples of arse-grindingly bad tag clouds are unfortunately far in the majority. Certainly, anything designed to be used as navigation is doomed, as is anything in a context of search. There is barely any reason for anyone to care that we have more “boutique” hotels than we have “historic” ones. Worse, why would we want to make people think they should click on “New York” rather than “Budapest” simply because more people book rooms in the former? The latter objection is also one I have about pushing “popularity” in an ecommerce context: it just doesn’t make any sense to tell people they should be buying a washing machine and not the laptop they in fact want.*

That tag clouds seem to get pressed into such inappropriate service may be down to their early (and fairly good) use on blogs, where the cloud represented the content of the blog for newcomers who otherwise would have no idea (the Observer Magazine blog – designed by the saintly Ben Hammersley – called the cloud the “zeitgeist” for this reason). They seemed to transmogrify into navigation devices after that, and then it all went to hell.

* It was later pointed out to me that in an ecommerce context, this might be a bit over the top. Marketeers in fact are always selling (or trying to sell) things to people that they don’t want. So in a purely marketing sense at least, using “popularity” is legitimate. Still sucks though.