Test-Driven Design: A Little Method Goes a Long Way

At hotels.com we’re pretty test-driven. We’re testing stuff all the time on the site with multivariate or A/B tests of various kinds. But as I always point out, doing tests (or indeed any kind of quantitative or qualitative research) is easy. It’s what you do with the results that counts.

So when I see a test proposal, I always ask myself “what if?” What if the result is X? What would that tell us? And what if it’s Y or Z? Could we use that information to design something even better? Might the result of that test give us a clue about what to test next? So in a sense I’m not really interested in the current test; I’m interested in what happens after, when the results are in.

Take an example. We observe a rival site showing messages about when a hotel was last booked, or how many rooms are left. We assume that they are doing this because it improves conversion. Some further research and thinking about why this might improve booking levels suggests that people may respond positively to a sense of “urgency”, comprising one or both of a) “Scarcity”: worrying that a hotel might sell out of rooms, and b) “Popularity”: reassurance that the hotel the customer is considering is the right choice because others think the same.

By designing the right tests, we may be able to arrive at something for our own site that better exploits the underlying reasons for the conversion uplift that the competition may be seeing. The trick is first to come up with a good hypothesis that can be tested and refined. For example, we might think up an initial hypothesis by looking at scarcity first, like this:

“Urgency messaging makes people more likely to book hotels.”

This is a good general hypothesis, but it offers no guidance on how to design an experiment. The hypothesis must be refined to give it some direction.

“People are more likely to book when they are told how many rooms are left in the hotel.”

This gives some directionality, but the hypothesis is not really testable. So the final stage is to phrase a hypothesis around which we can come up with some suitable designs to test.

“Some people are more likely to book when they perceive room levels as being low.”

This is now a testable hypothesis: we have established variables, and by showing different room counts to different types of customer, and controlling for other variables such as channel and seasonality, we can see what kinds of correlation exist between the number of bookings and the presence of scarcity messages.
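To make that concrete, here’s a minimal sketch in Python of the kind of significance check you might run on such a test: a two-proportion z-test comparing booking conversion with and without the scarcity message. The function name and all the numbers are entirely made up for illustration; this is not our actual tooling or data.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a, n_a: bookings and visitors in the control (no message).
    conv_b, n_b: bookings and visitors in the variant (scarcity message).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value

# Entirely hypothetical numbers, for illustration only.
uplift, z, p = two_proportion_ztest(conv_a=912, n_a=20000, conv_b=1028, n_b=20000)
print(f"uplift: {uplift:+.4%}  z: {z:.2f}  p: {p:.4f}")
```

A significant positive difference would support the hypothesis, but, as below, it still wouldn’t tell us why.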

Of course, this test won’t give us a complete answer as to why people might book more under these circumstances (a single test may be misleading, particularly as correlation is not always an indication of causation). It may even reveal that our hypothesis is not correct. In either case, we need to do further tests to see if we can optimise the design of the messaging so that it performs even better than it does today. For example, it would be worth knowing whether such messages work exceptionally well with certain customers but actually put off others (something we’ve seen in qualitative testing). Any uplift we see might come despite a minority choosing not to book when they would have done so had they not been exposed to the urgency messages (“1 room left!” might cause some people to leave for another site that has more rooms available, for instance).
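One way to look for that kind of split effect is to break the same analysis down by customer segment. Reusing the two_proportion_ztest sketch above (the segment names and numbers are again hypothetical), an aggregate uplift can hide a segment where the message is doing harm:

```python
# Hypothetical per-segment splits of the same test; the totals match the
# aggregate numbers above (912/20000 vs 1028/20000), but the story differs.
segments = {
    "leisure":  {"control": (640, 12000), "variant": (780, 12000)},
    "business": {"control": (272,  8000), "variant": (248,  8000)},
}

for name, cells in segments.items():
    (ca, na), (cb, nb) = cells["control"], cells["variant"]
    uplift, z, p = two_proportion_ztest(ca, na, cb, nb)
    print(f"{name:>8}: uplift {uplift:+.4%}  z: {z:+.2f}  p: {p:.4f}")
```

In this made-up example the overall result looks like a clear win, while the business segment is trending the other way. That is exactly the sort of signal that tells you what to test next.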

A footnote: it’s surprisingly hard to explain to people how to come up with testable hypotheses. This leads to a lot of A/B testing being wasted on poorly stated aims, or simply made useless by not getting us any closer to why something might have happened (“It made us more money” isn’t a very intelligent conclusion). Research scientists know this stuff. Perhaps I should hire one.