Approaching Test-Driven Design

At Hotels.com we’ve been doing multi-variate testing (“MVT”, or sometimes “A/B testing” if you’re variant-challenged) for a while. This means we typically build a number of different designs, then let them duke it out on the live site to see which one performs the best.

Recently, however, I’ve been increasingly aware that while we have a very powerful tool in MVT, power is nothing without control. When you can test anything you want, things can soon get out of whack: so a bigger button didn’t move any needles; adding a link to a map raised conversion; a red background meant customers in France bailed out. Now what? What do these things mean for us and our work as designers?

What has been focusing my mind is how we should best respond to the results of MVT tests. How can we build on those results and progress towards even better ones? I’m also aware of two other issues that relate to our design activity in this regard: the “local maximum” problem, and how qualitative research fits in.

Some people have said some things about test-driven design and the effects of MVT overall, but I assume that because MVT isn’t that common, it’s not really an issue for most practitioners. Dell is a notable exception, but even if you’re not doing MVT, I think some of the following might be worth bearing in mind in UX generally.

As an aside, I am not trying to say that this is the best, or the only, way to approach test-driven design. I gave up believing too much in the portability of UX techniques long ago. What follows is what I think may work for my team in their current working environment.

It may completely bomb for you.

The key to everything in dealing with experimentation is having an agreed hypothesis before you do a test. You also need a stated design rationale that interprets that hypothesis for each design variant in that test. This is so that you can make a decision on what to do when confronted with the results. An effective hypothesis for a test will also be one that allows you to write further hypotheses to direct further designs to test (should you want to).

So how should we do that? To use an analogy, it’s like playing chess: try to anticipate what might happen a couple of moves ahead.

First, we need an initial hypothesis about something (AKA “the hypothesis going in”). For example, on an SEO landing page we might have a number of things we’d like to test, but perhaps we’re most curious about trust. So we could say “Customers will bounce unless they feel the landing page is sufficiently trustworthy to continue their journey.”

Now we create two new designs that we think support the hypothesis going in. We state that Design A “Promotes trust by showing our prices are the best in the market, so the customer doesn’t need to go elsewhere”. Meanwhile, Design B “Promotes trust by showing that we have a wide choice of hotels so they don’t need to go elsewhere.” It’s probably best to make these designs look and feel the same, but be as polarised as possible in their approach, so as to better isolate any causation we might find in the test.
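
To make that bookkeeping concrete, here’s a rough sketch in Python of how a hypothesis and its per-variant rationales might be written down. The names and structure are purely illustrative; this isn’t a real tool we use, just one way of recording the agreement.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    rationale: str  # how this particular design interprets the hypothesis

# The hypothesis going in, agreed before any design work starts.
hypothesis = ("Customers will bounce unless they feel the landing page is "
              "sufficiently trustworthy to continue their journey.")

variants = [
    Variant("Design A", "Promotes trust by showing our prices are the best in "
                        "the market, so the customer doesn't need to go elsewhere."),
    Variant("Design B", "Promotes trust by showing that we have a wide choice "
                        "of hotels, so they don't need to go elsewhere."),
]
```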

With a hypothesis for the project agreed, and with rationale for each design in place, we can then think ahead to how we can respond to the results. Each design will either do better than, worse than, or level-peg with the control. Ideally, we also need to have a design response for each of those scenarios, as well as what we will do if both designs either beat, meet or lose against the control. That’s quite a few possibilities, but in considering them we keep control over the design process and make sure we’re steering a course to somewhere – even if it’s the wrong place.
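
One way of keeping ourselves honest about having a response for each of those scenarios is simply to enumerate the combinations before the test runs and see which ones we haven’t planned for. A minimal sketch, with invented responses standing in for real next steps:

```python
from itertools import product

OUTCOMES = ("beats control", "level-pegs", "loses to control")

# Agreed before the test: what we do next for each combination of results.
# The responses here are invented examples, not real plans.
response_plan = {
    ("beats control", "loses to control"):
        "Iterate on Design A's rationale: push the lowest-price message harder.",
    ("loses to control", "beats control"):
        "Iterate on Design B's rationale: push breadth of choice harder.",
    ("loses to control", "loses to control"):
        "Revisit the trust hypothesis, or test more extreme interpretations of it.",
}

# Flag any scenario we haven't thought about yet.
for a_result, b_result in product(OUTCOMES, repeat=2):
    if (a_result, b_result) not in response_plan:
        print(f"No agreed response yet for A {a_result}, B {b_result}")
```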

So now you run the test and get an outcome. Let’s say Design A wins against both Design B and the control. We might then already have a plan (even an early design) to start iterating on its rationale. Recall that Design A “Promotes trust by showing our prices are the best in the market.” So if that’s shown to do best in the test, we might dial up the effect to see if we can get even more out of it. Show a picture of a competitor being stamped on, make our price-match guarantee claim form part of the page, etc. We can test increasingly strong interpretations of the rationale until the page contains nothing but reassurances about lowest pricing in the market. At some point, we will see a decline in the effect, at which point we know we’ve reached the “local maximum” of that approach. We have then successfully tested our way to the best result against that particular hypothesis.
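
That “dial it up until the effect declines” loop is simple enough to sketch. Here `measure_conversion` stands in for running a whole test round and reading back a conversion rate, which is obviously not a one-line call in reality; the function and the made-up numbers are illustrative only.

```python
def find_local_maximum(interpretations, measure_conversion):
    """Walk through increasingly strong interpretations of a winning rationale,
    stopping once the measured effect starts to decline, and return the best
    interpretation seen so far (the local maximum for this hypothesis)."""
    best, best_rate = None, float("-inf")
    for interpretation in interpretations:
        rate = measure_conversion(interpretation)
        if rate < best_rate:
            # The effect has started to fall off: we've passed the local maximum.
            break
        best, best_rate = interpretation, rate
    return best, best_rate

# Toy usage with made-up numbers: conversion peaks at the third interpretation.
rates = {"price badge": 0.031, "price banner": 0.034,
         "guarantee on page": 0.036, "whole page on price": 0.033}
print(find_local_maximum(list(rates), rates.get))  # ('guarantee on page', 0.036)
```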

We can then start again with a different one (e.g. “Customers are looking for destination information to inform their choice of hotel”) and rinse and repeat. We might then try combining designs by using two strong hypotheses together, and so on.

So now the design and testing process isn’t random, or driven by outside forces; it’s directed by an evolution of hypotheses and selective interpretation of design rationale.

It’s also important to note that each of the possible outcomes of a given test is just as valuable to the designer as any other. Let’s say that instead of Design A winning out, both designs did worse than the control. While that’s a harder call to make (do you say your hypothesis is invalid? Do you test again with more extreme interpretations? Try another interpretation?), it’s still just as good to have that result as any other. This is because it says something about the hypothesis.

Another good thing about working with hypotheses is that you can attach qualitative research results to them as well. This is useful because quantitative testing will give you plenty of “what” but no “why.” Bringing qualitative research to bear on the hypotheses can help you re-formulate and re-test them with quantitative testing. So qualitative testing enriches the above method.

Finally, despite what some people say about the local maximum problem, I think that in following this plan, we may well get insights that allow us to leap to completely different designs to test. This is because epiphanies rarely arrive without some form of stimulation, and the work of looking for profitable hypotheses may well provide that. Mind you, all this is currently theoretical, but we intend to start implementing as soon as we can.

Wish us luck.