April 5, 2012
There’s a memorable scene in the "Wall Street" sequel where Shia LaBeouf’s character asks the head of a trading firm, “What’s your number … the amount you would need to just walk away from it and live?”
The trader’s response: “More.”
And as the film advances the notion that "more" eventually becomes "too much," it reminded me of the claim that too much website testing is actually a bad thing.
But is it?
One school of thought holds that when two or more separate tests run concurrently on the same page (or when two visitors are exposed to the same experiment on Page A but to different experiments on Page B), the results of each test are invalidated. "You have," as it were, "polluted the experiment with outside influences."
In fact, both tests remain equally valid, and for reasons that have everything to do with how website testing differs from traditional experiments, such as clinical trials or the examples in a statistics textbook.
Traditional Hypothesis Testing
In the traditional approach to hypothesis testing, marketers would understand the behavior of a given population in advance of a test, as well as the intended impact of a treatment, and use those understandings to structure a fixed-length test with a single evaluation at its conclusion.
For example, a typical clinical trial might involve hundreds of health and socioeconomic questions designed to screen out confounding differences among participants.
In practice, this means knowing the expected value of some metric for a segment of traffic and declaring that the test will look for at least a certain minimum change in that metric. Those two figures yield a required sample size for the test, which would run until that sample size was collected; only then would a determination be made as to the validity of the hypothesis.
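To make the fixed-length design concrete, here is a minimal sketch of the standard sample-size calculation for comparing two conversion rates, using the normal approximation. The baseline rate and minimum detectable change are hypothetical numbers chosen for illustration, not figures from the article.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant to detect an absolute
    lift of `mde` over a baseline conversion rate, at significance
    level `alpha` (two-sided) with the given statistical power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p1, p2 = baseline, baseline + mde
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / mde ** 2)

# Hypothetical example: 5% baseline conversion, detect a 1-point lift
n = sample_size_per_variant(0.05, 0.01)
```

Note how the answer depends entirely on knowing the baseline and the effect you expect in advance, which is precisely the information marketers often lack.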
However, marketers often lack sufficient information about the behavior of the given population to design a fixed-length experiment, particularly given the wide differences in variance for certain metrics between population segments and over time.
In other words, website testing doesn’t take place in a vacuum. Website testing occurs in real time and cannot duplicate the controlled settings of a laboratory experiment.
Website Testing for the Real World
People are different.
Although visitors in the same experiment may both live in Chicago and have come to your site via AdWords, there are still many differences between them. They earn different salaries and enjoy different musical tastes. One recently updated his Facebook status to “In a Relationship,” while the other is happily married.
For all the control marketers want to have in website testing, there are still many factors that we neither know about nor control (and which wield greater influence on the likelihood to convert than any of the elements being tested). For this reason, we need a large sample size in order to declare statistical significance and conclude that a test's outcome is not due to random chance.
And herein lies the reason why overlapping tests are still equally valid: the net effect is the same as introducing one more difference between the visitors in your sample group. You'll simply need a slightly larger sample size before declaring significance.
Keys to Successful Website Testing
Concurrent tests must be “independent” in two important ways:
1) The experiment/control decision for each test must be independent of the experiment/control decision for every other test. In other words, whether an individual lands in the experiment or control of one test must have no bearing on their assignment in any other test.
The testing tool must be able to make a separate, random assignment for each visitor in each test.
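One common way a testing tool can achieve this kind of independent assignment is to hash the visitor ID together with the test ID, so each test's split is deterministic for a given visitor yet uncorrelated across tests. This is a generic sketch of the technique, not the implementation of any particular tool; the function and IDs are hypothetical.

```python
import hashlib

def assign(visitor_id: str, test_id: str) -> str:
    """Deterministically assign a visitor to 'control' or 'experiment'
    for a given test. Because the hash input includes the test ID,
    the same visitor's assignment in one test is independent of
    their assignment in every other test."""
    digest = hashlib.sha256(f"{visitor_id}:{test_id}".encode()).digest()
    return "experiment" if digest[0] % 2 else "control"

# The same visitor gets a stable arm within a test, but may land in
# different arms across tests:
arm_a = assign("visitor-123", "hero-image-test")
arm_b = assign("visitor-123", "checkout-copy-test")
```

Determinism matters here: a returning visitor sees the same variation on every visit, while the per-test hashing keeps the two experiments' populations statistically independent.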
2) The changes being evaluated by the concurrent tests must also be independent. For example, two concurrent tests could not both change the hero shot on a page, because the page can only have one hero shot at a time.
Author’s Note: As I wrote in Debunking the Myth of the One Perfect Page, the origin of the “no overlapping tests” idea actually stems from a technological constraint—not sound mathematics.
Born of that constraint was the idea that running large numbers of simultaneous tests was simply wrong or invalid. The truth? Test often and test always. With a few simple rules, you'll reach faster, and equally valid, conclusions about what does, and does not, work for your website visitors.