August 22, 2014
In testing, there are generally two methods for determining winners: tests that look only in the positive direction, and tests that look in both the positive and negative directions. Statisticians refer to these as one-tailed tests and two-tailed tests, respectively.
Imagine you want to test the ‘Call to Action’ on your site and you have built a new variation. While testing, you’d use a one-tailed test to check your hypothesis that your new CTA placement (experiment) is better than the original (control), and you’d use a two-tailed test to determine which CTA placement works better (experiment or control). The difference between the two is subtle but important.
Because of the way a one-tailed test is set up, the positive results you see appear to be immediately gratifying. But they can be misleading.
For example, a one-tailed test might seem to tell you that the CTA placement you’re experimenting with does no harm to your conversion rate compared to your control. If you like the look of that placement better, you may choose to keep it. Because, no harm, right?
The problem, though, is that a one-tailed test can’t actually tell you whether the new CTA placement you’re experimenting with is causing any harm. It only measures in the positive direction, so a variation that performs worse than the control shows up as “not a winner” rather than as a loser.
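To make that concrete, here is a minimal sketch in Python. It assumes a pooled two-proportion z-test, which is one common way to compare conversion rates but not necessarily what your testing tool uses. The visitor and conversion counts are made up for illustration, and the function names are my own.

```python
import math

def two_proportion_z(conv_control, n_control, conv_experiment, n_experiment):
    """z statistic for experiment vs. control, using a pooled standard error."""
    p_control = conv_control / n_control
    p_experiment = conv_experiment / n_experiment
    pooled = (conv_control + conv_experiment) / (n_control + n_experiment)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_experiment))
    return (p_experiment - p_control) / se

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_one_tailed(z):
    # Only asks: "is the experiment better than the control?"
    return upper_tail(z)

def p_two_tailed(z):
    # Asks: "is the experiment different from the control, in either direction?"
    return 2 * upper_tail(abs(z))

# Hypothetical data: the new CTA placement converts WORSE than the control.
# Control: 500 conversions out of 10,000 visitors (5.0%).
# Experiment: 430 conversions out of 10,000 visitors (4.3%).
z = two_proportion_z(500, 10_000, 430, 10_000)

print(f"z = {z:.2f}")
print(f"one-tailed p = {p_one_tailed(z):.3f}")
print(f"two-tailed p = {p_two_tailed(z):.3f}")
```

With these made-up numbers, the one-tailed p-value comes out close to 1, which reads as "no significant improvement" and nothing more. The two-tailed p-value falls below the usual 0.05 threshold, flagging that the new placement is significantly different from the control, and the negative z tells you the difference is in the wrong direction.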
Which is why a one-tailed test, for the purposes of our analogy, is a one-night stand. It artificially inflates you. It’s also sort of like being a participant in an Oprah show, because...
To me, that’s a perplexing way to test.
One of the big motivations for conducting a website testing program is to figure out whether what you’re doing is having an impact on your business—be that positive or negative. It gives you insight into your customers. If you can’t find that out, why would you actually test?
That’s why two-tailed tests are important. Yes, you’ll have fewer winners and, yes, you’ll have “losers,” but that’s real life. Those failures are valuable lessons, and you can fix them. It takes a little more time, but your wins are real and you will reap the benefits. It’s a lot like a long-term relationship.
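Here is a rough illustration of why you get fewer winners. It again assumes a pooled two-proportion z-test and a 0.05 significance threshold, with made-up traffic numbers and a hypothetical helper called `p_values`. For a positive lift, the two-tailed p-value is twice the one-tailed one, so a borderline result can count as a winner under one test and not under the other.

```python
from math import sqrt
from statistics import NormalDist

def p_values(conv_control, n_control, conv_experiment, n_experiment):
    """Return (one_tailed_p, two_tailed_p) from a pooled two-proportion z-test."""
    pooled = (conv_control + conv_experiment) / (n_control + n_experiment)
    se = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_experiment))
    z = (conv_experiment / n_experiment - conv_control / n_control) / se
    normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    one_tailed_p = 1 - normal.cdf(z)             # experiment better than control
    two_tailed_p = 2 * (1 - normal.cdf(abs(z)))  # different in either direction
    return one_tailed_p, two_tailed_p

ALPHA = 0.05  # assumed significance threshold

# Hypothetical data: a modest lift from 5.00% to 5.55% conversion.
one_tailed, two_tailed = p_values(500, 10_000, 555, 10_000)

one_tailed_winner = one_tailed < ALPHA
two_tailed_winner = two_tailed < ALPHA

print(f"one-tailed p = {one_tailed:.3f}, winner: {one_tailed_winner}")
print(f"two-tailed p = {two_tailed:.3f}, winner: {two_tailed_winner}")
```

On this made-up data the one-tailed test declares a winner and the two-tailed test does not. The data are identical; only the question being asked has changed.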
If you’ve stuck with me this long, I’ll leave you with some insights from the Institute for Digital Research and Education at UCLA, which has a great article on the differences between one-tailed tests and two-tailed tests:
Choosing a one-tailed test for the sole purpose of attaining significance is not appropriate. Choosing a one-tailed test after running a two-tailed test that failed to reject the null hypothesis is not appropriate, no matter how "close" to significant the two-tailed test was. Using statistical tests inappropriately can lead to invalid results that are not replicable and highly questionable--a steep price to pay for a significance star in your results table!