Since I am the one who wrote the case study, perhaps I can explain. Loosely speaking, what the article actually means is that the observed difference of 28% or more holds in 95% of cases; there is a 5% chance that the true difference is less than 28%.
If you have any specific question, please feel free to ask.
What are you using to arrive at that conclusion, though? When I do A/B testing, I (or rather, the software I use) use the chi-squared test to give a confidence value expressed as a percentage. When I get over 95% I know I have statistical significance, and I end the test. At that point I also have an "improved by" number, but as far as I understand, I would have to collect much more data for that number to be statistically significant.
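For anyone curious what that software is doing under the hood, here is a minimal from-scratch sketch of a chi-squared test on a 2x2 table (I don't know which tool the parent uses; the visitor and conversion counts below are made-up numbers for illustration):

```python
import math

def chi2_2x2(a_conv, a_total, b_conv, b_total):
    """Pearson chi-squared statistic and two-sided p-value for a 2x2 table."""
    table = [[a_conv, a_total - a_conv],
             [b_conv, b_total - b_conv]]
    row = [sum(r) for r in table]
    col = [table[0][c] + table[1][c] for c in range(2)]
    n = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-squared survival function
    # has a closed form via the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical test: 120/1000 conversions on A vs 160/1000 on B.
chi2, p = chi2_2x2(120, 1000, 160, 1000)
confidence = (1 - p) * 100
print(f"chi2 = {chi2:.3f}, p = {p:.4f}, confidence = {confidence:.1f}%")
```

The "confidence" percentage these tools report is just 1 minus the p-value, so "over 95% confidence" is the same thing as p < 0.05.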
I'm not an expert in statistics, but I'm careful about what I do, and I know the limitations of my knowledge.
However ...
I believe it is flawed methodology to run a test until you get a significant result. I'm pretty sure I read something linked to from HN that discussed this at some length. If you simply run your test until you reach significance and then stop, it's possible that you will hit significance because of random fluctuations in the middle of your trial and stop early when you shouldn't.
As I recall, you should decide on the length of your trial at the beginning, then run your stats at the end.
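You can see the effect in a quick simulation (all the parameters here are made up for illustration). Both variants have the same true conversion rate, so any "significant" result is a false positive; checking after every batch and stopping at the first significant reading triggers far more often than a single test at the planned end:

```python
import math
import random

random.seed(0)

def z_p_value(a_conv, a_n, b_conv, b_n):
    """Two-sided p-value for a difference of two proportions (normal approx.)."""
    p_pool = (a_conv + b_conv) / (a_n + b_n)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / a_n + 1 / b_n))
    if se == 0:
        return 1.0
    z = (a_conv / a_n - b_conv / b_n) / se
    return math.erfc(abs(z) / math.sqrt(2))

RATE = 0.1     # identical true conversion rate for both variants
BATCH = 200    # visitors per variant between checks
CHECKS = 20    # how many times we peek at the running test
TRIALS = 500   # simulated A/B tests

peeking_fp = fixed_fp = 0
for _ in range(TRIALS):
    a = b = a_n = b_n = 0
    stopped = False
    for _ in range(CHECKS):
        a += sum(random.random() < RATE for _ in range(BATCH))
        b += sum(random.random() < RATE for _ in range(BATCH))
        a_n += BATCH
        b_n += BATCH
        if not stopped and z_p_value(a, a_n, b, b_n) < 0.05:
            peeking_fp += 1   # stopped early on a random fluctuation
            stopped = True
    if z_p_value(a, a_n, b, b_n) < 0.05:
        fixed_fp += 1         # single test at the pre-planned end

print(f"fixed-horizon false positive rate:       {fixed_fp / TRIALS:.1%}")
print(f"stop-at-significance false positive rate: {peeking_fp / TRIALS:.1%}")
```

The fixed-horizon rate comes out near the nominal 5%, while the stop-at-first-significance procedure fires several times more often, which is exactly the problem with deciding the trial length as you go.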
I'll try to find the article in question, but my Google-Fu is pretty poor today for some reason.
HN won't let me reply to your answer below (presumably to stop me from flaming you :p). But thanks, I'll read that blog post later but it looks like it answers my questions plus a bit more.
Seems my understanding was even less than I thought :)