Since I am the one who wrote the case study, perhaps I can explain. Loosely speaking, what the article actually means is that the observed difference of 28% or more holds in 95% of cases; there is a 5% chance that the true difference is less than 28%.
If you have any specific question, please feel free to ask.
What are you using to arrive at that conclusion, though? When I do A/B testing, I (or rather, the software I use) use the chi-squared test to give a confidence value expressed as a percentage. When I get over 95% I know I have statistical significance, and I end the test. At that point I also have an "improved by" number, but as far as I understand, I would have to collect much more data for that number to be statistically significant.
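For anyone curious what that software is doing under the hood, here is a minimal from-scratch sketch of a chi-squared test on a 2x2 table (I don't know which tool the parent uses; the visitor and conversion counts below are made-up numbers for illustration):

```python
import math

def chi2_2x2(a_conv, a_total, b_conv, b_total):
    """Pearson chi-squared statistic and two-sided p-value for a 2x2 table."""
    table = [[a_conv, a_total - a_conv],
             [b_conv, b_total - b_conv]]
    row = [sum(r) for r in table]
    col = [table[0][c] + table[1][c] for c in range(2)]
    n = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-squared survival function
    # has a closed form via the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical test: 120/1000 conversions on A vs 160/1000 on B.
chi2, p = chi2_2x2(120, 1000, 160, 1000)
confidence = (1 - p) * 100
print(f"chi2 = {chi2:.3f}, p = {p:.4f}, confidence = {confidence:.1f}%")
```

The "confidence" percentage these tools report is just 1 minus the p-value, so "over 95% confidence" is the same thing as p < 0.05.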
I'm not an expert in statistics, but I'm careful about what I do, and I know the limitations of my knowledge.
However ...
I believe it is flawed methodology to run a test until you get a significant result. I'm pretty sure I read something linked to from HN that discussed this at some length. If you simply run your test until you reach significance and then stop, it's possible that you will hit significance because of random fluctuations in the middle of your trial and stop early when you shouldn't.
As I recall, you should decide on the length of your trial at the beginning, then run your stats at the end.
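You can see the effect in a quick simulation (all the parameters here are made up for illustration). Both variants have the same true conversion rate, so any "significant" result is a false positive; checking after every batch and stopping at the first significant reading triggers far more often than a single test at the planned end:

```python
import math
import random

random.seed(0)

def z_p_value(a_conv, a_n, b_conv, b_n):
    """Two-sided p-value for a difference of two proportions (normal approx.)."""
    p_pool = (a_conv + b_conv) / (a_n + b_n)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / a_n + 1 / b_n))
    if se == 0:
        return 1.0
    z = (a_conv / a_n - b_conv / b_n) / se
    return math.erfc(abs(z) / math.sqrt(2))

RATE = 0.1     # identical true conversion rate for both variants
BATCH = 200    # visitors per variant between checks
CHECKS = 20    # how many times we peek at the running test
TRIALS = 500   # simulated A/B tests

peeking_fp = fixed_fp = 0
for _ in range(TRIALS):
    a = b = a_n = b_n = 0
    stopped = False
    for _ in range(CHECKS):
        a += sum(random.random() < RATE for _ in range(BATCH))
        b += sum(random.random() < RATE for _ in range(BATCH))
        a_n += BATCH
        b_n += BATCH
        if not stopped and z_p_value(a, a_n, b, b_n) < 0.05:
            peeking_fp += 1   # stopped early on a random fluctuation
            stopped = True
    if z_p_value(a, a_n, b, b_n) < 0.05:
        fixed_fp += 1         # single test at the pre-planned end

print(f"fixed-horizon false positive rate:       {fixed_fp / TRIALS:.1%}")
print(f"stop-at-significance false positive rate: {peeking_fp / TRIALS:.1%}")
```

The fixed-horizon rate comes out near the nominal 5%, while the stop-at-first-significance procedure fires several times more often, which is exactly the problem with deciding the trial length as you go.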
I'll try to find the article in question, but my Google-Fu is pretty poor today for some reason.
HN won't let me reply to your answer below (presumably to stop me from flaming you :p). But thanks, I'll read that blog post later but it looks like it answers my questions plus a bit more.
Seems my understanding was even less than I thought :)