Of course, at the extreme you are over-tuning for exploitation, but in practice it's never completely random: you always have some information about the probable winner, so long as P(A>B|obs) is not exactly 0.5.
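As a rough sketch of where that P(A>B|obs) number comes from: with Bernoulli conversion data and independent uniform Beta(1, 1) priors on each variant's rate, you can estimate it by Monte Carlo sampling from the two posteriors. (The function name and priors here are my own illustrative choices, not anything from a specific library.)

```python
import random

def prob_a_beats_b(conv_a, trials_a, conv_b, trials_b, draws=100_000):
    """Monte Carlo estimate of P(A > B | obs), assuming Bernoulli
    conversions and independent Beta(1, 1) priors on each rate."""
    wins = 0
    for _ in range(draws):
        # Posterior for each variant is Beta(1 + conversions, 1 + failures)
        a = random.betavariate(1 + conv_a, 1 + trials_a - conv_a)
        b = random.betavariate(1 + conv_b, 1 + trials_b - conv_b)
        if a > b:
            wins += 1
    return wins / draws
```

With identical data for both variants this hovers around 0.5 (no information, pick at random); as the observed rates separate, it moves toward 0 or 1.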
Taking a long time to reach "significance" just means there is a small difference between the two variants, so it's better to pick one and then try the next challenger, which might have a larger difference.
In the early stages of running A/B tests, being 90% certain that one variant is superior is perfectly fine, so long as you have another challenger ready. Conversely, in the later stages of a mature website, when you're searching for minor gains, you probably want a much higher level of certainty than the standard 95%.
In either case, thinking in terms of arbitrary significance thresholds doesn't make much sense for A/B testing.
Another way to say that is: only when P(A>B|obs) is exactly 0.5 might you as well pick a winner at random.