Rather, these are simulated data for a fictitious company. The author is demonstrating a scenario in which a purely frequentist approach to A/B testing can result in erroneous conclusions, whereas a Bayesian approach will avoid that error. The broad conclusions are (as noted explicitly at the end of the article):
- The data generating process should dictate the analysis technique(s)
- Lagged response variables require special handling
- Stan propaganda ;) but also :(
It would be cool to understand the weaknesses or risks of erroneous conclusions in the Bayesian approach in this or similar scenarios. In other words, is it truly a risk-free trade-off to switch from a frequentist technique to a Bayesian technique, or are we simply swapping one set of risks for another?
tl;dr
The author's point is not to make a general claim about the aggressiveness of CTAs.
While I am generally in favor of applying Bayesian approaches, that's overkill for this problem. In their (fictitious) example, the key problem is that they ran their test for too short a time. They already know that the typical lag from visit to conversion on their site is longer than a week, which means that if they want to learn the effect on conversions, a week isn't enough data.
While it is possible to make some progress on this issue with careful math, simply running the test longer is a far more effective and robust approach.
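To make this concrete, here's a minimal simulation (mine, not the author's; the 5% rate and exponential lag distributions are invented for illustration) of two variants with identical long-run conversion rates, where variant B's visitors simply take longer to convert. For simplicity it assumes every visit happens on day one of the test:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: both variants convert at the same long-run rate,
# but B's visitors take longer to get from visit to conversion.
n = 100_000
true_rate = 0.05
lag_a = rng.exponential(scale=3, size=n)   # days from visit to conversion
lag_b = rng.exponential(scale=10, size=n)  # slower funnel, same eventual rate

converts_a = rng.random(n) < true_rate
converts_b = rng.random(n) < true_rate

for window in (7, 60):
    obs_a = (converts_a & (lag_a <= window)).mean()
    obs_b = (converts_b & (lag_b <= window)).mean()
    print(f"{window:>2}-day test: A={obs_a:.2%}  B={obs_b:.2%}")
```

With a 7-day cutoff, B appears to convert at roughly half of A's rate; with a 60-day cutoff the two are indistinguishable. No amount of statistical machinery changes the fact that after a week most of the conversions haven't had time to happen yet.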
> - The data generating process should dictate the analysis technique(s)
And to expand on this, the data generating process is not about a statistical distribution or any other theoretical construct. Only in the frequentist world do you start with assuming a generating process (for the null hypothesis, specifically).
The data generating process in this case is living, breathing humans doing things humans do.
The data generating process is the random assignment of people to experiment groups.
The potential outcomes are fixed: if a person is assigned to one group the outcome is x1; if another, x2. No assumption is made about these potential outcomes. They are not considered random, unless the Population Average Treatment Effect is being estimated. And even in that case, no distribution is assumed. It certainly is not Gaussian, for example.
Under random assignment, the observed treatment effect is unbiased for the Sample Average Treatment Effect. So again, the data generating process of interest to the analyst is random assignment.
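That claim is easy to check numerically. Here's a small sketch (my illustration, not from the article): draw a set of potential outcomes once, treat them as fixed constants, and verify that the difference-in-means, averaged over many random assignments, recovers the Sample Average Treatment Effect:

```python
import numpy as np

rng = np.random.default_rng(42)

# Potential outcomes for n people, drawn once here purely to get some
# numbers; from this point on they are fixed constants, and no
# distribution is assumed about them.
n = 1_000
y0 = rng.binomial(1, 0.05, size=n)  # outcome if assigned to control
y1 = rng.binomial(1, 0.07, size=n)  # outcome if assigned to treatment
sate = (y1 - y0).mean()             # Sample Average Treatment Effect

# The only randomness the analyst relies on: the assignment itself.
estimates = []
for _ in range(5_000):
    treated = rng.permutation(n) < n // 2   # assign a random half to treatment
    estimates.append(y1[treated].mean() - y0[~treated].mean())

print(f"SATE = {sate:.4f}, mean estimate = {np.mean(estimates):.4f}")
```

The two numbers agree up to simulation noise, which is exactly what unbiasedness under random assignment means.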
Assuming you're able to actually achieve truly random participation in the various arms you're trialing, you're right.
And it's my fault for not thinking of that as a possibility. Colour me jaded after experiencing very many bad attempts at randomization that actually suffer from Simpson's paradox in various ways!
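For anyone who hasn't been burned by this yet, here's a toy illustration (numbers invented) of one way it goes wrong: if variant B's traffic is disproportionately routed to a low-converting segment, B can win within every segment and still lose in the pooled totals:

```python
# (segment, arm) -> (visitors, conversions); invented numbers
data = {
    ("desktop", "A"): (800, 120),  # 15.0%
    ("desktop", "B"): (200, 32),   # 16.0%
    ("mobile",  "A"): (200, 10),   #  5.0%
    ("mobile",  "B"): (800, 44),   #  5.5%
}

# Per-segment conversion rates: B beats A in both segments.
for (seg, arm), (v, c) in data.items():
    print(f"{seg} {arm}: {c / v:.1%}")

# Pooled conversion rates: A "wins" overall.
for arm in ("A", "B"):
    v = sum(data[(s, arm)][0] for s in ("desktop", "mobile"))
    c = sum(data[(s, arm)][1] for s in ("desktop", "mobile"))
    print(f"pooled {arm}: {c / v:.1%}")
```

B beats A in both segments (16.0% vs 15.0% on desktop, 5.5% vs 5.0% on mobile), yet the pooled numbers show A at 13.0% and B at 7.6%. Proper randomization rules this out in expectation; a botched bucketing scheme does not.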