
Your summary is incorrect.

Rather, these are simulated data for a fictitious company. The author is demonstrating a scenario in which a purely frequentist approach to A/B testing can result in erroneous conclusions, whereas a Bayesian approach will avoid that error. The broad conclusions are (as noted explicitly at the end of the article):

- The data generating process should dictate the analysis technique(s)

- lagged response variables require special handling

- Stan propaganda ;) but also :(

It would be cool to understand the weaknesses or risks of erroneous conclusions with the Bayesian approach in this or similar scenarios. In other words, is it truly a risk-free trade-off to switch from a frequentist technique to a Bayesian technique, or are we simply swapping one set of risks for another?

tl;dr The author's point is not to make a general claim about the aggressiveness of CTAs.



While I am generally in favor of applying Bayesian approaches, that's overkill for this problem. In their (fictitious) example, the key problem is that they ran their test for too short a time. They already know that the typical lag from visit to conversion on their site is longer than a week, which means that if they want to learn the effect on conversions, a week of data isn't enough.

While it is possible to make some progress on this issue with careful math, simply running the test longer is a far more effective and robust approach.
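
Rough sketch of the undercounting (all numbers made up, not from the article): if the visit-to-conversion lag routinely exceeds the test window, a one-week readout only sees the conversions whose lag happens to land inside the window.

    # Hypothetical numbers: the true rate and lag distribution are assumptions.
    import random

    random.seed(0)
    TRUE_RATE = 0.05        # assumed true conversion rate per visitor
    MEAN_LAG_DAYS = 10.0    # assumed mean visit-to-conversion lag (> 1 week)
    WINDOW_DAYS = 7         # the too-short test window
    N_VISITORS = 100_000

    observed = 0
    for _ in range(N_VISITORS):
        if random.random() < TRUE_RATE:
            lag = random.expovariate(1.0 / MEAN_LAG_DAYS)  # exponential lag
            if lag <= WINDOW_DAYS:                         # counted only if it lands in-window
                observed += 1

    print(f"true rate:     {TRUE_RATE:.3f}")
    print(f"observed rate: {observed / N_VISITORS:.3f}")   # ~ TRUE_RATE * P(lag <= 7 days)

With these assumed numbers the one-week readout sees roughly half of the eventual conversions; lengthening the window shrinks that gap.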


I'm no statistician, but don't you have the same problem however long you run it? Wouldn't you just be giving even more time for slow conversions to amass?

Also, you and GP are calling the example fictitious, but it seems to be based on 'real traffic logs' via https://dl.acm.org/doi/10.1145/2623330.2623634


We're taking the author at his word:

> "Let us consider the following fictitious example in which Larry the analyst of the internet company Nozama"

Nozama is Amazon backwards.


> - The data generating process should dictate the analysis technique(s)

And to expand on this, the data generating process is not about a statistical distribution or any other theoretical construct. Only in the frequentist world do you start by assuming a generating process (for the null hypothesis, specifically).

The data generating process in this case is living, breathing humans doing things humans do.


The data generating process is the random assignment of people to experiment groups.

The potential outcomes are fixed: if a person is assigned to one group the outcome is x1; if another, x2. No assumption is made about these potential outcomes. They are not considered random unless the Population Average Treatment Effect is being estimated, and even in that case no distribution is assumed. It certainly isn't Gaussian, for example.

Under random assignment, the observed treatment effect is unbiased for the Sample Average Treatment Effect. So again, the data generating process of interest to the analyst is random assignment.
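
A toy simulation of that framing (invented potential outcomes, just to show where the randomness lives): each person's outcomes are fixed, only the assignment is random, and the difference in means averages out to the Sample Average Treatment Effect across repeated assignments.

    # Hypothetical potential outcomes; only the assignment is random.
    import random

    random.seed(1)
    n = 1000
    # Fixed (y0, y1) per person: outcome if assigned control, outcome if treated.
    potential = [(random.random() < 0.05, random.random() < 0.07) for _ in range(n)]
    sate = sum(y1 - y0 for y0, y1 in potential) / n  # Sample Average Treatment Effect

    estimates = []
    for _ in range(2000):                            # repeat the random assignment
        treated = set(random.sample(range(n), n // 2))
        y1_bar = sum(potential[i][1] for i in treated) / len(treated)
        y0_bar = sum(potential[i][0] for i in range(n) if i not in treated) / (n - len(treated))
        estimates.append(y1_bar - y0_bar)

    print(f"SATE:                     {sate:.4f}")
    print(f"mean difference in means: {sum(estimates) / len(estimates):.4f}")  # ~ SATE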


Assuming you're able to actually achieve truly random participation in the various arms you're trialing, you're right.

And it's my fault for not thinking of that as a possibility. Colour me jaded after experiencing very many bad attempts at randomization that actually suffer from Simpson's paradox in various ways!
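
For what it's worth, here's a toy example (invented numbers) of the kind of reversal a lopsided assignment can produce: B wins within every segment, yet the aggregate makes A look better.

    # Invented counts: B beats A within each segment, but assignment is lopsided.
    segments = {
        # segment: (A conversions, A visitors, B conversions, B visitors)
        "mobile":  (20, 1000, 27, 900),   # A 2.0% vs B 3.0%
        "desktop": (90, 1000, 11, 100),   # A 9.0% vs B 11.0%
    }

    a_conv = sum(s[0] for s in segments.values())
    a_n    = sum(s[1] for s in segments.values())
    b_conv = sum(s[2] for s in segments.values())
    b_n    = sum(s[3] for s in segments.values())

    for name, (ac, an, bc, bn) in segments.items():
        print(f"{name:8s} A: {ac/an:.1%}  B: {bc/bn:.1%}")       # B better in every segment
    print(f"overall  A: {a_conv/a_n:.1%}  B: {b_conv/b_n:.1%}")  # yet A looks better overall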


You're absolutely correct, proper A/B testing has many engineering challenges!



