
Your summary is incorrect.

Rather, these are simulated data for a fictitious company. The author is demonstrating a scenario in which a purely frequentist approach to A/B testing can result in erroneous conclusions, whereas a Bayesian approach will avoid that error. The broad conclusions are (as noted explicitly at the end of the article):

- The data generating process should dictate the analysis technique(s)

- lagged response variables require special handling

- Stan propaganda ;) but also :(

It would be cool to understand the weaknesses or risks of erroneous conclusions with the Bayesian approach in this or similar scenarios. In other words, is it truly a risk-free trade-off to switch from a frequentist technique to a Bayesian technique, or are we simply swapping one set of risks for another?

tl;dr The author's point is not to make a general claim about the aggressiveness of CTAs.



While I am generally in favor of applying Bayesian approaches, that's overkill for this problem. In their (fictitious) example, the key problem is that they ran their test for too short a time. They already know that the typical lag from visit to conversion on their site is longer than a week, which means that if they want to learn the effect on conversions, a week of data isn't enough.

While it is possible to make some progress on this issue with careful math, simply running the test longer is a far more effective and robust approach.
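
Rough sketch of the undercounting (all numbers made up, not from the article): if the visit-to-conversion lag routinely exceeds the test window, a one-week readout only sees the conversions whose lag happens to land inside the window.

    # Hypothetical numbers: the true rate and lag distribution are assumptions.
    import random

    random.seed(0)
    TRUE_RATE = 0.05        # assumed true conversion rate per visitor
    MEAN_LAG_DAYS = 10.0    # assumed mean visit-to-conversion lag (> 1 week)
    WINDOW_DAYS = 7         # the too-short test window
    N_VISITORS = 100_000

    observed = 0
    for _ in range(N_VISITORS):
        if random.random() < TRUE_RATE:
            lag = random.expovariate(1.0 / MEAN_LAG_DAYS)  # exponential lag
            if lag <= WINDOW_DAYS:                         # counted only if it lands in-window
                observed += 1

    print(f"true rate:     {TRUE_RATE:.3f}")
    print(f"observed rate: {observed / N_VISITORS:.3f}")   # ~ TRUE_RATE * P(lag <= 7 days)

With these assumed numbers the one-week readout sees roughly half of the eventual conversions; lengthening the window shrinks that gap.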


I'm no statistician, but don't you have the same problem however long you run it? Wouldn't you just be giving even more time for slow conversions to amass?

Also, you and GP are calling the example fictitious, but it seems to be based on 'real traffic logs' via https://dl.acm.org/doi/10.1145/2623330.2623634


We're taking the author at his word:

> "Let us consider the following fictitious example in which Larry the analyst of the internet company Nozama"

Nozama is Amazon backwards.


> - The data generating process should dictate the analysis technique(s)

And to expand on this, the data generating process is not about a statistical distribution or any other theoretical construct. Only in the frequentist world do you start by assuming a generating process (for the null hypothesis, specifically).

The data generating process in this case is living, breathing humans doing things humans do.


The data generating process is the random assignment of people to experiment groups.

The potential outcomes are fixed: if a person is assigned to one group the outcome is x1; if another, x2. No assumption is made about these potential outcomes. They are not considered random unless the Population Average Treatment Effect is being estimated, and even in that case no distribution is assumed. It certainly isn't Gaussian, for example.

Under random assignment, the observed treatment effect is unbiased for the Sample Average Treatment Effect. So again, the data generating process of interest to the analyst is random assignment.
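
A toy simulation of that framing (invented potential outcomes, just to show where the randomness lives): each person's outcomes are fixed, only the assignment is random, and the difference in means averages out to the Sample Average Treatment Effect across repeated assignments.

    # Hypothetical potential outcomes; only the assignment is random.
    import random

    random.seed(1)
    n = 1000
    # Fixed (y0, y1) per person: outcome if assigned control, outcome if treated.
    potential = [(random.random() < 0.05, random.random() < 0.07) for _ in range(n)]
    sate = sum(y1 - y0 for y0, y1 in potential) / n  # Sample Average Treatment Effect

    estimates = []
    for _ in range(2000):                            # repeat the random assignment
        treated = set(random.sample(range(n), n // 2))
        y1_bar = sum(potential[i][1] for i in treated) / len(treated)
        y0_bar = sum(potential[i][0] for i in range(n) if i not in treated) / (n - len(treated))
        estimates.append(y1_bar - y0_bar)

    print(f"SATE:                     {sate:.4f}")
    print(f"mean difference in means: {sum(estimates) / len(estimates):.4f}")  # ~ SATE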


Assuming you're able to actually achieve truly random participation in the various arms you're trialing, you're right.

And it's my fault for not thinking of that as a possibility. Colour me jaded after experiencing very many bad attempts at randomization that actually suffer from Simpson's paradox in various ways!
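
For what it's worth, here's a toy example (invented numbers) of the kind of reversal a lopsided assignment can produce: B wins within every segment, yet the aggregate makes A look better.

    # Invented counts: B beats A within each segment, but assignment is lopsided.
    segments = {
        # segment: (A conversions, A visitors, B conversions, B visitors)
        "mobile":  (20, 1000, 27, 900),   # A 2.0% vs B 3.0%
        "desktop": (90, 1000, 11, 100),   # A 9.0% vs B 11.0%
    }

    a_conv = sum(s[0] for s in segments.values())
    a_n    = sum(s[1] for s in segments.values())
    b_conv = sum(s[2] for s in segments.values())
    b_n    = sum(s[3] for s in segments.values())

    for name, (ac, an, bc, bn) in segments.items():
        print(f"{name:8s} A: {ac/an:.1%}  B: {bc/bn:.1%}")       # B better in every segment
    print(f"overall  A: {a_conv/a_n:.1%}  B: {b_conv/b_n:.1%}")  # yet A looks better overall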


You're absolutely correct, proper A/B testing has many engineering challenges!



