This is the correct approach, but having done A/B testing for many years (and having largely moved away from this area of work), I'd say nobody in the industry really cares about understanding the problem; they care about promoting themselves as experts and creating the illusion of rigorous marketing.
Correct A/B testing should involve starting with an A/A test to validate the setup, building a basic causal model of what you expect the treatment's impact to be, controlling for covariates, and finally ensuring that when the causal factor is controlled for, the results change as expected.
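To make that first step concrete, here is a minimal sketch of the A/A sanity check, assuming you've already logged impression and conversion counts per arm (the numbers below are made up). It just runs a two-proportion z-test between the two identical arms; a "significant" difference there points at a broken setup, not a real effect.

```python
# Sanity-check an A/A test: both arms serve the *same* page, so any
# "significant" difference suggests a biased setup (skewed assignment,
# double-counting, caching, bot traffic), not a real treatment effect.
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical logged counts -- replace with your own event data.
conversions = [312, 298]        # conversions in arm A1, arm A2
impressions = [10_000, 10_000]  # impressions in arm A1, arm A2

stat, p_value = proportions_ztest(conversions, impressions)
print(f"z = {stat:.2f}, p = {p_value:.3f}")

# A tiny p-value here means the pipeline itself is biased; fix that
# before trusting any A/B result from the same setup.
if p_value < 0.01:
    print("A/A arms differ -- investigate assignment/tracking before running A/B.")
```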
But even the "experts" I've read in this area largely focus on statistical details that honestly don't matter (and if they do the change you're proposing is so small that you shouldn't be wasting time on it).
In practice, if you need "statistical significance" to determine whether a change has had an impact on your users, you're already focused on problems that are too small to be worth your time.
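A rough way to see why: a standard two-proportion power calculation (sketched below with statsmodels; the 5% baseline conversion rate is an assumption for illustration) shows how quickly the required traffic blows up as the effect you're chasing shrinks.

```python
# Rough sample size per arm needed to detect a lift at 80% power, alpha = 0.05.
# Illustrative numbers only -- plug in your own baseline conversion rate.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.05  # assumed 5% baseline conversion rate

for lift in (0.005, 0.01, 0.05):  # +0.5pp, +1pp, +5pp absolute lift
    effect = proportion_effectsize(baseline + lift, baseline)
    n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8)
    print(f"lift +{lift:.3f}: ~{int(round(n)):,} visitors per arm")

# With these assumed numbers, a +0.5pp lift needs on the order of 15,000
# visitors per arm, while a +5pp lift needs only a couple hundred -- which
# is the point above: effects big enough to matter rarely need a
# significance test to see.
```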
Ok so, that’s interesting. I like examples, so are you saying I should build a “framework” that presents two otherwise identical (landing) pages and (hopefully) can collect things like what source the visitor came from, and maybe some demographics? Then I try to get 100 impressions with randomly assigned blue and red buttons, check whether there is some confounding factor (e.g. blue was always seen by women arriving from Google Ads), and then drop the pure randomization next time and show the blue button to half the women from Google and to half of everyone else?
I think the dumb underlying question I have is: how does one do experimental design?
Edit: and if you aren’t seeing giant, obvious improvements, try improving something else? (I get the idea that my B is supposed to be so obviously better that there is no need to worry about stats; if it’s not, that’s a signal to change something else?)
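For what it's worth, the randomize-and-log part of what you're describing can be tiny. Here's a sketch, with every identifier (visitor_id, the traffic source values, etc.) hypothetical: assign each visitor a variant deterministically from a hash of their ID, log the covariates you care about, and cross-tab variant against source to see whether the randomization accidentally skewed. If it did, the usual fix is to stratify the randomization (randomize within each source separately) rather than hand-picking who sees which colour next time.

```python
# Minimal sketch: hash-based random assignment plus covariate logging,
# then a cross-tab to check the randomization didn't skew by source.
# All names (visitor_id, source values, etc.) are placeholders.
import hashlib
from collections import Counter

def assign_variant(visitor_id: str) -> str:
    """Deterministic 50/50 split: the same visitor always sees the same variant."""
    digest = hashlib.sha256(visitor_id.encode()).hexdigest()
    return "blue" if int(digest, 16) % 2 == 0 else "red"

# Imagine each log row is (visitor_id, traffic_source, converted).
log = [
    ("v1", "google_ads", True),
    ("v2", "direct", False),
    ("v3", "google_ads", True),
    ("v4", "newsletter", False),
    # ... hundreds more rows from real traffic
]

# Cross-tab of variant x source: if one source lands almost entirely in one
# variant, source is confounded with button colour and the comparison isn't clean.
table = Counter((assign_variant(vid), src) for vid, src, _ in log)
for (variant, source), count in sorted(table.items()):
    print(f"{variant:5s} {source:12s} {count}")
```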
There are some tools for this that overlay your webpage with a heatmap showing where users' cursors have traveled. More popular areas show "hotter" in red, which can indicate how effective your changes are, or where you may want to place content you want users to notice. I haven't worked with the data directly, but I've seen the heatmaps from Hotjar on sites I've implemented (doing both frontend and backend development, but not involved in the design or SEO/marketing).