Novices also tend to gravitate towards "end-game" business metrics which have a lot more inherent variation than simple operational indicators.
For example, take optimizing a content site for AdSense: many folks would gravitate to AdSense $$ as the target metric, which is admittedly an intuitive choice (since that's how you're ultimately getting paid).
But if you think about it....
AdSense Revenue ≈
(1 - Bounce Rate) x Pages per Visit x % Ads Clicked x CPC
Bounce rate is a binomial outcome with a relatively high underlying probability (15%+), so you can get a statistically solid read on results with a relatively small sample.
Pages per visit is basically the aggregate of a Markov chain (driven by the per-page exit probability); it's also relatively stable.
% ads clicked is a binomial outcome with a low underlying probability, so large samples become important (rough sample-size comparison in the sketch below).
$ CPC - the ugly thing here is that there's a huge range in the value of a click... often as low as $0.05 for a casual mobile phone click, or $30 for a well-qualified financial or legal click (think retargeting, with multiple bidders). And you're usually dealing with a small sample of clicks (since the average CTR is very low), so there's HUGE natural variation in results. Oh, and Google likes to apply penalty pricing to sites with a large, rapid increase in click-through rate (for a few days), so your short-term CPC may not resemble what you would earn at steady state.
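To put rough numbers on the small-sample vs. large-sample point, here's a back-of-envelope sketch using the standard two-proportion sample-size approximation. The base rates (15% for a bounce-style metric, 1% for an ad-CTR-style metric) and the 10% lift are illustrative assumptions, not figures from any real site:

```python
# Back-of-envelope visits per arm to detect a 10% relative lift in a binomial
# metric, via the usual normal-approximation formula (alpha=0.05 two-sided,
# 80% power). Base rates below are illustrative assumptions.
import math

Z_ALPHA = 1.96   # two-sided alpha = 0.05
Z_BETA = 0.84    # power = 0.80

def visits_per_arm(p_base, relative_lift=0.10):
    """Approximate visits per arm to detect p_base -> p_base * (1 + lift)."""
    p_new = p_base * (1 + relative_lift)
    p_bar = (p_base + p_new) / 2
    numerator = (Z_ALPHA * math.sqrt(2 * p_bar * (1 - p_bar))
                 + Z_BETA * math.sqrt(p_base * (1 - p_base)
                                      + p_new * (1 - p_new))) ** 2
    return math.ceil(numerator / (p_new - p_base) ** 2)

print("bounce-style metric (p = 0.15):", visits_per_arm(0.15))  # on the order of 10k per arm
print("ad-CTR-style metric (p = 0.01):", visits_per_arm(0.01))  # well over 100k per arm
```

Roughly an order of magnitude more traffic for the low-probability metric, which is the whole point about % ads clicked.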
So while it may make ECONOMIC sense to use $ RPM as the test metric, you've injected tremendous variation into the test. You can accurately read bounce rate, page activity, and % click-through on a much smaller sample, and feel comfortable making a move if you're confident nothing major has changed in the ad quality (and CPC value) you'll get.
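If it helps, here's a toy Monte Carlo along the same lines. Every parameter (bounce rate, pages per visit, CTR, and the lognormal CPC spread) is invented for illustration, but it shows the mechanism: identical simulated traffic gives a noticeably noisier read on $ RPM than on % click-through, simply because clicks are rare and CPC is heavy-tailed.

```python
# Toy Monte Carlo: how much run-to-run noise do you get in CTR vs. RPM on the
# same traffic? All parameters are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

VISITS = 20_000                           # visits per simulated test cell
BOUNCE_P = 0.40                           # assumed bounce rate
CTR = 0.01                                # assumed per-page ad click probability
CPC_MU, CPC_SIGMA = np.log(0.50), 1.2     # lognormal CPC: mostly cheap clicks, occasional expensive ones

def simulate_cell():
    bounced = rng.random(VISITS) < BOUNCE_P
    # non-bounced visits view a geometric number of pages (mean ~3, illustrative)
    pages = np.where(bounced, 1, rng.geometric(1 / 3, VISITS))
    total_clicks = rng.binomial(pages, CTR).sum()
    revenue = rng.lognormal(CPC_MU, CPC_SIGMA, total_clicks).sum()
    return total_clicks / pages.sum(), revenue / VISITS * 1000   # CTR, $ RPM

runs = np.array([simulate_cell() for _ in range(500)])
print(f"relative noise in CTR estimate: {runs[:, 0].std() / runs[:, 0].mean():.1%}")
print(f"relative noise in RPM estimate: {runs[:, 1].std() / runs[:, 1].mean():.1%}")  # noticeably larger on the same traffic
```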
Isn't that a good argument for using $$ as the metric to optimize for? If you're going to get wiped out by variations in behavior because highly-retargeted legal clicks are worth 500x more than mobile clicks, isn't that an important variable?
The problem I frequently wonder about is that you have to assume the stable variables are independent of the revenue drivers to be comfortable testing them in isolation. In reality, the bounce rate of the people who make you lots of money is probably driven by different factors than the bounce rate of the overall population.
I guess what you should really do is optimize the bounce rate / pages per visit / etc. for just the population of people that could make you money, but you don't typically have access to that information.
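A toy example of why that matters (all numbers invented): a variant can look like a clear win on blended bounce rate while getting worse for exactly the small segment of visitors who generate most of the revenue.

```python
# Hypothetical segments: a change improves the blended bounce rate while making
# it worse for the high-value slice of traffic. All figures are invented.
segments = {
    # name: (share of traffic, bounce rate before, bounce rate after)
    "casual / low-value": (0.95, 0.50, 0.42),
    "high-value (qualified finance/legal)": (0.05, 0.20, 0.35),
}

before = sum(share * b for share, b, _ in segments.values())
after = sum(share * a for share, _, a in segments.values())
print(f"blended bounce rate: {before:.1%} -> {after:.1%}")   # looks like a win
for name, (share, b, a) in segments.items():
    print(f"  {name}: {b:.1%} -> {a:.1%}")                   # the money segment got worse
```

Without the segment labels (which, as you say, you usually don't have), the aggregate read tells you to ship it.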