In the Alpha case it was a very late addition, arriving in the last widely shipping version of the chip, and IIRC it was speculated to serve some supercomputer/classified use case. ARM has a history of quirky, un-RISCy instructions.
(edit: also, it seems that ARM has just cnt.v8 in NEON, which counts the set bits in each 8-bit lane, and no 64-bit scalar version of the instruction. Interesting. Being part of NEON also means it's an optional part on ARM.)
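
To make the "per 8-bit lane, no 64-bit scalar" point concrete, here's a rough portable C sketch of what that implies: you get one bit count per byte, and then need a separate horizontal add across the eight lanes to get a 64-bit popcount (on AArch64 compilers typically pair the per-lane count with an across-lanes add). This is plain C emulating the idea, not NEON intrinsics, and the function names `popcount_per_byte` and `popcount64` are made up for illustration.

```c
#include <stdint.h>

/* Step 1: per-lane count. Each byte of the result holds the number of
 * set bits in the corresponding byte of x (a value from 0 to 8).
 * This is the classic SWAR reduction, stopped at byte granularity. */
uint64_t popcount_per_byte(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);            /* 2-bit sums */
    x = (x & 0x3333333333333333ULL)
      + ((x >> 2) & 0x3333333333333333ULL);                /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;            /* 8-bit sums */
    return x;
}

/* Step 2: horizontal add of the eight byte lanes. The multiply
 * accumulates all eight bytes into the top byte; the total is at
 * most 64, so it cannot overflow that byte. */
unsigned popcount64(uint64_t x) {
    return (unsigned)((popcount_per_byte(x) * 0x0101010101010101ULL) >> 56);
}
```

With a dedicated scalar instruction both steps collapse into one op, which is the whole appeal.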
A late addition is more indicative of value than appearing in the first release. People guess about the base instruction set, but additions happen only in response to high demand.
It's honestly pretty baffling that RISC-V doesn't have it (perils of design-by-academia).