It's much worse than that: measurements suggest that pdep/pext on AMD before Zen 3 run a loop that handles one set bit of the mask argument at a time, so latency grows with the number of set bits in the mask. uops.info and similar tables measure with a mask that's almost all zero, so they vastly underreport it. The worst case, a mask with most bits set, is in the hundreds of cycles. Do not use pdep/pext on anything before Zen 3. https://twitter.com/uops_info/status/1202950247900684290
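To make the "one set bit of the mask at a time" behaviour concrete, here is a rough portable sketch of that kind of loop in C. It is not the actual hardware or microcode implementation, only an illustration of why the cost scales with the popcount of the mask; the names `pext_soft` and `pdep_soft` are made up for this example.

```c
#include <stdint.h>

/* Hypothetical software PEXT: gathers the bits of src selected by mask
 * into the low bits of the result. Runs one iteration per set bit in mask. */
static uint64_t pext_soft(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    uint64_t out_bit = 1;                      /* next result bit to fill */
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set mask bit */
        if (src & lowest) {
            result |= out_bit;
        }
        out_bit <<= 1;
        mask &= mask - 1;                      /* clear lowest set mask bit */
    }
    return result;
}

/* Hypothetical software PDEP: scatters the low bits of src to the
 * positions selected by mask. Same loop shape, same popcount-bound cost. */
static uint64_t pdep_soft(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    uint64_t in_bit = 1;                       /* next source bit to consume */
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set mask bit */
        if (src & in_bit) {
            result |= lowest;
        }
        in_bit <<= 1;
        mask &= mask - 1;                      /* clear lowest set mask bit */
    }
    return result;
}
```

With a mask of `0x1` each function runs a single iteration, while an all-ones mask takes 64, which is the same shape of gap you would expect between a benchmark using a nearly empty mask and the worst case.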