For FFTW, the showstopper was the GPL license. For IPP, it was 200 MB of binary dependencies. I also remember when Intel was caught testing specifically for Intel CPUs in their runtime libraries instead of checking CPUID feature bits, deliberately crippling performance on AMD CPUs. I literally don’t have any Intel CPUs left in this house. For cuFFT, the issue is vendor lock-in to nVidia.
And the problem is IMO too small to justify large dependencies. I only needed something like a 200×400 FFT as a minor component of a larger piece of software.
It would be interesting to see how it compares to https://gitlab.mpcdf.mpg.de/mtr/pocketfft. The C++ branch is header-only. I believe this is what SciPy uses by default.
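For a rough baseline on the SciPy side, something like the sketch below would time a real-input 200×400 transform through `scipy.fft`, which as far as I know is backed by the C++ pocketfft. Only the array size comes from the comment above; the random input, the seed, and the repeat count are placeholders I picked, and this is not meant as a careful benchmark.

```python
import timeit

import numpy as np
from scipy import fft

# Placeholder input: a 200x400 real-valued array (size taken from the thread).
rng = np.random.default_rng(0)
image = rng.standard_normal((200, 400))

# Real-input 2D FFT; the last axis is halved to 400 // 2 + 1 = 201 bins.
spectrum = fft.rfft2(image)

# Round trip back to the original shape as a sanity check.
restored = fft.irfft2(spectrum, s=image.shape)
round_trip_ok = np.allclose(restored, image)

# Crude timing: average seconds per forward transform over 100 runs.
seconds_per_call = timeit.timeit(lambda: fft.rfft2(image), number=100) / 100
print(f"rfft2 on {image.shape}: {seconds_per_call * 1e6:.1f} µs per call")
```

The number will vary a lot with the CPU and the SciPy build, so it is only useful next to a measurement of the other library taken on the same machine.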