Did you consider using Constant Q Transform instead of STFT? mpv and ffmpeg come...

emcq · on Feb 24, 2020

When should one use the Constant Q Transform over a Mel- or Bark-spaced STFT?

zeroxfe · on Feb 23, 2020

Have not tried it -- def worth investigating. Thanks.

oever · on Feb 24, 2020

The Constant Q Transform uses bins that are spaced on a log scale, like musical notes, so the bins correspond better to how humans perceive pitch. You won't waste hundreds of bins on the high frequencies.
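To make the log spacing concrete, here is a rough sketch that only computes where the bins sit, not the transform itself. The function names, the 55 Hz starting frequency, and the 12 bins per octave are illustrative assumptions, not taken from any particular library.

```python
import math


def cqt_center_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced center frequencies: f_k = f_min * 2**(k / bins_per_octave)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def quality_factor(bins_per_octave):
    """Ratio of center frequency to bin spacing; the same for every bin, hence "constant Q"."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def stft_bins_in_octave(f_low, sample_rate, n_fft):
    """Count linearly spaced STFT bins whose center lies in [f_low, 2 * f_low).

    Assumes 2 * f_low does not exceed the Nyquist frequency (sample_rate / 2).
    """
    spacing = sample_rate / n_fft
    first = math.ceil(f_low / spacing)
    last = math.ceil(2 * f_low / spacing)
    return last - first


if __name__ == "__main__":
    # Hypothetical settings: start at A1 (55 Hz), one bin per semitone.
    freqs = cqt_center_frequencies(55.0, 12, 25)
    print("first CQT bins (Hz):", [round(f, 2) for f in freqs[:4]])
    print("Q:", round(quality_factor(12), 2))

    # The STFT gives every octave a different number of bins;
    # the CQT gives each octave exactly bins_per_octave of them.
    for f_low in (55.0, 880.0, 8000.0):
        print(f_low, "Hz octave:", stft_bins_in_octave(f_low, 44100, 4096), "STFT bins")
```

With one bin per semitone, every octave gets 12 bins, whereas the linearly spaced STFT puts only a handful in the lowest octave and hundreds in the top one.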

Calculating the CQT can be roughly as fast as an FFT:

http://academics.wellesley.edu/Physics/brown/pubs/effalgV92P...

And here are some real musical samples you can use instead of the artificial MIDI notes:

http://virtualplaying.com/virtual-playing-orchestra/