cd path/to/vits/monotonic_align
mkdir monotonic_align
python setup.py build_ext --inplace
Then go back to fairseq:
cd path/to/fairseq
PYTHONPATH=$PYTHONPATH:path/to/vits python examples/mms/tts/infer.py --model-dir checkpoints/eng --wav outputs/eng.wav --txt "As easy as pie"
(Note: on macOS, I had to comment out several .cuda() calls in infer.py to make it work, since there is no CUDA device there. After that, it generated high-quality speech very efficiently. I'm impressed.)
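
An alternative to commenting out the .cuda() calls by hand is to choose the device at runtime and fall back to the CPU when no GPU is available. The following is only a minimal sketch of that idea: `pick_device` is a hypothetical helper that does not exist in infer.py, and which lines of infer.py would need changing is an assumption I have not verified against the script.

```python
import importlib.util


def pick_device(prefer_cuda=True):
    """Return "cuda" if PyTorch is installed and reports a usable GPU, else "cpu".

    Hypothetical helper for illustration; not part of fairseq or infer.py.
    """
    if prefer_cuda and importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "cuda"
    return "cpu"


device = pick_device()
print(f"Running on: {device}")

# Assumed usage inside a script like infer.py: a call such as
#     tensor.cuda()
# could be replaced with
#     tensor.to(device)
# so that the same code runs with or without a CUDA device.
```

With this pattern, the same script should run unchanged on a Linux machine with a GPU and on a Mac without one, at the cost of slower inference on the CPU.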