GPU-friendly base images tend to be larger (1-3 GB or more), so creating a new Machine (VM) takes a while, roughly 30 seconds to 2 minutes.
Then there’s the “spin-up time” of your own software: downloading model files adds however long it takes to pull several GB of model files.
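As a rough back-of-envelope check, download time is just size divided by sustained bandwidth. This sketch assumes decimal units (1 GB = 8,000 megabits) and a throughput figure you'd have to measure yourself; the 10 GB / 1,000 Mbps numbers in the usage line are made-up examples, not measurements.

```python
def estimate_download_seconds(size_gb: float, bandwidth_mbps: float) -> float:
    """Rough download time for `size_gb` gigabytes at `bandwidth_mbps` megabits/second."""
    size_megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    return size_megabits / bandwidth_mbps


# Hypothetical example: a 10 GB model at a sustained 1,000 Mbps
print(estimate_download_seconds(10, 1000))  # 80.0 seconds
```

That's on top of the Machine creation time, and real throughput is often lower than the link speed, so treat the result as a lower bound.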
Models (and pip dependencies!) can generally be “cached” if you (re)use volumes, so a Machine that mounts an existing volume can skip the download.
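A minimal sketch of that caching idea, assuming the volume is mounted at some path such as /data/models (hypothetical) and that `download` is whatever function you already use to fetch a model file (also hypothetical here). It only downloads on a cache miss, and it writes to a temporary name first so a Machine that dies mid-download doesn't leave a half-written file that looks cached. For pip, pointing the `PIP_CACHE_DIR` environment variable at a directory on the volume serves the same purpose for downloaded packages.

```python
import os
from pathlib import Path
from typing import Callable


def ensure_cached(cache_dir: Path, name: str, download: Callable[[Path], None]) -> Path:
    """Return the path to file `name` under `cache_dir`, downloading it only if missing."""
    target = cache_dir / name
    if target.exists():
        return target  # cache hit: an earlier Machine already fetched it onto the volume
    cache_dir.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".partial")
    download(partial)  # caller-supplied fetch (hypothetical) that writes the file to `partial`
    os.replace(partial, target)  # move into place only after the download has finished
    return target


# Hypothetical usage (mount path and fetch function are placeholders):
# model_path = ensure_cached(Path("/data/models"), "model.safetensors", my_download_fn)
```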
Attaching volumes to GPU Machines created dynamically via the API takes a bit of management on your end: you’d need to keep track of your volumes, which region each one is in, and what to do if you need more volumes than you have.
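That bookkeeping can be sketched as a small in-memory pool. Everything here is hypothetical: the `Volume` and `VolumePool` names, the volume IDs, and the region codes are illustrative, the sketch assumes each volume is attached to at most one Machine at a time, and it makes no real API calls. When `acquire` returns None, you'd create a new volume in that region (or fall back to another region) before creating the Machine.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Volume:
    volume_id: str
    region: str
    attached_to: Optional[str] = None  # Machine ID using this volume, or None if free


@dataclass
class VolumePool:
    volumes: List[Volume] = field(default_factory=list)

    def acquire(self, region: str, machine_id: str) -> Optional[Volume]:
        """Claim a free volume in `region`; None means you need to create another one."""
        for vol in self.volumes:
            if vol.region == region and vol.attached_to is None:
                vol.attached_to = machine_id
                return vol
        return None

    def release(self, machine_id: str) -> None:
        """Mark any volume held by `machine_id` as free again (e.g. after it's destroyed)."""
        for vol in self.volumes:
            if vol.attached_to == machine_id:
                vol.attached_to = None


# Hypothetical usage with made-up IDs and regions:
pool = VolumePool([Volume("vol_a", "ord"), Volume("vol_b", "iad")])
vol = pool.acquire("ord", "machine_1")
```

In practice this state would need to live somewhere durable (a database, or rebuilt from the API on startup) rather than in memory, since it has to survive your orchestrator restarting.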