I think the bigger question is how long it takes to load any meaningful model on...

fideloper · on Feb 13, 2024

That’s exactly right.

GPU-friendly base images tend to be larger (1-3 GB+), so creating a new Machine (VM) takes time (in the 30 s to 2 min range).

Then there’s the “spin-up time” of your software: downloading model files adds however long it takes to pull down gigabytes of data.

Models (and pip dependencies!) can generally be “cached” if you (re)use volumes.
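
A rough sketch of that caching idea, assuming the volume is mounted somewhere like `/data` and that you have some function that can download a model file. The `ensure_model` helper, the `fetch` callback, and the paths are all made up for illustration, not part of any platform API. The point is just to skip the download when the file is already on the volume:

```python
from pathlib import Path
from typing import Callable


def ensure_model(cache_dir: Path, filename: str, fetch: Callable[[Path], None]) -> Path:
    """Return the path to a model file, downloading it only on a cache miss.

    `cache_dir` is assumed to live on a persistent volume (e.g. /data/models).
    `fetch` is a hypothetical callback that writes the file to the given path.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if target.exists():
        return target  # cache hit: skip the multi-GB download

    # Download under a temporary name first, so an interrupted download is
    # never mistaken for a complete model on the next boot.
    partial = target.with_name(target.name + ".partial")
    fetch(partial)
    partial.replace(target)
    return target
```

For pip, something similar can probably be had by pointing `PIP_CACHE_DIR` at a directory on the same volume, though that only helps if the install step runs at boot rather than at image build time.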

Attaching volumes to GPU machines dynamically created via the API takes a bit of management on your end: you’d need to keep track of your volumes, what region they’re in, and what to do if you need more volumes than you have.
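
To make the bookkeeping concrete, here is a minimal sketch of what that tracking could look like. It is purely in-memory and makes no API calls; the `Volume` and `VolumePool` names, the IDs, and the region strings are hypothetical, and a real setup would persist this state and handle concurrent claims:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    volume_id: str
    region: str
    attached_to: Optional[str] = None  # machine id, or None if the volume is free


class VolumePool:
    """In-memory bookkeeping only (an assumption for illustration)."""

    def __init__(self, volumes: list[Volume]):
        self.volumes = list(volumes)

    def claim(self, region: str, machine_id: str) -> Optional[Volume]:
        """Mark a free volume in `region` as attached; None means none are left."""
        for vol in self.volumes:
            if vol.region == region and vol.attached_to is None:
                vol.attached_to = machine_id
                return vol
        # Caller decides what to do: create a new volume, or boot without a cache.
        return None

    def release(self, machine_id: str) -> None:
        """Free any volume attached to a machine that has been destroyed."""
        for vol in self.volumes:
            if vol.attached_to == machine_id:
                vol.attached_to = None
```

The `None` return from `claim` is the "need more volumes than you have" case: at that point you either create another volume in that region or start the machine cold and eat the download.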

dathinab · on Feb 14, 2024

I know it's not common in research and often makes little sense there.

But, at least in theory, for deployments you should build dedicated deployment images.

I.e. no pip included in the image(!), all dependencies preloaded, unnecessary parts stripped, etc.

Models might also be bundled, but not always.

These are still large images, but depending on what they are for, the same image might be reused often, so the provider can cache it to some degree.