
If I set aside the scaling issue with the v0 AWS architecture for a moment: would it be right to say that in v0, issues like excessive load times were solved by decoupling the models from the Flask app (e.g. making batch prediction calls to each necessary model for the current request), rather than by the v0 architecture itself?

Was it that hard to make batch prediction calls to each necessary model for the current request on AWS?



Making the calls to the corresponding models from Flask was actually easier on AWS, since they were loaded into memory. Unfortunately, the scaling issues and excessive load times were a big enough problem that we had to make the switch as our number of hosted models continued to grow.
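For context, here's a minimal sketch of what serving in-memory models from Flask might look like (the model names, paths, and predict() interface are hypothetical illustrations, not their actual code):

    from flask import Flask, request, jsonify
    import pickle

    app = Flask(__name__)

    # Hypothetical setup: every model is unpickled into memory at startup,
    # so a request can fan out to them directly with no network hop.
    MODEL_PATHS = {
        "sentiment": "models/sentiment.pkl",
        "topic": "models/topic.pkl",
    }
    MODELS = {name: pickle.load(open(path, "rb"))
              for name, path in MODEL_PATHS.items()}

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json()
        texts = payload["texts"]                      # one batch of inputs
        wanted = payload.get("models", list(MODELS))  # models this request needs
        # Batch prediction call into each requested in-memory model;
        # assumes each model's predict() returns a JSON-serializable list.
        return jsonify({name: MODELS[name].predict(texts) for name in wanted})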


We supported batch API calls in v0. But as those API calls increased, a new instance would get spun up, and its boot time was long. To get around that, we would have had to keep more instances running all the time, which obviously costs more money.
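A rough illustration of why boot time grows with the model count (paths and timing code are hypothetical; the thread doesn't show the real loading logic): a freshly autoscaled instance can't serve traffic until every hosted model is unpickled into memory, so each added model lengthens the cold-start window that warm spares would otherwise paper over.

    import glob
    import pickle
    import time

    # Hypothetical startup path: load every hosted model before serving.
    # Boot time scales with the number and size of models on disk, which
    # is why newly spun-up instances were slow to absorb traffic spikes.
    def load_all_models(pattern="models/*.pkl"):
        models = {}
        start = time.monotonic()
        for path in glob.glob(pattern):
            with open(path, "rb") as f:
                models[path] = pickle.load(f)
        print(f"loaded {len(models)} models in {time.monotonic() - start:.1f}s")
        return models

    MODELS = load_all_models()  # runs before the instance can serve requests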



