Because we were loading all the models at startup, startup time was long, which meant the server would return 5xx errors, and that created more instability. We had to do some engineering around it with a mix of config and code changes.
The bigger issue was that we had to use bigger machines as we added more custom ML models for our customers. The new architecture gives us huge cost savings and more visibility into the performance of each model.