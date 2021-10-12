Model serving is a critical component of AI use cases: it means returning an inference from an AI model in response to a user request. Anyone who has dabbled in enterprise-grade machine learning applications knows that the inference usually comes not from a single model but from hundreds or even thousands of models running in tandem. Serving them is computationally expensive, because spinning up a dedicated container every time a request needs to be served is not feasible. This is a particular challenge for developers deploying large numbers of models across Kubernetes clusters, which impose limits on the maximum number of pods and IP addresses as well as on compute resource allocation.
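The following rough, back-of-the-envelope Python sketch makes the pod-limit constraint concrete. It assumes the kubelet's default limit of 110 pods per node. It counts only that limit and ignores CPU, memory, IP addresses, and the slots taken by system pods. The function name `nodes_needed` and the option of packing several models into one pod are illustrative assumptions, not part of any particular serving framework.

```python
import math

# Assumption: the kubelet's default per-node pod limit (--max-pods) is 110.
# Real clusters may be configured with a different value.
DEFAULT_MAX_PODS_PER_NODE = 110


def nodes_needed(num_models: int, models_per_pod: int = 1,
                 max_pods_per_node: int = DEFAULT_MAX_PODS_PER_NODE) -> int:
    """Hypothetical helper: minimum number of nodes required when each pod
    hosts `models_per_pod` models, considering only the per-node pod limit
    (CPU, memory, IP addresses and system pods are ignored)."""
    if num_models < 0 or models_per_pod < 1 or max_pods_per_node < 1:
        raise ValueError("invalid sizing parameters")
    pods = math.ceil(num_models / models_per_pod)  # pods required overall
    return math.ceil(pods / max_pods_per_node)     # nodes to hold those pods


if __name__ == "__main__":
    # One model per pod versus a hypothetical 50 models packed per pod.
    print(nodes_needed(10_000))
    print(nodes_needed(10_000, models_per_pod=50))
```

Under these assumptions, 10,000 models served one per pod need at least 91 nodes for the pod count alone. Packing 50 models per pod would need only 2 nodes. This is why a one-model-per-container approach quickly runs into cluster limits.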
