In this tutorial, I am going to show how a machine learning model can be put into a production environment with TensorFlow Serving, in a scalable way and with low latency in mind.
With the increasing success of neural networks and deep learning models on tasks such as image classification, text classification, and object detection, more and more companies want to incorporate deep learning models into their product offerings and workflows.
However, building a machine learning system is challenging: such systems tend to be complex because their components are distinct and have different requirements. Training and evaluating a model require different resources than serving it in the prediction layer.
Training and validating a model can be done offline, and many predictions can also be computed offline. However, for certain use cases such as search, where the input space is highly dynamic, model serving needs to be online. Online serving also enables product features that cannot be built on offline predictions, such as personalization, where predictions are modified or results are re-ranked per customer.
Serving a model against customer requests in production is a difficult task: the model needs to respond within certain service level agreements (SLAs) while handling a high volume of requests.
Serving the model in production behind an API therefore needs to be scalable, able to sustain high request throughput, and able to respond with low latency.
TensorFlow Serving, which makes it possible to productionize TensorFlow models, is a perfect match for these requirements. It also allows software engineers to safely deploy new models and run experiments on different models while keeping the same server architecture and APIs.
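To make this concrete, here is a minimal sketch of exporting a model in the directory layout TensorFlow Serving expects. The toy Keras model, the export path /tmp/demo_model, and the model name demo_model are illustrative placeholders rather than part of this tutorial's actual setup, and the exact export call can vary slightly between TensorFlow versions.

```python
import tensorflow as tf

# Toy model standing in for a real trained model (illustrative only).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# TensorFlow Serving watches a base directory and loads SavedModels from
# numeric version subdirectories, e.g. <model_base_path>/<version>/,
# which is what lets it pick up and hot-swap new model versions.
export_path = "/tmp/demo_model/1"  # hypothetical path
tf.saved_model.save(model, export_path)  # newer Keras versions also offer model.export()

# The exported model could then be served with the tensorflow/serving Docker
# image, for example:
#   docker run -p 8501:8501 \
#     -v /tmp/demo_model:/models/demo_model \
#     -e MODEL_NAME=demo_model tensorflow/serving
```

The numeric version directory is what allows new models to be rolled out without changing the server or its API: dropping a 2/ directory next to 1/ is enough for the server to load the new version.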
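As a preview of the serving path covered in the rest of this tutorial, the sketch below queries TensorFlow Serving's REST predict endpoint. It assumes a tensorflow/serving instance is already running locally on port 8501 with a model registered as demo_model; the host, port, model name, and feature values are all illustrative.

```python
import json
import urllib.request

# Hypothetical endpoint: assumes tensorflow/serving is running locally on
# port 8501 with a model registered under the name "demo_model".
url = "http://localhost:8501/v1/models/demo_model:predict"

# A single 4-feature instance, matching the toy model exported above.
payload = json.dumps({"instances": [[0.1, 0.2, 0.3, 0.4]]}).encode("utf-8")

request = urllib.request.Request(
    url, data=payload, headers={"Content-Type": "application/json"}
)
with urllib.request.urlopen(request) as response:
    predictions = json.loads(response.read())["predictions"]

print(predictions)  # one softmax vector per instance
```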