Deep learning has made significant progress across a variety of applications such as computer vision and natural language processing. In recent years especially, large models (GPT-3, CLIP, XLM-R) have achieved strong results on a wide range of tasks. However, these large models come with high computing and storage costs, which makes it difficult to deploy them on a variety of hardware in a cost-effective manner; their size can also prevent real-time prediction in certain cases. To address this, model optimization techniques have been developed to reduce model size and computing requirements without degrading model accuracy. At Facebook, we employ techniques such as network quantization, embedding quantization, and FC pruning to shrink models and deploy them cost-effectively. In this talk, I will compare and discuss state-of-the-art model optimization techniques and their applications in large-scale recommender systems.
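To illustrate the kinds of techniques mentioned above, the sketch below applies FC pruning and post-training dynamic quantization to a toy PyTorch model. It is a minimal, generic example using standard PyTorch APIs, not the production recommender-system implementation discussed in the talk; the ToyRanker model and its sizes are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical toy ranking model with an embedding table and fully connected (FC) layers.
class ToyRanker(nn.Module):
    def __init__(self, num_ids=10_000, dim=16):
        super().__init__()
        self.emb = nn.EmbeddingBag(num_ids, dim, mode="sum")
        self.fc = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, ids, offsets):
        return self.fc(self.emb(ids, offsets))

model = ToyRanker().eval()

# FC pruning: zero out 50% of the smallest-magnitude weights in the first FC layer.
prune.l1_unstructured(model.fc[0], name="weight", amount=0.5)
prune.remove(model.fc[0], "weight")  # make the pruning permanent

# Network quantization: convert FC weights to int8 via post-training dynamic quantization.
# (Embedding tables can be quantized separately, e.g. with row-wise int8/int4 schemes.)
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Run a small batch of two examples through the optimized model.
ids = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
offsets = torch.tensor([0, 4])
print(quantized(ids, offsets).shape)  # torch.Size([2, 1])
```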