Friday November 12 16:30 – Friday November 12 17:05 in Auditorium

Model Optimization Techniques for Large Scale Recommender Systems

Bugra Akyildiz

Prior knowledge:
Previous knowledge expected
Deep Learning and model architectures

Summary

Model optimization techniques reduce model size and compute requirements without degrading model accuracy. At Facebook, we use techniques such as network quantization, embedding quantization, and fully connected (FC) layer pruning to shrink models and deploy them cost-effectively.

Outline

Description

Deep learning has made significant progress in applications such as computer vision and natural language processing. In recent years especially, larger models (GPT-3, CLIP, XLM-R) have achieved considerable success on a variety of tasks. However, these large models carry high compute and storage costs, which makes them difficult to deploy cost-effectively across different hardware. Model size can also rule out real-time prediction in certain cases. To address this, model optimization techniques are used to reduce model size and compute requirements without degrading model accuracy. At Facebook, we use techniques such as network quantization, embedding quantization, and fully connected (FC) layer pruning to shrink models and deploy them cost-effectively. In this talk, I will compare and discuss state-of-the-art model optimization techniques and their applications in large-scale recommender systems.
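
To give a rough sense of what embedding quantization can look like, here is a minimal sketch of row-wise 8-bit quantization in plain Python. It is an illustration under assumptions, not the method used at Facebook or presented in the talk: the function names, the per-row min/max scheme, and the toy table are all hypothetical.

```python
from typing import List, Tuple


def quantize_row(row: List[float]) -> Tuple[List[int], float, float]:
    """Map one embedding row to 8-bit codes plus a per-row scale and offset.

    Hypothetical helper for illustration: each value is stored as an
    integer in [0, 255], so a row needs 1 byte per element plus two floats.
    """
    lo, hi = min(row), max(row)
    # A constant row has zero range; fall back to 1.0 to avoid dividing by zero.
    scale = (hi - lo) / 255.0 or 1.0
    codes = [round((x - lo) / scale) for x in row]
    return codes, scale, lo


def dequantize_row(codes: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate float values from the 8-bit codes."""
    return [c * scale + lo for c in codes]


# Toy embedding table (made-up values), including a constant row as an edge case.
table = [[0.12, -0.53, 0.98, 0.00], [1.5, 1.5, 1.5, 1.5]]
quantized = [quantize_row(r) for r in table]
restored = [dequantize_row(*q) for q in quantized]

# The worst-case error per element is about half a quantization step.
max_err = max(
    abs(a - b) for orig, rec in zip(table, restored) for a, b in zip(orig, rec)
)
```

Storing a scale and offset per row, rather than one pair for the whole table, keeps the rounding error proportional to each row's own value range. That matters when rows differ widely in magnitude, which is an assumption about the data rather than something the abstract states.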