At Booking.com, we want to automatically translate millions of guest reviews across 43 languages. Transfer Learning, both across domains and languages, is at the heart of our solution. We will present how we used the OpenNMT-tf library, leveraged various open source datasets, and trained these models efficiently at scale. We will end with a nice demo!
At Booking.com’s Machine Translation (MT) team, we want to empower people to experience the world without language barriers. Booking.com is available in 43 languages, but the reviews our users write used to be available only in the language the reviewer chose to write them in. So that more users can understand and get value from these reviews, our ultimate goal is to translate every review into all 43 of our languages.
We will present our workflow for training our MT models with the open source package OpenNMT-tf (multi-GPU training, based on TensorFlow, with Python bindings). We will show how transfer learning with a large open source translation dataset (up to 90 million parallel sentences) helped us advance our review translation use case. Growing our dataset to this scale also introduced the challenge of keeping training time manageable, so we will dive into how to monitor GPU usage and explain some of the steps we took to reduce training time.
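To give a flavour of the workflow, here is a minimal sketch of such a training run using OpenNMT-tf's Python API (the library can equally be driven from its onmt-main command line). All paths, corpus names, and device counts are illustrative placeholders, not our production configuration.

```python
# Minimal multi-GPU Transformer training sketch with OpenNMT-tf.
# All paths and device counts below are illustrative placeholders.
import opennmt

config = {
    "model_dir": "runs/generic-en-es",
    "data": {
        # Parallel corpus: one sentence per line, source/target line-aligned.
        "train_features_file": "data/generic.train.en",
        "train_labels_file": "data/generic.train.es",
        "eval_features_file": "data/dev.en",
        "eval_labels_file": "data/dev.es",
        "source_vocabulary": "data/vocab.en",
        "target_vocabulary": "data/vocab.es",
    },
}

model = opennmt.models.TransformerBase()
runner = opennmt.Runner(model, config, auto_config=True)

# Pre-train on the large generic corpus, spreading batches over 4 GPUs.
runner.train(num_devices=4, with_eval=True)

# For transfer learning, swap the train_* entries to the in-domain review
# corpus and call train() again: training resumes from the checkpoints in
# model_dir, fine-tuning the generic model on review data.
```

Continuing training from generic checkpoints on in-domain data is one common recipe for domain transfer in NMT; the talk covers how this played out for reviews.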
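On the monitoring side, simply watching nvidia-smi during training already reveals idle devices and input-pipeline bottlenecks; for something scriptable, the NVML Python bindings can log utilization alongside training metrics. The snippet below is an illustrative sketch, not the specific tooling from the talk.

```python
# Print per-GPU utilization and memory pressure via NVML
# (pip install nvidia-ml-py). Illustrative only.
from pynvml import (
    nvmlInit,
    nvmlShutdown,
    nvmlDeviceGetCount,
    nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetUtilizationRates,
    nvmlDeviceGetMemoryInfo,
)

nvmlInit()
try:
    for i in range(nvmlDeviceGetCount()):
        handle = nvmlDeviceGetHandleByIndex(i)
        util = nvmlDeviceGetUtilizationRates(handle)  # percentages
        mem = nvmlDeviceGetMemoryInfo(handle)         # bytes
        print(f"GPU {i}: {util.gpu}% busy, "
              f"{mem.used / mem.total:.0%} memory used")
finally:
    nvmlShutdown()
```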
We would like to end with a more relaxed and fun demo, in which we'll show:
The pitfall of training on out-of-domain or narrow data: our original models, trained on property descriptions, could only translate positive text; any negative source sentence was actually translated into something positive.
The power of good, broadly trained machine translation models: we will show how translating a review from English to Spanish and then back from Spanish to English can often correct spelling mistakes in the originally submitted English review (!). A sketch of such a round trip follows below.
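For readers who want to try the round trip themselves, the sketch below chains two hypothetical pretrained OpenNMT-tf models (English→Spanish and Spanish→English); all checkpoint directories, vocabularies, and file names are placeholder assumptions, not the models from the demo.

```python
# Round-trip a review EN -> ES -> EN with two trained OpenNMT-tf models.
# All model directories, vocabularies, and file names are placeholders.
import opennmt

def make_runner(model_dir, src_vocab, tgt_vocab):
    config = {
        "model_dir": model_dir,  # assumed to contain trained checkpoints
        "data": {
            "source_vocabulary": src_vocab,
            "target_vocabulary": tgt_vocab,
        },
    }
    return opennmt.Runner(
        opennmt.models.TransformerBase(), config, auto_config=True
    )

en_es = make_runner("runs/en-es", "data/vocab.en", "data/vocab.es")
es_en = make_runner("runs/es-en", "data/vocab.es", "data/vocab.en")

# infer() reads one tokenized sentence per line and writes translations.
en_es.infer("review.en", predictions_file="review.es")
es_en.infer("review.es", predictions_file="review.roundtrip.en")
```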