Recommender systems analyze patterns of user interest, such as in articles or products, in order to provide personalized recommendations that match their preferences. These suggestions play a role in several decision-making processes, such as which items to buy or which movies to watch. Carrying this out involves several different tasks.
Shipping a product from scratch entails many challenges, from the business side all the way down to algorithms and infrastructure; even more so when it has to scale to millions of customers, in a domain as diverse as e-commerce. In this presentation, we will share how the largest e-commerce platform in LATAM developed its recommendation engine, from its birth in 2017 up to the present day.
How do you create machine learning models that use extremely unbalanced data and that need to run in production within a few milliseconds? That is a challenge we are currently addressing at iFood to identify fraud in online payments.
We all know that rain forecasts are unreliable. In this talk we will show how data science can help improve them, and along the way we will introduce some basic concepts of statistics and machine learning, as well as the problems one faces when building data-driven products.
Python's penetration in the scientific world is not only large but keeps increasing at a very fast pace, thanks to its user-friendliness and the speed with which users can develop useful programs. Python's shortcomings (mainly its slow execution and the lack of true multithreading in its most popular implementation, CPython) can usually be overcome. We present steps and exercises for doing so.
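One common way around CPython's slow loops (an illustrative sketch, not the talk's exact exercises) is to push the work into NumPy, whose vectorized operations run in compiled code:

```python
import timeit
import numpy as np

def dot_pure(a, b):
    """Dot product with a pure-Python loop."""
    return sum(x * y for x, y in zip(a, b))

a = list(range(10_000))
b = list(range(10_000))
na, nb = np.array(a), np.array(b)

assert dot_pure(a, b) == int(na @ nb)   # same result, very different speed

t_pure = timeit.timeit(lambda: dot_pure(a, b), number=100)
t_np = timeit.timeit(lambda: na @ nb, number=100)
print(f"pure Python: {t_pure:.4f}s  NumPy: {t_np:.4f}s")
```

The same idea extends to the other usual escape hatches (Cython, Numba, multiprocessing): keep the hot loop out of the interpreter.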
Optimization is a branch of mathematics with many "real-world" applications, used in very well-known algorithms such as linear regression and the training of neural networks. In this talk I will explain some basic concepts and show how optimization algorithms can be implemented in Python to invest in the stock market.
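A minimal sketch of gradient descent, the optimization workhorse behind linear regression and neural-network training; the function being minimized here is an illustrative example, not one from the talk:

```python
def grad_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient of the objective."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3); the minimum is at x = 3
x_star = grad_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_star, 4))  # → 3.0
```

In a portfolio setting the scalar `x` becomes a vector of asset weights and `grad` the gradient of a risk/return objective, but the update rule stays the same.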
Would you like to know how graph theory was applied in the final of the 2010 South Africa World Cup? Do you want to know which of your Facebook friends is the most popular? Do you want to learn how to model complex systems such as a social network, a transport system, or an electrical grid? In this talk you will learn the concepts of this theory and its applications in Python.
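A minimal sketch of modeling a social network as a graph and finding the "most popular" member by degree (number of friends); the names and friendships are made up for illustration:

```python
from collections import defaultdict

# Each pair is a friendship (an undirected edge)
friendships = [
    ("ana", "beto"), ("ana", "carla"), ("ana", "dani"),
    ("beto", "carla"),
]

# Adjacency-set representation of the graph
graph = defaultdict(set)
for a, b in friendships:
    graph[a].add(b)
    graph[b].add(a)

# The most popular person is the node with the highest degree
most_popular = max(graph, key=lambda person: len(graph[person]))
print(most_popular, len(graph[most_popular]))  # → ana 3
```

In practice a library such as NetworkX provides these structures and centrality measures ready-made; the point here is only that a graph is just nodes plus edges.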
At OLX we face the interesting challenge of connecting sellers and buyers around the world. Providing relevant search results is a key part of this, and we need smart algorithms to understand user queries and to provide personalized results. In this talk, we'll explain how we create better representations of categories, products and searches to improve recall, and how we use machine learning to generate accurate yet personalized search results.
PyMC3 is a Python module for probabilistic programming. ArviZ is a Python package for exploratory data analysis of Bayesian models. I will show how to build, solve and check models in PyMC3 with the help of ArviZ.
In this talk we will show how we leverage data from different sources, depending on its characteristics, using several approaches and algorithms in Python to optimize automatic real-time decision making at the company. First, we will introduce the problem we have to solve. Then we will cover the different data sources, in order to later explain four different machine learning systems that take advantage of them: an online estimator, user segmentation systems, an ensemble, and a recommender system. Finally, we will showcase how we use these systems for decision making.
This talk is aimed at lead roles of data science projects, like technical leaders or project managers. We will present tools and tricks to deal with the peculiarities of involving Data Science in a regular software project: best engineering practices, tricks for doing Agile, metrics, the use of experiments, common pitfalls, communication with stakeholders.
This talk will walk step by step through a real image-classification project: data collection, processing, and the use of different Machine Learning and Deep Learning techniques to classify restaurant photos submitted by users to the Yelp! platform.
This talk will give an overview of the visual search system we built for the Hayneedle furniture catalog, which utilizes Kafka, Nomad, FAISS and TensorFlow. A visual search system based on product images enables a better experience in categories where visual appeal is very important and/or product attributes are not as useful.
How and why does reinforcement learning work, and what are its limitations? This workshop will give participants deep insights into reinforcement learning through intuitive examples. After exploring some of the theoretical background, you will apply the theory as you learn to code the famous Q-learning algorithm. At the end of the session, you will even get a glimpse of its Deep Learning counterpart, the legendary DQN algorithm.
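A minimal sketch of the tabular Q-learning algorithm the workshop covers, on a toy corridor environment of our own (five states in a row, actions left/right, reward 1 for reaching the rightmost state):

```python
import random

N_STATES = 5
LEFT, RIGHT = 0, 1
ALPHA, GAMMA = 0.5, 0.9          # learning rate and discount factor

Q = {(s, a): 0.0 for s in range(N_STATES) for a in (LEFT, RIGHT)}

def step(state, action):
    """Move left or right; episode ends with reward 1 at the last state."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

random.seed(0)
for _ in range(500):                         # training episodes
    s, done, steps = 0, False, 0
    while not done and steps < 100:
        # Random behavior policy: Q-learning is off-policy, so it still
        # learns the optimal greedy policy from random exploration.
        a = random.choice((LEFT, RIGHT))
        s2, r, done = step(s, a)
        # The Q-learning update:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = r + (0.0 if done else GAMMA * max(Q[(s2, LEFT)], Q[(s2, RIGHT)]))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s, steps = s2, steps + 1

# The learned greedy policy is "go right" in every non-terminal state
policy = [max((LEFT, RIGHT), key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # → [1, 1, 1, 1]
```

DQN replaces the table `Q` with a neural network, but the update target is the same.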
In recent years, supervised learning with Convolutional Neural Networks (CNNs) has seen significantly increased adoption in a wide variety of classification and regression tasks, while unsupervised learning with CNNs has received less attention. In this work, we present a special CNN architecture called the Deep Convolutional Generative Adversarial Network (DCGAN) that was formulated to perform unsupervised learning tasks. After describing the model, we will show its performance by training it on two datasets, and finally we will introduce a special use case of a Generative Adversarial Network (GAN) to address the problem of Domain Shift Adaptation for Semantic Segmentation.
In the last few years, word embeddings have become ubiquitous in NLP problems across the board. But more than word-level information can be transferred, and we have seen great successes with models like ULMFit and the OpenAI Transformer pushing the state of the art on many classic datasets. How do these models work, and how can you use them for your everyday NLP tasks?
This talk aims to give a brief introduction to Pandas, showing some of its most notable features, as well as an introduction to Dask, explaining when it is best to use each package and walking through an example that integrates both.
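An illustrative example (our own, not the talk's) of the shared API: a groupby aggregation in Pandas, and the same computation expressed with Dask, which is worth reaching for when the data no longer fits in memory:

```python
import pandas as pd

df = pd.DataFrame({"city": ["BA", "BA", "SP", "SP", "SP"],
                   "sales": [10, 20, 30, 40, 50]})
totals = df.groupby("city")["sales"].sum()
print(totals)   # BA → 30, SP → 120

# The Dask version mirrors Pandas but builds a lazy task graph that is
# only executed on .compute() (run this part with dask installed):
# import dask.dataframe as dd
# ddf = dd.from_pandas(df, npartitions=2)
# print(ddf.groupby("city")["sales"].sum().compute())
```

The rule of thumb: if the data fits comfortably in RAM, plain Pandas is simpler and faster; Dask pays off when it doesn't, or when you want to use several cores or machines.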
Spark-NLP is a Natural Language Understanding library that runs in a distributed fashion on top of Apache Spark. It integrates with Spark Pipelines and provides state-of-the-art models for parsing, sentence detection, named entity recognition, and spell checking, just to mention a few. It is competitive with other libraries in single-node runtime performance, and excels in cluster environments.
Data aggregation is perhaps the most common "big data" application there is, yet it is hard to find quality tools to actually do it. The problem is simple: you've got a stream of records, each with a set of features, and you want to build a summarized view of all that data that's easier for reporting. We'll discuss a custom solution that manages to blend the best of both big and small data.
Image quality analysis has traditionally been tied to the film industry and computer vision. With the rise of e-commerce and the development of more sophisticated marketing, many of these techniques have been rediscovered thanks to new insights into their impact on sales. In this talk we will present some classic algorithms and problems, as well as new approaches and challenges.
The objective of this talk is to explain some strategies for processing dataframes when the data starts to grow: 1) vertical scaling: Pandas plus lots of RAM and CPU; 2) controlling memory usage: Pandas + HDF5; 3) using all your CPU power: Python multiprocessing; 4) distributed Pandas: working with Dask; 5) using Map/Reduce: Hadoop and Spark; 6) going serverless: Redshift UNLOAD + Parquet + Lambda.
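A minimal sketch of the "controlling memory usage" strategy: stream the file in chunks and fold each chunk into a running aggregate instead of loading everything at once. The CSV here is a small in-memory stand-in for a file too large to fit in RAM:

```python
import io
import pandas as pd

# Illustrative data: 1000 rows of (user, amount) pairs
csv = io.StringIO("user,amount\n" +
                  "\n".join(f"u{i % 3},{i}" for i in range(1000)))

totals = None
# Only 100 rows are materialized in memory at any moment
for chunk in pd.read_csv(csv, chunksize=100):
    part = chunk.groupby("user")["amount"].sum()
    totals = part if totals is None else totals.add(part, fill_value=0)

print(totals.sort_index())
```

The same chunk-then-combine shape underlies the later strategies too: multiprocessing runs the per-chunk step on several cores, and Dask or Map/Reduce run it on several machines.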