Monday 1:10 PM–1:55 PM in Music Box 5411/Winter Garden 5412 (5th fl)

What am I even looking at?

Nick Acosta

Audience level:
Novice

Description

This talk will focus on classifying programming languages from the text of their source code, and will include introductions to the Watson and GitHub APIs as well as machine learning techniques for text classification.

Abstract

The rise of Python as the go-to programming language for data scientists has made the field one of the more monoglot developer communities, to the point where a data scientist may ask "What programming language am I even looking at?" when sent code from another developer. Luckily, machine learning models can be built to detect the programming language of a piece of code. Such classification models could also improve code compilation and even pave the way toward automatic code generation. This talk will go over a few such models, including Naive Bayes and neural networks for binary and multi-class classification. It will compare the pros and cons of each method, show some of the state-of-the-art work in the area, and show how these models can be easily augmented with the help of APIs from IBM Watson.
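As a rough illustration of the Naive Bayes approach mentioned above, here is a minimal sketch of a multinomial Naive Bayes language detector using only the Python standard library. It is not the speaker's implementation: the training snippets, the tokenizer, and the function names (`tokenize`, `train`, `predict`) are all assumptions made for the example, and a usable model would need far more training data per language.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training data, invented for this sketch. A real model
# would be trained on many files per language.
TRAINING_SNIPPETS = [
    ("python", "def add(a, b):\n    return a + b\nprint(add(1, 2))"),
    ("python", "import os\nfor name in os.listdir('.'):\n    print(name)"),
    ("javascript", "function add(a, b) {\n  return a + b;\n}\nconsole.log(add(1, 2));"),
    ("javascript", "const names = ['a', 'b'];\nnames.forEach(function (n) { console.log(n); });"),
]

# Words (keywords, identifiers) or single punctuation characters; digits are ignored.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]+|[^\sA-Za-z_0-9]")


def tokenize(code):
    """Split source text into word tokens and single punctuation marks."""
    return TOKEN_PATTERN.findall(code)


def train(samples):
    """Count tokens per language and the number of samples per language."""
    token_counts = defaultdict(Counter)
    class_counts = Counter()
    for label, code in samples:
        class_counts[label] += 1
        token_counts[label].update(tokenize(code))
    vocabulary = {tok for counts in token_counts.values() for tok in counts}
    return token_counts, class_counts, vocabulary


def predict(code, token_counts, class_counts, vocabulary):
    """Return the language with the highest log posterior for the given code."""
    total_samples = sum(class_counts.values())
    tokens = [tok for tok in tokenize(code) if tok in vocabulary]  # skip unseen tokens
    scores = {}
    for label, n_samples in class_counts.items():
        # Log prior: fraction of training samples written in this language.
        log_prob = math.log(n_samples / total_samples)
        total_tokens = sum(token_counts[label].values())
        for tok in tokens:
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = token_counts[label][tok]
            log_prob += math.log((count + 1) / (total_tokens + len(vocabulary)))
        scores[label] = log_prob
    return max(scores, key=scores.get)


model = train(TRAINING_SNIPPETS)
print(predict("def square(x):\n    return x * x", *model))
print(predict("function square(x) { return x * x; }", *model))
```

Tokens such as `def`, `function`, `{`, and `;` carry most of the signal here, which is why even a bag-of-tokens model can separate languages with distinctive syntax. Closely related languages would likely need richer features or one of the other models the talk covers.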
