Saturday 10:00–10:35 in Megatorium

Looking at Sound: computer vision techniques in audio classification

Jurjen Feitsma, Wouter de Winter

Audience level:
Intermediate

Description

Using CNNs and spectrograms to classify sound: how does this work, and how does it differ from image classification? We will then zoom in on classifying noise pollution in a (smart) city environment and detecting emotions in a customer contact center. Intended for people with some basic understanding of deep learning who are new to the audio domain.

Abstract

A few years ago, modern deep learning techniques such as CNNs took the field of computer vision by storm. But did you know that these techniques work surprisingly well on audio too? You will learn how to represent raw audio samples as an image and how to build a model to classify them.

Raw audio is essentially a one-dimensional signal. Looking at the waveform, it's often erratic, moving up and down thousands of times per second, and pretty hard to analyse as is. But after applying an FFT over short, overlapping windows (a short-time Fourier transform), the signal becomes two-dimensional: time on one axis, frequency on the other. Just like an image. We will cover some options and tradeoffs in applying this transformation.
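As a minimal sketch of this transformation (using the librosa library; the file name and the n_fft, hop_length, and n_mels settings are illustrative assumptions, not values prescribed by the talk), a raw clip can be turned into a log-mel spectrogram like this:

```python
import numpy as np
import librosa

# Load a short audio clip (file name is a placeholder); librosa resamples
# to 22,050 Hz by default and returns a 1-D float array of samples.
y, sr = librosa.load("siren.wav", sr=22050)

# Short-time Fourier transform under the hood: an FFT is applied to
# short, overlapping windows. n_fft sets the frequency resolution,
# hop_length the time resolution, n_mels the number of frequency bands.
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=128
)

# Convert power to decibels so quiet and loud parts are both visible.
mel_db = librosa.power_to_db(mel, ref=np.max)

print(mel_db.shape)  # (n_mels, n_frames): a 2-D "image" of the sound
```

The tradeoff mentioned above shows up directly in these parameters: a larger window gives finer frequency resolution but blurs timing, and vice versa.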

For modelling, standard convolutional networks perform well. But as we're dealing with audio, some things are a bit different. We will review some network architectures, augmentation techniques, and some more advanced concepts that are specific to the audio domain.
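To make this concrete, here is a hedged PyTorch sketch, not the presenters' actual model: a tiny CNN that treats a single-channel spectrogram as an image, plus a SpecAugment-style augmentation (one named audio-specific technique) that masks a random frequency band and time span during training:

```python
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    """Minimal CNN over a (1, n_mels, n_frames) spectrogram; illustrative only."""
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # pool away the time/frequency dimensions
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def spec_augment(spec, max_freq=8, max_time=16):
    """SpecAugment-style masking: zero a random frequency band and time span."""
    spec = spec.clone()
    f0 = torch.randint(0, spec.size(-2) - max_freq, ()).item()
    t0 = torch.randint(0, spec.size(-1) - max_time, ()).item()
    spec[..., f0:f0 + max_freq, :] = 0.0
    spec[..., :, t0:t0 + max_time] = 0.0
    return spec

batch = torch.randn(4, 1, 128, 256)  # a batch of 4 log-mel spectrograms
logits = SpectrogramCNN()(spec_augment(batch))
print(logits.shape)  # torch.Size([4, 10])
```

Note one audio-specific difference from image work: masking whole frequency bands or time spans respects the axes' distinct meanings, whereas generic image augmentations like vertical flips would scramble the frequency axis.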

Audio analysis becomes even more interesting when combined with other information sources such as text or video. The audio stream can contain hidden information that is not visible in an image or captured by words.
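One common pattern for such combinations, sketched here with hypothetical names and assumed embedding sizes rather than anything prescribed by the talk, is late fusion: encode each modality separately, then concatenate the embeddings and classify jointly:

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Hypothetical late-fusion head over precomputed audio and text embeddings."""
    def __init__(self, audio_dim: int = 32, text_dim: int = 64, n_classes: int = 5):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(audio_dim + text_dim, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, audio_emb, text_emb):
        # Concatenate along the feature dimension, then classify.
        return self.head(torch.cat([audio_emb, text_emb], dim=-1))

fused = LateFusion()(torch.randn(4, 32), torch.randn(4, 64))
print(fused.shape)  # torch.Size([4, 5])
```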

We will also look at two real-world applications of audio classification: classifying noise pollution in a (smart) city environment and detecting emotions in a customer contact center.

Audience level

This talk is intended for people with some basic understanding of deep learning who are new to the audio domain.
