Tensor networks have been used in physics to find efficient representations of many-body quantum systems, describing everything from materials to holographic spacetime. As it happens, one can also use tensor networks in machine learning. Exploring the correspondence between these two uses gives us intuition about inductive bias, the set of assumptions that determines our choice of machine learning model.
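To make 'efficient representation' concrete, here is a minimal sketch (my own illustration, not taken from anything above, with arbitrary choices of site count, local dimension, and variable names): it factors a small many-body state vector into a matrix product state, one of the simplest tensor networks, by splitting off one site at a time with an SVD.

```python
import numpy as np

n_sites, d = 4, 2                        # 4 spins, local (physical) dimension 2
psi = np.random.randn(d ** n_sites)      # a generic many-body state vector
psi /= np.linalg.norm(psi)

# Sweep left to right, splitting off one site at a time with an SVD.
tensors, rest, bond = [], psi, 1
for _ in range(n_sites - 1):
    rest = rest.reshape(bond * d, -1)
    u, s, vh = np.linalg.svd(rest, full_matrices=False)
    tensors.append(u.reshape(bond, d, -1))   # one MPS tensor per site
    rest = np.diag(s) @ vh
    bond = u.shape[1]
tensors.append(rest.reshape(bond, d, 1))     # last site

# Contract the chain back together and check we recover the original state.
out = tensors[0]
for t in tensors[1:]:
    out = np.tensordot(out, t, axes=([-1], [0]))
print(np.allclose(out.reshape(-1), psi))     # True: the network stores psi exactly
```

With no truncation the factorization is exact; the efficiency comes from discarding small singular values, which compresses the state while keeping the structure that matters.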
Why do convolutional networks work well for images? What happens in a neural network when it 'learns'? What is machine learning, actually? These are the kinds of questions we should all be wondering about if we use machine learning, and especially deep neural networks, on a daily basis. The field of deep learning is developing rapidly, with new architectures being invented to tackle ever more challenging problems, and this zoo of neural networks needs a taxonomy.
One way to bring order to the chaos is by using a physicist's intuition. Bridges are being built that formalize the link between well-developed fields in physics and neural networks. These bridges let us view the task of extracting information that is relevant at the macroscopic scale both as a machine learning problem and as a question the physics community has studied for a long time: why do natural systems look so different at different length scales?
The notion of filtering out noise whilst amplifying relevant macroscopic features has a rich history in many-body physics, where a central challenge is to go from a microscopic description of a material to an accurate description of its effective behavior at macroscopic scales. This is done via what is called the renormalization group, which helps you identify 'irrelevant' interactions (those that matter on the very small scale but tend to get washed out at larger scales) and 'relevant' interactions (those that get amplified at larger scales). If you call the irrelevant interactions 'noise' and the relevant interactions 'signal', the similarity with separating noise from signal (your features) in a machine learning model becomes apparent.
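As a toy illustration of that analogy (my own sketch, with arbitrary choices of signal, noise level, and block size), the coarse-graining step below averages neighbouring blocks of a noisy 1D field: the small-scale fluctuations ('irrelevant' detail) are washed out while the slowly varying component ('relevant' structure) survives, much like a pooling layer keeping macroscopic features.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 1024)
signal = np.sin(x)                          # large-scale, 'relevant' structure
noise = 0.5 * np.random.randn(x.size)       # small-scale, 'irrelevant' detail
field = signal + noise

def coarse_grain(values, block=4):
    """Average neighbouring blocks: one renormalization-like step."""
    return values.reshape(-1, block).mean(axis=1)

coarse = coarse_grain(field)                # 256 effective degrees of freedom
target = coarse_grain(signal)
print(np.mean((coarse - target) ** 2))      # residual noise drops by ~1/block
```

Iterating such a step is the spirit of the renormalization group: at each scale you keep only the degrees of freedom that still matter for the description one level up.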
It turns out that this is no coincidence: these are closely related concepts, and a new understanding can be developed around which neural network architectures are suitable for which kinds of data.