With lower barriers to entry and the rise of streaming, the volume of new songs and artists is growing rapidly. Given these trends in the music industry, solving the problems below would improve the efficiency with which we process millions of observations, and therefore minimise delays in the allocation of royalties.
Here are a few challenging questions that I have encountered at PPL:
Rolling up music and musicians: for instance, Beyonce, Bey, Queen Bee, Beyonce feat. Jay-Z and Beyoncé feat jayz should all ideally roll up to the parent entity Beyoncé. Much of this comes down to normalising the strings (case, accents, featured-artist credits). An additional layer, with large potential for improving predictive power, is introducing aliases and using PPL's own data as well as external third-party data to identify potential aliases.
Identifying turning-point collaborations: which partnerships among musicians are most likely to change their careers significantly? This involves two questions. First, are music collaborations (including band memberships) changing in nature? Second, how can we identify the value added by collaborators, i.e. the effect of particular collaborations on an artist's profile, recognition and success?
Identifying outliers at the artist level: the distributions of many variables of interest (band memberships, active years, etc.) are highly heterogeneous across artists and contain a few suspicious outliers that are hard to spot from aggregate distributions.
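For the first problem, rolling up artist names, a minimal sketch of the two layers (string normalisation, then alias lookup) might look like the following. The `ALIASES` table is a hypothetical, hand-written stand-in for one built from PPL's and third-party data, and the featured-artist pattern only covers a few common spellings ("feat.", "ft.", "featuring").

```python
import re
import unicodedata

# Hypothetical alias table mapping a normalised name to its parent entity.
# In practice this would be derived from PPL's and third-party data.
ALIASES = {
    "beyonce": "Beyoncé",
    "bey": "Beyoncé",
    "queen bee": "Beyoncé",
}

# Matches a trailing featured-artist clause such as " feat. Jay-Z" or " ft jayz".
FEAT_PATTERN = re.compile(r"\s+(?:feat\.?|ft\.?|featuring)\s+.*$")


def normalise(name: str) -> str:
    """Lower-case, strip accents and featured artists, and collapse whitespace."""
    # Decompose accented characters (é -> e + combining accent), then drop the accents.
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    lowered = unaccented.lower().strip()
    # Keep only the primary artist by removing any featured-artist clause.
    primary = FEAT_PATTERN.sub("", lowered)
    return re.sub(r"\s+", " ", primary).strip()


def parent_artist(name: str) -> str:
    """Normalise a raw credit, then map it to its parent via the alias table."""
    key = normalise(name)
    return ALIASES.get(key, key)  # unknown names fall back to their normalised form


for raw in ["Beyonce", "Bey", "Queen Bee", "Beyonce feat. Jay-Z", "Beyoncé feat jayz"]:
    print(f"{raw!r} -> {parent_artist(raw)!r}")
```

Normalising before the lookup means the alias table only needs one entry per alias, rather than one per spelling variant of each alias.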
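For turning-point collaborations, one naive way to start quantifying the value added by a collaborator is a before/after comparison of a success metric around the collaboration's release. The sketch below assumes a hypothetical monthly play-count series per artist and a hypothetical `Collaboration` record. The ratio it computes is descriptive only: an artist's trajectory might have changed anyway, so a causal estimate would need a comparison group (for example, a difference-in-differences design against similar artists without the collaboration).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Collaboration:
    """Hypothetical record: `artist` released a track with `collaborator` in month `month`."""
    artist: str
    collaborator: str
    month: int  # index into the artist's monthly play-count series


def uplift(plays: list[float], month: int, window: int = 6) -> Optional[float]:
    """Mean plays in the `window` months after `month` divided by the mean in the
    `window` months before it. The release month itself is excluded from both.
    Returns None if either window is incomplete or the baseline is zero."""
    if window < 1 or month - window < 0:
        return None
    before = plays[month - window:month]
    after = plays[month + 1:month + 1 + window]
    if len(before) < window or len(after) < window or mean(before) == 0:
        return None
    return mean(after) / mean(before)


def rank_collaborations(
    collabs: list[Collaboration],
    plays_by_artist: dict[str, list[float]],
    window: int = 6,
) -> list[tuple[str, str, float]]:
    """Sort (collaborator, artist, uplift) by descending uplift, skipping unscorable ones."""
    scored = []
    for c in collabs:
        score = uplift(plays_by_artist[c.artist], c.month, window)
        if score is not None:
            scored.append((c.collaborator, c.artist, score))
    return sorted(scored, key=lambda t: t[2], reverse=True)
```

A simple extension toward the first question (whether collaborations are changing in nature) would be to track the same records over time, for example the share of releases per year that involve a collaborator.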
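For artist-level outliers, the mean and standard deviation are themselves distorted by the outliers, so a more robust option is the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD). This sketch assumes hypothetical per-artist records with fields such as `band_memberships` and `active_years`; the 3.5 cut-off is a commonly used rule of thumb, not a PPL-specific threshold. When more than half of the artists share the same value (e.g. a single band membership), the MAD is zero, so the code falls back to the mean absolute deviation.

```python
from statistics import mean, median

# Factors that put each deviation measure on a standard-deviation scale
# when the data are roughly normal (1.4826 = 1 / 0.6745).
MAD_TO_SD = 1.4826
MEAN_AD_TO_SD = 1.2533


def robust_z_scores(values: list[float]) -> list[float]:
    """Modified z-scores: distance from the median in robust SD units."""
    med = median(values)
    deviations = [abs(v - med) for v in values]
    spread = MAD_TO_SD * median(deviations)
    if spread == 0:
        # MAD is zero when most values are identical: use the mean absolute deviation.
        spread = MEAN_AD_TO_SD * mean(deviations)
    if spread == 0:
        return [0.0] * len(values)  # all values identical: nothing to flag
    return [(v - med) / spread for v in values]


def flag_outliers(records: list[dict], field: str, threshold: float = 3.5) -> list[str]:
    """Return the artists whose value of `field` has |modified z-score| > threshold."""
    scores = robust_z_scores([r[field] for r in records])
    return [r["artist"] for r, z in zip(records, scores) if abs(z) > threshold]
```

Running this per variable, and ideally within comparable groups of artists, flags individual records for review rather than relying on the aggregate distribution.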