We will present a case study of portfolio allocation on a peer-to-peer (P2P) platform. In the context of developing a heuristic optimization algorithm, we will focus on the data-manipulation bottlenecks we hit in pandas and on how we resolved them with pure NumPy, compilation with Numba, and embarrassingly parallel loops, explaining how each of these works under the hood.
Large-scale optimization problems that require a bespoke implementation are common in business. We will walk through the entire development process of a portfolio allocation algorithm, from prototyping to profiling and speed optimization, using the PyData stack. Along the way, we will examine common bottlenecks in scientific computing and discuss implementation strategies for cases where pandas is not enough: computation in pure NumPy, just-in-time compilation via Numba, and parallelization with joblib, along with how each works behind the scenes.
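As a minimal sketch of the kind of rewrite involved (assuming a toy, randomly generated dataset rather than the platform's data), consider summing the amount allocated to each investor. In pandas this is typically written as `df.groupby("investor_id")["amount"].sum()`. In pure NumPy, if investor IDs are integer-coded as `0..n-1`, `np.bincount` with `weights` performs the same aggregation in a single compiled pass. The plain Python loop is included only as a baseline for checking equivalence; the function names `totals_loop` and `totals_numpy` are hypothetical.

```python
import numpy as np

# Hypothetical toy data: each row is one allocation of an asset to an investor.
rng = np.random.default_rng(0)
n_rows, n_investors = 100_000, 1_000
investor_id = rng.integers(0, n_investors, size=n_rows)  # integer-coded investor IDs
amount = rng.uniform(10.0, 500.0, size=n_rows)           # amount allocated per row


def totals_loop(ids, amounts, n):
    """Baseline: a plain Python loop, interpreted one element at a time."""
    out = [0.0] * n
    for i, a in zip(ids, amounts):
        out[i] += a
    return np.array(out)


def totals_numpy(ids, amounts, n):
    """Pure NumPy: sum `amounts` per integer ID in one vectorized call."""
    # minlength guarantees one slot per investor, even for investors with no rows.
    return np.bincount(ids, weights=amounts, minlength=n)


if __name__ == "__main__":
    assert np.allclose(
        totals_loop(investor_id, amount, n_investors),
        totals_numpy(investor_id, amount, n_investors),
    )
```

The NumPy version avoids both the Python-level loop and the overhead of building a pandas index, which is the general pattern behind "dropping down" to NumPy when pandas becomes the bottleneck.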
We will present a case study of portfolio optimization over a pool of millions of assets and thousands of investors. Our problem is closely related to computer science problems known to be NP-hard. We will discuss why, at our scale, exact solvers proved prohibitively slow and how we consequently designed a custom algorithm based on heuristic optimization.
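To illustrate what trading exactness for speed can look like, here is a simple greedy rule. It is a hypothetical example, not the algorithm from the talk, and it assumes that each asset has a size, each investor has a budget, and each asset goes whole to a single investor. Assets are sorted largest first, and each one is given to the investor with the most remaining budget; an asset that this investor cannot absorb fits nowhere, so it is skipped. The cost is one sort plus one heap operation per asset, but there is no optimality guarantee.

```python
import heapq


def greedy_allocate(asset_sizes, budgets):
    """Hypothetical greedy heuristic: largest asset -> investor with most budget left.

    Returns {asset_index: investor_index}; assets that fit nowhere are unassigned.
    """
    # heapq is a min-heap, so store negated budgets to pop the largest first.
    heap = [(-b, j) for j, b in enumerate(budgets)]
    heapq.heapify(heap)

    # Place large assets first, while capacity is still plentiful.
    order = sorted(range(len(asset_sizes)), key=lambda i: asset_sizes[i], reverse=True)

    assignment = {}
    for i in order:
        neg_remaining, j = heap[0]
        if -neg_remaining >= asset_sizes[i]:
            # Pop the top investor and push it back with the reduced budget.
            heapq.heapreplace(heap, (neg_remaining + asset_sizes[i], j))
            assignment[i] = j
        # Otherwise the richest investor cannot fit the asset, so no one can.
    return assignment
```

For example, `greedy_allocate([50, 30, 20, 80], [100, 60])` places the asset of size 80 and the asset of size 20 with investor 0 and the asset of size 50 with investor 1, leaving the asset of size 30 unassigned.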