This talk shows how we can reduce the risk of failure in software development by using machine learning. Using example code and artifacts from Apache projects, together with scikit-learn in a Jupyter notebook, we show how to identify areas of risk in large codelines. We also present some surprising statistical results on the distribution of risk in code, which suggest we may be "doing it wrong".
Software development is in a historic transition from a stage of organized craftsmanship to a stage of industrial production of software. As part of this transition, machines are taking over more and more of the repetitive work, freeing humans for creative work and decision making. However, we lag in our ability to extract risk signals from the development artifacts of large projects, and we are not getting better at reducing project failures. Applying machine learning to the development process can improve these outcomes.
This talk shows how change history and issue-tracking data can be correlated to identify where risk is concentrated in a project, as in the sketch below. At the same time, statistical results point to areas where conventional methods may be wasting effort, and to areas where they may be overlooking risk.
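As a rough illustration of this correlation, here is a minimal Python sketch, not the talk's actual notebook: it joins per-file change counts from version control with bug-fix counts from the issue tracker and checks how concentrated the bug fixes are. The CSV file names and columns are hypothetical placeholders for whatever export your repository and tracker produce.

```python
# Minimal sketch: join change history with issue-tracking data to find
# risk hot spots. Assumes two hypothetical CSV exports:
#   commits.csv -- one row per (file, commit_id), e.g. derived from `git log`
#   fixes.csv   -- one row per (file, issue_id) for bug-fix issues
import pandas as pd

commits = pd.read_csv("commits.csv")
fixes = pd.read_csv("fixes.csv")

churn = commits.groupby("file").size().rename("changes")
defects = fixes.groupby("file").size().rename("bug_fixes")

risk = pd.concat([churn, defects], axis=1).fillna(0)
risk["fix_rate"] = risk["bug_fixes"] / risk["changes"].clip(lower=1)

# Rank files by bug-fix count and measure concentration: a heavy tail
# here is the kind of skewed risk distribution the talk highlights.
top = risk.sort_values("bug_fixes", ascending=False)
share = top["bug_fixes"].cumsum() / top["bug_fixes"].sum()
print(top.head(10))
print("files holding 80% of bug fixes:", (share <= 0.8).sum(), "of", len(top))
```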
The demo uses example code and artifacts from Apache projects, along with scikit-learn in a Jupyter notebook.
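For a flavor of the modeling step, here is a hedged scikit-learn sketch that trains a classifier to flag risky files from simple process metrics. The feature names and the toy data are illustrative assumptions, not the Apache project data used in the talk.

```python
# Sketch: predict whether a file attracts bug fixes from process metrics.
# The synthetic data below stands in for real per-file measurements.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
# Hypothetical per-file features: change count, distinct authors, age (days).
X = np.column_stack([
    rng.poisson(20, n),
    rng.integers(1, 10, n),
    rng.integers(30, 2000, n),
])
# Toy label: files with many changes by many authors tend to be risky.
y = (X[:, 0] * X[:, 1] + rng.normal(0, 40, n) > 120).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print("cross-validated AUC:", scores.mean().round(3))
```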