Safety in aviation demands constant vigilance and effort. Using a database from the NTSB, this talk shows how different Python packages (Pandas, scikit-learn) can be used to answer several questions: Is commercial air transport safer now than 30 years ago? Which flight phase is the safest? What are the main causes of accidents?
Python has become a very useful tool for data science. This talk is aimed at anyone interested in data analysis, statistics or machine learning in Python, with the added incentive of working with real data from the aviation authorities.
A typical data analysis workflow will be followed, starting with data inspection and cleaning. Information from the National Transportation Safety Board (NTSB) will be analyzed using the Pandas library to read, clean and manipulate data. This example is particularly well suited to highlighting the main characteristics of the DataFrame object and showing how data can be accessed, filtered, classified and plotted. A variety of graphs, ranging from classical line and scatter plots to pie charts and representations on maps, will be shown. Moreover, we will try to use some of the information to derive useful models of accident trends using regression and clustering algorithms from scikit-learn.
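To give a flavour of that workflow, here is a minimal sketch of the read, clean, group and fit steps. It is only an illustration under stated assumptions: the records are made up, the column names (`event_date`, `flight_phase`, `fatalities`) are hypothetical and do not reflect the real NTSB schema, and the helper functions are invented for this example. Treating missing fatality counts as zero is a simplification, not a recommendation.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up records standing in for an NTSB export.
# Column names are illustrative assumptions, not the real schema.
RAW_CSV = """event_date,flight_phase,fatalities
1990-03-14,TAKEOFF,2
1990-07-02,LANDING,0
1990-11-20,CRUISE,
1991-01-09,LANDING,1
1991-06-30,landing,0
1992-05-17,TAKEOFF,0
1992-08-08, APPROACH,3
1993-02-25,LANDING,0
"""


def load_and_clean(csv_text):
    """Read the CSV text and normalise the columns used below."""
    df = pd.read_csv(io.StringIO(csv_text), parse_dates=["event_date"])
    # Harmonise labels such as "landing" and " APPROACH".
    df["flight_phase"] = df["flight_phase"].str.strip().str.upper()
    # Simplification for the sketch: a missing count is treated as zero.
    df["fatalities"] = df["fatalities"].fillna(0).astype(int)
    df["year"] = df["event_date"].dt.year
    return df


def accidents_per_year(df):
    """Count the records in each calendar year."""
    return df.groupby("year").size().rename("accidents")


def fit_trend(per_year):
    """Fit a straight line to the yearly accident counts."""
    X = per_year.index.to_numpy().reshape(-1, 1)  # years as a 2-D feature array
    y = per_year.to_numpy()
    return LinearRegression().fit(X, y)


accidents = load_and_clean(RAW_CSV)
per_year = accidents_per_year(accidents)
by_phase = accidents["flight_phase"].value_counts()
trend = fit_trend(per_year)

print(per_year)
print(by_phase)
print("Change in accidents per year:", trend.coef_[0])
```

A negative slope would suggest a downward trend in the yearly counts, although with real data one would normalise by traffic volume before drawing conclusions. With matplotlib installed, `per_year.plot()` or `by_phase.plot(kind="pie")` would produce the kinds of charts mentioned above.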