Estimating wifi point locations by fitting a Bayesian Mixture Model to mobile phone sensor data, then visualising the results and comparing them against the known truth.
Mixture Models are a form of unsupervised learning that can be used to obtain 'fuzzy' clustering, i.e. the probability of each observation belonging to each cluster. Probabilistic programming enables us to build a model on small data and estimate many unknown parameters (optimal number of clusters, cluster probabilities, cluster centres/means and standard deviations...) whilst passing the uncertainty along into the final parameter estimates. For example, we'll get a range of likely values for a cluster mean, not just a point estimate.
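To make 'fuzzy' clustering concrete, here is a minimal sketch in plain Python of how membership probabilities follow from a mixture once its parameters are fixed. It is illustrative only and not the model built later: the one-dimensional data, the two clusters, and all names and values (`membership_probabilities`, `weights`, `means`, `sds`) are assumptions made up for this example. In the Bayesian setting these parameters are themselves uncertain and are estimated from the data.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd) distribution evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def membership_probabilities(x, weights, means, sds):
    """Probability that x belongs to each cluster (Bayes' rule)."""
    # Cluster weight times the likelihood of x under that cluster.
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    # Normalise so the probabilities across clusters sum to 1.
    return [j / total for j in joint]

# Hypothetical two-cluster mixture (values chosen purely for illustration).
weights = [0.5, 0.5]
means = [0.0, 4.0]
sds = [1.0, 1.0]

for x in (0.0, 2.0, 3.5):
    probs = membership_probabilities(x, weights, means, sds)
    print(x, [round(p, 3) for p in probs])
```

A point midway between the two centres is split evenly between the clusters, while a point sitting on one centre is assigned to it almost entirely. A hard clustering would throw away that distinction.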
As the wifi point id is in fact known, we effectively have information about the true clusters to test against. Also, our 2-dimensional longitude-latitude data lends itself well to common-sense checks. We'll compare the results of 3 popular inference approaches: Markov Chain Monte Carlo (MCMC) simulation, Variational Bayes and maximum a posteriori (MAP) estimation.
We'll use PyMC3 for modelling and Folium for visualisation. This is aimed at anyone beyond absolute beginner level in Python; only basic knowledge of statistics is required.