Monday 3:40 PM–4:20 PM in Track 2

Find the Farm (Data Science Insights into Real Estate Pricing)

en zyme

Audience level:


Real estate transactions are geographically and temporally sparse. Pricing models traditionally rely only on physical parameters, omitting the effects of the realtor, whether listing or selling. Realtor farms, found by cluster identification, are analyzed for negotiation strength in listing vs. sales prices.


Using gmplot, geopy, and Python data science tools, we’ll discover realtor farms and assess the characteristics of sales vs. listing price. Real estate transactions tend to be geographically sparse and temporally rare. There is often both a listing agent and a selling agent representing a given property. The sales price is determined by a number of factors. While there has been considerable interest in building pricing models that rely on physical parameters, little work has been done to assess the contribution of the realtor. A ‘farm’ (the geographic area an agent works repeatedly) is discovered using cluster identification methods. These farms can then be analyzed for imputed listing prices and sales prices, both of which are negotiated.
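As a rough illustration of cluster-based farm discovery, the sketch below groups each listing agent's transactions with a minimal, pure-Python DBSCAN over great-circle (haversine) distance. It is not the talk's code: DBSCAN is just one clustering method that fits the idea, and the `(agent_id, lat, lon)` input shape, the 0.5 km radius, the three-listing minimum, and the coordinates are all hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def dbscan(points, eps_km, min_samples):
    """Minimal O(n^2) DBSCAN; returns one label per point, -1 marks noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # Includes the point itself, so min_samples counts the point too
        return [j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:  # core point: keep expanding
                queue.extend(j_nbrs)
    return labels

def find_farms(listings, eps_km=0.5, min_samples=3):
    """Cluster each agent's listings; returns {agent_id: [cluster point lists]}.

    listings: iterable of (agent_id, lat, lon) tuples (hypothetical schema).
    """
    by_agent = defaultdict(list)
    for agent, lat, lon in listings:
        by_agent[agent].append((lat, lon))
    farms = {}
    for agent, pts in by_agent.items():
        clusters = defaultdict(list)
        for pt, label in zip(pts, dbscan(pts, eps_km, min_samples)):
            if label != -1:
                clusters[label].append(pt)
        farms[agent] = [clusters[k] for k in sorted(clusters)]
    return farms

# Made-up coordinates for illustration only
listings = [
    ("A", 37.7750, -122.4190), ("A", 37.7760, -122.4180), ("A", 37.7745, -122.4200),
    ("A", 37.3382, -121.8863),  # ~68 km from the others: noise, not a farm
    ("B", 37.8044, -122.2712), ("B", 37.9000, -122.0600),  # scattered: no farm
]
farms = find_farms(listings)
```

Where scikit-learn is available, `sklearn.cluster.DBSCAN(metric="haversine", algorithm="ball_tree")` applied to coordinates in radians, with `eps` also in radians (kilometres divided by Earth's radius), should produce equivalent clusters and scale far better than this quadratic loop.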

The problem: Most real estate analytics deal only with property description and location. Markets can swing quickly from a buyer’s to a seller’s advantage, so timing and days on market are important. Agent effects are not well understood and can be a significant factor in determining the actual price. Data sources are examined, along with the Python modules used and their application to data science, e.g. pycluster, pyclustering, and scikit-learn (the talk is primarily application, not theory).
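Because timing matters, one simple way to watch a market shift is to track days on market over time. The sketch below assumes days on market is just the calendar days from list date to sale date (MLS feeds may define it differently, for example cumulatively across relistings) and reports the median per sale month; the dates are invented for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import median

def days_on_market(list_date, sale_date):
    """Calendar days from listing to sale (a simplifying assumption)."""
    return (sale_date - list_date).days

def median_dom_by_month(sales):
    """sales: iterable of (list_date, sale_date) pairs.

    Returns {'YYYY-MM': median days on market}, keyed by the month of sale,
    so a faster or slower market shows up as a change between months.
    """
    buckets = defaultdict(list)
    for listed, sold in sales:
        buckets[sold.strftime("%Y-%m")].append(days_on_market(listed, sold))
    return {month: median(doms) for month, doms in sorted(buckets.items())}

# Hypothetical transactions
sales = [
    (date(2017, 3, 1), date(2017, 3, 20)),   # 19 days
    (date(2017, 2, 15), date(2017, 3, 30)),  # 43 days
    (date(2017, 3, 10), date(2017, 4, 2)),   # 23 days
    (date(2017, 4, 1), date(2017, 4, 8)),    # 7 days
    (date(2017, 3, 25), date(2017, 4, 15)),  # 21 days
]
print(median_dom_by_month(sales))  # {'2017-03': 31.0, '2017-04': 21}
```

The median is used instead of the mean so that a single long-stale listing does not dominate a month with only a handful of sales.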

- Examples of geographic and hidden affinity
- Analysis of listing price relative to appraisal, and the listing agent effect
- Analysis of over/under-performance of sales price relative to listing price
- Determination of listing agent vs. selling agent negotiation skills
- Effect of dual agency on pricing
- Effect of listing agent farms on neighborhood pricing
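Several of these analyses reduce to comparing price ratios across groups of transactions. A minimal sketch, assuming each sale is a dict with hypothetical fields `listing_agent`, `selling_agent`, `list_price`, `sale_price`, and `appraisal`, and treating dual agency simply as the same agent on both sides (in practice it can also mean the same brokerage):

```python
from collections import defaultdict
from statistics import mean

def sale_to_list(s):
    # > 1.0: sold above the final list price; < 1.0: sold below it
    return s["sale_price"] / s["list_price"]

def list_to_appraisal(s):
    # > 1.0: listed above the appraised value
    return s["list_price"] / s["appraisal"]

def mean_by(sales, key, metric):
    """Average metric(sale) within each group defined by key(sale)."""
    groups = defaultdict(list)
    for s in sales:
        groups[key(s)].append(metric(s))
    return {k: mean(v) for k, v in groups.items()}

# Illustrative, made-up transactions
sales = [
    {"listing_agent": "A", "selling_agent": "B",
     "list_price": 500_000, "sale_price": 510_000, "appraisal": 490_000},
    {"listing_agent": "A", "selling_agent": "A",  # simplified dual agency
     "list_price": 400_000, "sale_price": 392_000, "appraisal": 400_000},
    {"listing_agent": "C", "selling_agent": "B",
     "list_price": 600_000, "sale_price": 588_000, "appraisal": 610_000},
]

by_listing_agent = mean_by(sales, lambda s: s["listing_agent"], sale_to_list)
by_selling_agent = mean_by(sales, lambda s: s["selling_agent"], sale_to_list)
by_dual_agency = mean_by(sales, lambda s: s["listing_agent"] == s["selling_agent"], sale_to_list)
list_vs_appraisal = mean_by(sales, lambda s: s["listing_agent"], list_to_appraisal)
```

With real data most agents have few transactions, so group means like these are best read alongside group counts before drawing conclusions about negotiation skill.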

Consideration as a machine learning project using Theano or TensorFlow, Keras, Sonnet, or tflearn

Conclusions and future directions

Questions

Data, code, notebooks, and graphics will be included.

The methodology presented is likely applicable to other low-volume, high-value facilitated transactions.