How to build R-like statistical models in Python with Patsy and scikit-learn.
Creating linear and logistic models in R is dead simple. If your numpy/pandas-fu isn’t all that great, then it’s a lot harder to do in Python. In R, for instance, you can declare a model with a formula as simple as y ~ x1 + x2. But in Python, you have to split out your target and input variables and make sure that the matrices work within the scikit-learn API.
In this talk I will introduce the Patsy package for describing and creating statistical models in Python. I’ll walk through how to implement a logistic regression with Patsy and scikit-learn and I’ll emphasize Patsy as a bridge for those who want to better understand Python and/or R.
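
As a rough preview of the idea, here is a minimal sketch of a formula-driven logistic regression. The data frame, its column names (y, x1, x2), and the synthetic data are assumptions made up for illustration; they are not taken from the talk itself.

```python
import numpy as np
import pandas as pd
from patsy import dmatrices
from sklearn.linear_model import LogisticRegression

# Synthetic data, purely illustrative: a binary target driven by two inputs.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
noise = rng.normal(scale=0.5, size=n)
df["y"] = (df["x1"] + 0.5 * df["x2"] + noise > 0).astype(int)

# The R-style formula builds both the target and the design matrix.
y, X = dmatrices("y ~ x1 + x2", data=df, return_type="dataframe")

# Patsy adds an "Intercept" column on its own, so scikit-learn is told
# not to add a second one. Note that scikit-learn's default L2 penalty
# then also applies to the intercept coefficient.
model = LogisticRegression(fit_intercept=False)
model.fit(X, y.values.ravel().astype(int))

# Pair each design-matrix column with its fitted coefficient.
coefs = dict(zip(X.columns, model.coef_[0]))
print(coefs)
```

The formula string replaces the manual step of slicing out the target column and assembling the input matrix, which is the part that tends to trip up people coming from R.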