Scaling R or Python from a single machine to a cluster environment can be tricky. Many tools make launching a cluster relatively easy, but they are built mostly for operations and are not optimized for analytics. Come and learn devops tips and tricks to ease your transition into the big data world as a data scientist.
Migrating R or Python work from a single local machine to a cluster environment can be tricky. While many tools and resources make launching a cluster relatively easy, they focus mostly on operations and are not optimized for the specific use case of analytics with R and/or Python.
Imagine this scenario: you are a data scientist at a small organization. There is no devops support, and you need to set up your environment for big data processing and analysis. You start a cluster in the cloud (on your favorite provider), you log on, and you want to run an R script that you’ve developed on your laptop. Sounds easy, right? Well, have you considered the following?
Come and learn devops tips and tricks to ease your transition into the big data world as a data scientist. This is a how-to session intended to raise awareness of some of the typical technical issues that can cause headaches. It is not intended to be a sysadmin session, but it will hopefully give you a better understanding of the concepts you need to know, including tools such as Ansible for automating your setup.