Floto is an open source tool to programmatically author, schedule and run scalable data pipelines using AWS Simple Workflow - without the need to maintain a master server or queue or the state of workers.
There are quite a few great tools for building effective and robust distributed data processing pipelines, notably Luigi from Spotify and Airflow from Airbnb.
To scale out, however, they all require a queue or a master server, and those need maintenance.
We wrote floto (github.com/babbel/floto), an open source tool to programmatically author, schedule and run scalable data pipelines on AWS - without the maintenance overhead.
It uses AWS Simple Workflow, but I'll mostly talk about some general topics in data workflow orchestration: