Shogun is a machine learning library that offers a wide range of efficient machine learning methods. Because the library can be used through a unified interface from C++, Python, R, Java, C#, Ruby, and other languages, it is independent of trends in computing languages. This tutorial aims to give an in-depth review of Shogun's architecture and of how it is used in education and industry.
The Shogun machine learning toolbox provides a wide range of unified and efficient machine learning methods. The toolbox allows multiple data representations, algorithm classes, and general-purpose tools to be combined seamlessly, enabling both rapid prototyping of data pipelines and extensibility in terms of new algorithms. We combine modern software architecture in C++ with efficient low-level computing backends and cutting-edge algorithm implementations to solve large-scale machine learning problems, currently on single machines.
One of Shogun's most exciting features is that the toolbox can be used through a unified interface from C++, Python, Octave, R, Java, Lua, C#, Ruby, etc. This not only makes us independent of trends in computing languages, but also lets you use Shogun as a vehicle to expose your algorithm to multiple communities. We use SWIG to enable bidirectional communication between C++ and the target languages. See our examples in all target languages. Shogun runs under Linux/Unix, macOS, and Windows.
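As a rough illustration of how this works, a SWIG interface file declares which C++ classes to wrap, and SWIG's director feature is what makes the communication bidirectional: target-language code can subclass a wrapped C++ class and have C++ call back into it. The module, class, and header names below are hypothetical placeholders for illustration, not Shogun's actual interface files:

```swig
/* kernel_sketch.i -- a minimal, hypothetical SWIG interface sketch */
%module(directors="1") kernel_sketch

%{
/* code in this block is pasted verbatim into the generated wrapper */
#include "GaussianKernel.h"   /* hypothetical header */
%}

/* allow target languages (Python, Java, ...) to subclass Kernel
   and have C++ invoke their overridden virtual methods */
%feature("director") Kernel;

/* parse the header and generate wrappers for its declarations */
%include "GaussianKernel.h"
```

Running, for example, `swig -python -c++ kernel_sketch.i` generates the glue code for the Python target; the other target languages differ only in the command-line flag, which is what keeps a single C++ code base exposed to many communities.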
Originally focused on large-scale kernel methods and bioinformatics, the toolbox has seen massive extensions into other fields in recent years. It now offers features that span the whole space of machine learning methods: many classical methods for classification, regression, dimensionality reduction, and clustering, but also more advanced algorithm classes such as metric, multi-task, structured-output, and online learning, as well as feature hashing, ensemble methods, and optimization, to name just a few. In addition, Shogun contains a number of exclusive state-of-the-art algorithms, such as a wealth of efficient SVM implementations, Multiple Kernel Learning, kernel hypothesis testing, and Krylov methods. All algorithms are supported by a collection of general-purpose methods for evaluation, parameter tuning, preprocessing, serialisation, and I/O; the resulting combinatorial possibilities are huge. See our showroom.
The main focus of the tutorial is the set of tools we have developed to create an easily deployable platform for data scientists (both expert and novice users) that allows them to analyse the problem at hand using Jupyter notebooks. As Shogun is language agnostic, Jupyter notebooks are a natural match: they expose the full power of Shogun, namely that users can work with Shogun from any of the usual Jupyter kernels (e.g. Python, R, Java, Scala, Ruby). As part of our effort to support all major programming languages, we have developed a meta-example framework: a user writes an example use-case for a method in our 'meta-example' syntax, which is then automatically translated into all of the languages supported by the library. This not only lets us scale easily with the number of new methods, in the sense that each method's example use-cases are exposed in all target languages, but it also lets us test those methods across all supported languages.
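The translation idea behind the meta-example framework can be sketched in a few lines: a single abstract statement is rendered through per-language templates into each target language. This toy translator, including its meta-syntax, the `Kernel` statement, and the template strings, is purely hypothetical and far simpler than Shogun's real generator; it only illustrates the one-source-many-targets principle:

```python
import re

# Hypothetical per-language templates for one kind of meta-statement:
# constructing an object from a class name, variable name, and argument.
TEMPLATES = {
    "python": '{var} = {cls}("{arg}")',
    "java":   '{cls} {var} = new {cls}("{arg}");',
    "ruby":   '{var} = {cls}.new("{arg}")',
}

def translate(meta_line, target):
    """Render a toy meta-statement like `Kernel k("gaussian")` into `target`."""
    match = re.match(r'(\w+)\s+(\w+)\("(\w+)"\)', meta_line)
    cls, var, arg = match.groups()
    return TEMPLATES[target].format(cls=cls, var=var, arg=arg)

# One meta-example line fans out to every supported language.
meta = 'Kernel k("gaussian")'
for lang in TEMPLATES:
    print(f"{lang}: {translate(meta, lang)}")
```

Because every example exists only once in the meta-syntax, adding a new target language (one more template set) or a new method (one more meta-example) immediately yields examples, and hence tests, for every combination.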