Building on scientific Python libraries, we developed a framework for applying Zero-shot Learning to intrusion detection. For the attribute learning stage, we propose a new algorithm based on decision trees. The inference stage uses a k-NN algorithm that represents the data on a Grassmann manifold. We present our implementation details as well as the results obtained in a Big Data environment.
Network intrusion detection (NID) is one of the most visible uses of Big Data analytics. One of the main problems in this application is the constant emergence of new attacks. This scenario, in which too few labeled examples are available for the new classes of attacks, is not well addressed by traditional machine learning approaches. Recent findings on the capabilities of the Zero-shot learning (ZSL) approach make it an interesting solution to this problem, because it can classify instances of unseen classes. This approach has been widely applied to computer vision tasks but, to the best of our knowledge, it has not been applied to other kinds of problems. ZSL inherently has two stages: attribute learning and inference. Building on scientific Python libraries such as scikit-learn, SciPy and NumPy, we developed a framework for applying ZSL to network intrusion detection. We propose a new algorithm for the attribute learning stage. The idea is to learn new values for the attributes based on decision trees (DT). Our results show that the rules extracted from the DT lead to a better distribution of the attribute values. We also propose an experimental setup for evaluating ZSL on NID. For the inference stage, we propose an instance-based classification algorithm that represents the data on the Grassmann manifold and implements a new distance function to compute the k-Nearest Neighbours. In this stage, around 5 million data instances are processed in real time, in separate batches that simulate real network traffic. In this talk we present the implementation details of our framework as well as its deployment in a Big Data environment. The results show competitive accuracy both in detecting the new classes of attacks and in classifying them.
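To make the inference stage more concrete, the following is a minimal sketch of k-Nearest Neighbours on a Grassmann manifold using NumPy and SciPy. It is not the framework's implementation: the abstract does not specify the new distance function, so the sketch substitutes the standard geodesic distance computed from principal angles. Representing each batch of instances by the subspace spanned by its leading right singular vectors is also an assumption, and the names `to_subspace`, `grassmann_distance` and `knn_predict` are hypothetical.

```python
from collections import Counter

import numpy as np
from scipy.linalg import subspace_angles


def to_subspace(batch, dim=2):
    """Map a batch (n_samples x n_features) to a point on the Grassmann manifold.

    Assumption: the point is the subspace spanned by the top `dim` right
    singular vectors, returned as an orthonormal basis (n_features x dim).
    """
    _, _, vt = np.linalg.svd(batch, full_matrices=False)
    return vt[:dim].T


def grassmann_distance(a, b):
    """Geodesic distance: Euclidean norm of the principal angles.

    This is the standard distance, used here as a stand-in for the
    unspecified distance function mentioned in the abstract.
    """
    return float(np.linalg.norm(subspace_angles(a, b)))


def knn_predict(query, train_subspaces, train_labels, k=3):
    """Majority vote among the k training subspaces closest to `query`."""
    dists = [grassmann_distance(query, s) for s in train_subspaces]
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

In this sketch, each incoming batch of traffic records would be passed through `to_subspace` and then labeled with `knn_predict` against subspaces built from labeled batches. Computing the distance to every training subspace is linear in the size of the training set, so a deployment at the scale described above would likely need batching or indexing on top of this.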