Learn how to train a custom tagger to classify text using scikit-learn, with practical tuning advice to get more accurate results. See how to create a REST API to train and host your tagger using AWS services including Lambda, API Gateway and Step Functions. Gather tips on how to overcome limitations in AWS and scikit-learn when creating your own custom tagger.
We introduce the basic concepts behind text classification and taxonomy, together with some typical uses for such metadata. We demonstrate how to train a tagger to apply a custom taxonomy using scikit-learn and other open source Python libraries. We discuss the pros and cons of different tools and techniques for text classification and give practical advice on training your system to get more accurate results. We explain how to create a complete text-tagging REST API using AWS services including Lambda, API Gateway, Step Functions, S3 and DynamoDB. Finally, we discuss some of the difficulties of building such a REST API with AWS and scikit-learn, and the methods we developed to overcome those limitations.
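To give a rough idea of what the scikit-learn side of a custom tagger can look like, here is a minimal sketch. It assumes a multi-label setup where each document can carry several taxonomy tags. It also assumes a TF-IDF plus one-vs-rest logistic regression pipeline, which is one common choice and not necessarily the approach presented in the talk. The documents, tags and the `predict_tags` helper are hypothetical and exist only for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy training data: each document has one or more taxonomy tags.
docs = [
    "The central bank raised interest rates again",
    "Inflation figures surprised the markets",
    "The striker scored twice in the final",
    "The team signed a new goalkeeper",
    "The government announced a new budget for football stadiums",
    "Stock prices fell after the earnings report",
]
tags = [
    ["economy"],
    ["economy"],
    ["sport"],
    ["sport"],
    ["economy", "sport"],
    ["economy"],
]

# Convert tag lists into a binary indicator matrix (one column per tag).
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)

# TF-IDF features on unigrams and bigrams, then one binary classifier per tag.
tagger = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)),
    ("clf", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])
tagger.fit(docs, y)


def predict_tags(texts):
    """Hypothetical helper: return the predicted tag list for each input text."""
    indicator = tagger.predict(texts)
    return [list(labels) for labels in mlb.inverse_transform(indicator)]


print(predict_tags(["Interest rates and inflation are rising"]))
```

Wrapping the vectorizer and classifier in a single `Pipeline` means the fitted tagger can be serialized as one object. That makes it convenient to store the tagger somewhere like S3 and load it inside a serving function. On a real corpus, the n-gram range, the classifier, and any per-tag decision thresholds are the main knobs worth tuning.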