Performance evaluation provides quantitative information about the quality and behavior of natural language processing (NLP) models, helping developers understand their systems. However, performance evaluation is not an entirely trivial task; improper practice may lead to unreliable results or even harmful effects in the final application. This talk will introduce the basic evaluation concepts widely applied in NLP, including metrics, cross-validation, and significance tests, and discuss the common pitfalls. Special issues in natural language generation will also be covered.
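For readers unfamiliar with these concepts, the minimal Python sketch below illustrates one common pattern the talk touches on: comparing two classifiers with k-fold cross-validation and a paired significance test. The dataset, models, metric, and choice of test are illustrative assumptions, not material from the talk itself.

```python
# Illustrative sketch (not from the talk): comparing two text classifiers
# with k-fold cross-validation and a paired significance test.
from scipy import stats
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

data = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space"])

model_a = make_pipeline(TfidfVectorizer(), MultinomialNB())
model_b = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Macro-F1 on the same 10 folds for both models, giving paired observations.
scores_a = cross_val_score(model_a, data.data, data.target, cv=10, scoring="f1_macro")
scores_b = cross_val_score(model_b, data.data, data.target, cv=10, scoring="f1_macro")

# Paired t-test over per-fold scores. Note that cross-validation folds are
# not fully independent, so this p-value should be read with caution; this
# is exactly the kind of pitfall such a talk typically warns about.
t_stat, p_value = stats.ttest_rel(scores_a, scores_b)
print(f"Model A macro-F1: {scores_a.mean():.3f}")
print(f"Model B macro-F1: {scores_b.mean():.3f}")
print(f"Paired t-test: t = {t_stat:.3f}, p = {p_value:.4f}")
```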
Dr. Hen-Hsen Huang is an assistant professor in the Department of Computer Science at National Chengchi University. His research interests include natural language processing and information retrieval. His work has been published in ACL, SIGIR, WWW, IJCAI, CIKM, COLING, and other venues. Dr. Huang’s awards and honors include the Honorable Mention of the Doctoral Dissertation Award of ACLCLP in 2014 and the Honorable Mention of the Master Thesis Award of ACLCLP in 2008. He served as the registration chair of TAAI 2017, the publication chair of ROCLING 2020, and a PC member of representative conferences in computational linguistics, including ACL, COLING, EMNLP, and NAACL. He was one of the organizers of the FinNum task at NTCIR-14 and the FinNLP workshop at IJCAI 2019.