Commentator: A Code-mixed Multilingual Text Annotation Framework
Source
2024 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING
Author(s)
Sheth, Rajvee
Nisar, Shubh
Prajapati, Heenaben
Beniwal, Himanshu
Singh, Mayank
Editor(s)
Farias, DIH
Hope, T
Li, M
Abstract
As the NLP community increasingly addresses challenges associated with multilingualism, robust annotation tools are essential to handle multilingual datasets efficiently. In this paper, we introduce Commentator, a multilingual text annotation framework specifically designed for annotating code-mixed text. The tool demonstrates its effectiveness in token-level and sentence-level language annotation tasks for Hinglish text. We perform robust qualitative human-based evaluations to show that Commentator leads to 5x faster annotations than the best baseline. Our code is publicly available at https://github.com/lingo-iitgn/commentator. The demonstration video is available at https://bit.ly/commentator_video.
