Cross-corpus Native Language Identification via Statistical Embedding

Rangel, F, Rosso, P, Uitdenbogerd, A and Brooke, J 2018, 'Cross-corpus Native Language Identification via Statistical Embedding', in Proceedings of the Second Workshop on Stylistic Variation, New Orleans, United States, 5 June 2018, pp. 39-43.


Document type: Conference Paper
Collection: Conference Papers

Title Cross-corpus Native Language Identification via Statistical Embedding
Author(s) Rangel, F
Rosso, P
Uitdenbogerd, A
Brooke, J
Year 2018
Conference name The Second Workshop on Stylistic Variation (at NAACL)
Conference location New Orleans, United States
Conference dates 5 June 2018
Proceedings title Proceedings of the Second Workshop on Stylistic Variation
Publisher Association for Computational Linguistics
Place of publication United States
Start page 39
End page 43
Total pages 5
Abstract In this paper, we approach the task of native language identification in a realistic cross-corpus scenario where a model is trained with available data and has to predict the native language from data of a different corpus. We have proposed a statistical embedding representation reporting a significant improvement over common single-layer approaches of the state of the art, identifying Chinese, Arabic, and Indonesian in a cross-corpus scenario. The proposed approach was shown to be competitive even when the data is scarce and imbalanced.
Subjects Natural Language Processing
Applied Linguistics and Educational Linguistics
Computational Linguistics
Keyword(s) natural language processing
computational linguistics
native language identification
text classification
Copyright notice © 2018 The Association for Computational Linguistics
ISBN 9781948087247
Created: Thu, 21 Feb 2019, 12:10:00 EST by Catalyst Administrator