Capturing out-of-vocabulary words in Arabic text

Nwesri, A, Tahaghoghi, S and Scholer, F 2006, 'Capturing out-of-vocabulary words in Arabic text', in Proceedings of the 2006 conference on empirical methods in natural language processing (EMNLP 2006), Sydney, Australia, 22-23 July 2006.


Document type: Conference Paper
Collection: Conference Papers

Title Capturing out-of-vocabulary words in Arabic text
Author(s) Nwesri, A
Tahaghoghi, S
Scholer, F
Year 2006
Conference name Conference on Empirical Methods in Natural Language Processing
Conference location Sydney, Australia
Conference dates 22-23 July 2006
Proceedings title Proceedings of the 2006 conference on empirical methods in natural language processing (EMNLP 2006)
Publisher Association for Computational Linguistics
Place of publication Australia
Abstract The increasing flow of information between languages has led to a rise in the frequency of non-native or loan words, where terms of one language appear transliterated in another. Dealing with such out-of-vocabulary words is essential for successful cross-lingual information retrieval. For example, techniques such as stemming should not be applied indiscriminately to all words in a collection, so foreign words need to be identified before any stemming is performed. In this paper, we investigate three approaches for the identification of foreign words in Arabic text: lexicons, language patterns, and n-grams. Our results show that lexicon-based approaches outperform the other techniques.
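As a rough illustration of why foreign words should be identified before stemming, the Python sketch below uses a lexicon lookup to shield known transliterated words from a suffix-stripping stemmer. The lexicon entries, the suffix list, and every function name (`strip_article`, `light_stem`, `is_foreign`, `stem_tokens`) are hypothetical and chosen only for illustration. This is not the authors' implementation, and it covers only the lexicon idea, not the pattern-based or n-gram approaches.

```python
# Hypothetical sketch: protect lexicon-listed foreign words from a toy stemmer.
# The lexicon and suffix list are illustrative assumptions, not from the paper.
FOREIGN_LEXICON = {"كمبيوتر", "انترنت", "تلفزيون"}  # computer, internet, television

ARTICLE = "ال"                         # Arabic definite article
SUFFIXES = ("ات", "ون", "ين", "ة")      # toy list of native suffixes


def strip_article(word: str) -> str:
    """Remove a leading definite article if at least three letters remain."""
    if word.startswith(ARTICLE) and len(word) > len(ARTICLE) + 2:
        return word[len(ARTICLE):]
    return word


def light_stem(word: str) -> str:
    """Toy light stemmer: drop the article, then the first matching suffix."""
    word = strip_article(word)
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def is_foreign(word: str, lexicon=FOREIGN_LEXICON) -> bool:
    """A word counts as foreign if its article-free form is in the lexicon."""
    return strip_article(word) in lexicon


def stem_tokens(tokens, lexicon=FOREIGN_LEXICON):
    """Stem native words; for foreign words, strip only the (native) article."""
    return [
        strip_article(tok) if is_foreign(tok, lexicon) else light_stem(tok)
        for tok in tokens
    ]


print(stem_tokens(["التلفزيون", "المعلمون"]))
```

Without the lookup, the toy stemmer would strip the final "ون" from the transliterated word تلفزيون (television), mistaking it for a native plural suffix. With the lookup, that word is kept intact while the native plural المعلمون (the teachers) is still reduced to معلم.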
Subjects Business Information Management (incl. Records, Knowledge and Information Management, and Intelligence)
Copyright notice © 2006 Association for Computational Linguistics