Collapsed consonant and vowel models: New approaches for English-Persian transliteration and back-transliteration

Karimi, S, Scholer, F and Turpin, A 2007, 'Collapsed consonant and vowel models: New approaches for English-Persian transliteration and back-transliteration', in Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic, 23-30 June 2007.


Document type: Conference Paper
Collection: Conference Papers

Title Collapsed consonant and vowel models: New approaches for English-Persian transliteration and back-transliteration
Author(s) Karimi, S; Scholer, F; Turpin, A
Year 2007
Conference name Association of Computational Linguistics
Conference location Prague, Czech Republic
Conference dates 23-30 June 2007
Proceedings title Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
Publisher Association for Computational Linguistics
Place of publication USA
Abstract Most current machine transliteration systems employ a corpus of known source-target word pairs to train their system, and typically evaluate their systems on a similar corpus. In this paper we explore the performance of transliteration systems on corpora that are varied in a controlled way. In particular, we control the number and prior language knowledge of the human transliterators used to construct the corpora, and the origin of the source words that make up the corpora. We find that the word accuracy of automated transliteration systems can vary by up to 30% (in absolute terms) depending on the corpus on which they are run. We conclude that at least four human transliterators should be used to construct corpora for evaluating automated transliteration systems, and that although absolute word accuracy metrics may not translate across corpora, the relative rankings of system performance remain stable across differing corpora.
Subjects Business Information Management (incl. Records, Knowledge and Information Management, and Intelligence)
Copyright notice © 2007 Association for Computational Linguistics
Created: Fri, 09 Oct 2009, 08:09:01 EST by Catalyst Administrator