Data fusion for Japanese term and character N-gram search

Yasukawa, M, Culpepper, J and Scholer, F 2015, 'Data fusion for Japanese term and character N-gram search', in Laurence A. F. Park and Sarvnaz Karimi (eds.) Proceedings of the 20th Australasian Document Computing Symposium (ADCS 2015), Parramatta, Australia, 8 - 9 December 2015, pp. 1-4.


Document type: Conference Paper
Collection: Conference Papers

Title: Data fusion for Japanese term and character N-gram search
Author(s): Yasukawa, M; Culpepper, J; Scholer, F
Year: 2015
Conference name: ADCS 2015
Conference location: Parramatta, Australia
Conference dates: 8 - 9 December 2015
Proceedings title: Proceedings of the 20th Australasian Document Computing Symposium (ADCS 2015)
Editor(s): Laurence A. F. Park and Sarvnaz Karimi
Publisher: Association for Computing Machinery
Place of publication: New York, United States
Start page: 1
End page: 4
Total pages: 4
Abstract: Term segmentation plays a vital role in building effective information retrieval systems. In particular, languages such as Japanese and Chinese require a morphological analyzer or a word segmenter to identify potential terms. The alternative approach to indexing a segmented collection is n-gram search, where every n-length sequence of symbols is indexed. Both approaches have strengths and weaknesses when applied to non-English collections. In this study, we explore data fusion techniques to answer the following question: if there are multiple ranked lists of documents from both word and n-gram indexes, can we improve overall effectiveness by combining them? We consider three empirical methods for combining search results using eight different search indexes and twenty-one different search models with and without automatic query expansion. Our approach is language independent; however, we focus on Japanese test collections (NTCIR IR4QA) as our testbed for the current experiments. Our experimental results demonstrate that the combination of the two different segmentation approaches has the potential to significantly outperform the best word-segmented search methods.
Subjects: Information Retrieval and Web Search
Keyword(s): term segmentation; morphological analysis; n-gram search
Copyright notice: © ACM 2015
ISBN: 9781450340403
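
Illustrative note: the abstract describes two ideas that a short sketch may make concrete, namely indexing every n-length character sequence and combining ranked lists from word and n-gram indexes. The Python below is only a minimal sketch under stated assumptions. The abstract does not name the three combination methods the authors evaluate, so CombSUM with min-max normalization is used here purely as a well-known stand-in. The function names, document IDs, and scores are all hypothetical and do not come from the paper.

```python
from collections import defaultdict


def char_ngrams(text, n=2):
    """Return every overlapping n-character sequence in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def min_max_normalize(run):
    """Rescale a {doc_id: score} run to the range [0, 1]."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_sum(runs):
    """Fuse runs by summing normalized scores (CombSUM, an assumed stand-in).

    Documents missing from a run contribute 0 for that run.
    Returns (doc_id, fused_score) pairs, best first, ties broken by doc_id.
    """
    fused = defaultdict(float)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] += score
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores from a word-segmented index and a bigram index.
word_run = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
ngram_run = {"d2": 7.5, "d4": 5.0, "d1": 2.5}

bigrams = char_ngrams("情報検索", 2)        # ['情報', '報検', '検索']
ranking = [doc for doc, _ in comb_sum([word_run, ngram_run])]
print(bigrams)
print(ranking)                              # ['d2', 'd1', 'd4', 'd3']
```

In this toy example d2 rises to the top because both runs score it highly, which is the kind of agreement a fusion method is meant to reward. Whether such a gain holds on real collections is an empirical question that the paper itself addresses.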
Created: Thu, 21 Jan 2016, 07:51:00 EST by Catalyst Administrator