Effective and scalable authorship attribution using function words

Zhao, Y and Zobel, J 2005, 'Effective and scalable authorship attribution using function words', in GG Lee, A Yamada, H Meng and SH Myaeng (eds.) Information Retrieval Technology, Jeju Island, Korea, 13-15 October 2005.

Document type: Conference Paper
Collection: Conference Papers

Title Effective and scalable authorship attribution using function words
Author(s) Zhao, Y; Zobel, J
Year 2005
Conference name Asian Information Retrieval Symposium
Conference location Jeju Island, Korea
Conference dates 13-15 October 2005
Proceedings title Information Retrieval Technology
Editor(s) GG Lee; A Yamada; H Meng; SH Myaeng
Publisher Springer
Place of publication Berlin
Abstract Techniques for identifying the author of an unattributed document can be applied to problems in information analysis and in academic scholarship. A range of methods have been proposed in the research literature, using a variety of features and machine learning approaches, but the methods have been tested on very different data and the results cannot be compared. It is not even clear whether the differences in performance are due to feature selection or other variables. In this paper we examine the use of a large publicly available collection of newswire articles as a benchmark for comparing authorship attribution methods. To demonstrate the value of having a benchmark, we experimentally compare several recent feature-based techniques for authorship attribution, and test how well these methods perform as the volume of data is increased. We show that the benchmark is able to clearly distinguish between different approaches, and that the scalability of the best methods based on using function words features is acceptable, with only moderate decline as the difficulty of the problem is increased.
Subjects Business Information Management (incl. Records, Knowledge and Information Management, and Intelligence)
DOI - identifier 10.1007/11562382_14
Copyright notice © Springer-Verlag Berlin Heidelberg 2005
Created: Wed, 22 Jul 2009, 15:47:23 EST by Catalyst Administrator