Using relative entropy for authorship attribution

Zhao, Y, Zobel, J and Vines, P 2006, 'Using relative entropy for authorship attribution', in H. Ng, M. Leong, M. Kan and D. Ji (eds.) Proceedings of the 3rd Asia retrieval symposium AIRS 2006, Singapore, 18 October 2006, pp. 92-105.

Document type: Conference Paper
Collection: Conference Papers

Title Using relative entropy for authorship attribution
Author(s) Zhao, Y; Zobel, J; Vines, P
Year 2006
Conference name Asia Retrieval Symposium
Conference location Singapore
Conference dates 18 October 2006
Proceedings title Proceedings of the 3rd Asia retrieval symposium AIRS 2006
Editor(s) H. Ng; M. Leong; M. Kan; D. Ji
Publisher Springer
Place of publication Berlin, Germany
Start page 92
End page 105
Total pages 14
Abstract Authorship attribution is the task of deciding who wrote a particular document. Several attribution approaches have been proposed in recent research, but none of these approaches is particularly satisfactory; some of them are ad hoc and most have defects in terms of scalability, effectiveness, and efficiency. In this paper, we propose a principled approach motivated by information theory to identify authors based on elements of writing style. We make use of the Kullback-Leibler divergence, a measure of how different two distributions are, and explore several different approaches to tokenizing documents to extract style markers. We use several data collections to examine the performance of our approach. We have found that our proposed approach is as effective as the best existing attribution methods for two-class attribution, and is superior for multi-class attribution. It has lower computational cost and is cheaper to train. Finally, our results suggest this approach is a promising alternative for other categorization problems.
Subjects Business Information Management (incl. Records, Knowledge and Information Management, and Intelligence)
Keyword(s) attribution; computational cost; information theory
DOI - identifier 10.1007/11880592_8
Copyright notice © Springer-Verlag Berlin Heidelberg 2006
ISBN 978-3-540-45780-0