Feature selection for multiclass binary data

Perera, K, Chan, J and Karunasekera, S 2018, 'Feature selection for multiclass binary data', in Dinh Phung, Vincent S. Tseng, Geoffrey I. Webb, Bao Ho, Mohadeseh Ganji, Lida Rashidi (eds.) Proceedings of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2018) LNAI 10939, Melbourne, Australia, 3-6 June 2018, pp. 52-63.


Document type: Conference Paper
Collection: Conference Papers

Title Feature selection for multiclass binary data
Author(s) Perera, K
Chan, J
Karunasekera, S
Year 2018
Conference name PAKDD 2018: Advances in Knowledge Discovery and Data Mining Part III
Conference location Melbourne, Australia
Conference dates 3-6 June 2018
Proceedings title Proceedings of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2018) LNAI 10939
Editor(s) Dinh Phung, Vincent S. Tseng, Geoffrey I. Webb, Bao Ho, Mohadeseh Ganji, Lida Rashidi
Publisher Springer
Place of publication Cham, Switzerland
Start page 52
End page 63
Total pages 12
Abstract Feature selection in binary datasets is an important task in many real-world machine learning applications such as document classification, genomic data analysis, and image recognition. Despite the many algorithms available, selecting features that distinguish all classes from one another in a multiclass binary dataset remains a challenge. Furthermore, many existing feature selection methods incur unnecessary computation costs on binary data because they are not designed specifically for it. We show that, by exploiting the symmetry and feature value imbalance of binary datasets, more efficient feature selection measures can be developed that better distinguish the classes in multiclass binary datasets. Using these measures, we propose a greedy feature selection algorithm, CovSkew, for multiclass binary data. We show that CovSkew achieves a high accuracy gain over baseline methods, up to ∼40%, especially when the selected feature subset is small. We also show that CovSkew has low computational costs compared with most of the baselines.
Subjects Pattern Recognition and Data Mining
Keyword(s) Classification (of information)
Data mining
Image recognition
Information retrieval systems
Learning systems
DOI - identifier 10.1007/978-3-319-93040-4_5
Copyright notice © Springer International Publishing AG, part of Springer Nature 2018
ISBN 9783319930404
Created: Thu, 06 Dec 2018, 10:39:00 EST by Catalyst Administrator