A novel multivariate filter method for feature selection in text classification problems

Labani, M, Moradi, P, Ahmadizar, F and Jalili, M 2018, 'A novel multivariate filter method for feature selection in text classification problems', Engineering Applications of Artificial Intelligence, vol. 70, pp. 25-37.


Document type: Journal Article
Collection: Journal Articles

Title: A novel multivariate filter method for feature selection in text classification problems
Author(s): Labani, M; Moradi, P; Ahmadizar, F; Jalili, M
Year: 2018
Journal name: Engineering Applications of Artificial Intelligence
Volume number: 70
Start page: 25
End page: 37
Total pages: 13
Publisher: Pergamon Press
Abstract: With the increasing number of documents in digital format, automatic text categorization has become a crucial task in pattern recognition. To ease the classification task, feature selection methods have been introduced to reduce the dimensionality of the feature space and thus improve classification performance. In this paper, a novel filter method for feature selection, called Multivariate Relative Discrimination Criterion (MRDC), is proposed for text classification. The proposed method focuses on reducing redundant features using the minimal-redundancy and maximal-relevancy concepts. To this end, it takes into account the document frequencies of each term when estimating the term's usefulness. The method not only selects the features with maximum relevancy but also takes the redundancy between them into account using a correlation metric. MRDC does not employ any learning algorithm to evaluate the usefulness of the selected features, and thus it can be categorized as a filter method. To assess the effectiveness of the proposed method, several experiments are performed on three real-world datasets, and the results are compared with those of state-of-the-art filter methods. The reported results show that in most cases MRDC yields better classification performance than the other methods.
Subject: Pattern Recognition and Data Mining; Information Systems Management
Keyword(s): Dimensionality reduction; Feature selection; Filter approach; Multivariate analysis; Text classification
DOI - identifier: 10.1016/j.engappai.2017.12.014
Copyright notice: © 2018 Elsevier Ltd. All rights reserved.
ISSN: 0952-1976
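
The selection strategy summarized in the abstract (score each term for relevance from its document frequencies, then penalize terms that correlate with terms already chosen) can be illustrated with a small greedy filter. The sketch below is not the MRDC formula from the paper, which this record does not reproduce. It assumes a stand-in relevance score (the largest gap in a term's document-frequency rate between one class and the rest) and uses absolute Pearson correlation between binary term-occurrence vectors as the redundancy measure. The names `pearson`, `relevance`, and `select_terms`, and the toy documents, are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def relevance(column, labels):
    """Stand-in relevance (an assumption, not the paper's criterion):
    largest gap in document-frequency rate between one class and the rest."""
    best = 0.0
    for cls in set(labels):
        inside = [v for v, lab in zip(column, labels) if lab == cls]
        outside = [v for v, lab in zip(column, labels) if lab != cls]
        if not inside or not outside:
            continue
        gap = abs(sum(inside) / len(inside) - sum(outside) / len(outside))
        best = max(best, gap)
    return best


def select_terms(docs, labels, k):
    """Greedily pick up to k terms, trading relevance against redundancy."""
    vocab = sorted(set().union(*docs))
    # Binary occurrence vector per term: 1 if the term appears in the document.
    columns = {t: [1 if t in d else 0 for d in docs] for t in vocab}
    rel = {t: relevance(columns[t], labels) for t in vocab}

    selected = []
    candidates = set(vocab)
    while candidates and len(selected) < k:

        def score(term):
            if not selected:
                return rel[term]
            # Mean absolute correlation with the terms chosen so far.
            redundancy = sum(
                abs(pearson(columns[term], columns[s])) for s in selected
            ) / len(selected)
            return rel[term] - redundancy

        # Sorting the candidates makes tie-breaking deterministic (alphabetical).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy corpus: each document is a set of terms. "match" always co-occurs with "goal".
docs = [
    {"goal", "match"}, {"goal", "match"}, {"goal", "match"}, {"coach"},
    {"law"}, {"tax"}, {"tax"}, {"tax"},
]
labels = ["sport"] * 4 + ["politics"] * 4

print(select_terms(docs, labels, 2))
```

In this toy corpus "match" is as relevant as "goal" on its own, but because the two terms occur in exactly the same documents, the redundancy penalty steers the second pick toward "tax" instead. No classifier is trained at any point, which is what makes this a filter-style procedure.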
Citation counts: Thomson Reuters Web of Science: cited 9 times
Scopus: cited 0 times
Access statistics: 36 abstract views
Created: Wed, 19 Sep 2018, 13:27:00 EST by Catalyst Administrator