An empirical study of learning from imbalanced data

Zhang, X and Yuxuan, L 2011, 'An empirical study of learning from imbalanced data', in Heng Tao Shen and Yanchun Zhang (eds.) Proceedings of 22nd Australasian Database Conference (ADC 2011), Perth, Australia, January 2011, pp. 85-95.


Document type: Conference Paper
Collection: Conference Papers

Title An empirical study of learning from imbalanced data
Author(s) Zhang, X
Yuxuan, L
Year 2011
Conference name ADC 2011
Conference location Perth, Australia
Conference dates January 2011
Proceedings title Proceedings of 22nd Australasian Database Conference (ADC 2011)
Editor(s) Heng Tao Shen and Yanchun Zhang
Publisher Australian Computer Society
Place of publication Perth, Australia
Start page 85
End page 95
Total pages 11
Abstract No consistent conclusions have been drawn from existing studies regarding the effectiveness of different approaches to learning from imbalanced data. In this paper we apply bias-variance analysis to study the utility of different strategies for imbalanced learning. We conduct experiments on 15 real-world imbalanced datasets of applying various re-sampling and induction bias adjustment strategies to the standard decision tree, naive Bayes and k-nearest neighbour (k-NN) learning algorithms. Our main findings include: Imbalanced class distribution is primarily a high bias problem, which partly explains why it impedes the performance of many standard learning algorithms. Compared to the re-sampling strategies, adjusting induction bias can more significantly vary the bias and variance components of classification errors. Especially the inverse distance weighting strategy can significantly reduce the variance errors for k-NN. Based on these findings we offer practical advice on applying the re-sampling and induction bias adjustment strategies to improve imbalanced learning.
Subjects Pattern Recognition and Data Mining
Keyword(s) Bias-Variance Analysis
Imbalanced
Learning
Copyright notice Copyright © 2011, Australian Computer Society
ISBN 9781920682958
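
Illustrative note: the abstract singles out inverse distance weighting as an induction bias adjustment for k-NN. The sketch below shows one common form of that idea on a toy imbalanced dataset. It is not taken from the paper: the function name `knn_predict`, the weight `1 / (distance + eps)`, the Euclidean metric and the data points are all assumptions made for illustration, and the paper's exact weighting scheme may differ.

```python
import math
from collections import defaultdict


def knn_predict(train, query, k=3, weighted=True, eps=1e-9):
    """Classify `query` from (features, label) pairs in `train`.

    With weighted=True each of the k nearest neighbours votes with
    weight 1 / (distance + eps) (an assumed weighting); otherwise
    every neighbour casts one equal vote.
    """
    # Sort training points by Euclidean distance to the query and keep k.
    neighbours = sorted(
        ((math.dist(x, query), label) for x, label in train),
        key=lambda pair: pair[0],
    )[:k]

    votes = defaultdict(float)
    for distance, label in neighbours:
        votes[label] += 1.0 / (distance + eps) if weighted else 1.0
    return max(votes, key=votes.get)


# Toy imbalanced data (hypothetical): four majority points, one minority point.
train = [
    ((0.0, 0.0), "neg"),
    ((0.0, 1.0), "neg"),
    ((1.0, 0.0), "neg"),
    ((1.0, 1.0), "neg"),
    ((3.0, 3.0), "pos"),
]
query = (2.8, 2.9)  # lies very close to the single minority example

plain = knn_predict(train, query, k=3, weighted=False)
weighted = knn_predict(train, query, k=3, weighted=True)
print(plain, weighted)
```

With equal votes, the two more distant majority neighbours outvote the one nearby minority neighbour. With inverse distance weights, the nearby minority example dominates the vote. This only illustrates the mechanism and says nothing about the bias and variance results reported in the paper.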
Created: Fri, 10 Feb 2012, 11:32:00 EST by Catalyst Administrator