| Authors: | Xiānniàn Fàn [范先念], Kē Táng [唐珂], and Thomas Weise |
Learning from imbalanced datasets has drawn increasing attention in both theoretical and practical aspects. Over-sampling is a popular and simple method for imbalanced learning. In this paper, we show that over-sampling algorithms carry an inherent potential risk in terms of the large margin principle. We then propose a new synthetic over-sampling method, named Margin-guided Synthetic Over-sampling (MSYN), to reduce this risk. MSYN improves learning with respect to the data distributions, guided by the margin-based rule. An empirical study verifies the proposed analysis.
Classification, Overfitting, Machine Learning, ML, Mining of Imbalanced Data Sets, Oversampling
@inproceedings{FTW2011MBAOOSFIL,
author = {Xi{\={a}}nni{\`{a}}n F{\`{a}}n and K{\={e}} T{\'{a}}ng and Thomas Weise},
title = {{Margin-Based Over-Sampling Method for Learning From Imbalanced Datasets}},
booktitle = {Proceedings of the 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'11)},
editor = {Joshua (Zhexue) Huang and L{\'{o}}ngb{\={\i}}ng C{\={a}}o and Jaideep Srivastava},
publisher = {{Springer-Verlag GmbH: {Berlin, Germany}}},
series = {Lecture Notes in Computer Science (LNCS)},
year = {2011},
location = {{Sh{\={e}}nzh{\`{e}}n, Gu{\v{a}}ngd{\={o}}ng, China}},
key = {FTW2011MBAOOSFIL},
}
| Metadata: | http://www.it-weise.de/documents/metaFTW2011MBAOOSFIL.html |