

Margin-Based Over-Sampling Method for Learning From Imbalanced Datasets

Authors

Xiānniàn Fàn [范先念], Kē Táng [唐珂], and Thomas Weise

Abstract

Learning from imbalanced datasets has drawn increasing attention in both theory and practice. Over-sampling is a popular and simple method for imbalanced learning. In this paper, we show that over-sampling algorithms carry an inherent risk with respect to the large margin principle. We then propose a new synthetic over-sampling method, named Margin-guided Synthetic Over-sampling (MSYN), to reduce this risk. MSYN improves learning with respect to the data distributions, guided by a margin-based rule. An empirical study verifies the proposed analysis.

Keywords

Classification, Overfitting, Machine Learning, ML, Mining of Imbalanced Data Sets, Oversampling

BibTeX

@inproceedings{FTW2011MBAOOSFIL,
  author                    = {Xi{\={a}}nni{\`{a}}n F{\`{a}}n and K{\={e}} T{\'{a}}ng and Thomas Weise},
  title                     = {{Margin-Based Over-Sampling Method for Learning From Imbalanced Datasets}},
  booktitle                 = {Proceedings of the 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'11)},
  editor                    = {Joshua (Zhexue) Huang and L{\'{o}}ngb{\={\i}}ng C{\={a}}o and Jaideep Srivastava},
  publisher                 = {{Springer-Verlag GmbH: {Berlin, Germany}}},
  series                    = {Lecture Notes in Computer Science (LNCS)},
  year                      = {2011},
  location                  = {{Sh{\={e}}nzh{\`{e}}n, Gu{\v{a}}ngd{\={o}}ng, China}},
  key                       = {FTW2011MBAOOSFIL},
}

Links

Metadata: http://www.it-weise.de/documents/metaFTW2011MBAOOSFIL.html
