FW-SMOTE: A feature-weighted oversampling approach for imbalanced classification

Sebastián Maldonado, Carla Vairetti*, Alberto Fernandez, Francisco Herrera

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

91 Citations (Scopus)

Abstract

The Synthetic Minority Over-sampling Technique (SMOTE) is a well-known resampling strategy that has been successfully used for dealing with the class-imbalance problem, one of the most challenging pattern recognition tasks in the last two decades. In this work, we claim that SMOTE has an important issue when defining the neighborhood in order to create new minority samples: the use of the Euclidean distance may not be suitable in high-dimensional settings. Our hypothesis is that the use of a weighted metric that does not assume that all features are equally important could improve performance in the presence of noisy/redundant variables. In this line, we present a novel SMOTE-like method that uses the weighted Minkowski distance for defining the neighborhood for each example of the minority class. This methodology leads to a better definition of the neighborhood since it prioritizes those features that are more relevant for the classification task. A complementary advantage of the proposal is performing feature selection since attributes can be discarded when their corresponding weights are below a given threshold. Our experiments on 42 class-imbalance datasets show the virtues of the proposed SMOTE variant, achieving the best predictive performance when compared with the traditional SMOTE approach and other recent variants on low- and high-dimensional settings, handling issues such as class overlap and hubness adequately without increasing the complexity of the method.
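The neighborhood idea described in the abstract can be illustrated with a minimal sketch. It assumes one common form of the weighted Minkowski distance, d(a, b) = (sum_j w_j |a_j - b_j|^p)^(1/p), and takes the feature weights as given; how the paper actually derives the weights is not stated in this abstract. The function names (`weighted_minkowski`, `fw_smote_sketch`) and parameters (`k`, `p`, `threshold`) are hypothetical and chosen for illustration only, so this is not the authors' implementation.

```python
import numpy as np


def weighted_minkowski(a, b, w, p=2.0):
    """Weighted Minkowski distance (assumed form): (sum_j w_j * |a_j - b_j|**p) ** (1/p)."""
    return float(np.sum(w * np.abs(a - b) ** p) ** (1.0 / p))


def fw_smote_sketch(X_min, w, n_new, k=5, p=2.0, threshold=0.0, seed=0):
    """Illustrative SMOTE-like oversampling with a feature-weighted neighborhood.

    X_min     : minority-class samples, shape (n, d)
    w         : non-negative feature weights, shape (d,) (assumed to be given)
    threshold : features with weight below this value are discarded
    Returns the synthetic samples and the boolean mask of kept features.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    w = np.asarray(w, dtype=float)

    # Feature selection: drop attributes whose weight is below the threshold.
    keep = w >= threshold
    X, wk = X_min[:, keep], w[keep]

    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise weighted Minkowski distances between minority samples.
    diff = np.abs(X[:, None, :] - X[None, :, :]) ** p
    D = np.sum(wk * diff, axis=2) ** (1.0 / p)
    np.fill_diagonal(D, np.inf)  # a sample is never its own neighbor

    # Indices of the k nearest minority neighbors of each sample.
    neighbors = np.argsort(D, axis=1)[:, :k]

    # Standard SMOTE interpolation between a base sample and a random neighbor.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    synthetic = X[base] + gap * (X[chosen] - X[base])
    return synthetic, keep
```

With a weight vector such as `w = [0.9, 0.1]` and `threshold=0.5`, the second feature is dropped before neighbors are searched, so a noisy attribute influences neither the neighborhood nor the synthetic samples.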
Original language: English
Article number: 108511
Journal: Pattern Recognition
Volume: 124
DOI
Status: Published - Apr 2022

Bibliographical note

Publisher Copyright:
© 2021 Elsevier Ltd

Keywords

  • Data resampling
  • Feature selection
  • Imbalanced data classification
  • OWA Operators
  • SMOTE
