The predictive performance of classification methods depends heavily on the nature of the environment, since the joint distribution of inputs and outputs may evolve over time. This issue is known as dataset shift. Because most statistical and machine learning techniques assume that the training sample is drawn from the same distribution as the test data used for evaluation, a considerable number of researchers and practitioners tend to ignore this issue at the model construction stage. In this paper, we propose a novel Fuzzy Support Vector Machine strategy, in which the traditional hinge loss function is redefined to account for dataset shift. Additionally, we propose a general version of this loss function that applies aggregation operators in order to improve performance by dealing with dataset shift via fuzzy logic. Originally developed as linear approaches, our proposals are extended to kernel-based classification for non-linear machine learning. Our methods outperform traditional classifiers in terms of out-of-time prediction on simulated and real-world datasets for credit scoring, confirming the theoretical virtues of our approach.
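To make the idea of a shift-aware hinge loss concrete, the following is a minimal sketch under stated assumptions, not the formulation proposed in the paper. It assumes that each training example carries a timestamp, that a fuzzy membership grows linearly with recency, and that the membership simply multiplies the standard hinge loss. The function names `recency_memberships`, `fuzzy_hinge_loss`, and `owa`, as well as the `floor` parameter, are hypothetical. The `owa` helper shows the textbook ordered weighted averaging operator; how the paper combines such operators with the loss is not specified in this abstract.

```python
def recency_memberships(timestamps, floor=0.1):
    """Hypothetical membership: linear in time, `floor` for the oldest
    example and 1.0 for the newest (an assumption, not the paper's rule)."""
    lo, hi = min(timestamps), max(timestamps)
    if hi == lo:
        return [1.0 for _ in timestamps]
    return [floor + (1.0 - floor) * (t - lo) / (hi - lo) for t in timestamps]


def fuzzy_hinge_loss(w, b, X, y, memberships):
    """Membership-weighted hinge loss: sum_i s_i * max(0, 1 - y_i (w.x_i + b))."""
    total = 0.0
    for x_i, y_i, s_i in zip(X, y, memberships):
        score = sum(w_j * x_ij for w_j, x_ij in zip(w, x_i)) + b
        total += s_i * max(0.0, 1.0 - y_i * score)
    return total


def owa(values, weights):
    """Ordered weighted average: weights are applied to the values
    after sorting them in descending order."""
    ordered = sorted(values, reverse=True)
    return sum(wt * v for wt, v in zip(weights, ordered))


# Toy usage: three examples observed at times 0, 1, 2.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
y = [1, -1, -1]
s = recency_memberships([0, 1, 2])
loss = fuzzy_hinge_loss([1.0, 0.0], 0.0, X, y, s)
```

In this sketch, older observations contribute less to the loss, which is one simple way to bias a classifier toward the most recent regime of a drifting distribution.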
Bibliographical note
Funding Information:
The authors gratefully acknowledge financial support from ANID PIA/BASAL AFB180003 and FONDECYT-Chile, grants 1200221 (Sebastián Maldonado), 1201403 (Julio López), and 12200007 (Carla Vairetti). The authors would like to thank the anonymous reviewers for their valuable comments and suggestions for improving the quality of the paper.
© 2021 Elsevier Inc.
Copyright 2021 Elsevier B.V., All rights reserved.
- Credit scoring
- Dataset shift
- Fuzzy support vector machines
- OWA operators
- Support vector machines