The predictive performance of classification methods depends heavily on the environment in which they operate, since the joint distribution of inputs and outputs may evolve over time. This issue is known as dataset shift. Because most statistical and machine learning techniques assume that the training sample is drawn from the same distribution as the test data used for evaluation, many researchers and practitioners tend to ignore this issue at the model construction stage. In this paper, we propose a novel Fuzzy Support Vector Machine strategy in which the traditional hinge loss function is redefined to account for dataset shift. Additionally, we propose a general version of this loss function that applies aggregation operators in order to improve performance by dealing with dataset shift via fuzzy logic. Originally developed as linear approaches, our proposals are extended to kernel-based classification for non-linear machine learning. Our methods achieve the best performance compared with traditional classifiers in terms of out-of-time prediction on simulated and real-world datasets for credit scoring, confirming the theoretical virtues of our approach.
- Credit scoring
- Dataset shift
- Fuzzy support vector machines
- OWA operators
- Support vector machines