Dataset shift matters in unsupervised learning because many applications operate in evolving environments, where it causes a substantial loss of generalization and performance. Most techniques that address this issue are designed for data stream clustering, whose goal is to process sequences of data efficiently in Big Data settings. In this study, we argue that dataset shift is also an issue for static clustering tasks in which data is collected over a long period. To mitigate it, we propose Time-weighted kernel k-means, a k-means variant that includes a time-dependent weighting process implemented via the induced ordered weighted average (IOWA) operator. The weighting process acts as a gradual forgetting mechanism, prioritizing recent examples over outdated ones in the clustering algorithm. Computational experiments show the potential of Time-weighted kernel k-means in evolving environments.
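The idea of time-dependent weighting can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the IOWA weights are generated or how they enter the clustering objective. The sketch assumes that timestamps act as the order-inducing variable, that positional weights decay geometrically (the `decay` parameter is hypothetical), and that the weights enter a standard weighted kernel k-means update with an RBF kernel. All function and parameter names are illustrative.

```python
import numpy as np


def iowa_time_weights(timestamps, decay=0.9):
    """Recency-induced sample weights (assumed geometric scheme, not from the paper)."""
    t = np.asarray(timestamps, dtype=float)
    # Timestamps induce the ordering: position 0 is the most recent sample.
    order = np.argsort(-t, kind="stable")
    positional = decay ** np.arange(t.size)
    positional /= positional.sum()      # positional weights sum to 1
    weights = np.empty(t.size)
    weights[order] = positional         # map positions back to samples
    return weights


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix; the kernel choice is an assumption."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def weighted_kernel_kmeans(K, w, k, n_iter=50, init_labels=None, seed=0):
    """Weighted kernel k-means on a precomputed kernel matrix K with sample weights w."""
    n = K.shape[0]
    if init_labels is None:
        labels = np.random.default_rng(seed).integers(k, size=n)
    else:
        labels = np.asarray(init_labels).copy()
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)  # empty clusters stay at infinite distance
        for c in range(k):
            mask = labels == c
            s = w[mask].sum()
            if s == 0:
                continue
            wc = w[mask]
            # Squared feature-space distance to the weighted centroid of cluster c.
            cross = K[:, mask] @ wc / s
            within = wc @ K[mask][:, mask] @ wc / s ** 2
            dist[:, c] = diag - 2.0 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

With weights of this kind, recent observations dominate each weighted centroid, so older observations influence the partition less. A smaller `decay` forgets faster, and `decay=1.0` recovers uniform weights, that is, ordinary kernel k-means.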
Bibliographical note
Funding Information:
The authors gratefully acknowledge financial support from ANID PIA/BASAL, grants AFB180003 and FB0008; FONDECYT-Chile, grants 1200221 (Sebastián Maldonado), 1180685 and 11220510 (José Delpiano), and 12200007 (Carla Vairetti); and Fondo de Ayuda a la Investigacion (FAI), Universidad de Los Andes. The authors are grateful to the anonymous reviewers who contributed to improving the quality of the original paper.
© 2022 Elsevier Ltd
Keywords
- Dataset shift
- Induced ordered weighted average
- Kernel k-means
- OWA operators