Mitigating the effect of dataset shift in clustering

Sebastián Maldonado, Ramiro Saltos, Carla Vairetti*, José Delpiano

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Dataset shift is a relevant topic in unsupervised learning, since many applications face evolving environments that cause a substantial loss of generalization and performance. Most techniques that deal with this issue are designed for data stream clustering, whose goal is to process sequences of data efficiently in Big Data settings. In this study, we argue that dataset shift is also an issue for static clustering tasks in which data are collected over a long period. To mitigate it, we propose Time-weighted kernel k-means, a k-means variant that includes a time-dependent weighting process implemented via the induced ordered weighted average (IOWA) operator. The weighting process acts as a gradual forgetting mechanism, prioritizing recent examples over outdated ones in the clustering algorithm. The computational experiments show the potential of Time-weighted kernel k-means in evolving environments.
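The idea of a time-dependent weighting inside kernel k-means can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the weight vector, kernel, or initialization, so the linear rank-based weights (`rank_weights`, with a hypothetical `ratio` parameter), the RBF kernel, and the random initialization below are all assumptions. Timestamps act as the order-inducing variable, in the spirit of IOWA, and the weights enter a standard weighted kernel k-means update.

```python
import numpy as np

def rank_weights(timestamps, ratio=5.0):
    """Assumed weighting: weights grow linearly with recency rank.

    The newest example gets `ratio` times the weight of the oldest;
    weights are normalized to sum to 1. Timestamps only induce the order.
    """
    t = np.asarray(timestamps)
    # Double argsort gives each example its rank (0 = oldest).
    ranks = np.argsort(np.argsort(t, kind="stable"), kind="stable")
    n = len(t)
    raw = 1.0 + (ratio - 1.0) * ranks / max(n - 1, 1)
    return raw / raw.sum()

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix; the kernel choice is an assumption."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def weighted_kernel_kmeans(K, w, k, n_iter=50, seed=0):
    """Weighted kernel k-means on a precomputed kernel matrix K.

    Each cluster center is the w-weighted mean of its members in feature
    space, so heavily weighted (recent) examples pull the center more.
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(0, k, size=n)  # random initial assignment
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            s = w[mask].sum()
            if s == 0:
                continue  # empty cluster: leave its distances at infinity
            wc = w[mask]
            cross = K[:, mask] @ wc / s
            within = wc @ K[np.ix_(mask, mask)] @ wc / s ** 2
            # Squared feature-space distance to the weighted center.
            dist[:, c] = diag - 2.0 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

A call such as `weighted_kernel_kmeans(rbf_kernel(X), rank_weights(timestamps), k=3)` returns one cluster label per example. Raising `ratio` makes the forgetting more aggressive, while `ratio=1.0` recovers unweighted kernel k-means.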

Original language: English
Article number: 109058
Pages (from-to): 109058
Journal: Pattern Recognition
Volume: 134
State: Published - Feb 2023

Bibliographical note

Funding Information:
The authors gratefully acknowledge financial support from ANID PIA/BASAL, grants AFB180003 and FB0008; FONDECYT-Chile, grants 1200221 (Sebastián Maldonado), 1180685 and 11220510 (José Delpiano), and 12200007 (Carla Vairetti); and Fondo de Ayuda a la Investigacion (FAI), Universidad de Los Andes. The authors are grateful to the anonymous reviewers who contributed to improving the quality of the original paper.

Publisher Copyright:
© 2022 Elsevier Ltd

Keywords

  • Clustering
  • Dataset shift
  • Induced ordered weighted average
  • Kernel k-means
  • OWA operators
