Abstract
In this paper, we propose a novel matching strategy to correct for confounding in uplift modeling. Our method, called propensity score oversampling and matching (ProSOM), extends the well-known propensity score matching (PSM) technique by addressing one of its main limitations: dealing with small datasets that face an imbalance in the distribution of the causal variable. Apart from this, we also face the additional complexity of dealing with class labels. The proposed method establishes a parallel between uplift modeling and class-imbalance classification as it extends existing oversampling techniques to create synthetic elements from the treatment group. We design an algorithm that performs classaware data oversampling in the treatment group, and then it matches samples from this group with the control group. This can be seen as a novel hybrid undersampling-oversampling solution for causal learning. Experiments on five datasets show the virtues of ProSOM in terms of predictive performance, achieving the best Qini coefficient for all five datasets in relation to PSM and other resampling solutions.
Original language | English |
---|---|
Pages (from-to) | 1-12 |
Number of pages | 12 |
Journal | European Journal of Operational Research |
Volume | 316 |
Issue number | 3 |
DOIs | |
State | Published - 1 Aug 2024 |
Bibliographical note
Publisher Copyright:© 2024 Elsevier B.V.
Keywords
- Analytics
- Data resampling
- Oversampling
- Propensity score matching
- Uplift modeling