
Distance-Based Relevance Function for Imbalanced Regression

Author

Listed:
  • Daniel Daeyoung In

    (Department of Statistics and Data Science, Yonsei University, Seoul 03722, Republic of Korea)

  • Hyunjoong Kim

    (Department of Statistics and Data Science, Yonsei University, Seoul 03722, Republic of Korea)

Abstract

Imbalanced regression poses a significant challenge in real-world prediction tasks, where rare target values are prone to overfitting during model training. To address this, prior research has employed relevance functions to quantify the rarity of target instances. However, existing functions often struggle to capture the rarity across diverse target distributions. In this study, we introduce a novel Distance-based Relevance Function (DRF) that quantifies the rarity based on the distance between target values, enabling a more accurate and distribution-agnostic assessment of rare data. This general approach allows imbalanced regression techniques to be effectively applied to a broader range of distributions, including bimodal cases. We evaluate the proposed DRF using Mean Squared Error (MSE), relevance-weighted Mean Absolute Error (MAE_ϕ), and Symmetric Mean Absolute Percentage Error (SMAPE). Empirical studies on synthetic datasets and 18 real-world datasets demonstrate that DRF tends to improve the performance across various machine learning models, including support vector regression, neural networks, XGBoost, and random forests. These findings suggest that DRF offers a promising direction for rare target detection and broadens the applicability of imbalanced regression methods.
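
The abstract does not give the DRF formula, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a hypothetical relevance score, `knn_distance_relevance`, defined as the mean distance from each target value to its k nearest other target values, min-max scaled to [0, 1]. It also shows one common form of each of the three evaluation metrics named above. The relevance-weighted MAE is assumed to be a relevance-weighted average of absolute errors, and SMAPE is returned as a fraction rather than a percentage. All function names and the parameter `k` are illustrative assumptions.

```python
from typing import List, Sequence


def knn_distance_relevance(y: Sequence[float], k: int = 3) -> List[float]:
    """Hypothetical relevance (NOT the paper's DRF): mean distance to the
    k nearest other target values, min-max scaled to [0, 1]."""
    n = len(y)
    if n < 2:
        return [0.0] * n
    k = max(1, min(k, n - 1))
    raw = []
    for i, yi in enumerate(y):
        dists = sorted(abs(yi - yj) for j, yj in enumerate(y) if j != i)
        raw.append(sum(dists[:k]) / k)
    lo, hi = min(raw), max(raw)
    if hi == lo:  # all targets equally isolated: no rarity signal
        return [0.0] * n
    return [(r - lo) / (hi - lo) for r in raw]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relevance_weighted_mae(
    y_true: Sequence[float], y_pred: Sequence[float], phi: Sequence[float]
) -> float:
    """Assumed form: sum(phi_i * |y_i - yhat_i|) / sum(phi_i)."""
    total = sum(phi)
    if total == 0:
        raise ValueError("relevance weights sum to zero")
    weighted = sum(w * abs(t - p) for t, p, w in zip(y_true, y_pred, phi))
    return weighted / total


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """One common SMAPE variant, as a fraction in [0, 2]:
    mean of 2|y - yhat| / (|y| + |yhat|), with 0/0 terms counted as 0."""
    terms = []
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        terms.append(0.0 if denom == 0 else 2 * abs(t - p) / denom)
    return sum(terms) / len(terms)


# Bimodal toy targets: the lone value between the two modes is the rare one.
y_bimodal = [0.0, 0.1, 0.2, 5.0, 9.8, 9.9, 10.0]
phi_bimodal = knn_distance_relevance(y_bimodal, k=2)
print(phi_bimodal)
```

In this toy example the value 5.0, which lies between the two clusters, receives the highest relevance, even though it is neither the smallest nor the largest target. A relevance score based only on how extreme a value is would not flag it, which is the kind of bimodal case the abstract mentions.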

Suggested Citation

  • Daniel Daeyoung In & Hyunjoong Kim, 2025. "Distance-Based Relevance Function for Imbalanced Regression," Stats, MDPI, vol. 8(3), pages 1-14, June.
  • Handle: RePEc:gam:jstats:v:8:y:2025:i:3:p:53-:d:1689871

    Download full text from publisher

    File URL: https://www.mdpi.com/2571-905X/8/3/53/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2571-905X/8/3/53/
    Download Restriction: no


