Robust instance-dependent cost-sensitive classification

Authors

  • Simon De Vos

    (KU Leuven)

  • Toon Vanderschueren

    (KU Leuven)

  • Tim Verdonck

    (University of Antwerp; KU Leuven)

  • Wouter Verbeke

    (KU Leuven)

Abstract

Instance-dependent cost-sensitive (IDCS) learning methods have proven useful for binary classification tasks in which individual instances are associated with variable misclassification costs. However, through a series of experiments we show that IDCS methods are sensitive to noise and outliers in the instance-dependent misclassification costs, and that their performance depends strongly on the cost distribution of the data sample. We therefore propose a generic three-step framework to make IDCS methods more robust: (i) automatically detect outliers, (ii) correct outlying cost information in a data-driven way, and (iii) construct an IDCS learning method using the adjusted cost information. We apply this framework to cslogit, a logistic regression-based IDCS method, to obtain its robust version, which we name r-cslogit. Robustness is introduced in steps (i) and (ii), where robust estimators are used to detect and impute the outlying costs of individual instances. The newly proposed r-cslogit method is tested on synthetic and semi-synthetic data and shown to achieve higher savings than its non-robust counterpart across varying levels of noise and outliers. All code is available online at https://github.com/SimonDeVos/Robust-IDCS.
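The three-step framework can be illustrated with a minimal Python sketch. It is not the authors' r-cslogit implementation (their code is in the GitHub repository linked in the abstract), and every function name is hypothetical. Assumptions: step (i) flags costs with a simple median/MAD rule as a stand-in for the paper's robust estimators; step (ii) replaces flagged costs with a least-squares prediction of cost from the features, fitted on the non-flagged instances only; step (iii) fits a logistic model by gradient descent on an unpenalized average expected cost, with correct classifications assumed to cost nothing (the actual cslogit objective may differ in details such as regularization). "Savings" is assumed to mean the relative cost reduction compared with the cheaper of the two trivial classifiers (predict all negative or predict all positive).

```python
import numpy as np


def detect_cost_outliers(costs, threshold=3.0):
    """Step (i), hypothetical: flag costs whose robust z-score exceeds `threshold`.

    The median/MAD rule is a simple stand-in for the paper's robust estimators.
    """
    costs = np.asarray(costs, dtype=float)
    med = np.median(costs)
    mad = 1.4826 * np.median(np.abs(costs - med))  # matches the std. dev. under normality
    if mad == 0:
        return np.zeros(costs.shape, dtype=bool)
    return np.abs(costs - med) / mad > threshold


def impute_costs(X, costs, outliers):
    """Step (ii), hypothetical: replace flagged costs by a least-squares
    prediction of cost from the features, fitted on non-flagged instances."""
    costs = np.asarray(costs, dtype=float)
    X1 = np.column_stack([np.ones(len(costs)), X])  # add intercept column
    coef, *_ = np.linalg.lstsq(X1[~outliers], costs[~outliers], rcond=None)
    adjusted = costs.copy()
    adjusted[outliers] = X1[outliers] @ coef
    return adjusted


def fit_cs_logit(X, y, c_fn, c_fp, lr=0.1, n_iter=2000):
    """Step (iii), hypothetical: logistic model minimizing the average expected
    cost mean(y * (1 - p) * c_fn + (1 - y) * p * c_fp) by gradient descent.
    Correct classifications are assumed to cost nothing."""
    X1 = np.column_stack([np.ones(len(y)), X])
    # Dividing all costs by a positive constant leaves the minimizer unchanged
    scale = np.mean(np.concatenate([c_fn, c_fp])) or 1.0
    c_fn, c_fp = np.asarray(c_fn) / scale, np.asarray(c_fp) / scale
    w = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ w)))
        # Chain rule: d(cost)/dp = -y*c_fn + (1-y)*c_fp and dp/dz = p(1-p)
        grad_z = (-y * c_fn + (1 - y) * c_fp) * p * (1 - p)
        w -= lr * X1.T @ grad_z / len(y)
    return w


def predict(X, w, threshold=0.5):
    """Hard 0/1 predictions from the fitted weights."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return (1.0 / (1.0 + np.exp(-(X1 @ w))) > threshold).astype(float)


def savings(y, y_pred, c_fn, c_fp):
    """Relative cost reduction versus the cheaper trivial classifier."""
    cost = np.sum(y * (1 - y_pred) * c_fn + (1 - y) * y_pred * c_fp)
    base = min(np.sum(y * c_fn), np.sum((1 - y) * c_fp))
    return 1.0 - cost / base
```

Replacing flagged costs with feature-based predictions, rather than dropping the affected instances, keeps every labelled example in the training set; a closer match to the paper would swap the least-squares fit for a proper robust regression estimator.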

Suggested Citation

  • Simon De Vos & Toon Vanderschueren & Tim Verdonck & Wouter Verbeke, 2023. "Robust instance-dependent cost-sensitive classification," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(4), pages 1057-1079, December.
  • Handle: RePEc:spr:advdac:v:17:y:2023:i:4:d:10.1007_s11634-022-00533-3
    DOI: 10.1007/s11634-022-00533-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-022-00533-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11634-022-00533-3?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Howard D. Bondell, 2005. "Minimum distance estimation for the logistic regression model," Biometrika, Biometrika Trust, vol. 92(3), pages 724-731, September.
    2. Nikola Štefelová & Andreas Alfons & Javier Palarea-Albaladejo & Peter Filzmoser & Karel Hron, 2021. "Robust regression with compositional covariates including cellwise outliers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 869-909, December.
    3. Bergesio, Andrea & Yohai, Victor J., 2011. "Projection Estimators for Generalized Linear Models," Journal of the American Statistical Association, American Statistical Association, vol. 106(494), pages 661-671.
    4. Höppner, Sebastiaan & Baesens, Bart & Verbeke, Wouter & Verdonck, Tim, 2022. "Instance-dependent cost-sensitive learning for detecting transfer fraud," European Journal of Operational Research, Elsevier, vol. 297(1), pages 291-300.
    5. George Petrides & Darie Moldovan & Lize Coenen & Tias Guns & Wouter Verbeke, 2022. "Cost-sensitive learning for profit-driven credit scoring," Journal of the Operational Research Society, Taylor & Francis Journals, vol. 73(2), pages 338-350, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li, Zhe & Liang, Shuguang & Pan, Xianyou & Pang, Meng, 2024. "Credit risk prediction based on loan profit: Evidence from Chinese SMEs," Research in International Business and Finance, Elsevier, vol. 67(PA).
    2. Xing, Jin & Chi, Guotai & Pan, Ancheng, 2024. "Instance-dependent misclassification cost-sensitive learning for default prediction," Research in International Business and Finance, Elsevier, vol. 69(C).
    3. Bianco, Ana M. & Martínez, Elena, 2009. "Robust testing in the logistic regression model," Computational Statistics & Data Analysis, Elsevier, vol. 53(12), pages 4095-4105, October.
    4. Verbeke, Wouter & Olaya, Diego & Guerry, Marie-Anne & Van Belle, Jente, 2023. "To do or not to do? Cost-sensitive causal classification with individual treatment effect estimates," European Journal of Operational Research, Elsevier, vol. 305(2), pages 838-852.
    5. Ollinger, Michael & Houser, Matthew, 2020. "Ground beef recalls and subsequent food safety performance," Food Policy, Elsevier, vol. 97(C).
    6. Linke, Yuliana Yu., 2017. "Asymptotic normality of one-step M-estimators based on non-identically distributed observations," Statistics & Probability Letters, Elsevier, vol. 129(C), pages 216-221.
    7. Chi, Guotai & Dong, Bingjie & Zhou, Ying & Jin, Peng, 2024. "Long-horizon predictions of credit default with inconsistent customers," Technological Forecasting and Social Change, Elsevier, vol. 198(C).
    8. De Bock, Koen W. & Coussement, Kristof & Caigny, Arno De & Słowiński, Roman & Baesens, Bart & Boute, Robert N. & Choi, Tsan-Ming & Delen, Dursun & Kraus, Mathias & Lessmann, Stefan & Maldonado, Sebast, 2024. "Explainable AI for Operational Research: A defining framework, methods, applications, and a research agenda," European Journal of Operational Research, Elsevier, vol. 317(2), pages 249-272.
    9. Koen W. de Bock & Kristof Coussement & Arno De Caigny & Roman Slowiński & Bart Baesens & Robert N Boute & Tsan-Ming Choi & Dursun Delen & Mathias Kraus & Stefan Lessmann & Sebastián Maldonado & David , 2023. "Explainable AI for Operational Research: A Defining Framework, Methods, Applications, and a Research Agenda," Post-Print hal-04219546, HAL.
    10. Geng, Pei & Sakhanenko, Lyudmila, 2016. "Parameter estimation for the logistic regression model under case-control study," Statistics & Probability Letters, Elsevier, vol. 109(C), pages 168-177.
    11. Ana M. Bianco & Graciela Boente & Gonzalo Chebi, 2022. "Penalized robust estimators in sparse logistic regression," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(3), pages 563-594, September.
    12. Agostinelli, Claudio & Valdora, Marina & Yohai, Victor J., 2019. "Initial robust estimation in generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 134(C), pages 144-156.
    13. Pei Geng & Huyen Nguyen, 2024. "Parameter estimation for Logistic errors-in-variables regression under case–control studies," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 33(2), pages 661-684, April.
    14. Yashon O. Ouma & Ditiro B. Moalafhi & George Anderson & Boipuso Nkwae & Phillimon Odirile & Bhagabat P. Parida & Jiaguo Qi, 2022. "Dam Water Level Prediction Using Vector AutoRegression, Random Forest Regression and MLP-ANN Models Based on Land-Use and Climate Factors," Sustainability, MDPI, vol. 14(22), pages 1-31, November.
    15. Félix Vandervorst & Wouter Verbeke & Tim Verdonck, 2024. "Claims fraud detection with uncertain labels," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(1), pages 219-243, March.
    16. Ostrovski, Vladimir, 2022. "Testing equivalence to binary generalized linear models with application to logistic regression," Statistics & Probability Letters, Elsevier, vol. 191(C).
    17. Dries Cornilly & Lise Tubex & Stefan Van Aelst & Tim Verdonck, 2024. "Robust and sparse logistic regression," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 18(3), pages 663-679, September.
    18. Bianco, Ana M. & Boente, Graciela & Rodrigues, Isabel M., 2013. "Resistant estimators in Poisson and Gamma models with missing responses and an application to outlier detection," Journal of Multivariate Analysis, Elsevier, vol. 114(C), pages 209-226.
    19. Diao Guoqing & Ning Jing & Qin Jing, 2012. "Maximum Likelihood Estimation for Semiparametric Density Ratio Model," The International Journal of Biostatistics, De Gruyter, vol. 8(1), pages 1-29, June.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.