
Finding the Proverbial Needle: Improving Minority Class Identification Under Extreme Class Imbalance

Authors

  • Trent Geisler (United States Military Academy)
  • Herman Ray (Kennesaw State University)
  • Ying Xie (Kennesaw State University)

Abstract

Imbalanced learning problems typically consist of data with skewed class distributions, coupled with large misclassification costs for the rare events. For binary classification, logistic regression is a common supervised learning technique chosen to perform this task. Unfortunately, the model performs poorly on classification tasks when class distributions are highly imbalanced. To improve generalization in this setting, we implement a novel instance-level weighting methodology for the minority class in the loss function. We build our method from a recently published, locally weighted log-likelihood objective function, where each of the minority class weights is learned from the data. We improve upon this previous approach by creating a convex and hyperparameter-free loss function that improves generalization performance for datasets exhibiting extreme class imbalance.
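The abstract describes weighting individual minority class instances inside the logistic regression loss. The sketch below only illustrates that general idea and is not the authors' method: the paper learns the minority weights from the data, whereas here they are fixed by a simple inverse-frequency rule chosen as an assumption. The function names `minority_weights` and `weighted_log_loss` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def minority_weights(y):
    """Assumed rule (not from the paper): each minority (y=1) instance
    gets weight n_majority / n_minority; majority instances get 1."""
    n_min = sum(1 for yi in y if yi == 1)
    n_maj = len(y) - n_min
    if n_min == 0:
        raise ValueError("no minority instances to weight")
    ratio = n_maj / n_min
    return [ratio if yi == 1 else 1.0 for yi in y]


def weighted_log_loss(beta, X, y, w):
    """Negative instance-weighted log-likelihood of logistic regression.

    beta: [intercept, coef_1, ...]; X: list of feature rows;
    y: 0/1 labels; w: one non-negative weight per instance.
    """
    eps = 1e-12  # keeps log() finite for extreme probabilities
    total = 0.0
    for xi, yi, wi in zip(X, y, w):
        z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= wi * (yi * math.log(p) + (1 - yi) * math.log(1.0 - p))
    return total


# Toy data: three majority instances and one rare event.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1]
w = minority_weights(y)
loss_at_zero = weighted_log_loss([0.0, 0.0], X, y, w)
```

With fixed non-negative weights this objective remains convex in `beta`, so a standard gradient-based or Newton-type optimizer can minimize it. Replacing the fixed rule with learned weights, as the abstract describes, would require the paper's own formulation.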

Suggested Citation

  • Trent Geisler & Herman Ray & Ying Xie, 2023. "Finding the Proverbial Needle: Improving Minority Class Identification Under Extreme Class Imbalance," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 192-212, April.
  • Handle: RePEc:spr:jclass:v:40:y:2023:i:1:d:10.1007_s00357-023-09431-5
    DOI: 10.1007/s00357-023-09431-5

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-023-09431-5
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. King, Gary & Zeng, Langche, 2001. "Logistic Regression in Rare Events Data," Political Analysis, Cambridge University Press, vol. 9(2), pages 137-163, January.
    2. Manski, Charles F & Lerman, Steven R, 1977. "The Estimation of Choice Probabilities from Choice Based Samples," Econometrica, Econometric Society, vol. 45(8), pages 1977-1988, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lahiri, Kajal & Yang, Liu, 2013. "Forecasting Binary Outcomes," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 2, chapter 0, pages 1025-1106, Elsevier.
    2. Sarlin, Peter & von Schweinitz, Gregor, 2021. "Optimizing Policymakers’ Loss Functions In Crisis Prediction: Before, Within Or After?," Macroeconomic Dynamics, Cambridge University Press, vol. 25(1), pages 100-123, January.
    3. Springborn, Michael & Romagosa, Christina M. & Keller, Reuben P., 2011. "The value of nonindigenous species risk assessment in international trade," Ecological Economics, Elsevier, vol. 70(11), pages 2145-2153, September.
    4. Richard Fabling & Arthur Grimes & Lynda Sanderson, 2012. "Whatever next? Export market choices of New Zealand firms," Papers in Regional Science, Wiley Blackwell, vol. 91(1), pages 137-159, March.
    5. Stock, Ruth Maria & von Hippel, Eric & Gillert, Nils Lennart, 2016. "Impacts of personality traits on consumer innovation success," Research Policy, Elsevier, vol. 45(4), pages 757-769.
    6. Maalouf, Maher & Trafalis, Theodore B., 2011. "Robust weighted kernel logistic regression in imbalanced and rare events data," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 168-183, January.
    7. Jonathan P. Kastellec & Jeffrey R. Lax, 2008. "Case Selection and the Study of Judicial Politics," Journal of Empirical Legal Studies, John Wiley & Sons, vol. 5(3), pages 407-446, September.
    8. Carole Comerton-Forde & Tālis J. Putniņš, 2014. "Stock Price Manipulation: Prevalence and Determinants," Review of Finance, European Finance Association, vol. 18(1), pages 23-66.
    9. Jiang, Guoliang F. & Holburn, Guy L.F. & Beamish, Paul W., 2014. "The Impact of Vicarious Experience on Foreign Location Strategy," Journal of International Management, Elsevier, vol. 20(3), pages 345-358.
    10. Saarimaa, Tuukka & Tukiainen, Janne, 2012. "Politics in coalition formation of local governments," LSE Research Online Documents on Economics 58528, London School of Economics and Political Science, LSE Library.
    11. Miyakawa, Daisuke & Oikawa, Koki & Ueda, Kozo, 2021. "Firm Exit during the COVID-19 Pandemic: Evidence from Japan," Journal of the Japanese and International Economies, Elsevier, vol. 59(C).
    12. Alfredo Jiménez & David Fuente, 2016. "Learning from Others: the Impact of Vicarious Experience on the Psychic Distance and FDI Relationship," Management International Review, Springer, vol. 56(5), pages 633-664, October.
    13. Avanzini, Diego & Martínez, Juan Francisco & Pérez, Víctor, 2020. "Assessing mortgage default risk in full-recourse economies, with an application to the case of Chile," Latin American Journal of Central Banking (previously Monetaria), Elsevier, vol. 1(1).
    14. Jasjit Singh, 2005. "Collaborative Networks as Determinants of Knowledge Diffusion Patterns," Management Science, INFORMS, vol. 51(5), pages 756-770, May.
    15. Ahmed, M.S. & Attouch, M.K. & Dabo-Niang, S., 2018. "Binary functional linear models under choice-based sampling," Econometrics and Statistics, Elsevier, vol. 7(C), pages 134-152.
    16. Carpenter, Jeffrey & Myers, Caitlin Knowles, 2010. "Why volunteer? Evidence on the role of altruism, image, and incentives," Journal of Public Economics, Elsevier, vol. 94(11-12), pages 911-920, December.
    17. Olav Sorenson & Jan W. Rivkin & Lee Fleming, 2010. "Complexity, Networks and Knowledge Flows," Chapters, in: Ron Boschma & Ron Martin (ed.), The Handbook of Evolutionary Economic Geography, chapter 15, Edward Elgar Publishing.
    18. Antonia Köster & Christian Matt & Thomas Hess, 2021. "Do All Roads Lead to Rome? Exploring the Relationship Between Social Referrals, Referral Propensity and Stickiness to Video-on-Demand Websites," Business & Information Systems Engineering: The International Journal of WIRTSCHAFTSINFORMATIK, Springer;Gesellschaft für Informatik e.V. (GI), vol. 63(4), pages 349-366, August.
    19. Abhirup Chakrabarti & Will Mitchell, 2013. "The Persistent Effect of Geographic Distance in Acquisition Target Selection," Organization Science, INFORMS, vol. 24(6), pages 1805-1826, December.
    20. Jungpil Hahn & Jae Yun Moon & Chen Zhang, 2008. "Emergence of New Project Teams from Open Source Software Developer Networks: Impact of Prior Collaboration Ties," Information Systems Research, INFORMS, vol. 19(3), pages 369-391, September.
