
Comparing Random Forest with Logistic Regression for Predicting Class-Imbalanced Civil War Onset Data

Authors

  • Muchlinski, David
  • Siroky, David
  • He, Jingrui
  • Kocher, Matthew

Abstract

The most commonly used statistical models of civil war onset fail to correctly predict most occurrences of this rare event in out-of-sample data. Statistical methods for the analysis of binary data, such as logistic regression, even in their rare event and regularized forms, perform poorly at prediction. We compare the performance of Random Forests with three versions of logistic regression (classic logistic regression, Firth rare events logistic regression, and L1-regularized logistic regression), and find that the algorithmic approach provides significantly more accurate predictions of civil war onset in out-of-sample data than any of the logistic regression models. The article discusses these results and the ways in which algorithmic statistical methods like Random Forests can be useful to more accurately predict rare events in conflict data.
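
As a rough illustration of why class imbalance makes prediction hard to evaluate, the sketch below scores two made-up classifiers on a rare binary outcome. It is not the authors' method or data: the 1% onset rate, the 1000 observations, and both sets of predictions are hypothetical values chosen only to show that a model can reach high accuracy while missing every onset.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 marks an onset."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of all observations classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Share of true onsets that the classifier flags."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical data: 1000 observations, 10 of them onsets (1% base rate).
y_true = [1] * 10 + [0] * 990

# Hypothetical classifier A: never predicts an onset.
pred_never = [0] * 1000

# Hypothetical classifier B: flags 6 of the 10 onsets, with 30 false alarms.
pred_flagging = [1] * 6 + [0] * 4 + [1] * 30 + [0] * 960

acc_never = accuracy(y_true, pred_never)
rec_never = recall(y_true, pred_never)
acc_flagging = accuracy(y_true, pred_flagging)
rec_flagging = recall(y_true, pred_flagging)

print(f"never-predict: accuracy={acc_never:.3f} recall={rec_never:.3f}")
print(f"flagging:      accuracy={acc_flagging:.3f} recall={rec_flagging:.3f}")
```

In this toy setup the classifier that never predicts an onset has the higher accuracy but zero recall, which is why comparisons on rare-event data usually report measures that focus on the positive class rather than accuracy alone.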

Suggested Citation

  • Muchlinski, David & Siroky, David & He, Jingrui & Kocher, Matthew, 2016. "Comparing Random Forest with Logistic Regression for Predicting Class-Imbalanced Civil War Onset Data," Political Analysis, Cambridge University Press, vol. 24(1), pages 87-103, January.
  • Handle: RePEc:cup:polals:v:24:y:2016:i:1:p:87-103_9

    Download full text from publisher

    File URL: https://www.cambridge.org/core/product/identifier/S1047198700012055/type/journal_article
    File Function: link to article abstract page
    Download Restriction: no

    Citations

    Cited by:

    1. Songul Cinaroglu, 2020. "Modelling unbalanced catastrophic health expenditure data by using machine‐learning methods," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 27(4), pages 168-181, October.
    2. Liam F. Beiser-McGrath & Robert A. Huber, 2018. "Assessing the relative importance of psychological and demographic factors for predicting climate and environmental attitudes," Climatic Change, Springer, vol. 149(3), pages 335-347, August.
    3. David Siroky & Carolyn M. Warner & Gabrielle Filip-Crawford & Anna Berlin & Steven L. Neuberg, 2020. "Grievances and rebellion: Comparing relative deprivation and horizontal inequality," Conflict Management and Peace Science, Peace Science Society (International), vol. 37(6), pages 694-715, November.
    4. Abdel Latef Anouze & Imad Bou-Hamad, 2021. "Inefficiency source tracking: evidence from data envelopment analysis and random forests," Annals of Operations Research, Springer, vol. 306(1), pages 273-293, November.
    5. Mark Musumba & Naureen Fatema & Shahriar Kibriya, 2021. "Prevention Is Better Than Cure: Machine Learning Approach to Conflict Prediction in Sub-Saharan Africa," Sustainability, MDPI, vol. 13(13), pages 1-18, July.
    6. Zhaochen He & John Camobreco & Keith Perkins, 2022. "How he won: Using machine learning to understand Trump’s 2016 victory," Journal of Computational Social Science, Springer, vol. 5(1), pages 905-947, May.
    7. Gallego, Jorge & Rivero, Gonzalo & Martínez, Juan, 2021. "Preventing rather than punishing: An early warning model of malfeasance in public procurement," International Journal of Forecasting, Elsevier, vol. 37(1), pages 360-377.
    8. John Cuffe & Sudip Bhattacharjee & Ugochukwu Etudo & Justin C. Smith & Nevada Basdeo & Nathaniel Burbank & Shawn R. Roberts, 2019. "Using Public Data to Generate Industrial Classification Codes," NBER Chapters, in: Big Data for Twenty-First-Century Economic Statistics, pages 229-246, National Bureau of Economic Research, Inc.
    9. Hofman, Jake M. & Goldstein, Daniel G. & Sen, Siddhartha & Poursabzi-Sangdeh, Forough & Allen, Jennifer & Dong, Ling Liang & Fried, Brenda & Gaur, Harpreet & Hoq, Adnan & Mbazor, Emeka & Moreira, Naom, 2021. "Expanding the scope of reproducibility research through data analysis replications," Organizational Behavior and Human Decision Processes, Elsevier, vol. 164(C), pages 192-202.
    10. Ku, Arthur Lin & Qiu, Yueming (Lucy) & Lou, Jiehong & Nock, Destenie & Xing, Bo, 2022. "Changes in hourly electricity consumption under COVID mandates: A glance to future hourly residential power consumption pattern with remote work in Arizona," Applied Energy, Elsevier, vol. 310(C).
    11. Antonietta di Salvatore & Mirko Moscatelli, 2024. "Improving survey information on household debt using granular credit databases," Questioni di Economia e Finanza (Occasional Papers) 839, Bank of Italy, Economic Research and International Relations Area.
    12. Phil Henrickson, 2020. "Predicting the costs of war," The Journal of Defense Modeling and Simulation, vol. 17(3), pages 285-308, July.
    13. Vestby, Jonas & Buhaug, Halvard & von Uexkull, Nina, 2021. "Why do some poor countries see armed conflict while others do not? A dual sector approach," World Development, Elsevier, vol. 138(C).
    14. Marie K. Schellens & Salim Belyazid, 2020. "Revisiting the Contested Role of Natural Resources in Violent Conflict Risk through Machine Learning," Sustainability, MDPI, vol. 12(16), pages 1-29, August.
    15. Güneş Murat Tezcür & Clayton Besaw, 2020. "Jihadist waves: Syria, the Islamic State, and the changing nature of foreign fighters," Conflict Management and Peace Science, Peace Science Society (International), vol. 37(2), pages 215-231, March.
    16. Felix Ettensperger, 2020. "Comparing supervised learning algorithms and artificial neural networks for conflict prediction: performance and applicability of deep learning in the field," Quality & Quantity: International Journal of Methodology, Springer, vol. 54(2), pages 567-601, April.
    17. Freire, Danilo, 2021. "Democratizing Policy Analytics with AutoML," Working Papers 11015, George Mason University, Mercatus Center.
