
Bring More Data!—A Good Advice? Removing Separation in Logistic Regression by Increasing Sample Size

Authors

Listed:
  • Hana Šinkovec

    (Institute of Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems (CEMSIIS), Spitalgasse 23, 1090 Vienna, Austria)

  • Angelika Geroldinger

    (Institute of Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems (CEMSIIS), Spitalgasse 23, 1090 Vienna, Austria)

  • Georg Heinze

    (Institute of Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems (CEMSIIS), Spitalgasse 23, 1090 Vienna, Austria)

Abstract

The parameters of logistic regression models are usually obtained by the method of maximum likelihood (ML). However, in analyses of small data sets or data sets with unbalanced outcomes or exposures, ML parameter estimates may not exist. This situation has been termed ‘separation’ as the two outcome groups are separated by the values of a covariate or a linear combination of covariates. To overcome the problem of non-existing ML parameter estimates, applying Firth’s correction (FC) was proposed. In practice, however, a principal investigator might be advised to ‘bring more data’ in order to solve a separation issue. We illustrate the problem by means of examples from colorectal cancer screening and ornithology. It is unclear if such an increasing sample size (ISS) strategy that keeps sampling new observations until separation is removed improves estimation compared to applying FC to the original data set. We performed an extensive simulation study where the main focus was to estimate the cost-adjusted relative efficiency of ML combined with ISS compared to FC. FC yielded reasonably small root mean squared errors and proved to be the more efficient estimator. Given our findings, we propose not to adapt the sample size when separation is encountered but to use FC as the default method of analysis whenever the number of observations or outcome events is critically low.
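To make the notion of separation concrete, the following is a minimal sketch, not the authors' simulation code. It assumes a hypothetical four-point data set and a logistic model with a single slope and no intercept, and it uses a plain grid search instead of a proper optimizer. Under these assumptions the ordinary log-likelihood keeps increasing as the slope grows, so no finite ML estimate exists, while the Firth-type penalized log-likelihood (log-likelihood plus half the log of the Fisher information) peaks at a finite slope.

```python
import math

# Hypothetical toy data (an assumption for illustration): every x < 0 has y = 0
# and every x > 0 has y = 1, so the covariate perfectly separates the outcomes.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]


def log_lik(beta):
    """Log-likelihood of a no-intercept logistic model with slope beta."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = beta * xi
        # Numerically stable log(1 + exp(eta)).
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += yi * eta - log1pexp
    return total


def fisher_info(beta):
    """Fisher information for the slope: sum of x_i^2 * p_i * (1 - p_i)."""
    total = 0.0
    for xi in x:
        e = math.exp(-abs(beta * xi))
        # p * (1 - p) written in a form that avoids rounding p to exactly 1.
        total += xi * xi * e / (1.0 + e) ** 2
    return total


def firth_log_lik(beta):
    """Penalized log-likelihood: log L(beta) + 0.5 * log I(beta)."""
    return log_lik(beta) + 0.5 * math.log(fisher_info(beta))


# Crude grid search over slopes 0.00, 0.01, ..., 10.00 (grid is an assumption).
grid = [i / 100 for i in range(0, 1001)]
ml_values = [log_lik(b) for b in grid]
firth_values = [firth_log_lik(b) for b in grid]

# Under separation the ML log-likelihood rises all the way to the grid edge.
ml_increasing = all(a < b for a, b in zip(ml_values, ml_values[1:]))
ml_beta = grid[max(range(len(grid)), key=ml_values.__getitem__)]

# The penalized log-likelihood attains its maximum at an interior grid point.
firth_beta = grid[max(range(len(grid)), key=firth_values.__getitem__)]

print("ML log-likelihood strictly increasing on grid:", ml_increasing)
print("ML maximizer on grid (sits at the boundary):", ml_beta)
print("Firth-type maximizer on grid:", firth_beta)
```

Extending the grid only pushes the ML maximizer further out, which is the numerical signature of a non-existing ML estimate; the penalized maximizer stays where it is.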

Suggested Citation

  • Hana Šinkovec & Angelika Geroldinger & Georg Heinze, 2019. "Bring More Data!—A Good Advice? Removing Separation in Logistic Regression by Increasing Sample Size," IJERPH, MDPI, vol. 16(23), pages 1-12, November.
  • Handle: RePEc:gam:jijerp:v:16:y:2019:i:23:p:4658-:d:289973

    Download full text from publisher

    File URL: https://www.mdpi.com/1660-4601/16/23/4658/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1660-4601/16/23/4658/
    Download Restriction: no

    References listed on IDEAS

    1. Rousseeuw, Peter J. & Christmann, Andreas, 2003. "Robustness against separation and outliers in logistic regression," Computational Statistics & Data Analysis, Elsevier, vol. 43(3), pages 315-332, July.
    2. King, Gary & Zeng, Langche, 2001. "Logistic Regression in Rare Events Data," Political Analysis, Cambridge University Press, vol. 9(2), pages 137-163, January.

    Citations



    Cited by:

    1. Domenica Matranga & Filippa Bono & Laura Maniscalco, 2021. "Statistical Advances in Epidemiology and Public Health," IJERPH, MDPI, vol. 18(7), pages 1-5, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. F. Gauthier & D. Germain & B. Hétu, 2017. "Logistic models as a forecasting tool for snow avalanches in a cold maritime climate: northern Gaspésie, Québec, Canada," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 89(1), pages 201-232, October.
    2. Douglas Cumming & Lars Hornuf & Moein Karami & Denis Schweizer, 2023. "Disentangling Crowdfunding from Fraudfunding," Journal of Business Ethics, Springer, vol. 182(4), pages 1103-1128, February.
    3. Cristian David Correa-Álvarez & Juan Carlos Salazar-Uribe & Luis Raúl Pericchi-Guerra, 2023. "Bayesian multilevel logistic regression models: a case study applied to the results of two questionnaires administered to university students," Computational Statistics, Springer, vol. 38(4), pages 1791-1810, December.
    4. Eunae Yoo & Elliot Rabinovich & Bin Gu, 2020. "The Growth of Follower Networks on Social Media Platforms for Humanitarian Operations," Production and Operations Management, Production and Operations Management Society, vol. 29(12), pages 2696-2715, December.
    5. Lo Turco, Alessia & Maggioni, Daniela, 2018. "Effects of Islamic religiosity on bilateral trust in trade: The case of Turkish exports," Journal of Comparative Economics, Elsevier, vol. 46(4), pages 947-965.
    6. Blackman, Allen & Guerrero, Santiago, 2012. "What drives voluntary eco-certification in Mexico?," Journal of Comparative Economics, Elsevier, vol. 40(2), pages 256-268.
    7. Alessandra Iannamorelli & Stefano Nobili & Antonio Scalia & Luana Zaccaria, 2024. "Asymmetric Information and Corporate Lending: Evidence from SME Bond Markets," Review of Finance, European Finance Association, vol. 28(1), pages 163-201.
    8. Mehrez Ben Slama & Dhafer Saidane & Hassouna Fedhila, 2012. "How to identify targets in the M&A banking operations? Case of cross-border strategies in Europe by line of activity," Review of Quantitative Finance and Accounting, Springer, vol. 38(2), pages 209-240, February.
    9. Lorenzo Cassi & Anne Plunket, 2014. "Proximity, network formation and inventive performance: in search of the proximity paradox," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 53(2), pages 395-422, September.
    10. Xinfu Xing & Chenglong Wu & Jinhui Li & Xueyou Li & Limin Zhang & Rongjie He, 2021. "Susceptibility assessment for rainfall-induced landslides using a revised logistic regression method," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 106(1), pages 97-117, March.
    11. Hwang, Seokyoun & Sarath, Bharat & Han, Seung-youb, 2022. "Auditor independence: The effect of auditors’ quality control efforts and corporate governance," Journal of International Accounting, Auditing and Taxation, Elsevier, vol. 47(C).
    12. Lahiri, Kajal & Yang, Liu, 2013. "Forecasting Binary Outcomes," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 2, chapter 0, pages 1025-1106, Elsevier.
    13. Eling, Martin & Jia, Ruo, 2018. "Business failure, efficiency, and volatility: Evidence from the European insurance industry," International Review of Financial Analysis, Elsevier, vol. 59(C), pages 58-76.
    14. Andrews, RJ & Fazio, Catherine & Guzman, Jorge & Liu, Yupeng & Stern, Scott, 2022. "The Startup Cartography Project: Measuring and mapping entrepreneurial ecosystems," Research Policy, Elsevier, vol. 51(2).
    15. Tom Broekel & Wladimir Mueller, 2018. "Critical links in knowledge networks – What about proximities and gatekeeper organisations?," Industry and Innovation, Taylor & Francis Journals, vol. 25(10), pages 919-939, November.
    16. Claudio Lucifora & Daria Vigani, 2022. "What if your boss is a woman? Evidence on gender discrimination at the workplace," Review of Economics of the Household, Springer, vol. 20(2), pages 389-417, June.
    17. Valérie Revest & Alessandro Sapio, 2016. "Graduation and sell-out strategies in the Alternative Investment Market," Discussion Papers 4_2016, CRISEI, University of Naples "Parthenope", Italy.
    18. Andy Lardon & Marc Deloof, 2014. "Financial disclosure by SMEs listed on a semi-regulated market: evidence from the Euronext Free Market," Small Business Economics, Springer, vol. 42(2), pages 361-385, February.
    19. Sarlin, Peter & von Schweinitz, Gregor, 2021. "Optimizing Policymakers’ Loss Functions In Crisis Prediction: Before, Within Or After?," Macroeconomic Dynamics, Cambridge University Press, vol. 25(1), pages 100-123, January.
    20. Paul Collier & Anke Hoeffler, 2004. "Greed and grievance in civil war," Oxford Economic Papers, Oxford University Press, vol. 56(4), pages 563-595, October.
