Printed from https://ideas.repec.org/a/eee/csdana/v115y2017icp35-52.html

Should we impute or should we weight? Examining the performance of two CART-based techniques for addressing missing data in small sample research with nonnormal variables

Author

Listed:
  • Hayes, Timothy
  • McArdle, John J.

Abstract

Recently, researchers have proposed a variety of new methods for employing exploratory data mining algorithms to address missing data. Two promising classes of missing data methods take advantage of classification and regression trees and random forests. A first method uses the predicted probabilities of response (vs. non-response) generated by a CART analysis to create inverse probability weights. This method has been shown to perform well in prior simulations when nonresponse was generated by tree-based structures, even under low sample sizes. A second method uses the values falling in terminal nodes of CART trees to generate multiple imputations. In prior studies, these methods performed well at estimating main effects and interactions in regression models when sample sizes were large (N=1000), but their performance was not evaluated under small sample conditions. In the present research, we assess the performance of CART-based weights and CART-based imputations under low sample sizes (N=125 or 250) and nonnormality when missing data are generated by smooth functions (linear, quadratic, cubic, interactive). Results suggest that random forest weights excel under low sample sizes, regardless of nonnormality, whereas CART multiple imputation is more efficient with larger samples (N=500 or 1000).
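The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a single predictor and a depth-one tree (one split) in place of a full CART or random forest, and the function names (best_split, node_weights, impute_once) and the toy data are hypothetical. Weighting uses the response rate in each terminal node as the estimated probability of response; imputation draws a donor value at random from the observed outcomes in the same terminal node.

```python
import random

def best_split(x, target):
    """Threshold on x minimizing the summed within-node squared error of target.

    For a 0/1 target this criterion is equivalent to minimizing Gini impurity.
    """
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_t, best_cost = None, float("inf")
    for t in sorted(set(x))[:-1]:  # dropping the maximum keeps both nodes non-empty
        left = [v for xi, v in zip(x, target) if xi <= t]
        right = [v for xi, v in zip(x, target) if xi > t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

def node_weights(x, responded):
    """Inverse probability weights from terminal-node response rates."""
    t = best_split(x, responded)
    weights = []
    for xi, r in zip(x, responded):
        node = [rj for xj, rj in zip(x, responded) if (xj <= t) == (xi <= t)]
        p_response = sum(node) / len(node)
        # Nonrespondents get weight 0; respondents get 1 / P(response | node).
        weights.append(1.0 / p_response if r else 0.0)
    return weights

def impute_once(x, y, rng):
    """One completed data set: each missing y is drawn from same-node donors."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    t = best_split([xi for xi, _ in observed], [yi for _, yi in observed])
    completed = []
    for xi, yi in zip(x, y):
        if yi is None:
            donors = [yo for xo, yo in observed if (xo <= t) == (xi <= t)]
            yi = rng.choice(donors)
        completed.append(yi)
    return completed

# Toy data (hypothetical): y is missing more often at larger x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, None, 3.1, None, None, 2.9]
responded = [0 if yi is None else 1 for yi in y]

weights = node_weights(x, responded)
rng = random.Random(0)
imputations = [impute_once(x, y, rng) for _ in range(5)]  # m = 5 completed data sets
```

In this toy example the respondents' weights sum to the full sample size, which is what response-rate weighting within nodes is meant to achieve. A realistic analysis would grow deeper trees on many predictors and pool estimates across the imputed data sets.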

Suggested Citation

  • Hayes, Timothy & McArdle, John J., 2017. "Should we impute or should we weight? Examining the performance of two CART-based techniques for addressing missing data in small sample research with nonnormal variables," Computational Statistics & Data Analysis, Elsevier, vol. 115(C), pages 35-52.
  • Handle: RePEc:eee:csdana:v:115:y:2017:i:c:p:35-52
    DOI: 10.1016/j.csda.2017.05.006

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947317301019
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. van Buuren, Stef & Groothuis-Oudshoorn, Karin, 2011. "mice: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i03).
    2. Allen Fleishman, 1978. "A method for simulating non-normal distributions," Psychometrika, Springer;The Psychometric Society, vol. 43(4), pages 521-532, December.
    3. Doove, L.L. & Van Buuren, S. & Dusseldorp, E., 2014. "Recursive partitioning for missing data imputation in the presence of interaction effects," Computational Statistics & Data Analysis, Elsevier, vol. 72(C), pages 92-104.
    4. Bengt Muthén & David Kaplan & Michael Hollis, 1987. "On structural equation modeling with data that are not missing completely at random," Psychometrika, Springer;The Psychometric Society, vol. 52(3), pages 431-462, September.
    5. Oberski, Daniel, 2014. "lavaan.survey: An R Package for Complex Survey Analysis of Structural Equation Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 57(i01).
    6. Rosseel, Yves, 2012. "lavaan: An R Package for Structural Equation Modeling," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 48(i02).

    Citations

    Cited by:

    1. Satoshi Usami & Ross Jacobucci & Timothy Hayes, 2019. "The performance of latent growth curve model-based structural equation model trees to uncover population heterogeneity in growth trajectories," Computational Statistics, Springer, vol. 34(1), pages 1-22, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bence Csaba Farkas & Valérian Chambon & Pierre O. Jacquet, 2022. "Do perceived control and time orientation mediate the effect of early life adversity on reproductive behaviour and health status? Insights from the European Value Study and the European Social Survey," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-14, December.
    2. Georges Steffgen & Philipp E. Sischka & Martha Fernandez de Henestrosa, 2020. "The Quality of Work Index and the Quality of Employment Index: A Multidimensional Approach of Job Quality and Its Links to Well-Being at Work," IJERPH, MDPI, vol. 17(21), pages 1-31, October.
    3. Esef Hakan Toytok & Sungur Gürel, 2019. "Does Project Children’s University Increase Academic Self-Efficacy in 6th Graders? A Weak Experimental Design," Sustainability, MDPI, vol. 11(3), pages 1-12, February.
    4. Pezzuti, Lina & Tommasi, Marco & Saggino, Aristide & Dawe, James & Lauriola, Marco, 2020. "Gender differences and measurement bias in the assessment of adult intelligence: Evidence from the Italian WAIS-IV and WAIS-R standardizations," Intelligence, Elsevier, vol. 79(C).
    5. Julia Morgan & Casey Canfield, 2021. "Comparing Behavioral Theories to Predict Consumer Interest to Participate in Energy Sharing," Sustainability, MDPI, vol. 13(14), pages 1-17, July.
    6. Reynolds, J.P. & Pilling, M. & Marteau, T.M., 2018. "Communicating quantitative evidence of policy effectiveness and support for the policy: Three experimental studies," Social Science & Medicine, Elsevier, vol. 218(C), pages 1-12.
    7. Youngjoo Cho & Debashis Ghosh, 2021. "Quantile-Based Subgroup Identification for Randomized Clinical Trials," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(1), pages 90-128, April.
    8. Kate R. Pawloski & Betty Kolod & Rabeea F. Khan & Vishal Midya & Tania Chen & Adeyemi Oduwole & Bernard Camins & Elena Colicino & I. Michael Leitman & Ismail Nabeel & Kristin Oliver & Damaskini Valvi, 2021. "Factors Associated with SARS-CoV-2 Infection in Physician Trainees in New York City during the First COVID-19 Wave," IJERPH, MDPI, vol. 18(10), pages 1-15, May.
    9. Ana Isabel Maldonado & Carol B. Cunradi & Anna María Nápoles, 2020. "Racial/Ethnic Discrimination and Intimate Partner Violence Perpetration in Latino Men: The Mediating Effects of Mental Health," IJERPH, MDPI, vol. 17(21), pages 1-17, November.
    10. Liu, Yuqing & Schuberth, Florian & Liu, Yide & Henseler, Jörg, 2022. "Modeling and assessing forged concepts in tourism and hospitality using confirmatory composite analysis," Journal of Business Research, Elsevier, vol. 152(C), pages 221-230.
    11. Sebastian Kurten & David Winant & Kathleen Beullens, 2021. "Mothers Matter: Using Regression Tree Algorithms to Predict Adolescents’ Sharing of Drunk References on Social Media," IJERPH, MDPI, vol. 18(21), pages 1-16, October.
    12. Johannes Bodo Heekerens & Kathrin Heinitz, 2019. "Looking Forward: The Effect of the Best-Possible-Self Intervention on Thriving Through Relative Intrinsic Goal Pursuits," Journal of Happiness Studies, Springer, vol. 20(5), pages 1379-1395, June.
    13. Yen Lee & David Kaplan, 2018. "Generating Multivariate Ordinal Data via Entropy Principles," Psychometrika, Springer;The Psychometric Society, vol. 83(1), pages 156-181, March.
    14. Humera Razzak & Christian Heumann, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
    15. Eva Spiritus-Beerden & An Verelst & Ines Devlieger & Nina Langer Primdahl & Fábio Botelho Guedes & Antonio Chiarenza & Stephanie De Maesschalck & Natalie Durbeej & Rocío Garrido & Margarida Gaspar de , 2021. "Mental Health of Refugees and Migrants during the COVID-19 Pandemic: The Role of Experienced Discrimination and Daily Stressors," IJERPH, MDPI, vol. 18(12), pages 1-14, June.
    16. Razzak Humera & Heumann Christian, 2019. "Hybrid Multiple Imputation In A Large Scale Complex Survey," Statistics in Transition New Series, Polish Statistical Association, vol. 20(4), pages 33-58, December.
    17. Sheppard, Leah D. & O'Reilly, Jane & van Dijke, Marius & Restubog, Simon Lloyd D. & Aquino, Karl, 2020. "The stress-relieving benefits of positively experienced social sexual behavior in the workplace," Organizational Behavior and Human Decision Processes, Elsevier, vol. 156(C), pages 38-52.
    18. Pierre O. Jacquet & Farid Pazhoohi & Charles Findling & Hugo Mell & Coralie Chevallier & Nicolas Baumard, 2021. "Predictive modeling of religiosity, prosociality, and moralizing in 295,000 individuals from European and non-European populations," Palgrave Communications, Palgrave Macmillan, vol. 8(1), pages 1-12, December.
    19. Marjolein C. J. Caniëls & Wim Lambrechts & Johannes (Joost) Platje & Anna Motylska-Kuźma & Bartosz Fortuński, 2021. "50 Shades of Green: Insights into Personal Values and Worldviews as Drivers of Green Purchasing Intention, Behaviour, and Experience," Sustainability, MDPI, vol. 13(8), pages 1-18, April.
    20. Eichelberger, Dominique A. & Sticca, Fabio & Kübler, Dinah R. & Kakebeeke, Tanja H. & Caflisch, Jon A. & Jenni, Oskar G. & Wehrle, Flavia M., 2023. "Stability of mental abilities and physical growth from 6 months to 65 years: Findings from the Zurich Longitudinal Studies," Intelligence, Elsevier, vol. 97(C).
