IDEAS home Printed from https://ideas.repec.org/a/spr/stmapp/v32y2023i2d10.1007_s10260-022-00658-x.html

Predictions of machine learning with mixed-effects in analyzing longitudinal data under model misspecification

Authors

  • Shuwen Hu (Queensland University of Technology; CSIRO Agriculture & Food)
  • You-Gan Wang (Queensland University of Technology; Australian Catholic University)
  • Christopher Drovandi (Queensland University of Technology)
  • Taoyun Cao (Guangdong University of Finance and Economics)

Abstract

We consider prediction in longitudinal studies and investigate the well-known statistical mixed-effects model, the piecewise linear mixed-effects model, and six popular machine learning approaches: decision trees, bagging, random forest, boosting, support-vector machine, and neural network. To account for correlated data in machine learning, random effects are incorporated into the traditional tree methods and random forest. Our focus is the performance of statistical modelling and machine learning, especially when the fixed effects and the random effects are misspecified. Extensive simulation studies have been carried out to evaluate performance using a number of criteria. Two real datasets from longitudinal studies are analysed to demonstrate our findings. The R code and dataset are freely available at https://github.com/shuwen92/MEML.
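
As a rough illustration of how random effects can be combined with a tree method, the sketch below alternates between fitting a tree to the response with the current random intercepts removed and re-estimating each subject's intercept from the residuals. It is not the authors' R implementation. It assumes a random-intercept-only model, a single covariate, a one-split tree (stump) as the fixed part, and a known variance ratio `ratio` (residual variance divided by random-intercept variance) rather than one estimated from the data. All function and parameter names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_stump(x, y):
    """Fit a one-split regression tree on one covariate by minimising SSE."""
    best = (float("inf"), None, mean(y), mean(y))
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= thr]
        right = [yi for xi, yi in zip(x, y) if xi > thr]
        ml, mr = mean(left), mean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if sse < best[0]:
            best = (sse, thr, ml, mr)
    return best[1:]  # (threshold, left mean, right mean)


def predict_stump(stump, xi):
    thr, ml, mr = stump
    if thr is None:  # no split was possible
        return ml
    return ml if xi <= thr else mr


def fit_mixed_stump(x, y, groups, ratio=1.0, n_iter=20):
    """Alternate between the tree fit and shrunken random-intercept updates.

    ratio is an assumed, fixed value of residual variance / intercept variance.
    """
    b = defaultdict(float)  # random intercept per subject, starting at zero
    for _ in range(n_iter):
        # Step 1: remove current random intercepts, then fit the fixed part.
        adjusted = [yi - b[g] for yi, g in zip(y, groups)]
        stump = fit_stump(x, adjusted)
        # Step 2: predict each intercept from that subject's residuals,
        # shrinking toward zero (BLUP for a random intercept, known ratio).
        sums, counts = defaultdict(float), defaultdict(int)
        for xi, yi, g in zip(x, y, groups):
            sums[g] += yi - predict_stump(stump, xi)
            counts[g] += 1
        b = defaultdict(float, {g: sums[g] / (counts[g] + ratio) for g in sums})
    return stump, dict(b)


# Toy longitudinal data: three subjects, four visits each, a step at x = 1.5,
# and subject-level shifts of +2, -2 and 0.
x = [0, 1, 2, 3] * 3
groups = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
shift = {"A": 2.0, "B": -2.0, "C": 0.0}
y = [(10.0 if xi > 1.5 else 0.0) + shift[g] for xi, g in zip(x, groups)]

stump, effects = fit_mixed_stump(x, y, groups)
```

On this toy data the stump recovers the step in the covariate, while the estimated intercepts in `effects` keep the sign of each subject's shift but are shrunk toward zero. A prediction for a new visit of a known subject would add that subject's intercept to `predict_stump(stump, xi)`. A full mixed-effects tree or forest would also estimate the variance components at each iteration instead of fixing `ratio`.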

Suggested Citation

  • Shuwen Hu & You-Gan Wang & Christopher Drovandi & Taoyun Cao, 2023. "Predictions of machine learning with mixed-effects in analyzing longitudinal data under model misspecification," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(2), pages 681-711, June.
  • Handle: RePEc:spr:stmapp:v:32:y:2023:i:2:d:10.1007_s10260-022-00658-x
    DOI: 10.1007/s10260-022-00658-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10260-022-00658-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Francis K. C. Hui & Samuel Müller & Alan H. Welsh, 2021. "Random Effects Misspecification Can Have Severe Consequences for Random Effects Inference in Linear Mixed Models," International Statistical Review, International Statistical Institute, vol. 89(1), pages 186-206, April.
    2. Steffen Nestler & Sarah Humberg, 2022. "A Lasso and a Regression Tree Mixed-Effect Model with Random Effects for the Level, the Residual Variance, and the Autocorrelation," Psychometrika, Springer;The Psychometric Society, vol. 87(2), pages 506-532, June.
    3. You-Gan Wang & Yuning Zhao, 2007. "A Modified Pseudolikelihood Approach for Analysis of Longitudinal Data," Biometrics, The International Biometric Society, vol. 63(3), pages 681-689, September.
    4. You-Gan Wang & Xu Lin & Min Zhu, 2005. "Robust Estimating Functions and Bias Correction for Longitudinal Data Analysis," Biometrics, The International Biometric Society, vol. 61(3), pages 684-691, September.
    5. Wang, You-Gan & Hin, Lin-Yee, 2010. "Modeling strategies in longitudinal data analysis: Covariate, variance function and correlation structure selection," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3359-3370, December.
    6. Tsubasa Ito & Shonosuke Sugasawa, 2023. "Grouped generalized estimating equations for longitudinal data analysis," Biometrics, The International Biometric Society, vol. 79(3), pages 1868-1879, September.
    7. Kim, Seheon & Rasouli, Soora & Timmermans, Harry & Yang, Dujuan, 2018. "Estimating panel effects in probabilistic representations of dynamic decision trees using bayesian generalized linear mixture models," Transportation Research Part B: Methodological, Elsevier, vol. 111(C), pages 168-184.
    8. Fu, Liya & Wang, You-Gan & Bai, Zhidong, 2010. "Rank regression for analysis of clustered data: A natural induced smoothing approach," Computational Statistics & Data Analysis, Elsevier, vol. 54(4), pages 1036-1050, April.
    9. Anna Gottard & Giulia Vannucci & Leonardo Grilli & Carla Rampichini, 2023. "Mixed-effect models with trees," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(2), pages 431-461, June.
    10. Peter Calhoun & Richard A. Levine & Juanjuan Fan, 2021. "Repeated measures random forests (RMRF): Identifying factors associated with nocturnal hypoglycemia," Biometrics, The International Biometric Society, vol. 77(1), pages 343-351, March.
    11. O'Hara Hines, R.J. & Hines, W.G.S., 2007. "Covariance miss-specification and the local influence approach in sensitivity analyses of longitudinal data with drop-outs," Computational Statistics & Data Analysis, Elsevier, vol. 51(12), pages 5537-5546, August.
    12. Tsionas, Mike, 2022. "Efficiency estimation using probabilistic regression trees with an application to Chilean manufacturing industries," International Journal of Production Economics, Elsevier, vol. 249(C).
    13. Patrick Krennmair & Timo Schmid, 2022. "Flexible domain prediction using mixed effects random forests," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1865-1894, November.
    14. Vens, Maren & Ziegler, Andreas, 2012. "Generalized estimating equations and regression diagnostics for longitudinal controlled clinical trials: A case study," Computational Statistics & Data Analysis, Elsevier, vol. 56(5), pages 1232-1242.
    15. Zelenkov, Yu. & Solntsev, I., 2022. "Predicting the value of professional sport clubs. A study of European soccer, 2005-2018," Journal of the New Economic Association, New Economic Association, vol. 56(4), pages 28-46.
    16. Liya Fu & Zhuoran Yang & Yan Zhou & You-Gan Wang, 2021. "An efficient Gehan-type estimation for the accelerated failure time model with clustered and censored data," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 27(4), pages 679-709, October.
    17. Feng, Sanying & Lian, Heng & Xue, Liugen, 2016. "A new nested Cholesky decomposition and estimation for the covariance matrix of bivariate longitudinal data," Computational Statistics & Data Analysis, Elsevier, vol. 102(C), pages 98-109.
    18. Tang, Niansheng & Wang, Wenjun, 2019. "Robust estimation of generalized estimating equations with finite mixture correlation matrices and missing covariates at random for longitudinal data," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 640-655.
    19. Peng, Cheng & Yang, Yihe & Zhou, Jie & Pan, Jianxin, 2022. "Latent Gaussian copula models for longitudinal binary data," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    20. You-Gan Wang & Yudong Zhao, 2008. "Weighted Rank Regression for Clustered Data Analysis," Biometrics, The International Biometric Society, vol. 64(1), pages 39-45, March.
