Printed from https://ideas.repec.org/a/spr/compst/v36y2021i2d10.1007_s00180-020-01038-3.html

Efficient and doubly-robust methods for variable selection and parameter estimation in longitudinal data analysis

Author

Listed:
  • Liya Fu

    (Xi’an Jiaotong University)

  • Zhuoran Yang

    (Xi’an Jiaotong University)

  • Fengjing Cai

    (Wenzhou University)

  • You-Gan Wang

    (Queensland University of Technology)

Abstract

New technologies have produced increasingly complex and massive datasets, such as next-generation sequencing and microarray data in biology, dynamic treatment regimes in clinical trials, and long-term, wide-scale studies in the social sciences. Each study exhibits its own data structure within individuals and clusters, and possibly across time and space. To draw valid and efficient inferences from such large-dimensional data, we must account for intracluster correlations, varying cluster sizes, and outliers in the response and/or covariate domains. A weighted rank-based method is proposed for selecting variables and estimating parameters simultaneously. The main contribution of the proposed method is fourfold: (1) variable selection using the adaptive lasso is extended to robust rank regression, so that protection against outliers in both response and predictor variables is obtained; (2) within-subject correlations are incorporated, so that the efficiency of parameter estimation is improved; (3) the computation is convenient using an existing function in the statistical software R; and (4) the proposed method is proved to have desirable asymptotic properties for a fixed number of covariates (p). Simulation studies are carried out to evaluate the proposed method in a number of scenarios, including cases where p equals the number of subjects. The simulation results indicate that the proposed method is efficient and robust. A hormone dataset is analyzed for illustration. By adding redundant variables as covariates, the penalty approach and weighting schemes are shown to be effective.
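
To make the idea of a penalized rank-based objective concrete, the following is a minimal Python sketch, not the authors' implementation (which the abstract says relies on R). It assumes a Wilcoxon-type loss, the sum of absolute pairwise residual differences, plus an adaptive lasso penalty scaled by an initial estimate. The function names, the product form of the pairwise weights, and the small constant eps are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np

def pairwise_rank_loss(residuals, weights=None):
    """Wilcoxon-type dispersion: weighted sum of |e_i - e_j| over pairs i < j.

    Assumption: each pair is weighted by w_i * w_j; the paper's actual
    weighting scheme is not given in the abstract.
    """
    e = np.asarray(residuals, dtype=float)
    n = e.size
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    i, j = np.triu_indices(n, k=1)  # all index pairs with i < j
    return float(np.sum(w[i] * w[j] * np.abs(e[i] - e[j])))

def adaptive_lasso_penalty(beta, beta_init, lam, eps=1e-8):
    """Adaptive lasso: lam * sum_j |beta_j| / |beta_init_j|.

    eps (an assumption) avoids division by zero when an initial
    coefficient is exactly zero.
    """
    beta = np.asarray(beta, dtype=float)
    beta_init = np.asarray(beta_init, dtype=float)
    return float(lam * np.sum(np.abs(beta) / (np.abs(beta_init) + eps)))

def penalized_rank_objective(beta, X, y, beta_init, lam, weights=None):
    """Rank loss on the residuals plus the adaptive lasso penalty.

    An intercept is omitted on purpose: it cancels in pairwise
    differences of residuals, so the rank loss cannot identify it.
    """
    residuals = np.asarray(y, dtype=float) - np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    return pairwise_rank_loss(residuals, weights) + adaptive_lasso_penalty(beta, beta_init, lam)

# Toy usage: the second covariate is redundant (true coefficient 0).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
true_beta = np.array([2.0, 0.0])
y = X @ true_beta
value = penalized_rank_objective(true_beta, X, y, beta_init=true_beta, lam=0.5)
```

Because the loss depends on the residuals only through absolute pairwise differences, a single extreme response adds to the objective linearly rather than quadratically, which is the usual source of robustness in rank regression. Down-weighting observations with outlying covariates is one way to extend that protection to the predictor side.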

Suggested Citation

  • Liya Fu & Zhuoran Yang & Fengjing Cai & You-Gan Wang, 2021. "Efficient and doubly-robust methods for variable selection and parameter estimation in longitudinal data analysis," Computational Statistics, Springer, vol. 36(2), pages 781-804, June.
  • Handle: RePEc:spr:compst:v:36:y:2021:i:2:d:10.1007_s00180-020-01038-3
    DOI: 10.1007/s00180-020-01038-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-020-01038-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Guney, Yesim & Arslan, Olcay & Yavuz, Fulya Gokalp, 2022. "Robust estimation in multivariate heteroscedastic regression models with autoregressive covariance structures using EM algorithm," Journal of Multivariate Analysis, Elsevier, vol. 191(C).
    2. Kangning Wang & Lu Lin, 2019. "Robust and efficient estimator for simultaneous model structure identification and variable selection in generalized partial linear varying coefficient models with longitudinal data," Statistical Papers, Springer, vol. 60(5), pages 1649-1676, October.
    3. Feng, Sanying & Lian, Heng & Xue, Liugen, 2016. "A new nested Cholesky decomposition and estimation for the covariance matrix of bivariate longitudinal data," Computational Statistics & Data Analysis, Elsevier, vol. 102(C), pages 98-109.
    4. Kangning Wang & Wen Shan, 2021. "Copula and composite quantile regression-based estimating equations for longitudinal data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(3), pages 441-455, June.
    5. Smucler, Ezequiel & Yohai, Victor J., 2017. "Robust and sparse estimators for linear regression models," Computational Statistics & Data Analysis, Elsevier, vol. 111(C), pages 116-130.
    6. Wang, Kangning & Li, Shaomin & Sun, Xiaofei & Lin, Lu, 2019. "Modal regression statistical inference for longitudinal data semivarying coefficient models: Generalized estimating equations, empirical likelihood and variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 133(C), pages 257-276.
    7. Zhao, Weihua & Lian, Heng & Song, Xinyuan, 2017. "Composite quantile regression for correlated data," Computational Statistics & Data Analysis, Elsevier, vol. 109(C), pages 15-33.
    8. Lv, Jing & Yang, Hu & Guo, Chaohui, 2015. "An efficient and robust variable selection method for longitudinal generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 74-88.
    9. Gijbels, I. & Vrinssen, I., 2015. "Robust nonnegative garrote variable selection in linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 85(C), pages 1-22.
    10. Tianfa Xie & Ruiyuan Cao & Jiang Du, 2020. "Variable selection for spatial autoregressive models with a diverging number of parameters," Statistical Papers, Springer, vol. 61(3), pages 1125-1145, June.
    11. Jia Chen & Jiti Gao, 2014. "Semiparametric Model Selection in Panel Data Models with Deterministic Trends and Cross-Sectional Dependence," Monash Econometrics and Business Statistics Working Papers 15/14, Monash University, Department of Econometrics and Business Statistics.
    12. Yeşim Güney & Yetkin Tuaç & Şenay Özdemir & Olcay Arslan, 2021. "Robust estimation and variable selection in heteroscedastic regression model using least favorable distribution," Computational Statistics, Springer, vol. 36(2), pages 805-827, June.
    13. Li, Haocheng & Shu, Di & He, Wenqing & Yi, Grace Y., 2019. "Variable selection via the composite likelihood method for multilevel longitudinal data with missing responses and covariates," Computational Statistics & Data Analysis, Elsevier, vol. 135(C), pages 25-34.
    14. Kangning Wang & Mengjie Hao & Xiaofei Sun, 2021. "Robust and efficient estimating equations for longitudinal data partial linear models and its applications," Statistical Papers, Springer, vol. 62(5), pages 2147-2168, October.
    15. Ni, Xiao & Zhang, Hao Helen & Zhang, Daowen, 2009. "Automatic model selection for partially linear models," Journal of Multivariate Analysis, Elsevier, vol. 100(9), pages 2100-2111, October.
    16. Peng, Heng & Lu, Ying, 2012. "Model selection in linear mixed effect models," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 109-129.
    17. Xiao Ni & Daowen Zhang & Hao Helen Zhang, 2010. "Variable Selection for Semiparametric Mixed Models in Longitudinal Studies," Biometrics, The International Biometric Society, vol. 66(1), pages 79-88, March.
    18. Shan Luo & Zehua Chen, 2014. "Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1229-1240, September.
    19. Lai, Peng & Wang, Qihua & Lian, Heng, 2012. "Bias-corrected GEE estimation and smooth-threshold GEE variable selection for single-index models with clustered data," Journal of Multivariate Analysis, Elsevier, vol. 105(1), pages 422-432.
    20. Joseph G. Ibrahim & Hongtu Zhu & Ramon I. Garcia & Ruixin Guo, 2011. "Fixed and Random Effects Selection in Mixed Effects Models," Biometrics, The International Biometric Society, vol. 67(2), pages 495-503, June.
