Printed from https://ideas.repec.org/a/eee/jmvana/v194y2023ics0047259x22001191.html

A split-and-conquer variable selection approach for high-dimensional general semiparametric models with massive data

Author

Listed:
  • Fang, Jianglin

Abstract

Estimation and variable selection in partially linear models for massive data have been discussed by several authors. However, there does not seem to be an established procedure for other semiparametric models, such as the semiparametric varying-coefficient linear model, the single-index regression model, and the partially linear errors-in-variables model. In this paper, we propose a general procedure for variable selection in high-dimensional general semiparametric models based on penalized semiparametric estimating equations. Under some regularity conditions, the oracle property is established while the number of parameters is allowed to diverge. Furthermore, we propose a split-and-conquer variable selection procedure for high-dimensional general semiparametric models with massive data. Under some weak regularity conditions, we establish the oracle property of the proposed procedure when the number of subsets does not grow too fast. Moreover, the split-and-conquer procedure enjoys the same oracle property as the penalized estimator computed from the entire dataset, and it can substantially reduce computing time and memory requirements. The performance of the proposed method is illustrated through numerical simulations and a real data application.
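
The general split-and-conquer idea summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's procedure. It assumes a plain linear model in place of a semiparametric one, a lasso penalty fitted by coordinate descent in place of penalized semiparametric estimating equations, and a majority vote followed by simple averaging as the combination rule. The function names and the parameters `lam`, `n_splits`, and `vote_frac` are hypothetical and chosen only for illustration.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimizes (1/2n) * ||y - X b||^2 + lam * ||b||_1.
    Stand-in for the penalized estimator; an assumption, not the paper's method.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * x_j' x_j for each column
    resid = y.copy()                    # residual y - X @ beta with beta = 0
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (coordinate j excluded).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def split_and_conquer(X, y, n_splits, lam, vote_frac=0.5):
    """Fit the penalized estimator on each subset, then combine.

    A variable is kept if it is selected in more than `vote_frac` of the subsets
    (hypothetical combination rule); kept coefficients are averaged over subsets.
    """
    n = X.shape[0]
    blocks = np.array_split(np.arange(n), n_splits)
    betas = np.array([lasso_cd(X[idx], y[idx], lam) for idx in blocks])
    votes = (betas != 0).mean(axis=0)   # fraction of subsets selecting each variable
    selected = votes > vote_frac
    combined = np.where(selected, betas.mean(axis=0), 0.0)
    return combined, selected


# Simulated data: 3 active variables out of 20, 6000 observations.
rng = np.random.default_rng(0)
n, p = 6000, 20
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)

combined, selected = split_and_conquer(X, y, n_splits=6, lam=0.1)
print("selected variables:", np.flatnonzero(selected))
print("combined estimate (active part):", combined[:3])
```

Each subset is processed independently, so only one block needs to be in memory at a time and the per-subset fits can run in parallel. This is the source of the savings in computing time and memory that the abstract refers to.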

Suggested Citation

  • Fang, Jianglin, 2023. "A split-and-conquer variable selection approach for high-dimensional general semiparametric models with massive data," Journal of Multivariate Analysis, Elsevier, vol. 194(C).
  • Handle: RePEc:eee:jmvana:v:194:y:2023:i:c:s0047259x22001191
    DOI: 10.1016/j.jmva.2022.105128

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X22001191
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.jmva.2022.105128?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ouyang, Jiangrong & Bondell, Howard, 2023. "Bayesian analysis of longitudinal data via empirical likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    2. Chang, Jinyuan & Chen, Song Xi & Chen, Xiaohong, 2015. "High dimensional generalized empirical likelihood for moment restrictions with dependent data," Journal of Econometrics, Elsevier, vol. 185(1), pages 283-304.
    3. Zhang, Jia & Shi, Haoming & Tian, Lemeng & Xiao, Fengjun, 2019. "Penalized generalized empirical likelihood in high-dimensional weakly dependent data," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 270-283.
    4. Qinqin Hu & Lu Lin, 2017. "Conditional sure independence screening by conditional marginal empirical likelihood," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 69(1), pages 63-96, February.
    5. Lu Tang & Peter X.‐K. Song, 2021. "Poststratification fusion learning in longitudinal data analysis," Biometrics, The International Biometric Society, vol. 77(3), pages 914-928, September.
    6. Li, Cheng & Jiang, Wenxin, 2016. "On oracle property and asymptotic validity of Bayesian generalized method of moments," Journal of Multivariate Analysis, Elsevier, vol. 145(C), pages 132-147.
    7. Zheqi Wang & Dehui Wang & Jianhua Cheng, 2023. "A new autoregressive process driven by explanatory variables and past observations: an application to PM 2.5," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(2), pages 619-658, June.
    8. Tang, Niansheng & Yan, Xiaodong & Zhao, Puying, 2018. "Exponentially tilted likelihood inference on growing dimensional unconditional moment models," Journal of Econometrics, Elsevier, vol. 202(1), pages 57-74.
    9. Xia Chen & Liyue Mao, 2020. "Penalized empirical likelihood for partially linear errors-in-variables models," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 104(4), pages 597-623, December.
    10. Fan, Guo-Liang & Liang, Han-Ying & Shen, Yu, 2016. "Penalized empirical likelihood for high-dimensional partially linear varying coefficient model with measurement errors," Journal of Multivariate Analysis, Elsevier, vol. 147(C), pages 183-201.
    11. Zhang, Ting & Wang, Lei, 2020. "Smoothed empirical likelihood inference and variable selection for quantile regression with nonignorable missing response," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    12. Li, Gaorong & Lian, Heng & Feng, Sanying & Zhu, Lixing, 2013. "Automatic variable selection for longitudinal generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 61(C), pages 174-186.
    13. Feng, Sanying & Lian, Heng & Zhu, Fukang, 2016. "Reduced rank regression with possibly non-smooth criterion functions: An empirical likelihood approach," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 139-150.
    14. Brittany Green & Heng Lian & Yan Yu & Tianhai Zu, 2021. "Ultra high‐dimensional semiparametric longitudinal data analysis," Biometrics, The International Biometric Society, vol. 77(3), pages 903-913, September.
    15. Tamar Sofer & Elizabeth D. Schifano & David C. Christiani & Xihong Lin, 2017. "Weighted pseudolikelihood for SNP set analysis with multiple secondary outcomes in case‐control genetic association studies," Biometrics, The International Biometric Society, vol. 73(4), pages 1210-1220, December.
    16. Xingwei Tong & Xin He & Liuquan Sun & Jianguo Sun, 2009. "Variable Selection for Panel Count Data via Non‐Concave Penalized Estimating Function," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 36(4), pages 620-635, December.
    17. Tong Tong Wu & Gang Li & Chengyong Tang, 2015. "Empirical Likelihood for Censored Linear Regression and Variable Selection," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 42(3), pages 798-812, September.
    18. Blommaert, A. & Hens, N. & Beutels, Ph., 2014. "Data mining for longitudinal data under multicollinearity and time dependence using penalized generalized estimating equations," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 667-680.
    19. Geronimi, J. & Saporta, G., 2017. "Variable selection for multiply-imputed data with penalized generalized estimating equations," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 103-114.
    20. Mahdieh Bayati & Seyed Kamran Ghoreishi & Jingjing Wu, 2021. "Bayesian analysis of restricted penalized empirical likelihood," Computational Statistics, Springer, vol. 36(2), pages 1321-1339, June.
