
Robust variable selection for finite mixture regression models

Authors

Listed:
  • Qingguo Tang

    (Nanjing University of Science and Technology)

  • R. J. Karunamuni

    (University of Alberta)

Abstract

Finite mixture regression (FMR) models are frequently used in statistical modeling, often with many covariates of low significance. Variable selection techniques can be employed to identify the covariates that have little influence on the response. The problem of variable selection in FMR models is studied here. Penalized likelihood-based approaches are sensitive to data contamination, and their efficiency may be significantly reduced when the model is slightly misspecified. We propose a new robust variable selection procedure for FMR models. The proposed method is based on minimum-distance techniques, which seem to have some automatic robustness to model misspecification. We show that the proposed estimator is variable-selection consistent and possesses the oracle property. The finite-sample breakdown point of the estimator is established to demonstrate its robustness. We examine the small-sample and robustness properties of the estimator using a Monte Carlo study, and we also analyze a real data set.
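As an orientation aid, the following is a sketch in standard FMR notation, not a statement of the authors' exact objective. An m-component Gaussian FMR model takes the conditional density of the response y given covariates \mathbf{x} to be

    f(y \mid \mathbf{x}; \boldsymbol{\theta}) = \sum_{k=1}^{m} \pi_k \, \phi\bigl(y; \mathbf{x}^{\top}\boldsymbol{\beta}_k, \sigma_k^{2}\bigr), \qquad \pi_k > 0, \quad \sum_{k=1}^{m} \pi_k = 1,

where \phi(\cdot; \mu, \sigma^{2}) is the N(\mu, \sigma^{2}) density. Penalized-likelihood selectors (e.g., references 7 and 15 below) estimate \boldsymbol{\theta} by maximizing a criterion of the general form

    \ell_n(\boldsymbol{\theta}) - n \sum_{k=1}^{m} \sum_{j=1}^{p} p_{\lambda_{nk}}\bigl(|\beta_{kj}|\bigr),

with \ell_n the log-likelihood and p_\lambda a sparsity-inducing penalty such as SCAD or the adaptive lasso (the exact weighting of the penalty across components varies between papers). The procedure proposed here keeps the penalization idea but replaces the likelihood term with a minimum-distance criterion (Hellinger-type distances appear in several of the works cited below), which the abstract credits with some automatic robustness to model misspecification.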

Suggested Citation

  • Qingguo Tang & R. J. Karunamuni, 2018. "Robust variable selection for finite mixture regression models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 70(3), pages 489-521, June.
  • Handle: RePEc:spr:aistmt:v:70:y:2018:i:3:d:10.1007_s10463-017-0602-4
    DOI: 10.1007/s10463-017-0602-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10463-017-0602-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10463-017-0602-4?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to a source where you can use your library subscription to access this item

    As access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    2. Marianthi Markatou, 2000. "Mixture Models, Robustness, and the Weighted Likelihood Methodology," Biometrics, The International Biometric Society, vol. 56(2), pages 483-486, June.
    3. Leisch, Friedrich, 2004. "FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 11(i08).
    4. Karunamuni, Rohana J. & Wu, Jingjing, 2011. "One-step minimum Hellinger distance estimation," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3148-3164, December.
    5. Chen, Song Xi, 1999. "Beta kernel estimators for density functions," Computational Statistics & Data Analysis, Elsevier, vol. 31(2), pages 131-145, August.
    6. Tang, Qingguo & Karunamuni, Rohana J., 2013. "Minimum distance estimation in a finite mixture regression model," Journal of Multivariate Analysis, Elsevier, vol. 120(C), pages 185-204.
    7. Fan, Jianqing & Li, Runze, 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    8. Abbas Khalili & Shili Lin, 2013. "Regularization in Finite Mixture of Regression Models with Diverging Number of Parameters," Biometrics, The International Biometric Society, vol. 69(2), pages 436-446, June.
    9. Wang, Hansheng & Li, Guodong & Jiang, Guohua, 2007. "Robust Regression Shrinkage and Consistent Variable Selection Through the LAD-Lasso," Journal of Business & Economic Statistics, American Statistical Association, vol. 25, pages 347-355, July.
    10. Wu, Jingjing & Karunamuni, Rohana & Zhang, Biao, 2010. "Minimum Hellinger distance estimation in a two-sample semiparametric model," Journal of Multivariate Analysis, Elsevier, vol. 101(5), pages 1102-1122, May.
    11. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    12. Zudi Lu & Yer Van Hui & Andy H. Lee, 2003. "Minimum Hellinger Distance Estimation for Finite Mixtures of Poisson Regression Models and Its Applications," Biometrics, The International Biometric Society, vol. 59(4), pages 1016-1026, December.
    13. Wu, Jingjing & Karunamuni, Rohana J., 2012. "Efficient Hellinger distance estimates for semiparametric models," Journal of Multivariate Analysis, Elsevier, vol. 107(C), pages 1-23.
    14. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    15. Khalili, Abbas & Chen, Jiahua, 2007. "Variable Selection in Finite Mixture of Regression Models," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 1025-1038, September.
    16. Xueqin Wang & Yunlu Jiang & Mian Huang & Heping Zhang, 2013. "Robust Variable Selection With Exponential Squared Loss," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(502), pages 632-643, June.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Jennifer S. K. Chan & S. T. Boris Choy & Udi Makov & Ariel Shamir & Vered Shapovalov, 2022. "Variable Selection Algorithm for a Mixture of Poisson Regression for Handling Overdispersion in Claims Frequency Modeling Using Telematics Car Driving Data," Risks, MDPI, vol. 10(4), pages 1-10, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yan Li & Chun Yu & Yize Zhao & Weixin Yao & Robert H. Aseltine & Kun Chen, 2022. "Pursuing sources of heterogeneity in modeling clustered population," Biometrics, The International Biometric Society, vol. 78(2), pages 716-729, June.
    2. Mingqiu Wang & Guo-Liang Tian, 2016. "Robust group non-convex estimations for high-dimensional partially linear models," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 28(1), pages 49-67, March.
    3. Lee, Kuo-Jung & Feldkircher, Martin & Chen, Yi-Chi, 2021. "Variable selection in finite mixture of regression models with an unknown number of components," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    4. Umberto Amato & Anestis Antoniadis & Italia De Feis & Irene Gijbels, 2021. "Penalised robust estimators for sparse and high-dimensional linear models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(1), pages 1-48, March.
    5. Wentao Wang & Jiaxuan Liang & Rong Liu & Yunquan Song & Min Zhang, 2022. "A Robust Variable Selection Method for Sparse Online Regression via the Elastic Net Penalty," Mathematics, MDPI, vol. 10(16), pages 1-18, August.
    6. T. Cai & J. Huang & L. Tian, 2009. "Regularized Estimation for the Accelerated Failure Time Model," Biometrics, The International Biometric Society, vol. 65(2), pages 394-404, June.
    7. Jonathan Boss & Alexander Rix & Yin‐Hsiu Chen & Naveen N. Narisetty & Zhenke Wu & Kelly K. Ferguson & Thomas F. McElrath & John D. Meeker & Bhramar Mukherjee, 2021. "A hierarchical integrative group least absolute shrinkage and selection operator for analyzing environmental mixtures," Environmetrics, John Wiley & Sons, Ltd., vol. 32(8), December.
    8. Ping Zeng & Yongyue Wei & Yang Zhao & Jin Liu & Liya Liu & Ruyang Zhang & Jianwei Gou & Shuiping Huang & Feng Chen, 2014. "Variable selection approach for zero-inflated count data via adaptive lasso," Journal of Applied Statistics, Taylor & Francis Journals, vol. 41(4), pages 879-894, April.
    9. Aneiros, Germán & Novo, Silvia & Vieu, Philippe, 2022. "Variable selection in functional regression models: A review," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    10. Xiaofei Wu & Rongmei Liang & Hu Yang, 2022. "Penalized and constrained LAD estimation in fixed and high dimension," Statistical Papers, Springer, vol. 63(1), pages 53-95, February.
    11. Abdul Wahid & Dost Muhammad Khan & Ijaz Hussain, 2017. "Robust Adaptive Lasso method for parameter’s estimation and variable selection in high-dimensional sparse models," PLOS ONE, Public Library of Science, vol. 12(8), pages 1-17, August.
    12. Tang, Qingguo & Karunamuni, Rohana J., 2013. "Minimum distance estimation in a finite mixture regression model," Journal of Multivariate Analysis, Elsevier, vol. 120(C), pages 185-204.
    13. Diego Vidaurre & Concha Bielza & Pedro Larrañaga, 2013. "A Survey of L1 Regression," International Statistical Review, International Statistical Institute, vol. 81(3), pages 361-387, December.
    14. Guan Yu & Yufeng Liu, 2016. "Sparse Regression Incorporating Graphical Structure Among Predictors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(514), pages 707-720, April.
    15. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    16. Margherita Giuzio, 2017. "Genetic algorithm versus classical methods in sparse index tracking," Decisions in Economics and Finance, Springer;Associazione per la Matematica, vol. 40(1), pages 243-256, November.
    17. Yize Zhao & Matthias Chung & Brent A. Johnson & Carlos S. Moreno & Qi Long, 2016. "Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1427-1439, October.
    18. Gareth M. James & Peter Radchenko & Jinchi Lv, 2009. "DASSO: connections between the Dantzig selector and lasso," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(1), pages 127-142, January.
    19. Camila Epprecht & Dominique Guegan & Álvaro Veiga & Joel Correa da Rosa, 2017. "Variable selection and forecasting via automated methods for linear models: LASSO/adaLASSO and Autometrics," Post-Print halshs-00917797, HAL.
    20. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
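    As a purely illustrative aside, the co-citation rule stated above (items that cite the same works as this one and are cited by the same works) could be scored with a few lines of Python. The sketch below is a toy under stated assumptions: the handles, the data and the additive scoring rule are invented for illustration and are not the algorithm IDEAS or CitEc actually uses.

        # Toy co-citation relatedness score: a candidate counts as "related" when it
        # cites the same works as the target and/or is cited by the same works.
        # Illustrative only; not the scoring used by IDEAS/CitEc.
        def relatedness(target_refs, target_citers, item_refs, item_citers):
            """Number of shared references plus number of shared citing works."""
            return len(target_refs & item_refs) + len(target_citers & item_citers)

        # Hypothetical short handles standing in for full bibliographic records.
        target = {"refs": {"zou2006", "fanli2001", "khalilichen2007"},
                  "citers": {"chan2022"}}
        candidates = {
            "li2022":  {"refs": {"fanli2001", "khalilichen2007"}, "citers": set()},
            "lee2021": {"refs": {"khalilichen2007", "zou2006"},   "citers": {"chan2022"}},
            "cai2009": {"refs": {"zou2006"},                      "citers": set()},
        }

        ranked = sorted(candidates,
                        key=lambda h: relatedness(target["refs"], target["citers"],
                                                  candidates[h]["refs"],
                                                  candidates[h]["citers"]),
                        reverse=True)
        print(ranked)  # candidates sharing the most works with the target rank first

    Running the sketch simply ranks the hypothetical candidates by how many works they share with the target item.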

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.