The LASSO on latent indices for regression modeling with ordinal categorical predictors

Authors

  • Hui, Francis K.C.
  • Müller, Samuel
  • Welsh, A.H.

Abstract

Many applications of regression models involve ordinal categorical predictors. Two common approaches for handling ordinal predictors are to form a set of dummy variables, or employ a two stage approach where dimension reduction is first applied and then the response is regressed against the predicted latent indices. Both approaches have drawbacks, with the former running into a high-dimensional problem especially if interactions are considered, while the latter separates the prediction of the latent indices from the construction of the regression model. To overcome these challenges, a new approach called the LASSO on Latent Indices (LoLI) for handling ordinal predictors in regression is proposed, which involves jointly constructing latent indices for each or for groups of ordinal predictors and modeling the response directly as a function of these. LoLI borrows strength from the response to more accurately predict the latent indices, leading to better estimation of the corresponding effects. Furthermore, LoLI incorporates a LASSO type penalty to perform hierarchical selection, with interaction terms selected only if both parent main effects are included. Simulations show that LoLI can outperform the dummy variable and two stage approaches in selection and prediction performance. Applying LoLI to an Australian household-based panel identified three dimensions of psychosocial workplace quality (job demands, stress, and security) which affect an individual’s mental health in an additive and pairwise interactive manner.
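The abstract makes two structural claims: dummy coding becomes high-dimensional once interactions are added, and LoLI's hierarchical penalty admits an interaction only when both parent main effects are present. The sketch below makes both concrete. It is illustrative only and is not the authors' implementation. It counts response-model coefficients under treatment-coded dummies versus one coefficient per latent index, and it checks the strong-hierarchy condition on a selected set. The function names, the assumption of pairwise interactions only, and the example sizes (ten five-level predictors, three indices) are all hypothetical.

```python
from itertools import combinations


def dummy_dimension(levels):
    """Count coefficients under treatment (reference-level) dummy coding.

    `levels` lists the number of categories of each ordinal predictor.
    Returns (main-effect columns, pairwise-interaction columns); the
    intercept is not counted.
    """
    main = sum(k - 1 for k in levels)
    pairwise = sum((a - 1) * (b - 1) for a, b in combinations(levels, 2))
    return main, pairwise


def latent_index_dimension(num_indices):
    """Count response-model coefficients when each latent index enters once.

    Assumes (hypothetically) one slope per index plus one per pair of
    indices; parameters linking the ordinal items to the indices are not
    counted.
    """
    return num_indices, num_indices * (num_indices - 1) // 2


def satisfies_strong_hierarchy(main_effects, interactions):
    """Return True if every selected interaction (j, k) has both parents selected."""
    selected = set(main_effects)
    return all(j in selected and k in selected for j, k in interactions)


if __name__ == "__main__":
    # Hypothetical sizes: ten predictors with five levels each, three indices.
    print(dummy_dimension([5] * 10))
    print(latent_index_dimension(3))
    chosen = {"demands", "stress"}
    print(satisfies_strong_hierarchy(chosen, [("demands", "stress")]))
    print(satisfies_strong_hierarchy(chosen, [("demands", "security")]))
```

With these hypothetical sizes, dummy coding needs 40 main-effect columns and 720 pairwise-interaction columns, while three latent indices need 3 slopes and 3 interactions. The interaction ("demands", "security") fails the hierarchy check because "security" is not among the selected main effects.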

Suggested Citation

  • Hui, Francis K.C. & Müller, Samuel & Welsh, A.H., 2020. "The LASSO on latent indices for regression modeling with ordinal categorical predictors," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
  • Handle: RePEc:eee:csdana:v:149:y:2020:i:c:s0167947320300426
    DOI: 10.1016/j.csda.2020.106951

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947320300426
    Download Restriction: Full text for ScienceDirect subscribers only.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Zhang, Xuanming & Huang, Fei & Hui, Francis K.C. & Haberman, Steven, 2023. "Cause-of-death mortality forecasting using adaptive penalized tensor decompositions," Insurance: Mathematics and Economics, Elsevier, vol. 111(C), pages 193-213.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhixuan Fu & Chirag R. Parikh & Bingqing Zhou, 2017. "Penalized variable selection in competing risks regression," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(3), pages 353-376, July.
    2. Fei Jin & Lung-fei Lee, 2018. "Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices," Econometrics, MDPI, vol. 6(1), pages 1-24, February.
    3. David Cheng & Abhishek Chakrabortty & Ashwin N. Ananthakrishnan & Tianxi Cai, 2020. "Estimating average treatment effects with a double‐index propensity score," Biometrics, The International Biometric Society, vol. 76(3), pages 767-777, September.
    4. Marcelo C. Medeiros & Eduardo F. Mendes, 2015. "l1-Regularization of High-Dimensional Time-Series Models with Flexible Innovations," Textos para discussão 636, Department of Economics PUC-Rio (Brazil).
    5. Jin, Fei & Lee, Lung-fei, 2018. "Irregular N2SLS and LASSO estimation of the matrix exponential spatial specification model," Journal of Econometrics, Elsevier, vol. 206(2), pages 336-358.
    6. Ulrike Schneider, 2016. "Confidence Sets Based on Thresholding Estimators in High-Dimensional Gaussian Regression Models," Econometric Reviews, Taylor & Francis Journals, vol. 35(8-10), pages 1412-1455, December.
    7. Max H. Farrell, 2013. "Robust Inference on Average Treatment Effects with Possibly More Covariates than Observations," Papers 1309.4686, arXiv.org, revised Feb 2018.
    8. Hui Xiao & Yiguo Sun, 2019. "On Tuning Parameter Selection in Model Selection and Model Averaging: A Monte Carlo Study," JRFM, MDPI, vol. 12(3), pages 1-16, June.
    9. Daniel, Jeffrey & Horrocks, Julie & Umphrey, Gary J., 2018. "Penalized composite likelihoods for inhomogeneous Gibbs point process models," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 104-116.
    10. Mehrabani, Ali, 2023. "Estimation and identification of latent group structures in panel data," Journal of Econometrics, Elsevier, vol. 235(2), pages 1464-1482.
    11. Holter, Julia C. & Stallrich, Jonathan W., 2023. "Tuning parameter selection for penalized estimation via R2," Computational Statistics & Data Analysis, Elsevier, vol. 183(C).
    12. Qian, Junhui & Su, Liangjun, 2016. "Shrinkage estimation of common breaks in panel data models via adaptive group fused Lasso," Journal of Econometrics, Elsevier, vol. 191(1), pages 86-109.
    13. Lu, Xun & Su, Liangjun, 2016. "Shrinkage estimation of dynamic panel data models with interactive fixed effects," Journal of Econometrics, Elsevier, vol. 190(1), pages 148-175.
    14. Xianyi Wu & Xian Zhou, 2019. "On Hodges’ superefficiency and merits of oracle property in model selection," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(5), pages 1093-1119, October.
    15. Gabriel E Hoffman & Benjamin A Logsdon & Jason G Mezey, 2013. "PUMA: A Unified Framework for Penalized Multiple Regression Analysis of GWAS Data," PLOS Computational Biology, Public Library of Science, vol. 9(6), pages 1-19, June.
    16. Farrell, Max H., 2015. "Robust inference on average treatment effects with possibly more covariates than observations," Journal of Econometrics, Elsevier, vol. 189(1), pages 1-23.
    17. Pötscher, Benedikt M. & Schneider, Ulrike, 2008. "Confidence sets based on penalized maximum likelihood estimators," MPRA Paper 9062, University Library of Munich, Germany.
    18. Kaixu Yang & Tapabrata Maiti, 2022. "Ultrahigh‐dimensional generalized additive model: Unified theory and methods," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(3), pages 917-942, September.
    19. Bruce E. Hansen, 2016. "The Risk of James--Stein and Lasso Shrinkage," Econometric Reviews, Taylor & Francis Journals, vol. 35(8-10), pages 1456-1470, December.
    20. Eduardo F. Mendes & Gabriel J. P. Pinto, 2023. "Generalized Information Criteria for Structured Sparse Models," Papers 2309.01764, arXiv.org.