Printed from https://ideas.repec.org/a/spr/advdac/v17y2023i1d10.1007_s11634-021-00489-w.html

Clusterwise elastic-net regression based on a combined information criterion

Authors
  • Xavier Bry

    (University of Montpellier, IMAG)

  • Ndèye Niang

    (CEDRIC CNAM)

  • Thomas Verron

    (DANAIS)

  • Stéphanie Bougeard

    (Anses (French Agency for Food, Environmental and Occupational Health Safety))

Abstract

Many research questions involve a regression problem in which the population under study is not homogeneous with respect to the underlying model. In this setting, we propose an original method called Combined Information criterion CLUSterwise elastic-net regression (Ciclus). This method handles several methodological and application-related challenges. It is derived from both information theory and microeconomic utility theory, and it maximizes a well-defined criterion combining three weighted sub-criteria, each related to a specific aim: getting a parsimonious partition, getting compact clusters for a better prediction of cluster membership, and getting a good within-cluster regression fit. The solving algorithm is monotonically convergent under mild assumptions. The Ciclus principle provides an innovative solution to two key issues: (i) the automatic optimization of the number of clusters, and (ii) the proposal of a prediction model. We applied it to elastic-net regression in order to manage high-dimensional data involving redundant explanatory variables. Ciclus is illustrated through both a simulation study and a real example in the field of omics data, showing how it improves the quality of the prediction and facilitates interpretation. It should therefore prove useful whenever the data involve a population mixture, as for example in biology, social sciences, economics or marketing.
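To make the idea of clusterwise elastic-net regression concrete, the following is a minimal sketch of a generic alternating scheme: fit one elastic-net model per cluster, then reassign each observation to the cluster whose model predicts it best. It is not the Ciclus algorithm. It uses only the within-cluster fit term and omits the partition-parsimony and cluster-compactness sub-criteria described in the abstract, and the number of clusters is fixed by hand rather than optimized. All function names, the penalty weights `lam1` and `lam2`, and the toy data are assumptions made for illustration.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator arising from the L1 part of the penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def predict(x, beta):
    """Linear prediction; beta[0] is the intercept."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def fit_elastic_net(X, y, beta, lam1, lam2, sweeps=50):
    """Coordinate descent on 0.5*RSS + lam1*|b|_1 + 0.5*lam2*|b|_2^2,
    warm-started from `beta`; the intercept is not penalized."""
    beta, n, p = list(beta), len(X), len(beta) - 1
    if n == 0:  # empty cluster: only the penalty remains, minimized by zero slopes
        return [beta[0]] + [0.0] * p
    for _ in range(sweeps):
        resid = [yi - predict(xi, beta) for xi, yi in zip(X, y)]
        shift = sum(resid) / n            # exact intercept update
        beta[0] += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            col = [xi[j] for xi in X]
            rho = sum(c * (r + c * beta[j + 1]) for c, r in zip(col, resid))
            denom = sum(c * c for c in col) + lam2
            new = soft_threshold(rho, lam1) / denom if denom > 0 else 0.0
            delta = new - beta[j + 1]
            resid = [r - c * delta for c, r in zip(col, resid)]
            beta[j + 1] = new
    return beta

def objective(X, y, labels, betas, lam1, lam2):
    """Total penalized loss summed over all clusters."""
    rss = sum((yi - predict(xi, betas[k])) ** 2 for xi, yi, k in zip(X, y, labels))
    pen = sum(lam1 * sum(abs(b) for b in beta[1:])
              + 0.5 * lam2 * sum(b * b for b in beta[1:]) for beta in betas)
    return 0.5 * rss + pen

def clusterwise_elastic_net(X, y, n_clusters, lam1=0.1, lam2=0.1, n_iter=20, seed=0):
    """Alternate per-cluster fits and residual-based reassignment."""
    rng = random.Random(seed)
    p = len(X[0])
    labels = [rng.randrange(n_clusters) for _ in X]   # random initial partition
    betas = [[0.0] * (p + 1) for _ in range(n_clusters)]
    history = []
    for _ in range(n_iter):
        for k in range(n_clusters):
            idx = [i for i, lab in enumerate(labels) if lab == k]
            betas[k] = fit_elastic_net([X[i] for i in idx], [y[i] for i in idx],
                                       betas[k], lam1, lam2)
        new_labels = [min(range(n_clusters),
                          key=lambda k: (yi - predict(xi, betas[k])) ** 2)
                      for xi, yi in zip(X, y)]
        history.append(objective(X, y, new_labels, betas, lam1, lam2))
        if new_labels == labels:          # partition is stable: stop
            break
        labels = new_labels
    return labels, betas, history

# Toy data (hypothetical): two sub-populations with opposite linear regimes.
rng = random.Random(1)
X = [[rng.uniform(-2.0, 2.0)] for _ in range(40)]
y = [(2.0 * x[0] + 1.0 if i < 20 else -2.0 * x[0] - 1.0) + rng.gauss(0.0, 0.1)
     for i, x in enumerate(X)]
labels, betas, history = clusterwise_elastic_net(X, y, n_clusters=2)
```

In this sketch the loss is left unnormalized by cluster size, so that both steps (exact coordinate updates from a warm start, then reassignment with coefficients held fixed) can only lower the total penalized loss recorded in `history`. This mirrors, in a much simpler setting, the kind of monotone convergence the abstract claims for Ciclus. Like other alternating schemes of this type, it can stop at a local optimum that depends on the initial partition.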

Suggested Citation

  • Xavier Bry & Ndèye Niang & Thomas Verron & Stéphanie Bougeard, 2023. "Clusterwise elastic-net regression based on a combined information criterion," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 75-107, March.
  • Handle: RePEc:spr:advdac:v:17:y:2023:i:1:d:10.1007_s11634-021-00489-w
    DOI: 10.1007/s11634-021-00489-w

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-021-00489-w
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Stéphanie Bougeard & Hervé Abdi & Gilbert Saporta & Ndèye Niang, 2018. "Clusterwise analysis for multiblock component methods," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 285-313, June.
    2. Dmitry Kobak & Yves Bernaerts & Marissa A. Weis & Federico Scala & Andreas S. Tolias & Philipp Berens, 2021. "Sparse reduced‐rank regression for exploratory visualisation of paired multivariate data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(4), pages 980-1000, August.
    3. Joki, Kaisa & Bagirov, Adil M. & Karmitsa, Napsu & Mäkelä, Marko M. & Taheri, Sona, 2020. "Clusterwise support vector linear regression," European Journal of Operational Research, Elsevier, vol. 287(1), pages 19-35.
    4. Luo, Ruiyan & Qi, Xin, 2015. "Sparse wavelet regression with multiple predictive curves," Journal of Multivariate Analysis, Elsevier, vol. 134(C), pages 33-49.
    5. Tianyu Tan & Hye Suk & Heungsun Hwang & Jooseop Lim, 2013. "Functional fuzzy clusterwise regression analysis," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(1), pages 57-82, March.
    6. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    7. Mkhadri, Abdallah & Ouhourane, Mohamed, 2013. "An extended variable inclusion and shrinkage algorithm for correlated variables," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 631-644.
    8. Susan Athey & Guido W. Imbens & Stefan Wager, 2018. "Approximate residual balancing: debiased inference of average treatment effects in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 597-623, September.
    9. Christopher J Greenwood & George J Youssef & Primrose Letcher & Jacqui A Macdonald & Lauryn J Hagg & Ann Sanson & Jenn Mcintosh & Delyse M Hutchinson & John W Toumbourou & Matthew Fuller-Tyszkiewicz &, 2020. "A comparison of penalised regression methods for informing the selection of predictive markers," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-14, November.
    10. Immanuel Bayer & Philip Groth & Sebastian Schneckener, 2013. "Prediction Errors in Learning Drug Response from Gene Expression Data – Influence of Labeling, Sample Size, and Machine Learning Algorithm," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-13, July.
    11. Mostafa Rezaei & Ivor Cribben & Michele Samorani, 2021. "A clustering-based feature selection method for automatically generated relational attributes," Annals of Operations Research, Springer, vol. 303(1), pages 233-263, August.
    12. Gustavo A. Alonso-Silverio & Víctor Francisco-García & Iris P. Guzmán-Guzmán & Elías Ventura-Molina & Antonio Alarcón-Paredes, 2021. "Toward Non-Invasive Estimation of Blood Glucose Concentration: A Comparative Performance," Mathematics, MDPI, vol. 9(20), pages 1-13, October.
    13. Christopher Kath & Florian Ziel, 2018. "The value of forecasts: Quantifying the economic gains of accurate quarter-hourly electricity price forecasts," Papers 1811.08604, arXiv.org.
    14. Karim Barigou & Stéphane Loisel & Yahia Salhi, 2020. "Parsimonious Predictive Mortality Modeling by Regularization and Cross-Validation with and without Covid-Type Effect," Risks, MDPI, vol. 9(1), pages 1-18, December.
    15. Gurgul Henryk & Machno Artur, 2017. "Trade Pattern on Warsaw Stock Exchange and Prediction of Number of Trades," Statistics in Transition New Series, Polish Statistical Association, vol. 18(1), pages 91-114, March.
    16. Michael Funke & Kadri Männasoo & Helery Tasane, 2023. "Regional Economic Impacts of the Øresund Cross-Border Fixed Link: Cui Bono?," CESifo Working Paper Series 10557, CESifo.
    17. Camila Epprecht & Dominique Guegan & Álvaro Veiga & Joel Correa da Rosa, 2017. "Variable selection and forecasting via automated methods for linear models: LASSO/adaLASSO and Autometrics," Post-Print halshs-00917797, HAL.
    18. Zichen Zhang & Ye Eun Bae & Jonathan R. Bradley & Lang Wu & Chong Wu, 2022. "SUMMIT: An integrative approach for better transcriptomic data imputation improves causal gene identification," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    19. Štefan Lyócsa & Petra Vašaničová & Branka Hadji Misheva & Marko Dávid Vateha, 2022. "Default or profit scoring credit systems? Evidence from European and US peer-to-peer lending markets," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-21, December.
    20. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
