Author
Listed:
- Evan Gorstein
- Rosa Aghdam
- Claudia Solís-Lemus
Abstract
High-dimensional mixed-effects models are an increasingly important form of regression in which the number of covariates rivals or exceeds the number of samples, which are collected in groups or clusters. The penalized likelihood approach to fitting these models relies on a coordinate descent algorithm that lacks guarantees of convergence to a global optimum. Here, we empirically study the behavior of this algorithm on simulated and real examples of three types of data that are common in modern biology: transcriptome, genome-wide association, and microbiome data. Our simulations provide new insights into the algorithm’s behavior in these settings, and, comparing the performance of two popular penalties, we demonstrate that the smoothly clipped absolute deviation (SCAD) penalty consistently outperforms the least absolute shrinkage and selection operator (LASSO) penalty in terms of both variable selection and estimation accuracy across omics data. To empower researchers in biology and other fields to fit models with the SCAD penalty, we implement the algorithm in a Julia package, HighDimMixedModels.jl.
Author summary
High-dimensional, clustered data are increasingly common in modern omics. In our study, we focus on the penalized likelihood approach to fitting mixed-effects models to these data, employing a coordinate descent (CD) algorithm to minimize the objective function. Although CD is a common optimization scheme, its convergence in this setting lacks guarantees, prompting our empirical investigation of its behavior when applied to transcriptome, genome-wide association, and microbiome datasets. We evaluate the model and algorithm’s performance on simulations of these studies and subsequently apply it to real examples of each. To help facilitate the practical application of these models and further research, we have implemented the algorithm in an open-source Julia package, HighDimMixedModels.jl. This package provides implementations of both the least absolute shrinkage and selection operator (LASSO) and the smoothly clipped absolute deviation (SCAD) penalty, and having tested its performance on various omics data sets, we hope that it offers a user-friendly solution for researchers in biology.
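For readers unfamiliar with the two penalties compared in the abstract, the following minimal Python sketch evaluates the textbook LASSO and SCAD penalty functions for a single coefficient. It is an illustration only and is not taken from HighDimMixedModels.jl: the function names `lasso_penalty` and `scad_penalty` are hypothetical, and the default `a = 3.7` is the conventional choice from the SCAD literature, assumed here rather than stated in this record.

```python
def lasso_penalty(beta: float, lam: float) -> float:
    """LASSO penalty: grows linearly with |beta| without bound."""
    return lam * abs(beta)


def scad_penalty(beta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty (standard three-piece form, requires a > 2).

    Matches the LASSO near zero, tapers quadratically in the middle
    region, and is constant for large |beta|.
    """
    if a <= 2:
        raise ValueError("SCAD requires a > 2")
    t = abs(beta)
    if t <= lam:
        # Small coefficients: identical to the LASSO penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: flat penalty, so no further shrinkage pressure.
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    lam = 1.0
    for beta in (0.5, 2.0, 10.0):
        print(beta, lasso_penalty(beta, lam), scad_penalty(beta, lam))
```

The sketch shows the usual intuition for comparing the two: both penalties treat small coefficients identically, but the SCAD penalty levels off for large coefficients, whereas the LASSO penalty keeps growing and therefore keeps shrinking large effects.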
Suggested Citation
Evan Gorstein & Rosa Aghdam & Claudia Solís-Lemus, 2025.
"HighDimMixedModels.jl: Robust high-dimensional mixed-effects models across omics data,"
PLOS Computational Biology, Public Library of Science, vol. 21(1), pages 1-28, January.
Handle:
RePEc:plo:pcbi00:1012143
DOI: 10.1371/journal.pcbi.1012143