Authors
- Yu Ding
- Virend K. Somers
- Bing Si
Abstract
The increasing availability of health data from resources such as large biobanks, electronic healthcare records, medical tests, and wearable sensors has set the stage for the development of novel Machine Learning (ML) models for multi-modal mixed-type data to capture the complexity of human health and disease. Clustering is a type of ML model that aims to identify homogeneous subgroups from heterogeneous data, providing a data-driven solution for targeted, subgroup-specific study and intervention. While such data contain diverse and complementary information to facilitate decision making and improve population health, clustering of high-dimensional multi-modal mixed-type data poses major challenges to existing ML and statistical models. We propose a novel Multi-modal Mixed-type Structural Equation Model (M2-SEM) with structured sparsity to cluster heterogeneous health data for precise subgroup discovery. To accommodate a mix of continuous and categorical data modalities, we developed a novel Gauss-Hermite-enabled Expectation-Majorization-Minimization (GH-EMM) algorithm that integrates the GH quadrature and the Majorization Maximization (MM) algorithm within the Expectation Maximization (EM) framework for efficient model estimation. The proposed M2-SEM and GH-EMM are first tested in extensive simulation studies in comparison with benchmarks, and then applied to identify subgroups of individuals with low and high risk of developing adverse CardioMetabolic (CM) outcomes based on a full spectrum of CM risk factors such as poor nutrition and mental health, physical inactivity, and sleep deprivation. These findings shed light on the promise of using multi-modal mixed-type health data for early identification and targeted intervention for at-risk individuals for health promotion at the population level.
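The abstract names Gauss-Hermite (GH) quadrature as one ingredient of GH-EMM but gives no detail on how it is used. As a rough illustration of the quadrature rule itself only, and not of the authors' algorithm, the sketch below approximates an expectation under a Gaussian variable with the standard 3-point GH rule. The function names (`gh_expectation`, `logistic`) and the example of a binary item driven by a Gaussian latent factor are assumptions made for illustration; they do not come from the paper.

```python
import math

# Closed-form nodes and weights of the 3-point Gauss-Hermite rule,
# which approximates integrals of the form  ∫ exp(-t^2) f(t) dt.
_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
_WEIGHTS = (
    math.sqrt(math.pi) / 6.0,
    2.0 * math.sqrt(math.pi) / 3.0,
    math.sqrt(math.pi) / 6.0,
)


def gh_expectation(g, mu, sigma):
    """Approximate E[g(Z)] for Z ~ N(mu, sigma^2) by 3-point GH quadrature.

    Uses the change of variables z = mu + sqrt(2) * sigma * t, which turns
    the Gaussian density into the exp(-t^2) weight (up to a 1/sqrt(pi) factor).
    """
    total = 0.0
    for t, w in zip(_NODES, _WEIGHTS):
        total += w * g(mu + math.sqrt(2.0) * sigma * t)
    return total / math.sqrt(math.pi)


def logistic(z):
    """Logistic link, a hypothetical choice for a binary observed variable."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical use: marginal probability that a binary item equals 1 when its
# linear predictor depends on a Gaussian latent factor (illustrative only).
p_marginal = gh_expectation(logistic, mu=0.5, sigma=1.0)
```

A 3-point rule is exact for polynomials up to degree 5, so `gh_expectation(lambda z: z ** 2, 1.0, 2.0)` recovers mu^2 + sigma^2 = 5 up to floating-point rounding. For non-polynomial integrands such as the logistic link, the result is an approximation, and more nodes would normally be used in practice.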
Suggested Citation
Yu Ding & Virend K. Somers & Bing Si, 2025.
"Multi-modal mixed-type structural equation modeling with structured sparsity for subgroup discovery from heterogeneous health data,"
IISE Transactions, Taylor & Francis Journals, vol. 57(12), pages 1497-1511, December.
Handle: RePEc:taf:uiiexx:v:57:y:2025:i:12:p:1497-1511
DOI: 10.1080/24725854.2024.2445776