Authors listed:
- Kazuyuki Matsumoto (Graduate School of Technology, Industrial and Social Sciences, Tokushima University, Minamijosanjima-Cho 2-1, Tokushima-Shi 770-8506, Japan)
- Minoru Yoshida (Graduate School of Technology, Industrial and Social Sciences, Tokushima University, Minamijosanjima-Cho 2-1, Tokushima-Shi 770-8506, Japan)
- Chikaho Karino (Graduate School of Sciences and Technology for Innovation, Tokushima University, Minamijosanjima-Cho 2-1, Tokushima-Shi 770-8506, Japan)
Abstract
Lifestyle-related diseases such as diabetes are closely influenced by daily habits, yet the complex interactions between lifestyle factors and blood glucose variation remain insufficiently quantified. This study proposes a natural language processing (NLP) framework that analyzes long-form illness blogs to identify lifestyle factors associated with elevated blood glucose levels. Diabetes-related narratives were collected from a Japanese illness blog portal (TOBYO) and processed through GPT-4o-based automated labeling, BERT-series contextual embeddings, and LightGBM classification. For Type 2 Diabetes classification, the model achieved an F1-score of 0.73 using JMedRoBERTa embeddings, outperforming baseline models (BERT = 0.70; Twitter-RoBERTa = 0.65). Key factors contributing to glucose elevation were identified through feature importance analysis, with dietary behavior, lack of exercise, poor sleep, and stress emerging as major contributors. These findings demonstrate the potential of combining large language models with structured machine learning to extract health-relevant knowledge from patient narratives. The proposed approach contributes to preventive healthcare by offering interpretable, data-driven insights into lifestyle–glycemic relationships, and provides a foundation for personalized diabetes risk monitoring and AI-based health management applications.
Suggested Citation
Kazuyuki Matsumoto & Minoru Yoshida & Chikaho Karino, 2026. "Mining Patient Narratives to Analyze Lifestyle–Blood Glucose Relationships: An LLM-Based Text Mining Framework," J, MDPI, vol. 9(2), pages 1-19, May.
Handle: RePEc:gam:jjopen:v:9:y:2026:i:2:p:14-:d:1941465