Author Listed:
- Mei Yang
(School of Information Science and Technology, Hainan Normal University, Meilan District Guilinyang University Town, Haikou 571127, China; Anhui Jianyang Information Technology Co., Ltd., No. 8, Pihe Road, Sanli’an Street, Shushan District, Hefei 230031, China)
- Tao Wang
(School of Information Science and Technology, Hainan Normal University, Meilan District Guilinyang University Town, Haikou 571127, China)
- Yuchun Li
(School of Information Science and Technology, Hainan Normal University, Meilan District Guilinyang University Town, Haikou 571127, China)
- Shu Xu
(Science Island Branch, University of Science and Technology of China, Hefei 230026, China; Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China)
Abstract
Background: Understanding how Regional Innovation Systems (RISs) drive innovation outputs remains a central question in innovation studies. Most existing empirical research relies on linear or single-indicator models, which fail to capture nonlinear interactions among the key RIS dimensions (Firms, Knowledge, Government, and Economy).
Methodology: This study proposes an integrated analytical framework that combines Confirmatory Factor Analysis (CFA), CatBoost machine learning, and SHAP-based explainability to bridge theory-driven modeling with data-driven prediction. Using provincial panel data from China spanning 2011–2023, CFA is first employed to construct and validate four latent RIS dimensions. These latent constructs are then used as inputs to a CatBoost model that predicts regional patent outputs, followed by SHAP analysis to quantify the marginal and interactive contributions of each dimension.
Results: The CFA results confirm the reliability and validity of the four latent dimensions, establishing a robust structural foundation for the RIS. The CatBoost model achieves high predictive accuracy (log-transformed R² = 0.975, RMSE = 0.206), substantially outperforming traditional linear benchmarks. SHAP analysis indicates that the Firm dimension is the primary driver of innovation output, while the Knowledge, Government, and Economy dimensions exhibit context-dependent moderating effects characterized by diminishing returns, threshold effects, and nonlinear synergies.
Conclusions: By integrating latent-variable modeling with interpretable machine learning, this study develops a “CFA-CatBoost-SHAP” closed-loop paradigm for transparent and high-precision analysis of innovation mechanisms. This approach advances RIS theory by empirically validating its multidimensional structure, enriches the methodological toolkit for innovation research, and provides actionable insights for the design of targeted R&D and innovation policies.
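The abstract does not include the authors' code. As a hedged illustration of what the final SHAP step quantifies, the sketch below computes exact Shapley values by enumerating every coalition of the four RIS dimensions for a small model. Everything numeric here is an assumption: `toy_model` is a hypothetical stand-in for a fitted CatBoost predictor of log patent output (its coefficients, the Knowledge diminishing-returns term, and the Firm × Knowledge synergy term are invented), the all-zeros baseline and the `region` factor scores are made up, and absent features are simply set to the baseline. The SHAP library's TreeExplainer computes the same kind of attribution for tree ensembles with a dedicated polynomial-time algorithm; brute-force enumeration like this is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

# Dimension names follow the abstract; the model below is hypothetical, NOT the paper's CatBoost fit.
FEATURES = ["Firm", "Knowledge", "Government", "Economy"]


def toy_model(x):
    """Hypothetical predictor of log patent output from standardized latent factor scores.

    Includes a diminishing-returns term for Knowledge and a Firm x Knowledge
    synergy term purely for illustration.
    """
    firm, know = x["Firm"], x["Knowledge"]
    gov, econ = x["Government"], x["Economy"]
    return (2.0 * firm + 0.5 * know - 0.2 * know ** 2
            + 0.3 * gov + 0.4 * firm * know + 0.1 * econ)


def shapley_values(model, x, baseline, features):
    """Exact Shapley values via enumeration of all coalitions.

    Features outside a coalition take their baseline value
    (a single-baseline, interventional value function).
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_f = {g: x[g] if (g in coalition or g == f) else baseline[g] for g in features}
                without_f = {g: x[g] if g in coalition else baseline[g] for g in features}
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


# Hypothetical region with made-up standardized factor scores.
baseline = {f: 0.0 for f in FEATURES}
region = {"Firm": 1.5, "Knowledge": 1.0, "Government": -0.5, "Economy": 0.8}
phi = shapley_values(toy_model, region, baseline, FEATURES)

for name in FEATURES:
    print(f"{name}: {phi[name]:+.3f}")
print("prediction gap:", toy_model(region) - toy_model(baseline))
```

By construction the four attributions sum to the gap between the region's prediction and the baseline prediction (the Shapley efficiency property), and the interaction term's contribution is split between Firm and Knowledge rather than assigned to either alone, which is how SHAP-style analyses can surface synergies of the kind the abstract describes.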
Suggested Citation
Mei Yang & Tao Wang & Yuchun Li & Shu Xu, 2025.
"Bridging Theory and Data: Linking Regional Innovation System Dimensions to Patent Outcomes Through CFA-CatBoost Integration,"
Sustainability, MDPI, vol. 17(24), pages 1-18, December.
Handle: RePEc:gam:jsusta:v:17:y:2025:i:24:p:11211-:d:1817982