
Bridging the Semantic Gap: An Ensemble Learning Framework With Textual Topic‐Raw Financial Feature Fusion to Enhance Fraud Detection in Chinese Markets

Authors

  • Congying Wei
  • Xiyuan Qian

Abstract

As financial statement manipulation grows more sophisticated and fraud data remain severely class-imbalanced, conventional detection approaches that rely exclusively on quantitative financial data and traditional machine learning algorithms have shown clear limitations. To overcome these constraints, we propose an enhanced financial fraud detection model that applies advanced ensemble learning classifiers to a combined feature set: textual information extracted from annual reports through natural language processing techniques, together with structured financial data from corporate statements. Using a dataset of Chinese manufacturing firms listed between 2010 and 2019, we integrate textual topic indicators derived from the latent Dirichlet allocation (LDA) model with raw financial items to construct a comprehensive fraud detection system. Empirical results demonstrate the superiority of combining textual and financial indicators, which yields significant improvements: AUC increases by 1.5% for RUSBoost and 1.6% for XGBoost, alongside NDCG@K gains of 4.5% and 3.8%, respectively (p
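The abstract reports gains in AUC and NDCG@K after fusing LDA topic indicators with raw financial items. The minimal Python sketch below is not the authors' code; it only illustrates these ideas under stated assumptions. The hypothetical helper `fuse_features` performs simple early fusion by concatenating a firm-year's topic proportions with its financial items (the abstract does not specify the fusion scheme). `roc_auc` computes AUC as the probability that a randomly chosen fraud case outscores a randomly chosen non-fraud case. `ndcg_at_k` assumes binary relevance (1 = fraud) with the standard logarithmic discount, since the paper's exact gain function and choice of K are not given here. The RUSBoost and XGBoost classifiers themselves are not shown.

```python
import math
from typing import List, Sequence


def fuse_features(topic_props: Sequence[float],
                  financial_items: Sequence[float]) -> List[float]:
    """Hypothetical early fusion: concatenate one firm-year's LDA topic
    proportions with its raw financial items into a single feature vector.
    (Assumption: the abstract does not state the exact fusion scheme.)"""
    return list(topic_props) + list(financial_items)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random fraud case (label 1) receives a
    higher score than a random non-fraud case (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one fraud and one non-fraud case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ndcg_at_k(labels: Sequence[int], scores: Sequence[float], k: int) -> float:
    """Binary-relevance NDCG@K: rank firms by descending score and discount
    each fraud hit by log2(rank + 1), with ranks starting at 1."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dcg = sum(labels[i] / math.log2(r + 2) for r, i in enumerate(order[:k]))
    ideal = sorted(labels, reverse=True)  # all fraud cases ranked first
    idcg = sum(rel / math.log2(r + 2) for r, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.2, 0.1]
    print(fuse_features([0.7, 0.3], [1.2, -0.5, 3.0]))
    print(roc_auc(labels, scores))
    print(ndcg_at_k(labels, scores, k=3))
```

With the toy labels `[1, 0, 1, 0, 0]` and scores `[0.9, 0.8, 0.3, 0.2, 0.1]`, `roc_auc` returns 5/6 because one of the six fraud/non-fraud pairs is misordered, and NDCG@3 falls below 1 because a non-fraud firm is ranked second. Unlike AUC, NDCG@K only looks at the top K of the ranking, so it rewards models that place fraud cases among the first firms flagged.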

Suggested Citation

  • Congying Wei & Xiyuan Qian, 2025. "Bridging the Semantic Gap: An Ensemble Learning Framework With Textual Topic‐Raw Financial Feature Fusion to Enhance Fraud Detection in Chinese Markets," Journal of Mathematics, John Wiley & Sons, vol. 2025(1).
  • Handle: RePEc:wly:jjmath:v:2025:y:2025:i:1:n:6643152
    DOI: 10.1155/jom/6643152

    Download full text from publisher

    File URL: https://doi.org/10.1155/jom/6643152
    Download Restriction: no

    References listed on IDEAS

    1. Lynnette Purda & David Skillicorn, 2015. "Accounting Variables, Deception, and a Bag of Words: Assessing the Tools of Fraud Detection," Contemporary Accounting Research, John Wiley & Sons, vol. 32(3), pages 1193-1223, September.
    2. Li, Jing & Li, Nan & Xia, Tongshui & Guo, Jinjin, 2023. "Textual analysis and detection of financial fraud: Evidence from Chinese manufacturing firms," Economic Modelling, Elsevier, vol. 126(C).
    3. Kim, Soo Y. & Upneja, Arun, 2014. "Predicting restaurant financial distress using decision tree and AdaBoosted decision tree models," Economic Modelling, Elsevier, vol. 36(C), pages 354-362.
    4. Md Jahidur Rahman & Hongtao Zhu, 2023. "Predicting accounting fraud using imbalanced ensemble learning classifiers – evidence from China," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 63(3), pages 3455-3486, September.
    5. Zhang, Tianjiao & Zhu, Weidong & Wu, Yong & Wu, Zihao & Zhang, Chao & Hu, Xue, 2023. "An explainable financial risk early warning model based on the DS-XGBoost model," Finance Research Letters, Elsevier, vol. 56(C).
    6. Healy, Paul M. & Palepu, Krishna G., 2001. "Information asymmetry, corporate disclosure, and the capital markets: A review of the empirical disclosure literature," Journal of Accounting and Economics, Elsevier, vol. 31(1-3), pages 405-440, September.
    7. Zhang, Yi & Hu, Ailing & Wang, Jiahua & Zhang, Yaojie, 2022. "Detection of fraud statement based on word vector: Evidence from financial companies in China," Finance Research Letters, Elsevier, vol. 46(PB).
    8. repec:eme:maj000:02686900410509802 is not listed on IDEAS
    9. Kathleen A. Kaminski & T. Sterling Wetzel & Liming Guan, 2004. "Can financial ratios detect fraudulent financial reporting?," Managerial Auditing Journal, Emerald Group Publishing Limited, vol. 19(1), pages 15-28, January.
    10. Xin‐Ping Song & Zhi‐Hua Hu & Jian‐Guo Du & Zhao‐Han Sheng, 2014. "Application of Machine Learning Methods to Risk Assessment of Financial Statement Fraud: Evidence from China," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 33(8), pages 611-626, December.
    11. Paul C. Tetlock, 2007. "Giving Content to Investor Sentiment: The Role of Media in the Stock Market," Journal of Finance, American Finance Association, vol. 62(3), pages 1139-1168, June.
    12. Tim Loughran & Bill McDonald, 2011. "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10‐Ks," Journal of Finance, American Finance Association, vol. 66(1), pages 35-65, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jiao Ji & Oleksandr Talavera & Shuxing Yin, 2018. "The Hidden Information Content: Evidence from the Tone of Independent Director Reports," Working Papers 2018-28, Swansea University, School of Management.
    2. Xiaoqian Zhu & Huidong Wu & Yanpeng Chang & Jianping Li, 2025. "Accounting fraud detection through textual risk disclosures in annual reports: From the perspective of SEC guidelines," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 65(2), pages 1837-1862, June.
    3. Zhang, Zejun & Wang, Zhao & Cai, Lixin, 2025. "Predicting financial fraud in Chinese listed companies: An enterprise portrait and machine learning approach," Pacific-Basin Finance Journal, Elsevier, vol. 90(C).
    4. Beattie, Vivien, 2014. "Accounting narratives and the narrative turn in accounting research: Issues, theory, methodology, methods and a research framework," The British Accounting Review, Elsevier, vol. 46(2), pages 111-134.
    5. Yunchuan Sun & Xiaoping Zeng & Ying Xu & Hong Yue & Xipu Yu, 2024. "An intelligent detecting model for financial frauds in Chinese A‐share market," Economics and Politics, Wiley Blackwell, vol. 36(2), pages 1110-1136, July.
    6. Jie Hao & Viet T. Pham, 2022. "COVID‐19 Disclosures and Market Uncertainty: Evidence from 10‐Q Filings," Australian Accounting Review, CPA Australia, vol. 32(2), pages 238-266, June.
    7. Tsai, Feng-Tse & Lu, Hsin-Min & Hung, Mao-Wei, 2016. "The impact of news articles and corporate disclosure on credit risk valuation," Journal of Banking & Finance, Elsevier, vol. 68(C), pages 100-116.
    8. Agapova, Anna & Volkov, Nikanor, 2019. "Guidance on strategic information: Investor-management disagreement and firm intrinsic value," Journal of Banking & Finance, Elsevier, vol. 108(C).
    9. Price, S. McKay & Doran, James S. & Peterson, David R. & Bliss, Barbara A., 2012. "Earnings conference calls and stock returns: The incremental informativeness of textual tone," Journal of Banking & Finance, Elsevier, vol. 36(4), pages 992-1011.
    10. Wanli Li & Tiantian Yan & Yue Li & Ziqiao Yan, 2023. "Earnings management and CSR report tone: Evidence from China," Corporate Social Responsibility and Environmental Management, John Wiley & Sons, vol. 30(4), pages 1883-1902, July.
    11. Elias Zavitsanos & Dimitris Mavroeidis & Konstantinos Bougiatiotis & Eirini Spyropoulou & Lefteris Loukas & Georgios Paliouras, 2023. "Financial misstatement detection: a realistic evaluation," Papers 2305.17457, arXiv.org.
    12. Aman, Hiroyuki & Moriyasu, Hiroshi, 2022. "Effect of corporate disclosure and press media on market liquidity: Evidence from Japan," International Review of Financial Analysis, Elsevier, vol. 82(C).
    13. Zeng, Xiao & Xin, Yingying & Xiang, Kai, 2025. "The spillover effect of customer annual report tone on supplier ESG decisions," International Review of Financial Analysis, Elsevier, vol. 107(C).
    14. Dennis W. Campbell & Ruidi Shang, 2022. "Tone at the Bottom: Measuring Corporate Misconduct Risk from the Text of Employee Reviews," Management Science, INFORMS, vol. 68(9), pages 7034-7053, September.
    15. Dejian Yu & Bo Xiang, 2025. "Customized integrated decision model for CBEC enterprise credit evaluation: The fusion of multi-source features and machine learning," Electronic Markets, Springer;IIM University of St. Gallen, vol. 35(1), pages 1-19, December.
    16. Dutta, Shantanu & Fuksa, Michel & Macaulay, Ken, 2019. "Determinants of MD&A sentiment in Canada," International Review of Economics & Finance, Elsevier, vol. 60(C), pages 130-148.
    17. Li, Jing & Li, Nan & Xia, Tongshui & Guo, Jinjin, 2023. "Textual analysis and detection of financial fraud: Evidence from Chinese manufacturing firms," Economic Modelling, Elsevier, vol. 126(C).
    18. Tim Loughran & Bill McDonald, 2016. "Textual Analysis in Accounting and Finance: A Survey," Journal of Accounting Research, John Wiley & Sons, Ltd., vol. 54(4), pages 1187-1230, September.
    19. Lu Zhang & Hanlu Fan & Junru Zhang & Yuan George Shan & Qingliang Tang, 2025. "Climate Risk and Stock Price Crash: The Role of Text‐Based Carbon Disclosure," Accounting and Finance, Accounting and Finance Association of Australia and New Zealand, vol. 65(3), pages 3107-3141, September.
    20. Bilal Hafeez & M. Humayun Kabir & Udomsak Wongchoti, 2022. "Are retail investors really passive? Shareholder activism in the digital age," Journal of Business Finance & Accounting, Wiley Blackwell, vol. 49(3-4), pages 423-460, March.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.