
Machine Learning based Enterprise Financial Audit Framework and High Risk Identification

Authors
  • Tingyu Yuan
  • Xi Zhang
  • Xuanjing Chen

Abstract

In the face of global economic uncertainty, financial auditing has become essential for regulatory compliance and risk mitigation. Traditional manual auditing methods are increasingly limited by large data volumes, complex business structures, and evolving fraud tactics. This study proposes an AI-driven framework for enterprise financial audits and high-risk identification, leveraging machine learning to improve efficiency and accuracy. Using a dataset from the Big Four accounting firms (EY, PwC, Deloitte, KPMG) covering 2020 to 2025, the research examines trends in risk assessment, compliance violations, and fraud detection. The dataset includes key indicators such as audit project counts, high-risk cases, fraud instances, compliance breaches, employee workload, and client satisfaction, capturing both audit behaviors and AI's impact on operations. To build a robust risk prediction model, three algorithms are evaluated: Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN). SVM uses hyperplane optimization for complex classification; RF combines decision trees to handle high-dimensional, nonlinear data while resisting overfitting; and KNN applies distance-based learning for flexible performance. Under hierarchical K-fold cross-validation, with models evaluated on F1-score, accuracy, and recall, Random Forest achieves the best performance, with an F1-score of 0.9012, and excels at identifying fraud and compliance anomalies. Feature importance analysis identifies audit frequency, past violations, employee workload, and client ratings as key predictors. The study recommends adopting Random Forest as the core model, enriching the feature set through feature engineering, and implementing real-time risk monitoring. This research offers insights into using machine learning for intelligent auditing and risk management in modern enterprises.
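The abstract does not include the authors' code or data, so what follows is only a minimal sketch of the evaluation protocol it describes, written in Python with the standard library alone. It assumes that "hierarchical K-fold cross-validation" means stratified K-fold (each fold keeps roughly the same share of high-risk cases as the full dataset) and treats label 1 as "high risk". It uses KNN as the classifier because it fits in a few lines; it does not reproduce the paper's SVM or Random Forest models or its reported F1-score. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, k, seed=0):
    """Split sample indices into k folds that each keep roughly the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Deal each class's indices round-robin so every fold gets its share.
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, plus recall and F1 for the positive (high-risk) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_validate(X, y, k_folds=5, k_neighbors=3, seed=0):
    """Average each metric over stratified folds; KNN is fit on the remaining folds."""
    folds = stratified_kfold(y, k_folds, seed)
    totals = defaultdict(float)
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        preds = [knn_predict(train_X, train_y, X[i], k_neighbors) for i in test_idx]
        for name, value in binary_metrics([y[i] for i in test_idx], preds).items():
            totals[name] += value
    return {name: total / k_folds for name, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical toy rows: (past violations, employee workload score); 1 = high risk.
    X = [(0, 1.0), (1, 1.2), (0, 0.8), (1, 0.9), (0, 1.1), (2, 1.0),
         (5, 3.0), (6, 2.8), (4, 3.2), (5, 2.5)]
    y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
    print(cross_validate(X, y, k_folds=2, k_neighbors=3))
```

Stratifying the folds matters when high-risk cases are rare: a plain random split can leave a fold with few or no positives, and recall and F1 on that fold then become unstable or undefined. In practice, scikit-learn's `StratifiedKFold`, `f1_score`, and `RandomForestClassifier` cover the same protocol; the hand-written version only makes the fold construction and the metric formulas explicit.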

Suggested Citation

  • Tingyu Yuan & Xi Zhang & Xuanjing Chen, 2025. "Machine Learning based Enterprise Financial Audit Framework and High Risk Identification," Papers 2507.06266, arXiv.org.
  • Handle: RePEc:arx:papers:2507.06266

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2507.06266
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Rui Ding, 2022. "Enterprise Intelligent Audit Model by Using Deep Learning Approach," Computational Economics, Springer; Society for Computational Economics, vol. 59(4), pages 1335-1354, April.
    2. Akshit Kurani & Pavan Doshi & Aarya Vakharia & Manan Shah, 2023. "A Comprehensive Comparative Study of Artificial Neural Network (ANN) and Support Vector Machines (SVM) on Stock Forecasting," Annals of Data Science, Springer, vol. 10(1), pages 183-208, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Saima Akhtar & Sulman Shahzad & Asad Zaheer & Hafiz Sami Ullah & Heybet Kilic & Radomir Gono & Michał Jasiński & Zbigniew Leonowicz, 2023. "Short-Term Load Forecasting Models: A Review of Challenges, Progress, and the Road Ahead," Energies, MDPI, vol. 16(10), pages 1-29, May.
    2. Xuecheng He & Jujie Wang, 2024. "A Hybrid Forecasting System Based on Comprehensive Feature Selection and Intelligent Optimization for Stock Price Index Forecasting," Mathematics, MDPI, vol. 12(23), pages 1-27, November.
    3. Murat Tasci & Hidir Duzkaya, 2025. "Estimation of Working Error of Electricity Meter Using Artificial Neural Network (ANN)," Energies, MDPI, vol. 18(5), pages 1-16, March.
    4. Dezheng Zhang & Jing Li & Yonghong Xie & Aziguli Wulamu, 2023. "Research on performance variations of classifiers with the influence of pre-processing methods for Chinese short text classification," PLOS ONE, Public Library of Science, vol. 18(10), pages 1-22, October.
    5. Caixia Wang, 2023. "Optimization of sports effect evaluation technology from random forest algorithm and elastic network algorithm," PLOS ONE, Public Library of Science, vol. 18(10), pages 1-18, October.
    6. Chin Soon Ku & Jiale Xiong & Yen-Lin Chen & Shing Dhee Cheah & Hoong Cheng Soong & Lip Yee Por, 2023. "Improving Stock Market Predictions: An Equity Forecasting Scanner Using Long Short-Term Memory Method with Dynamic Indicators for Malaysia Stock Market," Mathematics, MDPI, vol. 11(11), pages 1-20, May.
    7. Qin, Fuli & Tong, Mingyu & Huang, Ying & Zhang, Yubo, 2024. "Modeling, prediction and analysis of natural gas consumption in China using a novel dynamic nonlinear multivariable grey delay model," Energy, Elsevier, vol. 305(C).
    8. Syed Hasan Jafar & Shakeb Akhtar & Hani El-Chaarani & Parvez Alam Khan & Ruaa Binsaddig, 2023. "Forecasting of NIFTY 50 Index Price by Using Backward Elimination with an LSTM Model," JRFM, MDPI, vol. 16(10), pages 1-23, September.
    9. Jiahao Chen & Xiaofei Li & Junjie Du, 2025. "Analysis of Frequent Trading Effects of Various Machine Learning Models," Computational Economics, Springer; Society for Computational Economics, vol. 65(3), pages 1707-1740, March.
    10. Jin, Ting & Liang, Feiyan & Dong, Xiaoqi & Cao, Xiaojuan, 2023. "Research on land resource management integrated with support vector machine —Based on the perspective of green innovation," Resources Policy, Elsevier, vol. 86(PB).
    11. Adel S. Aldosary & Baqer Al-Ramadan & Abdulla Al Kafy & Hamad Ahmed Altuwaijri & Zullyadini A. Rahaman, 2025. "Forecasting climate risk and heat stress hazards in arid ecosystems: Machine learning and ensemble models for specific humidity prediction in Dammam, Saudi Arabia," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer; International Society for the Prevention and Mitigation of Natural Hazards, vol. 121(8), pages 9281-9309, May.
    12. Thiago Conte & Roberto Oliveira, 2024. "Comparative Analysis between Intelligent Machine Committees and Hybrid Deep Learning with Genetic Algorithms in Energy Sector Forecasting: A Case Study on Electricity Price and Wind Speed in the Brazi," Energies, MDPI, vol. 17(4), pages 1-31, February.
    13. Mokhtar Jlidi & Oscar Barambones & Faiçal Hamidi & Mohamed Aoun, 2024. "ANN for Temperature and Irradiation Prediction and Maximum Power Point Tracking Using MRP-SMC," Energies, MDPI, vol. 17(12), pages 1-21, June.
    14. Farwah Ali Syed & Kwo-Ting Fang & Adiqa Kausar Kiani & Muhammad Shoaib & Muhammad Asif Zahoor Raja, 2025. "Design of Neuro-Stochastic Bayesian Networks for Nonlinear Chaotic Differential Systems in Financial Mathematics," Computational Economics, Springer; Society for Computational Economics, vol. 65(1), pages 241-270, January.
    15. Minsuk Song & Ryun-Han Koo & Jangsaeng Kim & Chang-Hyeon Han & Jiyong Yim & Jonghyun Ko & Sijung Yoo & Duk-hyun Choe & Sangwook Kim & Wonjun Shin & Daewoong Kwon, 2025. "Ferroelectric NAND for efficient hardware bayesian neural networks," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    16. Pulikandala Nithish Kumar & Nneka Umeorah & Alex Alochukwu, 2024. "Dynamic graph neural networks for enhanced volatility prediction in financial markets," Papers 2410.16858, arXiv.org.
    17. Asma Fekih, 2025. "A conceptual framework for integrating Robotic Process Automation in logistics audits and supply chain management [Un cadre conceptuel pour l'intégration de l'automatisation des processus robotisés," Post-Print hal-05128257, HAL.
    18. Agnieszka Wawrzyniak & Andrzej Przybylak & Piotr Boniecki & Agnieszka Sujak & Maciej Zaborowicz, 2023. "Neural Modelling in the Study of the Relationship between Herd Structure, Amount of Manure and Slurry Produced, and Location of Herds in Poland," Agriculture, MDPI, vol. 13(7), pages 1-13, July.
    19. Peng, Yaohao & de Moraes Souza, João Gabriel, 2024. "Chaos, overfitting and equilibrium: To what extent can machine learning beat the financial market?," International Review of Financial Analysis, Elsevier, vol. 95(PB).
    20. Kun Yang & Nikhil Krishnan & Sanjeev R. Kulkarni, 2025. "Financial Data Analysis with Robust Federated Logistic Regression," Papers 2504.20250, arXiv.org.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2507.06266. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the arXiv administrators. General contact details of provider: http://arxiv.org/

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.