Printed from https://ideas.repec.org/a/eee/finlet/v46y2022ipbs1544612321004578.html

Detection of fraud statement based on word vector: Evidence from financial companies in China

Author

Listed:
  • Zhang, Yi
  • Hu, Ailing
  • Wang, Jiahua
  • Zhang, Yaojie

Abstract

This paper aims to detect plausible frauds of financial companies via text analysis of annual and quarterly reports of China's listed companies. The Management Discussion and Analysis (MD&A) is digitized as vectors. The empirical results indicate that compared with various vector indexes, the bag-of-words (BoW) model and machine learning algorithm have a prediction effect and the ability to recognize frauds where the BoW model can correctly recognize 77% of the fraud reports. This would be helpful for audit authorities to identify fraud reports.
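The bag-of-words representation and the "77% of the fraud reports" figure mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the tokenizer, vocabulary, or classifier, so the whitespace tokenization, the toy English sentence, the function names `bag_of_words` and `fraud_recall`, and the example labels are all assumptions made for illustration. Reading the 77% figure as recall on the fraud class is likewise an interpretation of the abstract's wording.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Map a text to a vector of word counts over a fixed vocabulary.

    Assumption: whitespace tokenization of lowercased text. Chinese MD&A
    text would need a word-segmentation step first, which is omitted here.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def fraud_recall(true_labels, predicted_labels):
    """Share of truly fraudulent reports (label 1) that were flagged as fraud."""
    flagged = [p for t, p in zip(true_labels, predicted_labels) if t == 1]
    return sum(flagged) / len(flagged)


# Hypothetical vocabulary and sentence, for illustration only.
vocabulary = ["revenue", "growth", "risk", "loss"]
vector = bag_of_words(
    "Revenue growth offset the loss while revenue risk remained", vocabulary
)

# Hypothetical labels: four fraud reports, three of them flagged by a classifier.
recall = fraud_recall([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
```

Vectors of this kind would be passed to a classifier, and a recall of 0.77 on the fraud class would correspond to the abstract's statement that 77% of the fraud reports are recognized.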

Suggested Citation

  • Zhang, Yi & Hu, Ailing & Wang, Jiahua & Zhang, Yaojie, 2022. "Detection of fraud statement based on word vector: Evidence from financial companies in China," Finance Research Letters, Elsevier, vol. 46(PB).
  • Handle: RePEc:eee:finlet:v:46:y:2022:i:pb:s1544612321004578
    DOI: 10.1016/j.frl.2021.102477

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S1544612321004578
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Bonsall, Samuel B. & Leone, Andrew J. & Miller, Brian P. & Rennekamp, Kristina, 2017. "A plain English measure of financial reporting readability," Journal of Accounting and Economics, Elsevier, vol. 63(2), pages 329-357.
    2. Sunita Goel & Jagdish Gangolly, 2012. "Beyond The Numbers: Mining The Annual Reports For Hidden Cues Indicative Of Financial Statement Fraud," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 19(2), pages 75-89, April.
    3. Lynnette Purda & David Skillicorn, 2015. "Accounting Variables, Deception, and a Bag of Words: Assessing the Tools of Fraud Detection," Contemporary Accounting Research, John Wiley & Sons, vol. 32(3), pages 1193-1223, September.
    4. Xin‐Ping Song & Zhi‐Hua Hu & Jian‐Guo Du & Zhao‐Han Sheng, 2014. "Application of Machine Learning Methods to Risk Assessment of Financial Statement Fraud: Evidence from China," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 33(8), pages 611-626, December.
    5. Jiang, Fuwei & Lee, Joshua & Martin, Xiumin & Zhou, Guofu, 2019. "Manager sentiment and stock returns," Journal of Financial Economics, Elsevier, vol. 132(1), pages 126-149.
    6. Kong, Dongmin & Shi, Lu & Zhang, Fan, 2021. "Explain or conceal? Causal language intensity in annual report and stock price crash risk," Economic Modelling, Elsevier, vol. 94(C), pages 715-725.
    7. Werner Antweiler & Murray Z. Frank, 2004. "Is All That Talk Just Noise? The Information Content of Internet Stock Message Boards," Journal of Finance, American Finance Association, vol. 59(3), pages 1259-1294, June.
    8. Dai, Zhifeng & Kang, Jie & Wen, Fenghua, 2021. "Predicting stock returns: A risk measurement perspective," International Review of Financial Analysis, Elsevier, vol. 74(C).
    9. Feng Li, 2010. "The Information Content of Forward‐Looking Statements in Corporate Filings—A Naïve Bayesian Machine Learning Approach," Journal of Accounting Research, Wiley Blackwell, vol. 48(5), pages 1049-1102, December.

    Citations



    Cited by:

    1. Li, Jing & Li, Nan & Xia, Tongshui & Guo, Jinjin, 2023. "Textual analysis and detection of financial fraud: Evidence from Chinese manufacturing firms," Economic Modelling, Elsevier, vol. 126(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Senave, Elseline & Jans, Mieke J. & Srivastava, Rajendra P., 2023. "The application of text mining in accounting," International Journal of Accounting Information Systems, Elsevier, vol. 50(C).
    2. Jiao Ji & Oleksandr Talavera & Shuxing Yin, 2018. "The Hidden Information Content: Evidence from the Tone of Independent Director Reports," Working Papers 2018-28, Swansea University, School of Management.
    3. Berkin, Anil & Aerts, Walter & Van Caneghem, Tom, 2023. "Feasibility analysis of machine learning for performance-related attributional statements," International Journal of Accounting Information Systems, Elsevier, vol. 48(C).
    4. Blankespoor, Elizabeth & deHaan, Ed & Marinovic, Iván, 2020. "Disclosure processing costs, investors’ information choice, and equity market outcomes: A review," Journal of Accounting and Economics, Elsevier, vol. 70(2).
    5. Wenbo Ma & Xinjie Wang & Yuan Wang & Ge Wu, 2021. "Measuring misleading information in IPO prospectuses," Review of Quantitative Finance and Accounting, Springer, vol. 57(3), pages 819-843, October.
    6. Dai, Zhifeng & Chang, Xiaoming, 2021. "Forecasting stock market volatility: Can the risk aversion measure exert an important role?," The North American Journal of Economics and Finance, Elsevier, vol. 58(C).
    7. Andrew Todd & James Bowden & Yashar Moshfeghi, 2024. "Text‐based sentiment analysis in finance: Synthesising the existing literature and exploring future directions," Intelligent Systems in Accounting, Finance and Management, John Wiley & Sons, Ltd., vol. 31(1), March.
    8. Fengler, Matthias & Phan, Minh Tri, 2023. "A Topic Model for 10-K Management Disclosures," Economics Working Paper Series 2307, University of St. Gallen, School of Economics and Political Science.
    9. Fahd Alduais & Nashat Ali Almasria & Abeer Samara & Ali Masadeh, 2022. "Conciseness, Financial Disclosure, and Market Reaction: A Textual Analysis of Annual Reports in Listed Chinese Companies," IJFS, MDPI, vol. 10(4), pages 1-22, November.
    10. John L. Campbell & Hye Seung “Grace” Lee & Hsin‐Min Lu & Logan B. Steele, 2020. "Express Yourself: Why Managers' Disclosure Tone Varies Across Time and What Investors Learn From It," Contemporary Accounting Research, John Wiley & Sons, vol. 37(2), pages 1140-1171, June.
    11. Ahmed, Yousry & Elshandidy, Tamer, 2016. "The effect of bidder conservatism on M&A decisions: Text-based evidence from US 10-K filings," International Review of Financial Analysis, Elsevier, vol. 46(C), pages 176-190.
    12. Thomas Renault, 2020. "Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages," Digital Finance, Springer, vol. 2(1), pages 1-13, September.
    13. Bryan T. Kelly & Asaf Manela & Alan Moreira, 2019. "Text Selection," NBER Working Papers 26517, National Bureau of Economic Research, Inc.
    14. Al-Nasseri, Alya & Menla Ali, Faek & Tucker, Allan, 2021. "Investor sentiment and the dispersion of stock returns: Evidence based on the social network of investors," International Review of Financial Analysis, Elsevier, vol. 78(C).
    15. Ding, Rong & Hou, Wenxuan & Liu, Yue (Lucy) & Zhang, John Ziyang, 2018. "Media censorship and stock price: Evidence from the foreign share discount in China," Journal of International Financial Markets, Institutions and Money, Elsevier, vol. 55(C), pages 112-133.
    16. Xin Xu & Feng Xiong & Zhe An, 2023. "Using Machine Learning to Predict Corporate Fraud: Evidence Based on the GONE Framework," Journal of Business Ethics, Springer, vol. 186(1), pages 137-158, August.
    17. James P. Ryans, 2021. "Textual classification of SEC comment letters," Review of Accounting Studies, Springer, vol. 26(1), pages 37-80, March.
    18. Sanjiv Das & Xin Huang & Soji Adeshina & Patrick Yang & Leonardo Bachega, 2023. "Credit Risk Modeling with Graph Machine Learning," INFORMS Journal on Data Science, INFORMS, vol. 2(2), pages 197-217, October.
    19. An, Suwei, 2023. "Essays on incentive contracts, M&As, and firm risk," Other publications TiSEM dd97d2f5-1c9d-47c5-ba62-f, Tilburg University, School of Economics and Management.
    20. Yongan Xu & Jianqiong Wang & Zhonglu Chen & Chao Liang, 2023. "Sentiment indices and stock returns: Evidence from China," International Journal of Finance & Economics, John Wiley & Sons, Ltd., vol. 28(1), pages 1063-1080, January.
