
Anonymization and Information Loss

Author

Listed:
  • Ke Wu
  • Baozhong Yang
  • Zhenkun Ying
  • Dexin Zhou

Abstract

We show that while anonymization effectively obscures firm identity, it significantly reduces the power of textual understanding, thereby diminishing models' ability to extract meaningful economic signals from financial texts. This information loss is particularly severe when numerical and object entities are removed from texts and is amplified in texts characterized by high linguistic uncertainty and firm specificity. Importantly, in the setting of sentiment extraction from earnings call transcripts, we find that information loss induced by anonymization is more pervasive and severe than the effects of look-ahead bias, suggesting that the costs of anonymization may outweigh its benefits in certain financial applications.

Suggested Citation

  • Ke Wu & Baozhong Yang & Zhenkun Ying & Dexin Zhou, 2025. "Anonymization and Information Loss," Papers 2511.15364, arXiv.org.
  • Handle: RePEc:arx:papers:2511.15364

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2511.15364
    File Function: Latest version
    Download Restriction: no