IDEAS home Printed from https://ideas.repec.org/a/spr/jcsosc/v5y2022i2d10.1007_s42001-022-00165-9.html

Interpolation of non-random missing values in financial statements’ big data using CatBoost

Author

Listed:
  • Shouji Fujimoto

    (Kanazawa Gakuin University)

  • Takayuki Mizuno

    (National Institute of Informatics
    The Graduate University for Advanced Studies
    The University of Tokyo)

  • Atushi Ishikawa

    (Kanazawa Gakuin University)

Abstract

Big data from financial statements are characterized by “incompleteness” and “nonrepresentativeness”. In this paper, using ORBIS, the world’s largest commercial database on finance, we first find that the rate of missing data varies with the country, the type and size of the financial item, and the year. Using information on missing data, we interpolate non-random missing financial variables from the previous- and/or next-year values of the same financial item, the values of other financial items, and the conditions of missing values determined by CatBoost. Because the distribution of financial values obeys Zipf’s law in the large-scale range, so that its mean and variance diverge, we apply an inverse hyperbolic function to the value of a financial item and use the result as the target variable. We introduce two missing-value interpolation models, one for each of the two situations in which the target variable is missing. After verifying the accuracy and stability of these models, we describe the properties of firm-scale variables in which non-random missing values are interpolated. Finally, we combine the two models and confirm that the range over which Zipf’s law holds is wider than before interpolation.
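
The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the "inverse hyperbolic function" is the inverse hyperbolic sine (asinh), which behaves like ln(2x) for large x and is defined at zero and for negative values. The function names, the item names (total_assets, employees), and the sample figures are hypothetical. The CatBoost fitting step itself is omitted.

```python
import math

def to_target(value):
    """Compress a heavy-tailed financial value with asinh (assumed transform)."""
    return math.asinh(value)

def from_target(z):
    """Invert the transform to recover the value in its original units."""
    return math.sinh(z)

def build_features(series, other_items, year):
    """Build one feature row for `year` (hypothetical layout).

    series:      dict mapping year -> value or None, for the item to interpolate
    other_items: dict mapping item name -> value or None, for the same firm-year
    """
    row = {}
    # Previous- and next-year values of the same financial item.
    for label, y in (("prev", year - 1), ("next", year + 1)):
        v = series.get(y)
        row[f"{label}_missing"] = v is None  # missingness kept as information
        row[f"{label}_asinh"] = None if v is None else to_target(v)
    # Values of other financial items, with their own missing flags.
    for name, v in other_items.items():
        row[f"{name}_missing"] = v is None
        row[f"{name}_asinh"] = None if v is None else to_target(v)
    return row

# Made-up example: sales is missing in 2019 and known in 2018 and 2020.
sales = {2018: 1.2e6, 2019: None, 2020: 1.5e6}
row = build_features(sales, {"total_assets": 3.0e6, "employees": None}, 2019)
```

Rows like this, paired with the asinh-transformed target for firm-years where it is observed, could then be passed to a gradient-boosting regressor such as CatBoost. Predictions would be mapped back to original units with from_target.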

Suggested Citation

  • Shouji Fujimoto & Takayuki Mizuno & Atushi Ishikawa, 2022. "Interpolation of non-random missing values in financial statements’ big data using CatBoost," Journal of Computational Social Science, Springer, vol. 5(2), pages 1281-1301, November.
  • Handle: RePEc:spr:jcsosc:v:5:y:2022:i:2:d:10.1007_s42001-022-00165-9
    DOI: 10.1007/s42001-022-00165-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s42001-022-00165-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s42001-022-00165-9?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    References listed on IDEAS

    1. van Buuren, Stef & Groothuis-Oudshoorn, Karin, 2011. "mice: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 45(i03).
    2. Samuel Pinto Ribeiro & Stefano Menghinello & Koen De Backer, 2010. "The OECD ORBIS Database: Responding to the Need for Firm-Level Micro-Data in the OECD," OECD Statistics Working Papers 2010/1, OECD Publishing.
    3. Fujimoto, Shouji & Ishikawa, Atushi & Mizuno, Takayuki & Watanabe, Tsutomu, 2011. "A new method for measuring tail exponents of firm size distributions," Economics Discussion Papers 2011-29, Kiel Institute for the World Economy (IfW Kiel).
    4. Bee, Marco & Riccaboni, Massimo & Schiavo, Stefano, 2017. "Where Gibrat meets Zipf: Scale and scope of French firms," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 481(C), pages 265-275.
    5. Lina M Cortés & Juan M Lozada & Javier Perote, 2021. "Firm size and economic concentration: An analysis from a lognormal expansion," PLOS ONE, Public Library of Science, vol. 16(7), pages 1-21, July.
    6. Fujimoto, Shouji & Ishikawa, Atushi & Mizuno, Takayuki & Watanabe, Tsutomu, 2011. "A new method for measuring tail exponents of firm size distributions," Economics - The Open-Access, Open-Assessment E-Journal (2007-2020), Kiel Institute for the World Economy (IfW Kiel), vol. 5, pages 1-20.
    7. Marc F. Bellemare & Casey J. Wichman, 2020. "Elasticities and the Inverse Hyperbolic Sine Transformation," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 82(1), pages 50-61, February.
    8. Riccaboni, Massimo & Wang, Xu & Zhu, Zhen, 2021. "Firm performance in networks: The interplay between firm centrality and corporate group size," Journal of Business Research, Elsevier, vol. 129(C), pages 641-653.
    9. Sørensen, Bent E & Kalemli-Özcan, Sebnem & Volosovych, Vadym & Villegas-Sanchez, Carolina & Yesiltas, Sevcan, 2015. "How to construct nationally representative firm level data from the ORBIS global database," CEPR Discussion Papers 10829, C.E.P.R. Discussion Papers.
    10. Sebastian Beer & Jan Loeprick, 2015. "Profit shifting: drivers of transfer (mis)pricing and the potential of countermeasures," International Tax and Public Finance, Springer;International Institute of Public Finance, vol. 22(3), pages 426-451, June.
    11. Alberto Osnago & Nadia Rocha & Michele Ruta, 2017. "Do Deep Trade Agreements Boost Vertical FDI?," The World Bank Economic Review, World Bank, vol. 30(Supplemen), pages 119-125.
    12. Cortés, Lina M. & Mora-Valencia, Andrés & Perote, Javier, 2017. "Measuring firm size distribution with semi-nonparametric densities," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 485(C), pages 35-47.
    13. Sebnem Kalemli-Ozcan & Bent Sorensen & Carolina Villegas-Sanchez & Vadym Volosovych & Sevcan Yesiltas, 2015. "How to Construct Nationally Representative Firm Level Data from the Orbis Global Database: New Facts and Aggregate Implications," NBER Working Papers 21558, National Bureau of Economic Research, Inc.
    14. Fujimoto, S. & Ishikawa, A. & Mizuno, T. & Watanabe, T., 2011. "A New Method for Measuring Tail Exponents of Firm Size Distributions," Working Paper Series 7, Center for Interfirm Network, Institute of Economic Research, Hitotsubashi University.
    15. Peter N. Gal, 2013. "Measuring Total Factor Productivity at the Firm Level using OECD-ORBIS," OECD Economics Department Working Papers 1049, OECD Publishing.
    16. Loet Leydesdorff & Ping Zhou, 2014. "Measuring the knowledge-based economy of China in terms of synergy among technological, organizational, and geographic attributes of firms," Scientometrics, Springer;Akadémiai Kiadó, vol. 98(3), pages 1703-1719, March.
    17. Marco Opazo-Basáez & Ferran Vendrell-Herrero & Oscar F. Bustinza, 2018. "Uncovering Productivity Gains of Digital and Green Servitization: Implications from the Automotive Industry," Sustainability, MDPI, vol. 10(5), pages 1-17, May.

    Citations

    Cited by:

    1. Atushi Ishikawa & Takayuki Mizuno & Shouji Fujimoto, 2022. "Employee Number Dependence in Labor Productivity Distribution," The Review of Socionetwork Strategies, Springer, vol. 16(2), pages 465-477, October.
    2. Shouji Fujimoto & Atushi Ishikawa & Takayuki Mizuno, 2022. "Copula-Based Synthetic Data Generation in Firm-Size Variables," The Review of Socionetwork Strategies, Springer, vol. 16(2), pages 479-492, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Peter Gal & Alexander Hijzen, 2016. "The short-term impact of product market reforms: A cross-country firm-level analysis," OECD Economics Department Working Papers 1311, OECD Publishing.
    2. Serhan Cevik & Fedor Miryugin, 2022. "Death and taxes: Does taxation matter for firm survival?," Economics and Politics, Wiley Blackwell, vol. 34(1), pages 92-112, March.
    3. Loredana Fattorini & Mahdi Ghodsi & Armando Rungi, 2020. "Cohesion Policy Meets Heterogeneous Firms," Journal of Common Market Studies, Wiley Blackwell, vol. 58(4), pages 803-817, July.
    4. Falco J. Bargagli-Stoffi & Fabio Incerti & Massimo Riccaboni & Armando Rungi, 2023. "Machine Learning for Zombie Hunting: Predicting Distress from Firms' Accounts and Missing Values," Papers 2306.08165, arXiv.org.
    5. Mr. Sergi Lanau & Petia Topalova, 2016. "The Impact of Product Market Reforms on Firm Productivity in Italy," IMF Working Papers 2016/119, International Monetary Fund.
    6. Valeria Gattai & Piergiovanna Natale & Francesca Rossi, 2022. "Board Diversity and Outward FDI: Evidence from Europe," Working Papers 491, University of Milano-Bicocca, Department of Economics, revised Mar 2022.
    7. Bajgar, Matej & Berlingieri, Giuseppe & Calligaris, Sara & Criscuolo, Chiara & Timmis, Jonathan, 2019. "Industry concentration in Europe and North America," LSE Research Online Documents on Economics 103427, London School of Economics and Political Science, LSE Library.
    8. Pietro Dallari & Nicolas End & Fedor Miryugin & Alexander F. Tieman & Seyed Reza Yousefi, 2020. "Pouring oil on fire: interest deductibility and corporate debt," International Tax and Public Finance, Springer;International Institute of Public Finance, vol. 27(6), pages 1520-1556, December.
    9. Gattai, Valeria & Natale, Piergiovanna & Rossi, Francesca, 2023. "Board diversity and outward FDI: Evidence from europe," Economic Modelling, Elsevier, vol. 120(C).
    10. Maria Borga & Perla Ibarlucea Flores & Monika Sztajerowska, 2020. "Drivers of divestment decisions of multinational enterprises - A cross-country firm-level perspective," OECD Working Papers on International Investment 2019/03, OECD Publishing.
    11. Mr. Federico J Diez & Jiayue Fan & Carolina Villegas-Sánchez, 2019. "Global Declining Competition," IMF Working Papers 2019/082, International Monetary Fund.
    12. Dolores Añón Higón & Juan A. Máñez & María E. Rochina-Barrachina & Amparo Sanchis & Juan A. Sanchis, 2022. "Firms’ distance to the European productivity frontier," Eurasian Business Review, Springer;Eurasia Business and Economics Society, vol. 12(2), pages 197-228, June.
    13. Takayuki Mizuno & Takaaki Ohnishi & Tsutomu Watanabe, 2016. "Power laws in market capitalization during the dot-com and Shanghai bubble periods," Evolutionary and Institutional Economics Review, Springer, vol. 13(2), pages 445-454, December.
    14. Anderson, Gareth & Riley, Rebecca & Young, Garry, 2019. "Distressed banks, distorted decisions?," LSE Research Online Documents on Economics 100947, London School of Economics and Political Science, LSE Library.
    15. Takayuki Mizuno & Takaaki Ohnishi & Tsutomu Watanabe, 2016. "Power laws in market capitalization during the Dot-com and Shanghai bubble periods," CARF F-Series CARF-F-392, Center for Advanced Research in Finance, Faculty of Economics, The University of Tokyo.
    16. Villegas-Sanchez, Carolina & Díez, Federico & Fan, Jiayue, 2019. "Global Declining Competition," CEPR Discussion Papers 13696, C.E.P.R. Discussion Papers.
    17. Dan Andrews & Chiara Criscuolo & Peter N. Gal, 2019. "The best versus the rest: divergence across firms during the global productivity slowdown," CEP Discussion Papers dp1645, Centre for Economic Performance, LSE.
    18. José C. Fariñas & Ana Martín-Marcos & Francisco J. Velázquez, 2021. "The Geographical Scope of Multinational Firms and Heterogeneity," Open Economies Review, Springer, vol. 32(4), pages 761-788, September.
    19. Jovanovic, Franck & Schinckus, Christophe, 2017. "Econophysics and Financial Economics: An Emerging Dialogue," OUP Catalogue, Oxford University Press, number 9780190205034.
    20. Koray Aktas & Valeria Gattai & Piergiovanna Natale, 2021. "Board Gender Quotas and Outward Foreign Direct Investment: Evidence from France," Working Papers 485, University of Milano-Bicocca, Department of Economics, revised Dec 2021.
