Printed from https://ideas.repec.org/p/arx/papers/2509.01478.html

Handling Sparse Non-negative Data in Finance

Author

Listed:
  • Agostino Capponi
  • Zhaonan Qu

Abstract

We show that Poisson regression, though often recommended over log-linear regression for modeling count and other non-negative variables in finance and economics, can be far from optimal when heteroskedasticity and sparsity -- two common features of such data -- are both present. We propose a general class of moment estimators, encompassing Poisson regression, that balances the bias-variance trade-off under these conditions. A simple cross-validation procedure selects the optimal estimator. Numerical simulations and applications to corporate finance data reveal that the best choice varies substantially across settings and often departs from Poisson regression, underscoring the need for a more flexible estimation framework.
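The abstract does not spell out the proposed class of estimators, so the following is only a minimal sketch under stated assumptions. It uses a textbook one-parameter family of moment conditions for an exponential mean model, sum_i x_i (y_i - mu_i) mu_i^(-k) = 0 with mu_i = exp(x_i' beta), in which k = 0 reproduces the Poisson regression first-order conditions. The function names `fit_power_moment` and `cv_select_k`, the index `k`, the grid, and the squared-error cross-validation loss are all hypothetical choices made for illustration and may differ from what the paper actually proposes.

```python
import numpy as np


def fit_power_moment(X, y, k=0.0, n_iter=200, tol=1e-10):
    """Solve sum_i x_i * (y_i - mu_i) * mu_i**(-k) = 0, mu_i = exp(x_i' beta).

    k = 0 gives the Poisson pseudo-maximum-likelihood conditions and k = 1 the
    gamma PML conditions. The family and the name `k` are illustrative
    assumptions, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # assumes column 0 is an intercept and mean(y) > 0
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        g = X.T @ ((y - mu) * mu ** (-k))           # sample moment conditions
        H = X.T @ (X * (mu ** (1.0 - k))[:, None])  # expected negative Jacobian
        step = np.linalg.solve(H, g)                # Fisher-scoring style update
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def cv_select_k(X, y, grid=(0.0, 0.5, 1.0), n_folds=5, seed=0):
    """Pick k by K-fold cross-validation.

    The out-of-sample squared prediction error used here is an assumed loss,
    chosen only to keep the sketch short.
    """
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # random fold label per row
    scores = {}
    for k in grid:
        losses = []
        for f in range(n_folds):
            train, test = folds != f, folds == f
            beta = fit_power_moment(X[train], y[train], k=k)
            pred = np.exp(X[test] @ beta)
            losses.append(np.mean((y[test] - pred) ** 2))
        scores[k] = float(np.mean(losses))
    return min(scores, key=scores.get), scores


# Simulated sparse count outcome with many zeros (illustrative data only).
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-1.0, 0.5])
y = rng.poisson(np.exp(X @ beta_true))

best_k, scores = cv_select_k(X, y)
print("selected k:", best_k)
print("CV scores:", scores)
print("estimate:", fit_power_moment(X, y, k=best_k))
```

Larger values of k in this sketch put more weight on observations with small fitted means, which is one simple way to see how different members of a moment-estimator family can trade bias against variance when the outcome is sparse and heteroskedastic.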

Suggested Citation

  • Agostino Capponi & Zhaonan Qu, 2025. "Handling Sparse Non-negative Data in Finance," Papers 2509.01478, arXiv.org.
  • Handle: RePEc:arx:papers:2509.01478

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2509.01478
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cohn, Jonathan B. & Liu, Zack & Wardlaw, Malcolm I., 2022. "Count (and count-like) data in finance," Journal of Financial Economics, Elsevier, vol. 146(2), pages 529-551.
    2. J. M. C. Santos Silva & Silvana Tenreyro, 2022. "The Log of Gravity at 15," Portuguese Economic Journal, Springer;Instituto Superior de Economia e Gestao, vol. 21(3), pages 423-437, September.
    3. Koen Jochmans & Vincenzo Verardi, 2022. "Instrumental‐variable estimation of exponential‐regression models with two‐way fixed effects with an application to gravity equations," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(6), pages 1121-1137, September.
    4. Silva João M. C. Santos & Tenreyro Silvana & Windmeijer Frank, 2015. "Testing Competing Models for Non-negative Data with Many Zeros," Journal of Econometric Methods, De Gruyter, vol. 4(1), pages 29-46, January.
    5. Kareem, Fatima Olanike & Martinez-Zarzoso, Inmaculada & Brümmer, Bernhard, 2016. "Fitting the Gravity Model when Zero Trade Flows are Frequent: a Comparison of Estimation Techniques using Africa's Trade Data," GlobalFood Discussion Papers 230588, Georg-August-Universitaet Goettingen, GlobalFood, Department of Agricultural Economics and Rural Development.
    6. J. M. C. Santos Silva & Silvana Tenreyro, 2006. "The Log of Gravity," The Review of Economics and Statistics, MIT Press, vol. 88(4), pages 641-658, November.
    7. Dongin Kim & Sandro Steinbach, 2024. "The Linder hypothesis for foreign direct investment revisited," Review of International Economics, Wiley Blackwell, vol. 32(4), pages 1901-1928, September.
    8. Head, Keith & Mayer, Thierry, 2014. "Gravity Equations: Workhorse, Toolkit, and Cookbook," Handbook of International Economics, in: Gopinath, G. & Helpman, E. & Rogoff, K. (ed.), Handbook of International Economics, edition 1, volume 4, chapter 0, pages 131-195, Elsevier.
    9. Jones, A.M, 2010. "Models For Health Care," Health, Econometrics and Data Group (HEDG) Working Papers 10/01, HEDG, c/o Department of Economics, University of York.
    10. Rainer Winkelmann, 2015. "Counting on count data models," IZA World of Labor, Institute of Labor Economics (IZA), pages 148-148, May.
    11. Elisaveta Archanskaia & Guillaume Daudin, 2012. "Heterogeneity and the Distance Puzzle," Documents de Travail de l'OFCE 2012-17, Observatoire Francais des Conjonctures Economiques (OFCE).
    12. Jiantao Ma, 2023. "Effects of international trade on income revisited," Review of International Economics, Wiley Blackwell, vol. 31(4), pages 1286-1302, September.
    13. Margarita E. Romero Rodríguez & Enrique Los Arcos & Victor Cano Fernández & Miguel Sánchez Padrón, 2001. "Modelo para datos de recuentro de corte transversal con exceso de ceros. Aplicación a citas patentes," Documentos de trabajo conjunto ULL-ULPGC 2001-05, Facultad de Ciencias Económicas de la ULPGC.
    14. Mello, Marco & Moscelli, Giuseppe, 2022. "Voting, contagion and the trade-off between public health and political rights: Quasi-experimental evidence from the Italian 2020 polls," Journal of Economic Behavior & Organization, Elsevier, vol. 200(C), pages 1025-1052.
    15. Hirsch, Cornelius & Krisztin, Tamás & See, Linda, 2020. "Water Resources as Determinants for Foreign Direct Investments in Land - A Gravity Analysis of Foreign Land Acquisitions," Ecological Economics, Elsevier, vol. 170(C).
    16. Martijn Burger & Frank van Oort & Gert-Jan Linders, 2009. "On the Specification of the Gravity Model of Trade: Zeros, Excess Zeros and Zero-inflated Estimation," Spatial Economic Analysis, Taylor & Francis Journals, vol. 4(2), pages 167-190.
    17. Anna D’Ambrosio & Sandro Montresor, 2022. "The pro-export effect of subnational migration networks: new evidence from Spanish provinces," Review of World Economics (Weltwirtschaftliches Archiv), Springer;Institut für Weltwirtschaft (Kiel Institute for the World Economy), vol. 158(1), pages 53-107, February.
    18. Usala, Cristian & Primerano, Ilaria & Santelli, Francesco & Ragozini, Giancarlo, 2024. "The more the better? How degree programs’ variety affects university students’ churn risk," Socio-Economic Planning Sciences, Elsevier, vol. 94(C).
    19. Scott L. Baier & Amanda Kerr & Yoto V. Yotov, 2018. "Gravity, distance, and international trade," Chapters, in: Bruce A. Blonigen & Wesley W. Wilson (ed.), Handbook of International Trade and Transportation, chapter 2, pages 15-78, Edward Elgar Publishing.
    20. Donald S. Kenkel & Joseph V. Terza, 2001. "The effect of physician advice on alcohol consumption: count regression with an endogenous treatment effect," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 16(2), pages 165-184.

