Printed from https://ideas.repec.org/p/arx/papers/2509.01478.html

Handling Sparse Non-negative Data in Finance

Authors

  • Agostino Capponi
  • Zhaonan Qu

Abstract

We show that Poisson regression, though often recommended over log-linear regression for modeling count and other non-negative variables in finance and economics, can be far from optimal when heteroskedasticity and sparsity, two common features of such data, are both present. We propose a general class of moment estimators, encompassing Poisson regression, that balances the bias-variance trade-off under these conditions. A simple cross-validation procedure selects the optimal estimator. Numerical simulations and applications to corporate finance data reveal that the best choice varies substantially across settings and often departs from Poisson regression, underscoring the need for a more flexible estimation framework.
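
To make the idea of a family of moment estimators that contains Poisson regression concrete, here is a minimal Python sketch. The abstract does not specify the authors' class of estimators, so the power-weighted moment condition below, the function name fit_power_moment, the parameter gamma, and the simulated data are all assumptions chosen for illustration and are not the paper's method. The only property relied on is that gamma = 0 reproduces the Poisson pseudo-maximum-likelihood first-order condition, sum_i x_i (y_i - exp(x_i'b)) = 0.

```python
import numpy as np

def fit_power_moment(X, y, gamma=0.0, max_iter=200, tol=1e-10):
    """Solve sum_i x_i * (y_i - mu_i) * mu_i**(-gamma) = 0, with mu_i = exp(x_i' b).

    gamma = 0 is the Poisson pseudo-ML first-order condition; other values
    reweight observations by a power of the fitted mean (illustrative assumption).
    Assumes the first column of X is a constant and y.mean() > 0.
    """
    b = np.zeros(X.shape[1])
    b[0] = np.log(y.mean())  # intercept-only starting value
    for _ in range(max_iter):
        mu = np.exp(X @ b)
        moment = X.T @ ((y - mu) * mu ** (-gamma))
        # Expected Jacobian of the moment condition (Fisher-scoring style update).
        info = X.T @ (X * (mu ** (1.0 - gamma))[:, None])
        step = np.linalg.solve(info, moment)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b

# Simulated sparse, overdispersed outcome (all values here are made up).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-1.0, 0.5])
eta = rng.gamma(shape=2.0, scale=0.5, size=n)  # mean-one multiplicative noise
y = rng.poisson(np.exp(X @ beta_true) * eta).astype(float)

estimates = {g: fit_power_moment(X, y, gamma=g) for g in (0.0, 0.5, 1.0)}
for g, b in estimates.items():
    print(f"gamma={g}: {np.round(b, 3)}")
print("share of zeros:", np.mean(y == 0))
```

In this sketch every value of gamma targets the same conditional mean, so the estimates differ only in how heavily they weight observations with small or large fitted means. A data-driven choice among such estimators, for example by cross-validation as the abstract describes, would need an out-of-sample criterion that the abstract does not state, so it is left out here.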

Suggested Citation

  • Agostino Capponi & Zhaonan Qu, 2025. "Handling Sparse Non-negative Data in Finance," Papers 2509.01478, arXiv.org.
  • Handle: RePEc:arx:papers:2509.01478

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2509.01478
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Cohn, Jonathan B. & Liu, Zack & Wardlaw, Malcolm I., 2022. "Count (and count-like) data in finance," Journal of Financial Economics, Elsevier, vol. 146(2), pages 529-551.
    2. J. M. C. Santos Silva & Silvana Tenreyro, 2022. "The Log of Gravity at 15," Portuguese Economic Journal, Springer;Instituto Superior de Economia e Gestao, vol. 21(3), pages 423-437, September.
    3. Koen Jochmans & Vincenzo Verardi, 2022. "Instrumental‐variable estimation of exponential‐regression models with two‐way fixed effects with an application to gravity equations," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(6), pages 1121-1137, September.
    4. Kareem, Fatima Olanike & Martinez-Zarzoso, Inmaculada & Brümmer, Bernhard, 2016. "Fitting the Gravity Model when Zero Trade Flows are Frequent: a Comparison of Estimation Techniques using Africa's Trade Data," GlobalFood Discussion Papers 230588, Georg-August-Universitaet Goettingen, GlobalFood, Department of Agricultural Economics and Rural Development.
    5. Santos Silva, João M. C. & Tenreyro, Silvana & Windmeijer, Frank, 2015. "Testing Competing Models for Non-negative Data with Many Zeros," Journal of Econometric Methods, De Gruyter, vol. 4(1), pages 29-46, January.
    6. Head, Keith & Mayer, Thierry, 2014. "Gravity Equations: Workhorse, Toolkit, and Cookbook," Handbook of International Economics, in: Gopinath, G. & Helpman, E. & Rogoff, K. (ed.), Handbook of International Economics, edition 1, volume 4, chapter 0, pages 131-195, Elsevier.
    7. Rainer Winkelmann, 2015. "Counting on count data models," World of Labour, LISER, pages 148-148, May.
    8. Hirsch, Cornelius & Krisztin, Tamás & See, Linda, 2020. "Water Resources as Determinants for Foreign Direct Investments in Land - A Gravity Analysis of Foreign Land Acquisitions," Ecological Economics, Elsevier, vol. 170(C).
    9. Martijn Burger & Frank van Oort & Gert-Jan Linders, 2009. "On the Specification of the Gravity Model of Trade: Zeros, Excess Zeros and Zero-inflated Estimation," Spatial Economic Analysis, Taylor & Francis Journals, vol. 4(2), pages 167-190.
    10. Anna D’Ambrosio & Sandro Montresor, 2022. "The pro-export effect of subnational migration networks: new evidence from Spanish provinces," Review of World Economics (Weltwirtschaftliches Archiv), Springer;Institut für Weltwirtschaft (Kiel Institute for the World Economy), vol. 158(1), pages 53-107, February.
    11. Dongin Kim & Sandro Steinbach, 2024. "The Linder hypothesis for foreign direct investment revisited," Review of International Economics, Wiley Blackwell, vol. 32(4), pages 1901-1928, September.
    12. Scott L. Baier & Amanda Kerr & Yoto V. Yotov, 2018. "Gravity, distance, and international trade," Chapters, in: Bruce A. Blonigen & Wesley W. Wilson (ed.), Handbook of International Trade and Transportation, chapter 2, pages 15-78, Edward Elgar Publishing.
    13. Jochmans, K. & Verardi, V., 2019. "Instrumental-Variable Estimation of Gravity Equations," Cambridge Working Papers in Economics 1994, Faculty of Economics, University of Cambridge.
    14. Hirokazu Ishise & Miwa Matsuo, 2015. "US–Canada border effect between 1993 and 2007: smaller, less asymmetrical, and declining," Review of World Economics (Weltwirtschaftliches Archiv), Springer;Institut für Weltwirtschaft (Kiel Institute for the World Economy), vol. 151(2), pages 291-308, May.
    15. James E. Anderson & Mario Larch & Yoto V. Yotov, 2018. "GEPPML: General equilibrium analysis with PPML," The World Economy, Wiley Blackwell, vol. 41(10), pages 2750-2782, October.
    16. Elina Bryngemark & Patrik Söderholm, 2022. "Green industrial policies and domestic production of biofuels: an econometric analysis of OECD countries," Environmental Economics and Policy Studies, Springer;Society for Environmental Economics and Policy Studies - SEEPS, vol. 24(2), pages 225-261, April.
    17. Jones, A.M, 2010. "Models For Health Care," Health, Econometrics and Data Group (HEDG) Working Papers 10/01, HEDG, c/o Department of Economics, University of York.
    18. Ferro, Esteban & Otsuki, Tsunehiro & Wilson, John S., 2015. "The effect of product standards on agricultural exports," Food Policy, Elsevier, vol. 50(C), pages 68-79.
    19. Felix Groba, 2014. "Determinants of trade with solar energy technology components: evidence on the porter hypothesis?," Applied Economics, Taylor & Francis Journals, vol. 46(5), pages 503-526, February.
    20. Delgadillo Chavarria, Carlos Bruno, 2019. "El Efecto de la Mediterraneidad sobre el Flujo Comercial Internacional: Evidencia Empírica Internacional y para América del Sur (1990-2016) [The Effect of Landlocked Country Status on International Trade Flow: International and South America Empir," MPRA Paper 96093, University Library of Munich, Germany, revised 10 Sep 2019.
