
Hamiltonian Monte Carlo for Regression with High-Dimensional Categorical Data

Authors

  • Szymon Sacher
  • Laura Battaglia
  • Stephen Hansen

Abstract

Latent variable models are increasingly used in economics for high-dimensional categorical data such as text and surveys. We demonstrate the effectiveness of Hamiltonian Monte Carlo (HMC) with parallelized automatic differentiation for analyzing such data in a computationally efficient and methodologically sound manner. Our new model, the Supervised Topic Model with Covariates, shows that carefully modeling this type of data can have significant implications for conclusions compared to a simpler, frequently used, yet methodologically problematic two-step approach. A simulation study and a reanalysis of the study of executive time use by Bandiera et al. (2020) demonstrate these results. The approach accommodates thousands of parameters and does not require custom algorithms specific to each model, making it accessible to applied researchers.

Suggested Citation

  • Szymon Sacher & Laura Battaglia & Stephen Hansen, 2021. "Hamiltonian Monte Carlo for Regression with High-Dimensional Categorical Data," Papers 2107.08112, arXiv.org, revised Feb 2024.
  • Handle: RePEc:arx:papers:2107.08112

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2107.08112
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Bertsch, Christoph & Hull, Isaiah & Zhang, Xin, 2021. "Narrative fragmentation and the business cycle," Economics Letters, Elsevier, vol. 201(C).
    2. Matthew Gentzkow & Bryan Kelly & Matt Taddy, 2019. "Text as Data," Journal of Economic Literature, American Economic Association, vol. 57(3), pages 535-574, September.
    3. Evan Munro & Serena Ng, 2022. "Latent Dirichlet Analysis of Categorical Survey Responses," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 40(1), pages 256-271, January.
    4. Oriana Bandiera & Andrea Prat & Stephen Hansen & Raffaella Sadun, 2020. "CEO Behavior and Firm Performance," Journal of Political Economy, University of Chicago Press, vol. 128(4), pages 1325-1369.
    5. Stephen Hansen & Michael McMahon, 2016. "Shocking Language: Understanding the Macroeconomic Effects of Central Bank Communication," NBER Chapters, in: NBER International Seminar on Macroeconomics 2015, National Bureau of Economic Research, Inc.
    6. Carpenter, Bob & Gelman, Andrew & Hoffman, Matthew D. & Lee, Daniel & Goodrich, Ben & Betancourt, Michael & Brubaker, Marcus & Guo, Jiqiang & Li, Peter & Riddell, Allen, 2017. "Stan: A Probabilistic Programming Language," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 76(i01).
    7. Stephen Hansen & Michael McMahon & Andrea Prat, 2018. "Transparency and Deliberation Within the FOMC: A Computational Linguistics Approach," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 133(2), pages 801-870.
    8. J. Ignacio Conde-Ruiz & Juan-José Ganuza & Manu García & Luis A. Puch, 2021. "Gender Distribution across Topics in Top 5 Economics Journals: A Machine Learning Approach," Working Papers 1241, Barcelona School of Economics.
    9. Mueller, Hannes & Rauh, Christopher, 2018. "Reading Between the Lines: Prediction of Political Violence Using Newspaper Text," American Political Science Review, Cambridge University Press, vol. 112(2), pages 358-375, May.
    10. Dang, Khue-Dung & Quiroz, Matias & Kohn, Robert & Tran, Minh-Ngoc & Villani, Mattias, 2019. "Hamiltonian Monte Carlo with Energy Conserving Subsampling," Working Paper Series 372, Sveriges Riksbank (Central Bank of Sweden).
    11. Evan M. Munro & Serena Ng, 2020. "Latent Dirichlet Analysis of Categorical Survey Expectations," NBER Working Papers 27182, National Bureau of Economic Research, Inc.
    12. Jon Ellingsen & Vegard H. Larsen & Leif Anders Thorsrud, 2020. "News Media vs. FRED-MD for Macroeconomic Forecasting," CESifo Working Paper Series 8639, CESifo.
    13. Nimark, Kristoffer P. & Pitschner, Stefan, 2019. "News media and delegated information choice," Journal of Economic Theory, Elsevier, vol. 181(C), pages 160-196.
    14. Leif Anders Thorsrud, 2020. "Words are the New Numbers: A Newsy Coincident Index of the Business Cycle," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 38(2), pages 393-409, April.
    15. Hansen, Stephen & McMahon, Michael & Tong, Matthew, 2019. "The long-run information effect of central bank communication," Journal of Monetary Economics, Elsevier, vol. 108(C), pages 185-202.
    16. Adams, Renée B. & Ragunathan, Vanitha & Tumarkin, Robert, 2021. "Death by committee? An analysis of corporate board (sub-) committees," Journal of Financial Economics, Elsevier, vol. 141(3), pages 1119-1146.
    17. Draca, Mirko & Schwarz, Carlo, 2019. "How Polarized are Citizens? Measuring Ideology from the Ground-Up," The Warwick Economics Research Paper Series (TWERPS) 1218, University of Warwick, Department of Economics.
    18. Nimczik, Jan Sebastian, 2017. "Job Mobility Networks and Endogenous Labor Markets," VfS Annual Conference 2017 (Vienna): Alternative Structures for Money and Banking 168147, Verein für Socialpolitik / German Economic Association.
    19. Leland Bybee & Bryan T. Kelly & Asaf Manela & Dacheng Xiu, 2020. "The Structure of Economic News," NBER Working Papers 26648, National Bureau of Economic Research, Inc.
    20. Margaret E. Roberts & Brandon M. Stewart & Dustin Tingley & Christopher Lucas & Jetson Leder‐Luis & Shana Kushner Gadarian & Bethany Albertson & David G. Rand, 2014. "Structural Topic Models for Open‐Ended Survey Responses," American Journal of Political Science, John Wiley & Sons, vol. 58(4), pages 1064-1082, October.
    21. Kathleen Weiss Hanley & Gerard Hoberg, 2019. "Dynamic Interpretation of Emerging Risks in the Financial Sector," The Review of Financial Studies, Society for Financial Studies, vol. 32(12), pages 4543-4603.
    22. Margaret E. Roberts & Brandon M. Stewart & Edoardo M. Airoldi, 2016. "A Model of Text for Experimentation in the Social Sciences," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(515), pages 988-1003, July.
    23. Rachael Meager, 2019. "Understanding the Average Impact of Microcredit Expansions: A Bayesian Hierarchical Analysis of Seven Randomized Experiments," American Economic Journal: Applied Economics, American Economic Association, vol. 11(1), pages 57-91, January.
    24. Larsen, Vegard H. & Thorsrud, Leif A., 2019. "The value of news for economic developments," Journal of Econometrics, Elsevier, vol. 210(1), pages 203-218.
    25. Meager, Rachael, 2019. "Understanding the average impact of microcredit expansions: a Bayesian hierarchical analysis of seven randomized experiments," LSE Research Online Documents on Economics 88190, London School of Economics and Political Science, LSE Library.
    26. Draca, Mirko & Schwarz, Carlo, 2021. "How Polarized are Citizens? Measuring Ideology from the Ground-Up," The Warwick Economics Research Paper Series (TWERPS) 07, University of Warwick, Department of Economics.

    Citations

    Cited by:

    1. repec:hal:journl:hal-03533356 is not listed on IDEAS

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Laura Battaglia & Timothy Christensen & Stephen Hansen & Szymon Sacher, 2024. "Inference for Regression with Variables Generated from Unstructured Data," Papers 2402.15585, arXiv.org, revised Mar 2024.
    2. Larsen, Vegard H. & Thorsrud, Leif Anders & Zhulanova, Julia, 2021. "News-driven inflation expectations and information rigidities," Journal of Monetary Economics, Elsevier, vol. 117(C), pages 507-520.
    3. Leonardo N. Ferreira, 2021. "Forecasting with VAR-teXt and DFM-teXt Models: exploring the predictive power of central bank communication," Working Papers Series 559, Central Bank of Brazil, Research Department.
    4. Jon Ellingsen & Vegard H. Larsen & Leif Anders Thorsrud, 2020. "News Media vs. FRED-MD for Macroeconomic Forecasting," CESifo Working Paper Series 8639, CESifo.
    5. Jon Ellingsen & Vegard H. Larsen & Leif Anders Thorsrud, 2022. "News media versus FRED‐MD for macroeconomic forecasting," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(1), pages 63-81, January.
    6. Jianhao Lin & Jiacheng Fan & Yifan Zhang & Liangyuan Chen, 2023. "Real‐time macroeconomic projection using narrative central bank communication," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 38(2), pages 202-221, March.
    7. Peter Grajzl & Peter Murrell, 2021. "Characterizing a legal–intellectual culture: Bacon, Coke, and seventeenth-century England," Cliometrica, Journal of Historical Economics and Econometric History, Association Française de Cliométrie (AFC), vol. 15(1), pages 43-88, January.
    8. Grajzl, Peter & Murrell, Peter, 2019. "Toward understanding 17th century English culture: A structural topic model of Francis Bacon's ideas," Journal of Comparative Economics, Elsevier, vol. 47(1), pages 111-135.
    9. Martin Baumgaertner & Johannes Zahner, 2021. "Whatever it takes to understand a central banker - Embedding their words using neural networks," MAGKS Papers on Economics 202130, Philipps-Universität Marburg, Faculty of Business Administration and Economics, Department of Economics (Volkswirtschaftliche Abteilung).
    10. Celso Brunetti & Marc Joëts & Valérie Mignon, 2023. "Reasons Behind Words: OPEC Narratives and the Oil Market," Working Papers 2023-19, CEPII research center.
    11. Vegard Høghaug Larsen & Leif Anders Thorsrud, 2022. "Asset returns, news topics, and media effects," Scandinavian Journal of Economics, Wiley Blackwell, vol. 124(3), pages 838-868, July.
    12. Shapiro, Adam Hale & Sudhof, Moritz & Wilson, Daniel J., 2022. "Measuring news sentiment," Journal of Econometrics, Elsevier, vol. 228(2), pages 221-243.
    13. Saskia Ter Ellen & Vegard H. Larsen & Leif Anders Thorsrud, 2022. "Narrative Monetary Policy Surprises and the Media," Journal of Money, Credit and Banking, Blackwell Publishing, vol. 54(5), pages 1525-1549, August.
    14. Peter Grajzl & Cindy Irby, 2019. "Reflections on study abroad: a computational linguistics approach," Journal of Computational Social Science, Springer, vol. 2(2), pages 151-181, July.
    15. Simon Fritzsch & Philipp Scharner & Gregor Weiß, 2021. "Estimating the relation between digitalization and the market value of insurers," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 88(3), pages 529-567, September.
    16. Amarasinghe, Ashani, 2022. "Diverting domestic turmoil," Journal of Public Economics, Elsevier, vol. 208(C).
    17. Lin, Jianhao & Mei, Ziwei & Chen, Liangyuan & Zhu, Chuanqi, 2023. "Is the People's Bank of China consistent in words and deeds?," China Economic Review, Elsevier, vol. 78(C).
    18. Valérie Mignon & Marc Joëts & Bertrand Candelon, 2023. "What Makes Econometric Ideas Popular: The Role of Connectivity," Working Papers hal-04343996, HAL.
    19. Marozzi, Armando, 2021. "The ECB's tracker: nowcasting the press conferences of the ECB," Working Paper Series 2609, European Central Bank.
    20. Daniel Borup & Jorge Wolfgang Hansen & Benjamin Dybro Liengaard & Erik Christian Montes Schütte, 2023. "Quantifying investor narratives and their role during COVID‐19," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 38(4), pages 512-532, June.
