
On the Testability of the Anchor-Words Assumption in Topic Models

Authors

  • Simon Freyaldenhoven
  • Shikun Ke
  • Dingyi Li
  • Jose Luis Montiel Olea

Abstract

What does the Fed talk about in its monetary policy discussions? We introduce a new statistical methodology to analyze text documents, and we use that methodology to recover the topics discussed during FOMC meetings. Topic models are a simple and popular tool for the statistical analysis of textual data. Their identification and estimation are typically enabled by assuming the existence of anchor words; that is, words that are exclusive to specific topics. In this paper we show that the existence of anchor words is statistically testable: There exists a hypothesis test with correct size that has nontrivial power. This means that the anchor-words assumption cannot be viewed simply as a convenient normalization. Central to our results is a simple characterization of when a column-stochastic matrix with known nonnegative rank admits a separable factorization. We test for the existence of anchor words in two different datasets derived from monetary policy discussions in the Federal Reserve and reject the null hypothesis that anchor words exist in one of them.
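To make the anchor-words condition concrete, the following is a minimal sketch, not the paper's hypothesis test. It assumes a known word-by-topic matrix A whose columns are probability distributions over words, and checks whether every topic has at least one word that receives positive probability under that topic only. The function name `has_anchor_words`, the tolerance `tol`, and the example matrices are hypothetical and chosen purely for illustration.

```python
import numpy as np

def has_anchor_words(A, tol=1e-12):
    """Return True if every topic (column of A) has at least one anchor word.

    A is a words-by-topics matrix. A word (row) is treated as an anchor
    word for a topic if it has positive probability under that topic and
    (numerically) zero probability under every other topic.
    NOTE: this inspects a given matrix A only; it is an illustrative
    assumption-laden check, not a statistical test on observed text.
    """
    A = np.asarray(A, dtype=float)
    positive = A > tol                        # which entries are non-zero
    exclusive = positive.sum(axis=1) == 1     # rows used by exactly one topic
    anchored = positive[exclusive].any(axis=0)  # topics owning such a row
    return bool(anchored.all())

# Hypothetical 3-word, 2-topic matrices; each column sums to one.
A_sep = np.array([
    [0.5, 0.0],   # word 0 appears only in topic 0 (anchor word)
    [0.0, 0.4],   # word 1 appears only in topic 1 (anchor word)
    [0.5, 0.6],   # word 2 is shared by both topics
])
A_nonsep = np.array([
    [0.5, 0.1],
    [0.1, 0.4],
    [0.4, 0.5],   # every word appears in both topics: no anchor words
])

# Hypothetical topic-by-document weights; each column sums to one.
W = np.array([
    [0.7, 0.2],
    [0.3, 0.8],
])

# Population word-by-document frequencies implied by the topic model.
P = A_sep @ W

print(has_anchor_words(A_sep))
print(has_anchor_words(A_nonsep))
print(P.sum(axis=0))
```

In this sketch, P is again column-stochastic because it is the product of two column-stochastic matrices. The check above needs A itself, whereas in practice only noisy word counts related to P are observed, which is why the existence of anchor words calls for a formal test.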

Suggested Citation

  • Simon Freyaldenhoven & Shikun Ke & Dingyi Li & Jose Luis Montiel Olea, 2025. "On the Testability of the Anchor-Words Assumption in Topic Models," Working Papers 25-14, Federal Reserve Bank of Philadelphia.
  • Handle: RePEc:fip:fedpwp:99851
    DOI: 10.21799/frbp.wp.2025.14

    Download full text from publisher

    File URL: https://www.philadelphiafed.org/-/media/FRBP/Assets/working-papers/2025/wp25-14.pdf
    Download Restriction: no


    References listed on IDEAS

    1. Ivan A. Canay & Andres Santos & Azeem M. Shaikh, 2013. "On the Testability of Identification in Some Nonparametric Models With Endogeneity," Econometrica, Econometric Society, vol. 81(6), pages 2535-2559, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiaohong Chen & Andres Santos, 2018. "Overidentification in Regular Models," Econometrica, Econometric Society, vol. 86(5), pages 1771-1817, September.
    2. Jarociński, Marek & Marcet, Albert, 2019. "Priors about observables in vector autoregressions," Journal of Econometrics, Elsevier, vol. 209(2), pages 238-255.
    3. Babii, Andrii, 2020. "Honest Confidence Sets in Nonparametric IV Regression and Other Ill-Posed Models," Econometric Theory, Cambridge University Press, vol. 36(4), pages 658-706, August.
    4. Joachim Freyberger & Joel L. Horowitz, 2012. "Identification and shape restrictions in nonparametric instrumental variables estimation," CeMMAP working papers CWP15/12, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    5. Rodrigo Adão & Costas Arkolakis & Sharat Ganapati, 2020. "Aggregate Implications of Firm Heterogeneity: A Nonparametric Analysis of Monopolistic Competition Trade Models," Working Papers 2020-161, Becker Friedman Institute for Research In Economics.
    6. Krief, Jerome M., 2017. "Direct instrumental nonparametric estimation of inverse regression functions," Journal of Econometrics, Elsevier, vol. 201(1), pages 95-107.
    7. Andrews, Donald W.K., 2017. "Examples of L2-complete and boundedly-complete distributions," Journal of Econometrics, Elsevier, vol. 199(2), pages 213-220.
    8. Hidehiko Ichimura & Whitney K. Newey, 2022. "The influence function of semiparametric estimators," Quantitative Economics, Econometric Society, vol. 13(1), pages 29-61, January.
    9. Christoph Breunig & Peter Haan, 2018. "Nonparametric Regression with Selectively Missing Covariates," Papers 1810.00411, arXiv.org, revised Oct 2020.
    10. Hu, Yingyao, 2017. "The Econometrics of Unobservables -- Latent Variable and Measurement Error Models and Their Applications in Empirical Industrial Organization and Labor Economics," Economics Working Paper Archive 64578, The Johns Hopkins University, Department of Economics, revised 2021.
    11. Yu Zhu, 2020. "Inference in nonparametric/semiparametric moment equality models with shape restrictions," Quantitative Economics, Econometric Society, vol. 11(2), pages 609-636, May.
    12. Hu, Yingyao & Schennach, Susanne & Shiu, Ji-Liang, 2022. "Identification of nonparametric monotonic regression models with continuous nonclassical measurement errors," Journal of Econometrics, Elsevier, vol. 226(2), pages 269-294.
    13. Daniel Wilhelm, 2018. "Testing for the presence of measurement error," CeMMAP working papers CWP45/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    14. Centorrino, Samuele & Feve, Frederique & Florens, Jean-Pierre, 2017. "Additive Nonparametric Instrumental Regressions: A Guide to Implementation," Journal of Econometric Methods, De Gruyter, vol. 6(1), pages 1-25, January.
    15. Manuel Arellano & Stéphane Bonhomme, 2016. "Nonlinear panel data estimation via quantile regressions," Econometrics Journal, Royal Economic Society, vol. 19(3), pages 61-94, October.
    16. Ben Deaner, 2019. "Nonparametric Instrumental Variables Estimation Under Misspecification," Papers 1901.01241, arXiv.org, revised Dec 2022.
    17. Joachim Freyberger & Joel L. Horowitz, 2013. "Identification and shape restrictions in nonparametric instrumental variables estimation," CeMMAP working papers CWP31/13, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    18. Yilin Li & Wang Miao & Ilya Shpitser & Eric J. Tchetgen Tchetgen, 2023. "A self‐censoring model for multivariate nonignorable nonmonotone missing data," Biometrics, The International Biometric Society, vol. 79(4), pages 3203-3214, December.
    19. Manuel Arellano & Stéphane Bonhomme, 2017. "Nonlinear Panel Data Methods for Dynamic Heterogeneous Agent Models," Annual Review of Economics, Annual Reviews, vol. 9(1), pages 471-496, September.
    20. Samuele Centorrino & Jeffrey S. Racine, 2017. "Semiparametric Varying Coefficient Models with Endogenous Covariates," Annals of Economics and Statistics, GENES, issue 128, pages 261-295.

    More about this item

    Keywords

    Anchor Words; Topic Models; Nonnegative Matrix Factorization; Hypothesis Testing;

    JEL classification:

    • C39 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Other
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
