IDEAS home Printed from https://ideas.repec.org/a/eee/ejores/v299y2022i2p653-673.html

Acceptable set topic modeling

Author

Listed:
  • Berk Wheelock, Lauren
  • Pachamanova, Dessislava A.

Abstract

Topic modeling is a significant branch of natural language processing and machine learning focused on inferring the generative process of text. Traditionally, algorithms for estimating topic models have relied on Bayesian inference and Gibbs sampling. This paper proposes a novel “acceptable set” framework for formulating topic modeling problems inspired by ideas from discrete component analysis and data-driven robust optimization. Our approach not only simplifies the design and inference of topic models, but also allows for extensions and generalizations that are challenging to integrate into traditional approaches. Different restrictions (e.g., sparsity) and assumptions (e.g., alternative generative processes) can be easily incorporated into our formulations through additional or modified constraints. Our formulations also naturally control a widely used metric of solution quality, perplexity. We adapt state-of-the-art stochastic gradient methods to find good local optima for the optimization formulations. The algorithms are efficient, scaling to realistic problem sizes with runtimes comparable to existing methods. Through extensive computational experiments, we show that our methods have improved solution quality compared to baseline methods and reconstruct more reliably the underlying generative models. Our framework overcomes known vulnerabilities of traditional topic modeling algorithms: our methods are effective in low-data settings, register good out-of-sample performance, and perform well for a variety of initial assumptions on input parameter values.
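
For readers unfamiliar with the perplexity metric named in the abstract, the following is a minimal sketch of the standard textbook definition, not the paper's formulation. It assumes a fitted model is given as a document-topic matrix and a topic-word matrix; the names `perplexity`, `counts`, `theta`, and `beta` are hypothetical and chosen only for illustration. Lower values indicate that the model assigns higher probability to the observed words.

```python
import numpy as np


def perplexity(counts, theta, beta):
    """Perplexity of a topic model on a bag-of-words corpus.

    counts: (D, V) array of word counts per document (illustrative input)
    theta:  (D, K) document-topic proportions, rows sum to 1 (assumed)
    beta:   (K, V) topic-word probabilities, rows sum to 1 (assumed)
    """
    # p(w | d) = sum_k theta[d, k] * beta[k, w]
    word_probs = theta @ beta
    # Only observed words contribute to the log-likelihood.
    observed = counts > 0
    log_lik = np.sum(counts[observed] * np.log(word_probs[observed]))
    # Exponentiated negative average log-likelihood per token.
    return float(np.exp(-log_lik / counts.sum()))


# Toy corpus: 2 documents, 3 vocabulary words, 2 topics (made-up numbers).
counts = np.array([[2, 1, 0], [0, 1, 3]])
theta = np.array([[0.9, 0.1], [0.2, 0.8]])
beta = np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])
print(perplexity(counts, theta, beta))
```

A useful sanity check on this definition: a model that spreads probability uniformly over a vocabulary of V words has perplexity exactly V, so any informative model should score below that.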

Suggested Citation

  • Berk Wheelock, Lauren & Pachamanova, Dessislava A., 2022. "Acceptable set topic modeling," European Journal of Operational Research, Elsevier, vol. 299(2), pages 653-673.
  • Handle: RePEc:eee:ejores:v:299:y:2022:i:2:p:653-673
    DOI: 10.1016/j.ejor.2021.11.024

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221721009759
    Download Restriction: Full text for ScienceDirect subscribers only



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mengshi Lu & Zuo‐Jun Max Shen, 2021. "A Review of Robust Operations Management under Model Uncertainty," Production and Operations Management, Production and Operations Management Society, vol. 30(6), pages 1927-1943, June.
    2. Postek, K.S. & den Hertog, D. & Melenberg, B., 2015. "Computationally Tractable Counterparts of Distributionally Robust Constraints on Risk Measures (revision of CentER DP 2014-031)," Discussion Paper 2015-047, Tilburg University, Center for Economic Research.
    3. Andrew J. Keith & Darryl K. Ahner, 2021. "A survey of decision making and optimization under uncertainty," Annals of Operations Research, Springer, vol. 300(2), pages 319-353, May.
    4. Pavel Bazovkin & Karl Mosler, 2015. "A general solution for robust linear programs with distortion risk constraints," Annals of Operations Research, Springer, vol. 229(1), pages 103-120, June.
    5. Postek, K.S. & den Hertog, D. & Melenberg, B., 2015. "Computationally Tractable Counterparts of Distributionally Robust Constraints on Risk Measures (revision of CentER DP 2014-031)," Other publications TiSEM eeb9c898-6943-4199-b747-3, Tilburg University, School of Economics and Management.
    6. Qiu, Ruozhen & Sun, Minghe & Lim, Yun Fong, 2017. "Optimizing (s, S) policies for multi-period inventory models with demand distribution uncertainty: Robust dynamic programing approaches," European Journal of Operational Research, Elsevier, vol. 261(3), pages 880-892.
    7. Wolfram Wiesemann & Daniel Kuhn & Melvyn Sim, 2014. "Distributionally Robust Convex Optimization," Operations Research, INFORMS, vol. 62(6), pages 1358-1376, December.
    8. Claire Nicolas & Stéphane Tchung-Ming & Emmanuel Hache, 2016. "Energy transition in transportation under cost uncertainty, an assessment based on robust optimization," Working Papers hal-02475943, HAL.
    9. Gabrel, Virginie & Murat, Cécile & Thiele, Aurélie, 2014. "Recent advances in robust optimization: An overview," European Journal of Operational Research, Elsevier, vol. 235(3), pages 471-483.
    10. Erzurumlu, S. Sinan & Pachamanova, Dessislava, 2020. "Topic modeling and technology forecasting for assessing the commercial viability of healthcare innovations," Technological Forecasting and Social Change, Elsevier, vol. 156(C).
    11. Fernandes, Betina & Street, Alexandre & Valladão, Davi & Fernandes, Cristiano, 2016. "An adaptive robust portfolio optimization model with loss constraints based on data-driven polyhedral uncertainty sets," European Journal of Operational Research, Elsevier, vol. 255(3), pages 961-970.
    12. Chassein, André & Goerigk, Marc, 2018. "Compromise solutions for robust combinatorial optimization with variable-sized uncertainty," European Journal of Operational Research, Elsevier, vol. 269(2), pages 544-555.
    13. Mehdi Ansari & Juan S. Borrero & Leonardo Lozano, 2023. "Robust Minimum-Cost Flow Problems Under Multiple Ripple Effect Disruptions," INFORMS Journal on Computing, INFORMS, vol. 35(1), pages 83-103, January.
    14. Choi, Hyunhong & Woo, JongRoul, 2022. "Investigating emerging hydrogen technology topics and comparing national level technological focus: Patent analysis using a structural topic model," Applied Energy, Elsevier, vol. 313(C).
    15. Pengyu Qian & Zizhuo Wang & Zaiwen Wen, 2015. "A Composite Risk Measure Framework for Decision Making under Uncertainty," Papers 1501.01126, arXiv.org.
    16. Marla, Lavanya & Rikun, Alexander & Stauffer, Gautier & Pratsini, Eleni, 2020. "Robust modeling and planning: Insights from three industrial applications," Operations Research Perspectives, Elsevier, vol. 7(C).
    17. Sandra Cruz Caçador & Pedro Manuel Cortesão Godinho & Joana Maria Pina Cabral Matos Dias, 2022. "A minimax regret portfolio model based on the investor’s utility loss," Operational Research, Springer, vol. 22(1), pages 449-484, March.
    18. Ruiz, C. & Conejo, A.J., 2015. "Robust transmission expansion planning," European Journal of Operational Research, Elsevier, vol. 242(2), pages 390-401.
    19. Daphné Lorne & Stéphane Tchung-Ming, 2012. "The French biofuels mandates under cost uncertainty - an assesment based on robust optimization," Working Papers hal-03206367, HAL.
    20. Matthew Gentzkow & Bryan T. Kelly & Matt Taddy, 2017. "Text as Data," NBER Working Papers 23276, National Bureau of Economic Research, Inc.
