Printed from https://ideas.repec.org/r/cup/polals/v26y2018i02p168-189_00.html

Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It

Citations

Citations are extracted by the CitEc Project.

Cited by:

  1. Jason Anastasopoulos & George J. Borjas & Gavin G. Cook & Michael Lachanski, 2018. "Job Vacancies, the Beveridge Curve, and Supply Shocks: The Frequency and Content of Help-Wanted Ads in Pre- and Post-Mariel Miami," NBER Working Papers 24580, National Bureau of Economic Research, Inc.
  2. Mircea Popa, 2025. "Modelling policy action using natural language processing: evidence for a long-run increase in policy activism in the UK," Journal of Computational Social Science, Springer, vol. 8(2), pages 1-51, May.
  3. Michal Ovádek & Nicolas Lampach & Arthur Dyevre, 2020. "What’s the talk in Brussels? Leveraging daily news coverage to measure issue attention in the European Union," European Union Politics, vol. 21(2), pages 204-232, June.
  4. Javier De la Hoz-M & Mª José Fernández-Gómez & Susana Mendes, 2021. "LDAShiny: An R Package for Exploratory Review of Scientific Literature Based on a Bayesian Probabilistic Model and Machine Learning Tools," Mathematics, MDPI, vol. 9(14), pages 1-21, July.
  5. Yeomans, Michael, 2021. "A concrete example of construct construction in natural language," Organizational Behavior and Human Decision Processes, Elsevier, vol. 162(C), pages 81-94.
  6. Philine Widmer & Clémentine Abed Meraim & Sergio Galletta & Elliott Ash, 2022. "Media Slant is Contagious," Papers 2202.07269, arXiv.org, revised Feb 2025.
  7. Miklos Sebők & Zoltán Kacsuk & Ákos Máté, 2022. "The (real) need for a human touch: testing a human–machine hybrid topic classification workflow on a New York Times corpus," Quality & Quantity: International Journal of Methodology, Springer, vol. 56(5), pages 3621-3643, October.
  8. Justyna Klejdysz & Robin L. Lumsdaine, 2023. "Shifts in ECB Communication: A Textual Analysis of the Press Conference," International Journal of Central Banking, International Journal of Central Banking, vol. 19(2), pages 473-542, June.
  9. Anton Berg & Matti Nelimarkka, 2023. "Do you see what I see? Measuring the semantic differences in image‐recognition services' outputs," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 74(11), pages 1307-1324, November.
  10. Camilla Salvatore & Silvia Biffignandi & Annamaria Bianchi, 2024. "Augmenting business statistics information by combining traditional data with textual data: a composite indicator approach," METRON, Springer;Sapienza Università di Roma, vol. 82(1), pages 71-91, April.
  11. Erkan Gunes & Christoffer Koch Florczak, 2025. "Replacing or enhancing the human coder? Multiclass classification of policy documents with large language models," Journal of Computational Social Science, Springer, vol. 8(2), pages 1-20, May.
  12. Ai-Hsuan Chiang & Silvana Trimi & Tun-Chih Kou, 2024. "Critical Factors for Implementing Smart Manufacturing: A Supply Chain Perspective," Sustainability, MDPI, vol. 16(22), pages 1-21, November.
  13. Tyler Andrew Scott & Nicola Ulibarri & Omar Perez Figueroa, 2020. "NEPA and National Trends in Federal Infrastructure Siting in the United States," Review of Policy Research, Policy Studies Organization, vol. 37(5), pages 605-633, September.
  14. Latifi, Albina & Naboka-Krell, Viktoriia & Tillmann, Peter & Winker, Peter, 2024. "Fiscal policy in the Bundestag: Textual analysis and macroeconomic effects," European Economic Review, Elsevier, vol. 168(C).
  15. Tim Schatto-Eckrodt & Robin Janzik & Felix Reer & Svenja Boberg & Thorsten Quandt, 2020. "A Computational Approach to Analyzing the Twitter Debate on Gaming Disorder," Media and Communication, Cogitatio Press, vol. 8(3), pages 205-218.
  16. Bastiaan Bruinsma & Moa Johansson, 2024. "Finding the structure of parliamentary motions in the Swedish Riksdag 1971–2015," Quality & Quantity: International Journal of Methodology, Springer, vol. 58(4), pages 3275-3301, August.
  18. Seraphine F. Maerz & Carsten Q. Schneider, 2020. "Comparing public communication in democracies and autocracies: automated text analyses of speeches by heads of government," Quality & Quantity: International Journal of Methodology, Springer, vol. 54(2), pages 517-545, April.
  19. Mennig, Philipp, 2025. "Who cares about agriculture? Analyzing German parliamentary debates on agriculture and food with structural topic modeling," Food Policy, Elsevier, vol. 130(C).
  20. Jaeho Choi & Anoop Menon & Haris Tabakovic, 2021. "Using machine learning to revisit the diversification–performance relationship," Strategic Management Journal, Wiley Blackwell, vol. 42(9), pages 1632-1661, September.
  21. Purwoko Haryadi Santoso & Edi Istiyono & Haryanto & Wahyu Hidayatulloh, 2022. "Thematic Analysis of Indonesian Physics Education Research Literature Using Machine Learning," Data, MDPI, vol. 7(11), pages 1-41, October.
  22. Andres Algaba & David Ardia & Keven Bluteau & Samuel Borms & Kris Boudt, 2020. "Econometrics Meets Sentiment: An Overview Of Methodology And Applications," Journal of Economic Surveys, Wiley Blackwell, vol. 34(3), pages 512-547, July.
  23. Noailly, Joëlle & Nowzohour, Laura & van den Heuvel, Matthias & Pla, Ireneu, 2024. "Heard the news? Environmental policy and clean investments," Journal of Public Economics, Elsevier, vol. 238(C).
  24. Mohamed M. Mostafa, 2023. "A one-hundred-year structural topic modeling analysis of the knowledge structure of international management research," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(4), pages 3905-3935, August.
  26. Hung, Shih-Chang & Chang, Shu-Chen, 2023. "Framing the virus: The political, economic, biomedical and social understandings of the COVID-19 in Taiwan," Technological Forecasting and Social Change, Elsevier, vol. 188(C).
  27. Jaehwan LIM & Asei ITO & Hongyong ZHANG, 2023. "Policy Agenda and Trajectory of the Xi Jinping Administration: Textual Evidence from 2012 to 2022," Policy Discussion Papers 23008, Research Institute of Economy, Trade and Industry (RIETI).
  28. Silva, Lucas Emmanuel Nascimento & Gomes, Leonardo Augusto de Vasconcelos & Faria, Aline Mariane de & Borini, Felipe Mendes, 2024. "Innovation processes in ecosystem settings: An integrative framework and future directions," Technovation, Elsevier, vol. 132(C).
  29. Camilla Salvatore & Silvia Biffignandi & Annamaria Bianchi, 2022. "Corporate Social Responsibility Activities Through Twitter: From Topic Model Analysis to Indexes Measuring Communication Characteristics," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 164(3), pages 1217-1248, December.
  30. Lorien Stice-Lawrence, 2022. "Practical issues to consider when working with big data," Review of Accounting Studies, Springer, vol. 27(3), pages 1117-1124, September.
  31. Swarnalakshmi Umamaheswaran & Vandita Dar & Jagadish Thaker, 2022. "The Evolution of Climate Change Reporting in Business Media: Longitudinal Analysis of a Business Newspaper," Sustainability, MDPI, vol. 14(22), pages 1-21, November.
  32. Kim, Da Yeon & Kim, Sang Yong, 2022. "The impact of customer-generated evaluation information on sales in online platform-based markets," Journal of Retailing and Consumer Services, Elsevier, vol. 68(C).
  33. Paweł Matuszewski, 2023. "How to prepare data for the automatic classification of politically related beliefs expressed on Twitter? The consequences of researchers’ decisions on the number of coders, the algorithm learning pro," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(1), pages 301-321, February.
  34. Karell, Daniel & Freedman, Michael Raphael, 2019. "Rhetorics of Radicalism," SocArXiv yfzsh, Center for Open Science.
  35. Josephine Lukito & Jason Greenfield & Yunkang Yang & Ross Dahlke & Megan A. Brown & Rebecca Lewis & Bin Chen, 2024. "Audio-as-Data Tools: Replicating Computational Data Processing," Media and Communication, Cogitatio Press, vol. 12.
  36. Wang, Yan & Luo, Ting, 2023. "Politicizing for the idol: China’s idol fandom nationalism in pandemic," LSE Research Online Documents on Economics 117741, London School of Economics and Political Science, LSE Library.
  37. W. Benedikt Schmal, 2024. "Academic Knowledge: Does it Reflect the Combinatorial Growth of Technology?," Papers 2409.20282, arXiv.org.
  38. Iasmin Goes, 2023. "Examining the effect of IMF conditionality on natural resource policy," Economics and Politics, Wiley Blackwell, vol. 35(1), pages 227-285, March.
  39. Camilla Salvatore, 2023. "Inference with non-probability samples and survey data integration: a science mapping study," METRON, Springer;Sapienza Università di Roma, vol. 81(1), pages 83-107, April.
IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.