
Using supervised machine learning for large‐scale classification in management research: The case for identifying artificial intelligence patents

Author

Listed:
  • Milan Miric
  • Nan Jia
  • Kenneth G. Huang

Abstract

Research Summary: Researchers increasingly use unstructured text data to construct quantitative variables for analysis. This goal has traditionally been achieved using keyword‐based approaches, which require researchers to specify a dictionary of keywords mapped to the theoretical concepts of interest. However, recent machine learning (ML) tools for text classification and natural language processing can be used to construct quantitative variables and to classify unstructured text documents. In this paper, we demonstrate how to employ ML tools for this purpose and discuss one application for identifying artificial intelligence (AI) technologies in patents. We compare and contrast various ML methods with the keyword‐based approach, demonstrating the advantages of the ML approach. We also leverage the classification outcomes generated by ML models to demonstrate general patterns of AI technological innovation development.

Managerial Summary: Text‐based documents offer a wealth of information for researchers and business analysts. However, researchers often need to find a way to classify these documents to use in subsequent research projects. In this paper, we demonstrate how supervised ML methods can be used to automate the process of classifying textual documents into pre‐defined categories or groups. We provide an overview of when such techniques may be used in comparison to other methods, and the considerations and tradeoffs associated with each method. We apply these methods to identify AI‐based technologies from all patents in the United States, based on patent abstract text. This allows us to show interesting patterns of AI innovation development in the United States. We also provide the code and data used in this paper for future research.

Suggested Citation

  • Milan Miric & Nan Jia & Kenneth G. Huang, 2023. "Using supervised machine learning for large‐scale classification in management research: The case for identifying artificial intelligence patents," Strategic Management Journal, Wiley Blackwell, vol. 44(2), pages 491-519, February.
  • Handle: RePEc:bla:stratm:v:44:y:2023:i:2:p:491-519
    DOI: 10.1002/smj.3441

    Download full text from publisher

    File URL: https://doi.org/10.1002/smj.3441
    Download Restriction: no


    References listed on IDEAS

    1. Daron Acemoglu & Pascual Restrepo, 2019. "Automation and New Tasks: How Technology Displaces and Reinstates Labor," Journal of Economic Perspectives, American Economic Association, vol. 33(2), pages 3-30, Spring.
    2. Bronwyn H. Hall & Adam Jaffe & Manuel Trajtenberg, 2005. "Market Value and Patent Citations," RAND Journal of Economics, The RAND Corporation, vol. 36(1), pages 16-38, Spring.
    3. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    4. Gambardella, Alfonso & Giuri, Paola & Luzzi, Alessandra, 2007. "The market for patents in Europe," Research Policy, Elsevier, vol. 36(8), pages 1163-1183, October.
    5. Sarah Kaplan & Keyvan Vakili, 2015. "The double-edged sword of recombination in breakthrough innovation," Strategic Management Journal, Wiley Blackwell, vol. 36(10), pages 1435-1457, October.
    6. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2014. "Inference on Treatment Effects after Selection among High-Dimensional Controls," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 81(2), pages 608-650.
    7. Jorge Guzman & Scott Stern, 2020. "The State of American Entrepreneurship: New Estimates of the Quantity and Quality of Entrepreneurship for 32 US States, 1988–2014," American Economic Journal: Economic Policy, American Economic Association, vol. 12(4), pages 212-243, November.
    8. Hausman, Jerry & Hall, Bronwyn H & Griliches, Zvi, 1984. "Econometric Models for Count Data with an Application to the Patents-R&D Relationship," Econometrica, Econometric Society, vol. 52(4), pages 909-938, July.
    9. King, Gary & Zeng, Langche, 2001. "Logistic Regression in Rare Events Data," Political Analysis, Cambridge University Press, vol. 9(2), pages 137-163, January.
    10. Daniel P. Gross, 2020. "Creativity Under Fire: The Effects of Competition on Creative Production," The Review of Economics and Statistics, MIT Press, vol. 102(3), pages 583-599, July.
    11. Yash Raj Shrestha & Vivianna Fang He & Phanish Puranam & Georg von Krogh, 2021. "Algorithm Supported Induction for Building Theory: How Can We Use Prediction Models to Theorize?," Organization Science, INFORMS, vol. 32(3), pages 856-880, May.
    12. Joshua S. Gans & David H. Hsu & Scott Stern, 2008. "The Impact of Uncertain Intellectual Property Rights on the Market for Ideas: Evidence from Patent Grant Delays," Management Science, INFORMS, vol. 54(5), pages 982-997, May.
    13. Mengke Qiao & Ke-Wei Huang, 2021. "Correcting Misclassification Bias in Regression Models with Variables Generated via Data Mining," Information Systems Research, INFORMS, vol. 32(2), pages 462-480, June.
    14. Jason M. Rathje & Riitta Katila, 2021. "Enabling Technologies and the Role of Private Firms: A Machine Learning Matching Analysis," Strategy Science, INFORMS, vol. 6(1), pages 5-21, March.
    15. Miric, Milan & Boudreau, Kevin J. & Jeppesen, Lars Bo, 2019. "Protecting their digital assets: The use of formal & informal appropriability strategies by App developers," Research Policy, Elsevier, vol. 48(8), pages 1-1.
    16. Milan Miric & Margherita Pagani & Omar El Sawy, 2021. "When and Who Do Platform Companies Acquire?," Post-Print hal-03188254, HAL.
    17. Jeffrey L. Furman & Florenta Teodoridis, 2020. "Automation, Research Technology, and Researchers’ Trajectories: Evidence from Computer Science and Electrical Engineering," Organization Science, INFORMS, vol. 31(2), pages 330-354, March.
    18. Prithwiraj Choudhury & Evan Starr & Rajshree Agarwal, 2020. "Machine learning and human capital complementarities: Experimental evidence on bias mitigation," Strategic Management Journal, Wiley Blackwell, vol. 41(8), pages 1381-1411, August.
    19. Ron Tidhar & Kathleen M. Eisenhardt, 2020. "Get rich or die trying… finding revenue model fit using machine learning and multiple cases," Strategic Management Journal, Wiley Blackwell, vol. 41(7), pages 1245-1273, July.
    20. Huang, Kenneth G. & Murray, Fiona E., 2010. "Entrepreneurial experiments in science policy: Analyzing the Human Genome Project," Research Policy, Elsevier, vol. 39(5), pages 567-582, June.
    21. Prithwiraj Choudhury & Ryan T. Allen & Michael G. Endres, 2021. "Machine learning for pattern discovery in management research," Strategic Management Journal, Wiley Blackwell, vol. 42(1), pages 30-57, January.
    22. Edward Felten & Manav Raj & Robert Seamans, 2021. "Occupational, industry, and geographic exposure to artificial intelligence: A novel dataset and its potential uses," Strategic Management Journal, Wiley Blackwell, vol. 42(12), pages 2195-2217, December.
    23. Prithwiraj Choudhury & Do Yoon Kim, 2019. "The ethnic migrant inventor effect: Codification and recombination of knowledge across borders," Strategic Management Journal, Wiley Blackwell, vol. 40(2), pages 203-229, February.
    24. Kenneth G. Huang & Fiona E. Murray, 2010. "Entrepreneurial Experiments in Science Policy: Analyzing the Human Genome Project," DRUID Working Papers 10-22, DRUID, Copenhagen Business School, Department of Industrial Economics and Strategy/Aalborg University, Department of Business Studies.
    25. Prithwiraj Choudhury & Dan Wang & Natalie A. Carlson & Tarun Khanna, 2019. "Machine learning approaches to facial and text analysis: Discovering CEO oral communication styles," Strategic Management Journal, Wiley Blackwell, vol. 40(11), pages 1705-1732, November.
    26. Theresa S. Cho & Donald C. Hambrick, 2006. "Attention as the Mediator Between Top Management Team Characteristics and Strategic Change: The Case of Airline Deregulation," Organization Science, INFORMS, vol. 17(4), pages 453-469, August.
    27. Vivianna Fang He & Phanish Puranam & Yash Raj Shrestha & Georg von Krogh, 2020. "Resolving governance disputes in communities: A study of software license decisions," Strategic Management Journal, Wiley Blackwell, vol. 41(10), pages 1837-1868, October.
    28. Lee Fleming & Olav Sorenson, 2004. "Science as a map in technological search," Strategic Management Journal, Wiley Blackwell, vol. 25(8‐9), pages 909-928, August.
    29. Mochen Yang & Gediminas Adomavicius & Gordon Burtch & Yuqing Ren, 2018. "Mind the Gap: Accounting for Measurement Error and Misclassification in Variables Generated via Data Mining," Information Systems Research, INFORMS, vol. 29(1), pages 4-24, March.
    30. Iain M. Cockburn & Rebecca Henderson & Scott Stern, 2018. "The Impact of Artificial Intelligence on Innovation," NBER Working Papers 24449, National Bureau of Economic Research, Inc.
    31. Siliang Tong & Nan Jia & Xueming Luo & Zheng Fang, 2021. "The Janus face of artificial intelligence feedback: Deployment versus disclosure effects on employee performance," Strategic Management Journal, Wiley Blackwell, vol. 42(9), pages 1600-1631, September.
    32. Jaeho Choi & Anoop Menon & Haris Tabakovic, 2021. "Using machine learning to revisit the diversification–performance relationship," Strategic Management Journal, Wiley Blackwell, vol. 42(9), pages 1632-1661, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Constance E. Helfat & Aseem Kaul & David J. Ketchen & Jay B. Barney & Olivier Chatain & Harbir Singh, 2023. "Renewing the resource‐based view: New contexts, new concepts, and new methods," Strategic Management Journal, Wiley Blackwell, vol. 44(6), pages 1357-1390, June.
    2. Prithwiraj Choudhury & Ryan T. Allen & Michael G. Endres, 2021. "Machine learning for pattern discovery in management research," Strategic Management Journal, Wiley Blackwell, vol. 42(1), pages 30-57, January.
    3. Prothit Sen & Phanish Puranam, 2022. "Do Alliance portfolios encourage or impede new business practice adoption? Theory and evidence from the private equity industry," Strategic Management Journal, Wiley Blackwell, vol. 43(11), pages 2279-2312, November.
    4. Majid Majzoubi & Eric Yanfei Zhao, 2023. "Going beyond optimal distinctiveness: Strategic positioning for gaining an audience composition premium," Strategic Management Journal, Wiley Blackwell, vol. 44(3), pages 737-777, March.
    5. Novelli, Elena, 2015. "An examination of the antecedents and implications of patent scope," Research Policy, Elsevier, vol. 44(2), pages 493-507.
    6. Lu, Qinli & Chesbrough, Henry, 2022. "Measuring open innovation practices through topic modelling: Revisiting their impact on firm financial performance," Technovation, Elsevier, vol. 114(C).
    7. Kenneth G Huang & Jiatao Li, 2019. "Adopting knowledge from reverse innovations? Transnational patents and signaling from an emerging economy," Journal of International Business Studies, Palgrave Macmillan;Academy of International Business, vol. 50(7), pages 1078-1102, September.
    8. Yi Yang & Kunpeng Zhang & Yangyang Fan, 2023. "sDTM: A Supervised Bayesian Deep Topic Model for Text Analytics," Information Systems Research, INFORMS, vol. 34(1), pages 137-156, March.
    9. Yuchen Zhang & Wei Yang, 2022. "Breakthrough invention and problem complexity: Evidence from a quasi‐experiment," Strategic Management Journal, Wiley Blackwell, vol. 43(12), pages 2510-2544, December.
    10. Montobbio, Fabio & Staccioli, Jacopo & Virgillito, Maria Enrica & Vivarelli, Marco, 2022. "Robots and the origin of their labour-saving impact," Technological Forecasting and Social Change, Elsevier, vol. 174(C).
    11. Fischer, Timo & Henkel, Joachim, 2012. "Patent trolls on markets for technology – An empirical analysis of NPEs’ patent acquisitions," Research Policy, Elsevier, vol. 41(9), pages 1519-1533.
    12. Tubiana, Matteo & Miguelez, Ernest & Moreno, Rosina, 2022. "In knowledge we trust: Learning-by-interacting and the productivity of inventors," Research Policy, Elsevier, vol. 51(1).
    13. Lee, Jangwook & Chung, Jiyoon, 2022. "Women in top management teams and their impact on innovation," Technological Forecasting and Social Change, Elsevier, vol. 183(C).
    14. Grimpe, Christoph & Sofka, Wolfgang, 2016. "Complementarities in the search for innovation—Managing markets and relationships," Research Policy, Elsevier, vol. 45(10), pages 2036-2053.
    15. Choi, Jin-Uk & Lee, Chang-Yang, 2022. "The differential effects of basic research on firm R&D productivity: The conditioning role of technological diversification," Technovation, Elsevier, vol. 118(C).
    16. David H. Hsu & Kwanghui Lim, 2014. "Knowledge Brokering and Organizational Innovation: Founder Imprinting Effects," Organization Science, INFORMS, vol. 25(4), pages 1134-1153, August.
    17. Apa, Roberta & De Noni, Ivan & Orsi, Luigi & Sedita, Silvia Rita, 2018. "Knowledge space oddity: How to increase the intensity and relevance of the technological progress of European regions," Research Policy, Elsevier, vol. 47(9), pages 1700-1712.
    18. Ugo Rizzo & Nicolò Barbieri & Laura Ramaciotti & Demian Iannantuono, 2020. "The division of labour between academia and industry for the generation of radical inventions," The Journal of Technology Transfer, Springer, vol. 45(2), pages 393-413, April.
    19. Antonio Messeni Petruzzelli & Daniele Rotolo & Vito Albino, 2014. "Determinants of Patent Citations in Biotechnology: An Analysis of Patent Influence Across the Industrial and Organizational Boundaries," SPRU Working Paper Series 2014-05, SPRU - Science Policy Research Unit, University of Sussex Business School.
    20. Laurie Ciaramella & Catalina Martínez & Yann Ménière, 2017. "Tracking patent transfers in different European countries: methods and a first application to medical technologies," Scientometrics, Springer;Akadémiai Kiadó, vol. 112(2), pages 817-850, August.

