A dataset of scientific citations in U.S. patent Office Actions

Authors
  • Kyle Higham

    (Motu Economic and Public Policy Research)

  • Hannah Kotula

    (Motu Economic and Public Policy Research)

  • Emma Scharfmann

    (University of California, Berkeley)

  • Steve Gong

    (Google)

  • Gaétan de Rassenfosse

    (Ecole polytechnique fédérale de Lausanne)

Abstract

We present a curated dataset of about 850,000 citations extracted from Office Actions issued by examiners at the United States Patent and Trademark Office. These references, historically underused due to accessibility challenges, provide a granular view into the patent examination process and complement traditional front-page citation data. We classify each citation into one of 14 categories and focus on the 265,000 references to scientific literature, which we parse, clean, and disambiguate using machine learning and external bibliographic services. To enhance reusability, disambiguated records are linked to OpenAlex, a comprehensive research metadata platform. The dataset enables new research on examiner behavior, science–technology linkages, and the construction of citation-based metrics. All data and code are openly available to facilitate reuse across disciplines.

Suggested Citation

  • Kyle Higham & Hannah Kotula & Emma Scharfmann & Steve Gong & Gaétan de Rassenfosse, 2026. "A dataset of scientific citations in U.S. patent Office Actions," Working Papers 31, Chair of Science, Technology, and Innovation Policy.
  • Handle: RePEc:iip:wpaper:31

    Download full text from publisher

    File URL: https://cdm-repec.epfl.ch/iip-wpaper/WP31.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Matt Marx & Aaron Fuegi, 2022. "Reliance on science by inventors: Hybrid extraction of in‐text patent‐to‐article citations," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 31(2), pages 369-392, April.
    2. Monica Berger, 2021. "Bibliodiversity at the Centre: Decolonizing Open Access," Development and Change, International Institute of Social Studies, vol. 52(2), pages 383-404, March.
    3. Cristian Candia & C. Jara-Figueroa & Carlos Rodriguez-Sickert & Albert-László Barabási & César A. Hidalgo, 2019. "The universal decay of collective memory and attention," Nature Human Behaviour, Nature, vol. 3(1), pages 82-91, January.
    4. Manuel Trajtenberg, 1990. "A Penny for Your Quotes: Patent Citations and the Value of Innovations," RAND Journal of Economics, The RAND Corporation, vol. 21(1), pages 172-187, Spring.
    5. Mengxue Zheng & Lili Miao & Yi Bu & Vincent Larivière, 2025. "Understanding discrepancies in the coverage of OpenAlex: The case of China," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 76(11), pages 1591-1601, November.
    6. Adam B. Jaffe & Manuel Trajtenberg & Rebecca Henderson, 1993. "Geographic Localization of Knowledge Spillovers as Evidenced by Patent Citations," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 108(3), pages 577-598.
    7. Florian Seliger & Gaétan de Rassenfosse & Jan Kozak, 2019. "Geocoding of worldwide patent data," KOF Working papers 19-458, KOF Swiss Economic Institute, ETH Zurich.
    8. Martin Meyer, 2000. "What is Special about Patent Citations? Differences between Scientific and Patent Citations," Scientometrics, Springer;Akadémiai Kiadó, vol. 49(1), pages 93-123, August.
    9. Shiu-Wan Hung & An-Pang Wang, 2010. "Examining the small world phenomenon in the patent citation network: a case study of the radio frequency identification (RFID) network," Scientometrics, Springer;Akadémiai Kiadó, vol. 82(1), pages 121-134, January.
    10. Lucía Céspedes & Diego Kozlowski & Carolina Pradier & Maxime Holmberg Sainte‐Marie & Natsumi Solange Shokida & Pierre Benz & Constance Poitras & Anton Boudreau Ninkov & Saeideh Ebrahimy & Philips Ayen, 2025. "Evaluating the linguistic coverage of OpenAlex: An assessment of metadata accuracy and completeness," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 76(6), pages 884-895, June.
    11. Michael Gusenbauer, 2022. "Search where you will find most: Comparing the disciplinary coverage of 56 bibliographic databases," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(5), pages 2683-2745, May.
    12. Kevin A. Bryan & Yasin Ozcan, 2021. "The Impact of Open Access Mandates on Invention," The Review of Economics and Statistics, MIT Press, vol. 103(5), pages 954-967, December.
    13. Adam B. Jaffe & Gaétan de Rassenfosse, 2017. "Patent citation data in social science research: Overview and best practices," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 68(6), pages 1360-1374, June.
    14. Matt Marx & Aaron Fuegi, 2020. "Reliance on science: Worldwide front‐page patent citations to scientific articles," Strategic Management Journal, Wiley Blackwell, vol. 41(9), pages 1572-1594, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Verluise, Cyril & Cristelli, Gabriele & Higham, Kyle & de Rassenfosse, Gaetan, 2020. "The Missing 15 Percent of Patent Citations," SocArXiv x78ys, Center for Open Science.
    2. Denter, Nils M. & Waterstraat, Joe & Moehrle, Martin G., 2025. "Avoiding the pitfalls of direct linkage: A novelty-driven approach to measuring scientific impact on patents," Journal of Informetrics, Elsevier, vol. 19(2).
    3. Bryan, Kevin A. & Ozcan, Yasin & Sampat, Bhaven, 2020. "In-text patent citations: A user's guide," Research Policy, Elsevier, vol. 49(4).
    4. Yuandi Wang & Xiongfeng Pan & Yantai Chen & Xin Gu, 2013. "Do references in transferred patent documents signal learning opportunities for the receiving firms?," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(2), pages 731-752, May.
    5. Adam B. Jaffe & Gaétan de Rassenfosse, 2017. "Patent citation data in social science research: Overview and best practices," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 68(6), pages 1360-1374, June.
    6. Wang, Fang, 2024. "Does the recombination of distant scientific knowledge generate valuable inventions? An analysis of pharmaceutical patents," Technovation, Elsevier, vol. 130(C).
    7. Fernández, Ana María & Ferrándiz, Esther & Medina, Jennifer, 2022. "The diffusion of energy technologies. Evidence from renewable, fossil, and nuclear energy patents," MPRA Paper 123361, University Library of Munich, Germany.
    8. Fernández, Ana María & Ferrándiz, Esther & Medina, Jennifer, 2022. "The diffusion of energy technologies. Evidence from renewable, fossil, and nuclear energy patents," Technological Forecasting and Social Change, Elsevier, vol. 178(C).
    9. Manuel Acosta & Daniel Coronado & Esther Ferrándiz & Manuel Jiménez, 2022. "Effects of knowledge spillovers between competitors on patent quality: what patent citations reveal about a global duopoly," The Journal of Technology Transfer, Springer, vol. 47(5), pages 1451-1487, October.
    10. Inchae Park & Yujin Jeong & Byungun Yoon, 2017. "Analyzing the value of technology based on the differences of patent citations between applicants and examiners," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 665-691, May.
    11. Bahar, Dany & Choudhury, Prithwiraj & Miguelez, Ernest & Signorelli, Sara, 2024. "Global Mobile Inventors," Journal of Development Economics, Elsevier, vol. 171(C).
    12. Balsmeier, B. & Lück, S. & Fleming, L., 2025. "Science knowledge localizes," Research Policy, Elsevier, vol. 54(10).
    13. Ahmad Barirani & Bruno Agard & Catherine Beaudry, 2013. "Discovering and assessing fields of expertise in nanomedicine: a patent co-citation network perspective," Scientometrics, Springer;Akadémiai Kiadó, vol. 94(3), pages 1111-1136, March.
    14. Ron Boschma & Ernest Miguelez & Rosina Moreno & Diego B. Ocampo-Corrales, 2021. "Technological breakthroughs in European regions: the role of related and unrelated combinations," Papers in Evolutionary Economic Geography (PEEG) 2118, Utrecht University, Department of Human Geography and Spatial Planning, Group Economic Geography, revised Jun 2021.
    15. Christian Rutzer & Dragan Filimonovic & Jeffrey T. Macher & Rolf Weder, 2026. "Towards Measuring Disruptive Innovation Across Countries," Papers 2603.17881, arXiv.org.
    16. Francesco Lissoni & Ernest Miguelez, 2024. "Migration and Innovation: Learning from Patent and Inventor Data," Journal of Economic Perspectives, American Economic Association, vol. 38(1), pages 27-54, Winter.
    17. Sandro Montresor & Gianluca Orsatti & Francesco Quatraro, 2023. "Technological novelty and key enabling technologies: evidence from European regions," Economics of Innovation and New Technology, Taylor & Francis Journals, vol. 32(6), pages 851-872, August.
    18. Gianluca Orsatti, 2019. "Public R&D and green knowledge diffusion: Evidence from patent citation data," Cahiers du GREThA (2007-2019) 2019-17, Groupe de Recherche en Economie Théorique et Appliquée (GREThA).
    19. Sandro Mendonca & Hugo Confraria & Manuel Mira Godinho, 2021. "Appropriating the returns of patent statistics: Take-up and development in the wake of Zvi Griliches," SPRU Working Paper Series 2021-07, SPRU - Science Policy Research Unit, University of Sussex Business School.
    20. Satoshi Yasukawa & Shingo Kano, 2014. "Validating the usefulness of examiners’ forward citations from the viewpoint of applicants’ self-selection during the patent application procedure," Scientometrics, Springer;Akadémiai Kiadó, vol. 99(3), pages 895-909, June.

    More about this item

    JEL classification:

    • O34 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - Intellectual Property and Intellectual Capital
    • K29 - Law and Economics - - Regulation and Business Law - - - Other
    • D83 - Microeconomics - - Information, Knowledge, and Uncertainty - - - Search; Learning; Information and Knowledge; Communication; Belief; Unawareness
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:iip:wpaper:31. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Gaétan de Rassenfosse. General contact details of provider: https://stip.epfl.ch/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.