Exploring Academic Patent–Paper Pairs in Japan: Benchmarking Existing Detection Models

Authors

  • Van-Thien Nguyen
  • Rene Carraz

Abstract

This study expands on the patent-paper pair (PPP) detection model developed by Nguyen and Carraz (2025, Scientometrics) by systematically comparing it with two prominent large-scale approaches: Marx and Scharfmann (2024) and Wang et al. (2025). Although these models all aim to identify instances where the same research result is disclosed through both a patent and a scientific paper, they differ substantially in scope, design, and methodological assumptions. The Nguyen and Carraz model is designed for the Japanese academic context and integrates inventor–author matching, citation overlap, and semantic and lexical similarity within a supervised learning framework. In contrast, Marx and Scharfmann rely on detecting long identical word sequences (“self-plagiarism”) via a random forest classifier, and Wang et al. implement an inventor-centric clustering method with logistic regression applied to title and abstract similarity. We directly compare the Nguyen and Carraz dataset with those of Marx and Scharfmann and Wang et al., focusing on PPPs involving Japanese academic assignees. Despite the shared national context, there is minimal overlap: only 168 PPPs overlap with the Marx and Scharfmann model and 425 overlap with the Wang et al. model. When evaluated on a shared validation set, the Nguyen and Carraz model outperforms both alternatives in the Japanese academic context, especially with logistic regression features. Feature extensions such as self-plagiarism and geographic distance offer only modest improvements under non-linear models. These findings highlight the importance of designing context-specific models and exercising caution when applying global PPP datasets to localized settings.
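Two of the ideas in the abstract can be illustrated with a short sketch: the "long identical word sequence" signal attributed to Marx and Scharfmann, and the dataset overlap counts. The Python below is only a minimal illustration under stated assumptions, not the implementation of any of the cited models. The function names, the whitespace tokenization, and the representation of a PPP as a `(patent_id, paper_id)` tuple are all hypothetical choices made for this example; in the published approach the shared-sequence signal feeds a random forest classifier rather than being used on its own.

```python
def longest_shared_word_run(text_a: str, text_b: str) -> int:
    """Length, in words, of the longest contiguous word sequence found in both texts.

    Assumption: lowercase whitespace tokenization; real pipelines would
    normalize punctuation and probably work on full texts.
    """
    a = text_a.lower().split()
    b = text_b.lower().split()
    best = 0
    # prev[j] = length of the shared run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def pair_overlap(pairs_a, pairs_b):
    """Patent-paper pairs present in both datasets.

    Assumption: each pair is a hashable (patent_id, paper_id) tuple with
    identifiers already harmonized across the two datasets.
    """
    return set(pairs_a) & set(pairs_b)


# Hypothetical texts and identifiers, for illustration only.
patent_text = ("a method for producing graphene sheets by chemical "
               "vapor deposition on copper foil")
paper_text = ("we report producing graphene sheets by chemical "
              "vapor deposition at low temperature")
shared_run = longest_shared_word_run(patent_text, paper_text)

dataset_a = [("JP-0001", "W100"), ("JP-0002", "W200")]
dataset_b = [("JP-0002", "W200"), ("JP-0003", "W300")]
common_pairs = pair_overlap(dataset_a, dataset_b)
```

An overlap count like those reported in the abstract would correspond to `len(common_pairs)`, provided both datasets use the same patent and paper identifiers.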

Suggested Citation

  • Van-Thien Nguyen & Rene Carraz, 2025. "Exploring Academic Patent–Paper Pairs in Japan: Benchmarking Existing Detection Models," Working Papers of BETA 2025-27, Bureau d'Economie Théorique et Appliquée, UDS, Strasbourg.
  • Handle: RePEc:ulp:sbbeta:2025-27

    Download full text from publisher

    File URL: http://beta.u-strasbg.fr/WP/2025/2025-27.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Lissoni, Francesco & Montobbio, Fabio & Zirulia, Lorenzo, 2013. "Inventorship and authorship as attribution rights: An enquiry into the economics of scientific credit," Journal of Economic Behavior & Organization, Elsevier, vol. 95(C), pages 49-69.
    2. Van Thien Nguyen & Rene Carraz, 2025. "Exploring academic patent-paper pairs: a new methodology for analyzing Japan’s research landscape," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(3), pages 1329-1356, March.
    3. Yuhang Wang & Lei Pei & Jianjun Sun & Lele Kang, 2025. "Trace on both sides: a two-step text mining method to identify academic inventors’ patent–paper pairs," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(2), pages 833-860, February.
    4. Matt Marx & Aaron Fuegi, 2020. "Reliance on science: Worldwide front‐page patent citations to scientific articles," Strategic Management Journal, Wiley Blackwell, vol. 41(9), pages 1572-1594, September.
    5. Murray, Fiona & Stern, Scott, 2007. "Do formal intellectual property rights hinder the free flow of scientific knowledge?: An empirical test of the anti-commons hypothesis," Journal of Economic Behavior & Organization, Elsevier, vol. 63(4), pages 648-687, August.
    6. Fiona E. Murray & Scott Stern, 2007. "Do Formal Intellectual Property Rights Hinder the Free Flow of Scientific Knowledge?: An Empirical Test of the Anti-Commons Hypothesis," NBER Chapters, in: Academic Science and Entrepreneurship: Dual Engines of Growth, National Bureau of Economic Research, Inc.
    7. Matt Marx & Aaron Fuegi, 2022. "Reliance on science by inventors: Hybrid extraction of in‐text patent‐to‐article citations," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 31(2), pages 369-392, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yuhang Wang & Lei Pei & Jianjun Sun & Lele Kang, 2025. "Trace on both sides: a two-step text mining method to identify academic inventors’ patent–paper pairs," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(2), pages 833-860, February.
    2. Van Thien Nguyen & Rene Carraz, 2025. "Exploring academic patent-paper pairs: a new methodology for analyzing Japan’s research landscape," Scientometrics, Springer;Akadémiai Kiadó, vol. 130(3), pages 1329-1356, March.
    3. Choi, Jin-Uk & Lee, Chang-Yang, 2022. "The differential effects of basic research on firm R&D productivity: The conditioning role of technological diversification," Technovation, Elsevier, vol. 118(C).
    4. Stéphane Maraut & Catalina Martínez, 2014. "Identifying author–inventors from Spain: methods and a first insight into results," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(1), pages 445-476, October.
    5. Seokbeom Kwon & Kazuyuki Motohashi & Kenta Ikeuchi, 2022. "Chasing two hares at once? Effect of joint institutional change for promoting commercial use of university knowledge and scientific research," The Journal of Technology Transfer, Springer, vol. 47(4), pages 1242-1272, August.
    6. Elodie Carpentier & Alexander Cuntz & Alessio Muscarnera & Julio Raffo, 2025. "Digital Access to Knowledge and Women in Science," WIPO Economic Research Working Papers 88, World Intellectual Property Organization - Economics and Statistics Division.
    7. Feldman, Maryann & Kenney, Martin & Lissoni, Francesco, 2015. "The New Data Frontier," Research Policy, Elsevier, vol. 44(9), pages 1629-1632.
    8. Anckaert, Paul-Emmanuel, 2025. "When the drugs (don’t) work: The role of science in product commercialization," Research Policy, Elsevier, vol. 54(5).
    9. Hottenrott, Hanna & Lawson, Cornelia, 2017. "Fishing for complementarities: Research grants and research productivity," International Journal of Industrial Organization, Elsevier, vol. 51(C), pages 1-38.
    10. Olof Ejermo & John Källström, 2016. "What is the causal effect of R&D on patenting activity in a “professor’s privilege” country? Evidence from Sweden," Small Business Economics, Springer, vol. 47(3), pages 677-694, October.
    11. Wang, Qinyu Ryan & Zheng, Yanfeng, 2023. "Patent regime and the geography of cumulative innovation," Research Policy, Elsevier, vol. 52(7).
    12. Hans K. Hvide & Benjamin F. Jones, 2018. "University Innovation and the Professor's Privilege," American Economic Review, American Economic Association, vol. 108(7), pages 1860-1898, July.
    13. WIPO, 2011. "World Intellectual Property Report 2011 - The Changing Face of Innovation," WIPO Economics & Statistics Series, World Intellectual Property Organization - Economics and Statistics Division, number 2011:944, January.
    14. Lin, Jenny X. & Lincoln, William F., 2017. "Pirate's treasure," Journal of International Economics, Elsevier, vol. 109(C), pages 235-245.
    15. Michael Noel & Mark Schankerman, 2013. "Strategic Patenting and Software Innovation," Journal of Industrial Economics, Wiley Blackwell, vol. 61(3), pages 481-520, September.
    16. Abramo, Giovanni & D'Angelo, Ciriaco Andrea & Di Costa, Flavia, 2021. "The scholarly impact of private sector research: A multivariate analysis," Journal of Informetrics, Elsevier, vol. 15(3).
    17. Mark J. McCabe & Christopher M. Snyder, 2015. "Does Online Availability Increase Citations? Theory and Evidence from a Panel of Economics and Business Journals," The Review of Economics and Statistics, MIT Press, vol. 97(1), pages 144-165, March.
    18. Giovanni Abramo & Ciriaco Andrea D'Angelo & Flavia Di Costa, 2020. "The relative impact of private research on scientific advancement," Papers 2012.04908, arXiv.org.
    19. Laura Magazzini & Fabio Pammolli & Massimo Riccaboni & Maria Alessandra Rossi, 2009. "Patent disclosure and R&D competition in pharmaceuticals," Economics of Innovation and New Technology, Taylor & Francis Journals, vol. 18(5), pages 467-486.
    20. Heidi L. Williams, 2016. "Intellectual Property Rights and Innovation: Evidence from Health Care Markets," Innovation Policy and the Economy, University of Chicago Press, vol. 16(1), pages 53-87.
