
Exploring Academic Patent–Paper Pairs in Japan: Benchmarking Existing Detection Models

Authors
  • Van-Thien Nguyen
  • Rene Carraz

Abstract

This study expands on the patent-paper pair (PPP) detection model developed by Nguyen and Carraz (2025, Scientometrics) by systematically comparing it with two prominent large-scale approaches: Marx and Scharfmann (2024) and Wang et al. (2025). Although these models all aim to identify instances where the same research result is disclosed through both a patent and a scientific paper, they differ substantially in scope, design, and methodological assumptions. The Nguyen and Carraz model is designed for the Japanese academic context and integrates inventor–author matching, citation overlap, and semantic and lexical similarity within a supervised learning framework. In contrast, Marx and Scharfmann rely on detecting long identical word sequences (“self-plagiarism”) via a random forest classifier, and Wang et al. implement an inventor-centric clustering method with logistic regression applied to title and abstract similarity. We directly compare the Nguyen and Carraz dataset with those of Marx and Scharfmann and Wang et al., focusing on PPPs involving Japanese academic assignees. Despite the shared national context, there is minimal overlap: only 168 PPPs overlap with the Marx and Scharfmann model and 425 overlap with the Wang et al. model. When evaluated on a shared validation set, the Nguyen and Carraz model outperforms both alternatives in the Japanese academic context, especially with logistic regression features. Feature extensions such as self-plagiarism and geographic distance offer only modest improvements under non-linear models. These findings highlight the importance of designing context-specific models and exercising caution when applying global PPP datasets to localized settings.
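As a rough illustration of two ideas mentioned in the abstract, the sketch below computes (a) the length of the longest identical word sequence shared by a patent text and a paper text, which is the kind of signal the "self-plagiarism" approach relies on, and (b) the overlap between two sets of detected pairs. It is not the implementation used in any of the cited models. The function names, the whitespace tokenization, and the (patent_id, paper_id) identifier format are assumptions made here for illustration only.

```python
from typing import Iterable, Set, Tuple

# Hypothetical pair representation: (patent_id, paper_id).
Pair = Tuple[str, str]


def longest_shared_word_run(text_a: str, text_b: str) -> int:
    """Length, in words, of the longest identical contiguous word sequence.

    Assumption: lowercase + whitespace tokenization, so punctuation stays
    attached to words. A real pipeline would likely normalize more carefully.
    """
    a = text_a.lower().split()
    b = text_b.lower().split()
    best = 0
    # prev[j] = length of the shared run ending at a[i-2], b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def pair_overlap(ours: Iterable[Pair], theirs: Iterable[Pair]) -> Set[Pair]:
    """Pairs present in both datasets (exact match on both identifiers)."""
    return set(ours) & set(theirs)


if __name__ == "__main__":
    patent = "We propose a novel catalyst for water splitting"
    paper = "A novel catalyst for hydrogen production"
    print(longest_shared_word_run(patent, paper))

    ours = {("P1", "A1"), ("P2", "A2")}
    theirs = [("P2", "A2"), ("P3", "A3")]
    print(pair_overlap(ours, theirs))
```

Counting overlap this way presupposes that both datasets use the same patent and paper identifiers; if they do not, an identifier-harmonization step would be needed before intersecting.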

Suggested Citation

  • Van-Thien Nguyen & Rene Carraz, 2025. "Exploring Academic Patent–Paper Pairs in Japan: Benchmarking Existing Detection Models," Working Papers of BETA 2025-27, Bureau d'Economie Théorique et Appliquée, UDS, Strasbourg.
  • Handle: RePEc:ulp:sbbeta:2025-27

    Download full text from publisher

    File URL: http://beta.u-strasbg.fr/WP/2025/2025-27.pdf
    Download Restriction: no

