IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2409.08379.html

The Impact of Large Language Models on Open-source Innovation: Evidence from GitHub Copilot

Authors

  • Doron Yeverechyahu
  • Raveesh Mayya
  • Gal Oestreicher-Singer

Abstract

Large Language Models (LLMs) are reshaping knowledge work, yet their impact on voluntary, self-guided open innovation forums (where contributors choose tasks without managerial direction) may differ fundamentally from effects observed in organizational settings. We study this question in open-source software development, where individuals' contributions collectively drive innovation at a community level. Unlike product innovation, where typologies for classifying innovation are well established, knowledge work in open-source settings calls for a distinction grounded in the cognitive demand a task places on the contributor. A burgeoning literature distinguishes substantive contributions, which require creative problem formulation to introduce new functionality, from incremental contributions, which draw on comprehension of existing code to maintain and refine it. We exploit a natural experiment around GitHub Copilot's launch in October 2021: Copilot supported languages such as Python but, for business reasons, did not support R, creating an exogenous partition between otherwise comparable ecosystems. Using three complementary identification strategies and two classification approaches, we find that Copilot availability increases open-source contributions by 28 to 40 percent. The increase in incremental contributions is significantly larger than the increase in substantive contributions across all specifications. This disparity is more pronounced in projects with higher activity levels and widens following a model upgrade: LLMs function more effectively when existing context helps define the problem and constrain solutions, tilting collaborative innovation toward exploitation of established codebases rather than exploration of new functionality. Given the speed at which GenAI has spread across the knowledge economy, this paper provides a rare instance of causal field evidence on LLM effects.
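To make the natural-experiment logic concrete, a minimal sketch of a canonical two-group, two-period difference-in-differences comparison follows, treating a Copilot-supported ecosystem (e.g. Python) as treated and R as the control, before and after the October 2021 launch. This is an illustrative assumption only: the abstract does not specify the paper's three identification strategies, and the function name `did_percent_effect` and its inputs are hypothetical. Working with log mean outcomes lets the estimate be read as a proportional change, exp(delta) - 1, which is how an effect such as "increases contributions by X percent" is usually expressed.

```python
import math


def did_percent_effect(treated_pre, treated_post, control_pre, control_post):
    """Hypothetical 2x2 difference-in-differences on log mean outcomes.

    Assumption: each argument is a positive mean outcome (e.g. contributions
    per project) for one group in one period. Returns exp(delta) - 1, the
    implied proportional change for the treated group relative to control.
    """
    for value in (treated_pre, treated_post, control_pre, control_post):
        if value <= 0:
            raise ValueError("outcomes must be positive to take logs")
    # Change over time in the treated group minus change in the control group,
    # both measured in log points.
    delta = (math.log(treated_post) - math.log(treated_pre)) - (
        math.log(control_post) - math.log(control_pre)
    )
    # Convert log points into a relative (percentage-style) effect.
    return math.exp(delta) - 1.0
```

For example, hypothetical mean counts moving from 100 to 150 in the treated ecosystem and from 100 to 120 in the control give delta = ln(1.25), an implied 25 percent relative increase. These numbers are purely illustrative and are not the paper's data; applying the same calculation separately to incremental and substantive contributions is one simple way to compare the two effects.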

Suggested Citation

  • Doron Yeverechyahu & Raveesh Mayya & Gal Oestreicher-Singer, 2024. "The Impact of Large Language Models on Open-source Innovation: Evidence from GitHub Copilot," Papers 2409.08379, arXiv.org, revised May 2026.
  • Handle: RePEc:arx:papers:2409.08379

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2409.08379
    File Function: Latest version
    Download Restriction: no
