Printed from https://ideas.repec.org/a/gam/jftint/v18y2026i2p94-d1862285.html

Benchmarking Large Language Models for Embedded Systems Programming in Microcontroller-Driven IoT Applications

Authors
  • Marek Babiuch

    (Department of Control Systems and Instrumentation, VSB—Technical University of Ostrava, 70800 Ostrava, Czech Republic)

  • Pavel Smutný

    (Department of Control Systems and Instrumentation, VSB—Technical University of Ostrava, 70800 Ostrava, Czech Republic)

Abstract

Large language models (LLMs) have shown strong potential for automated code generation in software development, yet their effectiveness in embedded systems programming—requiring understanding of software logic and hardware constraints—has not been well studied. Existing evaluation frameworks do not comprehensively cover practical microcontroller development scenarios in real-world Internet of Things (IoT) projects. This study systematically evaluates 27 state-of-the-art LLMs across eight embedded systems scenarios of increasing complexity, from basic sensor reading to complete cloud database integration with visualization dashboards. Using ESP32 microcontrollers with environmental and motion sensors, we employed the Analytic Hierarchy Process with four weighted criteria: functional, instructions, output and creativity, evaluated independently by two expert reviewers. Top-performing models were Claude Sonnet 4.5, Claude Opus 4.1, and Gemini 2.5 Pro, with scores from 0.984 to 0.910. Performance degraded with complexity: 19–23 models generated compilable code for simple applications, but only 3–5 produced functional solutions for complex scenarios involving Grafana and cloud databases. The most frequent failure was hallucinated non-existent libraries or incorrect API usage, with functional capability as the primary barrier and instruction-following quality the key differentiator among competent models. These findings provide empirical guidance for embedded developers on LLM selection and identify limitations of zero-shot prompting for hardware-dependent IoT development.
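The abstract names the Analytic Hierarchy Process (AHP) with four weighted criteria but does not list the weights or the pairwise judgments. As a rough illustration only, the sketch below derives criterion weights from a pairwise comparison matrix using the row geometric-mean approximation, checks Saaty's consistency ratio, and combines per-criterion ratings into one score. The matrix `PAIRWISE`, the example ratings, and all function names are hypothetical and are not taken from the paper; the authors may have used a different AHP variant.

```python
import math

# Criterion names as listed in the abstract.
CRITERIA = ["functional", "instructions", "output", "creativity"]

# HYPOTHETICAL pairwise comparison matrix on Saaty's 1-9 scale.
# PAIRWISE[i][j] says how much more important criterion i is than j.
# These judgments are illustrative and do NOT come from the paper.
PAIRWISE = [
    [1.0,     3.0,     5.0,     7.0],
    [1 / 3.0, 1.0,     3.0,     5.0],
    [1 / 5.0, 1 / 3.0, 1.0,     3.0],
    [1 / 7.0, 1 / 5.0, 1 / 3.0, 1.0],
]


def ahp_weights(matrix):
    """Approximate AHP priority weights via the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Saaty's consistency ratio; values below 0.1 are usually accepted."""
    n = len(matrix)
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}[n]  # Saaty's tabulated values
    return consistency_index / random_index


def weighted_score(weights, ratings):
    """Combine per-criterion ratings (each in [0, 1]) into a single score."""
    return sum(w * r for w, r in zip(weights, ratings))


weights = ahp_weights(PAIRWISE)
cr = consistency_ratio(PAIRWISE, weights)

# Hypothetical ratings for one model, e.g. averaged over two reviewers.
example_ratings = [1.0, 0.9, 0.8, 0.5]
score = weighted_score(weights, example_ratings)

for name, w in zip(CRITERIA, weights):
    print(f"{name:>12}: {w:.3f}")
print(f"consistency ratio: {cr:.3f}")
print(f"weighted score:    {score:.3f}")
```

Because the weights sum to one, a model rated 1.0 on every criterion scores exactly 1.0, so scores computed this way stay on a 0 to 1 scale like the 0.984 to 0.910 range reported above.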

Suggested Citation

  • Marek Babiuch & Pavel Smutný, 2026. "Benchmarking Large Language Models for Embedded Systems Programming in Microcontroller-Driven IoT Applications," Future Internet, MDPI, vol. 18(2), pages 1-20, February.
  • Handle: RePEc:gam:jftint:v:18:y:2026:i:2:p:94-:d:1862285

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/18/2/94/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/18/2/94/
    Download Restriction: no

