Authors
- Marek Babiuch
(Department of Control Systems and Instrumentation, VSB—Technical University of Ostrava, 70800 Ostrava, Czech Republic)
- Pavel Smutný
(Department of Control Systems and Instrumentation, VSB—Technical University of Ostrava, 70800 Ostrava, Czech Republic)
Abstract
Large language models (LLMs) have shown strong potential for automated code generation in software development, yet their effectiveness in embedded systems programming—requiring understanding of software logic and hardware constraints—has not been well studied. Existing evaluation frameworks do not comprehensively cover practical microcontroller development scenarios in real-world Internet of Things (IoT) projects. This study systematically evaluates 27 state-of-the-art LLMs across eight embedded systems scenarios of increasing complexity, from basic sensor reading to complete cloud database integration with visualization dashboards. Using ESP32 microcontrollers with environmental and motion sensors, we employed the Analytic Hierarchy Process with four weighted criteria: functional, instructions, output and creativity, evaluated independently by two expert reviewers. Top-performing models were Claude Sonnet 4.5, Claude Opus 4.1, and Gemini 2.5 Pro, with scores from 0.984 to 0.910. Performance degraded with complexity: 19–23 models generated compilable code for simple applications, but only 3–5 produced functional solutions for complex scenarios involving Grafana and cloud databases. The most frequent failure was hallucinated non-existent libraries or incorrect API usage, with functional capability as the primary barrier and instruction-following quality the key differentiator among competent models. These findings provide empirical guidance for embedded developers on LLM selection and identify limitations of zero-shot prompting for hardware-dependent IoT development.
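The abstract names the Analytic Hierarchy Process (AHP) and four criteria but does not give the weights or the pairwise judgments behind them. As a rough illustration only, the sketch below shows how AHP weights can be derived from a pairwise comparison matrix using the row geometric-mean approximation, and how two reviewers' ratings might be combined into one score. The matrix values, the [0, 1] rating scale, the averaging of the two reviewers, and all function names are assumptions made for this example, not details taken from the study.

```python
import math

# Criteria as named in the abstract.
CRITERIA = ["functional", "instructions", "output", "creativity"]

# PAIRWISE[i][j]: how much more important criterion i is than criterion j
# on Saaty's 1-9 scale. Hypothetical values, NOT taken from the study.
PAIRWISE = [
    [1,     3,     5,     7],
    [1 / 3, 1,     3,     5],
    [1 / 5, 1 / 3, 1,     3],
    [1 / 7, 1 / 5, 1 / 3, 1],
]


def ahp_weights(matrix):
    """Approximate the AHP priority vector with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def average_reviewers(ratings_a, ratings_b):
    """Average two reviewers' per-criterion ratings (an assumed aggregation)."""
    return [(a + b) / 2 for a, b in zip(ratings_a, ratings_b)]


def weighted_score(ratings, weights):
    """Combine per-criterion ratings in [0, 1] into a single weighted score."""
    return sum(r * w for r, w in zip(ratings, weights))


if __name__ == "__main__":
    weights = ahp_weights(PAIRWISE)
    for name, weight in zip(CRITERIA, weights):
        print(f"{name:>12}: {weight:.3f}")

    # Hypothetical ratings for one model from two reviewers.
    combined = average_reviewers([1.0, 0.9, 0.8, 0.6], [1.0, 0.8, 0.9, 0.7])
    print(f"weighted score: {weighted_score(combined, weights):.3f}")
```

Under this scheme a model rated 1.0 on every criterion scores exactly 1.0, so the reported 0.984 to 0.910 range would be read as near-perfect ratings on the heavily weighted criteria. A full AHP analysis would also check the consistency ratio of the comparison matrix, which this sketch omits.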
Suggested Citation
Marek Babiuch & Pavel Smutný, 2026.
"Benchmarking Large Language Models for Embedded Systems Programming in Microcontroller-Driven IoT Applications,"
Future Internet, MDPI, vol. 18(2), pages 1-20, February.
Handle:
RePEc:gam:jftint:v:18:y:2026:i:2:p:94-:d:1862285