Author
Listed:
- Codrina-Victoria Lisaru
- Claudiu-Vasile Kifor
Abstract
Background: Artificial intelligence (AI) is increasingly used both to test software (T1) and to assure AI-based systems (T2), with adjacent software-engineering work that shapes testing practice (T3). Prior reviews are mostly descriptive and rarely report comparable maturity or replicability signals.
Objective: To provide a PRISMA-style systematic review (2015-2025, Web of Science) that maps T1-T2-T3 within a testing-centric frame, audits evidence maturity, threats reporting, and artefact openness per paper, and adds an explicit lens of large language models or generative AI (LLMs/GenAI).
Methods: We queried the Web of Science Core Collection (2015-2025), screened via a predefined protocol, and extracted ten items (D1-D10) per study to normalize comparisons. Seventy-two papers met the criteria. Findings are organized into three themes: (T1) AI-based software testing, (T2) testing/validation of AI systems, and (T3) AI-related software engineering topics with implications for testing, with T3 corresponding to the "beyond" in the paper's title.
Results: The corpus is limited in practice-oriented evidence: 31 laboratory/simulation, 3 industrial, 10 hybrid, 6 conceptual/guideline and 22 secondary studies. Only 18/72 provide public artefacts; 33/72 report no empirical metrics. By theme, T1=32, T2=15, T3=25; the LLMs/GenAI subset totals 10 papers. Openness strongly co-occurs with measurable outcomes (88.9% of artefact-sharing papers report metrics vs 42.6% without), yet "all-three credible" studies (industrial/hybrid + open artefacts + metrics) are rare (4/72 overall; 1/10 for LLMs/GenAI).
Conclusion: AI shows promise for testing, but evidence remains thin on industrial adoption and reproducibility. We recommend prioritizing hybrid/industrial validations, releasing artefacts by default, and using standardized task-metric bundles.
The review presents T1 and T2 results, separates T3 for scope clarity, and provides actionable maturity and replicability signals to guide responsible, empirical adoption.
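The counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not part of the paper; it only re-adds the reported figures. The split of metric-reporting papers into 16 (with artefacts) and 23 (without) is not stated in the abstract and is inferred here from the reported percentages, so treat it as an assumption.

```python
# Figures quoted in the abstract (72 included papers).
TOTAL = 72
evidence = {
    "laboratory/simulation": 31,
    "industrial": 3,
    "hybrid": 10,
    "conceptual/guideline": 6,
    "secondary": 22,
}
themes = {"T1": 32, "T2": 15, "T3": 25}
with_artefacts = 18   # papers providing public artefacts
no_metrics = 33       # papers reporting no empirical metrics


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


without_artefacts = TOTAL - with_artefacts   # 54
with_metrics = TOTAL - no_metrics            # 39

# ASSUMPTION: integer counts inferred from the reported 88.9% and 42.6%;
# the abstract itself gives only the percentages.
metrics_and_artefacts = 16
metrics_no_artefacts = with_metrics - metrics_and_artefacts  # 23

# Both breakdowns should add up to the corpus size.
assert sum(evidence.values()) == TOTAL
assert sum(themes.values()) == TOTAL

print(pct(metrics_and_artefacts, with_artefacts))    # share reporting metrics, artefacts shared
print(pct(metrics_no_artefacts, without_artefacts))  # share reporting metrics, no artefacts
print(pct(4, TOTAL))                                 # "all-three credible" share of the corpus
```

Under that inferred split, the two conditional percentages reproduce the values in the abstract, and the evidence-type and theme breakdowns each sum to 72.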
Suggested Citation
Codrina-Victoria Lisaru & Claudiu-Vasile Kifor, .
"Artificial Intelligence in Software Testing and Beyond: A Review of Current Practices and Emerging Challenges,"
Acta Informatica Pragensia, Prague University of Economics and Business, vol. 0.
Handle:
RePEc:prg:jnlaip:v:preprint:id:303
DOI: 10.18267/j.aip.303