Authors
- Julian Aberger
(Chair of Waste Processing Technology and Waste Management, Technical University of Leoben, 8700 Leoben, Austria)
- Lena Brensberger
(Chair of Waste Processing Technology and Waste Management, Technical University of Leoben, 8700 Leoben, Austria)
- Gerald Koinig
(Chair of Waste Processing Technology and Waste Management, Technical University of Leoben, 8700 Leoben, Austria)
- Benedikt Häcker
(Siemens Aktiengesellschaft, 1210 Vienna, Austria)
- Jesús Pestana
(Pro2Future GmbH, 8010 Graz, Austria)
- Renato Sarc
(Chair of Waste Processing Technology and Waste Management, Technical University of Leoben, 8700 Leoben, Austria)
Abstract
Influence-based data selection methods, such as TracIn, aim to estimate the impact of individual training samples on model predictions and are increasingly used for dataset curation and reduction. This study investigates whether selecting the most positively influential training examples can be used to create compressed yet effective training datasets for transfer learning in plastic waste classification. Using a ResNet-18 model trained on a custom dataset of plastic waste images, TracIn was applied to compute influence scores across multiple training checkpoints. The top 50 influential samples per class were extracted and used to train a new model. Contrary to expectations, models trained on these highly influential subsets significantly underperformed compared to models trained on either the full dataset or an equally sized random sample. Further analysis revealed that many top-ranked influential images originated from different classes, indicating model biases and potential label confusion. These findings highlight the limitations of using influence scores for dataset compression. However, TracIn proved valuable for identifying problematic or ambiguous samples, class imbalance issues, and issues with fuzzy class boundaries. Based on the results, the utilized TracIn approach is recommended as a diagnostic instrument rather than for dataset curation.
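As a rough illustration of the procedure the abstract describes, the sketch below computes a checkpoint-based TracIn score (learning-rate-weighted dot products of loss gradients, summed over checkpoints) and then picks the top-k training samples for one class. It is not the authors' code. The function names, the assumption that per-sample gradients are already available as arrays, and the choice to average influence over the test samples of a class before ranking are all assumptions made for illustration only.

```python
import numpy as np


def tracin_scores(train_grads, test_grads, learning_rates):
    """Checkpoint-based TracIn-style influence scores.

    train_grads: list with one array per checkpoint, shape (n_train, n_params)
    test_grads:  list with one array per checkpoint, shape (n_test, n_params)
    learning_rates: one step size per checkpoint
    Returns an array of shape (n_train, n_test); entry [i, j] is the
    estimated influence of training sample i on test sample j.
    """
    n_train = train_grads[0].shape[0]
    n_test = test_grads[0].shape[0]
    scores = np.zeros((n_train, n_test))
    for g_train, g_test, lr in zip(train_grads, test_grads, learning_rates):
        # Dot product of every training gradient with every test gradient.
        scores += lr * (g_train @ g_test.T)
    return scores


def top_k_for_class(scores, test_labels, target_class, k):
    """Indices of the k training samples with the highest mean influence
    on test samples of target_class (hypothetical aggregation rule)."""
    mask = np.asarray(test_labels) == target_class
    per_train = scores[:, mask].mean(axis=1)
    # Stable sort keeps ties in their original order, so results are reproducible.
    return np.argsort(-per_train, kind="stable")[:k]


def cross_class_fraction(selected, train_labels, target_class):
    """Share of selected training samples whose label differs from target_class."""
    labels = np.asarray(train_labels)[selected]
    return float(np.mean(labels != target_class))
```

Under these assumptions, calling `top_k_for_class` with `k=50` for each class would mirror the selection size named in the abstract, and `cross_class_fraction` gives a simple check of how many top-ranked samples come from a different class, which is the diagnostic use the abstract recommends.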
Suggested Citation
Julian Aberger & Lena Brensberger & Gerald Koinig & Benedikt Häcker & Jesús Pestana & Renato Sarc, 2025.
"Limitations of Influence-Based Dataset Compression for Waste Classification,"
Data, MDPI, vol. 10(8), pages 1-16, August.
Handle:
RePEc:gam:jdataj:v:10:y:2025:i:8:p:127-:d:1719328