Author
Listed:
- Lenka Tětková
- Erik Schou Dreier
- Robin Malm
- Lars Kai Hansen
Abstract
Much machine learning research progress is based on developing models and evaluating them on a benchmark dataset (e.g., ImageNet for images). However, applying such benchmark-successful methods to real-world data often does not work as expected. This is particularly the case for biological data, where we expect variability at multiple time and spatial scales. Typical benchmark data has simple, dominant semantics, such as a number, an object type, or a word. In contrast, biological samples often have multiple semantic components, leading to complex and entangled signals. Complexity is added if the signal of interest is related to atypical states, e.g., disease, and if there is limited data available for learning.
In this work, we focus on image classification of real-world biological data that are, indeed, different from standard images. We use grain data, and the goal is to detect diseases and damages, for example, “pink fusarium” and “skinned”. Pink fusarium, skinned grains, and other diseases and damages are key factors in setting the price of grains or excluding dangerous grains from food production. Apart from challenges stemming from the differences between these data and standard toy datasets, we also present challenges that need to be overcome when explaining deep learning models. For example, explainability methods have many hyperparameters that can give different results, and the hyperparameter values published in the papers do not work on dissimilar images. Other challenges are more general: problems with visualizing the explanations and with comparing them, since the magnitudes of their values differ from method to method. A fundamental open question is how to evaluate explanations. It is a non-trivial task because the “ground truth” is usually missing or ill-defined. Also, human annotators may create what they think is an explanation of the task at hand, yet the machine learning model might solve it in a different and perhaps counter-intuitive way.
We discuss several of these challenges and evaluate various post-hoc explainability methods on grain data. We focus on robustness, quality of explanations, and similarity to particular “ground truth” annotations made by experts. The goal is to find the methods that perform well overall and could be used in this challenging task. We hope that the proposed pipeline can be used as a framework for evaluating explainability methods in specific use cases.
Suggested Citation
Lenka Tětková & Erik Schou Dreier & Robin Malm & Lars Kai Hansen, 2025.
"Challenges in explaining deep learning models for data with biological variation,"
PLOS ONE, Public Library of Science, vol. 20(10), pages 1-20, October.
Handle:
RePEc:plo:pone00:0333965
DOI: 10.1371/journal.pone.0333965