Author Listed:
- George Lee
- David Edmundo Romo Bucheli
- Anant Madabhushi
Abstract
Medical diagnostics is often a multi-attribute problem, necessitating sophisticated tools for analyzing high-dimensional biomedical data. Mining these data often runs into two crucial bottlenecks: 1) the high dimensionality of the features used to represent rich biological data and 2) the small amount of labeled training data, owing to the expense of consulting the highly specific medical expertise needed to assess each study. To our knowledge, no existing approach has used active learning (AL) in the context of dimensionality reduction to improve the construction of low-dimensional representations. We present our novel methodology, AdDReSS (Adaptive Dimensionality Reduction with Semi-Supervision), to demonstrate that fewer labeled instances are needed to create a more discriminative embedding when those instances are identified via AL in the embedding space than when they are selected at random. We tested our methodology on a wide variety of domains, including prostate gene expression, ovarian proteomic spectra, brain magnetic resonance imaging, and breast histopathology. Across these high-dimensional biomedical datasets (each with 100 or more observations) and over all parameter settings considered, the median classification accuracy across all experiments showed AdDReSS (88.7%) to outperform both SSAGE, a semi-supervised dimensionality reduction (SSDR) method using random sampling (85.5%), and Graph Embedding (81.5%). Furthermore, embeddings generated via AdDReSS achieved a mean 35.95% improvement over SSAGE in Raghavan efficiency, a measure of learning rate. Our results demonstrate the value of AdDReSS in providing low-dimensional representations of high-dimensional biomedical data while achieving higher classification rates with fewer labeled examples than are needed without active learning.
Suggested Citation
George Lee & David Edmundo Romo Bucheli & Anant Madabhushi, 2016.
"Adaptive Dimensionality Reduction with Semi-Supervision (AdDReSS): Classifying Multi-Attribute Biomedical Data,"
PLOS ONE, Public Library of Science, vol. 11(7), pages 1-23, July.
Handle: RePEc:plo:pone00:0159088
DOI: 10.1371/journal.pone.0159088