Printed from https://ideas.repec.org/a/plo/pcbi00/1010984.html

The effect of non-linear signal in classification problems using gene expression

Author

Listed:
  • Benjamin J Heil
  • Jake Crawford
  • Casey S Greene

Abstract

Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines, prefers linear models that are easier to interpret. We compare multi-layer neural networks and logistic regression across multiple prediction tasks on GTEx and Recount3 datasets and find evidence in favor of both possibilities. We verified the presence of non-linear signal when predicting tissue and metadata sex labels from expression data by removing the predictive linear signal with Limma, and showed that the removal ablated the performance of linear methods but not non-linear ones. However, we also found that the presence of non-linear signal was not necessarily sufficient for neural networks to outperform logistic regression. Our results demonstrate that while multi-layer neural networks may be useful for making predictions from gene expression data, including a linear baseline model is critical: biological systems are high-dimensional, but effective dividing lines for predictive models may not be.

Author summary: If we could consistently predict biological conditions from mRNA levels, it could help discover biomarkers for disease diagnosis. Deep learning has become widely used for many tasks, including biomarker discovery, but it is unclear whether the complexity of these models is helpful. We evaluate whether more complex non-linear models have an advantage over simpler linear ones for a set of prediction tasks. We find that, at least for tissue prediction and prediction of metadata-derived sex, linear models perform just as well as non-linear ones. However, we also demonstrate the presence of a predictive signal in the data that only the non-linear models can use. Our results suggest that the non-linear signals may be redundant with linear ones, or that current deep neural networks are not able to successfully use the signal when linear signals are present.
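
The ablation idea described in the abstract can be illustrated with a small synthetic sketch. It is not the authors' pipeline. The data are simulated rather than drawn from GTEx or Recount3, a per-feature least-squares fit stands in for Limma, and scikit-learn's LogisticRegression and MLPClassifier stand in for the paper's models. The function names make_data, remove_linear_signal and evaluate are hypothetical and chosen only for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def make_data(n=2000, seed=0):
    """Simulate features carrying one linear and one non-linear (XOR-like) signal."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    # Linear signal: the class shifts the mean of a single feature.
    x_lin = rng.normal(size=n) + 2.0 * y
    # Non-linear signal: the sign of a * b encodes the class, while the
    # class means of a and b individually stay the same.
    a = rng.normal(size=n)
    b = np.abs(rng.normal(size=n)) * np.sign(a) * (2 * y - 1)
    noise = rng.normal(size=(n, 5))
    return np.column_stack([x_lin, a, b, noise]), y


def remove_linear_signal(X, y):
    """Regress each feature on the label and subtract the fitted label effect.

    Assumption: this simple least-squares correction is only a stand-in
    for the Limma-based removal mentioned in the abstract.
    """
    design = np.column_stack([np.ones(len(y)), y])
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - np.outer(y, coef[1])


def evaluate(X, y, seed=0):
    """Return held-out accuracy for a linear model and a small neural network."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    linear = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    network = MLPClassifier(
        hidden_layer_sizes=(32, 32), max_iter=1000, random_state=seed
    ).fit(X_tr, y_tr)
    return linear.score(X_te, y_te), network.score(X_te, y_te)


X, y = make_data()
X_corrected = remove_linear_signal(X, y)
lr_before, mlp_before = evaluate(X, y)
lr_after, mlp_after = evaluate(X_corrected, y)
print(f"before removal: logistic={lr_before:.2f}, network={mlp_before:.2f}")
print(f"after removal:  logistic={lr_after:.2f}, network={mlp_after:.2f}")
```

In this toy setting the correction equalizes the class means of every feature, so the logistic regression should fall to roughly chance while the network can still exploit the interaction between the two XOR-like features. Note that the correction here uses labels from all samples, which is acceptable for demonstrating that non-linear signal exists but would not be appropriate when estimating real predictive performance.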

Suggested Citation

  • Benjamin J Heil & Jake Crawford & Casey S Greene, 2023. "The effect of non-linear signal in classification problems using gene expression," PLOS Computational Biology, Public Library of Science, vol. 19(3), pages 1-12, March.
  • Handle: RePEc:plo:pcbi00:1010984
    DOI: 10.1371/journal.pcbi.1010984

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010984
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1010984&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1010984?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Alexander Lachmann & Denis Torre & Alexandra B. Keenan & Kathleen M. Jagodnik & Hoyjin J. Lee & Lily Wang & Moshe C. Silverstein & Avi Ma’ayan, 2018. "Massive mining of publicly available RNA-seq data from human and mouse," Nature Communications, Nature, vol. 9(1), pages 1-10, December.
    2. Zifeng Wang & Aria Masoomi & Zhonghui Xu & Adel Boueiz & Sool Lee & Tingting Zhao & Russell Bowler & Michael Cho & Edwin K Silverman & Craig Hersh & Jennifer Dy & Peter J Castaldi, 2021. "Improved prediction of smoking status via isoform-aware RNA-seq deep learning models," PLOS Computational Biology, Public Library of Science, vol. 17(10), pages 1-19, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jonathan P. Ling & Alexei M. Bygrave & Clayton P. Santiago & Rogger P. Carmen-Orozco & Vickie T. Trinh & Minzhong Yu & Yini Li & Ying Liu & Kyra D. Bowden & Leighton H. Duncan & Jeong Han & Kamil Tane, 2022. "Cell-specific regulation of gene expression using splicing-dependent frameshifting," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    2. Kenneth A. Wilson & Sudipta Bar & Eric B. Dammer & Enrique M. Carrera & Brian A. Hodge & Tyler A. U. Hilsabeck & Joanna Bons & George W. Brownridge & Jennifer N. Beck & Jacob Rose & Melia Granath-Pane, 2024. "OXR1 maintains the retromer to delay brain aging under dietary restriction," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    3. Nicholas Markarian & Kimberly M Van Auken & Dustin Ebert & Paul W Sternberg, 2024. "Enrichment on steps, not genes, improves inference of differentially expressed pathways," PLOS Computational Biology, Public Library of Science, vol. 20(3), pages 1-23, March.
    4. Hao Chen & Frederick J. King & Bin Zhou & Yu Wang & Carter J. Canedy & Joel Hayashi & Yang Zhong & Max W. Chang & Lars Pache & Julian L. Wong & Yong Jia & John Joslin & Tao Jiang & Christopher Benner, 2024. "Drug target prediction through deep learning functional representation of gene signatures," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    5. Milton Pividori & Sumei Lu & Binglan Li & Chun Su & Matthew E. Johnson & Wei-Qi Wei & Qiping Feng & Bahram Namjou & Krzysztof Kiryluk & Iftikhar J. Kullo & Yuan Luo & Blair D. Sullivan & Benjamin F. V, 2023. "Projecting genetic associations through gene expression patterns highlights disease etiology and drug mechanisms," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    6. Isabel Tundidor & Marta Seijo-Vila & Sandra Blasco-Benito & María Rubert-Hernández & Sandra Adámez & Clara Andradas & Sara Manzano & Isabel Álvarez-López & Cristina Sarasqueta & María Villa-Morales & , 2023. "Identification of fatty acid amide hydrolase as a metastasis suppressor in breast cancer," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    7. Lucas A. Mavromatis & Daniel B. Rosoff & Andrew S. Bell & Jeesun Jung & Josephin Wagner & Falk W. Lohoff, 2023. "Multi-omic underpinnings of epigenetic aging and human longevity," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    8. Daewon Lee & Eunju Yoon & Su Jin Ham & Kunwoo Lee & Hansaem Jang & Daihn Woo & Da Hyun Lee & Sehyeon Kim & Sekyu Choi & Jongkyeong Chung, 2024. "Diabetic sensory neuropathy and insulin resistance are induced by loss of UCHL1 in Drosophila," Nature Communications, Nature, vol. 15(1), pages 1-22, December.
    9. Juliane Tschuck & Lea Theilacker & Ina Rothenaigner & Stefanie A. I. Weiß & Banu Akdogan & Van Thanh Lam & Constanze Müller & Roman Graf & Stefanie Brandner & Christian Pütz & Tamara Rieder & Philippe, 2023. "Farnesoid X receptor activation by bile acids suppresses lipid peroxidation and ferroptosis," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    10. Xiaotao Zhang & Huaming Li & Yichen Gu & An Ping & Jiarui Chen & Qia Zhang & Zhouhan Xu & Junjie Wang & Shenjie Tang & Rui Wang & Jianan Lu & Lingxiao Lu & Chenghao Jin & Ziyang Jin & Jianmin Zhang & , 2025. "Repair-associated macrophages increase after early-phase microglia attenuation to promote ischemic stroke recovery," Nature Communications, Nature, vol. 16(1), pages 1-20, December.
    11. Naoko Iida & Ai Okada & Yoshihisa Kobayashi & Kenichi Chiba & Yasushi Yatabe & Yuichi Shiraishi, 2025. "Systematically developing a registry of splice-site creating variants utilizing massive publicly available transcriptome sequence data," Nature Communications, Nature, vol. 16(1), pages 1-15, December.
    12. Yaqiong Li & Zhipeng Niu & Jichao Yang & Xuke Yang & Yukun Chen & Yingying Li & Xiaohan Liang & Jingwen Zhang & Fuqiang Fan & Ping Wu & Chao Peng & Bang Shen, 2023. "Rapid metabolic reprogramming mediated by the AMP-activated protein kinase during the lytic cycle of Toxoplasma gondii," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    13. Nikolai Schleussner & Pierre Cauchy & Vedran Franke & Maciej Giefing & Oriol Fornes & Naveen Vankadari & Salam A. Assi & Mariantonia Costanza & Marc A. Weniger & Altuna Akalin & Ioannis Anagnostopoulo, 2023. "Transcriptional reprogramming by mutated IRF4 in lymphoma," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    14. Luigi Mazzeo & Soumitra Ghosh & Emery Di Cicco & Jovan Isma & Daniele Tavernari & Anastasia Samarkina & Paola Ostano & Markus K. Youssef & Christian Simon & G. Paolo Dotto, 2024. "ANKRD1 is a mesenchymal-specific driver of cancer-associated fibroblast activation bridging androgen receptor loss to AP-1 activation," Nature Communications, Nature, vol. 15(1), pages 1-20, December.
    15. Xiaochu Tong & Ning Qu & Xiangtai Kong & Shengkun Ni & Jingyi Zhou & Kun Wang & Lehan Zhang & Yiming Wen & Jiangshan Shi & Sulin Zhang & Xutong Li & Mingyue Zheng, 2024. "Deep representation learning of chemical-induced transcriptional profile for phenotype-based drug discovery," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    16. Samir Rachid Zaim & Mark-Phillip Pebworth & Imran McGrath & Lauren Okada & Morgan Weiss & Julian Reading & Julie L. Czartoski & Troy R. Torgerson & M. Juliana McElrath & Thomas F. Bumol & Peter J. Ske, 2024. "MOCHA’s advanced statistical modeling of scATAC-seq data enables functional genomic inference in large human cohorts," Nature Communications, Nature, vol. 15(1), pages 1-24, December.
    17. Arianna Landini & Irena Trbojević-Akmačić & Pau Navarro & Yakov A. Tsepilov & Sodbo Z. Sharapov & Frano Vučković & Ozren Polašek & Caroline Hayward & Tea Petrović & Marija Vilaj & Yurii S. Aulchenko &, 2022. "Genetic regulation of post-translational modification of two distinct proteins," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    18. Royce W. Zhou & Jia Xu & Tiphaine C. Martin & Alexis L. Zachem & John He & Sait Ozturk & Deniz Demircioglu & Ankita Bansal & Andrew P. Trotta & Bruno Giotti & Berkley Gryder & Yao Shen & Xuewei Wu & S, 2022. "A local tumor microenvironment acquired super-enhancer induces an oncogenic driver in colorectal carcinoma," Nature Communications, Nature, vol. 13(1), pages 1-17, December.
    19. Pablo Jané & Xiaoying Xu & Vincent Taelman & Eduardo Jané & Karim Gariani & Rebecca A. Dumont & Yonathan Garama & Francisco Kim & María Val Gomez & Martin A. Walter, 2023. "The Imageable Genome," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
