Automated Discovery of Functional Generality of Human Gene Expression Programs

Authors

  • Georg K Gerber
  • Robin D Dowell
  • Tommi S Jaakkola
  • David K Gifford

Abstract

An important research problem in computational biology is the identification of expression programs, sets of co-expressed genes orchestrating normal or pathological processes, and the characterization of the functional breadth of these programs. The use of human expression data compendia for discovery of such programs presents several challenges including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. We developed GeneProgram, a new unsupervised computational framework based on Hierarchical Dirichlet Processes that addresses each of the above challenges. GeneProgram uses expression data to simultaneously organize tissues into groups and genes into overlapping programs with consistent temporal behavior, to produce maps of expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Using synthetic and real gene expression data, we showed that GeneProgram outperformed several popular expression analysis methods. We applied GeneProgram to a compendium of 62 short time-series gene expression datasets exploring the responses of human cells to infectious agents and immune-modulating molecules. GeneProgram produced a map of 104 expression programs, a substantial number of which were significantly enriched for genes involved in key signaling pathways and/or bound by NF-κB transcription factors in genome-wide experiments. Further, GeneProgram discovered expression programs that appear to implicate surprising signaling pathways or receptor types in the response to infection, including Wnt signaling and neurotransmitter receptors. 
We believe the discovered map of expression programs involved in the response to infection will be useful for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal “cross-talk,” and genes from high generality programs may maintain common physiological responses that go awry in disease states. Further, our method is multipurpose, and can be applied readily to novel compendia of biological data.

Author Summary

In recent years, DNA microarrays have been used to produce large compendia of human gene expression data, which are promising resources for discovery of expression programs, sets of co-expressed genes orchestrating important physiological or pathological processes. However, these compendia present particular challenges, including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. To address these challenges, we developed GeneProgram, a state-of-the-art statistical framework that automatically generates interpretable maps of expression programs from microarray data. GeneProgram accomplishes this by simultaneously organizing tissues into groups and genes into overlapping programs with consistent temporal behavior, and sorting programs by a generality score. Such maps may be valuable for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal “cross-talk,” and genes from high generality programs may maintain common physiological responses that go awry in disease states. Using synthetic and real data, we showed that GeneProgram outperformed several popular expression analysis methods. Further, on a compendium of time-series gene expression data measuring the responses of human cells to infectious agents, GeneProgram discovered programs that implicate surprising signaling pathways and receptor types.

Suggested Citation

  • Georg K Gerber & Robin D Dowell & Tommi S Jaakkola & David K Gifford, 2007. "Automated Discovery of Functional Generality of Human Gene Expression Programs," PLOS Computational Biology, Public Library of Science, vol. 3(8), pages 1-15, August.
  • Handle: RePEc:plo:pcbi00:0030148
    DOI: 10.1371/journal.pcbi.0030148

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0030148
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.0030148&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Daniel D. Lee & H. Sebastian Seung, 1999. "Learning the parts of objects by non-negative matrix factorization," Nature, Nature, vol. 401(6755), pages 788-791, October.
    2. Teh, Yee Whye & Jordan, Michael I. & Beal, Matthew J. & Blei, David M., 2006. "Hierarchical Dirichlet Processes," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1566-1581, December.

    Citations

    Cited by:

    1. Ali Faisal & Jaakko Peltonen & Elisabeth Georgii & Johan Rung & Samuel Kaski, 2014. "Toward Computational Cumulative Biology by Combining Models of Biological Datasets," PLOS ONE, Public Library of Science, vol. 9(11), pages 1-17, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yoshi Fujiwara & Rubaiyat Islam, 2021. "Bitcoin's Crypto Flow Network," Papers 2106.11446, arXiv.org, revised Jul 2021.
    2. Rafael Teixeira & Mário Antunes & Diogo Gomes & Rui L. Aguiar, 2024. "Comparison of Semantic Similarity Models on Constrained Scenarios," Information Systems Frontiers, Springer, vol. 26(4), pages 1307-1330, August.
    3. Del Corso, Gianna M. & Romani, Francesco, 2019. "Adaptive nonnegative matrix factorization and measure comparisons for recommender systems," Applied Mathematics and Computation, Elsevier, vol. 354(C), pages 164-179.
    4. P Fogel & C Geissler & P Cotte & G Luta, 2022. "Applying separative non-negative matrix factorization to extra-financial data," Working Papers hal-03689774, HAL.
    5. Xiao-Bai Li & Jialun Qin, 2017. "Anonymizing and Sharing Medical Text Records," Information Systems Research, INFORMS, vol. 28(2), pages 332-352, June.
    6. Michelle Dietzen & Haoran Zhai & Olivia Lucas & Oriol Pich & Christopher Barrington & Wei-Ting Lu & Sophia Ward & Yanping Guo & Robert E. Hynds & Simone Zaccaria & Charles Swanton & Nicholas McGranaha, 2024. "Replication timing alterations are associated with mutation acquisition during breast and lung cancer evolution," Nature Communications, Nature, vol. 15(1), pages 1-23, December.
    7. Redivo, Edoardo & Nguyen, Hien D. & Gupta, Mayetri, 2020. "Bayesian clustering of skewed and multimodal data using geometric skewed normal distributions," Computational Statistics & Data Analysis, Elsevier, vol. 152(C).
    8. Naiyang Guan & Lei Wei & Zhigang Luo & Dacheng Tao, 2013. "Limited-Memory Fast Gradient Descent Method for Graph Regularized Nonnegative Matrix Factorization," PLOS ONE, Public Library of Science, vol. 8(10), pages 1-10, October.
    9. Spelta, A. & Pecora, N. & Rovira Kaltwasser, P., 2019. "Identifying Systemically Important Banks: A temporal approach for macroprudential policies," Journal of Policy Modeling, Elsevier, vol. 41(1), pages 197-218.
    10. Jin, Xin & Maheu, John M., 2016. "Bayesian semiparametric modeling of realized covariance matrices," Journal of Econometrics, Elsevier, vol. 192(1), pages 19-39.
    11. M. Moghadam & K. Aminian & M. Asghari & M. Parnianpour, 2013. "How well do the muscular synergies extracted via non-negative matrix factorisation explain the variation of torque at shoulder joint?," Computer Methods in Biomechanics and Biomedical Engineering, Taylor & Francis Journals, vol. 16(3), pages 291-301.
    12. Markovsky, Ivan & Niranjan, Mahesan, 2010. "Approximate low-rank factorization with structured factors," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3411-3420, December.
    13. Paul Fogel & Yann Gaston-Mathé & Douglas Hawkins & Fajwel Fogel & George Luta & S. Stanley Young, 2016. "Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health," IJERPH, MDPI, vol. 13(5), pages 1-14, May.
    14. Le Thi Khanh Hien & Duy Nhat Phan & Nicolas Gillis, 2022. "Inertial alternating direction method of multipliers for non-convex non-smooth optimization," Computational Optimization and Applications, Springer, vol. 83(1), pages 247-285, September.
    15. Zhaoyu Xing & Yang Wan & Juan Wen & Wei Zhong, 2024. "GOLFS: feature selection via combining both global and local information for high dimensional clustering," Computational Statistics, Springer, vol. 39(5), pages 2651-2675, July.
    16. Chae, Bongsug (Kevin), 2018. "The Internet of Things (IoT): A Survey of Topics and Trends using Twitter Data and Topic Modeling," 22nd ITS Biennial Conference, Seoul 2018. Beyond the boundaries: Challenges for business, policy and society 190376, International Telecommunications Society (ITS).
    17. Md Nazrul Islam & Md Mofazzal Hossain & Md Shafayet Shahed Ornob, 2024. "Business research on Industry 4.0: a systematic review using topic modelling approach," Future Business Journal, Springer, vol. 10(1), pages 1-15, December.
    18. Parvin Ahmadi & Iman Gholampour & Mahmoud Tabandeh, 2018. "Cluster-based sparse topical coding for topic mining and document clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(3), pages 537-558, September.
    19. Jingfeng Guo & Chao Zheng & Shanshan Li & Yutong Jia & Bin Liu, 2022. "BiInfGCN: Bilateral Information Augmentation of Graph Convolutional Networks for Recommendation," Mathematics, MDPI, vol. 10(17), pages 1-16, August.
    20. Jianfei Cao & Han Yang & Jianshu Lv & Quanyuan Wu & Baolei Zhang, 2023. "Estimating Soil Salinity with Different Levels of Vegetation Cover by Using Hyperspectral and Non-Negative Matrix Factorization Algorithm," IJERPH, MDPI, vol. 20(4), pages 1-15, February.
