IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1008550.html
   My bibliography  Save this article

CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models

Author

Listed:
  • Lillian R Thistlethwaite
  • Varduhi Petrosyan
  • Xiqi Li
  • Marcus J Miller
  • Sarah H Elsea
  • Aleksandar Milosavljevic

Abstract

We consider the following general family of algorithmic problems that arises in transcriptomics, metabolomics and other fields: given a weighted graph G and a subset of its nodes S, find subsets of S that show significant connectedness within G. A specific solution to this problem may be defined by devising a scoring function, the Maximum Clique problem being a classic example, where S includes all nodes in G and where the score is defined by the size of the largest subset of S fully connected within G. Major practical obstacles for the plethora of algorithms addressing this type of problem include computational efficiency and, particularly for more complex scores which take edge weights into account, the computational cost of permutation testing, a statistical procedure required to obtain a bound on the p-value for a connectedness score. To address these problems, we developed CTD “Connect the Dots”, a fast algorithm based on data compression that detects highly connected subsets within S. CTD provides information-theoretic upper bounds on p-values when S contains a small fraction of nodes in G without requiring computationally costly permutation testing. We apply the CTD algorithm to interpret multi-metabolite perturbations due to inborn errors of metabolism and multi-transcript perturbations associated with breast cancer in the context of disease-specific Gaussian Markov Random Field networks learned directly from respective molecular profiling data.Author summary: A frequently encountered “omic” analysis problem is to identify a subset of nodes within a weighted graph G that is both highly connected in G and belongs to S, a subset of nodes in G. For example, G may represent a biological pathway, kinetic network model, biological interaction network, or a network learned directly from data, where edges represent co-variation relationships between abundances of molecular variables. S may be the set of molecular variables that are perturbed in an individual case or in a set of disease cases relative to controls. In this work, we develop a novel information-theoretic formulation of this problem and a local search algorithm that obviate the need for computationally costly permutation testing, a statistical procedure which is typically required to establish rigorous p-value bounds for other scoring-based methods.

Suggested Citation

  • Lillian R Thistlethwaite & Varduhi Petrosyan & Xiqi Li & Marcus J Miller & Sarah H Elsea & Aleksandar Milosavljevic, 2021. "CTD: An information-theoretic algorithm to interpret sets of metabolomic and transcriptomic perturbations in the context of graphical models," PLOS Computational Biology, Public Library of Science, vol. 17(1), pages 1-24, January.
  • Handle: RePEc:plo:pcbi00:1008550
    DOI: 10.1371/journal.pcbi.1008550
    as

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008550
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008550&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1008550?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1008550. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.