Tight basis cycle representatives for persistent homology of large biological data sets

Authors
  • Manu Aggarwal
  • Vipul Periwal

Abstract

Persistent homology (PH) is a popular tool for topological data analysis that has found applications across diverse areas of research. It provides a rigorous method to compute robust topological features in discrete experimental observations that often contain various sources of uncertainties. Although powerful in theory, PH suffers from high computation cost that precludes its application to large data sets. Additionally, most analyses using PH are limited to computing the existence of nontrivial features. Precise localization of these features is not generally attempted because, by definition, localized representations are not unique and because of even higher computation cost. Such a precise location is a sine qua non for determining functional significance, especially in biological applications. Here, we provide a strategy and algorithms to compute tight representative boundaries around nontrivial robust features in large data sets. To showcase the efficiency of our algorithms and the precision of computed boundaries, we analyze the human genome and protein crystal structures. In the human genome, we found a surprising effect of the impairment of chromatin loop formation on loops through chromosome 13 and the sex chromosomes. We also found loops with long-range interactions between functionally related genes. In protein homologs with significantly different topology, we found voids attributable to ligand-interaction, mutation, and differences between species.

Author summary: The relative arrangement of constituents in a biological system is often functionally significant. Persistent homology computes the existence of regions devoid of constituents that are surrounded by regions of high density, which we can think of as holes, that are robust to experimental uncertainties. An important question then is what purpose do these robust topological features serve in the underlying system? To investigate this, it is important to compute their precise locations. However, this computation suffers from high cost and non-uniqueness of representative boundaries of these holes. In this work, we developed a set of algorithms and a strategy that computes representative boundaries around holes with high precision in large data sets. We were able to process the human genome at a high resolution in a few minutes, a computation that extant algorithms could not attempt. We also determined locations of significant topological differences in crystal structures of protein homologous sequences. This work enables research into the functional significance of robust features in large biological data sets.

Suggested Citation

  • Manu Aggarwal & Vipul Periwal, 2023. "Tight basis cycle representatives for persistent homology of large biological data sets," PLOS Computational Biology, Public Library of Science, vol. 19(5), pages 1-23, May.
  • Handle: RePEc:plo:pcbi00:1010341
    DOI: 10.1371/journal.pcbi.1010341

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010341
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1010341&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1010341?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Krishnagopal, Sanjukta & Bianconi, Ginestra, 2023. "Topology and dynamics of higher-order multiplex networks," Chaos, Solitons & Fractals, Elsevier, vol. 177(C).
    2. M Ulmer & Lori Ziegelmeier & Chad M Topaz, 2019. "A topological approach to selecting models of biological experiments," PLOS ONE, Public Library of Science, vol. 14(3), pages 1-18, March.
    3. Li, Yan & Jiang, Xiong-Fei & Tian, Yue & Li, Sai-Ping & Zheng, Bo, 2019. "Portfolio optimization based on network topology," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 515(C), pages 671-681.
    4. John S. McAlister & Jesse L. Brunner & Danielle J. Galvin & Nina H. Fefferman, 2025. "A Game Theoretic Treatment of Contagion in Trade Networks," Papers 2504.06905, arXiv.org.
