
HiCImpute: A Bayesian hierarchical model for identifying structural zeros and enhancing single cell Hi-C data

Author

Listed:
  • Qing Xie
  • Chenggong Han
  • Victor Jin
  • Shili Lin

Abstract

Single cell Hi-C techniques enable one to study cell-to-cell variability in chromatin interactions. However, single cell Hi-C (scHi-C) data suffer severely from sparsity, that is, the existence of excess zeros due to insufficient sequencing depth. Complicating the matter further is the fact that not all zeros are created equal: some are due to loci truly not interacting because of the underlying biological mechanism (structural zeros); others are indeed due to insufficient sequencing depth (sampling zeros or dropouts), especially for loci that interact infrequently. Differentiating between structural zeros and dropouts is important since correct inference would improve downstream analyses such as clustering and discovery of subtypes. Nevertheless, distinguishing between these two types of zeros has received little attention in the single cell Hi-C literature, where the issue of sparsity has been addressed mainly as a data quality improvement problem. To fill this gap, we propose HiCImpute, a Bayesian hierarchical model that goes beyond data quality improvement by also identifying observed zeros that are in fact structural zeros. HiCImpute takes spatial dependencies of the scHi-C 2D data structure into account while also borrowing information from similar single cells and from bulk data, when such are available. Through an extensive set of analyses of synthetic and real data, we demonstrate the ability of HiCImpute to identify structural zeros with high sensitivity and to impute dropout values accurately. Downstream analyses using data improved by HiCImpute yielded much more accurate clustering of cell types compared to using observed data or data improved by several comparison methods. Most significantly, HiCImpute-improved data have led to the identification of subtypes within each of the excitatory neuronal cells of L4 and L5 in the prefrontal cortex.

Author summary: Single cell Hi-C techniques enable one to study cell-to-cell variability in chromatin interactions, which has significant implications for gene regulation. However, insufficient sequencing depth, which leaves some low-frequency chromatin interactions unobserved, has resulted in many zeros, called dropouts. There are also zeros due to biological mechanisms rather than insufficient coverage, referred to as structural zeros. As such, dropouts and structural zeros are confounded; that is, observed zeros are a mixture of both types. Differentiating between structural zeros and dropouts is important for improved downstream analyses, including cell-subtype discovery, but there is a paucity of available methods. In this paper, we develop a powerful method, HiCImpute, for identifying structural zeros and imputing dropouts. Through an extensive simulation study, we demonstrate the ability of HiCImpute to identify structural zeros with high sensitivity and to impute dropout values accurately under a variety of settings. Applications of HiCImpute to three datasets yield improved data that lead to more accurate clustering of cell types and, further, to the discovery of subtypes in two of the cell types in the prefrontal cortex data.
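
To make the confounding of structural zeros and dropouts concrete, the following is a minimal sketch under a generic zero-inflated Poisson assumption. It is not the HiCImpute model, whose details are not given in this abstract. The function name `prob_structural_zero` and the parameters `pi` (prior probability that a locus pair never interacts) and `lam` (expected contact count when the pair does interact) are hypothetical and introduced only for illustration. The sketch applies Bayes' rule to an observed zero.

```python
import math


def prob_structural_zero(pi: float, lam: float) -> float:
    """Posterior probability that an observed zero is structural.

    Illustrative zero-inflated Poisson assumption (not the paper's model):
      - with probability pi the locus pair never interacts (structural zero);
      - otherwise the count is Poisson(lam), which is zero with prob exp(-lam).
    """
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must be in [0, 1]")
    if lam < 0.0:
        raise ValueError("lam must be non-negative")
    # Probability of seeing a zero from an interacting pair (a dropout).
    p_dropout = (1.0 - pi) * math.exp(-lam)
    denom = pi + p_dropout
    return pi / denom if denom > 0.0 else 0.0


# A pair that interacts infrequently (small lam) often yields dropouts,
# so an observed zero is weak evidence of a structural zero.
rare = prob_structural_zero(pi=0.3, lam=0.1)
# A pair expected to interact often (large lam) rarely yields dropouts,
# so an observed zero points much more strongly to a structural zero.
frequent = prob_structural_zero(pi=0.3, lam=5.0)
print(f"rare interaction:     {rare:.3f}")
print(f"frequent interaction: {frequent:.3f}")
```

Under this assumption, zeros at infrequently interacting loci are the hardest to classify, which matches the abstract's remark that dropouts arise especially for loci that interact infrequently.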

Suggested Citation

  • Qing Xie & Chenggong Han & Victor Jin & Shili Lin, 2022. "HiCImpute: A Bayesian hierarchical model for identifying structural zeros and enhancing single cell Hi-C data," PLOS Computational Biology, Public Library of Science, vol. 18(6), pages 1-19, June.
  • Handle: RePEc:plo:pcbi00:1010129
    DOI: 10.1371/journal.pcbi.1010129

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010129
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1010129&type=printable
    Download Restriction: no
