IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1009548.html

Approximate distance correlation for selecting highly interrelated genes across datasets

Author

Listed:
  • Qunlun Shen
  • Shihua Zhang

Abstract

With the rapid accumulation of biological omics datasets, decoding the underlying relationships of genes across datasets has become an important problem. Previous studies have attempted to identify differentially expressed genes across datasets, but these methods struggle to detect interrelated genes. Moreover, existing correlation-based algorithms can only measure the relationship between genes within a single dataset or between two multi-modal datasets from the same samples. It remains unclear how to quantify the strength of association of the same gene across two biological datasets with different samples. To this end, we propose Approximate Distance Correlation (ADC) to select interrelated genes with statistical significance across two different biological datasets. ADC first obtains the k most correlated genes for each target gene as its approximate observations, and then calculates the distance correlation (DC) for the target gene across the two datasets. ADC repeats this process for all genes and then performs the Benjamini-Hochberg adjustment to control the false discovery rate. We demonstrate the effectiveness of ADC with simulation data and four real applications to select highly interrelated genes across two datasets. These four applications include 21 cancer RNA-seq datasets of different tissues; six single-cell RNA-seq (scRNA-seq) datasets of mouse hematopoietic cells across six different cell types along the hematopoietic cell lineage; five scRNA-seq datasets of pancreatic islet cells across five different technologies; and coupled single-cell ATAC-seq (scATAC-seq) and scRNA-seq data of peripheral blood mononuclear cells (PBMC). Extensive results demonstrate that ADC is a powerful tool to uncover interrelated genes with strong biological implications and is scalable to large-scale datasets. Moreover, the number of such genes can serve as a metric to measure the similarity between two datasets, which could characterize the relative difference of diverse cell types and technologies.

Author summary: The number and size of biological datasets (e.g., single-cell RNA-seq datasets) have grown rapidly in recent years. How to mine the relationships of genes across datasets is becoming an important issue. Computational tools for identifying differentially expressed genes have been comprehensively studied, but interrelated genes across datasets are often neglected. Detecting highly interrelated genes across datasets is difficult because the datasets usually contain different samples and may contain different numbers of samples. To solve this problem, we present a new algorithm that can identify interrelated genes across datasets based on distance correlation. Our proposed algorithm is very efficient and works well with different technologies, i.e., RNA-seq, single-cell RNA-seq and single-cell ATAC-seq. We also found that the number of such highly interrelated genes can serve as a metric to measure the similarity between two datasets, which could characterize the relative difference of diverse cell types and technologies.
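
The two standard ingredients named in the abstract, the sample distance correlation and the Benjamini-Hochberg adjustment, can be sketched in Python as follows. This is a minimal illustration assuming NumPy, not the authors' implementation: the function names `distance_correlation` and `benjamini_hochberg` are hypothetical, and the steps that are specific to ADC (how the k genes are chosen and how p-values are derived) are not described in the abstract and are left out. The sketch mainly illustrates one property that could make gene-level observations usable: distance correlation needs the same number of observations on both sides but allows different dimensions, so a k × n1 matrix and a k × n2 matrix can be compared even when the sample counts n1 and n2 differ.

```python
import numpy as np


def _centered_distances(z):
    """Pairwise Euclidean distance matrix of the rows of z, double-centered."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a vector as n observations of dimension 1
    diff = z[:, None, :] - z[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row means and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x (n x p) and y (n x q).

    Rows are paired observations; p and q may differ.
    """
    a = _centered_distances(x)
    b = _centered_distances(y)
    if a.shape != b.shape:
        raise ValueError("x and y must have the same number of observations")
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0
    # Clip tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, n1, n2 = 20, 50, 80  # assumed sizes, for illustration only
    genes_in_dataset_1 = rng.normal(size=(k, n1))  # k genes as observations
    genes_in_dataset_2 = rng.normal(size=(k, n2))  # same k genes, other samples
    print(distance_correlation(genes_in_dataset_1, genes_in_dataset_2))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In a full pipeline, one p-value per gene would be collected and passed to `benjamini_hochberg`, and genes whose adjusted p-value falls below a chosen threshold would be reported.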

Suggested Citation

  • Qunlun Shen & Shihua Zhang, 2021. "Approximate distance correlation for selecting highly interrelated genes across datasets," PLOS Computational Biology, Public Library of Science, vol. 17(11), pages 1-18, November.
  • Handle: RePEc:plo:pcbi00:1009548
    DOI: 10.1371/journal.pcbi.1009548

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009548
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1009548&type=printable
    Download Restriction: no



