Printed from https://ideas.repec.org/a/plo/pone00/0282576.html

Strategies for data normalization and missing data imputation and consequences for potential diagnostic microRNA biomarkers in epithelial ovarian cancer

Authors listed:
  • Joanna Lopacinska-Jørgensen
  • Patrick H D Petersen
  • Douglas V N P Oliveira
  • Claus K Høgdall
  • Estrid V Høgdall

Abstract

MicroRNAs (miRNAs) are small non-coding RNA molecules that regulate gene expression and have diagnostic potential in different diseases, including epithelial ovarian carcinomas (EOC). Because only a few studies have been published on the identification of stable endogenous miRNAs in EOC, there is no consensus on which miRNAs should be used for standardization. Currently, U6-snRNA is widely adopted as a normalization control in RT-qPCR studies of miRNAs in EOC, despite reports of its variable expression across cancers. Our goal was therefore to compare different missing data and normalization approaches and to investigate their impact on the choice of stable endogenous controls and on subsequent survival analysis when miRNA expression is analyzed by RT-qPCR in the most frequent subtype of EOC: high-grade serous carcinoma (HGSC). Forty miRNAs were included based on their potential as stable endogenous controls or as biomarkers in EOC. Following RNA extraction from formalin-fixed paraffin-embedded tissues from 63 HGSC patients, RT-qPCR was performed with a custom panel covering 40 target miRNAs and 8 controls. The raw data were analyzed by applying various strategies for choosing stable endogenous controls (geNorm, BestKeeper, NormFinder, the comparative ΔCt method and RefFinder), handling missing data (single/multiple imputation), and normalization (endogenous miRNA controls, U6-snRNA or global mean). Based on our study, we propose hsa-miR-23a-3p and hsa-miR-193a-5p, but not U6-snRNA, as endogenous controls in HGSC patients. Our findings are validated in two external cohorts retrieved from the NCBI Gene Expression Omnibus database. We show that the outcome of stability analysis depends on the histological composition of the cohort, which might suggest a unique pattern of miRNA stability profiles for each subtype of EOC. Moreover, our data demonstrate the challenge of miRNA data analysis by showing how different normalization and missing data imputation strategies lead to different survival analysis outcomes.
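To make the pipeline stages named above more concrete (imputing missing Ct values, ranking candidate endogenous controls, and normalizing), here is a minimal Python sketch of three of the simpler variants. It is not the authors' code. It assumes Ct values are stored as a dict mapping a miRNA name to one Ct per sample, with `None` marking a missing Ct. Per-miRNA mean imputation is used only as the simplest possible single-imputation example and is not claimed to be what the study used. geNorm, BestKeeper, NormFinder and RefFinder are not reproduced. The function names and the toy values are hypothetical.

```python
"""Hypothetical sketch of three RT-qPCR miRNA analysis steps.

Assumptions (not taken from the paper): Ct values are stored as a dict
mapping miRNA name -> list of Ct values, one per sample, with None for a
missing Ct; missing values are filled by per-miRNA mean imputation.
"""
from itertools import combinations
from statistics import mean, stdev


def impute_mean(ct):
    """Single imputation: replace each missing Ct with that miRNA's mean observed Ct."""
    filled = {}
    for mirna, values in ct.items():
        observed = [v for v in values if v is not None]
        fill = mean(observed)  # raises StatisticsError if every Ct is missing
        filled[mirna] = [fill if v is None else v for v in values]
    return filled


def comparative_delta_ct(ct):
    """Comparative delta-Ct stability ranking.

    For every pair of candidates, take the SD across samples of their
    per-sample Ct difference; a candidate's score is the mean of its
    pairwise SDs. Lower scores indicate more stable candidates.
    """
    pair_sds = {name: [] for name in ct}
    for a, b in combinations(ct, 2):
        sd = stdev([x - y for x, y in zip(ct[a], ct[b])])
        pair_sds[a].append(sd)
        pair_sds[b].append(sd)
    scores = {name: mean(sds) for name, sds in pair_sds.items()}
    return sorted(scores.items(), key=lambda item: item[1])


def normalize(ct, targets, references=None):
    """Per-sample delta-Ct: target Ct minus the mean Ct of the references.

    references=None uses the global mean of every miRNA in `ct`.
    Averaging Ct values corresponds to a geometric mean on the linear scale.
    """
    refs = list(ct) if references is None else references
    n_samples = len(next(iter(ct.values())))
    ref_mean = [mean(ct[r][i] for r in refs) for i in range(n_samples)]
    return {t: [ct[t][i] - ref_mean[i] for i in range(n_samples)] for t in targets}


# Toy Ct values (hypothetical, not from the study); None marks a missing Ct.
ct = {
    "miR-A": [20.0, 21.0, 22.0, None],
    "miR-B": [25.0, 26.0, 27.0, 26.0],
    "miR-C": [30.0, 34.0, 30.0, 34.0],
}
filled = impute_mean(ct)
ranking = comparative_delta_ct(filled)
print(ranking)  # miR-C has the highest mean SD, i.e. it is the least stable
dct = normalize(filled, targets=["miR-C"], references=["miR-A", "miR-B"])
print(dct)
```

In this toy data, miR-A and miR-B shift together across samples, so their pairwise ΔCt has zero spread, while miR-C fluctuates independently and ranks last. Swapping `references` for `["U6-snRNA"]` or leaving it as `None` (global mean) mirrors, in spirit, the alternative normalization choices compared in the study, and changing the imputation step changes every downstream value, which is the sensitivity the abstract describes.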

Suggested Citation

  • Joanna Lopacinska-Jørgensen & Patrick H D Petersen & Douglas V N P Oliveira & Claus K Høgdall & Estrid V Høgdall, 2023. "Strategies for data normalization and missing data imputation and consequences for potential diagnostic microRNA biomarkers in epithelial ovarian cancer," PLOS ONE, Public Library of Science, vol. 18(5), pages 1-18, May.
  • Handle: RePEc:plo:pone00:0282576
    DOI: 10.1371/journal.pone.0282576

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0282576
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0282576&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0282576?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Kowarik, Alexander & Templ, Matthias, 2016. "Imputation with the R Package VIM," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 74(i07).
    2. Akira Yokoi & Juntaro Matsuzaki & Yusuke Yamamoto & Yutaka Yoneoka & Kenta Takahashi & Hanako Shimizu & Takashi Uehara & Mitsuya Ishikawa & Shun-ichi Ikeda & Takumi Sonoda & Junpei Kawauchi & Satoko T, 2018. "Integrated extracellular microRNA profiling for ovarian cancer screening," Nature Communications, Nature, vol. 9(1), pages 1-10, December.
    3. repec:plo:pone00:0207319 is not listed on IDEAS
    4. repec:plo:pone00:0064393 is not listed on IDEAS

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Henry Webel & Lili Niu & Annelaura Bach Nielsen & Marie Locard-Paulet & Matthias Mann & Lars Juhl Jensen & Simon Rasmussen, 2024. "Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    2. Bram Janssens & Matthias Bogaert & Mathijs Maton, 2023. "Predicting the next Pogačar: a data analytical approach to detect young professional cycling talents," Annals of Operations Research, Springer, vol. 325(1), pages 557-588, June.
    3. Chhetri, Netra & Ghimire, Rajiv & Wagner, Melissa & Wang, Meng, 2020. "Global citizen deliberation: Case of world-wide views on climate and energy," Energy Policy, Elsevier, vol. 147(C).
    4. Ieva Burakauskaitė & Andrius Čiginas, 2023. "An Approach to Integrating a Non-Probability Sample in the Population Census," Mathematics, MDPI, vol. 11(8), pages 1-14, April.
    5. Carlos Miguel Lemos & Ross Joseph Gore & Ivan Puga-Gonzalez & F LeRon Shults, 2019. "Dimensionality and factorial invariance of religiosity among Christians and the religiously unaffiliated: A cross-cultural analysis based on the International Social Survey Programme," PLOS ONE, Public Library of Science, vol. 14(5), pages 1-36, May.
    6. Nicholas Tierney & Dianne Cook, 2018. "Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations," Monash Econometrics and Business Statistics Working Papers 14/18, Monash University, Department of Econometrics and Business Statistics.
    7. Natsuko Satomi-Tsushita & Akihiko Shimomura & Juntaro Matsuzaki & Yusuke Yamamoto & Junpei Kawauchi & Satoko Takizawa & Yoshiaki Aoki & Hiromi Sakamoto & Ken Kato & Chikako Shimizu & Takahiro Ochiya &, 2019. "Serum microRNA-based prediction of responsiveness to eribulin in metastatic breast cancer," PLOS ONE, Public Library of Science, vol. 14(9), pages 1-12, September.
    8. Samorodnitsky, Sarah & Wendt, Chris H. & Lock, Eric F., 2024. "Bayesian simultaneous factorization and prediction using multi-omic data," Computational Statistics & Data Analysis, Elsevier, vol. 197(C).
    9. Sonja Herrmann & Christian Nagel, 2023. "Early Careers of Graduates from Private and Public Universities in Germany: A Comparison of Income Differences Regarding the First Employment," Research in Higher Education, Springer;Association for Institutional Research, vol. 64(1), pages 129-146, February.
    10. Oliver Hirsch & Charles Christian Adarkwah, 2018. "The Issue of Burnout and Work Satisfaction in Younger GPs—A Cluster Analysis Utilizing the HaMEdSi Study," IJERPH, MDPI, vol. 15(10), pages 1-10, October.
    11. Ahmad R. Alsaber & Jiazhu Pan & Adeeba Al-Hurban, 2021. "Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018)," IJERPH, MDPI, vol. 18(3), pages 1-25, February.
    12. Marc Kuhn & Viola Marquardt & Sarah Selinka, 2021. "“Is Sharing Really Caring?”: The Role of Environmental Concern and Trust Reflecting Usage Intention of “Station-Based” and “Free-Floating”—Carsharing Business Models," Sustainability, MDPI, vol. 13(13), pages 1-18, July.
    13. Elena Escolano-Pérez & Marta Bestué, 2021. "Academic Achievement in Spanish Secondary School Students: The Inter-Related Role of Executive Functions, Physical Activity and Gender," IJERPH, MDPI, vol. 18(4), pages 1-25, February.
    14. Zhixin Lun & Ravindra Khattree, 2021. "Imputation for Skewed Data: Multivariate Lomax Case," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(1), pages 86-113, May.
    15. Nengsih Titin Agustin & Bertrand Frédéric & Maumy-Bertrand Myriam & Meyer Nicolas, 2019. "Determining the number of components in PLS regression on incomplete data set," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(6), pages 1-28, December.
    16. Frank Tian-Fang Ye & Kuen-Fung Sin & Xiaozi Gao, 2021. "Subjective Well-Being among Parents of Children with Special Educational Needs in Hong Kong: Impacts of Stigmatized Identity and Discrimination under Social Unrest and COVID-19," IJERPH, MDPI, vol. 19(1), pages 1-13, December.
    17. Maciej Beręsewicz & Greta Białkowska & Krzysztof Marcinkowski & Magdalena Maślak & Piotr Opiela & Robert Pater & Katarzyna Zadroga, 2019. "Enhancing the Demand for Labour survey by including skills from online job advertisements using model-assisted calibration," Papers 1908.06731, arXiv.org.
    18. Karthik Srinivasan & Faiz Currim & Sudha Ram, 2025. "A Reduced Modeling Approach for Making Predictions with Incomplete Data Having Blockwise Missing Patterns," INFORMS Journal on Data Science, INFORMS, vol. 4(1), pages 85-99, January.
    19. Schalk Burger & Searle Silverman & Gary van Vuuren, 2018. "Deriving Correlation Matrices for Missing Financial Time-Series Data," International Journal of Economics and Finance, Canadian Center of Science and Education, vol. 10(10), pages 105-105, October.
    20. Adel Bosch & Steven F. Koch, 2021. "Individual and Household Debt: Does Imputation Choice Matter?," Working Papers 202141, University of Pretoria, Department of Economics.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0282576. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone. General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.