Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics

Author

Listed:
  • Marie Chion
  • Christine Carapito
  • Frédéric Bertrand

Abstract

Imputing missing values is common practice in label-free quantitative proteomics. Imputation aims at replacing a missing value with a user-defined one. However, the imputation itself is often not properly accounted for downstream, because imputed datasets are treated as if they had always been complete. Hence, the uncertainty due to the imputation is not adequately taken into account. We provide a rigorous multiple imputation strategy, leading to a less biased estimation of the parameters’ variability thanks to Rubin’s rules. The imputation-based variance estimator of peptide intensities is then moderated using Bayesian hierarchical models. This estimator is finally included in moderated t-test statistics to provide differential analysis results. This workflow can be used at both the peptide and protein levels in quantification datasets: an aggregation step is included to obtain protein-level results from peptide-level quantification data. Our methodology, named mi4p, was compared to the state-of-the-art limma workflow implemented in the DAPAR R package, on both simulated and real datasets. We observed a trade-off between sensitivity and specificity, while mi4p outperforms DAPAR overall in terms of F-score.

Author summary

Statistical inference methods commonly used in quantitative proteomics are based on the measurement of peptide intensities. They allow protein abundances to be deduced, provided that enough peptides per protein are available. However, they do not satisfactorily handle peptides or proteins whose intensities are missing under certain conditions, even though these are particularly interesting from a biological or medical point of view, since they may explain a difference between the groups being compared. Some state-of-the-art statistical proteomics data processing software imputes these missing values, while other software simply removes proteins with too many missing peptides.
The statistical treatment is not entirely satisfactory when imputation methods are used, notably multiple imputation techniques. Even though these statistical tools are relevant in this context, the imputed datasets are treated in subsequent analyses as if they had always been complete: the uncertainty caused by the imputation is not taken into account. These analyses generally conclude with a study of the differences in protein abundances between conditions, using either Student’s or Welch’s t-test in the most rudimentary approaches, or moderated t-testing techniques based on empirical Bayes approaches. Thus, we propose a new methodology that starts by imputing missing values at the peptide level and estimating the uncertainty associated with this imputation, and then incorporates this uncertainty into current moderated variance estimation techniques.
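
The pooling step that the abstract attributes to Rubin’s rules can be illustrated with a minimal sketch. It covers only the textbook scalar case: one parameter estimated on each of D imputed datasets, each with its own within-imputation variance. The function name `pool_rubin` and the toy numbers are assumptions made for illustration; this is not the mi4p implementation, and the moderation and t-test steps are not shown.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool one scalar parameter across D imputed datasets (Rubin's rules).

    estimates[d]        : estimate obtained on the d-th imputed dataset
    within_variances[d] : variance of that estimate on the d-th dataset
    Returns (pooled estimate, within variance, between variance, total variance).
    """
    d = len(estimates)
    if d < 2 or d != len(within_variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)        # pooled point estimate
    w = mean(within_variances)     # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (D - 1 denominator)
    t = w + (1 + 1 / d) * b        # total variance of the pooled estimate
    return q_bar, w, b, t


# Hypothetical toy values: three imputations of one mean log-intensity.
q_bar, w, b, t = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q_bar, w, b, round(t, 4))
```

The between-imputation term is what a single-imputation analysis ignores: treating one imputed dataset as complete would report only the within-imputation variance and so understate the uncertainty.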

Suggested Citation

  • Marie Chion & Christine Carapito & Frédéric Bertrand, 2022. "Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics," PLOS Computational Biology, Public Library of Science, vol. 18(8), pages 1-26, August.
  • Handle: RePEc:plo:pcbi00:1010420
    DOI: 10.1371/journal.pcbi.1010420

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010420
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1010420&type=printable
    Download Restriction: no

