Printed from https://ideas.repec.org/a/plo/pcbi00/1012258.html

Estimating error rates for single molecule protein sequencing experiments

Authors

Listed:
  • Matthew Beauregard Smith
  • Kent VanderVelden
  • Thomas Blom
  • Heather D Stout
  • James H Mapes
  • Tucker M Folsom
  • Christopher Martin
  • Angela M Bardo
  • Edward M Marcotte

Abstract

The practical application of new single molecule protein sequencing (SMPS) technologies requires accurate estimates of their associated sequencing error rates. Here, we describe the development and application of two distinct parameter estimation methods for analyzing SMPS reads produced by fluorosequencing. The first, a Hidden Markov Model (HMM)-based approach, extends whatprot, in which we previously used HMMs for SMPS peptide-read matching. This extension offers a principled approach for estimating key parameters of fluorosequencing experiments, including missed amino acid cleavages, dye loss, and peptide detachment. Specifically, we adapted the Baum-Welch algorithm, a standard expectation-maximization technique for estimating the transition probabilities of an HMM, modifying it to estimate a small number of parameter values directly rather than estimating every transition probability independently. We demonstrate a high degree of accuracy on simulated data; on experimental datasets, however, we observed that the model needed to be augmented with an additional error type, N-terminal blocking. This addition, in combination with data pre-processing, yields reasonable parameterizations of experimental datasets that agree with controlled experimental perturbations. We also developed a second, independent implementation that uses a hybrid of DIRECT and Powell's method to minimize the root mean squared error (RMSE) between simulations and the real dataset. We compare these methods on both simulated and real data, finding that our Baum-Welch-based approach outperforms the DIRECT and Powell's method hybrid by most, but not all, criteria.
Although some discrepancies between the results exist, we also find that both approaches provide similar error rate estimates from experimental single molecule fluorosequencing datasets.

Author summary: Diverse new technologies are being developed for single-molecule protein sequencing, capable of identifying and quantifying mixtures of proteins at the level of individual molecules. High-throughput studies of proteins at such high sensitivity face many intrinsic biochemical challenges arising from the proteins' heterogeneous chemistries, sizes, and abundances. Beyond these challenges, the technologies themselves involve complex multi-step analytical processes. Thus, in developing and optimizing these technologies, it is important to consider the accuracy of each step and to have reliable approaches for estimating these accuracies. We focus on one particular single-molecule sequencing technology known as fluorosequencing. We report and validate two methods for simultaneously determining the error rates of each step of the fluorosequencing process. These new error estimation techniques will help researchers better interpret the effects of changes to the chemistry and sample preparation used in fluorosequencing so that these steps can be improved. Further, more accurate determination of error rates will aid in the creation of better tools for interpreting these data.
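The abstract's central idea is a Baum-Welch variant whose M-step re-estimates a few shared error parameters instead of every transition probability. The sketch below illustrates that idea on a deliberately simplified toy model. It is not whatprot's model or code. It tracks only dye loss (`l`) and peptide detachment (`d`), and omits Edman cycles, missed cleavages, and N-terminal blocking. It also assumes that every read starts attached with two dyes and that per-cycle intensities are Gaussian with known `MU` and `SIGMA`. All names (`transition`, `simulate`, `baum_welch_tied`, and so on) are hypothetical. Because every transition probability is built from (`l`, `d`), the expected transition counts from the E-step can be pooled into closed-form updates for those two values.

```python
import math
import random

# Hypothetical toy model, far simpler than whatprot's: a peptide starts with
# N_DYES fluorophores; each cycle it detaches with probability d, otherwise each
# remaining dye is independently destroyed with probability l.
N_DYES = 2
DETACHED = N_DYES + 1            # absorbing state; hidden states 0..N_DYES are attached
STATES = range(N_DYES + 2)
MU, SIGMA = 1.0, 0.15            # assumed known per-dye intensity and read noise


def transition(i, j, l, d):
    """P(j at cycle t+1 | i at cycle t); every entry is built from just (l, d)."""
    if i == DETACHED:
        return 1.0 if j == DETACHED else 0.0
    if j == DETACHED:
        return d
    if j > i:
        return 0.0  # dyes are never gained
    return (1 - d) * math.comb(i, j) * (1 - l) ** j * l ** (i - j)


def emission(state, x):
    """Gaussian likelihood of intensity x given the number of visible dyes."""
    count = 0 if state == DETACHED else state
    z = (x - MU * count) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2 * math.pi))


def simulate(n_reads, n_cycles, l, d, rng):
    """Generate noisy per-cycle intensity traces from the toy model."""
    reads = []
    for _ in range(n_reads):
        state, read = N_DYES, []
        for _ in range(n_cycles):
            count = 0 if state == DETACHED else state
            read.append(rng.gauss(MU * count, SIGMA))
            if state != DETACHED:
                if rng.random() < d:
                    state = DETACHED
                else:
                    state = sum(rng.random() >= l for _ in range(state))
        reads.append(read)
    return reads


def baum_welch_tied(reads, l, d, n_iter=25):
    """EM in which the M-step re-estimates only (l, d), not each A[i][j]."""
    S = len(STATES)
    emis = [[[emission(s, x) for s in STATES] for x in read] for read in reads]
    for _ in range(n_iter):
        A = [[transition(i, j, l, d) for j in STATES] for i in STATES]
        lost = exposed = detach = attached_steps = 0.0
        for B in emis:
            T = len(B)
            # Scaled forward pass; every read starts attached with all dyes.
            alpha, c = [[0.0] * S for _ in range(T)], [0.0] * T
            alpha[0][N_DYES] = B[0][N_DYES]
            for t in range(T):
                if t > 0:
                    for j in STATES:
                        alpha[t][j] = B[t][j] * sum(alpha[t - 1][i] * A[i][j] for i in STATES)
                c[t] = sum(alpha[t])
                alpha[t] = [a / c[t] for a in alpha[t]]
            # Backward pass using the same scaling factors.
            beta = [[1.0] * S for _ in range(T)]
            for t in range(T - 2, -1, -1):
                for i in STATES:
                    beta[t][i] = sum(A[i][j] * B[t + 1][j] * beta[t + 1][j] for j in STATES) / c[t + 1]
            # E-step: expected transition counts, pooled by the parameter they share.
            for t in range(T - 1):
                for i in range(DETACHED):  # attached states only
                    for j in STATES:
                        xi = alpha[t][i] * A[i][j] * B[t + 1][j] * beta[t + 1][j] / c[t + 1]
                        attached_steps += xi
                        if j == DETACHED:
                            detach += xi
                        else:
                            lost += xi * (i - j)
                            exposed += xi * i
        # M-step: closed-form maximizers of the expected log-likelihood in (l, d).
        l, d = lost / exposed, detach / attached_steps
    return l, d
```

For example, reads drawn with `simulate(1000, 10, 0.05, 0.05, random.Random(0))` and fitted with `baum_welch_tied(reads, 0.2, 0.2)` should give estimates near the true values. Tying the parameters keeps the number of free values at two regardless of how many hidden states the model has, which is the property the abstract highlights.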

Suggested Citation

  • Matthew Beauregard Smith & Kent VanderVelden & Thomas Blom & Heather D Stout & James H Mapes & Tucker M Folsom & Christopher Martin & Angela M Bardo & Edward M Marcotte, 2024. "Estimating error rates for single molecule protein sequencing experiments," PLOS Computational Biology, Public Library of Science, vol. 20(7), pages 1-24, July.
  • Handle: RePEc:plo:pcbi00:1012258
    DOI: 10.1371/journal.pcbi.1012258

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012258
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012258&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1012258?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    More about this item


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1012258. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help add them by using this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.