
Large-scale paired chain BCR analysis reveals antibody clonal family inference bias and enhances resolution with machine learning

Author

Listed:
  • Hao Wang
  • Kaixuan Wang
  • Qihang Xu
  • Linru Cai
  • Chuanxiang Huang
  • Linlin Chen
  • Yunliang Zang
  • Xihao Hu
  • Jian Zhang

Abstract

A fundamental question in immunology is how the adaptive immune system encodes antigen specificity while maintaining repertoire diversity. B cell receptor (BCR) or antibody clonal families, defined as groups of B cells descending from a common ancestor, are key to deciphering this encoding. Although paired heavy and light chains jointly determine antibody specificity, most repertoire analyses have historically relied on heavy-chain-only data because bulk BCR sequencing loses native pairing information. This reliance introduces potential biases in computational clonal cluster inference, which may complicate efforts to resolve disease-associated immune signatures. Here, we leverage large-scale paired-chain BCR sequencing data to demonstrate that heavy-chain-based clustering may misrepresent true clonal architecture, and we identify two major artifacts: chain-mixed clusters, in which similar heavy chains are paired with distinct light chains, and naive-like pseudo-clonal clusters, which are detected in an individual's naive B cell repertoire and exhibit highly similar heavy and light chains without reflecting true clonal expansion. To address these limitations, we present fastBCR-p, an optimized framework that integrates light-chain-informed subclustering with public-sequence-aware refinement to improve clonal family inference. By resolving both technical artifacts and biological convergence, fastBCR-p improves the chain concordance and overall clustering quality of clonal inference in real-world datasets. This enables more accurate tracking of immune dynamics in health and disease and facilitates the identification of clinically relevant antibody lineages.

Author summary

Our immune system protects us by producing a vast and diverse collection of antibodies, each designed to recognize a specific target. These antibodies are made by B cells, which expand and evolve in groups known as clonal families. Accurately identifying these clonal families from sequencing data is essential for understanding immune responses during infection, vaccination, and disease. Most existing computational methods infer B-cell clonal families using information from only one part of the antibody, the heavy chain. This limitation largely reflects the fact that traditional sequencing technologies often lose information about how heavy and light chains are naturally paired. However, both chains are required to define antibody specificity. Using large-scale datasets that preserve native heavy–light chain pairing, we show that heavy-chain-only approaches can introduce systematic errors. These include incorrectly grouping together unrelated B cells and falsely identifying naive B cells as expanded clones. To address these limitations, we developed fastBCR-p, which integrates light-chain information and accounts for shared ("public") antibody sequences. By correcting both technical artifacts and biological convergence, fastBCR-p enables more accurate clonal family inference, improves the analysis of immune repertoire dynamics, and facilitates the identification of clinically relevant antibody lineages.
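The idea of a chain-mixed cluster can be illustrated with a minimal Python sketch. It is not the fastBCR-p implementation: the `Cell` record, the use of light-chain V and J gene calls as the splitting key, and the concordance definition (the share of cells in a heavy-chain cluster that carry its most common light chain) are all assumptions made for illustration, and the example data are invented.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Cell(NamedTuple):
    # Hypothetical record for one paired-chain B cell (illustrative fields).
    cell_id: str
    heavy_cluster: str  # cluster label from a heavy-chain-only method
    light_v: str        # light-chain V gene call
    light_j: str        # light-chain J gene call


def subcluster_by_light_chain(cells):
    """Split each heavy-chain cluster by its (light V, light J) gene usage."""
    groups = defaultdict(list)
    for c in cells:
        groups[(c.heavy_cluster, c.light_v, c.light_j)].append(c.cell_id)
    return dict(groups)


def light_chain_concordance(cells):
    """Per heavy-chain cluster, the fraction of cells sharing the most common light chain."""
    per_cluster = defaultdict(Counter)
    for c in cells:
        per_cluster[c.heavy_cluster][(c.light_v, c.light_j)] += 1
    return {
        cluster: counts.most_common(1)[0][1] / sum(counts.values())
        for cluster, counts in per_cluster.items()
    }


# Invented toy data: cluster H1 mixes a kappa and a lambda light chain.
cells = [
    Cell("c1", "H1", "IGKV1-39", "IGKJ1"),
    Cell("c2", "H1", "IGKV1-39", "IGKJ1"),
    Cell("c3", "H1", "IGLV2-14", "IGLJ2"),
    Cell("c4", "H2", "IGKV3-20", "IGKJ2"),
]

subclusters = subcluster_by_light_chain(cells)
concordance = light_chain_concordance(cells)
# A cluster with concordance below 1.0 contains more than one light chain.
chain_mixed = [cluster for cluster, score in concordance.items() if score < 1.0]
```

In this toy input, the heavy-chain cluster `H1` is flagged as chain-mixed and is split into two subclusters, while `H2` is left intact. A real pipeline would likely compare light-chain junction sequences as well as gene calls; that step is omitted here.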

Suggested Citation

  • Hao Wang & Kaixuan Wang & Qihang Xu & Linru Cai & Chuanxiang Huang & Linlin Chen & Yunliang Zang & Xihao Hu & Jian Zhang, 2026. "Large-scale paired chain BCR analysis reveals antibody clonal family inference bias and enhances resolution with machine learning," PLOS Computational Biology, Public Library of Science, vol. 22(3), pages 1-22, March.
  • Handle: RePEc:plo:pcbi00:1014077
    DOI: 10.1371/journal.pcbi.1014077

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014077
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1014077&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1014077?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Eric Engelbrecht & Oscar L. Rodriguez & William Lees & Zach Vanwinkle & Kaitlyn Shields & Steven Schultze & William S. Gibson & David R. Smith & Uddalok Jana & Swati Saha & Ayelet Peres & Gur Yaari & , 2025. "Germline polymorphisms in the immunoglobulin kappa and lambda loci underpinning antibody light chain repertoire variability," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    2. Nima Nouri & Steven H Kleinstein, 2020. "Somatic hypermutation analysis for improved identification of B cell clonal families from next-generation sequencing data," PLOS Computational Biology, Public Library of Science, vol. 16(6), pages 1-22, June.
    3. Aleksandr Kovaltsuk & Matthew I J Raybould & Wing Ki Wong & Claire Marks & Sebastian Kelm & James Snowden & Johannes Trück & Charlotte M Deane, 2020. "Structural diversity of B-cell receptor repertoires along the B-cell differentiation axis in humans and mice," PLOS Computational Biology, Public Library of Science, vol. 16(2), pages 1-20, February.
    4. Eduardo Gomez-Bañuelos & Yikai Yu & Jessica Li & Kevin S. Cashman & Merlin Paz & Maria Isabel Trejo-Zambrano & Regina Bugrovsky & Youliang Wang & Asiya Seema Chida & Cheryl A. Sherman-Baust & Dylan P., 2023. "Affinity maturation generates pathogenic antibodies with dual reactivity to DNase1L3 and dsDNA in systemic lupus erythematosus," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    5. Cosimo Lupo & Natanael Spisak & Aleksandra M Walczak & Thierry Mora, 2022. "Learning the statistics and landscape of somatic mutation-induced insertions and deletions in antibodies," PLOS Computational Biology, Public Library of Science, vol. 18(6), pages 1-23, June.
    6. Gyunghee Jo & Seiya Yamayoshi & Krystal M. Ma & Olivia Swanson & Jonathan L. Torres & James A. Ferguson & Monica L. Fernández-Quintero & Jiachen Huang & Jeffrey Copps & Alesandra J. Rodriguez & Jon M., 2025. "Structural basis of broad protection against influenza virus by human antibodies targeting the neuraminidase active site via a recurring motif in CDR H3," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    7. Yubin Liu & Ziyi Wang & Xinyu Zhuang & Shengnan Zhang & Zhicheng Chen & Yan Zou & Jie Sheng & Tianpeng Li & Wanbo Tai & Jinfang Yu & Yanqun Wang & Zhaoyong Zhang & Yunfeng Chen & Liangqin Tong & Xi Yu, 2023. "Inactivated vaccine-elicited potent antibodies can broadly neutralize SARS-CoV-2 circulating variants," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    8. Joan Capella-Pujol & Marlon Gast & Laura Radić & Ian Zon & Ana Chumbe & Sylvie Koekkoek & Wouter Olijhoek & Janke Schinkel & Marit J. Gils & Rogier W. Sanders & Kwinten Sliepen, 2023. "Signatures of VH1-69-derived hepatitis C virus neutralizing antibody precursors defined by binding to envelope glycoproteins," Nature Communications, Nature, vol. 14(1), pages 1-16, December.
    9. Yi-Chun Hsiao & Heidi Ackerly Wallweber & Robert G. Alberstein & Zhonghua Lin & Changchun Du & Ainhoa Etxeberria & Theint Aung & Yonglei Shang & Dhaya Seshasayee & Franziska Seeger & Andrew M. Watkins, 2024. "Rapid affinity optimization of an anti-TREM2 clinical lead antibody by cross-lineage immune repertoire mining," Nature Communications, Nature, vol. 15(1), pages 1-15, December.
    10. Andrew P. Cazier & Jaewoo Son & Sreenivas Yellayi & Lizmarie S. Chavez & Caden Young & Olivia M. Irvin & Hannah Abraham & Saachi Dalvi & John Blazeck, 2025. "Generating combinatorial diversity via engineered V(D)J-like recombination in Saccharomyces cerevisiae," Nature Communications, Nature, vol. 16(1), pages 1-16, December.
    11. Oscar L. Rodriguez & Yana Safonova & Catherine A. Silver & Kaitlyn Shields & William S. Gibson & Justin T. Kos & David Tieri & Hanzhong Ke & Katherine J. L. Jackson & Scott D. Boyd & Melissa L. Smith, 2023. "Genetic variation in the immunoglobulin heavy chain locus shapes the human antibody repertoire," Nature Communications, Nature, vol. 14(1), pages 1-18, December.
    12. Chengzi I. Kaku & Tyler N. Starr & Panpan Zhou & Haley L. Dugan & Paul Khalifé & Ge Song & Elizabeth R. Champney & Daniel W. Mielcarz & James C. Geoghegan & Dennis R. Burton & Raiees Andrabi & Jesse D, 2023. "Evolution of antibody immunity following Omicron BA.1 breakthrough infection," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    13. Mathieu Claireaux & Tom G. Caniels & Marlon Gast & Julianna Han & Denise Guerra & Gius Kerster & Barbera D. C. Schaik & Aldo Jongejan & Angela I. Schriek & Marloes Grobben & Philip J. M. Brouwer & Kar, 2022. "A public antibody class recognizes an S2 epitope exposed on open conformations of SARS-CoV-2 spike," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    14. Yansheng Li & Lin Cheng & Lixu Jiang & Zhuohuan Li & Jing Rao & Tong Wu & Fangyan Zhang & Baocai Xie & Yu He & Lianrong Wang & Zheng Zhang & Shi Chen, 2025. "A multivalent mRNA vaccine elicits robust immune responses and confers protection in a murine model of monkeypox virus infection," Nature Communications, Nature, vol. 16(1), pages 1-22, December.
    15. Christoph Kreer & Cosimo Lupo & Meryem S. Ercanoglu & Lutz Gieselmann & Natanael Spisak & Jan Grossbach & Maike Schlotz & Philipp Schommers & Henning Gruell & Leona Dold & Andreas Beyer & Armita Nourm, 2023. "Probabilities of developing HIV-1 bNAb sequence features in uninfected and chronically infected individuals," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
