IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1010745.html

Optimized phylogenetic clustering of HIV-1 sequence data for public health applications

Author

Listed:
  • Connor Chato
  • Yi Feng
  • Yuhua Ruan
  • Hui Xing
  • Joshua Herbeck
  • Marcia Kalish
  • Art F Y Poon

Abstract

Clusters of genetically similar infections suggest rapid transmission and may indicate priorities for public health action or reveal underlying epidemiological processes. However, clusters often require user-defined thresholds and are sensitive to non-epidemiological factors, such as non-random sampling. Consequently, the ideal threshold for public health applications varies substantially across settings. Here, we present a method that selects optimal thresholds for phylogenetic (subset tree) clustering based on population. We evaluated this method on HIV-1 pol datasets (n = 14,221 sequences) from four sites in the USA (Tennessee, Washington), Canada (Northern Alberta) and China (Beijing). Clusters were defined by tips descending from an ancestral node (with a minimum bootstrap support of 95%) through a series of branches, each with a length below a given threshold. Next, we used pplacer to graft new cases to the fixed tree by maximum likelihood. We evaluated the effect of varying branch-length thresholds on cluster growth as a count outcome by fitting two Poisson regression models: a null model that predicts growth from cluster size, and an alternative model that includes mean collection date as an additional covariate. The alternative model was favoured by AIC across most thresholds, with optimal (greatest difference in AIC) thresholds ranging from 0.007 to 0.013 across sites. The range of optimal thresholds was more variable when re-sampling 80% of the data by location (IQR 0.008–0.016, n = 100 replicates). Our results, which are based on prospective phylogenetic cluster growth, suggest that effective thresholds for public health applications vary more than those typically used in clustering studies.

Author summary: A genetic cluster of virus infections is a group of DNA or RNA sequences that are much more similar to each other than they are to other infections from the same population of hosts. These clusters can reveal where virus transmission has been occurring the most rapidly, which can provide useful information for a public health response. Genetic clusters are often built by reconstructing a phylogeny (a tree-based model of how the sequences are related by common ancestors) and locating distinct parts of the tree with short branches. However, there are no objective, general-purpose criteria for deciding which parts of a tree constitute clusters, and there is an unlimited number of ways to partition a tree into clusters. In this study, we develop a computational method to determine the best clustering criteria based on our ability to predict where the next infections will occur. We apply this method to anonymized HIV-1 sequence data sets from Canada, the United States, and China, to characterize the sensitivity of clustering criteria to different risk populations and sampling contexts. Our results indicate that the clustering criteria typically used for phylogenetic studies of HIV-1 are not optimal for public health applications.
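
The model comparison described in the abstract can be illustrated with a minimal sketch. It fits the two Poisson regression models (growth from cluster size alone, and growth from cluster size plus a recency covariate) at a single threshold and reports the difference in AIC. Everything concrete here is an assumption made for illustration: the ten clusters are invented, the helper names (`solve`, `log_lik`, `fit_poisson`, `aic`) are hypothetical, "years since mean collection date" stands in for the mean collection date covariate, and the hand-written Newton solver is a stand-in for whatever fitting routine the authors used.

```python
import math

def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def log_lik(beta, X, y):
    """Poisson log-likelihood with a log link."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
    return total

def fit_poisson(X, y, iters=100, tol=1e-10):
    """Maximise the log-likelihood by Newton's method with step halving.

    Assumes the first column of X is the intercept and that mean(y) > 0.
    """
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # intercept-only start
    ll = log_lik(beta, X, y)
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
            for j in range(p):
                grad[j] += xi[j] * (yi - mu)
                for k in range(p):
                    hess[j][k] += mu * xi[j] * xi[k]
        step = solve(hess, grad)
        scale = 1.0
        while True:  # halve the step until the likelihood stops decreasing
            trial = [b + scale * s for b, s in zip(beta, step)]
            trial_ll = log_lik(trial, X, y)
            if trial_ll >= ll or scale < 1e-8:
                break
            scale /= 2
        beta, ll = trial, trial_ll
        if max(abs(scale * s) for s in step) < tol:
            break
    return beta

def aic(beta, X, y):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * len(beta) - 2 * log_lik(beta, X, y)

# Invented clusters: (size, years since mean collection date, new cases)
clusters = [
    (2, 0.5, 2), (3, 1.0, 2), (5, 1.0, 4), (10, 2.0, 5), (7, 0.5, 6),
    (8, 6.0, 1), (4, 5.0, 0), (6, 7.0, 0), (3, 8.0, 0), (5, 6.5, 1),
]
y = [g for _, _, g in clusters]
X_null = [[1.0, float(s)] for s, _, _ in clusters]    # growth ~ size
X_alt = [[1.0, float(s), a] for s, a, _ in clusters]  # growth ~ size + recency

beta_null = fit_poisson(X_null, y)
beta_alt = fit_poisson(X_alt, y)
delta_aic = aic(beta_null, X_null, y) - aic(beta_alt, X_alt, y)
print(f"AIC(null) - AIC(alternative) = {delta_aic:.2f}")
```

Under this reading of the abstract, the same comparison would be repeated with clusters rebuilt at each candidate branch-length threshold, and the threshold giving the largest AIC difference would be taken as optimal for that site.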

Suggested Citation

  • Connor Chato & Yi Feng & Yuhua Ruan & Hui Xing & Joshua Herbeck & Marcia Kalish & Art F Y Poon, 2022. "Optimized phylogenetic clustering of HIV-1 sequence data for public health applications," PLOS Computational Biology, Public Library of Science, vol. 18(11), pages 1-24, November.
  • Handle: RePEc:plo:pcbi00:1010745
    DOI: 10.1371/journal.pcbi.1010745

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010745
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1010745&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1010745?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Abolfazl Mollalo & Alireza Mohammadi & Sara Mavaddati & Behzad Kiani, 2021. "Spatial Analysis of COVID-19 Vaccination: A Scoping Review," IJERPH, MDPI, vol. 18(22), pages 1-14, November.
    2. You, Liangzhi & Wood, Stanley, 2006. "An entropy approach to spatial disaggregation of agricultural production," Agricultural Systems, Elsevier, vol. 90(1-3), pages 329-347, October.
    3. Kelvyn Jones & David Manley & Ron Johnston & Dewi Owen, 2018. "Modelling residential segregation as unevenness and clustering: A multilevel modelling approach incorporating spatial dependence and tackling the MAUP," Environment and Planning B, vol. 45(6), pages 1122-1141, November.
    4. Christoph Lambio & Tillman Schmitz & Richard Elson & Jeffrey Butler & Alexandra Roth & Silke Feller & Nicolai Savaskan & Tobia Lakes, 2023. "Exploring the Spatial Relative Risk of COVID-19 in Berlin-Neukölln," IJERPH, MDPI, vol. 20(10), pages 1-22, May.
    5. Nathaniel Bell & Nadine Schuurman, 2010. "GIS and Injury Prevention and Control: History, Challenges, and Opportunities," IJERPH, MDPI, vol. 7(3), pages 1-16, March.
    6. B. d'Hombres & L. Rocco & M. Suhrcke & M. McKee, 2010. "Does social capital determine health? Evidence from eight transition countries," Health Economics, John Wiley & Sons, Ltd., vol. 19(1), pages 56-74, January.
    7. repec:plo:pone00:0192892 is not listed on IDEAS
    8. Jared Hewko & Karen E Smoyer-Tomic & M John Hodgson, 2002. "Measuring Neighbourhood Spatial Accessibility to Urban Amenities: Does Aggregation Error Matter?," Environment and Planning A, vol. 34(7), pages 1185-1206, July.
    9. Sun, Yeran & Wang, Shaohua & Zhang, Xucai & Chan, Ting On & Wu, Wenjie, 2021. "Estimating local-scale domestic electricity energy consumption using demographic, nighttime light imagery and Twitter data," Energy, Elsevier, vol. 226(C).
    10. Ortega, Emilio & López, Elena & Monzón, Andrés, 2014. "Territorial cohesion impacts of high-speed rail under different zoning systems," Journal of Transport Geography, Elsevier, vol. 34(C), pages 16-24.
    11. Juan C Duque & Henry Laniado & Adriano Polo, 2018. "S-maup: Statistical test to measure the sensitivity to the modifiable areal unit problem," PLOS ONE, Public Library of Science, vol. 13(11), pages 1-25, November.
    12. James A. Cheshire & Paul A. Longley & Keiji Yano & Tomoki Nakaya, 2014. "Japanese surname regions," Papers in Regional Science, Wiley Blackwell, vol. 93(3), pages 539-555, August.
    13. Jamie Spinney & Pavlos Kanaroglou & Darren Scott, 2011. "Exploring Spatial Dynamics with Land Price Indexes," Urban Studies, Urban Studies Journal Limited, vol. 48(4), pages 719-735, March.
    14. Briggs, David & Abellan, Juan J. & Fecht, Daniela, 2008. "Environmental inequity in England: Small area associations between socio-economic status and environmental pollution," Social Science & Medicine, Elsevier, vol. 67(10), pages 1612-1629, November.
