
Use of mixed-type data clustering algorithm for characterizing temporal and spatial distribution of biosecurity border detections of terrestrial non-indigenous species

Authors

  • Barbara Kachigunda
  • Kerrie Mengersen
  • Devindri I Perera
  • Grey T Coupland
  • Johann van der Merwe
  • Simon McKirdy

Abstract

Appropriate inspection protocols and mitigation strategies are critical components of effective biosecurity measures, enabling sound management decisions. Statistical models for analyzing biosecurity surveillance data are integral to this decision-making process. Our research analyzes border interception biosecurity data collected from a Class A Nature Reserve, Barrow Island, in Western Australia, together with the associated covariates describing both spatial and temporal interception patterns. A clustering approach was adopted, using a generalization of the popular k-means algorithm that is appropriate for mixed-type data. The analysis compared the efficiency of clustering using only the numerical data with clustering that also included the covariates. Based on numerical data only, three clusters gave an acceptable fit and provided information about the underlying data characteristics. Incorporating covariates into the model suggested four distinct clusters dominated by physical location and type of detection. Clustering increases the interpretability of complex models and is useful in data mining for highlighting patterns that describe underlying processes in biosecurity and other research areas. Availability of more relevant data would greatly improve the model. Based on the outcomes of our research, we recommend broader use of cluster models for biosecurity data, with testing of these models on more datasets to validate the model choice and identify important explanatory variables.
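
As an illustration of what a k-means generalization for mixed-type data can look like, the following is a minimal sketch in the style of the k-prototypes algorithm. The abstract does not name the exact algorithm or software the authors used, so this is an assumption, not their implementation. The records, the field meanings (month, count, location, detection type), the weight `gamma`, and the function names are all hypothetical.

```python
from collections import Counter


def mixed_distance(num_a, cat_a, num_b, cat_b, gamma):
    """Squared Euclidean distance on numeric fields plus
    gamma times the number of mismatching categorical fields."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches


def k_prototypes(nums, cats, init_idx, gamma=1.0, max_iter=100):
    """Cluster records that have numeric and categorical parts.
    Prototypes start at the records listed in init_idx."""
    protos = [(list(nums[i]), list(cats[i])) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest prototype under the mixed distance.
        new_labels = [
            min(range(len(protos)),
                key=lambda k: mixed_distance(n, c, protos[k][0], protos[k][1], gamma))
            for n, c in zip(nums, cats)
        ]
        if new_labels == labels:
            break  # assignments are stable, so stop
        labels = new_labels
        # Update step: mean for numeric fields, mode for categorical fields.
        for k in range(len(protos)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:
                continue  # keep the old prototype for an empty cluster
            num_mean = [sum(nums[i][j] for i in members) / len(members)
                        for j in range(len(nums[0]))]
            cat_mode = [Counter(cats[i][j] for i in members).most_common(1)[0][0]
                        for j in range(len(cats[0]))]
            protos[k] = (num_mean, cat_mode)
    return labels, protos


# Hypothetical records: (month, detection count) and (location, detection type).
nums = [(1.0, 2.0), (1.5, 1.0), (8.0, 9.0), (8.5, 10.0), (1.2, 1.5), (9.0, 8.0)]
cats = [("airport", "insect"), ("airport", "insect"), ("port", "seed"),
        ("port", "seed"), ("airport", "insect"), ("port", "plant")]

labels, protos = k_prototypes(nums, cats, init_idx=[0, 2], gamma=1.0)
print(labels)
print(protos)
```

Setting `gamma` to zero reduces the sketch to ordinary k-means on the numeric fields, which loosely mirrors the abstract's comparison of clustering with and without covariates. In practice the numeric fields would usually be standardized first so that `gamma` has a comparable scale.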

Suggested Citation

  • Barbara Kachigunda & Kerrie Mengersen & Devindri I Perera & Grey T Coupland & Johann van der Merwe & Simon McKirdy, 2022. "Use of mixed-type data clustering algorithm for characterizing temporal and spatial distribution of biosecurity border detections of terrestrial non-indigenous species," PLOS ONE, Public Library of Science, vol. 17(8), pages 1-22, August.
  • Handle: RePEc:plo:pone00:0272413
    DOI: 10.1371/journal.pone.0272413

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0272413
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0272413&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Hinde, John & Demetrio, Clarice G. B., 1998. "Overdispersion: Models and estimation," Computational Statistics & Data Analysis, Elsevier, vol. 27(2), pages 151-170, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Maria Iannario, 2015. "Detecting latent components in ordinal data with overdispersion by means of a mixture distribution," Quality & Quantity: International Journal of Methodology, Springer, vol. 49(3), pages 977-987, May.
    2. Tony Vangeneugden & Geert Molenberghs & Geert Verbeke & Clarice G.B. Demétrio, 2011. "Marginal correlation from an extended random-effects model for repeated and overdispersed counts," Journal of Applied Statistics, Taylor & Francis Journals, vol. 38(2), pages 215-232, September.
    3. Jeonghwan Kim & Woojoo Lee, 2019. "On testing the hidden heterogeneity in negative binomial regression models," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 82(4), pages 457-470, May.
    4. Molenberghs, Geert & Verbeke, Geert & Iddi, Samuel & Demétrio, Clarice G.B., 2012. "A combined beta and normal random-effects model for repeated, overdispersed binary and binomial data," Journal of Multivariate Analysis, Elsevier, vol. 111(C), pages 94-109.
    5. Steven Abrams & Marc Aerts & Geert Molenberghs & Niel Hens, 2017. "Parametric overdispersed frailty models for current status data," Biometrics, The International Biometric Society, vol. 73(4), pages 1388-1400, December.
    6. Yusuf OB & Bello T & Gureje O, 2017. "Zero Inflated Poisson and Zero Inflated Negative Binomial Models with Application to Number of Falls in the Elderly," Biostatistics and Biometrics Open Access Journal, Juniper Publishers Inc., vol. 1(4), pages 69-75, May.
    7. Christophe Croux & Irène Gijbels & Ilaria Prosdocimi, 2012. "Robust Estimation of Mean and Dispersion Functions in Extended Generalized Additive Models," Biometrics, The International Biometric Society, vol. 68(1), pages 31-44, March.
    8. Aeberhard, William H. & Cantoni, Eva & Heritier, Stephane, 2017. "Saddlepoint tests for accurate and robust inference on overdispersed count data," Computational Statistics & Data Analysis, Elsevier, vol. 107(C), pages 162-175.
    9. Sami Mestiri & Abdeljelil Farhat, 2021. "Using Non-parametric Count Model for Credit Scoring," Journal of Quantitative Economics, Springer;The Indian Econometric Society (TIES), vol. 19(1), pages 39-49, March.
    10. Feria-Domínguez, José Manuel & Jiménez-Rodríguez, Enrique & Sholarin, Ola, 2015. "Tackling the over-dispersion of operational risk: Implications on capital adequacy requirements," The North American Journal of Economics and Finance, Elsevier, vol. 31(C), pages 206-221.
    11. I. Gijbels & I. Prosdocimi & G. Claeskens, 2010. "Nonparametric estimation of mean and dispersion functions in extended generalized linear models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 19(3), pages 580-608, November.
    12. I. Gijbels & I. Prosdocimi, 2011. "Smooth estimation of mean and dispersion function in extended generalized additive models with application to Italian induced abortion data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 38(11), pages 2391-2411, December.
    13. Iddi, Samuel & Molenberghs, Geert, 2012. "A combined overdispersed and marginalized multilevel model," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1944-1951.
    14. Cory Anderson & Shuai Zhou & Guangqing Chi, 2023. "Population-Wide Vaccination Hesitancy among the Amish: A County-Level Study of COVID-19 Vaccine Adoption and Implications for Public Health Policy and Practice," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 42(4), pages 1-24, August.
    15. Iraj Kazemi & Fatemeh Hassanzadeh, 2021. "Marginalized random-effects models for clustered binomial data through innovative link functions," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 105(2), pages 197-228, June.
    16. Croux, C. & Gijbels, I. & Prosdocimi, I., 2010. "Robust Estimation of Mean and Dispersion Functions in Extended Generalized Additive Models," Other publications TiSEM a188c2bc-8a96-44c9-b1e6-0, Tilburg University, School of Economics and Management.
    17. Borges, Patrick & Rodrigues, Josemar & Balakrishnan, Narayanaswamy & Bazán, Jorge, 2014. "A COM–Poisson type generalization of the binomial distribution and its properties and applications," Statistics & Probability Letters, Elsevier, vol. 87(C), pages 158-166.
    18. Katiane S. Conceição & Marinho G. Andrade & Victor Hugo Lachos & Nalini Ravishanker, 2024. "Bayesian Inference for Zero-Modified Power Series Regression Models," Mathematics, MDPI, vol. 13(1), pages 1-30, December.
    19. Nasim Vahabi & Anoshirvan Kazemnejad & Somnath Datta, 2018. "A Marginalized Overdispersed Location Scale Model for Clustered Ordinal Data," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 80(1), pages 103-134, December.
    20. Jussiane Nader Gonçalves & Wagner Barreto-Souza, 2020. "Flexible regression models for counts with high-inflation of zeros," METRON, Springer;Sapienza Università di Roma, vol. 78(1), pages 71-95, April.
