Printed from https://ideas.repec.org/a/plo/pone00/0272413.html

Use of mixed-type data clustering algorithm for characterizing temporal and spatial distribution of biosecurity border detections of terrestrial non-indigenous species

Author

Listed:
  • Barbara Kachigunda
  • Kerrie Mengersen
  • Devindri I Perera
  • Grey T Coupland
  • Johann van der Merwe
  • Simon McKirdy

Abstract

Appropriate inspection protocols and mitigation strategies are a critical component of effective biosecurity measures, enabling implementation of sound management decisions. Statistical models to analyze biosecurity surveillance data are integral to this decision-making process. Our research focuses on analyzing border interception biosecurity data collected from a Class A Nature Reserve, Barrow Island, in Western Australia and the associated covariates describing both spatial and temporal interception patterns. A clustering analysis approach was adopted using a generalization of the popular k-means algorithm appropriate for mixed-type data. The analysis compared the efficiency of clustering using only the numerical data with that of clustering that also included the covariates. Based on numerical data only, three clusters gave an acceptable fit and provided information about the underlying data characteristics. Incorporation of covariates into the model suggested four distinct clusters dominated by physical location and type of detection. Clustering increases the interpretability of complex models and is useful in data mining for highlighting patterns that describe underlying processes in biosecurity and other research areas. Availability of more relevant data would greatly improve the model. Based on outcomes from our research, we recommend broader use of cluster models for biosecurity data, with testing of these models on more datasets to validate the model choice and identify important explanatory variables.
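
The abstract does not name the specific algorithm or software used. One widely known generalization of k-means to mixed-type data is the k-prototypes approach, which combines squared Euclidean distance on numeric variables with a weighted count of mismatches on categorical variables. The sketch below illustrates that idea only; it is not the authors' implementation. The function name `k_prototypes`, the weight `gamma`, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def k_prototypes(X_num, X_cat, k, gamma=1.0, n_iter=20, seed=0):
    """Minimal k-prototypes sketch (illustrative; not the authors' code).

    X_num: (n, p) float array of numeric variables.
    X_cat: (n, q) integer-coded categorical variables.
    gamma: assumed weight of a categorical mismatch relative to
           squared Euclidean distance on the numeric part.
    """
    rng = np.random.default_rng(seed)
    n = X_num.shape[0]
    # Initialise prototypes from k distinct, randomly chosen observations.
    idx = rng.choice(n, size=k, replace=False)
    cent_num = X_num[idx].astype(float).copy()
    cent_cat = X_cat[idx].copy()
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Numeric part: squared Euclidean distance to each prototype.
        d_num = ((X_num[:, None, :] - cent_num[None, :, :]) ** 2).sum(axis=2)
        # Categorical part: number of mismatching categories.
        d_cat = (X_cat[:, None, :] != cent_cat[None, :, :]).sum(axis=2)
        new_labels = np.argmin(d_num + gamma * d_cat, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so stop early
        labels = new_labels
        for j in range(k):
            members = labels == j
            if not members.any():
                continue  # keep the previous prototype for an empty cluster
            # Numeric prototype: per-variable mean of the cluster members.
            cent_num[j] = X_num[members].mean(axis=0)
            # Categorical prototype: per-variable mode of the cluster members.
            for c in range(X_cat.shape[1]):
                vals, counts = np.unique(X_cat[members, c], return_counts=True)
                cent_cat[j, c] = vals[np.argmax(counts)]
    return labels, cent_num, cent_cat

# Toy data (made up): one numeric variable and one coded categorical variable.
X_num = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
X_cat = np.array([[0], [0], [0], [1], [1], [1]])
labels, cent_num, cent_cat = k_prototypes(X_num, X_cat, k=2)
print(labels)
```

Setting `gamma=0` reduces the distance to plain k-means on the numeric variables, which mirrors the abstract's comparison between clustering on numerical data only and clustering with covariates included. In practice the numeric variables would usually be standardized first, and `gamma` and the number of clusters would be chosen with a validation criterion.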

Suggested Citation

  • Barbara Kachigunda & Kerrie Mengersen & Devindri I Perera & Grey T Coupland & Johann van der Merwe & Simon McKirdy, 2022. "Use of mixed-type data clustering algorithm for characterizing temporal and spatial distribution of biosecurity border detections of terrestrial non-indigenous species," PLOS ONE, Public Library of Science, vol. 17(8), pages 1-22, August.
  • Handle: RePEc:plo:pone00:0272413
    DOI: 10.1371/journal.pone.0272413

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0272413
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0272413&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Maria Iannario, 2015. "Detecting latent components in ordinal data with overdispersion by means of a mixture distribution," Quality & Quantity: International Journal of Methodology, Springer, vol. 49(3), pages 977-987, May.
    2. Molenberghs, Geert & Verbeke, Geert & Iddi, Samuel & Demétrio, Clarice G.B., 2012. "A combined beta and normal random-effects model for repeated, overdispersed binary and binomial data," Journal of Multivariate Analysis, Elsevier, vol. 111(C), pages 94-109.
    3. Steven Abrams & Marc Aerts & Geert Molenberghs & Niel Hens, 2017. "Parametric overdispersed frailty models for current status data," Biometrics, The International Biometric Society, vol. 73(4), pages 1388-1400, December.
    4. Aeberhard, William H. & Cantoni, Eva & Heritier, Stephane, 2017. "Saddlepoint tests for accurate and robust inference on overdispersed count data," Computational Statistics & Data Analysis, Elsevier, vol. 107(C), pages 162-175.
    5. Sami Mestiri & Abdeljelil Farhat, 2021. "Using Non-parametric Count Model for Credit Scoring," Journal of Quantitative Economics, Springer;The Indian Econometric Society (TIES), vol. 19(1), pages 39-49, March.
    6. I. Gijbels & I. Prosdocimi & G. Claeskens, 2010. "Nonparametric estimation of mean and dispersion functions in extended generalized linear models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 19(3), pages 580-608, November.
    7. I. Gijbels & I. Prosdocimi, 2011. "Smooth estimation of mean and dispersion function in extended generalized additive models with application to Italian induced abortion data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 38(11), pages 2391-2411, December.
    8. Iddi, Samuel & Molenberghs, Geert, 2012. "A combined overdispersed and marginalized multilevel model," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1944-1951.
    9. Cory Anderson & Shuai Zhou & Guangqing Chi, 2023. "Population-Wide Vaccination Hesitancy among the Amish: A County-Level Study of COVID-19 Vaccine Adoption and Implications for Public Health Policy and Practice," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 42(4), pages 1-24, August.
    10. Iraj Kazemi & Fatemeh Hassanzadeh, 2021. "Marginalized random-effects models for clustered binomial data through innovative link functions," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 105(2), pages 197-228, June.
    11. Croux, C. & Gijbels, I. & Prosdocimi, I., 2010. "Robust Estimation of Mean and Dispersion Functions in Extended Generalized Additive Models," Other publications TiSEM a188c2bc-8a96-44c9-b1e6-0, Tilburg University, School of Economics and Management.
    12. Borges, Patrick & Rodrigues, Josemar & Balakrishnan, Narayanaswamy & Bazán, Jorge, 2014. "A COM–Poisson type generalization of the binomial distribution and its properties and applications," Statistics & Probability Letters, Elsevier, vol. 87(C), pages 158-166.
    13. Katiane S. Conceição & Marinho G. Andrade & Victor Hugo Lachos & Nalini Ravishanker, 2024. "Bayesian Inference for Zero-Modified Power Series Regression Models," Mathematics, MDPI, vol. 13(1), pages 1-30, December.
    14. Nasim Vahabi & Anoshirvan Kazemnejad & Somnath Datta, 2018. "A Marginalized Overdispersed Location Scale Model for Clustered Ordinal Data," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 80(1), pages 103-134, December.
    15. Jussiane Nader Gonçalves & Wagner Barreto-Souza, 2020. "Flexible regression models for counts with high-inflation of zeros," METRON, Springer;Sapienza Università di Roma, vol. 78(1), pages 71-95, April.
    16. Aregay, Mehreteab & Shkedy, Ziv & Molenberghs, Geert, 2013. "A hierarchical Bayesian approach for the analysis of longitudinal count data with overdispersion: A simulation study," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 233-245.
    17. Oludare Ariyo & Emmanuel Lesaffre & Geert Verbeke & Adrian Quintero, 2022. "Bayesian Model Selection for Longitudinal Count Data," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 84(2), pages 516-547, November.
    18. William H. Greene & David A. Hensher, 2008. "Modeling Ordered Choices: A Primer and Recent Developments," Working Papers 08-26, New York University, Leonard N. Stern School of Business, Department of Economics.
    19. Lee, Dae-Jin & Durbán, María, 2008. "Smooth-car mixed models for spatial count data," DES - Working Papers. Statistics and Econometrics. WS ws085820, Universidad Carlos III de Madrid. Departamento de Estadística.
    20. Rahma Abid & Célestin C. Kokonendji & Afif Masmoudi, 2021. "On Poisson-exponential-Tweedie models for ultra-overdispersed count data," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 105(1), pages 1-23, March.
