Printed from https://ideas.repec.org/a/plo/pone00/0143465.html

Machine Learning Based Classification of Microsatellite Variation: An Effective Approach for Phylogeographic Characterization of Olive Populations

Authors listed:
  • Bahareh Torkzaban
  • Amir Hossein Kayvanjoo
  • Arman Ardalan
  • Soraya Mousavi
  • Roberto Mariotti
  • Luciana Baldoni
  • Esmaeil Ebrahimie
  • Mansour Ebrahimi
  • Mehdi Hosseini-Mazinani

Abstract

Finding efficient analytical techniques is increasingly becoming a bottleneck in making effective use of large biological datasets. Machine learning offers a novel and powerful tool for advancing classification and modeling solutions in molecular biology. However, these methods have been used less frequently with empirical population genetics data. In this study, we developed a new combined approach to data analysis that applies machine learning algorithms to microsatellite marker data from our previous studies of olive populations. Herein, 267 olive accessions of various origins, including 21 reference cultivars, 132 local ecotypes, and 37 wild olive specimens from the Iranian plateau, together with 77 of the most represented Mediterranean varieties, were investigated using a finely selected panel of 11 microsatellite markers. We organized the data into two experiments, ‘4-targeted’ and ‘16-targeted’. A strategy of assaying different machine-based analyses (i.e. data cleaning, feature selection, and machine learning classification) was devised to identify the most informative loci and the most diagnostic alleles for representing the population and geographic origin of each olive accession. These analyses revealed the microsatellite markers with the highest differentiating capacity and demonstrated the efficiency of our method in clustering olive accessions according to their regions of origin. A distinguishing highlight of this study was the discovery of the best combination of markers for differentiating populations via machine learning models, an approach that can be exploited to distinguish among other biological populations.
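The feature-selection and classification steps described in the abstract can be illustrated with a minimal Python sketch. It assumes information gain as the criterion for ranking loci and a 1-nearest-neighbour classifier with an allele-sharing distance; these are illustrative choices, not necessarily the algorithms used in the paper. The locus names (`LOC_A`, `LOC_B`), allele sizes, and region labels below are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict

# Assumption: each accession is a dict mapping a (hypothetical) locus name to a
# diploid genotype, i.e. a pair of allele sizes in bp. Ranking loci by
# information gain and classifying with 1-NN are illustrative choices only.

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(samples, labels, locus):
    """Reduction in label entropy after grouping samples by genotype at one locus."""
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        # Sort the alleles so (200, 210) and (210, 200) count as one genotype.
        groups[tuple(sorted(sample[locus]))].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_loci(samples, labels):
    """Return (locus, gain) pairs sorted from most to least informative."""
    scores = [(locus, information_gain(samples, labels, locus)) for locus in samples[0]]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

def allele_sharing_distance(a, b, loci):
    """Fraction of alleles not shared between two accessions, averaged over loci."""
    total = 0.0
    for locus in loci:
        # Multiset intersection counts how many of the two alleles are shared.
        shared = sum((Counter(a[locus]) & Counter(b[locus])).values())
        total += 1 - shared / 2
    return total / len(loci)

def predict_1nn(train, train_labels, query, loci):
    """Assign the region of the nearest training accession on the chosen loci."""
    best = min(range(len(train)),
               key=lambda i: allele_sharing_distance(train[i], query, loci))
    return train_labels[best]

# Hypothetical toy data: LOC_A separates the two regions, LOC_B does not.
samples = [
    {"LOC_A": (200, 200), "LOC_B": (150, 152)},
    {"LOC_A": (200, 202), "LOC_B": (152, 150)},
    {"LOC_A": (210, 210), "LOC_B": (150, 152)},
    {"LOC_A": (210, 212), "LOC_B": (152, 150)},
]
regions = ["Iran", "Iran", "Mediterranean", "Mediterranean"]

ranking = rank_loci(samples, regions)
top_loci = [ranking[0][0]]
query = {"LOC_A": (202, 200), "LOC_B": (150, 150)}
print(ranking)
print(predict_1nn(samples, regions, query, top_loci))
```

On this toy data `LOC_A` receives the full 1 bit of information gain and `LOC_B` receives none, so classification uses `LOC_A` alone and the query is assigned to "Iran". With real data, the ranking would typically be validated with cross-validation before choosing how many loci to keep.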

Suggested Citation

  • Bahareh Torkzaban & Amir Hossein Kayvanjoo & Arman Ardalan & Soraya Mousavi & Roberto Mariotti & Luciana Baldoni & Esmaeil Ebrahimie & Mansour Ebrahimi & Mehdi Hosseini-Mazinani, 2015. "Machine Learning Based Classification of Microsatellite Variation: An Effective Approach for Phylogeographic Characterization of Olive Populations," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-17, November.
  • Handle: RePEc:plo:pone00:0143465
    DOI: 10.1371/journal.pone.0143465

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0143465
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0143465&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0143465?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Joseph Schlecht & Matthew E Kaplan & Kobus Barnard & Tatiana Karafet & Michael F Hammer & Nirav C Merchant, 2008. "Machine-Learning Approaches for Classifying Haplogroup from Y Chromosome STR Data," PLOS Computational Biology, Public Library of Science, vol. 4(6), pages 1-12, June.
    2. Adi L Tarca & Vincent J Carey & Xue-wen Chen & Roberto Romero & Sorin Drăghici, 2007. "Machine Learning and Its Applications to Biology," PLOS Computational Biology, Public Library of Science, vol. 3(6), pages 1-11, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Stephen Gang Wu & Yuxuan Wang & Wu Jiang & Tolutola Oyetunde & Ruilian Yao & Xuehong Zhang & Kazuyuki Shimizu & Yinjie J Tang & Forrest Sheng Bao, 2016. "Rapid Prediction of Bacterial Heterotrophic Fluxomics Using Machine Learning and Constraint Programming," PLOS Computational Biology, Public Library of Science, vol. 12(4), pages 1-22, April.
    2. Früh, Linus & Kampen, Helge & Kerkow, Antje & Schaub, Günter A. & Walther, Doreen & Wieland, Ralf, 2018. "Modelling the potential distribution of an invasive mosquito species: comparative evaluation of four machine learning methods and their combinations," Ecological Modelling, Elsevier, vol. 388(C), pages 136-144.
    3. Asa Ben-Hur & Cheng Soon Ong & Sören Sonnenburg & Bernhard Schölkopf & Gunnar Rätsch, 2008. "Support Vector Machines and Kernels for Computational Biology," PLOS Computational Biology, Public Library of Science, vol. 4(10), pages 1-10, October.
    4. Wang, Jia & Hu, Jun & Shen, Shifei & Zhuang, Jun & Ni, Shunjiang, 2020. "Crime risk analysis through big data algorithm with urban metrics," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 545(C).
    5. Lior Shamir & John D Delaney & Nikita Orlov & D Mark Eckley & Ilya G Goldberg, 2010. "Pattern Recognition Software and Techniques for Biological Image Analysis," PLOS Computational Biology, Public Library of Science, vol. 6(11), pages 1-10, November.
    6. Joana Rosado Coelho & João André Carriço & Daniel Knight & Jose-Luis Martínez & Ian Morrissey & Marco Rinaldo Oggioni & Ana Teresa Freitas, 2013. "The Use of Machine Learning Methodologies to Analyse Antibiotic and Biocide Susceptibility in Staphylococcus aureus," PLOS ONE, Public Library of Science, vol. 8(2), pages 1-10, February.
    7. Shun Adachi, 2017. "Rigid geometry solves “curse of dimensionality” effects in clustering methods: An application to omics data," PLOS ONE, Public Library of Science, vol. 12(6), pages 1-20, June.
    8. Parag Parashar & Chun Han Chen & Chandni Akbar & Sze Ming Fu & Tejender S Rawat & Sparsh Pratik & Rajat Butola & Shih Han Chen & Albert S Lin, 2019. "Analytics-statistics mixed training and its fitness to semisupervised manufacturing," PLOS ONE, Public Library of Science, vol. 14(8), pages 1-18, August.
    9. Ribeiro, Haroldo V. & Lopes, Diego D. & Pessa, Arthur A.B. & Martins, Alvaro F. & da Cunha, Bruno R. & Gonçalves, Sebastián & Lenzi, Ervin K. & Hanley, Quentin S. & Perc, Matjaž, 2023. "Deep learning criminal networks," Chaos, Solitons & Fractals, Elsevier, vol. 172(C).
    10. Dolores Wolfram & Ravi Starzl & Hubert Hackl & Derek Barclay & Theresa Hautz & Bettina Zelger & Gerald Brandacher & W P Andrew Lee & Nadine Eberhart & Yoram Vodovotz & Johann Pratschke & Gerhard Piere, 2014. "Insights from Computational Modeling in Inflammation and Acute Rejection in Limb Transplantation," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-11, June.
    11. Lyaqini, S. & Nachaoui, M. & Hadri, A., 2022. "An efficient primal-dual method for solving non-smooth machine learning problem," Chaos, Solitons & Fractals, Elsevier, vol. 155(C).
    12. Malka N. Halgamuge, 2020. "Supervised Machine Learning Algorithms for Bioelectromagnetics: Prediction Models and Feature Selection Techniques Using Data from Weak Radiofrequency Radiation Effect on Human and Animals Cells," IJERPH, MDPI, vol. 17(12), pages 1-27, June.
    13. Dennis Pischel & Jörn H Buchbinder & Kai Sundmacher & Inna N Lavrik & Robert J Flassig, 2018. "A guide to automated apoptosis detection: How to make sense of imaging flow cytometry data," PLOS ONE, Public Library of Science, vol. 13(5), pages 1-17, May.
    14. Willcock, Simon & Martínez-López, Javier & Hooftman, Danny A.P. & Bagstad, Kenneth J. & Balbi, Stefano & Marzo, Alessia & Prato, Carlo & Sciandrello, Saverio & Signorello, Giovanni & Voigt, Brian & , 2018. "Machine learning for ecosystem services," Ecosystem Services, Elsevier, vol. 33(PB), pages 165-174.
    15. Takaya Saito & Marc Rehmsmeier, 2015. "The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets," PLOS ONE, Public Library of Science, vol. 10(3), pages 1-21, March.
    16. Guido Zampieri & Supreeta Vijayakumar & Elisabeth Yaneske & Claudio Angione, 2019. "Machine and deep learning meet genome-scale metabolic modeling," PLOS Computational Biology, Public Library of Science, vol. 15(7), pages 1-24, July.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0143465. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the provider, plosone. General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.