Machine Learning Methods as a Tool for Predicting Risk of Illness Applying Next‐Generation Sequencing Data

Authors:
  • Patrick Murigu Kamau Njage
  • Clementine Henri
  • Pimlapas Leekitcharoenphon
  • Michel‐Yves Mistou
  • Rene S. Hendriksen
  • Tine Hald

Abstract

Next‐generation sequencing (NGS) data present an untapped potential to improve microbial risk assessment (MRA) through increased specificity and redefinition of the hazard. Most MRA models do not account for differences in survivability and virulence among strains. The potential of machine learning algorithms for predicting the risk/health burden at the population level from large and complex NGS data was explored with Listeria monocytogenes as a case study. The Listeria data consisted of a percentage similarity matrix from the genome assemblies of 38 strains of clinical origin and 207 strains of food origin. The Basic Local Alignment Search Tool (BLAST) was used to align the assemblies against a database of 136 virulence and stress resistance genes. The outcome variable was frequency of illness, the percentage of reported cases associated with each strain. These frequency data were discretized into seven ordinal outcome categories and used for supervised machine learning and model selection from five ensemble algorithms. There was no significant difference in accuracy between the models, and a support vector machine with a linear kernel was chosen for further inference (accuracy of 89% [95% CI: 68%, 97%]). The virulence genes FAM002725, FAM002728, FAM002729, InlF, InlJ, InlK, llsY, llsD, llsX, llsH, llsB, lmo2026, and FAM003296 were important predictors of higher frequency of illness. InlF was uniquely truncated in the sequence type 121 strains. The most important risk predictor genes occurred at the highest prevalence among strains from ready‐to‐eat, dairy, and composite foods. We foresee that the findings and approaches described offer the potential for rethinking the current approaches in MRA.
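The abstract reports an accuracy of 89% with a 95% CI of [68%, 97%] but does not say how the interval was computed. The article cites Kuhn's caret package, whose `confusionMatrix` reports the accuracy interval from R's `binom.test`, i.e. an exact (Clopper-Pearson) binomial interval. Assuming that was the source, the sketch below reproduces that kind of interval in plain Python. The counts used (17 correct out of 19 test strains) are hypothetical illustration values, not figures taken from the paper.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target: float, tol: float = 1e-12) -> float:
    """Bisection on [0, 1] for p with g(p) = target, where g is non-decreasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion k/n."""
    alpha = 1 - conf
    # Lower bound: p at which P(X >= k | p) = alpha/2 (0 when no successes).
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: p at which P(X <= k | p) = alpha/2,
    # equivalently P(X > k | p) = 1 - alpha/2 (1 when all successes).
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# Hypothetical test-set result: 17 of 19 held-out strains classified correctly.
k, n = 17, 19
lo, hi = clopper_pearson(k, n)
print(f"accuracy {k / n:.0%}, 95% CI [{lo:.0%}, {hi:.0%}]")
```

The Clopper-Pearson interval is conservative (its coverage is at least the nominal level), so on a small test set it is typically wider than a Wilson or normal-approximation interval computed from the same counts.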

Suggested Citation

  • Patrick Murigu Kamau Njage & Clementine Henri & Pimlapas Leekitcharoenphon & Michel‐Yves Mistou & Rene S. Hendriksen & Tine Hald, 2019. "Machine Learning Methods as a Tool for Predicting Risk of Illness Applying Next‐Generation Sequencing Data," Risk Analysis, John Wiley & Sons, vol. 39(6), pages 1397-1413, June.
  • Handle: RePEc:wly:riskan:v:39:y:2019:i:6:p:1397-1413
    DOI: 10.1111/risa.13239

    Download full text from publisher

    File URL: https://doi.org/10.1111/risa.13239
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dolf Talman & Zaifu Yang, 2012. "On a Parameterized System of Nonlinear Equations with Economic Applications," Journal of Optimization Theory and Applications, Springer, vol. 154(2), pages 644-671, August.
    2. Zhiqiang Zheng & Balaji Padmanabhan & Steven O. Kimbrough, 2003. "On the Existence and Significance of Data Preprocessing Biases in Web-Usage Mining," INFORMS Journal on Computing, INFORMS, vol. 15(2), pages 148-170, May.
    3. Herings, P.J.J. & Talman, A.J.J. & Yang, Z.F., 1999. "Variational Inequality Problems With a Continuum of Solutions : Existence and Computation," Other publications TiSEM 73e2f01b-ad4d-4447-95ba-a, Tilburg University, School of Economics and Management.
    4. Carlos R. Handy & Daniel Vrinceanu & Carl B. Marth & Harold A. Brooks, 2015. "Pointwise Reconstruction of Wave Functions from Their Moments through Weighted Polynomial Expansions: An Alternative Global-Local Quantization Procedure," Mathematics, MDPI, vol. 3(4), pages 1-24, November.
    5. Allen C. Goodman & Miron Stano, 2000. "HMOs and Health Externalities: A Local Public Good Perspective," Public Finance Review, vol. 28(3), pages 247-269, May.
    6. Bode, Sven & Michaelowa, Axel, 2003. "Avoiding perverse effects of baseline and investment additionality determination in the case of renewable energy projects," Energy Policy, Elsevier, vol. 31(6), pages 505-517, May.
    7. Ala, Guido & Fasshauer, Gregory E. & Francomano, Elisa & Ganci, Salvatore & McCourt, Michael J., 2017. "An augmented MFS approach for brain activity reconstruction," Mathematics and Computers in Simulation (MATCOM), Elsevier, vol. 141(C), pages 3-15.
    8. Bettina Campedelli & Andrea Guerrina & Giulia Romano & Chiara Leardini, 2014. "La performance della rete ospedaliera pubblica della regione Veneto. L'impatto delle variabili ambientali e operative sull'efficienza," MECOSAN, FrancoAngeli Editore, vol. 2014(92), pages 119-142.
    9. Haider A. Khan, 2004. "General Conclusions: From Crisis to a Global Political Economy of Freedom," Palgrave Macmillan Books, in: Global Markets and Financial Crises in Asia, chapter 9, pages 193-211, Palgrave Macmillan.
    10. Penn Loh & Zoë Ackerman & Joceline Fidalgo & Rebecca Tumposky, 2022. "Co-Education/Co-Research Partnership: A Critical Approach to Co-Learning between Dudley Street Neighborhood Initiative and Tufts University," Social Sciences, MDPI, vol. 11(2), pages 1-17, February.
    11. Broekhuis, Manda & Vos, Janita F.J., 2003. "Improving organizational sustainability using a quality perspective," Research Report 03A43, University of Groningen, Research Institute SOM (Systems, Organisations and Management).
    12. O'Brien, Raymond & Patacchini, Eleonora, 2003. "Testing the exogeneity assumption in panel data models with "non classical" disturbances," Discussion Paper Series In Economics And Econometrics 0302, Economics Division, School of Social Sciences, University of Southampton.
    13. van der Laan, G. & Talman, A.J.J. & Yang, Z.F., 2002. "Perfection and Stability of Stationary Points with Applications in Noncooperative Games," Discussion Paper 2002-108, Tilburg University, Center for Economic Research.
    14. Edcarlos D. Silva & J. C. Albuquerque & T. R. Cavalcante, 2021. "Fourth-order nonlocal type elliptic problems with indefinite nonlinearities," Partial Differential Equations and Applications, Springer, vol. 2(2), pages 1-22, April.
    15. YongSeog Kim & W. Nick Street & Gary J. Russell & Filippo Menczer, 2005. "Customer Targeting: A Neural Network Approach Guided by Genetic Algorithms," Management Science, INFORMS, vol. 51(2), pages 264-276, February.
    16. Montijano, J.I. & Rández, L. & Van Daele, M. & Calvo, M., 2020. "On the numerical stability of the exponentially fitted methods for first order IVPs," Applied Mathematics and Computation, Elsevier, vol. 379(C).
    17. Yanling Li & Zita Oravecz & Shuai Zhou & Yosef Bodovski & Ian J. Barnett & Guangqing Chi & Yuan Zhou & Naomi P. Friedman & Scott I. Vrieze & Sy-Miin Chow, 2022. "Bayesian Forecasting with a Regime-Switching Zero-Inflated Multilevel Poisson Regression Model: An Application to Adolescent Alcohol Use with Spatial Covariates," Psychometrika, Springer;The Psychometric Society, vol. 87(2), pages 376-402, June.
    18. Jensen, Nathan M. & Li, Quan & Rahman, Aminur, 2007. "Heard melodies are sweet, but those unheard are sweeter : understanding corruption using cross-national firm-level surveys," Policy Research Working Paper Series 4413, The World Bank.
    19. Oscar J. Cacho & Robyn L. Hean & Russell M. Wise, 2003. "Carbon‐accounting methods and reforestation incentives," Australian Journal of Agricultural and Resource Economics, Australian Agricultural and Resource Economics Society, vol. 47(2), pages 153-179, June.
    20. Walter M. Cadette, 1999. "Financing Long-Term Care: Options for Policy," Economics Working Paper Archive wp_283, Levy Economics Institute.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.