Printed from https://ideas.repec.org/a/eee/intfor/v35y2019i2p741-755.html

Predictive analysis and modelling football results using machine learning approach for English Premier League

Author

Listed:
  • Baboota, Rahul
  • Kaur, Harleen

Abstract

Artificial intelligence has made it possible to build predictive systems with unprecedented accuracy, and machine learning is now used in virtually all areas in one way or another because of its effectiveness. One area where predictive systems have gained a lot of popularity is the prediction of football match results. This paper presents our work on building a generalized predictive model for the results of the English Premier League. Using feature engineering and exploratory data analysis, we create a feature set for determining the most important factors for predicting the results of a football match, and consequently create a highly accurate predictive system using machine learning. We demonstrate the strong dependence of our models’ performance on important features. Our best model, which uses gradient boosting, achieved a ranked probability score (RPS) of 0.2156 for game weeks 6 to 38 of the English Premier League, aggregated over two seasons (2014–2015 and 2015–2016), whereas the betting organizations that we consider (Bet365 and Pinnacle Sports) obtained an RPS of 0.2012 for the same period. Since a lower RPS represents higher predictive accuracy, our model was not able to outperform the bookmakers’ predictions, despite obtaining promising results.
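
The RPS comparison in the abstract can be made concrete with a small sketch. It follows the usual definition of the ranked probability score for ordered outcomes, as discussed by Constantinou & Fenton (2012) in the references listed on this page: the mean squared difference between the cumulative forecast and cumulative observed distributions. The function names, the outcome ordering (home win, draw, away win) and the sample probabilities are assumptions made for illustration; this is not the authors' code.

```python
from itertools import accumulate
from typing import Sequence


def ranked_probability_score(probs: Sequence[float], outcome: int) -> float:
    """RPS for one match.

    probs   -- forecast probabilities over ORDERED outcomes; the ordering
               (home win, draw, away win) is an assumption of this sketch
    outcome -- index of the outcome that actually occurred
    """
    r = len(probs)
    if r < 2:
        raise ValueError("need at least two ordered outcomes")
    if not 0 <= outcome < r:
        raise ValueError("outcome index out of range")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")

    # One-hot vector for the observed result.
    observed = [1.0 if i == outcome else 0.0 for i in range(r)]

    # Cumulative distributions; the last entry is 1 for both, so it is skipped.
    cum_forecast = list(accumulate(probs))[:-1]
    cum_observed = list(accumulate(observed))[:-1]

    squared_gaps = ((f - o) ** 2 for f, o in zip(cum_forecast, cum_observed))
    return sum(squared_gaps) / (r - 1)


def mean_rps(forecasts: Sequence[Sequence[float]], outcomes: Sequence[int]) -> float:
    """Average RPS over a set of matches (lower is better)."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    scores = [ranked_probability_score(p, o) for p, o in zip(forecasts, outcomes)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical forecast: 50% home win, 30% draw, 20% away win.
    forecast = [0.5, 0.3, 0.2]
    print(ranked_probability_score(forecast, 0))  # home win occurred
    print(ranked_probability_score(forecast, 2))  # away win occurred
```

With this hypothetical forecast the score is about 0.145 if the home side wins and about 0.445 if the away side wins, so the metric penalizes probability mass placed far from the observed outcome. Averaging such per-match scores over a season gives a figure on the same scale as the 0.2156 and 0.2012 values quoted above.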

Suggested Citation

  • Baboota, Rahul & Kaur, Harleen, 2019. "Predictive analysis and modelling football results using machine learning approach for English Premier League," International Journal of Forecasting, Elsevier, vol. 35(2), pages 741-755.
  • Handle: RePEc:eee:intfor:v:35:y:2019:i:2:p:741-755
    DOI: 10.1016/j.ijforecast.2018.01.003

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0169207018300116
    Download Restriction: Full text for ScienceDirect subscribers only



    References listed on IDEAS

    1. Lessmann, Stefan & Sung, Ming-Chien & Johnson, Johnnie E.V., 2010. "Alternative methods of predicting competitive events: An application in horserace betting markets," International Journal of Forecasting, Elsevier, vol. 26(3), pages 518-536, July.
    2. Štrumbelj, Erik & Vračar, Petar, 2012. "Simulating a basketball match with a homogeneous Markov model and forecasting the outcome," International Journal of Forecasting, Elsevier, vol. 28(2), pages 532-542.
    3. Boshnakov, Georgi & Kharrat, Tarak & McHale, Ian G., 2017. "A bivariate Weibull count model for forecasting association football scores," International Journal of Forecasting, Elsevier, vol. 33(2), pages 458-466.
    4. Asif, Muhammad & McHale, Ian G., 2016. "In-play forecasting of win probability in One-Day International cricket: A dynamic logistic regression model," International Journal of Forecasting, Elsevier, vol. 32(1), pages 34-43.
    5. Constantinou, Anthony Costa & Fenton, Norman Elliott, 2012. "Solving the Problem of Inadequate Scoring Rules for Assessing Probabilistic Football Forecast Models," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 8(1), pages 1-14, March.
    6. Siem Jan Koopman & Rutger Lit, 2015. "A dynamic bivariate Poisson model for analysing and forecasting match results in the English Premier League," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 178(1), pages 167-186, January.
    7. Harleen Kaur & Ewa Lechman & Adam Marszk (ed.), 2017. "Catalyzing Development through ICT Adoption," Springer Books, Springer, number 978-3-319-56523-1, November.
    8. Constantinou, Anthony Costa & Fenton, Norman Elliott, 2013. "Determining the level of ability of football teams by dynamic ratings based on the relative discrepancies in scores between adversaries," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 9(1), pages 37-50, March.
    9. Goddard, John, 2005. "Regression models for forecasting goals and match results in association football," International Journal of Forecasting, Elsevier, vol. 21(2), pages 331-340.
    10. Hvattum, Lars Magnus & Arntzen, Halvard, 2010. "Using ELO ratings for match result prediction in association football," International Journal of Forecasting, Elsevier, vol. 26(3), pages 460-470, July.

    Citations

    Cited by:

    1. Christoph Schlembach & Sascha L. Schmidt & Dominik Schreyer & Linus Wunderlich, 2020. "Forecasting the Olympic medal distribution during a pandemic: a socio-economic machine learning model," Papers 2012.04378, arXiv.org, revised Jun 2021.
    2. Daniel Goller & Michael C. Knaus & Michael Lechner & Gabriel Okasa, 2021. "Predicting match outcomes in football by an Ordered Forest estimator," Chapters, in: Ruud H. Koning & Stefan Kesenne (ed.), A Modern Guide to Sports Economics, chapter 22, pages 335-355, Edward Elgar Publishing.
    3. Green, Lawrence & Sung, Ming-Chien & Ma, Tiejun & Johnson, Johnnie E. V., 2019. "To what extent can new web-based technology improve forecasts? Assessing the economic value of information derived from Virtual Globes and its rate of diffusion in a financial market," European Journal of Operational Research, Elsevier, vol. 278(1), pages 226-239.
    4. Galli, L. & Galvan, G. & Levato, T. & Liti, C. & Piccialli, V. & Sciandrone, M., 2021. "Football: Discovering elapsing-time bias in the science of success," Chaos, Solitons & Fractals, Elsevier, vol. 152(C).
    5. Wheatcroft, Edward, 2021. "Evaluating probabilistic forecasts of football matches: the case against the ranked probability score," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 17(4), pages 273-287, December.
    6. Chunyang Huang & Shaoliang Zhang, 2023. "Explainable artificial intelligence model for identifying Market Value in Professional Soccer Players," Papers 2311.04599, arXiv.org, revised Nov 2023.
    7. Maurizio Carpita & Enrico Ciavolino & Paola Pasca, 2021. "Players’ Role-Based Performance Composite Indicators of Soccer Teams: A Statistical Perspective," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 156(2), pages 815-830, August.
    8. Harleen Kaur & Shafqat Ul Ahsaan & Bhavya Alankar & Victor Chang, 2021. "A Proposed Sentiment Analysis Deep Learning Algorithm for Analyzing COVID-19 Tweets," Information Systems Frontiers, Springer, vol. 23(6), pages 1417-1429, December.
    9. Wheatcroft, Edward, 2021. "Evaluating probabilistic forecasts of football matches: the case against the ranked probability score," LSE Research Online Documents on Economics 111494, London School of Economics and Political Science, LSE Library.
    10. Federico Fioravanti & Fernando Delbianco & Fernando Tohmé, 2023. "The relative importance of ability, luck and motivation in team sports: a Bayesian model of performance in the English Rugby Premiership," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(3), pages 715-731, September.
    11. Hassanniakalager, Arman & Sermpinis, Georgios & Stasinakis, Charalampos & Verousis, Thanos, 2020. "A conditional fuzzy inference approach in forecasting," European Journal of Operational Research, Elsevier, vol. 283(1), pages 196-216.
    12. Schlembach, Christoph & Schmidt, Sascha L. & Schreyer, Dominik & Wunderlich, Linus, 2022. "Forecasting the Olympic medal distribution – A socioeconomic machine learning model," Technological Forecasting and Social Change, Elsevier, vol. 175(C).
    13. da Costa, Igor Barbosa & Marinho, Leandro Balby & Pires, Carlos Eduardo Santos, 2022. "Forecasting football results and exploiting betting markets: The case of “both teams to score”," International Journal of Forecasting, Elsevier, vol. 38(3), pages 895-909.
    14. Butler, David & Butler, Robert & Eakins, John, 2021. "Expert performance and crowd wisdom: Evidence from English Premier League predictions," European Journal of Operational Research, Elsevier, vol. 288(1), pages 170-182.
    15. Koopman, Siem Jan & Lit, Rutger, 2019. "Forecasting football match results in national league competitions using score-driven time series models," International Journal of Forecasting, Elsevier, vol. 35(2), pages 797-809.
    16. Fry, John & Serbera, Jean-Philippe & Wilson, Rob, 2021. "Managing performance expectations in association football," Journal of Business Research, Elsevier, vol. 135(C), pages 445-453.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marc Garnica-Caparrós & Daniel Memmert & Fabian Wunderlich, 2022. "Artificial data in sports forecasting: a simulation framework for analysing predictive models in sports," Information Systems and e-Business Management, Springer, vol. 20(3), pages 551-580, September.
    2. Wunderlich, Fabian & Memmert, Daniel, 2020. "Are betting returns a useful measure of accuracy in (sports) forecasting?," International Journal of Forecasting, Elsevier, vol. 36(2), pages 713-722.
    3. Lasek, Jan & Gagolewski, Marek, 2021. "Interpretable sports team rating models based on the gradient descent algorithm," International Journal of Forecasting, Elsevier, vol. 37(3), pages 1061-1071.
    4. Koopman, Siem Jan & Lit, Rutger, 2019. "Forecasting football match results in national league competitions using score-driven time series models," International Journal of Forecasting, Elsevier, vol. 35(2), pages 797-809.
    5. J. James Reade & Carl Singleton & Alasdair Brown, 2021. "Evaluating strange forecasts: The curious case of football match scorelines," Scottish Journal of Political Economy, Scottish Economic Society, vol. 68(2), pages 261-285, May.
    6. da Costa, Igor Barbosa & Marinho, Leandro Balby & Pires, Carlos Eduardo Santos, 2022. "Forecasting football results and exploiting betting markets: The case of “both teams to score”," International Journal of Forecasting, Elsevier, vol. 38(3), pages 895-909.
    7. Wheatcroft, Edward, 2020. "A profitable model for predicting the over/under market in football," LSE Research Online Documents on Economics 103712, London School of Economics and Political Science, LSE Library.
    8. Szczecinski, Leszek, 2022. "G-Elo: generalization of the Elo algorithm by modeling the discretized margin of victory," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 18(1), pages 1-14, March.
    9. Angelini, Giovanni & De Angelis, Luca, 2019. "Efficiency of online football betting markets," International Journal of Forecasting, Elsevier, vol. 35(2), pages 712-721.
    10. Song, Kai & Shi, Jian, 2020. "A gamma process based in-play prediction model for National Basketball Association games," European Journal of Operational Research, Elsevier, vol. 283(2), pages 706-713.
    11. Hubáček, Ondřej & Šír, Gustav, 2023. "Beating the market with a bad predictive model," International Journal of Forecasting, Elsevier, vol. 39(2), pages 691-719.
    12. Singleton, Carl & Reade, J. James & Brown, Alasdair, 2020. "Going with your gut: The (In)accuracy of forecast revisions in a football score prediction game," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 89(C).
    13. Groll, Andreas & Kneib, Thomas & Mayr, Andreas & Schauberger, Gunther, 2018. "On the dependency of soccer scores – a sparse bivariate Poisson model for the UEFA European football championship 2016," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 14(2), pages 65-79, June.
    14. Wheatcroft, Edward, 2020. "A profitable model for predicting the over/under market in football," International Journal of Forecasting, Elsevier, vol. 36(3), pages 916-932.
    15. Hvattum, Lars Magnus, 2015. "Playing on artificial turf may be an advantage for Norwegian soccer teams," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 11(3), pages 183-192, September.
    16. J Reade & C Singleton & L Vaughan Williams, 2020. "Betting Markets for English Premier League Results and Scorelines: Evaluating a Simple Forecasting Model," Economic Issues Journal Articles, Economic Issues, vol. 25(1), pages 87-106, March.
    17. Pearson, Mitchell & Livingston Jr, Glen & King, Robert, 2020. "An exploration of predictive football modelling," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 16(1), pages 27-39, March.
    18. Federico Fioravanti & Fernando Delbianco & Fernando Tohmé, 2023. "The relative importance of ability, luck and motivation in team sports: a Bayesian model of performance in the English Rugby Premiership," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(3), pages 715-731, September.
    19. P. Gorgi & S. J. Koopman & R. Lit, 2023. "Estimation of final standings in football competitions with a premature ending: the case of COVID-19," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 107(1), pages 233-250, March.
    20. Marius Ötting & Christian Deutscher & Carl Singleton & Luca De Angelis, 2023. "Gambling on Momentum in Contests," Economics Discussion Papers em-dp2023-08, Department of Economics, University of Reading.
