IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0312278.html

Predicting goal probabilities with improved xG models using event sequences in association football

Authors

  • Ishara Bandara
  • Sergiy Shelyag
  • Sutharshan Rajasegarar
  • Dan Dwyer
  • Eun-jin Kim
  • Maia Angelova

Abstract

In association football, predicting the likelihood and outcome of a shot at goal is useful but challenging. Expected goals (xG) models can be used in a variety of ways, including evaluating performance and designing offensive strategies. This study proposed a novel framework that uses the events preceding a shot to improve the accuracy of the xG metric. The framework combines previously explored and unexplored temporal features; the new features include the “advancement factor” and the “player position column”. A random forest model was used, and it performed better than published single-event-based models in the literature. Results further demonstrated a significant improvement in model performance when information about preceding events was included. The proposed framework and model enable the discovery of event sequences that improve xG, including opportunities built up from the sides of the 18-yard box, shots attempted from in front of the goal within the opposition’s 18-yard box, and shots from successful passes to the far post.
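
A minimal Python sketch, assuming a simple event stream, of the general idea of pairing each shot with the events that precede it, and of how per-shot goal probabilities add up to a team's xG total. Everything here is hypothetical and is not the authors' implementation: the Event record and its field names and labels, the rule that the look-back window stops at the last change of possession, and the forward_progress feature (a crude stand-in, not the paper's “advancement factor”). No random forest is trained; the sketch only shows how sequence features could be assembled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    # Hypothetical event record; real event-data schemas differ.
    team: str
    kind: str   # e.g. "pass", "carry", "cross", "shot" (illustrative labels)
    x: float    # metres along the pitch towards the opponent's goal (assumed)
    y: float    # metres across the pitch (assumed)


def shot_sequences(events: List[Event], window: int = 3) -> List[Tuple[Event, List[Event]]]:
    """Pair each shot with up to `window` preceding events by the shooting team.

    Assumption: the look-back stops at the most recent event by the other team,
    so only events from the current possession are kept.
    """
    pairs = []
    for i, ev in enumerate(events):
        if ev.kind != "shot":
            continue
        preceding = []
        j = i - 1
        while j >= 0 and len(preceding) < window and events[j].team == ev.team:
            preceding.append(events[j])
            j -= 1
        preceding.reverse()  # restore chronological order
        pairs.append((ev, preceding))
    return pairs


def forward_progress(shot: Event, preceding: List[Event]) -> float:
    """Hypothetical feature: metres gained towards goal from the first event in the window to the shot."""
    if not preceding:
        return 0.0
    return shot.x - preceding[0].x


def team_xg(shot_probabilities: List[float]) -> float:
    """A team's xG total is the sum of its per-shot goal probabilities."""
    return sum(shot_probabilities)


if __name__ == "__main__":
    events = [
        Event("A", "pass", 50.0, 34.0),
        Event("B", "tackle", 55.0, 30.0),  # possession change: the shot's look-back stops before this event
        Event("A", "pass", 60.0, 20.0),
        Event("A", "carry", 80.0, 15.0),
        Event("A", "cross", 95.0, 10.0),
        Event("A", "shot", 100.0, 34.0),
    ]
    for shot, prev in shot_sequences(events, window=5):
        print([e.kind for e in prev], forward_progress(shot, prev))
    print(team_xg([0.25, 0.5]))
```

With this six-event stream, shot_sequences(events, window=5) returns the shot paired with only the three team-A events after team B's tackle, and forward_progress gives 40.0 metres. The window length and whether it should respect possession boundaries are modelling choices; a real model would feed such features, per shot, into a classifier whose predicted probabilities are then summed by team_xg.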

Suggested Citation

  • Ishara Bandara & Sergiy Shelyag & Sutharshan Rajasegarar & Dan Dwyer & Eun-jin Kim & Maia Angelova, 2024. "Predicting goal probabilities with improved xG models using event sequences in association football," PLOS ONE, Public Library of Science, vol. 19(10), pages 1-22, October.
  • Handle: RePEc:plo:pone00:0312278
    DOI: 10.1371/journal.pone.0312278

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0312278
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0312278&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0312278?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. James Mead & Anthony O’Hare & Paul McMenemy, 2023. "Expected goals in football: Improving model performance and demonstrating value," PLOS ONE, Public Library of Science, vol. 18(4), pages 1-29, April.
    2. Andreas Heuer & Oliver Rubner, 2012. "How Does the Past of a Soccer Match Influence Its Future? Concepts and Statistical Analysis," PLOS ONE, Public Library of Science, vol. 7(11), pages 1-7, November.
    3. Boscá, José E. & Liern, Vicente & Martínez, Aurelio & Sala, Ramón, 2009. "Increasing offensive or defensive efficiency? An analysis of Italian and Spanish football," Omega, Elsevier, vol. 37(1), pages 63-78, February.
    4. Daniel Link & Steffen Lang & Philipp Seidenschwarz, 2016. "Real Time Quantification of Dangerousity in Football Using Spatiotemporal Tracking Data," PLOS ONE, Public Library of Science, vol. 11(12), pages 1-16, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Stijn Baert & Simon Amez, 2018. "No better moment to score a goal than just before half time? A soccer myth statistically tested," PLOS ONE, Public Library of Science, vol. 13(3), pages 1-17, March.
    2. repec:plo:pone00:0230179 is not listed on IDEAS
    3. Bolle Friedel & Otto Philipp E., 2016. "Matching as a Stochastic Process," Journal of Economics and Statistics (Jahrbuecher fuer Nationaloekonomie und Statistik), De Gruyter, vol. 236(3), pages 323-348, May.
    4. Nicolau, Juan L., 2012. "The effect of winning the 2010 FIFA World Cup on the tourism market value: The Spanish case," Omega, Elsevier, vol. 40(5), pages 503-510.
    5. Torben Tiedemann & Tammo Francksen & Uwe Latacz-Lohmann, 2011. "Assessing the performance of German Bundesliga football players: a non-parametric metafrontier approach," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 19(4), pages 571-587, December.
    6. Valerio Ficcadenti & Roy Cerqueti & Ciro Hosseini Varde’i, 2023. "A rank-size approach to analyse soccer competitions and teams: the case of the Italian football league “Serie A"," Annals of Operations Research, Springer, vol. 325(1), pages 85-113, June.
    7. Giampiero Maci & Vincenzo Pacelli & Elisabetta D'Apolito, 2021. "Società Di Calcio Europee Quotate E Mercati Finanziari: Un'Analisi Empirica Sulle Determinanti Dei Corsi Azionari" [Listed European football clubs and financial markets: an empirical analysis of the determinants of share prices], Rivista di Diritto ed Economia dello Sport, Centro di diritto e business dello Sport, vol. 17(2), pages 69-90, November.
    8. Clive B Beggs & Alexander J Bond & Stacey Emmonds & Ben Jones, 2019. "Hidden dynamics of soccer leagues: The predictive ‘power’ of partial standings," PLOS ONE, Public Library of Science, vol. 14(12), pages 1-28, December.
    9. Thanasis Bouzidis & Giannis Karagiannis, 2022. "Extending the zero-sum gains data envelopment analysis model," Journal of Productivity Analysis, Springer, vol. 58(2), pages 171-184, December.
    10. Bruno Gonçalves & Diogo Coutinho & Juliana Exel & Bruno Travassos & Carlos Lago & Jaime Sampaio, 2019. "Extracting spatial-temporal features that describe a team match demands when considering the effects of the quality of opposition in elite football," PLOS ONE, Public Library of Science, vol. 14(8), pages 1-20, August.
    11. Debnath Roma Mitra & Malhotra Ashish, 2015. "Measuring Efficiency of Nations in Multi Sport Events: A case of Commonwealth Games XIX," Naše gospodarstvo/Our economy, Sciendo, vol. 61(1), pages 25-36, March.
    12. S. Mohammad Arabzad & Mazaher Ghorbani & Arash Shahin, 2013. "Ranking players by DEA the case of English Premier League," International Journal of Industrial and Systems Engineering, Inderscience Enterprises Ltd, vol. 15(4), pages 443-461.
    13. Chen, Ying-Hsiu & Lai, Po-Lin & Piboonrungroj, Pairach, 2017. "The relationship between airport performance and privatisation policy: A nonparametric metafrontier approach," Journal of Transport Geography, Elsevier, vol. 62(C), pages 229-235.
    14. Andrés Picazo-Tadeo & Francisco González-Gómez, 2010. "Does playing several competitions influence a team’s league performance? Evidence from Spanish professional football," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 18(3), pages 413-432, September.
    15. Isidoro Guzmán-Raja & Manuela Guzmán-Raja, 2021. "Measuring the Efficiency of Football Clubs Using Data Envelopment Analysis: Empirical Evidence From Spanish Professional Football," SAGE Open, vol. 11(1), pages 21582440219, February.
    16. J Reade & C Singleton & L Vaughan Williams, 2020. "Betting Markets for English Premier League Results and Scorelines: Evaluating a Simple Forecasting Model," Economic Issues Journal Articles, Economic Issues, vol. 25(1), pages 87-106, March.
    17. Lucas Wu & Tim B. Swartz, 2025. "A new metric for pitch control based on an intuitive motion model," Computational Statistics, Springer, vol. 40(4), pages 1713-1730, April.
    18. van Ours, Jan C. & van Tuijl, Martin, 2010. "Country-specific goal-scoring in the "dying-seconds" of international football matches," CEPR Discussion Papers 7873, C.E.P.R. Discussion Papers.
    19. J. Brandon Bolen & Jon Rezek & Joshua D. Pitts, 2019. "Performance Efficiency in NCAA Basketball," Journal of Sports Economics, vol. 20(2), pages 218-241, February.
    20. Andreas Heuer & Oliver Rubner, 2014. "Optimizing the Prediction Process: From Statistical Concepts to the Case Study of Soccer," PLOS ONE, Public Library of Science, vol. 9(9), pages 1-9, September.
    21. Francisco González-Gómez & Andrés J. Picazo-Tadeo, 2010. "Can We Be Satisfied With Our Football Team? Evidence From Spanish Professional Football," Journal of Sports Economics, vol. 11(4), pages 418-442, August.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0312278. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone. General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.