Efficient talent identification in women’s football: A ranking-based approach for goal scoring analysis

Author

Listed:
  • Songyi Song
  • Hee-Su Kim

Abstract

Individual goal-scoring analysis in women’s football faces severe class imbalance and limited scouting resources, where classification metrics alone do not capture operational efficiency. We analyzed 2,535 non-goalkeeper player-match observations from the 2023 FIFA Women’s World Cup (736 unique players) with 51 performance features, excluding match-outcome variables to emphasize individual actions. Using nested cross-validation, LightGBM captured 79.4% of goal-scoring observations within the top 20% of ranked observations; an out-of-bag (OOB) bootstrap gains analysis yielded 73.9% capture at Top 20% (lift = 3.69x; 95% CI: 63.9%–84.3%). Permutation and SHAP consensus highlighted tactical availability (Total Offers) and combined technical/physical workload indicators (Passes Attempted, Jogging Distance, Top Speed). This proof-of-concept study shows that ranking-based evaluation improves scouting efficiency using basic match statistics, while thresholds and feature weights require validation in other competitive contexts.
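
The capture and lift figures in the abstract can be illustrated with a minimal sketch. It assumes the common gains-chart definitions: capture is the share of all positives found in the top fraction of score-ranked observations, and lift is capture divided by that fraction. Under this assumption, 73.9% capture at the top 20% gives 0.739 / 0.20, roughly 3.7, in line with the reported 3.69x. The function name, toy scores, and labels below are hypothetical and are not taken from the paper's data or code.

```python
from typing import Sequence, Tuple


def capture_and_lift(
    scores: Sequence[float],
    labels: Sequence[int],
    fraction: float = 0.20,
) -> Tuple[float, float]:
    """Return (capture, lift) for the top `fraction` of score-ranked rows.

    Assumed definitions (standard gains analysis, not confirmed by the paper):
      capture = positives in the top rows / all positives
      lift    = capture / share of rows actually selected
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Number of rows selected; always keep at least one row.
    n_top = max(1, int(len(scores) * fraction))

    # Rank row indices from highest to lowest predicted score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    captured = sum(labels[i] for i in ranked[:n_top])
    capture = captured / total_positives
    lift = capture / (n_top / len(scores))
    return capture, lift


# Hypothetical toy data: 10 player-match rows, 2 of them goal-scoring.
toy_scores = [0.9, 0.1, 0.8, 0.3, 0.2, 0.7, 0.05, 0.4, 0.6, 0.15]
toy_labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

capture, lift = capture_and_lift(toy_scores, toy_labels, fraction=0.20)
print(f"capture={capture:.2f}, lift={lift:.2f}")
```

In this toy case the two highest-scored rows contain one of the two goal-scoring rows, so capture is 0.50 and lift is 2.50. A lift of 1.0 would correspond to ranking no better than random selection.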

Suggested Citation

  • Songyi Song & Hee-Su Kim, 2026. "Efficient talent identification in women’s football: A ranking-based approach for goal scoring analysis," PLOS ONE, Public Library of Science, vol. 21(2), pages 1-13, February.
  • Handle: RePEc:plo:pone00:0342115
    DOI: 10.1371/journal.pone.0342115

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0342115
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0342115&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Constantinou Anthony Costa & Fenton Norman Elliott, 2012. "Solving the Problem of Inadequate Scoring Rules for Assessing Probabilistic Football Forecast Models," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 8(1), pages 1-14, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Baboota, Rahul & Kaur, Harleen, 2019. "Predictive analysis and modelling football results using machine learning approach for English Premier League," International Journal of Forecasting, Elsevier, vol. 35(2), pages 741-755.
    2. Babatunde Buraimo & David Peel & Rob Simmons, 2013. "Systematic Positive Expected Returns in the UK Fixed Odds Betting Market: An Analysis of the Fink Tank Predictions," IJFS, MDPI, vol. 1(4), pages 1-15, December.
    3. Groll Andreas & Kneib Thomas & Mayr Andreas & Schauberger Gunther, 2018. "On the dependency of soccer scores – a sparse bivariate Poisson model for the UEFA European football championship 2016," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 14(2), pages 65-79, June.
    4. Michael Lechner & Gabriel Okasa, 2025. "Random Forest estimation of the ordered choice model," Empirical Economics, Springer, vol. 68(1), pages 1-106, January.
    5. Catlin, Colin, 2025. "Adaptive forecasting in dynamic markets: An evaluation of AutoTS within the M6 competition," International Journal of Forecasting, Elsevier, vol. 41(4), pages 1485-1493.
    6. László Gyarmati & Csaba Mihálykó & Éva Orbán-Mihálykó, 2025. "Forecasting Outcomes Using Multi-Option, Advantage-Sensitive Thurstone-Motivated Models," Forecasting, MDPI, vol. 7(3), pages 1-19, June.
    7. Pearson Mitchell & Jr Glen Livingston & King Robert, 2020. "An exploration of predictive football modelling," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 16(1), pages 27-39, March.
    8. Koopman, Siem Jan & Lit, Rutger, 2019. "Forecasting football match results in national league competitions using score-driven time series models," International Journal of Forecasting, Elsevier, vol. 35(2), pages 797-809.
    9. Chia-Hao Chang, 2021. "Construction of a Predictive Model for MLB Matches," Forecasting, MDPI, vol. 3(1), pages 1-11, February.
    10. Rebeggiani, Luca & Gross, Johannes, 2018. "Chance or Ability? The Efficiency of the Football Betting Market Revisited," VfS Annual Conference 2018 (Freiburg, Breisgau): Digital Economy 181563, Verein für Socialpolitik / German Economic Association.
    11. Wheatcroft Edward, 2021. "Evaluating probabilistic forecasts of football matches: the case against the ranked probability score," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 17(4), pages 273-287, December.
    12. Wheatcroft, Edward, 2021. "Evaluating probabilistic forecasts of football matches: the case against the ranked probability score," LSE Research Online Documents on Economics 111494, London School of Economics and Political Science, LSE Library.
    13. Hans Eetvelde & Lars Magnus Hvattum & Christophe Ley, 2023. "The Probabilistic Final Standing Calculator: a fair stochastic tool to handle abruptly stopped football seasons," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 107(1), pages 251-269, March.
    14. Szczecinski Leszek, 2022. "G-Elo: generalization of the Elo algorithm by modeling the discretized margin of victory," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 18(1), pages 1-14, March.
    15. Lasek, Jan & Gagolewski, Marek, 2021. "Interpretable sports team rating models based on the gradient descent algorithm," International Journal of Forecasting, Elsevier, vol. 37(3), pages 1061-1071.
    16. Alejandro Álvarez & Alejandro Cataldo & Guillermo Durán & Manuel Durán & Pablo Galaz & Iván Monardo & Denis Sauré, 2025. "Data science approach to simulating the FIFA World Cup Qatar 2022 at a website in tribute to Maradona," Computational Statistics, Springer, vol. 40(4), pages 2223-2247, April.
    17. Marc Garnica-Caparrós & Daniel Memmert & Fabian Wunderlich, 2022. "Artificial data in sports forecasting: a simulation framework for analysing predictive models in sports," Information Systems and e-Business Management, Springer, vol. 20(3), pages 551-580, September.
    18. Zachary J. Smith & J. Eric Bickel, 2020. "Additive Scoring Rules for Discrete Sample Spaces," Decision Analysis, INFORMS, vol. 17(2), pages 115-133, June.
