Printed from https://ideas.repec.org/a/spr/annopr/v325y2023i1d10.1007_s10479-021-04476-4.html

Predicting the next Pogačar: a data analytical approach to detect young professional cycling talents

Authors

  • Bram Janssens (Ghent University)
  • Matthias Bogaert (Ghent University)
  • Mathijs Maton (Ghent University)

Abstract

The importance of young athletes in the field of professional cycling has sky-rocketed during the past years. Nevertheless, the early talent identification of these riders largely remains a subjective assessment. Therefore, an analytical system which automatically detects talented riders based on their freely available youth results should be installed. However, such a system cannot be copied directly from related fields, as large distinctions are observed between cycling and other sports. The aim of this paper is to develop such a data analytical system, which leverages the unique features of each race and thereby focusses on feature engineering, data quality, and visualization. To facilitate the deployment of prediction algorithms in situations without complete cases, we propose an adaptation to the k-nearest neighbours imputation algorithm which uses expert knowledge. Overall, our proposed method correlates strongly with eventual rider performance and can aid scouts in targeting young talents. On top of that, we introduce several model interpretation tools to give insight into which current starting professional riders are expected to perform well and why.

Suggested Citation

  • Bram Janssens & Matthias Bogaert & Mathijs Maton, 2023. "Predicting the next Pogačar: a data analytical approach to detect young professional cycling talents," Annals of Operations Research, Springer, vol. 325(1), pages 557-588, June.
  • Handle: RePEc:spr:annopr:v:325:y:2023:i:1:d:10.1007_s10479-021-04476-4
    DOI: 10.1007/s10479-021-04476-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10479-021-04476-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

