On Benchmark Hacking in ML Contests: Modeling, Insights and Design

Authors

  • Xiaoyun Qiu
  • Yang Yu
  • Haifeng Xu

Abstract

Benchmark hacking refers to tuning a machine learning model to score highly on certain evaluation criteria without improving true generalization or faithfully solving the intended problem. We study this phenomenon in a generic machine learning contest in which each contestant chooses two types of effort: creative effort, which improves model capability as desired by the contest host, and mechanistic effort, which only improves the model's fit to the particular contest task without contributing to true generalization. We establish the existence of a symmetric monotone pure-strategy equilibrium in this competition game. The equilibrium also provides a natural definition of benchmark hacking in this strategic context, obtained by comparing a player's equilibrium effort allocation with that of a single-agent baseline scenario. Under this definition, contestants with types below a certain threshold (low types) always engage in benchmark hacking, whereas those above the threshold do not. Furthermore, we show that more skewed reward structures (those favoring top-ranked contestants) can elicit more desirable contest outcomes. We also provide empirical evidence supporting our theoretical predictions.

Suggested Citation

  • Xiaoyun Qiu & Yang Yu & Haifeng Xu, 2026. "On Benchmark Hacking in ML Contests: Modeling, Insights and Design," Papers 2604.22230, arXiv.org.
  • Handle: RePEc:arx:papers:2604.22230

    Download full text from publisher

    File URL: https://arxiv.org/pdf/2604.22230
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kaplan, Todd R. & Zamir, Shmuel, 2015. "Advances in Auctions," Handbook of Game Theory with Economic Applications, Elsevier.
    2. Shanglyu Deng & Hanming Fang & Qiang Fu & Zenan Wu, 2023. "Information Favoritism and Scoring Bias in Contests," NBER Working Papers 31036, National Bureau of Economic Research, Inc.
    3. Segev, Ella & Sela, Aner, 2014. "Multi-stage sequential all-pay auctions," European Economic Review, Elsevier, vol. 70(C), pages 371-382.
    4. Moldovanu, Benny & Sela, Aner, 2006. "Contest architecture," Journal of Economic Theory, Elsevier, vol. 126(1), pages 70-96, January.
    5. Derek Clark & Tore Nilssen, 2013. "Learning by doing in contests," Public Choice, Springer, vol. 156(1), pages 329-343, July.
    6. Aner Sela, 2002. "Contest Architecture (jointly with Benny Moldovanu)," Theory workshop papers 357966000000000088, UCLA Department of Economics.
    7. Dong, Xiaoqi & Fu, Qiang & Serena, Marco & Wu, Zenan, 2025. "R&D contest design with resource allocation and entry fees," Journal of Economic Behavior & Organization, Elsevier, vol. 238(C).
    8. Konrad, Kai A., 2007. "Strategy in contests: an introduction [Strategie in Turnieren – eine Einführung]," Discussion Papers, Research Unit: Market Processes and Governance SP II 2007-01, WZB Berlin Social Science Center.
    9. Fu, Qiang & Lu, Jingfeng, 2007. "Unifying Contests: from Noisy Ranking to Ratio-Form Contest Success Functions," MPRA Paper 6679, University Library of Munich, Germany.
    10. Bastani, Spencer & Giebe, Thomas & Gürtler, Oliver, 2022. "Simple equilibria in general contests," Games and Economic Behavior, Elsevier, vol. 134(C), pages 264-280.
    11. Joshua Graff Zivin & Elizabeth Lyons, 2021. "The Effects of Prize Structures on Innovative Performance," AEA Papers and Proceedings, American Economic Association, vol. 111, pages 577-581, May.
    12. Vivek Bhattacharya, 2021. "An Empirical Model of R&D Procurement Contests: An Analysis of the DOD SBIR Program," Econometrica, Econometric Society, vol. 89(5), pages 2189-2224, September.
    13. Benny Moldovanu & Aner Sela & Xianwen Shi, 2012. "Carrots And Sticks: Prizes And Punishments In Contests," Economic Inquiry, Western Economic Association International, vol. 50(2), pages 453-462, April.
    14. Ghazala Azmat & Marc Möller, 2018. "The Distribution of Talent Across Contests," Economic Journal, Royal Economic Society, vol. 128(609), pages 471-509, March.
    15. Yingkai Li & Xiaoyun Qiu, 2023. "Mechanism Design under Costly Signaling: the Value of Non-Coordination," Papers 2302.09168, arXiv.org, revised Feb 2026.
    16. Qiang Fu & Jingfeng Lu, 2012. "Micro foundations of multi-prize lottery contests: a perspective of noisy performance ranking," Social Choice and Welfare, Springer;The Society for Social Choice and Welfare, vol. 38(3), pages 497-517, March.
    17. Szymanski, Stefan & Valletti, Tommaso M., 2005. "Incentive effects of second prizes," European Journal of Political Economy, Elsevier, vol. 21(2), pages 467-481, June.
    18. Ming Hu & Lu Wang, 2021. "Joint vs. Separate Crowdsourcing Contests," Management Science, INFORMS, vol. 67(5), pages 2711-2728, May.
    19. Ghazala Azmat & Marc Möller, 2009. "Competition among contests," RAND Journal of Economics, RAND Corporation, vol. 40(4), pages 743-768, December.
    20. Morgan, John & Tumlinson, Justin & Várdy, Felix, 2022. "The limits of meritocracy," Journal of Economic Theory, Elsevier, vol. 201(C).
