Printed from https://ideas.repec.org/p/arx/papers/2512.21917.html

Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model

Author

  • Nathan Kallus

Abstract

Aligning large language models (LLMs) to preference data typically assumes a known link function between observed preferences and latent rewards (e.g., a logistic Bradley-Terry link). Misspecification of this link can bias inferred rewards and misalign learned policies. We study preference alignment under an unknown and unrestricted link function. We show that realizability of $f$-divergence-constrained reward maximization in a policy class induces a semiparametric single-index binary choice model, where a scalar policy-dependent index captures all dependence on demonstrations and the remaining preference distribution is unrestricted. Rather than assuming this model has identifiable finite-dimensional structural parameters and estimating them, as in econometrics, we focus on policy learning with the reward function implicit, analyzing error to the optimal policy and allowing for unidentifiable nonparametric indices. We develop preference optimization algorithms robust to the unknown link and prove convergence guarantees in terms of generic function complexity measures. We demonstrate this empirically on LLM alignment. Code is available at https://github.com/causalml/spo/
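
To make the setup concrete, the display below sketches the familiar KL-regularized special case of $f$-divergence-constrained reward maximization. The symbols $r$, $\beta$, $\pi_{\mathrm{ref}}$, $t_{\pi}$, and $\Psi$ are notation assumed here for illustration and are not taken from the paper, whose formulation covers general $f$-divergences and may differ in detail.

```latex
% Known link (Bradley-Terry): the preference probability is logistic in the reward gap.
\[
  \Pr(a \succ a' \mid x) = \sigma\bigl(r(x,a) - r(x,a')\bigr),
  \qquad \sigma(u) = \frac{1}{1 + e^{-u}} .
\]
% KL-regularized reward maximization (temperature \beta, reference policy \pi_ref)
% has the closed-form solution
\[
  \pi^{*}(a \mid x) \;\propto\; \pi_{\mathrm{ref}}(a \mid x)\,\exp\bigl(r(x,a)/\beta\bigr) ,
\]
% so the normalizing constant cancels and the reward gap equals a scalar index of \pi^*:
\[
  r(x,a) - r(x,a') = t_{\pi^{*}}(x,a,a'),
  \qquad
  t_{\pi}(x,a,a') := \beta \log\frac{\pi(a \mid x)}{\pi_{\mathrm{ref}}(a \mid x)}
                   - \beta \log\frac{\pi(a' \mid x)}{\pi_{\mathrm{ref}}(a' \mid x)} .
\]
% Replacing \sigma by an unknown link \Psi (assumed notation) gives a
% single-index binary choice form:
\[
  \Pr(a \succ a' \mid x) = \Psi\bigl(t_{\pi^{*}}(x,a,a')\bigr),
  \qquad \Psi \text{ unknown} .
\]
```

With $\Psi = \sigma$ this reduces to the logistic objective used in direct preference optimization. With $\Psi$ left unspecified, the preference probability depends on the compared responses only through the scalar index $t_{\pi^{*}}$, which is the sense in which the model is "single-index" in this sketch.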

Suggested Citation

  • Nathan Kallus, 2025. "Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model," Papers 2512.21917, arXiv.org, revised Feb 2026.
  • Handle: RePEc:arx:papers:2512.21917

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2512.21917
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jochmans, Koen, 2015. "Multiplicative-error models with sample selection," Journal of Econometrics, Elsevier, vol. 184(2), pages 315-327.
    2. Aradillas-Lopez, Andres, 2012. "Pairwise-difference estimation of incomplete information games," Journal of Econometrics, Elsevier, vol. 168(1), pages 120-140.
    3. Buchholz, Nicholas & Shum, Matthew & Xu, Haiqing, 2021. "Semiparametric estimation of dynamic discrete choice models," Journal of Econometrics, Elsevier, vol. 223(2), pages 312-327.
    4. Matzkin, Rosa L., 2019. "Constructive identification in some nonseparable discrete choice models," Journal of Econometrics, Elsevier, vol. 211(1), pages 83-103.
    5. Takahiro ITO, 2024. "Binary and Ordered Response Models in Randomized Experiments: Applications of the Resampling-Based Maximum Likelihood Method," GSICS Working Paper Series 42, Graduate School of International Cooperation Studies, Kobe University.
    6. Arthur Lewbel, 2019. "The Identification Zoo: Meanings of Identification in Econometrics," Journal of Economic Literature, American Economic Association, vol. 57(4), pages 835-903, December.
    7. Qi Li & Jeffrey Scott Racine, 2006. "Nonparametric Econometrics: Theory and Practice," Economics Books, Princeton University Press, edition 1, volume 1, number 8355, December.
    8. Lahiri, Kajal & Yang, Liu, 2013. "Forecasting Binary Outcomes," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 2, chapter 0, pages 1025-1106, Elsevier.
    9. Coppejans, Mark, 2001. "Estimation of the binary response model using a mixture of distributions estimator (MOD)," Journal of Econometrics, Elsevier, vol. 102(2), pages 231-269, June.
    10. Hoderlein, Stefan & Sherman, Robert, 2015. "Identification and estimation in a correlated random coefficients binary response model," Journal of Econometrics, Elsevier, vol. 188(1), pages 135-149.
    11. Horowitz, Joel L., 2002. "Bootstrap critical values for tests based on the smoothed maximum score estimator," Journal of Econometrics, Elsevier, vol. 111(2), pages 141-167, December.
    12. Tiziano Arduini & Eleonora Patacchini & Edoardo Rainone, 2015. "Parametric and Semiparametric IV Estimation of Network Models with Selectivity," EIEF Working Papers Series 1509, Einaudi Institute for Economics and Finance (EIEF), revised Oct 2015.
    13. Magnac, Thierry & Maurin, Eric, 2007. "Identification and information in monotone binary models," Journal of Econometrics, Elsevier, vol. 139(1), pages 76-104, July.
    14. Chen, Le-Yu & Lee, Sokbae, 2019. "Breaking the curse of dimensionality in conditional moment inequalities for discrete choice models," Journal of Econometrics, Elsevier, vol. 210(2), pages 482-497.
    15. Joel L. Horowitz, 1996. "Bootstrap Critical Values for Tests Based on the Smoothed Maximum Score Estimator," Econometrics 9603003, University Library of Munich, Germany.
    16. Park, Byeong U. & Simar, Léopold & Zelenyuk, Valentin, 2017. "Nonparametric estimation of dynamic discrete choice models for time series data," Computational Statistics & Data Analysis, Elsevier, vol. 108(C), pages 97-120.
    17. Mittelhammer, Ron C. & Judge, George, 2011. "A family of empirical likelihood functions and estimators for the binary response model," Journal of Econometrics, Elsevier, vol. 164(2), pages 207-217, October.
    18. Mittelhammer, Ronald C. & Judge, George G., 2008. "A Minimum Power Divergence Class of CDFs and Estimators for Binary Choice Models," CUDARE Working Papers 37759, University of California, Berkeley, Department of Agricultural and Resource Economics.
    19. Yingying Dong & Arthur Lewbel, 2015. "A Simple Estimator for Binary Choice Models with Endogenous Regressors," Econometric Reviews, Taylor & Francis Journals, vol. 34(1-2), pages 82-105, February.
    20. Gao, Yichen & Li, Cong & Liang, Zhongwen, 2015. "Binary response correlated random coefficient panel data models," Journal of Econometrics, Elsevier, vol. 188(2), pages 421-434.
