Printed from https://ideas.repec.org/p/arx/papers/2601.22659.html

Using SVM to Estimate and Predict Binary Choice Models

Authors

  • Yoosoon Chang
  • Joon Y. Park
  • Guo Yan

Abstract

The support vector machine (SVM) has an asymptotic behavior that parallels that of the quasi-maximum likelihood estimator (QMLE) for binary outcomes generated by a binary choice model (BCM), although it is not a QMLE. We show that, under the linear conditional mean condition for covariates given the systematic component used in the QMLE slope consistency literature, the slope of the separating hyperplane given by the SVM consistently estimates the BCM slope parameter, as long as the class weight is used as required when binary outcomes are severely imbalanced. The SVM slope estimator is asymptotically equivalent to that of logistic regression in this sense. The finite-sample performance of the two estimators can be quite distinct depending on the distributions of covariates and errors, but neither dominates the other. The intercept parameter of the BCM can be consistently estimated once a consistent estimator of its slope parameter is obtained.
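The slope-consistency claim can be illustrated with a small simulation. The sketch below is not the authors' code: the data-generating process (standard normal covariates and errors, so the linear conditional mean condition holds), the "balanced" class weights n / (2 * n_class), the penalty `lam`, the step-size rule, and all function names are assumptions chosen for illustration. It fits a class-weighted linear SVM by full-batch subgradient descent and compares only the direction of the fitted slope with the true BCM slope, on the assumption that consistency here is up to scale, as is usual in the QMLE slope consistency literature.

```python
import math
import random


def simulate_bcm(n, alpha, beta, rng):
    """Draw (x, y) with y = sign(alpha + x'beta + eps); x and eps are standard normal."""
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        index = alpha + sum(b * xj for b, xj in zip(beta, x))
        ys.append(1 if index + rng.gauss(0.0, 1.0) > 0 else -1)
        xs.append(x)
    return xs, ys


def fit_weighted_linear_svm(xs, ys, lam=0.01, n_iter=300):
    """Full-batch subgradient descent on the class-weighted, L2-penalized hinge loss."""
    n, d = len(xs), len(xs[0])
    n_pos = sum(1 for y in ys if y == 1)
    # "Balanced" weights n / (2 * n_class): an illustrative assumption,
    # not necessarily the weighting used in the paper.
    weight = {1: n / (2.0 * n_pos), -1: n / (2.0 * (n - n_pos))}
    w, b = [0.0] * d, 0.0
    for t in range(1, n_iter + 1):
        grad_w = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # hinge term is active for this observation
                c = weight[y] / n
                for j in range(d):
                    grad_w[j] -= c * y * x[j]
                grad_b -= c * y
        step = 0.5 / math.sqrt(t)  # diminishing step size
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


rng = random.Random(0)
true_beta = [1.0, -2.0]
xs, ys = simulate_bcm(2000, 1.5, true_beta, rng)
w_hat, b_hat = fit_weighted_linear_svm(xs, ys)
alignment = cosine(w_hat, true_beta)
print(f"cosine(w_hat, beta) = {alignment:.3f}")
```

A cosine close to 1 means the separating hyperplane's slope points in nearly the same direction as the BCM slope. The fitted SVM intercept `b_hat` is not interpreted here; the abstract treats intercept estimation as a separate step that follows a consistent slope estimate.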

Suggested Citation

  • Yoosoon Chang & Joon Y. Park & Guo Yan, 2026. "Using SVM to Estimate and Predict Binary Choice Models," Papers 2601.22659, arXiv.org.
  • Handle: RePEc:arx:papers:2601.22659

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2601.22659
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods That Economists Should Know About," Annual Review of Economics, Annual Reviews, vol. 11(1), pages 685-725, August.
    2. Newey, Whitney K. & Ruud, Paul A., 1994. "Density Weighted Linear Least Squares," Department of Economics, Working Paper Series qt9fc2n3jc, Department of Economics, Institute for Business and Economic Research, UC Berkeley.
    3. Athey, Susan & Imbens, Guido W., 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    4. Max H. Farrell & Tengyuan Liang & Sanjog Misra, 2021. "Deep Neural Networks for Estimation and Inference," Econometrica, Econometric Society, vol. 89(1), pages 181-213, January.
    5. Le‐Yu Chen & Sokbae Lee & Myung Jae Sung, 2014. "Maximum score estimation with nonparametrically generated regressors," Econometrics Journal, Royal Economic Society, vol. 17(3), pages 271-300, October.
    6. Horowitz, Joel L, 1992. "A Smoothed Maximum Score Estimator for the Binary Response Model," Econometrica, Econometric Society, vol. 60(3), pages 505-531, May.
    7. Sendhil Mullainathan & Jann Spiess, 2017. "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 87-106, Spring.
    8. Manski, Charles F., 1975. "Maximum score estimation of the stochastic utility model of choice," Journal of Econometrics, Elsevier, vol. 3(3), pages 205-228, August.
    9. Klein, Roger W & Spady, Richard H, 1993. "An Efficient Semiparametric Estimator for Binary Response Models," Econometrica, Econometric Society, vol. 61(2), pages 387-421, March.
    10. Xiang Zhang & Yichao Wu & Lan Wang & Runze Li, 2016. "Variable selection for support vector machines in moderately high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(1), pages 53-76, January.
    11. Ruud, Paul A, 1983. "Sufficient Conditions for the Consistency of Maximum Likelihood Estimation Despite Misspecifications of Distribution in Multinomial Discrete Choice Models," Econometrica, Econometric Society, vol. 51(1), pages 225-228, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lewbel, Arthur, 2000. "Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables," Journal of Econometrics, Elsevier, vol. 97(1), pages 145-177, July.
    2. Andrii Babii & Xi Chen & Eric Ghysels & Rohit Kumar, 2020. "Binary Choice under Asymmetric Loss in a Data-Rich Environment: Theory and an Application to Algorithmic Fairness," Papers 2010.08463, arXiv.org, revised Nov 2025.
    3. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    4. Alan P. Ker & Abdoul G. Sam, 2018. "Semiparametric estimation of the link function in binary-choice single-index models," Computational Statistics, Springer, vol. 33(3), pages 1429-1455, September.
    5. Yan, Jin & Yoo, Hong Il, 2019. "Semiparametric estimation of the random utility model with rank-ordered choice data," Journal of Econometrics, Elsevier, vol. 211(2), pages 414-438.
    6. Hanemann, W. Michael & Kanninen, Barbara, 1996. "The Statistical Analysis Of Discrete-Response Cv Data," CUDARE Working Papers 25022, University of California, Berkeley, Department of Agricultural and Resource Economics.
    7. Magnac, Thierry & Maurin, Eric, 2007. "Identification and information in monotone binary models," Journal of Econometrics, Elsevier, vol. 139(1), pages 76-104, July.
    8. Ghysels, Eric & Babii, Andrii & Chen, Xi & Kumar, Rohit, 2020. "Binary Choice with Asymmetric Loss in a Data-Rich Environment: Theory and an Application to Racial Justice," CEPR Discussion Papers 15418, C.E.P.R. Discussion Papers.
    9. Ana Fernandez & Juan Rodriquez-Poo, 1997. "Estimation and specification testing in female labor participation models: parametric and semiparametric methods," Econometric Reviews, Taylor & Francis Journals, vol. 16(2), pages 229-247.
    10. Sophie-Charlotte Klose & Johannes Lederer, 2020. "A Pipeline for Variable Selection and False Discovery Rate Control With an Application in Labor Economics," Papers 2006.12296, arXiv.org, revised Jun 2020.
    11. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    12. Chen, Le-Yu & Lee, Sokbae, 2018. "Best subset binary prediction," Journal of Econometrics, Elsevier, vol. 206(1), pages 39-56.
    13. Hurmeranta, Risto & Lyytikäinen, Teemu, 2025. "Nominal Loss Aversion in the Housing Market and Household Mobility," Working Papers 178, VATT Institute for Economic Research.
    14. Dang, Hai-Anh & Carleto, Gero & Gourlay, Sydney & Abanokova, Kseniya, 2023. "Addressing Soil Quality Data Gaps with Imputation: Evidence from Ethiopia and Uganda," 2023 Annual Meeting, July 23-25, Washington D.C. 335648, Agricultural and Applied Economics Association.
    15. Park, Byeong U. & Simar, Léopold & Zelenyuk, Valentin, 2017. "Nonparametric estimation of dynamic discrete choice models for time series data," Computational Statistics & Data Analysis, Elsevier, vol. 108(C), pages 97-120.
    16. Mittelhammer, Ron C. & Judge, George, 2011. "A family of empirical likelihood functions and estimators for the binary response model," Journal of Econometrics, Elsevier, vol. 164(2), pages 207-217, October.
    17. Lahiri, Kajal & Yang, Liu, 2013. "Forecasting Binary Outcomes," Handbook of Economic Forecasting, in: G. Elliott & C. Granger & A. Timmermann (ed.), Handbook of Economic Forecasting, edition 1, volume 2, chapter 0, pages 1025-1106, Elsevier.
    18. Giorgio Chiovelli & Stelios Michalopoulus & Elias Papaioannou & Tanner Regan, 2025. "Illuminating the Global South," Working Papers 2025-009, The George Washington University, The Center for Economic Research.
    19. Arenas, Andreu & Calsamiglia, Caterina, 2022. "Gender Differences in High-Stakes Performance and College Admission Policies," IZA Discussion Papers 15550, IZA Network @ LISER.
    20. Mittelhammer, Ronald C. & Judge, George G., 2008. "A Minimum Power Divergence Class of CDFs and Estimators for Binary Choice Models," CUDARE Working Papers 37759, University of California, Berkeley, Department of Agricultural and Resource Economics.
