
A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-L1-Norm Interpolated Classifiers

Author

Listed:
  • Tengyuan Liang

    (University of Chicago - Booth School of Business)

  • Pragya Sur

    (Harvard University - Department of Statistics)

Abstract

This paper establishes a precise high-dimensional asymptotic theory for boosting on separable data, taking statistical and computational perspectives. We consider the setting where the number of features (weak learners) p scales with the sample size n, in an over-parametrized regime. Under a broad class of statistical models, we provide an exact analysis of the generalization error of boosting, when the algorithm interpolates the training data and maximizes the empirical L1-margin. The relation between the boosting test error and the optimal Bayes error is pinned down explicitly. In turn, these precise characterizations resolve several open questions raised in [15, 81] surrounding boosting. On the computational front, we provide a sharp analysis of the stopping time when boosting approximately maximizes the empirical L1 margin. Furthermore, we discover that the larger the overparametrization ratio p/n, the smaller the proportion of active features (with zero initialization), and the faster the optimization reaches interpolation. At the heart of our theory lies an in-depth study of the maximum L1-margin, which can be accurately described by a new system of non-linear equations; we analyze this margin and the properties of this system, using Gaussian comparison techniques and a novel uniform deviation argument. Variants of AdaBoost corresponding to general Lq geometry, for q > 1, are also presented, together with an exact analysis of the high-dimensional generalization and optimization behavior of a class of these algorithms.
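
To make the two objects in the title concrete, the sketch below is a rough illustration only, not the authors' code or data model: it assumes Gaussian features, a fixed step size, and a finite iteration budget. It runs an AdaBoost-style coordinate descent on the exponential loss with p > n linear weak learners until the training data are interpolated, then compares the resulting normalized L1-margin with the maximum L1-margin. The comparison uses the standard duality that, on separable data, the maximum L1-margin max over ||theta||_1 <= 1 of min_i y_i <x_i, theta> equals the reciprocal of the minimum L1 norm among interpolating classifiers, so the latter can be obtained from the linear program min ||theta||_1 subject to y_i <x_i, theta> >= 1.

    # Illustrative sketch (assumptions, not the paper's setup): Gaussian features,
    # shrinkage step size 0.1, and a fixed iteration budget.
    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    n, p = 100, 400                               # over-parametrization ratio p/n = 4
    X = rng.standard_normal((n, p)) / np.sqrt(p)  # columns play the role of weak learners
    y = np.sign(X @ rng.standard_normal(p) + 0.3 * rng.standard_normal(n))

    # (i) Boosting: greedy coordinate descent on sum_i exp(-y_i <x_i, theta>).
    theta = np.zeros(p)                           # zero initialization
    eta = 0.1                                     # step size / shrinkage (assumption)
    for _ in range(20_000):
        w = np.exp(-y * (X @ theta))              # exponential-loss weights on training points
        grad = -X.T @ (y * w)                     # gradient of the empirical exponential loss
        j = np.argmax(np.abs(grad))               # weak learner most aligned with weighted residuals
        theta[j] -= eta * np.sign(grad[j])
        if (y * (X @ theta)).min() > 0:           # training data interpolated (separated)
            break
    boost_margin = (y * (X @ theta)).min() / np.abs(theta).sum()
    print(f"active features: {np.mean(theta != 0):.1%}, boosting L1-margin: {boost_margin:.4f}")

    # (ii) Minimum-L1-norm interpolation: min ||theta||_1 s.t. y_i <x_i, theta> >= 1,
    # written as an LP with theta = u - v and u, v >= 0.
    Yx = y[:, None] * X
    res = linprog(c=np.ones(2 * p),
                  A_ub=np.hstack([-Yx, Yx]), b_ub=-np.ones(n),
                  bounds=(0, None), method="highs")
    if res.success:
        theta_l1 = res.x[:p] - res.x[p:]
        print(f"maximum L1-margin (via LP): {1.0 / np.abs(theta_l1).sum():.4f}")

Runs of this kind let one watch, under the stated assumptions, the quantities highlighted in the abstract: the proportion of active (nonzero) coordinates under zero initialization, the number of iterations needed to reach interpolation, and how close the boosting margin gets to the LP margin as p/n grows.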

Suggested Citation

  • Tengyuan Liang & Pragya Sur, 2020. "A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-L1-Norm Interpolated Classifiers," Working Papers 2020-152, Becker Friedman Institute for Research in Economics.
  • Handle: RePEc:bfi:wpaper:2020-152

    Download full text from publisher

    File URL: https://repec.bfi.uchicago.edu/RePEc/pdfs/BFI_WP_2020152.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Jon Kleinberg & Sendhil Mullainathan, 2019. "Simplicity Creates Inequity: Implications for Fairness, Stereotypes, and Interpretability," NBER Working Papers 25854, National Bureau of Economic Research, Inc.
    2. Tengyuan Liang & Hai Tran-Bach, 2020. "Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks," Working Papers 2020-151, Becker Friedman Institute for Research In Economics.
    3. Alexander Hanbo Li & Jelena Bradic, 2018. "Boosting in the Presence of Outliers: Adaptive Classification With Nonconvex Loss Functions," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(522), pages 660-674, April.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Kuanhao Jiang & Rajarshi Mukherjee & Subhabrata Sen & Pragya Sur, 2022. "A New Central Limit Theorem for the Augmented IPW Estimator: Variance Inflation, Cross-Fit Covariance and Beyond," Papers 2205.10198, arXiv.org, revised Oct 2022.
    2. Tengyuan Liang, 2021. "Universal Prediction Band via Semi-Definite Programming," Papers 2103.17203, arXiv.org, revised Jan 2023.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhu Wang, 2022. "MM for penalized estimation," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 54-75, March.
    2. Semyon Malamud & Andreas Schrimpf, 2021. "Persuasion by Dimension Reduction," Swiss Finance Institute Research Paper Series 21-69, Swiss Finance Institute.
    3. Claire Lazar Reich, 2021. "The Disparate Impact of Uncertainty: Affirmative Action vs. Affirmative Information," Papers 2102.10019, arXiv.org, revised Feb 2024.
    4. John W. Patty & Elizabeth Maggie Penn, 2022. "Algorithmic Fairness and Statistical Discrimination," Papers 2208.08341, arXiv.org.
    5. Ju, Xiaomeng & Salibián-Barrera, Matías, 2021. "Robust boosting for regression problems," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
    6. Piotr Skórka & Beata Grzywacz & Dawid Moroń & Magdalena Lenda, 2020. "The macroecology of the COVID-19 pandemic in the Anthropocene," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-17, July.
    7. Ashesh Rambachan & Jon Kleinberg & Sendhil Mullainathan & Jens Ludwig, 2020. "An Economic Approach to Regulating Algorithms," NBER Working Papers 27111, National Bureau of Economic Research, Inc.
    8. Elizabeth Maggie Penn & John W. Patty, 2023. "Algorithms, Incentives, and Democracy," Papers 2307.02319, arXiv.org.
    9. Malamud, Semyon & Cieslak, Anna & Schrimpf, Paul, 2021. "Optimal Transport of Information," CEPR Discussion Papers 15859, C.E.P.R. Discussion Papers.
    10. Max H. Farrell & Tengyuan Liang & Sanjog Misra, 2020. "Deep Learning for Individual Heterogeneity: An Automatic Inference Framework," Papers 2010.14694, arXiv.org, revised Jul 2021.
    11. Runshan Fu & Manmohan Aseri & Param Vir Singh & Kannan Srinivasan, 2022. "“Un”Fair Machine Learning Algorithms," Management Science, INFORMS, vol. 68(6), pages 4173-4195, June.
    12. Heng Xu & Nan Zhang, 2022. "Implications of Data Anonymization on the Statistical Evidence of Disparity," Management Science, INFORMS, vol. 68(4), pages 2600-2618, April.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bfi:wpaper:2020-152. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Toni Shears (email available below). General contact details of provider: https://edirc.repec.org/data/mfichus.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.