
OR Forum—An Algorithmic Approach to Linear Regression

Authors

  • Dimitris Bertsimas

    (Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139)

  • Angela King

    (Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139)

Abstract

Linear regression models are traditionally built through trial and error to balance many competing goals, such as predictive power, interpretability, significance, robustness to error in data, and sparsity. This problem lends itself naturally to a mixed integer quadratic optimization (MIQO) approach but has not been modeled this way because of the belief in the statistics community that MIQO is intractable for large-scale problems. However, in the last 25 years (1991–2015), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing speedup by a factor of 450 billion in solving mixed integer optimization problems. We present an MIQO-based approach for designing high-quality linear regression models that explicitly addresses various competing objectives, and we demonstrate its effectiveness on both real and synthetic data sets.
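The following sketch covers only the sparsity component. It is meant to make the abstract's MIQO framing concrete and is not the authors' implementation. A common way to encode sparsity in least squares is the big-M formulation: minimize ||y − Xβ||² over β and binary z, subject to −M·z_i ≤ β_i ≤ M·z_i and Σ z_i ≤ k. This keeps at most k coefficients nonzero, provided M is large enough to bound the optimal coefficients. The Python below solves the same cardinality-constrained problem by brute-force enumeration, which is feasible only for a handful of features. An MIQO solver targets the same optimum through branch and bound, without necessarily enumerating every subset. The function names best_subset, least_squares and solve_linear_system are hypothetical and do not come from the paper. The sketch also omits the paper's other objectives, such as significance and robustness.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(X, y, cols):
    """OLS restricted to the columns in `cols`, via the normal equations."""
    xtx = [[sum(row[i] * row[j] for row in X) for j in cols] for i in cols]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in cols]
    coef = solve_linear_system(xtx, xty)
    rss = sum((yi - sum(c * row[j] for c, j in zip(coef, cols))) ** 2
              for row, yi in zip(X, y))
    return coef, rss


def best_subset(X, y, k):
    """Return (support, coefficients, RSS) minimizing RSS s.t. ||beta||_0 <= k."""
    p = len(X[0])
    best = ((), [], sum(yi ** 2 for yi in y))  # empty model: beta = 0
    for size in range(1, k + 1):
        for cols in combinations(range(p), size):
            try:
                coef, rss = least_squares(X, y, cols)
            except ValueError:
                continue  # skip subsets with collinear columns
            if rss < best[2]:
                best = (cols, coef, rss)
    return best
```

For example, with rows X = [[1,0,0],[0,1,0],[0,0,1],[1,1,1],[2,0,1]] and y = 2·x0 − 3·x2 = [2,0,−3,−1,1], calling best_subset(X, y, 2) selects support (0, 2) with coefficients 2 and −3.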

Suggested Citation

  • Dimitris Bertsimas & Angela King, 2016. "OR Forum—An Algorithmic Approach to Linear Regression," Operations Research, INFORMS, vol. 64(1), pages 2-16, February.
  • Handle: RePEc:inm:oropre:v:64:y:2016:i:1:p:2-16
    DOI: 10.1287/opre.2015.1436

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/opre.2015.1436
    Download Restriction: no

    References listed on IDEAS

    1. Miles Lubin & Iain Dunning, 2015. "Computing in Operations Research Using Julia," INFORMS Journal on Computing, INFORMS, vol. 27(2), pages 238-248, May.
    2. Xiaotong Shen & Wei Pan & Yunzhang Zhu & Hui Zhou, 2013. "On constrained and regularized high-dimensional regression," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 65(5), pages 807-832, October.
    3. Mazumder, Rahul & Friedman, Jerome H. & Hastie, Trevor, 2011. "SparseNet: Coordinate Descent With Nonconvex Penalties," Journal of the American Statistical Association, American Statistical Association, vol. 106(495), pages 1125-1138.
    4. Ming Yuan & Yi Lin, 2006. "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 68(1), pages 49-67, February.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Navarro-García, Manuel & Guerrero, Vanesa & Durban, María, 2023. "On constrained smoothing and out-of-range prediction using P-splines: A conic optimization approach," Applied Mathematics and Computation, Elsevier, vol. 441(C).
    2. Vázquez-Alcocer, Alan & Schoen, Eric D. & Goos, Peter, 2018. "A mixed integer optimization approach for model selection in screening experiments," Working Papers 2018007, University of Antwerp, Faculty of Business and Economics.
    3. Benítez-Peña, Sandra & Bogetoft, Peter & Romero Morales, Dolores, 2020. "Feature Selection in Data Envelopment Analysis: A Mathematical Optimization approach," Omega, Elsevier, vol. 96(C).
    4. Berk Wheelock, Lauren & Pachamanova, Dessislava A., 2022. "Acceptable set topic modeling," European Journal of Operational Research, Elsevier, vol. 299(2), pages 653-673.
    5. Youssef M. Aboutaleb & Moshe Ben-Akiva & Patrick Jaillet, 2020. "Learning Structure in Nested Logit Models," Papers 2008.08048, arXiv.org.
    6. Matsypura, Dmytro & Thompson, Ryan & Vasnev, Andrey L., 2018. "Optimal selection of expert forecasts with integer programming," Omega, Elsevier, vol. 78(C), pages 165-175.
    7. Kraus, Mathias & Feuerriegel, Stefan & Oztekin, Asil, 2020. "Deep learning in business analytics and operations research: Models, applications and managerial implications," European Journal of Operational Research, Elsevier, vol. 281(3), pages 628-641.
    8. Benati, Stefano & Puerto, Justo & Rodríguez-Chía, Antonio M., 2017. "Clustering data that are graph connected," European Journal of Operational Research, Elsevier, vol. 261(1), pages 43-53.
    9. Young Woong Park & Diego Klabjan, 2020. "Subset selection for multiple linear regression via optimization," Journal of Global Optimization, Springer, vol. 77(3), pages 543-574, July.
    10. Ryuta Tamura & Ken Kobayashi & Yuichi Takano & Ryuhei Miyashiro & Kazuhide Nakata & Tomomi Matsui, 2019. "Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor," Journal of Global Optimization, Springer, vol. 73(2), pages 431-446, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ben-Ameur, Walid & Neto, José, 2022. "New bounds for subset selection from conic relaxations," European Journal of Operational Research, Elsevier, vol. 298(2), pages 425-438.
    2. Wang, Yueyao & Lee, I-Chen & Hong, Yili & Deng, Xinwei, 2022. "Building degradation index with variable selection for multivariate sensory data," Reliability Engineering and System Safety, Elsevier, vol. 227(C).
    3. Rahul Ghosal & Arnab Maity & Timothy Clark & Stefano B. Longo, 2020. "Variable selection in functional linear concurrent regression," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(3), pages 565-587, June.
    4. Minh Pham & Xiaodong Lin & Andrzej Ruszczyński & Yu Du, 2021. "An outer–inner linearization method for non-convex and nondifferentiable composite regularization problems," Journal of Global Optimization, Springer, vol. 81(1), pages 179-202, September.
    5. Siwei Xia & Yuehan Yang & Hu Yang, 2022. "Sparse Laplacian Shrinkage with the Graphical Lasso Estimator for Regression Problems," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 255-277, March.
    6. Gabriel E Hoffman & Benjamin A Logsdon & Jason G Mezey, 2013. "PUMA: A Unified Framework for Penalized Multiple Regression Analysis of GWAS Data," PLOS Computational Biology, Public Library of Science, vol. 9(6), pages 1-19, June.
    7. Hirose, Kei & Tateishi, Shohei & Konishi, Sadanori, 2013. "Tuning parameter selection in sparse regression modeling," Computational Statistics & Data Analysis, Elsevier, vol. 59(C), pages 28-40.
    8. Yen, Yu-Min & Yen, Tso-Jung, 2014. "Solving norm constrained portfolio optimization via coordinate-wise descent algorithms," Computational Statistics & Data Analysis, Elsevier, vol. 76(C), pages 737-759.
    9. Benjamin G. Stokell & Rajen D. Shah & Ryan J. Tibshirani, 2021. "Modelling high‐dimensional categorical data using nonconvex fusion penalties," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(3), pages 579-611, July.
    10. Yanhang Zhang & Junxian Zhu & Jin Zhu & Xueqin Wang, 2023. "A Splicing Approach to Best Subset of Groups Selection," INFORMS Journal on Computing, INFORMS, vol. 35(1), pages 104-119, January.
    11. Shanshan Qin & Hao Ding & Yuehua Wu & Feng Liu, 2021. "High-dimensional sign-constrained feature selection and grouping," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 73(4), pages 787-819, August.
    12. Friedman, Jerome H., 2012. "Fast sparse regression and classification," International Journal of Forecasting, Elsevier, vol. 28(3), pages 722-738.
    13. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    14. Margherita Giuzio, 2017. "Genetic algorithm versus classical methods in sparse index tracking," Decisions in Economics and Finance, Springer;Associazione per la Matematica, vol. 40(1), pages 243-256, November.
    15. Jun Yan & Jian Huang, 2012. "Model Selection for Cox Models with Time-Varying Coefficients," Biometrics, The International Biometric Society, vol. 68(2), pages 419-428, June.
    16. Ye, Ya-Fen & Shao, Yuan-Hai & Deng, Nai-Yang & Li, Chun-Na & Hua, Xiang-Yu, 2017. "Robust Lp-norm least squares support vector regression with feature selection," Applied Mathematics and Computation, Elsevier, vol. 305(C), pages 32-52.
    17. Guillaume Sagnol & Edouard Pauwels, 2019. "An unexpected connection between Bayes A-optimal designs and the group lasso," Statistical Papers, Springer, vol. 60(2), pages 565-584, April.
    18. Bakalli, Gaetan & Guerrier, Stéphane & Scaillet, Olivier, 2023. "A penalized two-pass regression to predict stock returns with time-varying risk premia," Journal of Econometrics, Elsevier, vol. 237(2).
    19. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    20. Peng, Heng & Lu, Ying, 2012. "Model selection in linear mixed effect models," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 109-129.

    More about this item

    Keywords

    integer programming; statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:oropre:v:64:y:2016:i:1:p:2-16. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Chris Asher. General contact details of provider: https://edirc.repec.org/data/inforea.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.