Printed from https://ideas.repec.org/a/eee/econom/v249y2025ipas0304407624000198.html

Fast inference for quantile regression with tens of millions of observations

Author

Listed:
  • Lee, Sokbae
  • Liao, Yuan
  • Seo, Myung Hwan
  • Shin, Youngki

Abstract

Big data analytics has opened new avenues in economic research, but the challenge of analyzing datasets with tens of millions of observations is substantial. Conventional econometric methods based on extremum estimators require large amounts of computing resources and memory, which are often not readily available. In this paper, we focus on linear quantile regression applied to “ultra-large” datasets, such as U.S. decennial censuses. A fast inference framework is presented, utilizing stochastic subgradient descent (S-subGD) updates. The inference procedure handles cross-sectional data sequentially: (i) updating the parameter estimate with each incoming “new observation”, (ii) aggregating it as a Polyak–Ruppert average, and (iii) computing a pivotal statistic for inference using only a solution path. The methodology draws from time-series regression to create an asymptotically pivotal statistic through random scaling. Our proposed test statistic is calculated in a fully online fashion and critical values are calculated without resampling. We conduct extensive numerical studies to showcase the computational merits of our proposed inference. For inference problems as large as $(n, d) \sim (10^7, 10^3)$, where $n$ is the sample size and $d$ is the number of regressors, our method generates new insights, surpassing current inference methods in computation. Our method specifically reveals trends in the gender gap in the U.S. college wage premium using millions of observations, while controlling for over $10^3$ covariates to mitigate confounding effects.
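The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the function names `ssubgd_quantile` and `random_scaling_ci`, the learning-rate schedule `gamma0 * i**(-a)`, the zero starting value, and the running-sum form of the random-scaling variance (diagonal entries only). The default critical value 6.747 is the 97.5% quantile commonly quoted from the Abadir and Paruolo (1997) table for this type of statistic; it should be verified against the paper before any real use.

```python
import numpy as np

def ssubgd_quantile(X, y, tau=0.5, gamma0=1.0, a=0.501, beta0=None):
    """One pass of S-subGD for linear quantile regression (illustrative sketch).

    Returns the Polyak-Ruppert average and the diagonal of a random-scaling
    variance estimate, both updated online from running sums.
    """
    n, d = X.shape
    beta = np.zeros(d) if beta0 is None else np.asarray(beta0, dtype=float).copy()
    bar = np.zeros(d)   # Polyak-Ruppert average of the iterates
    A = np.zeros(d)     # running sum of s^2 * bar_s^2
    b = np.zeros(d)     # running sum of s^2 * bar_s
    c = 0.0             # running sum of s^2
    for i in range(1, n + 1):
        x_i, y_i = X[i - 1], y[i - 1]
        # (i) subgradient of the check loss at the current iterate
        grad = x_i * (float(y_i <= x_i @ beta) - tau)
        beta = beta - gamma0 * i ** (-a) * grad   # assumed step-size schedule
        # (ii) recursive update of the average
        bar += (beta - bar) / i
        # (iii) running sums for the random-scaling quantity
        A += i ** 2 * bar ** 2
        b += i ** 2 * bar
        c += i ** 2
    # V_jj = n^{-2} * sum_s s^2 (bar_s - bar_n)_j^2, expanded into running sums;
    # the maximum guards against tiny negative values from rounding
    V = np.maximum((A - 2.0 * b * bar + c * bar ** 2) / n ** 2, 0.0)
    return bar, V

def random_scaling_ci(bar, V, n, crit=6.747):
    """Coordinate-wise interval bar +/- crit * sqrt(V / n) (crit is an assumption)."""
    half = crit * np.sqrt(V / n)
    return bar - half, bar + half
```

Calling `ssubgd_quantile(X, y, tau=0.5)` on an `(n, d)` design matrix and passing the result to `random_scaling_ci` gives coordinate-wise intervals without storing the solution path or resampling. The expanded running sums subtract large numbers, so a production version at $n \sim 10^7$ would likely need a numerically more careful update.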

Suggested Citation

  • Lee, Sokbae & Liao, Yuan & Seo, Myung Hwan & Shin, Youngki, 2025. "Fast inference for quantile regression with tens of millions of observations," Journal of Econometrics, Elsevier, vol. 249(PA).
  • Handle: RePEc:eee:econom:v:249:y:2025:i:pa:s0304407624000198
    DOI: 10.1016/j.jeconom.2024.105673

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0304407624000198
    Download Restriction: Full text for ScienceDirect subscribers only



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sokbae Lee & Yuan Liao & Myung Hwan Seo & Youngki Shin, 2022. "Fast Inference for Quantile Regression with Tens of Millions of Observations," Papers 2209.14502, arXiv.org, revised Oct 2023.
    2. Chen, Le-Yu & Lee, Sokbae, 2023. "Sparse quantile regression," Journal of Econometrics, Elsevier, vol. 235(2), pages 2195-2217.
    3. Chen, Zhao & Cheng, Vivian Xinyi & Liu, Xu, 2024. "Reprint: Hypothesis testing on high dimensional quantile regression," Journal of Econometrics, Elsevier, vol. 239(2).
    4. Xie, Jinhan & Yan, Xiaodong & Jiang, Bei & Kong, Linglong, 2025. "Statistical inference for smoothed quantile regression with streaming data," Journal of Econometrics, Elsevier, vol. 249(PA).
    5. Chen, Zhao & Cheng, Vivian Xinyi & Liu, Xu, 2024. "Hypothesis testing on high dimensional quantile regression," Journal of Econometrics, Elsevier, vol. 238(1).
    6. Wen, Jiawei & Yang, Songshan & Wang, Christina Dan & Jiang, Yifan & Li, Runze, 2025. "Feature-splitting algorithms for ultrahigh dimensional quantile regression," Journal of Econometrics, Elsevier, vol. 249(PA).
    7. de Castro, Luciano & Galvao, Antonio F. & Kaplan, David M. & Liu, Xin, 2019. "Smoothed GMM for quantile models," Journal of Econometrics, Elsevier, vol. 213(1), pages 121-144.
    8. Hirukawa, Masayuki, 2023. "Robust Covariance Matrix Estimation in Time Series: A Review," Econometrics and Statistics, Elsevier, vol. 27(C), pages 36-61.
    9. Sun, Zhaoyang & Liu, Ling & Pan, Runquan & Wang, Yiwei & Zhang, Bingbing, 2025. "Tourism and economic growth: The role of institutional quality," International Review of Economics & Finance, Elsevier, vol. 98(C).
    10. Jungbin Hwang & Gonzalo Valdés, 2025. "HAR Inference for Quantile Regression in Time Series," Working papers 2025-03, University of Connecticut, Department of Economics.
    11. Shu, Lei & Hao, Yifan & Chen, Yu & Yang, Qing, 2025. "SFQRA: Scaled factor-augmented quantile regression with aggregation in conditional mean forecasting," Journal of Multivariate Analysis, Elsevier, vol. 207(C).
    12. Xiaohong Chen & Sokbae Lee & Yuan Liao & Myung Hwan Seo & Youngki Shin & Myunghyun Song, 2023. "SGMM: Stochastic Approximation to Generalized Method of Moments," Papers 2308.13564, arXiv.org, revised Oct 2023.
    13. Victor Chernozhukov & Iván Fernández‐Val & Blaise Melly, 2013. "Inference on Counterfactual Distributions," Econometrica, Econometric Society, vol. 81(6), pages 2205-2268, November.
    14. Wiji Arulampalam & Alison Booth & Mark Bryan, 2010. "Are there asymmetries in the effects of training on the conditional male wage distribution?," Journal of Population Economics, Springer;European Society for Population Economics, vol. 23(1), pages 251-272, January.
    15. Firpo, Sergio & Galvao, Antonio F. & Pinto, Cristine & Poirier, Alexandre & Sanroman, Graciela, 2022. "GMM quantile regression," Journal of Econometrics, Elsevier, vol. 230(2), pages 432-452.
    16. Fan, Ye & Lin, Nan, 2025. "Sequential quantile regression for stream data by least squares," Journal of Econometrics, Elsevier, vol. 249(PA).
    17. Graham, Bryan S. & Hahn, Jinyong & Poirier, Alexandre & Powell, James L., 2018. "A quantile correlated random coefficients panel data model," Journal of Econometrics, Elsevier, vol. 206(2), pages 305-335.
    18. Chong-Chuo Chang & Oshamah Lin Lin & Oshamah Yu-Cheng Chang & Oshamah Kun-Zhan Hsu, 2023. "Impact of Financial Liberalization on Firm Risk," Advances in Decision Sciences, Asia University, Taiwan, vol. 27(3), pages 14-45, September.
    19. Alejo, Javier & Galvao, Antonio F. & Martinez-Iriarte, Julian & Montes-Rojas, Gabriel, 2025. "Unconditional quantile partial effects via conditional quantile regression," Journal of Econometrics, Elsevier, vol. 249(PA).
