
Fast Inference for Quantile Regression with Tens of Millions of Observations

Authors

  • Sokbae Lee
  • Yuan Liao
  • Myung Hwan Seo
  • Youngki Shin

Abstract

Big data analytics has opened new avenues in economic research, but analyzing datasets with tens of millions of observations remains a substantial challenge. Conventional econometric methods based on extremum estimators require large amounts of computing resources and memory, which are often not readily available. In this paper, we focus on linear quantile regression applied to "ultra-large" datasets, such as U.S. decennial censuses. We present a fast inference framework that uses stochastic subgradient descent (S-subGD) updates. The inference procedure handles cross-sectional data sequentially: (i) updating the parameter estimate with each incoming "new observation", (ii) aggregating the estimates as a Polyak-Ruppert average, and (iii) computing a pivotal statistic for inference using only the solution path. The methodology draws on time-series regression to create an asymptotically pivotal statistic through random scaling. Our proposed test statistic is calculated in a fully online fashion, and critical values are obtained without resampling. We conduct extensive numerical studies to showcase the computational merits of our proposed inference. For inference problems as large as $(n, d) \sim (10^7, 10^3)$, where $n$ is the sample size and $d$ is the number of regressors, our method generates new insights, outperforming current inference methods in computation. In particular, our method reveals trends in the gender gap in the U.S. college wage premium using millions of observations, while controlling for over $10^3$ covariates to mitigate confounding effects.
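
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a learning rate of the form $\gamma_i = \gamma_0 i^{-a}$ with $a \in (1/2, 1)$, a zero starting value, and only the diagonal of the random scaling matrix, $V_{n,jj} = n^{-2} \sum_{s=1}^{n} s^2 (\bar\beta_{s,j} - \bar\beta_{n,j})^2$, where $\bar\beta_s$ is the running Polyak-Ruppert average. The names `ssubgd_quantile`, `random_scaling_ci`, `gamma0`, `a`, and `crit` are hypothetical. The default `crit=6.747` is assumed to be the 97.5% quantile of the limiting mixed-normal distribution tabulated in the time-series literature; verify it against the paper before use.

```python
import numpy as np

def ssubgd_quantile(X, y, tau=0.5, gamma0=1.0, a=0.501):
    """One pass of S-subGD for linear quantile regression (illustrative sketch).

    Returns the Polyak-Ruppert average and, per coefficient, the random
    scaling quantity sqrt(V_jj / n) used to build confidence intervals.
    """
    n, d = X.shape
    beta = np.zeros(d)   # current iterate (assumed zero start)
    bar = np.zeros(d)    # running Polyak-Ruppert average
    A = np.zeros(d)      # running sum of s^2 * bar_s^2
    b = np.zeros(d)      # running sum of s^2 * bar_s
    c = 0.0              # running sum of s^2
    for i in range(1, n + 1):
        x_i, y_i = X[i - 1], y[i - 1]
        # (i) subgradient of the check loss at the current iterate
        grad = x_i * (float(y_i <= x_i @ beta) - tau)
        beta = beta - gamma0 * i ** (-a) * grad
        # (ii) update the average without storing the path
        bar += (beta - bar) / i
        # (iii) accumulate the pieces of the random scaling diagonal
        s2 = float(i) ** 2
        A += s2 * bar * bar
        b += s2 * bar
        c += s2
    # V_jj = n^{-2} * sum_s s^2 (bar_s - bar_n)^2, expanded into running sums
    v_diag = np.maximum(A - 2.0 * b * bar + c * bar * bar, 0.0) / n ** 2
    return bar, np.sqrt(v_diag / n)

def random_scaling_ci(estimate, scale, crit=6.747):
    """Two-sided interval estimate +/- crit * scale (crit is an assumption)."""
    return estimate - crit * scale, estimate + crit * scale
```

Because every quantity is a running sum, memory use is $O(d)$ regardless of $n$, which is what makes a fully online procedure of this kind feasible for very large samples. Tracking the full $d \times d$ matrix instead of its diagonal would follow the same pattern with outer products.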

Suggested Citation

  • Sokbae Lee & Yuan Liao & Myung Hwan Seo & Youngki Shin, 2022. "Fast Inference for Quantile Regression with Tens of Millions of Observations," Papers 2209.14502, arXiv.org, revised Oct 2023.
  • Handle: RePEc:arx:papers:2209.14502

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2209.14502
    File Function: Latest version
    Download Restriction: no


