Printed from https://ideas.repec.org/p/arx/papers/2511.16187.html

Quantile Selection in the Gender Pay Gap

Authors

  • Egshiglen Batbayar
  • Christoph Breunig
  • Peter Haan
  • Boryana Ilieva

Abstract

We propose a new approach to estimate selection-corrected quantiles of the gender wage gap. Our method employs instrumental variables that explain variation in the latent variable but, conditional on the latent process, do not directly affect selection. We provide semiparametric identification of the quantile parameters without imposing parametric restrictions on the selection probability, derive the asymptotic distribution of the proposed estimator based on constrained selection probability weighting, and demonstrate how the approach applies to the Roy model of labor supply. Using German administrative data, we analyze the distribution of the gender gap in full-time earnings. We find pronounced positive selection among women at the lower end, especially those with less education, which widens the gender gap in this segment, and strong positive selection among highly educated men at the top, which narrows the gender wage gap at upper quantiles.
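
As a rough illustration of the selection probability weighting idea mentioned in the abstract, the following Python sketch simulates a latent log wage, lets the probability of being observed rise with the latent wage (positive selection), and compares naive quantiles of the observed wages with quantiles reweighted by the inverse selection probability. It is not the authors' estimator: the selection probability is treated as known here, whereas the paper estimates it using instrumental variables under constraints. The logistic selection rule, its coefficients, and the names `weighted_quantile` and `simulate` are assumptions made for illustration only.

```python
import numpy as np


def weighted_quantile(values, weights, tau):
    """Return the tau-quantile of `values` under the normalized `weights`."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)          # weighted empirical CDF
    idx = np.searchsorted(cdf, tau, side="left")
    return v[min(idx, len(v) - 1)]


def simulate(n=200_000, seed=0):
    """Simulate latent log wages and a selection rule (both hypothetical)."""
    rng = np.random.default_rng(seed)
    y_star = rng.normal(0.0, 1.0, n)        # latent log wage
    # Assumed logistic selection: higher latent wages are observed more often.
    p_sel = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * y_star)))
    d = rng.random(n) < p_sel               # True if the wage is observed
    return y_star, p_sel, d


y_star, p_sel, d = simulate()

# For each quantile level: (latent, naive on observed, inverse-probability weighted)
results = {}
for tau in (0.1, 0.5, 0.9):
    true_q = float(np.quantile(y_star, tau))
    naive_q = float(np.quantile(y_star[d], tau))
    # Oracle weights 1 / P(D = 1 | Y*); in practice these must be estimated.
    ipw_q = float(weighted_quantile(y_star[d], 1.0 / p_sel[d], tau))
    results[tau] = (true_q, naive_q, ipw_q)
    print(f"tau={tau:.1f}  latent={true_q:.3f}  naive={naive_q:.3f}  weighted={ipw_q:.3f}")
```

In this simulated setting the naive lower quantiles lie above the latent ones because low earners are under-represented among the observed, and reweighting moves them back toward the latent distribution.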

Suggested Citation

  • Egshiglen Batbayar & Christoph Breunig & Peter Haan & Boryana Ilieva, 2025. "Quantile Selection in the Gender Pay Gap," Papers 2511.16187, arXiv.org, revised Jan 2026.
  • Handle: RePEc:arx:papers:2511.16187

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2511.16187
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Severini, Thomas A. & Tripathi, Gautam, 2012. "Efficiency bounds for estimating linear functionals of nonparametric regression models with endogenous regressors," Journal of Econometrics, Elsevier, vol. 170(2), pages 491-498.
    2. Heckman, James J, 1974. "Shadow Prices, Market Wages, and Labor Supply," Econometrica, Econometric Society, vol. 42(4), pages 679-694, July.
    3. Chen, Songnian & Khan, Shakeeb, 2003. "Semiparametric Estimation Of A Heteroskedastic Sample Selection Model," Econometric Theory, Cambridge University Press, vol. 19(6), pages 1040-1064, December.
    4. Esfandiar Maasoumi & Le Wang, 2019. "The Gender Gap between Earnings Distributions," Journal of Political Economy, University of Chicago Press, vol. 127(5), pages 2438-2504.
    5. Wolfgang Dauth & Johann Eppelsheimer, 2020. "Preparing the sample of integrated labour market biographies (SIAB) for scientific analysis: a guide," Journal for Labour Market Research, Springer;Institute for Employment Research/ Institut für Arbeitsmarkt- und Berufsforschung (IAB), vol. 54(1), pages 1-14, December.
    6. Breunig, Christoph & Mammen, Enno & Simoni, Anna, 2018. "Nonparametric estimation in case of endogenous selection," Journal of Econometrics, Elsevier, vol. 202(2), pages 268-285.
    7. Esmeralda A. Ramalho & Richard J. Smith, 2013. "Discrete Choice Non-Response," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 80(1), pages 343-364.
    8. Dauth, Wolfgang & Eppelsheimer, Johann, 2020. "Preparing the sample of integrated labour market biographies (SIAB) for scientific analysis," Journal for Labour Market Research, Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany], vol. 54(1), pages 1-10.
    9. Peter Hall & Jeff Racine & Qi Li, 2004. "Cross-Validation and the Estimation of Conditional Probability Densities," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 1015-1026, December.
    10. Koenker, Roger W & Bassett, Gilbert, Jr, 1978. "Regression Quantiles," Econometrica, Econometric Society, vol. 46(1), pages 33-50, January.
    11. Gronau, Reuben, 1974. "Wage Comparisons-A Selectivity Bias," Journal of Political Economy, University of Chicago Press, vol. 82(6), pages 1119-1143, Nov.-Dec.
    12. Gong Tang, 2003. "Analysis of multivariate missing data with nonignorable nonresponse," Biometrika, Biometrika Trust, vol. 90(4), pages 747-764, December.
    13. Christoph Breunig & Xiaohong Chen, 2020. "Adaptive, Rate-Optimal Hypothesis Testing in Nonparametric IV Models," Papers 2006.09587, arXiv.org, revised Nov 2024.
    14. Donald, Stephen G., 1995. "Two-step estimation of heteroskedastic sample selection models," Journal of Econometrics, Elsevier, vol. 65(2), pages 347-380, February.
    15. Jiwei Zhao & Jun Shao, 2015. "Semiparametric Pseudo-Likelihoods in Generalized Linear Models With Nonignorable Missing Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1577-1590, December.
    16. A. D. Roy, 1951. "Some Thoughts On The Distribution Of Earnings," Oxford Economic Papers, Oxford University Press, vol. 3(2), pages 135-146.
    17. Ahn, Hyungtaik & Powell, James L., 1993. "Semiparametric estimation of censored selection models with a nonparametric selection mechanism," Journal of Econometrics, Elsevier, vol. 58(1-2), pages 3-29, July.
    18. Mitali Das & Whitney K. Newey & Francis Vella, 2003. "Nonparametric Estimation of Sample Selection Models," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 70(1), pages 33-58.
    19. repec:mpr:mprres:8160 is not listed on IDEAS
    20. Bo E. Honoré & Luojia Hu, 2020. "Selection Without Exclusion," Econometrica, Econometric Society, vol. 88(3), pages 1007-1029, May.
    21. Devereux, Paul J., 2002. "The Importance of Obtaining a High-Paying Job," MPRA Paper 49326, University Library of Munich, Germany.
    22. James Heckman, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    23. Christoph Breunig & Xiaohong Chen, 2024. "Adaptive, Rate‐Optimal Hypothesis Testing in Nonparametric IV Models," Econometrica, Econometric Society, vol. 92(6), pages 2027-2067, November.
    24. Victor Chernozhukov & Christian Hansen, 2005. "An IV Model of Quantile Treatment Effects," Econometrica, Econometric Society, vol. 73(1), pages 245-261, January.
    25. Hannes Schwandt & Till von Wachter, 2019. "Unlucky Cohorts: Estimating the Long-Term Effects of Entering the Labor Market in a Recession in Large Cross-Sectional Data Sets," Journal of Labor Economics, University of Chicago Press, vol. 37(S1), pages 161-198.
    26. Boryana Ilieva & Katharina Wrohlich, 2022. "Gender Gaps in Employment, Working Hours and Wages in Germany: Trends and Developments Over the Last 35 Years," CESifo Forum, ifo Institute - Leibniz Institute for Economic Research at the University of Munich, vol. 23(02), pages 17-19, March.
    27. Racine, Jeff & Li, Qi, 2004. "Nonparametric estimation of regression functions with both categorical and continuous data," Journal of Econometrics, Elsevier, vol. 119(1), pages 99-130, March.
    28. Philip Oreopoulos & Till von Wachter & Andrew Heisz, 2012. "The Short- and Long-Term Career Effects of Graduating in a Recession," American Economic Journal: Applied Economics, American Economic Association, vol. 4(1), pages 1-29, January.
    29. Francine D. Blau & Lawrence M. Kahn & Nikolai Boboshko & Matthew Comey, 2024. "The Impact of Selection into the Labor Force on the Gender Wage Gap," Journal of Labor Economics, University of Chicago Press, vol. 42(4), pages 1093-1133.
    30. repec:iab:iabjlr:v:54:i:1:p:art.10 is not listed on IDEAS
    31. Xiaohong Chen & Timothy M. Christensen, 2018. "Optimal sup‐norm rates and uniform inference on nonlinear functionals of nonparametric IV regression," Quantitative Economics, Econometric Society, vol. 9(1), pages 39-84, March.
    32. Heckman, James J, 1991. "Identifying the Hand of the Past: Distinguishing State Dependence from Heterogeneity," American Economic Review, American Economic Association, vol. 81(2), pages 75-79, May.
    33. Aiai Yu & Yujie Zhong & Xingdong Feng & Ying Wei, 2023. "Quantile regression for nonignorable missing data with its application of analyzing electronic medical records," Biometrics, The International Biometric Society, vol. 79(3), pages 2036-2049, September.
    34. Manuel Arellano & Stéphane Bonhomme, 2017. "Quantile Selection Models With an Application to Understanding Changes in Wage Inequality," Econometrica, Econometric Society, vol. 85, pages 1-28, January.
    35. Pollard, David, 1991. "Asymptotics for Least Absolute Deviation Regression Estimators," Econometric Theory, Cambridge University Press, vol. 7(2), pages 186-199, June.
    36. Claudia Goldin, 2014. "A Grand Gender Convergence: Its Last Chapter," American Economic Review, American Economic Association, vol. 104(4), pages 1091-1119, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Huber, Martin & Melly, Blaise, 2011. "Quantile Regression in the Presence of Sample Selection," Economics Working Paper Series 1109, University of St. Gallen, School of Economics and Political Science.
    2. Liu, Ruixuan & Yu, Zhengfei, 2022. "Sample selection models with monotone control functions," Journal of Econometrics, Elsevier, vol. 226(2), pages 321-342.
    3. D’Haultfœuille, Xavier & Maurel, Arnaud & Zhang, Yichong, 2018. "Extremal quantile regressions for selection models and the black–white wage gap," Journal of Econometrics, Elsevier, vol. 203(1), pages 129-142.
    4. Breunig, Christoph & Mammen, Enno & Simoni, Anna, 2018. "Nonparametric estimation in case of endogenous selection," Journal of Econometrics, Elsevier, vol. 202(2), pages 268-285.
    5. Fan Wu & Yi Xin, 2024. "Estimating Nonseparable Selection Models: A Functional Contraction Approach," Papers 2411.01799, arXiv.org, revised Dec 2025.
    6. Patricia Gallego Granados, 2019. "The Part-Time Wage Gap across the Wage Distribution," Discussion Papers of DIW Berlin 1791, DIW Berlin, German Institute for Economic Research.
    7. Qi Li & Jeffrey Scott Racine, 2006. "Nonparametric Econometrics: Theory and Practice," Economics Books, Princeton University Press, edition 1, volume 1, number 8355.
    8. Casey B. Mulligan & Yona Rubinstein, 2004. "The Closing of the Gender Gap as a Roy Model Illusion," NBER Working Papers 10892, National Bureau of Economic Research, Inc.
    9. Martin Huber & Blaise Melly, 2012. "A test of the conditional independence assumption in sample selection models," Working Papers 2012-11, Brown University, Department of Economics.
    10. Manuel Arellano & Stéphane Bonhomme, 2017. "Quantile Selection Models With an Application to Understanding Changes in Wage Inequality," Econometrica, Econometric Society, vol. 85, pages 1-28, January.
    11. Masayuki Hirukawa & Di Liu & Irina Murtazashvili & Artem Prokhorov, 2024. "DS-HECK: double-lasso estimation of Heckman selection model," Advanced Studies in Theoretical and Applied Econometrics, in: Subal C. Kumbhakar & Robin C. Sickles & Hung-Jen Wang (ed.), Advances in Applied Econometrics, pages 711-739, Springer.
    12. Lukáš Lafférs & Bernhard Schmidpeter, 2021. "Early child development and parents' labor supply," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 36(2), pages 190-208, March.
    13. Lewbel, Arthur, 2007. "Endogenous selection or treatment model estimation," Journal of Econometrics, Elsevier, vol. 141(2), pages 777-806, December.
    14. Breunig, Christoph & Haan, Peter, 2021. "Nonparametric regression with selectively missing covariates," Journal of Econometrics, Elsevier, vol. 223(1), pages 28-52.
    15. Zhewen Pan & Yifan Zhang, 2024. "Locally robust semiparametric estimation of sample selection models without exclusion restrictions," Papers 2412.01208, arXiv.org.
    16. Richard Blundell & Amanda Gosling & Hidehiko Ichimura & Costas Meghir, 2007. "Changes in the Distribution of Male and Female Wages Accounting for Employment Composition Using Bounds," Econometrica, Econometric Society, vol. 75(2), pages 323-363, March.
    17. Christoph Breunig & Peter Haan, 2018. "Nonparametric Regression with Selectively Missing Covariates," Papers 1810.00411, arXiv.org, revised Oct 2020.
    18. Martin Huber & Giovanni Mellace, 2014. "Testing exclusion restrictions and additive separability in sample selection models," Empirical Economics, Springer, vol. 47(1), pages 75-92, August.
    19. Elass, Kenza, 2024. "Male and female selection effects on gender wage gaps in three countries," Labour Economics, Elsevier, vol. 87(C).
    20. Iván Fernández‐Val & Aico van Vuuren & Francis Vella & Franco Peracchi, 2023. "Selection and the distribution of female real hourly wages in the United States," Quantitative Economics, Econometric Society, vol. 14(2), pages 571-607, May.
