
Sparse Linear Models and Two-Stage Estimation in High-Dimensional Settings with Possibly Many Endogenous Regressors


  • Zhu, Ying


This paper explores the validity of the two-stage estimation procedure for sparse linear models in high-dimensional settings with possibly many endogenous regressors. In particular, the number of endogenous regressors in the main equation and the number of instruments in the first-stage equations can grow with and exceed the sample size n. The analysis concerns the exact sparsity case, i.e., the maximum number of non-zero components in the vectors of parameters in the first-stage equations, k1, and the number of non-zero components in the vector of parameters in the second-stage equation, k2, are allowed to grow with n, but slowly compared to n. I consider the high-dimensional version of the two-stage least squares estimator, where one obtains the fitted regressors from the first-stage regression by a least squares estimator with l_1-regularization (the Lasso or Dantzig selector) when the first-stage regression involves a large number of instruments relative to n, and then constructs a similar estimator using these fitted regressors in the second-stage regression. The main theoretical results of this paper are non-asymptotic bounds, from which I establish sufficient scaling conditions on the sample size for estimation consistency in l_2-norm and for variable-selection consistency (i.e., the two-stage high-dimensional estimators correctly select the non-zero coefficients in the main equation with high probability). A technical issue regarding the so-called "restricted eigenvalue (RE) condition" for estimation consistency and the "mutual incoherence (MI) condition" for selection consistency arises in the two-stage estimation because the number of regressors in the main equation is allowed to exceed n, and this paper provides analysis to verify these RE and MI conditions. Depending on the underlying assumptions that are imposed, the upper bounds on the l_2-error and the sample size required to obtain these consistency results differ by factors involving k1 and/or k2. Simulations are conducted to gain insight into the finite-sample performance of the high-dimensional two-stage estimator.
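The two-stage procedure described in the abstract can be illustrated with a minimal sketch: each endogenous regressor is first regressed on the instruments by the Lasso, and the outcome is then regressed on the fitted regressors by a second Lasso. Everything specific here is an assumption for illustration and is not taken from the paper: the function names (`soft_threshold`, `lasso_cd`, `two_stage_lasso`), the plain coordinate-descent solver, the simulated design, and the heuristic penalty levels `lam1` and `lam2`. The dimensions are kept small so that the example runs quickly.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_sweeps=50):
    """Approximately minimize (1/(2n)) * ||y - X b||_2^2 + lam * ||b||_1
    by cyclic coordinate descent (fixed number of sweeps, no convergence check)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * ||X_j||^2 for each column
    r = np.asarray(y, dtype=float).copy()  # residual y - X b (b starts at zero)
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                   # all-zero column: coefficient stays 0
            r += X[:, j] * b[j]            # partial residual without coordinate j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]            # restore the full residual
    return b


def two_stage_lasso(y, X, Z, lam1, lam2):
    """Stage 1: Lasso of every column of X on the instruments Z.
    Stage 2: Lasso of y on the fitted regressors Z @ Pi_hat."""
    Pi_hat = np.column_stack(
        [lasso_cd(Z, X[:, j], lam1) for j in range(X.shape[1])]
    )
    X_hat = Z @ Pi_hat
    beta_hat = lasso_cd(X_hat, y, lam2)
    return beta_hat, Pi_hat


# Hypothetical simulated design (an assumption, not the paper's simulation):
# d instruments and p regressors both exceed n; k1 and k2 are sparsity levels.
rng = np.random.default_rng(0)
n, d, p, k1, k2 = 50, 60, 55, 2, 3
Z = rng.standard_normal((n, d))
Pi = np.zeros((d, p))
for j in range(p):
    Pi[rng.choice(d, size=k1, replace=False), j] = 1.0  # k1-sparse first stage
V = 0.5 * rng.standard_normal((n, p))
X = Z @ Pi + V
beta = np.zeros(p)
beta[:k2] = 1.0                                          # k2-sparse main equation
eps = 0.5 * V[:, 0] + 0.3 * rng.standard_normal(n)       # correlated with X[:, 0]
y = X @ beta + eps

# Heuristic penalty levels of order sqrt(log(dimension) / n); an assumption.
lam1 = 0.5 * np.sqrt(np.log(d) / n)
lam2 = 0.5 * np.sqrt(np.log(p) / n)
beta_hat, Pi_hat = two_stage_lasso(y, X, Z, lam1, lam2)
print("true support:    ", np.flatnonzero(beta))
print("selected support:", np.flatnonzero(beta_hat))
```

In this sketch the error term is built from the first-stage noise of the first regressor, so that regressor is endogenous while the instruments remain uncorrelated with the error. Whether the selected support matches the true one depends on the penalty levels and the design, which is the kind of question the paper's scaling conditions address.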

Suggested Citation

  • Zhu, Ying, 2013. "Sparse Linear Models and Two-Stage Estimation in High-Dimensional Settings with Possibly Many Endogenous Regressors," MPRA Paper 49846, University Library of Munich, Germany.
  • Handle: RePEc:pra:mprapa:49846



    More about this item

    Keywords:

    High-dimensional statistics; Lasso; sparse linear models; endogeneity; two-stage estimation

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General
    • C13 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Estimation: General
    • C31 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Cross-Sectional Models; Spatial Models; Treatment Effect Models; Quantile Regressions; Social Interaction Models
    • C36 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Instrumental Variables (IV) Estimation
