Printed from https://ideas.repec.org/p/hal/cesptp/halshs-03325118.html

Biases on variances estimated on large data-sets

Author

Listed:
  • François Gardes

    (CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique, UP1 - Université Paris 1 Panthéon-Sorbonne, PSE - Paris School of Economics - ENPC - École des Ponts ParisTech - ENS Paris - École normale supérieure - Paris - PSL - Université Paris sciences et lettres - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique - EHESS - École des hautes études en sciences sociales - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement, UCO - Université Catholique de l'Ouest)

Abstract

The inverse dependence of estimated variances on the sample size raises a fundamental question about the validity of the usual statistical methodology, since any hypothesis on the value of a coefficient can be rejected by increasing the size of the data-set. I suppose that large data-sets are characterized by a concentration of information on homogeneous sub-populations, so that spatial autocorrelation of the error terms and of the covariates may bias the estimation of variances. Using the corrections of variances under spatial autocorrelation, I obtain variances comparable to those estimated on sub-samples (named efficient sub-samples) whose sizes are sufficient to contain the information that gives rise to estimates similar to those obtained on the whole population. Moreover, estimation on efficient data-sets does not require specifying the spatial autocorrelations that are supposed to bias the estimated variances.
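
The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes the textbook equal-cluster-size variance inflation factor 1 + (m - 1) * rho_error * rho_x associated with the grouped-data literature (Moulton), and every function name and parameter value below is hypothetical. It shows the naive standard error shrinking like 1/sqrt(n), while the corrected standard error and the implied effective sample size level off once extra observations only enlarge already homogeneous groups.

```python
import math


def naive_se(sigma: float, n: int) -> float:
    """Standard error under i.i.d. errors: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def moulton_factor(cluster_size: int, rho_error: float, rho_x: float = 1.0) -> float:
    """Variance inflation for equal-sized clusters (assumed textbook form).

    rho_error: intra-cluster correlation of the error term.
    rho_x:     intra-cluster correlation of the covariate (1.0 if it is
               constant within each cluster).
    """
    return 1.0 + (cluster_size - 1) * rho_error * rho_x


def corrected_se(sigma: float, n: int, cluster_size: int,
                 rho_error: float, rho_x: float = 1.0) -> float:
    """Naive standard error scaled by the square root of the inflation factor."""
    factor = moulton_factor(cluster_size, rho_error, rho_x)
    return naive_se(sigma, n) * math.sqrt(factor)


def effective_sample_size(n: int, cluster_size: int,
                          rho_error: float, rho_x: float = 1.0) -> float:
    """Number of independent observations carrying the same information."""
    return n / moulton_factor(cluster_size, rho_error, rho_x)


if __name__ == "__main__":
    n_clusters = 100   # hypothetical: number of homogeneous sub-populations
    rho = 0.05         # hypothetical intra-cluster error correlation
    sigma = 1.0        # hypothetical error standard deviation
    for m in (10, 1_000, 100_000):
        n = n_clusters * m  # the data-set grows only through larger clusters
        print(
            f"n={n:>10}  naive_se={naive_se(sigma, n):.5f}  "
            f"corrected_se={corrected_se(sigma, n, m, rho):.5f}  "
            f"n_eff={effective_sample_size(n, m, rho):.1f}"
        )
```

Under these assumptions the effective sample size is bounded above by n_clusters / rho (here 2000) however large n becomes, which is one way to read the idea that a sub-sample of sufficient size already contains the information of the whole data-set.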

Suggested Citation

  • François Gardes, 2021. "Biases on variances estimated on large data-sets," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-03325118, HAL.
  • Handle: RePEc:hal:cesptp:halshs-03325118
    Note: View the original document on HAL open archive server: https://halshs.archives-ouvertes.fr/halshs-03325118

    Download full text from publisher

    File URL: https://halshs.archives-ouvertes.fr/halshs-03325118/document
    Download Restriction: no

    References listed on IDEAS

    1. MacKinnon, James G., 2015. "Wild Cluster Bootstrap Confidence Intervals," L'Actualité Economique, Société Canadienne de Science Economique, vol. 91(1-2), pages 11-33, Mars-Juin.
    2. Adrian C. Darnell & J. L. Evans, 1990. "The Limits of Econometrics," Books, Edward Elgar Publishing, number 119.
    3. McCloskey, Donald N, 1985. "The Loss Function Has Been Mislaid: The Rhetoric of Significance Tests," American Economic Review, American Economic Association, vol. 75(2), pages 201-205, May.
    4. Greenwald, Bruce C., 1983. "A general analysis of bias in the estimated standard errors of least squares coefficients," Journal of Econometrics, Elsevier, vol. 22(3), pages 323-338, August.
    5. Moulton, Brent R, 1987. "Diagnostics for Group Effects in Regression Analysis," Journal of Business & Economic Statistics, American Statistical Association, vol. 5(2), pages 275-282, April.
    6. Moulton, Brent R., 1986. "Random group effects and the precision of regression estimates," Journal of Econometrics, Elsevier, vol. 32(3), pages 385-397, August.
    7. Moulton, Brent R, 1990. "An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units," The Review of Economics and Statistics, MIT Press, vol. 72(2), pages 334-338, May.
    8. François Gardes, 2019. "The Estimation of Price Elasticities and the Value of Time in a Domestic Production Framework: an Application using French Micro-Data," Annals of Economics and Statistics, GENES, issue 135, pages 89-120.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. François Gardes, 2021. "Biases on variances estimated on large data-sets," Documents de travail du Centre d'Economie de la Sorbonne 21022, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
    2. François Gardes, 2021. "Biases on variances estimated on large data-sets," Post-Print halshs-03325118, HAL.
    3. David G. Blanchflower & Andrew Oswald, 1995. "International Wage Curves," NBER Chapters, in: Differences and Changes in Wage Structures, pages 145-174, National Bureau of Economic Research, Inc.
    4. Liu, Can & Mullan, Katrina & Liu, Hao & Zhu, Wenqing & Rong, Qingjiao, 2014. "The estimation of long term impacts of China's key priority forestry programs on rural household incomes," Journal of Forest Economics, Elsevier, vol. 20(3), pages 267-285.
    5. A. Colin Cameron & Douglas L. Miller, 2010. "Robust Inference with Clustered Data," Working Papers 318, University of California, Davis, Department of Economics.
    6. Blanchflower, David G & Oswald, Andrew J, 1994. "Estimating a Wage Curve for Britain: 1973-90," Economic Journal, Royal Economic Society, vol. 104(426), pages 1025-1043, September.
    7. Alejo, Javier & Montes-Rojas, Gabriel & Sosa-Escudero, Walter, 2018. "Testing for serial correlation in hierarchical linear models," Journal of Multivariate Analysis, Elsevier, vol. 165(C), pages 101-116.
    8. A. Colin Cameron & Jonah B. Gelbach & Douglas L. Miller, 2008. "Bootstrap-Based Improvements for Inference with Clustered Errors," The Review of Economics and Statistics, MIT Press, vol. 90(3), pages 414-427, August.
    9. Vikström, Johan, 2009. "Cluster sample inference using sensitivity analysis: the case with few groups," Working Paper Series 2009:15, IFAU - Institute for Evaluation of Labour Market and Education Policy.
    10. Kolasa Marcin, 2008. "How does FDI inflow affect productivity of domestic firms? The role of horizontal and vertical spillovers, absorptive capacity and competition," The Journal of International Trade & Economic Development, Taylor & Francis Journals, vol. 17(1), pages 155-173.
    11. Moulton, Brent R & Randolph, William C, 1989. "Alternative Tests of the Error Components Model," Econometrica, Econometric Society, vol. 57(3), pages 685-693, May.
    12. Hao, Can Liu & Mullan, Katrina & Rong, Qingjiao & Zhu, Wenqing, 2013. "Have the Key Priority Forestry Programs Really Impacted on China’s Rural Household Income," PEP Working Papers 160429, Partnership for Economic Policy (PEP).
    13. Montes-Rojas, Gabriel, 2016. "An equicorrelation Moulton factor in the presence of arbitrary intra-cluster correlation," Economics Letters, Elsevier, vol. 145(C), pages 221-224.
    14. Can Liu Hao & Katrina Mullan & Qingjiao Rong & Wenqing Zhu, 2013. "Have the Key Priority Forestry Programs Really Impacted on China's Rural Household Income," Working Papers PIERI 2013-08, PEP-PIERI.
    15. Wasmer, Etienne, 2006. "The Economics of Prozac (Do Employees Really Gain from Strong Employment Protection?)," CEPR Discussion Papers 5991, C.E.P.R. Discussion Papers.
    16. James G. MacKinnon & Matthew D. Webb, 2020. "When and How to Deal with Clustered Errors in Regression Models," Working Paper 1421, Economics Department, Queen's University.
    17. Stephen Drinkwater, 2003. "Go West? Assessing the willingness to move from Central and Eastern European Countries," School of Economics Discussion Papers 0503, School of Economics, University of Surrey.
    18. Valentine Fays & Benoît Mahy & François Rycx, 2021. "Wage Differences According to Workers’ Origin: The Role of Working More Upstream in GVCs," Working Papers CEB 21-016, ULB -- Universite Libre de Bruxelles.
    19. Tarun Khanna & Krishna Palepu, 1999. "Policy Shocks, Market Intermediaries, and Corporate Strategy: The Evolution of Business Groups in Chile and India," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 8(2), pages 271-310, June.
    20. A. Colin Cameron & Jonah B. Gelbach & Douglas L. Miller, 2011. "Robust Inference With Multiway Clustering," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 29(2), pages 238-249, April.

    More about this item

    Keywords

    dataset; estimated variance; spatial autocorrelation; grouped observations;
