Printed from https://ideas.repec.org/p/hal/journl/halshs-03325118.html

Biases on variances estimated on large data-sets

Author

Listed:
  • François Gardes

    (CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique, UP1 - Université Paris 1 Panthéon-Sorbonne, PSE - Paris School of Economics - UP1 - Université Paris 1 Panthéon-Sorbonne - ENS-PSL - École normale supérieure - Paris - PSL - Université Paris sciences et lettres - EHESS - École des hautes études en sciences sociales - ENPC - École des Ponts ParisTech - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement, UCO - Université Catholique de l'Ouest)

Abstract

The inverse dependence of estimated variances on sample size raises a fundamental question about the validity of the usual statistical methodology, since any hypothesis on the value of a coefficient can be rejected by increasing the size of the data-set. I suppose that large data-sets are characterized by a concentration of information on homogeneous sub-populations, so that spatial autocorrelation of the error terms and of the covariates may bias the estimation of variances. Correcting the variances for spatial autocorrelation yields variances comparable to those estimated on sub-samples (called efficient sub-samples) whose sizes are sufficient to contain the information that gives rise to estimates similar to those obtained on the whole population. Moreover, estimation on efficient data-sets does not require specifying the spatial autocorrelations that are supposed to bias the estimated variances.
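
The mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's procedure: it assumes the simplest grouped-error design (equal-sized groups, a covariate that is constant within groups, equicorrelated errors), and the function names, parameter values and the data-generating process are all hypothetical. It compares the conventional OLS standard error of a slope with the actual sampling spread of that slope, and with the variance inflation factor 1 + (m - 1) * rho_x * rho_u known from the grouped-error literature (Kloek 1981, Moulton 1986), where m is the group size and rho_x, rho_u are the intra-group correlations of the covariate and of the errors.

```python
import numpy as np


def simulate_slope_se(n_groups=50, group_size=40, rho=0.3, n_reps=500, seed=0):
    """Return (Monte Carlo SD of the OLS slope, mean conventional OLS SE).

    Assumed design: errors are equicorrelated within groups (correlation rho)
    and the covariate is constant within each group.
    """
    rng = np.random.default_rng(seed)
    n = n_groups * group_size
    groups = np.repeat(np.arange(n_groups), group_size)

    # Group-level covariate, drawn once and held fixed across replications.
    x = rng.normal(size=n_groups)[groups]
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)

    slopes = np.empty(n_reps)
    naive_se = np.empty(n_reps)
    for r in range(n_reps):
        # Error = shared group shock + idiosyncratic shock, total variance 1.
        u = (np.sqrt(rho) * rng.normal(size=n_groups)[groups]
             + np.sqrt(1.0 - rho) * rng.normal(size=n))
        y = 1.0 + 0.5 * x + u

        beta = XtX_inv @ (X.T @ y)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - 2)

        slopes[r] = beta[1]
        # Conventional formula, which treats all n observations as independent.
        naive_se[r] = np.sqrt(sigma2 * XtX_inv[1, 1])

    return slopes.std(ddof=1), naive_se.mean()


def moulton_factor(group_size, rho_x, rho_u):
    """Ratio of the true to the conventional standard error (equal group sizes)."""
    return np.sqrt(1.0 + (group_size - 1) * rho_x * rho_u)
```

Calling `simulate_slope_se()` and dividing the first returned value by the second gives the factor by which the conventional standard error understates the true one in this design; it can be set against `moulton_factor(40, 1.0, 0.3)`. Because the conventional formula keeps shrinking with the total number of observations while the information grows mainly with the number of groups, the gap widens as groups get larger.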

Suggested Citation

  • François Gardes, 2021. "Biases on variances estimated on large data-sets," Post-Print halshs-03325118, HAL.
  • Handle: RePEc:hal:journl:halshs-03325118
    Note: View the original document on HAL open archive server: https://shs.hal.science/halshs-03325118

    Download full text from publisher

    File URL: https://shs.hal.science/halshs-03325118/document
    Download Restriction: no

    References listed on IDEAS

    1. MacKinnon, James G., 2020. "Wild cluster bootstrap confidence intervals," L'Actualité Economique, Société Canadienne de Science Economique, vol. 96(4), pages 721-743, Décembre.
    2. McCloskey, Donald N, 1985. "The Loss Function Has Been Mislaid: The Rhetoric of Significance Tests," American Economic Review, American Economic Association, vol. 75(2), pages 201-205, May.
    3. Greenwald, Bruce C., 1983. "A general analysis of bias in the estimated standard errors of least squares coefficients," Journal of Econometrics, Elsevier, vol. 22(3), pages 323-338, August.
    4. François Gardes, 2019. "The Estimation of Price Elasticities and the Value of Time in a Domestic Production Framework: an Application using French Micro-Data," Annals of Economics and Statistics, GENES, issue 135, pages 89-120.
    5. Adrian C. Darnell & J. L. Evans, 1990. "The Limits of Econometrics," Books, Edward Elgar Publishing, number 119.
    6. François Gardes, 2019. "The Estimation of Price Elasticities and the Value of Time in a Domestic Production Framework: an Application using French Micro-Data," PSE-Ecole d'économie de Paris (Postprint) hal-03281830, HAL.
    7. François Gardes, 2019. "The Estimation of Price Elasticities and the Value of Time in a Domestic Production Framework: an Application using French Micro-Data," Post-Print hal-03281830, HAL.
    8. Moulton, Brent R, 1987. "Diagnostics for Group Effects in Regression Analysis," Journal of Business & Economic Statistics, American Statistical Association, vol. 5(2), pages 275-282, April.
    9. Moulton, Brent R., 1986. "Random group effects and the precision of regression estimates," Journal of Econometrics, Elsevier, vol. 32(3), pages 385-397, August.
    10. Kloek, T, 1981. "OLS Estimation in a Model Where a Microvariable Is Explained by Aggregates and Contemporaneous Disturbances Are Equicorrelated," Econometrica, Econometric Society, vol. 49(1), pages 205-207, January.
    11. François Gardes, 2019. "The Estimation of Price Elasticities and the Value of Time in a Domestic Production Framework: an Application using French Micro-Data," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) hal-03281830, HAL.
    12. Moulton, Brent R, 1990. "An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Unit," The Review of Economics and Statistics, MIT Press, vol. 72(2), pages 334-338, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. François Gardes, 2021. "Biases on variances estimated on large data-sets," Documents de travail du Centre d'Economie de la Sorbonne 21022, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
    2. François Gardes, 2021. "Biases on variances estimated on large data-sets," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-03325118, HAL.
    3. François Gardes, 2021. "On the value of time and human life," Documents de travail du Centre d'Economie de la Sorbonne 21023, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
    4. David G. Blanchflower & Andrew Oswald, 1995. "International Wage Curves," NBER Chapters, in: Differences and Changes in Wage Structures, pages 145-174, National Bureau of Economic Research, Inc.
    5. François Gardes, 2021. "A Method to infer time preference from the value of time," Post-Print halshs-03289200, HAL.
    6. A. Colin Cameron & Douglas L. Miller, 2010. "Robust Inference with Clustered Data," Working Papers 106, University of California, Davis, Department of Economics.
    7. Thomas Barrios & Rebecca Diamond & Guido W. Imbens & Michal Kolesár, 2012. "Clustering, Spatial Correlations, and Randomization Inference," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(498), pages 578-591, June.
    8. A. Colin Cameron & Jonah B. Gelbach & Douglas L. Miller, 2008. "Bootstrap-Based Improvements for Inference with Clustered Errors," The Review of Economics and Statistics, MIT Press, vol. 90(3), pages 414-427, August.
    9. François Gardes, 2021. "An Austrian Trade Cycle model with an Endogenous Value of Time," Documents de travail du Centre d'Economie de la Sorbonne 21025, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
    10. Vikström, Johan, 2009. "Cluster sample inference using sensitivity analysis: the case with few groups," Working Paper Series 2009:15, IFAU - Institute for Evaluation of Labour Market and Education Policy.
    11. Kolasa Marcin, 2008. "How does FDI inflow affect productivity of domestic firms? The role of horizontal and vertical spillovers, absorptive capacity and competition," The Journal of International Trade & Economic Development, Taylor & Francis Journals, vol. 17(1), pages 155-173.
    12. François Gardes, 2021. "On the value of time and human life," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-03325332, HAL.
    13. Philip R. P. Coelho & Tung Liu, 2012. "The Returns to College Education," Working Papers 201202, Ball State University, Department of Economics, revised Aug 2012.
    14. François Gardes, 2021. "Endogenous Prices in a Riemannian Geometry Framework," Documents de travail du Centre d'Economie de la Sorbonne 21026, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
    15. Liu, Can & Mullan, Katrina & Liu, Hao & Zhu, Wenqing & Rong, Qingjiao, 2014. "The estimation of long term impacts of China's key priority forestry programs on rural household incomes," Journal of Forest Economics, Elsevier, vol. 20(3), pages 267-285.
    16. A. Colin Cameron & Douglas L. Miller, 2010. "Robust Inference with Clustered Data," Working Papers 318, University of California, Davis, Department of Economics.
    17. François Gardes, 2021. "A Method to infer time preference from the value of time," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-03289200, HAL.
    18. Karanfil, Fatih & Pierru, Axel, 2021. "The opportunity cost of domestic oil consumption for an oil exporter: Illustration for Saudi Arabia," Energy Economics, Elsevier, vol. 96(C).
    19. James G. MacKinnon & Matthew D. Webb, 2020. "When and How to Deal with Clustered Errors in Regression Models," Working Paper 1421, Economics Department, Queen's University.

    More about this item

    Keywords

    dataset; estimated variance; spatial autocorrelation; grouped observations;