Printed from https://ideas.repec.org/p/arx/papers/2103.00631.html

On the Subbagging Estimation for Massive Data

Author

Listed:
  • Tao Zou
  • Xian Li
  • Xuan Liang
  • Hansheng Wang

Abstract

This article introduces subbagging (subsample aggregating) estimation approaches for big data analysis under computer memory constraints. Specifically, from the whole dataset of size $N$, $m_N$ subsamples are randomly drawn; each subsample is drawn uniformly without replacement and has size $k_N\ll N$ so that it meets the memory constraint. Aggregating the estimators from the $m_N$ subsamples yields the subbagging estimator. To analyze the theoretical properties of the subbagging estimator, we adapt the theory of incomplete $U$-statistics with an infinite-order kernel to allow overlapping subsamples in the sampling procedure. Using this novel theoretical framework, we demonstrate that, with a proper choice of the hyperparameters $k_N$ and $m_N$, the subbagging estimator achieves $\sqrt{N}$-consistency and asymptotic normality under the condition $(k_Nm_N)/N\to \alpha \in (0,\infty]$. Compared with the full sample estimator, we show theoretically that the $\sqrt{N}$-consistent subbagging estimator has an inflation rate of $1/\alpha$ in its asymptotic variance. Simulation experiments are presented to demonstrate finite-sample performance. An American airline dataset is analyzed to illustrate that the subbagging estimate is numerically close to the full sample estimate and can be computed quickly under the memory constraint.
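
The sampling-and-aggregating procedure described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `subbagging_estimate`, the use of the sample mean as the per-subsample estimator, the simulated Gaussian data, and the particular values of `N`, `k`, and `m` are all assumptions made here for demonstration.

```python
import random
import statistics


def subbagging_estimate(data, k, m, estimator=statistics.fmean, seed=0):
    """Average an estimator over m subsamples of size k (hypothetical helper).

    Each subsample is drawn uniformly without replacement from `data`;
    different subsamples are drawn independently, so they may overlap.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(m):
        # random.sample draws k distinct elements, i.e. without replacement.
        subsample = rng.sample(data, k)
        estimates.append(estimator(subsample))
    # Aggregate the m subsample estimates by simple averaging.
    return statistics.fmean(estimates)


# Simulated "whole dataset" (assumption: i.i.d. Gaussian with mean 1.0).
gen = random.Random(42)
N = 20000
data = [gen.gauss(1.0, 1.0) for _ in range(N)]

# Subsample size k << N and number of subsamples m (illustrative values).
k, m = 500, 80
alpha = k * m / N  # the ratio (k_N * m_N) / N from the abstract

theta_sub = subbagging_estimate(data, k, m)
theta_full = statistics.fmean(data)

print(f"alpha = {alpha}")
print(f"subbagging estimate  = {theta_sub:.4f}")
print(f"full sample estimate = {theta_full:.4f}")
```

Only `k` observations are passed to the estimator at any one time, which is the point of the memory constraint. With these illustrative values the ratio $(k_N m_N)/N$ equals 2; other estimators can be plugged in through the `estimator` argument.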

Suggested Citation

  • Tao Zou & Xian Li & Xuan Liang & Hansheng Wang, 2021. "On the Subbagging Estimation for Massive Data," Papers 2103.00631, arXiv.org.
  • Handle: RePEc:arx:papers:2103.00631

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2103.00631
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mogliani, Matteo & Simoni, Anna, 2021. "Bayesian MIDAS penalized regressions: Estimation, selection, and prediction," Journal of Econometrics, Elsevier, vol. 222(1), pages 833-860.
    2. Nizar Allouch, 2017. "Aggregation in Networks," Studies in Economics 1718, School of Economics, University of Kent.
    3. Pablo Guillen & Róbert F. Veszteg, 2021. "Strategy-proofness in experimental matching markets," Experimental Economics, Springer;Economic Science Association, vol. 24(2), pages 650-668, June.
    4. Martín Almuzara & Gabriele Fiorentini & Enrique Sentana, 2023. "Aggregate Output Measurements: A Common Trend Approach," Advances in Econometrics, in: Essays in Honor of Joon Y. Park: Econometric Methodology in Empirical Applications, volume 45, pages 3-33, Emerald Group Publishing Limited.
    5. Yann Bramoullé & Habiba Djebbari & Bernard Fortin, 2020. "Peer Effects in Networks: A Survey," Annual Review of Economics, Annual Reviews, vol. 12(1), pages 603-629, August.
    6. Guido M. Kuersteiner & Ingmar R. Prucha, 2020. "Dynamic Spatial Panel Models: Networks, Common Shocks, and Sequential Exogeneity," Econometrica, Econometric Society, vol. 88(5), pages 2109-2146, September.
    7. Jungbin Hwang & Gonzalo Valdés, 2020. "Low Frequency Cointegrating Regression in the Presence of Local to Unity Regressors and Unknown Form of Serial Dependence," Working papers 2020-03, University of Connecticut, Department of Economics, revised Aug 2020.
    8. Caggiano, Giovanni & Castelnuovo, Efrem & Delrio, Silvia & Kima, Richard, 2021. "Financial uncertainty and real activity: The good, the bad, and the ugly," European Economic Review, Elsevier, vol. 136(C).
    9. Raffaella Giacomini & Toru Kitagawa, 2021. "Robust Bayesian Inference for Set‐Identified Models," Econometrica, Econometric Society, vol. 89(4), pages 1519-1556, July.
    10. Francesca Molinari, 2020. "Microeconometrics with Partial Identification," CeMMAP working papers CWP15/20, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    11. Marco Stenborg Petterson & David Seim & Jesse M. Shapiro, 2023. "Bounds on a Slope from Size Restrictions on Economic Shocks," American Economic Journal: Microeconomics, American Economic Association, vol. 15(3), pages 552-572, August.
    12. Liza Charroin, 2018. "Homophily, peer effects and dishonesty," Post-Print halshs-01993618, HAL.
    13. Crawford, Vincent P., 2021. "Efficient mechanisms for level-k bilateral trading," Games and Economic Behavior, Elsevier, vol. 127(C), pages 80-101.
    14. Vitor Possebom, 2021. "Crime and Mismeasured Punishment: Marginal Treatment Effect with Misclassification," Papers 2106.00536, arXiv.org, revised Jul 2023.
    15. Dirk Bergemann & Juuso Välimäki, 2019. "Dynamic Mechanism Design: An Introduction," Journal of Economic Literature, American Economic Association, vol. 57(2), pages 235-274, June.
    16. Chen, Mingli & Fernández-Val, Iván & Weidner, Martin, 2021. "Nonlinear factor models for network and panel data," Journal of Econometrics, Elsevier, vol. 220(2), pages 296-324.
    17. Raffaella Giacomini & Toru Kitagawa & Matthew Read, 2021. "Identification and Inference Under Narrative Restrictions," Papers 2102.06456, arXiv.org.
    18. Atsushi Inoue & Lutz Kilian, 2020. "The Role of the Prior in Estimating VAR Models with Sign Restrictions," Working Papers 2030, Federal Reserve Bank of Dallas.
    19. Alaa Abi Morshed & Elena Andreou & Otilia Boldea, 2018. "Structural Break Tests Robust to Regression Misspecification," Econometrics, MDPI, vol. 6(2), pages 1-39, May.
    20. de Paula, Aureo & Rasul, Imran & Souza, Pedro, 2018. "Identifying Network Ties from Panel Data: Theory and an Application to Tax Competition," CEPR Discussion Papers 12792, C.E.P.R. Discussion Papers.

