
Spatial bootstrapped microeconometrics: Forecasting for out‐of‐sample geo‐locations in big data

Author

Listed:
  • Katarzyna Kopczewska

Abstract

Spatial econometric models estimated on big geo‐located point data face at least two problems: limited computational capabilities and inefficient forecasting for new out‐of‐sample geo‐points. Both stem from the computational complexity and from the spatial weights matrix W being defined for in‐sample observations only. Machine learning models face the same limitation when they use kriging for predictions, so the problem remains unsolved. The paper presents a novel methodology for estimating spatial models on big data and predicting at new locations. The approach uses bootstrap and tessellation to calibrate both the model and the space. The best bootstrapped model is selected with the PAM (Partitioning Around Medoids) algorithm by classifying the regression coefficients jointly in a nonindependent manner. Voronoi polygons for the geo‐points used in the best model allow for a representative division of space. New out‐of‐sample points are assigned to tessellation tiles and linked to the spatial weights matrix as replacements for the original points, which makes it feasible to use calibrated spatial models as a forecasting tool for new locations. There is no trade‐off between forecast quality and computational efficiency in this approach. An empirical example illustrates a model for business locations and firms' profitability.
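
The two selection and assignment steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that (a) the "best" bootstrapped model is the medoid of the coefficient vectors, which is what PAM returns for a single cluster (the paper may use more clusters), and (b) assigning a new point to a Voronoi tile is equivalent to finding its nearest in-sample seed point, whose row of W the new point then borrows. The function names and the toy data are hypothetical.

```python
import math


def medoid_index(vectors):
    """Index of the vector with the smallest total Euclidean distance
    to all vectors (the medoid, i.e. PAM with one cluster)."""
    def total_distance(i):
        return sum(math.dist(vectors[i], v) for v in vectors)
    return min(range(len(vectors)), key=total_distance)


def assign_to_tile(new_point, seeds):
    """Index of the nearest seed. By definition of a Voronoi diagram,
    the new point lies in that seed's tile."""
    return min(range(len(seeds)), key=lambda j: math.dist(new_point, seeds[j]))


def w_row_for_new_point(new_point, seeds, W):
    """Row of the in-sample weights matrix W that the new point borrows
    from the seed whose tile it falls into."""
    return W[assign_to_tile(new_point, seeds)]


# Hypothetical coefficient vectors from four bootstrap replications.
coefs = [(1.0, 0.5), (1.1, 0.4), (0.9, 0.6), (3.0, 2.0)]
best = medoid_index(coefs)

# Hypothetical in-sample geo-points of the selected model and a
# row-standardized W defined on them.
seeds = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
W = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]

# An out-of-sample location replaces its nearest seed in W.
new_location = (8.0, 1.0)
tile = assign_to_tile(new_location, seeds)
row = w_row_for_new_point(new_location, seeds, W)
print(best, tile, row)
```

Using the medoid rather than a coefficient-wise average keeps the selected coefficients together as one estimated model, which matches the abstract's point about treating them jointly. The nearest-seed lookup avoids constructing polygon geometry, at the cost of a linear scan per new point in this sketch.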

Suggested Citation

  • Katarzyna Kopczewska, 2023. "Spatial bootstrapped microeconometrics: Forecasting for out‐of‐sample geo‐locations in big data," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 50(3), pages 1391-1419, September.
  • Handle: RePEc:bla:scjsta:v:50:y:2023:i:3:p:1391-1419
    DOI: 10.1111/sjos.12636


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fernando López & Konstatin Kholodilin, 2023. "Putting MARS into space. Non‐linearities and spatial effects in hedonic models," Papers in Regional Science, Wiley Blackwell, vol. 102(4), pages 871-896, August.
    2. Katarzyna Kopczewska, 2022. "Spatial machine learning: new opportunities for regional science," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 68(3), pages 713-755, June.
    3. May, Paul & Biesecker, Matthew & Rekabdarkolaee, Hossein Moradi, 2022. "Response envelopes for linear coregionalization models," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    4. Claeskens, Gerda & Aerts, Marc & Molenberghs, Geert, 2003. "A quadratic bootstrap method and improved estimation in logistic regression," Statistics & Probability Letters, Elsevier, vol. 61(4), pages 383-394, February.
    5. Huang, Whitney K. & Chung, Yu-Min & Wang, Yu-Bo & Mandel, Jeff E. & Wu, Hau-Tieng, 2022. "Airflow recovery from thoracic and abdominal movements using synchrosqueezing transform and locally stationary Gaussian process regression," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    6. Yiliang Zhu & Tao Wang & Jenny Z.H. Jelsovsky, 2007. "Bootstrap Estimation of Benchmark Doses and Confidence Limits with Clustered Quantal Data," Risk Analysis, John Wiley & Sons, vol. 27(2), pages 447-465, April.
    7. Lorenzo Tedesco & Jacopo Rodeschini & Philipp Otto, 2025. "Computational Benchmark Study in Spatio‐Temporal Statistics With a Hands‐On Guide to Optimise R," Environmetrics, John Wiley & Sons, Ltd., vol. 36(5), July.
    8. Jale Samuwai & Jeremy Maxwell Hills, 2018. "Assessing Climate Finance Readiness in the Asia-Pacific Region," Sustainability, MDPI, vol. 10(4), pages 1-18, April.
    9. Julia Eichholz & Thorsten Knauer & Sandra Winkelmann, 2023. "Digital Maturity of Forecasting and its Impact in Times of Crisis," Schmalenbach Journal of Business Research, Springer, vol. 75(4), pages 443-481, December.
    10. Sayed Akif Hussain & Chen Qiu-Shi & Faiza Saleem & Syed Amer Hussain & Syed Atif Hussain & Asma Komal, 2025. "Algorithmic Echo Chamber: How AI and Social Media Platforms Influence and Amplify Investor Sentiment," International Journal of Research and Innovation in Social Science, International Journal of Research and Innovation in Social Science (IJRISS), vol. 9(15), pages 1149-1158, September.
    11. Gregory Bauer & Keith Vorkink, 2007. "Multivariate Realized Stock Market Volatility," Staff Working Papers 07-20, Bank of Canada.
    12. Juergen Deppner & Marcelo Cajias, 2024. "Accounting for Spatial Autocorrelation in Algorithm-Driven Hedonic Models: A Spatial Cross-Validation Approach," The Journal of Real Estate Finance and Economics, Springer, vol. 68(2), pages 235-273, February.
    13. Matthias Katzfuss & Joseph Guinness & Wenlong Gong & Daniel Zilber, 2020. "Vecchia Approximations of Gaussian-Process Predictions," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 25(3), pages 383-414, September.
    14. Ahmad Ibrahim Aljumah & Mohammed T. Nuseir & Md. Mahmudul Alam, 2021. "Traditional marketing analytics, big data analytics and big data system quality and the success of new product development," Post-Print hal-03538161, HAL.
    15. Cano-Marin, Enrique & Mora-Cantallops, Marçal & Sánchez-Alonso, Salvador, 2023. "Twitter as a predictive system: A systematic literature review," Journal of Business Research, Elsevier, vol. 157(C).
    16. de Camargo Fiorini, Paula & Roman Pais Seles, Bruno Michel & Chiappetta Jabbour, Charbel Jose & Barberio Mariano, Enzo & de Sousa Jabbour, Ana Beatriz Lopes, 2018. "Management theory and big data literature: From a review to a research agenda," International Journal of Information Management, Elsevier, vol. 43(C), pages 112-129.
    17. Zilber, Daniel & Katzfuss, Matthias, 2021. "Vecchia–Laplace approximations of generalized Gaussian processes for big non-Gaussian spatial data," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
    18. Margaret C Johnson & Brian J Reich & Josh M Gray, 2021. "Multisensor fusion of remotely sensed vegetation indices using space‐time dynamic linear models," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 70(3), pages 793-812, June.
    19. Amiri, Babak & Karimianghadim, Ramin, 2024. "A novel text clustering model based on topic modelling and social network analysis," Chaos, Solitons & Fractals, Elsevier, vol. 181(C).
    20. Heaton, Matthew J. & Dahl, Benjamin K. & Dayley, Caleb & Warr, Richard L. & White, Philip, 2024. "Integrating machine learning and Bayesian nonparametrics for flexible modeling of point pattern data," Computational Statistics & Data Analysis, Elsevier, vol. 191(C).
