
Accounting for Spatial Autocorrelation in Algorithm-Driven Hedonic Models: A Spatial Cross-Validation Approach

Authors

  • Juergen Deppner
  • Marcelo Cajias
  • Wolfgang Schäfers

Abstract

Aim of research: Real estate markets have a spatial dimension that is pivotal for the economic value of housing. The inherent spatial dependence in the underlying price determination process cannot simply be overlooked in linear hedonic model specifications, as doing so would produce spurious results (see Anselin, 1988; Can and Megbolugbe, 1997; Basu and Thibodeau, 1998). Guidance on how to account for spatial dependence in linear regression models is vast and remains the subject of many contributions to the hedonic and spatial econometric literature (see LeSage and Pace, 2009; Anselin, 2010; Elhorst, 2014). Moving from the parametric paradigm of hedonic regression to the universe of non-parametric statistical learning methods such as decision trees, random forests, or boosting techniques, the literature has produced a growing body of evidence that such algorithms can deliver superior predictive performance on complex, non-linear, multi-dimensional regression problems, including various applications to house price estimation (e.g. Mayer et al., 2019; Pace and Hayunga, 2020; Bogin and Shui, 2020). However, in contrast to linear models, little attention has been paid to the implications of spatial dependence in house prices for the statistical validity of the error estimates of machine learning algorithms, even though independence of the data is implicitly assumed (see Roberts et al., 2017; Schratz et al., 2019). Our study investigates the role of spatial autocorrelation (SAC) in the accuracy assessment of algorithmic hedonic methods and benchmarks spatially conscious machine learning approaches against linear and spatial hedonic methods.

Study design and methodology: Machine learning algorithms learn the relationship between the response and the regressors autonomously, without requiring any a priori specification of its functional form.
As their high flexibility makes such approaches prone to overfitting, resampling strategies such as k-fold cross-validation are applied to approximate a model's out-of-sample predictive performance. During resampling, the observations are randomly partitioned into mutually exclusive training and test subsets; the predictor is fitted on the training data and evaluated on the test data. SAC can be accounted for with spatial resampling strategies, which modify the splitting process to reduce SAC between training and test data. Instead of partitioning the data randomly, which implicitly assumes their independence, spatially clustered partitions are created from the observations' coordinates (see Brenning, 2012). We train and evaluate tree-based algorithms on a pooled cross-section of asking rents in Germany using both random and spatial partitioning, and subsequently forecast out-of-sample data to assess the bias in the in-sample error estimates associated with SAC. The results are benchmarked against well-specified ordinary least squares and spatial autoregressive frameworks to compare the models' generalizability.

Originality and implications: Applying machine learning to spatial data without accounting for SAC provides the predictor with information that is assumed to be unavailable during training, which may lead to a biased accuracy assessment (see Lovelace et al., 2021). This study sheds light on the accuracy bias of random resampling induced by SAC in a hedonic context. The results help increase the robustness and generalizability of algorithmic approaches to hedonic regression problems and thus carry valuable implications for appraisal practice. To the best of our knowledge, no existing research has accounted for SAC in an algorithm-driven hedonic context by applying spatial cross-validation.
We conclude that random resampling yields over-optimistic prediction accuracies, whereas spatial resampling increases generalizability and thus robustness to unseen data. We also find the bias to be lower for algorithms that apply column subsampling to counteract overfitting.
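The contrast between random and spatial partitioning can be illustrated with a small, self-contained sketch. It is not the authors' code or data and does not reproduce their results. It assumes hypothetical points in the unit square with a made-up "rent" built from a smooth spatial trend plus noise, forms spatial folds by k-means clustering of the coordinates (one common variant of coordinate-based partitioning in the spirit of Brenning, 2012), and uses a 1-nearest-neighbour regressor on the coordinates as a stand-in for a flexible learner, rather than the tree-based algorithms used in the study. The helper names `random_folds`, `kmeans_folds`, `nn_predict` and `cv_rmse` are illustrative only.

```python
import math
import random


def random_folds(n, k, seed=0):
    """Random k-fold assignment: ignores location, implicitly assuming independent observations."""
    rng = random.Random(seed)
    labels = [i % k for i in range(n)]
    rng.shuffle(labels)
    return labels


def kmeans_folds(coords, k, n_iter=50, seed=0):
    """Spatial k-fold assignment: cluster the (u, v) coordinates with k-means (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centers = rng.sample(coords, k)
    labels = [0] * len(coords)
    for _ in range(n_iter):
        # Assignment step: each point joins the fold of its nearest cluster center
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in coords]
        # Update step: move each center to the mean of its members (empty clusters keep their center)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == j]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def nn_predict(train_x, train_y, x):
    """1-nearest-neighbour prediction: return the response of the closest training location."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best]


def cv_rmse(coords, y, folds):
    """Root mean squared error pooled over all held-out folds."""
    sq_err = []
    for f in sorted(set(folds)):
        train = [i for i, g in enumerate(folds) if g != f]
        test = [i for i, g in enumerate(folds) if g == f]
        train_x = [coords[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            sq_err.append((nn_predict(train_x, train_y, coords[i]) - y[i]) ** 2)
    return math.sqrt(sum(sq_err) / len(sq_err))


# Hypothetical data: locations in the unit square and a "rent" made of a smooth
# spatial trend plus independent noise, so nearby observations are autocorrelated.
rng = random.Random(42)
coords = [(rng.random(), rng.random()) for _ in range(300)]
rent = [math.sin(3 * u) + math.cos(3 * v) + rng.gauss(0, 0.1) for u, v in coords]

random_rmse = cv_rmse(coords, rent, random_folds(len(coords), k=5))
spatial_rmse = cv_rmse(coords, rent, kmeans_folds(coords, k=5))
print(f"random 5-fold RMSE:  {random_rmse:.3f}")
print(f"spatial 5-fold RMSE: {spatial_rmse:.3f}")
```

Under these assumptions the random-fold RMSE should come out lower than the spatial-fold RMSE: with random partitioning almost every test point has a nearby, autocorrelated neighbour in the training set, whereas spatial folds force predictions across larger distances. The exact figures depend on the seed, the assumed trend and the noise level.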

Suggested Citation

  • Juergen Deppner & Marcelo Cajias & Wolfgang Schäfers, 2021. "Accounting for Spatial Autocorrelation in Algorithm-Driven Hedonic Models: A Spatial Cross-Validation Approach," ERES eres2021_51, European Real Estate Society (ERES).
  • Handle: RePEc:arz:wpaper:eres2021_51

    Download full text from publisher

    File URL: https://eres.architexturez.net/doc/oai-eres-id-eres2021-51
    Download Restriction: no

    More about this item

    Keywords

    Hedonic Models; Machine Learning; Spatial Autocorrelation; Spatial Cross Validation

    JEL classification:

    • R3 - Urban, Rural, Regional, Real Estate, and Transportation Economics - - Real Estate Markets, Spatial Production Analysis, and Firm Location



    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arz:wpaper:eres2021_51. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Architexturez Imprints. General contact details of provider: https://edirc.repec.org/data/eressea.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.