Printed from https://ideas.repec.org/p/arx/papers/2506.15723.html

Modern approaches to building interpretable models of the property market using machine learning on the base of mass cadastral valuation

Author

Listed:
  • Irina G. Tanashkina
  • Alexey S. Tanashkin
  • Alexander S. Maksimchuik
  • Anna Yu. Poshivailo

Abstract

In this article, we review modern approaches to building interpretable models of property markets using machine learning, based on the mass valuation of property in the Primorye region, Russia. A researcher without expertise in this topic encounters numerous difficulties when trying to build a good model. The main source of these difficulties is the large gap between noisy real market data and the idealized data common in machine learning tutorials. This paper covers all stages of modeling: the collection of initial data, the identification of outliers, the search for and analysis of patterns in the data, the formation and final choice of price factors, the building of the model, and the evaluation of its efficiency. For each stage, we highlight potential issues and describe sound methods for overcoming them, using actual examples. We show that combining classical linear regression with the interpolation methods of geostatistics makes it possible to build an effective model for land parcels. For flats, where many objects are attributed to a single spatial point, applying geostatistical methods is difficult. We therefore suggest linear regression with automatic generation and selection of additional rules based on decision trees, the so-called RuleFit method. Thus we show that, despite a restriction as strong as the requirement of interpretability, which matters in practice (for example, in legal matters), it is still possible to build effective models of real property markets.
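The combination of linear regression with geostatistical interpolation mentioned in the abstract can be illustrated with a minimal regression-kriging sketch. It is not the authors' implementation, which this record does not show. It assumes a single price factor, an exponential semivariogram with fixed (not fitted) parameters, and ordinary kriging of the regression residuals. All function names, parameter values, and the synthetic data are hypothetical.

```python
import numpy as np

def exp_variogram(h, nugget=0.05, sill=1.0, range_=2.0):
    """Exponential semivariogram with gamma(0) = 0; parameters are assumed."""
    return np.where(h > 0, nugget + (sill - nugget) * (1.0 - np.exp(-h / range_)), 0.0)

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def krige_residuals(coords, resid, new_coords):
    """Ordinary kriging of residuals observed at coords onto new_coords."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Kriging system [[Gamma, 1], [1^T, 0]]; the last row and column
    # enforce that the weights sum to one.
    K = np.ones((n + 1, n + 1))
    K[:n, :n] = exp_variogram(d)
    K[n, n] = 0.0
    d0 = np.linalg.norm(coords[:, None, :] - new_coords[None, :, :], axis=2)
    rhs = np.vstack([exp_variogram(d0), np.ones((1, len(new_coords)))])
    weights = np.linalg.solve(K, rhs)[:n]  # drop the Lagrange multiplier row
    return weights.T @ resid

# Synthetic land parcels: one price factor plus a smooth spatial effect.
gen = np.random.default_rng(0)
coords = gen.uniform(0.0, 10.0, size=(40, 2))
area = gen.uniform(5.0, 20.0, size=(40, 1))
price = 2.0 + 3.0 * area[:, 0] + np.sin(coords[:, 0]) + gen.normal(0.0, 0.1, 40)

beta = fit_ols(area, price)                  # interpretable linear part
resid = price - predict_ols(beta, area)      # spatial signal left over

new_coords = np.array([[5.0, 5.0], [1.0, 9.0]])
new_area = np.array([[10.0], [15.0]])
new_price = predict_ols(beta, new_area) + krige_residuals(coords, resid, new_coords)
print(new_price)
```

In this naive form the kriging matrix becomes singular as soon as two objects share the same coordinates, which is one way to see why many flats attributed to a single spatial point are hard to handle with this approach.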

Suggested Citation

  • Irina G. Tanashkina & Alexey S. Tanashkin & Alexander S. Maksimchuik & Anna Yu. Poshivailo, 2025. "Modern approaches to building interpretable models of the property market using machine learning on the base of mass cadastral valuation," Papers 2506.15723, arXiv.org, revised Jul 2025.
  • Handle: RePEc:arx:papers:2506.15723

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2506.15723
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Chica-Olmo, Jorge & Cano-Guervos, Rafael, 2020. "Does my house have a premium or discount in relation to my neighbors? A regression-kriging approach," Socio-Economic Planning Sciences, Elsevier, vol. 72(C).
    2. Jorge Chica-Olmo, 2007. "Prediction of Housing Location Price by a Multivariate Spatial Method: Cokriging," Journal of Real Estate Research, American Real Estate Society, vol. 29(1), pages 95-114.
    3. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    4. Jorge Chica-Olmo, 2007. "Prediction of Housing Location Price by a Multivariate Spatial Method: Cokriging," Journal of Real Estate Research, Taylor & Francis Journals, vol. 29(1), pages 91-114, January.
    5. Charles R. Harris & K. Jarrod Millman & Stéfan J. Walt & Ralf Gommers & Pauli Virtanen & David Cournapeau & Eric Wieser & Julian Taylor & Sebastian Berg & Nathaniel J. Smith & Robert Kern & Matti Picu, 2020. "Array programming with NumPy," Nature, Nature, vol. 585(7825), pages 357-362, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Reza Amindarbari & Perver Baran & Ross K. Meentemeyer, 2023. "Spatially disaggregated simulation of interactions between home prices and land-use change," Environment and Planning B, vol. 50(7), pages 1879-1897, September.
    2. Krzysztof Drachal, 2022. "Forecasting the Crude Oil Spot Price with Bayesian Symbolic Regression," Energies, MDPI, vol. 16(1), pages 1-29, December.
    3. Wang, Kailai & Lim, Gino J. & Race, Bruce & Zhang, Yunpeng (Jack) & Gao, Lu & Qiao, Fengxiang (George), 2025. "Examining spatial patterns and economic interactions of logistics activities across three Texas metropolitan areas," Journal of Transport Geography, Elsevier, vol. 123(C).
    4. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    5. Viet Hoang Dinh & Didier Nibbering & Benjamin Wong, 2023. "Random Subspace Local Projections," CAMA Working Papers 2023-34, Centre for Applied Macroeconomic Analysis, Crawford School of Public Policy, The Australian National University.
    6. Tan Wang & L. Jeff Hong, 2023. "Large-Scale Inventory Optimization: A Recurrent Neural Networks–Inspired Simulation Approach," INFORMS Journal on Computing, INFORMS, vol. 35(1), pages 196-215, January.
    7. Geeraert, Joke & Rocha, Luis E.C. & Vandeviver, Christophe, 2024. "The impact of violent behavior on co-offender selection: Evidence of behavioral homophily," Journal of Criminal Justice, Elsevier, vol. 94(C).
    8. Léon Faure & Bastien Mollet & Wolfram Liebermeister & Jean-Loup Faulon, 2023. "A neural-mechanistic hybrid approach improving the predictive power of genome-scale metabolic models," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    9. Ernesto Carrella & Richard M. Bailey & Jens Koed Madsen, 2018. "Indirect inference through prediction," Papers 1807.01579, arXiv.org.
    10. Claudia Quinteros-Cartaya & Guillermo Solorio-Magaña & Francisco Javier Núñez-Cornú & Felipe de Jesús Escalona-Alcázar & Diana Núñez, 2023. "Microearthquakes in the Guadalajara Metropolitan Zone, Mexico: evidence from buried active faults in Tesistán Valley, Zapopan," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 116(3), pages 2797-2818, April.
    11. Rui Wang & Naihua Xiu & Kim-Chuan Toh, 2021. "Subspace quadratic regularization method for group sparse multinomial logistic regression," Computational Optimization and Applications, Springer, vol. 79(3), pages 531-559, July.
    12. Mkhadri, Abdallah & Ouhourane, Mohamed, 2013. "An extended variable inclusion and shrinkage algorithm for correlated variables," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 631-644.
    13. Masakazu Higuchi & Mitsuteru Nakamura & Shuji Shinohara & Yasuhiro Omiya & Takeshi Takano & Daisuke Mizuguchi & Noriaki Sonota & Hiroyuki Toda & Taku Saito & Mirai So & Eiji Takayama & Hiroo Terashi &, 2022. "Detection of Major Depressive Disorder Based on a Combination of Voice Features: An Exploratory Approach," IJERPH, MDPI, vol. 19(18), pages 1-13, September.
    14. Susan Athey & Guido W. Imbens & Stefan Wager, 2018. "Approximate residual balancing: debiased inference of average treatment effects in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 597-623, September.
    15. Furqan Dar & Samuel R. Cohen & Diana M. Mitrea & Aaron H. Phillips & Gergely Nagy & Wellington C. Leite & Christopher B. Stanley & Jeong-Mo Choi & Richard W. Kriwacki & Rohit V. Pappu, 2024. "Biomolecular condensates form spatially inhomogeneous network fluids," Nature Communications, Nature, vol. 15(1), pages 1-17, December.
    16. Vincent, Martin & Hansen, Niels Richard, 2014. "Sparse group lasso and high dimensional multinomial classification," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 771-786.
    17. Chen, Le-Yu & Lee, Sokbae, 2018. "Best subset binary prediction," Journal of Econometrics, Elsevier, vol. 206(1), pages 39-56.
    18. Álvarez-Liébana, J. & López-Pérez, A. & González-Manteiga, W. & Febrero-Bande, M., 2025. "A goodness-of-fit test for functional time series with applications to Ornstein-Uhlenbeck processes," Computational Statistics & Data Analysis, Elsevier, vol. 203(C).
    19. Perrot-Dockès Marie & Lévy-Leduc Céline & Chiquet Julien & Sansonnet Laure & Brégère Margaux & Étienne Marie-Pierre & Robin Stéphane & Genta-Jouve Grégory, 2018. "A variable selection approach in the multivariate linear model: an application to LC-MS metabolomics data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 17(5), pages 1-14, October.
    20. Fan, Jianqing & Jiang, Bai & Sun, Qiang, 2022. "Bayesian factor-adjusted sparse regression," Journal of Econometrics, Elsevier, vol. 230(1), pages 3-19.
