IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0321263.html

A population spatialization method based on the integration of feature selection and an improved random forest model

Author

Listed:
  • Zhen Zhao
  • Hongmei Guo
  • Xueli Jiang
  • Ying Zhang
  • Changjiang Lu
  • Can Zhang
  • Zonghang He

Abstract

Accurately determining the spatial distribution of population is essential for effective urban planning, resource allocation, and emergency rescue planning. The random forest (RF) model is widely used in population spatialization studies. However, the complexity of population distribution characteristics and the limitations of the RF model in handling imbalanced datasets reduce population prediction accuracy. To address these issues, a population spatialization model that integrates feature selection with an improved random forest is proposed. First, recursive feature elimination with cross-validation (RFECV), the maximum information coefficient (MIC), and mean decrease accuracy (MDA) were used to select population distribution feature factors. A random forest was built on the feature subset selected by each method, giving three models: MIC-RF, RFECV-RF, and MDA-RF. The feature factors of the most accurate model were then taken as the optimal feature subset and used as input data for model construction. In addition, to account for the imbalance in the spatial distribution of population, the K-means++ clustering algorithm was used to cluster the optimal feature subset, and bootstrap sampling was used to extract the same amount of data from each cluster and fuse it with the training subset to build an improved random forest model. Based on this model, a spatial population distribution dataset of the Southern Sichuan Economic Zone at a 500 m resolution was generated. Finally, the generated population dataset was compared and validated against the WorldPop dataset.
The results showed that feature selection improved model accuracy to varying degrees compared with an RF built on all factors, and that MDA-RF achieved the lowest MAPE (0.174) and the highest R² (0.913) of the three models. The feature factors selected by the MDA method were therefore taken as the optimal feature subset. Compared with MDA-RF, the prediction accuracy of the improved RF built on the same subset increased by 1.7%, indicating that modifying the bootstrap sampling of the random forest with the K-means++ clustering algorithm can enhance model accuracy to some extent. The proposed method was also more accurate than the WorldPop dataset: the MRE and RMSE of the WorldPop dataset were 57.24 and 23174.98, respectively, whereas those of the proposed method were 25.00 and 15776.50. This suggests that the proposed method can simulate the spatial distribution of population more accurately.
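
The cluster-balanced sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the cluster labels have already been produced by a K-means++ clustering of the selected features, and the function names, the toy labels, and the way the draws are appended to the training subset are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def balanced_cluster_bootstrap(labels, n_per_cluster, seed=0):
    """Draw n_per_cluster sample indices, with replacement, from every cluster.

    labels[i] is the cluster id of training sample i. The labels are assumed
    to come from an earlier K-means++ clustering step (not shown here).
    """
    rng = random.Random(seed)

    # Group sample indices by cluster id.
    members = defaultdict(list)
    for idx, cluster_id in enumerate(labels):
        members[cluster_id].append(idx)

    drawn = []
    for cluster_id in sorted(members):
        # Sampling with replacement lets a small cluster contribute
        # as many rows as a large one.
        drawn.extend(rng.choices(members[cluster_id], k=n_per_cluster))
    return drawn


def augment_training_subset(train_indices, labels, n_per_cluster, seed=0):
    """Append the balanced draws to an existing training subset.

    This is one possible reading of "fuse it with the training subset";
    the paper may combine the two differently.
    """
    extra = balanced_cluster_bootstrap(labels, n_per_cluster, seed)
    return list(train_indices) + extra


# Toy labels (hypothetical): cluster 0 dominates, clusters 1 and 2 are rare.
labels = [0] * 8 + [1] * 2 + [2] * 1
extra = balanced_cluster_bootstrap(labels, n_per_cluster=3, seed=42)

# Count how many drawn indices fall in each cluster.
counts = {c: sum(1 for i in extra if labels[i] == c) for c in sorted(set(labels))}
print(counts)
```

Every cluster contributes the same number of rows to the extra draw regardless of its original size, so rare population patterns are less likely to be under-represented when a tree is trained on the augmented subset.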

Suggested Citation

  • Zhen Zhao & Hongmei Guo & Xueli Jiang & Ying Zhang & Changjiang Lu & Can Zhang & Zonghang He, 2025. "A population spatialization method based on the integration of feature selection and an improved random forest model," PLOS ONE, Public Library of Science, vol. 20(4), pages 1-25, April.
  • Handle: RePEc:plo:pone00:0321263
    DOI: 10.1371/journal.pone.0321263

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0321263
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0321263&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Guangqing Chi & Jun Zhu, 2008. "Spatial Regression Models for Demographic Analysis," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 27(1), pages 17-42, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yi Yang & Jie Li & Guobin Zhu & Qiangqiang Yuan, 2019. "Spatio–Temporal Relationship and Evolvement of Socioeconomic Factors and PM 2.5 in China During 1998–2016," IJERPH, MDPI, vol. 16(7), pages 1-24, March.
    2. Xiya Zhang & Haibo Hu, 2019. "Combining Data from Multiple Sources to Evaluate Spatial Variations in the Economic Costs of PM 2.5 -Related Health Conditions in the Beijing–Tianjin–Hebei Region," IJERPH, MDPI, vol. 16(20), pages 1-17, October.
    3. Wanxu Chen & Guangqing Chi & Jiangfeng Li, 2020. "Ecosystem Services and Their Driving Forces in the Middle Reaches of the Yangtze River Urban Agglomerations, China," IJERPH, MDPI, vol. 17(10), pages 1-19, May.
    4. Tom Wilson & Irina Grossman & Monica Alexander & Phil Rees & Jeromey Temple, 2022. "Methods for Small Area Population Forecasts: State-of-the-Art and Research Needs," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 41(3), pages 865-898, June.
    5. Dike Zhang & Jianpeng Wang & Ying Wang & Lei Xu & Liang Zheng & Bowen Zhang & Yuzhe Bi & Hui Yang, 2022. "Is There a Spatial Relationship between Urban Landscape Pattern and Habitat Quality? Implication for Landscape Planning of the Yellow River Basin," IJERPH, MDPI, vol. 19(19), pages 1-17, September.
    6. Hannaliis Jaadla & Alice Reid, 2017. "The geography of early childhood mortality in England and Wales, 1881–1911," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 37(58), pages 1861-1890.
    7. Stephen Matthews & Daniel M. Parker, 2013. "Progress in Spatial Demography," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 28(10), pages 271-312.
    8. Sascha O. Becker & Francesco Cinnirella & Ludger Woessmann, 2012. "The effect of investment in children’s education on fertility in 1816 Prussia," Cliometrica, Journal of Historical Economics and Econometric History, Association Française de Cliométrie (AFC), vol. 6(1), pages 29-44, January.
    9. Michael R. Schwob & Mevin B. Hooten & Travis McDevitt-Galles, 2023. "Dynamic Population Models with Temporal Preferential Sampling to Infer Phenology," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 28(4), pages 774-791, December.
    10. Daniel Liviano & Josep-Maria Arauzo-Carod, 2012. "Spatial Exploration of Age Distribution in Catalan Municipalities," ERSA conference papers ersa12p81, European Regional Science Association.
    11. Greg Rybarczyk & Dorceta Taylor & Shannon Brines & Richard Wetzel, 2019. "A Geospatial Analysis of Access to Ethnic Food Retailers in Two Michigan Cities: Investigating the Importance of Outlet Type within Active Travel Neighborhoods," IJERPH, MDPI, vol. 17(1), pages 1-18, December.
    12. Rares Halbac-Cotoara-Zamfir & Gianluca Egidi & Rosanna Salvia & Luca Salvati & Adele Sateriano & Antonio Gimenez-Morera, 2021. "Recession, Local Fertility, and Urban Sustainability: Results of a Quasi-Experiment in Greece, 1991–2018," Sustainability, MDPI, vol. 13(3), pages 1-18, January.
    13. Chen, Wanxu & Chi, Guangqing, 2022. "Urbanization and ecosystem services: The multi-scale spatial spillover effects and spatial variations," Land Use Policy, Elsevier, vol. 114(C).
    14. Agnese Vitali & Arnstein Aassve & Trude Lappegård, 2015. "Diffusion of Childbearing Within Cohabitation," Demography, Springer;Population Association of America (PAA), vol. 52(2), pages 355-377, April.
    15. Gülhan, Sinan Tankut, 2022. "The Election Day that Lasted 84 Days: Mapping the Electoral Geography of the 2019 Istanbul Metropolitan Mayoral Race," SocArXiv ufvtz, Center for Open Science.
    16. Rubo Zhao & Yixiang Tian & Ao Lei & Francis Boadu & Ze Ren, 2019. "The Effect of Local Government Debt on Regional Economic Growth in China: A Nonlinear Relationship Approach," Sustainability, MDPI, vol. 11(11), pages 1-22, May.
    17. Kirsten Schwarz & Michail Fragkias & Christopher G Boone & Weiqi Zhou & Melissa McHale & J Morgan Grove & Jarlath O’Neil-Dunne & Joseph P McFadden & Geoffrey L Buckley & Dan Childers & Laura Ogden & S, 2015. "Trees Grow on Money: Urban Tree Canopy Cover and Environmental Justice," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-17, April.
    18. Dustin T. Duncan & Márcia C. Castro & Jared Aldstadt & David R. Williams & John Whalen & Kellee White, 2012. "Space, race, and poverty: Spatial inequalities in walkable neighborhood amenities?," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 26(17), pages 409-448.
    19. Leah H. Schinasi & Helen V. S. Cole & Jana A. Hirsch & Ghassan B. Hamra & Pedro Gullon & Felicia Bayer & Steven J. Melly & Kathryn M. Neckerman & Jane E. Clougherty & Gina S. Lovasi, 2021. "Associations between Greenspace and Gentrification-Related Sociodemographic and Housing Cost Changes in Major Metropolitan Areas across the United States," IJERPH, MDPI, vol. 18(6), pages 1-24, March.
    20. Sha Chen & Guan Li & Zhongguo Xu & Yuefei Zhuo & Cifang Wu & Yanmei Ye, 2019. "Combined Impact of Socioeconomic Forces and Policy Implications: Spatial-Temporal Dynamics of the Ecosystem Services Value in Yangtze River Delta, China," Sustainability, MDPI, vol. 11(9), pages 1-22, May.
