Printed from https://ideas.repec.org/a/sae/envirb/v53y2026i1p107-124.html

Natural language processing for planning policy identification: A benchmarking study using 113 Chinese cities between 2011 and 2019

Authors

  • Tianyuan Wang
  • Jerry Chen
  • Zhenyun Deng
  • Li Wan

Abstract

Identifying policy topics from lengthy text documents such as official reports is an important analytical task for policy evaluation, yet it is time-consuming when done manually. Advanced Natural Language Processing (NLP) models such as the Bidirectional Encoder Representations from Transformers (BERT) model and emerging Large Language Models (LLMs) have demonstrated the capability to capture semantic meaning within sentences. However, BERT-based models require substantial expert-labelled training data for fine-tuning to reach optimal performance, whereas off-the-shelf LLMs provide greater flexibility but tend to overgeneralise. This paper aims to (1) demonstrate three NLP/LLM methods (fine-tuned BERT, with 10,000 manually labelled sentences; standard GPT-4o; and GPT-4o with in-context learning, with five example sentences provided for each policy domain) for identifying planning policies from planning documents at sentence level and (2) assess the accuracy of these methods in policy identification, using 2,000 results manually identified by planning experts as the benchmark. Our results demonstrate that policy identification can be performed effectively and accurately when in-domain labelled training data is available, achieving a high out-of-sample accuracy of 92.50%. However, the time and labour costs of manual labelling are significant. In cases where in-domain training data is lacking, LLMs can still achieve a notable accuracy of 72.50%, which improves to 87.75% when given a small number of examples and to 89.00% with role-prompting that guides the model to act as an urban planning research expert. We applied this method to analyse policy priority across 1,026 official government annual reports, covering the national level and 113 prefecture-level cities in China between 2011 and 2019. We found significant divergence between national and local planning policies and cross-city heterogeneity in planning policies.
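
The in-context learning and role-prompting setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the role wording, the policy domain names, the example sentences, and the helper names `build_prompt` and `accuracy` are all assumptions made for illustration, and no model is actually called. The sketch only shows how a few labelled sentences per domain might be assembled into a prompt and how predictions could be scored against expert labels.

```python
from typing import Dict, List, Sequence

# Hypothetical role prompt; the paper's exact wording is not given in the abstract.
ROLE_PROMPT = "You are an urban planning research expert."


def build_prompt(examples: Dict[str, List[str]], sentence: str, role: str = "") -> str:
    """Assemble a few-shot, sentence-level classification prompt."""
    parts: List[str] = []
    if role:
        parts.append(role)  # optional role-prompting line
    parts.append(
        "Assign the sentence to exactly one of these policy domains: "
        + ", ".join(examples)
        + "."
    )
    # In-context examples: each labelled sentence is shown with its domain.
    for domain, sentences in examples.items():
        for example in sentences:
            parts.append(f"Sentence: {example}\nDomain: {domain}")
    # The sentence to classify comes last, with the label left open.
    parts.append(f"Sentence: {sentence}\nDomain:")
    return "\n\n".join(parts)


def accuracy(predicted: Sequence[str], benchmark: Sequence[str]) -> float:
    """Percentage of sentences whose predicted domain matches the expert label."""
    if len(predicted) != len(benchmark):
        raise ValueError("predicted and benchmark must have the same length")
    if not benchmark:
        raise ValueError("benchmark must not be empty")
    hits = sum(p == b for p, b in zip(predicted, benchmark))
    return 100.0 * hits / len(benchmark)


# Placeholder domains and sentences, invented for illustration only.
# The paper uses five example sentences per domain; one each is shown here.
EXAMPLES = {
    "transport": ["Build two new metro lines."],
    "housing": ["Expand the supply of affordable rental housing."],
}

prompt = build_prompt(EXAMPLES, "Renovate old residential compounds.", role=ROLE_PROMPT)

# Toy predictions against toy expert labels: three of four match.
score = accuracy(
    ["housing", "transport", "housing", "housing"],
    ["housing", "transport", "transport", "housing"],
)
```

Leaving `role` empty gives the plain few-shot variant, and passing an empty example list for each domain approximates the zero-shot case. As a side check on the corpus size, 1,026 reports matches (113 + 1) × 9, assuming one report per city plus one national report for each year from 2011 to 2019.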

Suggested Citation

  • Tianyuan Wang & Jerry Chen & Zhenyun Deng & Li Wan, 2026. "Natural language processing for planning policy identification: A benchmarking study using 113 Chinese cities between 2011 and 2019," Environment and Planning B, vol. 53(1), pages 107-124, January.
  • Handle: RePEc:sae:envirb:v:53:y:2026:i:1:p:107-124
    DOI: 10.1177/23998083251351743

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/23998083251351743
    Download Restriction: no
