
Natural language processing for planning policy identification: A benchmarking study using 113 Chinese cities between 2011 and 2019

Author

Listed:
  • Tianyuan Wang
  • Jerry Chen
  • Zhenyun Deng
  • Li Wan

Abstract

Identifying policy topics from lengthy text documents such as official reports is an important analytical task for policy evaluation, yet it is time-consuming when done manually. Advanced Natural Language Processing (NLP) models such as the Bidirectional Encoder Representations from Transformers (BERT) model and other emerging Large Language Models (LLMs) have demonstrated the capability to capture semantic meaning within sentences. However, BERT-based models require substantial expert-labelled training data for fine-tuning to perform optimally, whereas off-the-shelf LLMs provide greater flexibility but tend to overgeneralise. This paper aims to (1) demonstrate three NLP/LLM methods (fine-tuned BERT, with 10,000 manually labelled sentences; standard GPT-4o; and GPT-4o with in-context learning, with five example sentences provided for each policy domain) for identifying planning policies from planning documents at sentence level and (2) assess the accuracy of these methods in policy identification, using 2,000 results manually identified by planning experts as the benchmark. Our results demonstrate that policy identification can be performed effectively and accurately when in-domain labelled training data is available, achieving a high out-of-sample accuracy of 92.50%. However, the time and labour costs of manual labelling are significant. In cases where in-domain training data is lacking, LLMs can still achieve a notable accuracy of 72.50%, which improves to 87.75% when given a small number of examples and to 89.00% with role-prompting that guides the model to act as an urban planning research expert. We applied this method to analyse policy priority across 1,026 official government annual reports at the national level and across 113 prefecture-level cities in China between 2011 and 2019. We found significant divergence between national and local planning policies, as well as cross-city heterogeneity in planning policies.
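
The in-context learning and role-prompting setup described in the abstract, and the accuracy check against expert labels, can be illustrated with a minimal Python sketch. This is not the authors' code: the role wording, the domain names, the example sentences, and the helper names `build_prompt` and `accuracy` are all assumptions made for illustration, and no call to any model API is shown.

```python
from typing import Dict, List, Sequence

# Hypothetical role instruction; the paper's actual wording is not given in the abstract.
ROLE_PROMPT = "You are an urban planning research expert."


def build_prompt(sentence: str,
                 examples: Dict[str, List[str]],
                 use_role: bool = True) -> str:
    """Assemble a few-shot classification prompt for one sentence."""
    lines: List[str] = []
    if use_role:
        lines.append(ROLE_PROMPT)
    lines.append("Assign the sentence to exactly one of these policy domains: "
                 + ", ".join(examples) + ".")
    # In-context examples: labelled sentences for each policy domain.
    for domain, sentences in examples.items():
        for example in sentences:
            lines.append(f"Sentence: {example}\nDomain: {domain}")
    # The sentence to classify, with the label left open for the model.
    lines.append(f"Sentence: {sentence}\nDomain:")
    return "\n\n".join(lines)


def accuracy(predicted: Sequence[str], expert: Sequence[str]) -> float:
    """Share of sentences where the model label matches the expert label."""
    if len(predicted) != len(expert):
        raise ValueError("label lists must have the same length")
    if not expert:
        raise ValueError("benchmark is empty")
    matches = sum(p == e for p, e in zip(predicted, expert))
    return matches / len(expert)


# Toy data, invented for illustration only (not taken from the study).
examples = {
    "transport": ["Extend the metro network to the new district."],
    "housing": ["Increase the supply of affordable rental housing."],
}
prompt = build_prompt("Build 20 km of new bus lanes.", examples)
score = accuracy(["transport", "housing", "housing", "transport"],
                 ["transport", "housing", "transport", "transport"])
print(f"{score:.2%}")  # prints 75.00%
```

In this sketch, passing `use_role=False` would correspond to few-shot prompting without the role instruction, and passing an empty example list per domain would approximate the off-the-shelf setting; the study itself reportedly used five example sentences per policy domain.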

Suggested Citation

  • Tianyuan Wang & Jerry Chen & Zhenyun Deng & Li Wan, 2026. "Natural language processing for planning policy identification: A benchmarking study using 113 Chinese cities between 2011 and 2019," Environment and Planning B, vol. 53(1), pages 107-124, January.
  • Handle: RePEc:sae:envirb:v:53:y:2026:i:1:p:107-124
    DOI: 10.1177/23998083251351743

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/23998083251351743
    Download Restriction: no

