Printed from https://ideas.repec.org/p/arz/wpaper/eres2023_304.html

How big is the real estate property? Using zero-shot vs. rule-based classification for size extraction in real estate contracts

Author

Listed:
  • Julia Angerer
  • Wolfgang Brunauer

Abstract

Given the massive volume of real-estate-related text documents, automated processing of this data is clearly needed. Purchase contracts in particular contain valuable transaction and property description information, such as usable area. This research project investigated a natural language processing (NLP) approach using open-source transformer-based models. It highlights the potential of pre-trained language models for zero-shot classification, especially where no training data is available. The approach is particularly relevant for analyzing purchase contracts in the legal domain, where extracting the information by hand or building comprehensive regular expression rules can be challenging. For model comparison, a data set of classified contract sentence parts, each containing one size and its context information, was created manually. The experiments demonstrate that pre-trained language models can accurately classify sentence parts containing a size, with performance varying across models. The results suggest that pre-trained language models can be effective tools for processing textual data in the real estate and legal domains and can provide valuable insights into the underlying structures and patterns in such data. Overall, this research contributes to the understanding of the capabilities of pre-trained language models in NLP and highlights their potential for practical applications in real-world settings, particularly in the legal domain, where textual data is plentiful but annotated training data is not available.
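The contrast between the two approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the label set, the keyword rules, the unit pattern, and the English example sentences are all assumptions made for illustration, since the abstract does not specify them. The zero-shot function accepts any callable that behaves like a Hugging Face `zero-shot-classification` pipeline, so the rule-based baseline and the model-based classifier share one interface.

```python
import re
from typing import Callable, Optional

# Hypothetical label set; the paper's actual classes are not given in the abstract.
LABELS = ["usable area", "plot area", "other"]

# Assumed pattern: one number followed by a square-metre unit,
# e.g. "85.5 m²" or "1,200 sqm".
SIZE_PATTERN = re.compile(r"(\d+(?:[.,]\d+)*)\s*(?:m²|m2|sqm)", re.IGNORECASE)

# Hypothetical keyword rules for the rule-based baseline.
KEYWORD_RULES = {
    "usable area": ("usable area", "living area", "floor area"),
    "plot area": ("plot", "parcel", "land area"),
}


def extract_size(sentence_part: str) -> Optional[str]:
    """Return the first size mention as written in the text, or None."""
    match = SIZE_PATTERN.search(sentence_part)
    return match.group(1) if match else None


def classify_rule_based(sentence_part: str) -> str:
    """Assign a label from context keywords; fall back to 'other'."""
    text = sentence_part.lower()
    for label, keywords in KEYWORD_RULES.items():
        if any(keyword in text for keyword in keywords):
            return label
    return "other"


def classify_zero_shot(sentence_part: str, classifier: Callable[..., dict]) -> str:
    """Return the top-ranked label from a zero-shot classifier.

    `classifier` is assumed to return a dict whose 'labels' list is
    sorted by descending score, as the Hugging Face pipeline does.
    """
    result = classifier(sentence_part, candidate_labels=LABELS)
    return result["labels"][0]
```

With the `transformers` library installed, a pipeline such as `pipeline("zero-shot-classification", model="facebook/bart-large-mnli")` could be passed as `classifier`; that model choice is an assumption here and is not taken from the paper. The rule-based path needs hand-written keywords for every phrasing, whereas the zero-shot path needs only the label names, which is the trade-off the abstract describes.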

Suggested Citation

  • Julia Angerer & Wolfgang Brunauer, 2023. "How big is the real estate property? Using zero-shot vs. rule-based classification for size extraction in real estate contracts," ERES eres2023_304, European Real Estate Society (ERES).
  • Handle: RePEc:arz:wpaper:eres2023_304

    Download full text from publisher

    File URL: https://eres.architexturez.net/doc/oai-eres-id-eres2023-304
    Download Restriction: no

    More about this item

    Keywords

    contract documents; Information Extraction; Natural Language Processing; zero-shot classification;

    JEL classification:

    • R3 - Urban, Rural, Regional, Real Estate, and Transportation Economics - - Real Estate Markets, Spatial Production Analysis, and Firm Location

