Authors
- Han, Ce
- Chen, Yongbao
- Wang, Huilong
- Zhan, Sicheng
- Chen, Zhe
Abstract
Building energy consumption accounts for a significant proportion of global energy use, making smart control and efficient management critical for sustainability. Machine learning (ML) models have shown promise in building energy control systems, but their performance is inherently tied to data quality characteristics, a factor often overlooked in traditional ML modelling. This study proposes a novel data quality assessment (DQA) framework to guide ML modelling for building energy applications, addressing the gap between data characteristics and ML algorithm suitability. The framework evaluates data quality (DQ) across five dimensions: completeness, comprehensiveness, range, consistency, and information entropy. Using 45 office buildings from the Building Data Genome Project 2 (BDG2) dataset, the framework was investigated and validated with five ML models typical of the building sector: a similar day model, LightGBM, long short-term memory (LSTM), a transformer, and an ensemble model, ranging from simple statistical methods to deep ML models. The results demonstrate that statistical models such as the similar day model are sufficient for high-DQ scenarios, whereas low-DQ scenarios require deep ML models such as LSTM and the transformer. The DQ-based model selection approach reduces the coefficient of variation of the root mean square error (CVRMSE) by 38%–40% on average. The proposed framework guides building owners and energy managers in selecting suitable ML models, thereby enhancing smart control, improving energy efficiency, and reducing carbon emissions. It also paves the way for standardized DQA in future research, bridging the gap between data science and practical ML application in building energy systems.
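As a rough illustration of the quantities named in the abstract, the sketch below computes CVRMSE and two simple data quality indicators (completeness and information entropy) for a load series. The abstract does not give the paper's formulas, so everything here is an assumption: the function names, the use of `None` for missing readings, the equal-width binning, and the RMSE normalized by the number of samples are illustrative choices, not the authors' definitions.

```python
import math
from collections import Counter


def cvrmse(actual, predicted):
    """CVRMSE in percent: RMSE divided by the mean of the actual values.

    Assumption: RMSE is normalized by n (some guidelines use n - p instead).
    """
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return 100.0 * rmse / (sum(actual) / n)


def completeness(values):
    """Fraction of readings that are present (missing readings are None)."""
    return sum(v is not None for v in values) / len(values)


def shannon_entropy(values, bins=4):
    """Shannon entropy in bits of the present readings after equal-width binning.

    Assumption: the bin count and binning scheme are illustrative only.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant series
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in present)
    total = len(present)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    load = [12.0, None, 15.5, 14.0, 13.2, None, 16.1, 15.0]  # hypothetical kWh readings
    print("completeness:", completeness(load))
    print("entropy (bits):", shannon_entropy(load))
    print("CVRMSE (%):", cvrmse([10.0, 12.0, 14.0], [11.0, 12.0, 13.0]))
```

A lower CVRMSE indicates a better fit relative to the mean load. The other dimensions listed in the abstract (comprehensiveness, range, and consistency) would need the paper's own definitions and are not sketched here.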
Suggested Citation
Han, Ce & Chen, Yongbao & Wang, Huilong & Zhan, Sicheng & Chen, Zhe, 2026.
"Development and validation of a data quality assessment framework for machine learning-based building load prediction,"
Applied Energy, Elsevier, vol. 415(C).
Handle:
RePEc:eee:appene:v:415:y:2026:i:c:s0306261926005775
DOI: 10.1016/j.apenergy.2026.127925