Printed from https://ideas.repec.org/a/eee/csdana/v190y2024ics016794732300169x.html

GP-BART: A novel Bayesian additive regression trees approach using Gaussian processes

Author

Listed:
  • Maia, Mateus
  • Murphy, Keefe
  • Parnell, Andrew C.

Abstract

The Bayesian additive regression trees (BART) model is an ensemble method extensively and successfully used in regression tasks due to its consistently strong predictive performance and its ability to quantify uncertainty. BART combines “weak” tree models through a set of shrinkage priors, whereby each tree explains a small portion of the variability in the data. However, the lack of smoothness and the absence of an explicit covariance structure over the observations in standard BART can yield poor performance in cases where such assumptions would be necessary. The Gaussian processes Bayesian additive regression trees (GP-BART) model is an extension of BART which addresses this limitation by assuming Gaussian process (GP) priors for the predictions of each terminal node among all trees. The model's effectiveness is demonstrated through applications to simulated and real-world data, surpassing the performance of traditional modelling approaches in various scenarios.
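
As an illustration of the contrast drawn in the abstract, the following minimal Python sketch draws from two toy sum-of-trees priors: one with constant terminal-node values, as in standard BART, and one with a zero-mean GP draw over the observations in each terminal node. It is not the authors' implementation or sampler. The depth-1 trees, the squared-exponential kernel, the 1/n_trees leaf variance, and all function names are assumptions made for illustration only.

```python
import numpy as np


def sq_exp_kernel(x, length_scale, variance):
    """Squared-exponential covariance matrix for 1-D inputs x (assumed kernel)."""
    diff = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def draw_stump(x, rng, leaf_variance, length_scale=None):
    """Draw one depth-1 tree's contribution at inputs x (hypothetical helper).

    length_scale=None: each leaf gets a single constant value (BART-style leaf).
    Otherwise: each leaf gets a zero-mean GP draw over its own observations.
    """
    split = rng.uniform(x.min(), x.max())  # random split point on the one covariate
    out = np.empty_like(x)
    for mask in (x <= split, x > split):
        n_leaf = int(mask.sum())
        if n_leaf == 0:
            continue
        if length_scale is None:
            out[mask] = rng.normal(0.0, np.sqrt(leaf_variance))
        else:
            cov = sq_exp_kernel(x[mask], length_scale, leaf_variance)
            cov += 1e-8 * np.eye(n_leaf)  # small jitter for numerical stability
            out[mask] = rng.multivariate_normal(np.zeros(n_leaf), cov)
    return out


def draw_ensemble(x, n_trees, rng, length_scale=None):
    """Sum n_trees weak trees; the 1/n_trees variance is an assumed shrinkage choice."""
    leaf_variance = 1.0 / n_trees  # each tree explains only a small part of the signal
    return sum(draw_stump(x, rng, leaf_variance, length_scale) for _ in range(n_trees))


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
n_trees = 20
step_draw = draw_ensemble(x, n_trees, rng)                      # piecewise constant
smooth_draw = draw_ensemble(x, n_trees, rng, length_scale=0.2)  # smooth within leaves
```

In this toy setting the constant-leaf draw is a step function taking at most n_trees + 1 distinct values, whereas the GP-leaf draw varies smoothly within each terminal node and jumps only at the split points.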

Suggested Citation

  • Maia, Mateus & Murphy, Keefe & Parnell, Andrew C., 2024. "GP-BART: A novel Bayesian additive regression trees approach using Gaussian processes," Computational Statistics & Data Analysis, Elsevier, vol. 190(C).
  • Handle: RePEc:eee:csdana:v:190:y:2024:i:c:s016794732300169x
    DOI: 10.1016/j.csda.2023.107858

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S016794732300169X
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Jingyu He & P. Richard Hahn, 2023. "Stochastic Tree Ensembles for Regularized Nonlinear Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 118(541), pages 551-570, January.
    2. Wright, Marvin N. & Ziegler, Andreas, 2017. "ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 77(i01).
    3. Blaser, Rico & Fryzlewicz, Piotr, 2016. "Random rotation ensembles," LSE Research Online Documents on Economics 62182, London School of Economics and Political Science, LSE Library.
    4. Sudipto Banerjee & Alan E. Gelfand & Andrew O. Finley & Huiyan Sang, 2008. "Gaussian predictive process models for large spatial data sets," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(4), pages 825-848, September.
    5. Roger S. Bivand & David W. S. Wong, 2018. "Comparing implementations of global and local indicators of spatial association," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 27(3), pages 716-748, September.
    6. Gilley, Otis W. & Pace, R. Kelley, 1996. "On the Harrison and Rubinfeld Data," Journal of Environmental Economics and Management, Elsevier, vol. 31(3), pages 403-405, November.
    7. Gramacy, Robert B. & Taddy, Matthew Alan, 2010. "Categorical Inputs, Sensitivity Analysis, Optimization and Importance Tempering with tgp Version 2, an R Package for Treed Gaussian Process Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i06).
    8. Harrison, David Jr. & Rubinfeld, Daniel L., 1978. "Hedonic housing prices and the demand for clean air," Journal of Environmental Economics and Management, Elsevier, vol. 5(1), pages 81-102, March.
    9. Saeid Janizadeh & Mehdi Vafakhah & Zoran Kapelan & Naghmeh Mobarghaee Dinan, 2021. "Novel Bayesian Additive Regression Tree Methodology for Flood Susceptibility Modeling," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 35(13), pages 4621-4646, October.
    10. Gneiting, Tilmann & Raftery, Adrian E., 2007. "Strictly Proper Scoring Rules, Prediction, and Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 359-378, March.
    11. Gramacy, Robert B. & Lee, Herbert K. H., 2008. "Bayesian Treed Gaussian Process Models With an Application to Computer Modeling," Journal of the American Statistical Association, American Statistical Association, vol. 103(483), pages 1119-1130.
    12. Lindgren, Finn & Rue, Håvard, 2015. "Bayesian Spatial Modelling with R-INLA," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 63(i19).
    13. Kapelner, Adam & Bleich, Justin, 2016. "bartMachine: Machine Learning with Bayesian Additive Regression Trees," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i04).
    14. Antonio R. Linero & Yun Yang, 2018. "Bayesian regression tree ensembles that adapt to smoothness and sparsity," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(5), pages 1087-1110, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Isabelle Grenier & Bruno Sansó & Jessica L. Matthews, 2024. "Multivariate nearest‐neighbors Gaussian processes with random covariance matrices," Environmetrics, John Wiley & Sons, Ltd., vol. 35(3), May.
    2. Chen, Yewen & Chang, Xiaohui & Luo, Fangzhi & Huang, Hui, 2023. "Additive dynamic models for correcting numerical model outputs," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    3. Cole, D. Austin & Gramacy, Robert B. & Ludkovski, Mike, 2022. "Large-scale local surrogate modeling of stochastic simulation experiments," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    4. Davis, Casey B. & Hans, Christopher M. & Santner, Thomas J., 2021. "Prediction of non-stationary response functions using a Bayesian composite Gaussian process," Computational Statistics & Data Analysis, Elsevier, vol. 154(C).
    5. Peter A. Gao & Hannah M. Director & Cecilia M. Bitz & Adrian E. Raftery, 2022. "Probabilistic Forecasts of Arctic Sea Ice Thickness," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 27(2), pages 280-302, June.
    6. Waley W. J. Liang & Herbert K. H. Lee, 2019. "Bayesian nonstationary Gaussian process models via treed process convolutions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 797-818, September.
    7. Ranadeep Daw & Christopher K. Wikle, 2023. "REDS: Random ensemble deep spatial prediction," Environmetrics, John Wiley & Sons, Ltd., vol. 34(1), February.
    8. Cheng, Tsung-Chi, 2012. "On simultaneously identifying outliers and heteroscedasticity without specific form," Computational Statistics & Data Analysis, Elsevier, vol. 56(7), pages 2258-2272.
    9. Ankur Moitra & Dhruv Rohatgi, 2022. "Provably Auditing Ordinary Least Squares in Low Dimensions," Papers 2205.14284, arXiv.org, revised Jun 2022.
    10. Bodhisattva Sen & Mary Meyer, 2017. "Testing against a linear regression model using ideas from shape-restricted estimation," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(2), pages 423-448, March.
    11. Takafumi Kato, 2020. "Likelihood-based strategies for estimating unknown parameters and predicting missing data in the simultaneous autoregressive model," Journal of Geographical Systems, Springer, vol. 22(1), pages 143-176, January.
    12. James P. LeSage & R. Kelley Pace, 2014. "The Biggest Myth in Spatial Econometrics," Econometrics, MDPI, vol. 2(4), pages 1-33, December.
    13. Philippe Goulet Coulombe & Mikael Frenette & Karin Klieber, 2023. "From Reactive to Proactive Volatility Modeling with Hemisphere Neural Networks," Working Papers 23-04, Chair in macroeconomics and forecasting, University of Quebec in Montreal's School of Management, revised Nov 2023.
    14. Zhang, Shen & Liu, Xin & Tang, Jinjun & Cheng, Shaowu & Qi, Yong & Wang, Yinhai, 2018. "Spatio-temporal modeling of destination choice behavior through the Bayesian hierarchical approach," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 512(C), pages 537-551.
    15. Paige, John & Fuglstad, Geir-Arne & Riebler, Andrea & Wakefield, Jon, 2022. "Bayesian multiresolution modeling of georeferenced data: An extension of ‘LatticeKrig’," Computational Statistics & Data Analysis, Elsevier, vol. 173(C).
    16. C. Forlani & S. Bhatt & M. Cameletti & E. Krainski & M. Blangiardo, 2020. "A joint Bayesian space–time model to integrate spatially misaligned air pollution data in R‐INLA," Environmetrics, John Wiley & Sons, Ltd., vol. 31(8), December.
    17. Andrew Hoegh & Marco A. R. Ferreira & Scotland Leman, 2016. "Spatiotemporal model fusion: multiscale modelling of civil unrest," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 65(4), pages 529-545, August.
    18. Marco H. Benedetti & Veronica J. Berrocal & Naveen N. Narisetty, 2022. "Identifying regions of inhomogeneities in spatial processes via an M‐RA and mixture priors," Biometrics, The International Biometric Society, vol. 78(2), pages 798-811, June.
    19. Kelly R. Moran & Matthew W. Wheeler, 2022. "Fast increased fidelity samplers for approximate Bayesian Gaussian process regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(4), pages 1198-1228, September.
    20. Simlai, Prodosh, 2014. "Estimation of variance of housing prices using spatial conditional heteroskedasticity (SARCH) model with an application to Boston housing price data," The Quarterly Review of Economics and Finance, Elsevier, vol. 54(1), pages 17-30.
