Printed from https://ideas.repec.org/a/gam/jstats/v9y2026i3p46-d1926808.html

A Practical Framework for Incorporating Complex Survey Design in Bayesian Kernel Machine Regression

Authors
  • Doreen Jehu-Appiah

    (Department of Built Environment, North Carolina A&T State University, Greensboro, NC 27411, USA
    Department of Computational Data Science and Engineering, North Carolina A&T State University, Greensboro, NC 27411, USA
    Environmental Health and Disease Laboratory, North Carolina A&T State University, Greensboro, NC 27411, USA)

  • Emmanuel Obeng-Gyasi

    (Department of Built Environment, North Carolina A&T State University, Greensboro, NC 27411, USA
    Environmental Health and Disease Laboratory, North Carolina A&T State University, Greensboro, NC 27411, USA)

Abstract

Large-scale population datasets are rarely generated via simple random sampling; instead, they reflect complex designs involving stratification, clustering, and unequal inclusion probabilities. While survey weights are provided to recover population-representative estimates, standard Bayesian Kernel Machine Regression (BKMR), a flexible nonlinear model for high-dimensional exposure mixtures, does not explicitly accommodate these design features. We present a simulation-based framework that evaluates performance under complex sampling by comparing two analytic strategies applied to identical survey-like data: (i) a naïve, unweighted BKMR implementation and (ii) a design-aware workflow that can be executed using existing software without modifying the BKMR algorithm itself. Finite populations are generated with correlated exposures and a known nonlinear data-generating function. Stratified two-stage cluster samples are then drawn under both non-informative and exposure-dependent (informative) selection mechanisms, with controlled intra-class correlation (ICC). The design-aware approach incorporates sampling weights through resampling of the dataset while preserving primary sampling unit structure, followed by standard BKMR fitting. Methods are evaluated using bias, interval width, and empirical 95% coverage relative to the known truth. Across simulation scenarios, naïve BKMR exhibits bias and systematic under-coverage under informative sampling, with empirical 95% coverage often dropping to approximately 0–40%, whereas the design-aware workflow improves coverage to approximately 40–60%, moving results closer to nominal levels. These findings provide a practical, implementation-ready strategy for integrating survey design considerations into BKMR analyses and delineate conditions under which accounting for sampling design affects inference. 
While the proposed approach improves inferential performance relative to naïve BKMR, it does not fully achieve nominal coverage, indicating that further methodological development is required for fully valid uncertainty quantification under complex survey designs.
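The design-aware resampling step and the evaluation metrics described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Rows are plain dicts with hypothetical `stratum`, `psu`, and `weight` fields. Within each stratum, PSUs are drawn with replacement with probability proportional to their summed weight, and units inside each drawn PSU are redrawn in proportion to their own weights. The resampled data would then be fit with an unmodified BKMR routine (for example `kmbayes()` from the R package `bkmr`). The hypothetical helper `interval_metrics` summarizes bias, mean interval width, and empirical coverage against a known truth across simulation replicates.

```python
import random
from collections import defaultdict
from statistics import mean


def weighted_psu_resample(rows, seed=None):
    """Weight-proportional resample that keeps stratum and PSU structure.

    Assumption (illustrative, not the paper's exact scheme): each row is a dict
    with hypothetical keys 'stratum', 'psu', and 'weight'.
    """
    rng = random.Random(seed)

    # Group rows as stratum -> PSU -> member rows.
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        tree[row["stratum"]][row["psu"]].append(row)

    resampled = []
    for stratum, psus in tree.items():
        psu_ids = list(psus)
        # A PSU's selection weight is the sum of its members' survey weights.
        psu_weights = [sum(r["weight"] for r in psus[p]) for p in psu_ids]
        # Draw the same number of PSUs as observed, with replacement.
        drawn = rng.choices(psu_ids, weights=psu_weights, k=len(psu_ids))
        for draw_idx, psu in enumerate(drawn):
            members = psus[psu]
            # Redraw units inside the PSU in proportion to their own weights,
            # keeping the PSU's original size.
            picks = rng.choices(
                members, weights=[r["weight"] for r in members], k=len(members)
            )
            for r in picks:
                new_row = dict(r)
                # Relabel so repeated draws of one PSU stay distinct clusters.
                new_row["psu"] = f"{stratum}:{psu}#{draw_idx}"
                resampled.append(new_row)
    return resampled


def interval_metrics(estimates, lowers, uppers, truth):
    """Bias, mean interval width, and empirical coverage against a known truth."""
    if not (len(estimates) == len(lowers) == len(uppers)):
        raise ValueError("estimates, lowers, and uppers must have equal length")
    return {
        "bias": mean(e - truth for e in estimates),
        "width": mean(u - lo for lo, u in zip(lowers, uppers)),
        "coverage": mean(
            1.0 if lo <= truth <= u else 0.0 for lo, u in zip(lowers, uppers)
        ),
    }
```

Because each stratum keeps its observed number of PSUs and each drawn PSU keeps its original size, the clustering structure carries over to the resampled data, while the weight-proportional draws approximate a self-weighting sample that can be analyzed without weights. Repeating the resample-and-fit cycle over many simulated samples and passing the resulting point estimates and 95% interval bounds to `interval_metrics` yields the bias, width, and coverage summaries.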

Suggested Citation

  • Doreen Jehu-Appiah & Emmanuel Obeng-Gyasi, 2026. "A Practical Framework for Incorporating Complex Survey Design in Bayesian Kernel Machine Regression," Stats, MDPI, vol. 9(3), pages 1-28, April.
  • Handle: RePEc:gam:jstats:v:9:y:2026:i:3:p:46-:d:1926808

    Download full text from publisher

    File URL: https://www.mdpi.com/2571-905X/9/3/46/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2571-905X/9/3/46/
    Download Restriction: no

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.