
A Bayesian Modelling Approach with Balancing Informative Prior for Analysing Imbalanced Data

Authors

  • Kerenaftali Klein
  • Stefanie Hennig
  • Sanjoy Ketan Paul

Abstract

When a dataset is imbalanced, predictions for a scarcely sampled subpopulation can be unduly influenced by the subpopulation that contributes most of the data. The aim of this study was to develop a Bayesian modelling approach with a balancing informative prior so that the influence of imbalance on the overall prediction could be minimised. The new approach was developed to weight the data in favour of the smaller subset(s). The method was assessed in terms of bias and precision of model parameter estimates from simulated datasets. With the balancing informative prior, the bias estimates were smaller than those obtained with the conventional approach, which does not account for the imbalance in the datasets, and the precision estimates were also superior. The method was further evaluated in a motivating example: predicting optimal dose levels of tobramycin for various age groups. The resulting predictions agreed well with those reported in the literature. The proposed Bayesian balancing informative prior approach shows real potential to adequately weight the data in favour of smaller subset(s) and so generate robust prediction models.

Suggested Citation

  • Kerenaftali Klein & Stefanie Hennig & Sanjoy Ketan Paul, 2016. "A Bayesian Modelling Approach with Balancing Informative Prior for Analysing Imbalanced Data," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-12, April.
  • Handle: RePEc:plo:pone00:0152700
    DOI: 10.1371/journal.pone.0152700

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152700
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0152700&type=printable
    Download Restriction: no



