
A probability transducer and decision-theoretic augmentation for machine-learning classifiers

Author

Listed:
  • Dyrland, Kjetil
  • Lundervold, Alexander Selvikvåg

    (Western Norway University of Applied Sciences)

  • Porta Mana, PierGianLuca

    (HVL Western Norway University of Applied Sciences)

Abstract

In a classification task based on a set of features, one would ideally like to know the probability of each class conditional on the features. In many important cases this probability is computationally almost impossible to find. The primary idea of the present work is to calculate the probability of a class conditional not on the features, but on the output of a trained classification algorithm. This probability is easily calculated and provides an output-to-probability ‘transducer’ that can be applied to the algorithm's future outputs. In conjunction with problem-dependent utilities, the probabilities given by the transducer allow one to make the optimal choice among the classes, or among a set of more general decisions, by means of expected-utility maximization. The combined procedure is a computationally cheap yet powerful ‘augmentation’ of the original classifier. This idea is demonstrated in a simplified drug-discovery problem with a highly imbalanced dataset. The augmentation leads to improved results, sometimes close to the theoretical maximum, for any set of problem-dependent utilities. The calculation of the transducer also provides, automatically: (i) a quantification of the uncertainty about the transducer itself; (ii) the expected utility of the augmented algorithm (including its uncertainty), which can be used for algorithm selection; (iii) the possibility of using the algorithm in a ‘generative mode’, useful if the training dataset is biased. It is argued that the optimality, flexibility, and uncertainty assessment provided by the transducer and augmentation are sorely needed for classification problems in fields such as medicine and drug discovery.
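
The two steps described in the abstract (output-to-probability transducer, then expected-utility maximization) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a scalar classifier output, swaps in simple histogram binning with add-one smoothing to estimate the probability of the class given the output, and uses made-up calibration data and utilities. The names `fit_transducer`, `decide`, `bin_edges`, and `prior_count` are hypothetical.

```python
import numpy as np

def fit_transducer(outputs, labels, n_classes, bin_edges, prior_count=1.0):
    """Estimate P(class | binned classifier output) from calibration data.

    Assumption: binning plus additive smoothing stands in for whatever
    probability model is actually used to build the transducer.
    """
    bins = np.digitize(outputs, bin_edges)          # bin index per output
    counts = np.full((len(bin_edges) + 1, n_classes), prior_count)
    np.add.at(counts, (bins, labels), 1.0)          # tally (bin, class) pairs
    return counts / counts.sum(axis=1, keepdims=True)

def decide(output, transducer, bin_edges, utilities):
    """Pick the decision with maximal expected utility.

    utilities[d, c] is the utility of decision d when the true class is c.
    """
    probs = transducer[np.digitize(output, bin_edges)]
    expected = utilities @ probs                    # expected utility per decision
    return int(np.argmax(expected)), probs

# Made-up calibration data: classifier outputs and true classes.
outputs = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
labels = np.array([0, 0, 0, 1, 0, 1, 1, 1])
bin_edges = np.array([0.5])                         # two bins: <0.5 and >=0.5

transducer = fit_transducer(outputs, labels, n_classes=2, bin_edges=bin_edges)

# Symmetric utilities: simply reward the correct class.
symmetric = np.eye(2)
# Asymmetric utilities (hypothetical): missing class 1 is very costly.
asymmetric = np.array([[1.0, -5.0],
                       [-1.0, 5.0]])

choice_sym, probs = decide(0.3, transducer, bin_edges, symmetric)
choice_asym, _ = decide(0.3, transducer, bin_edges, asymmetric)
print(probs, choice_sym, choice_asym)
```

With these illustrative numbers the same classifier output leads to different decisions under the two utility matrices, which is the point of combining the transducer's probabilities with problem-dependent utilities.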

Suggested Citation

  • Dyrland, Kjetil & Lundervold, Alexander Selvikvåg & Porta Mana, PierGianLuca, 2022. "A probability transducer and decision-theoretic augmentation for machine-learning classifiers," OSF Preprints vct9y, Center for Open Science.
  • Handle: RePEc:osf:osfxxx:vct9y
    DOI: 10.31219/osf.io/vct9y

    Download full text from publisher

    File URL: https://osf.io/download/62971cb606863102ff729e57/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Luping Zhao & Timothy E. Hanson, 2011. "Spatially Dependent Polya Tree Modeling for Survival Data," Biometrics, The International Biometric Society, vol. 67(2), pages 391-403, June.
    2. Ryo Kato & Takahiro Hoshino, 2020. "Semiparametric Bayesian multiple imputation for regression models with missing mixed continuous–discrete covariates," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 72(3), pages 803-825, June.
    3. Fuentes-García, Ruth & Mena, Ramsés H. & Walker, Stephen G., 2009. "A nonparametric dependent process for Bayesian regression," Statistics & Probability Letters, Elsevier, vol. 79(8), pages 1112-1119, April.
    4. A Ford Ramsey, 2020. "Probability Distributions of Crop Yields: A Bayesian Spatial Quantile Regression Approach," American Journal of Agricultural Economics, John Wiley & Sons, vol. 102(1), pages 220-239, January.
    5. Antonio Canale & Bruno Scarpa, 2016. "Bayesian nonparametric location–scale–shape mixtures," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(1), pages 113-130, March.
    6. Griffin, J.E. & Steel, M.F.J., 2011. "Stick-breaking autoregressive processes," Journal of Econometrics, Elsevier, vol. 162(2), pages 383-396, June.
    7. Glen McGee & Ander Wilson & Thomas F. Webster & Brent A. Coull, 2023. "Bayesian multiple index models for environmental mixtures," Biometrics, The International Biometric Society, vol. 79(1), pages 462-474, March.
    8. Hübler, Olaf, 2017. "Health and Body Mass Index: No Simple Relationship," IZA Discussion Papers 10620, Institute of Labor Economics (IZA).
    9. Dennis Leung & Wenguang Sun, 2022. "ZAP: Z‐value adaptive procedures for false discovery rate control with side information," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(5), pages 1886-1946, November.
    10. Igari, Ryosuke & Hoshino, Takahiro, 2018. "A Bayesian data combination approach for repeated durations under unobserved missing indicators: Application to interpurchase-timing in marketing," Computational Statistics & Data Analysis, Elsevier, vol. 126(C), pages 150-166.
    11. Tsionas, Mike G., 2023. "Combining data envelopment analysis and stochastic frontiers via a LASSO prior," European Journal of Operational Research, Elsevier, vol. 304(3), pages 1158-1166.
    12. Alexander Henzi & Johanna F. Ziegel & Tilmann Gneiting, 2021. "Isotonic distributional regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(5), pages 963-993, November.
    13. Xu Chen & Surya T. Tokdar, 2021. "Joint quantile regression for spatial data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 83(4), pages 826-852, September.
    14. Sun, Peng & Kim, Inyoung & Lee, Ki-Ahm, 2018. "Dual-semiparametric regression using weighted Dirichlet process mixture," Computational Statistics & Data Analysis, Elsevier, vol. 117(C), pages 162-181.
    15. Tsionas, Mike & Parmeter, Christopher F. & Zelenyuk, Valentin, 2023. "Bayesian Artificial Neural Networks for frontier efficiency analysis," Journal of Econometrics, Elsevier, vol. 236(2).
    16. Huang, Yifan & Meng, Shengwang, 2020. "A Bayesian nonparametric model and its application in insurance loss prediction," Insurance: Mathematics and Economics, Elsevier, vol. 93(C), pages 84-94.
    17. Villani, Mattias & Kohn, Robert & Giordani, Paolo, 2007. "Nonparametric Regression Density Estimation Using Smoothly Varying Normal Mixtures," Working Paper Series 211, Sveriges Riksbank (Central Bank of Sweden).
    18. Chib, Siddhartha & Greenberg, Edward, 2010. "Additive cubic spline regression with Dirichlet process mixture errors," Journal of Econometrics, Elsevier, vol. 156(2), pages 322-336, June.
    19. Ryo Kato & Takahiro Hoshino, 2018. "Semiparametric Bayes Instrumental Variable Estimation with Many Weak Instruments," Discussion Paper Series DP2018-14, Research Institute for Economics & Business Administration, Kobe University.
    20. Zheng, Xiaoyu & Itoh, Hiroto & Kawaguchi, Kenji & Tamaki, Hitoshi & Maruyama, Yu, 2015. "Application of Bayesian nonparametric models to the uncertainty and sensitivity analysis of source term in a BWR severe accident," Reliability Engineering and System Safety, Elsevier, vol. 138(C), pages 253-262.
