
Supervised learning via smoothed Polya trees

Authors

  • William Cipolli (Colgate University)
  • Timothy Hanson (University of South Carolina)

Abstract

We propose a generative classification model that extends Quadratic Discriminant Analysis (QDA) (Cox in J R Stat Soc Ser B (Methodol) 20:215–242, 1958) and Linear Discriminant Analysis (LDA) (Fisher in Ann Eugen 7:179–188, 1936; Rao in J R Stat Soc Ser B 10:159–203, 1948) to the Bayesian nonparametric setting, providing a competitor to MclustDA (Fraley and Raftery in J Am Stat Assoc 97:611–631, 2002). The approach models the data distribution of each class with a multivariate Polya tree and achieves impressive results in simulations and real data analyses. Relaxing the distributional assumptions of QDA further adds flexibility that can greatly improve the classification of new observations when the data deviate severely from parametric assumptions, while the method still performs well when those assumptions hold. The proposed method is fast compared to other supervised classifiers and simple to implement, since it requires no kernel tricks or initialization steps; this arguably makes it one of the more user-friendly approaches to supervised learning. The absence of tuning is a significant feature of the methodology, because suboptimal tuning can greatly hamper classification performance: for example, SVMs fit with non-optimal kernels perform significantly worse.
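The generative idea in the abstract (model each class density, then assign a new point to the class with the largest prior times density) can be illustrated with a much simpler relative of the proposed model. The sketch below is not the authors' method: it assumes univariate data on [0, 1), a finite Polya tree with dyadic partitions centered at a Uniform(0, 1) base measure, Beta(c·j², c·j²) branch probabilities at level j (a common parametrization in the Polya tree literature), and no smoothing. The function names `pt_density` and `classify` and the defaults `depth=6`, `c=1.0` are hypothetical.

```python
import math


def pt_density(x, data, depth=6, c=1.0):
    """Posterior predictive density at x of a finite univariate Polya tree.

    Sketch assumptions (not from the paper): support [0, 1), Uniform(0, 1)
    base measure, dyadic partitions, Beta(c*j**2, c*j**2) splits at level j,
    and no smoothing.
    """
    if not 0.0 <= x < 1.0:
        return 0.0
    parent = [y for y in data if 0.0 <= y < 1.0]  # observations in the current set
    dens = 1.0
    for j in range(1, depth + 1):
        k = math.floor(x * 2**j)  # index of the level-j interval containing x
        child = [y for y in parent if math.floor(y * 2**j) == k]
        a = c * j**2
        # Posterior mean branch probability, rescaled by 2 because each
        # level halves the interval width under the Uniform(0, 1) base.
        dens *= 2.0 * (a + len(child)) / (2.0 * a + len(parent))
        parent = child
    return dens


def classify(x, train, depth=6, c=1.0):
    """Plug-in Bayes rule: argmax over classes of prior * class density."""
    n_total = sum(len(obs) for obs in train.values())
    scores = {
        label: (len(obs) / n_total) * pt_density(x, obs, depth, c)
        for label, obs in train.items()
    }
    return max(scores, key=scores.get)


train = {"a": [0.10, 0.12, 0.15, 0.20], "b": [0.70, 0.80, 0.85, 0.90]}
print(classify(0.13, train), classify(0.82, train))  # prints: a b
```

With no data the density is exactly uniform, and the posterior concentrates mass where a class's observations fall; the abstract's point about tuning corresponds here to the fact that only `c` and `depth` are fixed in advance, with no kernel choice or initialization.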

Suggested Citation

  • William Cipolli & Timothy Hanson, 2019. "Supervised learning via smoothed Polya trees," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(4), pages 877-904, December.
  • Handle: RePEc:spr:advdac:v:13:y:2019:i:4:d:10.1007_s11634-018-0344-z
    DOI: 10.1007/s11634-018-0344-z

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-018-0344-z
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11634-018-0344-z?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    1. Mukhopadhyay, Subhadeep & Ghosh, Anil K., 2011. "Bayesian multiscale smoothing in supervised and semi-supervised kernel discriminant analysis," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2344-2353, July.
    2. Marco Di Marzio & Charles C. Taylor, 2005. "On boosting kernel density methods for multivariate data: density estimation and classification," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 14(2), pages 163-178, November.
    3. Hanson, Timothy E., 2006. "Inference for Mixtures of Finite Polya Tree Models," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1548-1565, December.
    4. Adriano Z. Zambom & Ronaldo Dias, 2013. "A Review of Kernel Density Estimation with Applications to Econometrics," International Econometric Review (IER), Econometric Research Association, vol. 5(1), pages 20-42, April.
    5. Chen, Yuhui & Hanson, Timothy E., 2014. "Bayesian nonparametric k-sample tests for censored and uncensored data," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 335-346.
    6. Bergé, Laurent & Bouveyron, Charles & Girard, Stéphane, 2012. "HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 46(i06).
    7. Cipolli III, William & Hanson, Timothy & McLain, Alexander C., 2016. "Bayesian nonparametric multiple testing," Computational Statistics & Data Analysis, Elsevier, vol. 101(C), pages 64-79.

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Mahsa Samsami & Ralf Wagner, 2021. "Investment Decisions with Endogeneity: A Dirichlet Tree Analysis," JRFM, MDPI, vol. 14(7), pages 1-19, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ma, Zichen & Hanson, Timothy E., 2020. "Bayesian nonparametric test for independence between random vectors," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
    2. Adam Branscum & Timothy Hanson & Ian Gardner, 2008. "Bayesian non-parametric models for regional prevalence estimation," Journal of Applied Statistics, Taylor & Francis Journals, vol. 35(5), pages 567-582.
    3. Han, Qinkai & Wang, Tianyang & Chu, Fulei, 2022. "Nonparametric copula modeling of wind speed-wind shear for the assessment of height-dependent wind energy in China," Renewable and Sustainable Energy Reviews, Elsevier, vol. 161(C).
    4. Luping Zhao & Timothy E. Hanson, 2011. "Spatially Dependent Polya Tree Modeling for Survival Data," Biometrics, The International Biometric Society, vol. 67(2), pages 391-403, June.
    5. Luai Al-Labadi, 2021. "The two-sample problem via relative belief ratio," Computational Statistics, Springer, vol. 36(3), pages 1791-1808, September.
    6. Angela Schörgendorfer & Adam J. Branscum & Timothy E. Hanson, 2013. "A Bayesian Goodness of Fit Test and Semiparametric Generalization of Logistic Regression with Measurement Data," Biometrics, The International Biometric Society, vol. 69(2), pages 508-519, June.
    7. Han, Qinkai & Chu, Fulei, 2021. "Directional wind energy assessment of China based on nonparametric copula models," Renewable Energy, Elsevier, vol. 164(C), pages 1334-1349.
    8. Shinya Sugawara, 2017. "Firm‐Driven Management of Longevity Risk: Analysis of Lump‐Sum Forward Payments in Japanese Nursing Homes," Journal of Economics & Management Strategy, Wiley Blackwell, vol. 26(1), pages 169-204, February.
    9. Song Zhang & Peter Müller & Kim-Anh Do, 2010. "A Bayesian Semiparametric Survival Model with Longitudinal Markers," Biometrics, The International Biometric Society, vol. 66(2), pages 435-443, June.
    10. Miśkiewicz, Janusz, 2016. "Improving quality of sample entropy estimation for continuous distribution probability functions," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 450(C), pages 473-485.
    11. Mairech, Hanene & López-Bernal, Álvaro & Moriondo, Marco & Dibari, Camilla & Regni, Luca & Proietti, Primo & Villalobos, Francisco J. & Testi, Luca, 2020. "Is new olive farming sustainable? A spatial comparison of productive and environmental performances between traditional and new olive orchards with the model OliveCan," Agricultural Systems, Elsevier, vol. 181(C).
    12. Luz Adriana Pereira & Daniel Taylor‐Rodríguez & Luis Gutiérrez, 2020. "A Bayesian nonparametric testing procedure for paired samples," Biometrics, The International Biometric Society, vol. 76(4), pages 1133-1146, December.
    13. Zhuang, Haoxin & Diao, Liqun & Yi, Grace Y., 2023. "Polya tree Monte Carlo method," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    14. Haiming Zhou & Timothy Hanson & Jiajia Zhang, 2017. "Generalized accelerated failure time spatial frailty model for arbitrarily censored data," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(3), pages 495-515, July.
    15. Meijuan Li & Cavan Reilly & Tim Hanson, 2010. "Association Tests for a Censored Quantitative Trait and Candidate Genes in Structured Populations with Multilevel Genetic Relatedness," Biometrics, The International Biometric Society, vol. 66(3), pages 925-933, September.
    16. Cipolli III, William & Hanson, Timothy & McLain, Alexander C., 2016. "Bayesian nonparametric multiple testing," Computational Statistics & Data Analysis, Elsevier, vol. 101(C), pages 64-79.
    17. Rafael Carvalho Ceregatti & Rafael Izbicki & Luis Ernesto Bueno Salasar, 2021. "WIKS: a general Bayesian nonparametric index for quantifying differences between two populations," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(1), pages 274-291, March.
    18. Feng, Long & Zhang, Xiaoxu & Liu, Binghui, 2020. "A high-dimensional spatial rank test for two-sample location problems," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    19. Lei Zhang & Lun Xie & Qinkai Han & Zhiliang Wang & Chen Huang, 2020. "Probability Density Forecasting of Wind Speed Based on Quantile Regression and Kernel Density Estimation," Energies, MDPI, vol. 13(22), pages 1-24, November.
    20. Peng Bin, 2016. "Dynamic Development of Regional Disparity in Mainland China: An Experimental Study Based on a Multidimensional Index," Sustainability, MDPI, vol. 8(12), pages 1-28, December.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.