
A Semi-parametric Density Estimation with Application in Clustering

Author

Listed:
  • Mahdi Salehi

    (University of Neyshabur; University of Pretoria)

  • Andriette Bekker

    (University of Pretoria)

  • Mohammad Arashi

    (Ferdowsi University of Mashhad)

Abstract

Density-based clustering associates groups with the connected components of the level sets of the data's density, which is estimated by a nonparametric method. This approach claims some advantages over both distance- and model-based clustering. Earlier work developed this technique by proposing a graph-theory-based method for identifying local modes of the underlying density, estimated by the well-known kernel density estimation (KDE) with normal and t kernels. The present work proposes a semi-parametric KDE with a more flexible family of kernels, including the skew-normal (SN) and skew-t (ST). We show that the proposed estimator not only reduces boundary bias but is also closer to the actual density than the usual estimator employing the Gaussian kernel. Finding the optimal bandwidth for the one-dimensional and multidimensional cases under these asymmetric kernels is another main result of this paper, in which the bandwidth is shrunk further than the one obtained under the normal assumption. Finally, through a comprehensive numerical study, we illustrate the application of the proposed semi-parametric KDE to density-based clustering using simulated and real data sets.
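
As a rough illustration of the idea in the abstract, the sketch below evaluates a plain one-dimensional KDE with a standard skew-normal kernel, 2φ(z)Φ(αz), and compares how much estimated mass falls below zero for toy data supported on the positive half-line. This is not the authors' estimator: the shape parameter `alpha`, the bandwidth `h = 0.3`, the exponential toy sample, and all function names are assumptions made for illustration, and the paper's semi-parametric construction and optimal bandwidth are not reproduced here.

```python
import math
import random


def normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_kernel(z, alpha):
    """Standard skew-normal density 2*phi(z)*Phi(alpha*z).

    alpha = 0 recovers the Gaussian kernel; alpha > 0 skews mass to the right.
    """
    return 2.0 * normal_pdf(z) * normal_cdf(alpha * z)


def kde(x, data, h, alpha=0.0):
    """Plain KDE at x: (1 / (n*h)) * sum_i K((x - x_i) / h)."""
    return sum(skew_normal_kernel((x - xi) / h, alpha) for xi in data) / (len(data) * h)


def integrate(f, lo, hi, steps=2000):
    """Trapezoidal rule for f on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * width)
    return total * width


def mass_below_zero(data, h, alpha):
    """Estimated probability mass placed on the negative half-line."""
    # 20 bandwidths below zero is far enough that the kernel tails are negligible.
    return integrate(lambda x: kde(x, data, h, alpha), -20.0 * h, 0.0)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy sample with a boundary at zero (exponential, rate 1).
    sample = [random.expovariate(1.0) for _ in range(100)]
    for alpha in (0.0, 3.0):
        leak = mass_below_zero(sample, h=0.3, alpha=alpha)
        print(f"alpha={alpha}: estimated mass below zero = {leak:.4f}")
```

In this toy setting the right-skewed kernel places less mass below the boundary than the Gaussian kernel does, because the skew-normal distribution function with positive `alpha` lies below the normal one everywhere. Note that the skew-normal kernel is not centered at zero, so an uncorrected estimate like this one also shifts mass to the right; how the paper handles location and bandwidth is not stated in the abstract.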

Suggested Citation

  • Mahdi Salehi & Andriette Bekker & Mohammad Arashi, 2023. "A Semi-parametric Density Estimation with Application in Clustering," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 52-78, April.
  • Handle: RePEc:spr:jclass:v:40:y:2023:i:1:d:10.1007_s00357-022-09425-9
    DOI: 10.1007/s00357-022-09425-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00357-022-09425-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hagmann, M. & Scaillet, O., 2007. "Local multiplicative bias correction for asymmetric kernel density estimators," Journal of Econometrics, Elsevier, vol. 141(1), pages 213-249, November.
    2. Bouezmarni, Taoufik & Rombouts, Jeroen V.K., 2010. "Nonparametric density estimation for positive time series," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 245-261, February.
    3. Marchant, Carolina & Bertin, Karine & Leiva, Víctor & Saulo, Helton, 2013. "Generalized Birnbaum–Saunders kernel density estimators and an analysis of financial data," Computational Statistics & Data Analysis, Elsevier, vol. 63(C), pages 1-15.
    4. Ouimet, Frédéric & Tolosana-Delgado, Raimon, 2022. "Asymptotic properties of Dirichlet kernel density estimators," Journal of Multivariate Analysis, Elsevier, vol. 187(C).
    5. Fernandes, Marcelo & Grammig, Joachim, 2005. "Nonparametric specification tests for conditional duration models," Journal of Econometrics, Elsevier, vol. 127(1), pages 35-68, July.
    6. Charpentier, Arthur & Flachaire, Emmanuel, 2015. "Log-Transform Kernel Density Estimation Of Income Distribution," L'Actualité Economique, Société Canadienne de Science Economique, vol. 91(1-2), pages 141-159, Mars-Juin.
    7. Mohammadi, Faezeh & Izadi, Muhyiddin & Lai, Chin-Diew, 2016. "On testing whether burn-in is required under the long-run average cost," Statistics & Probability Letters, Elsevier, vol. 110(C), pages 217-224.
    8. Ouimet, Frédéric, 2022. "A symmetric matrix-variate normal local approximation for the Wishart distribution and some applications," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    9. Pierre Lafaye de Micheaux & Frédéric Ouimet, 2021. "A Study of Seven Asymmetric Kernels for the Estimation of Cumulative Distribution Functions," Mathematics, MDPI, vol. 9(20), pages 1-35, October.
    10. Marcelo Fernandes & Eduardo Mendes & Olivier Scaillet, 2015. "Testing for symmetry and conditional symmetry using asymmetric kernels," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 67(4), pages 649-671, August.
    11. Ouimet, Frédéric, 2021. "Asymptotic properties of Bernstein estimators on the simplex," Journal of Multivariate Analysis, Elsevier, vol. 185(C).
    12. Renault, Olivier & Scaillet, Olivier, 2004. "On the way to recovery: A nonparametric bias free estimation of recovery rate densities," Journal of Banking & Finance, Elsevier, vol. 28(12), pages 2915-2931, December.
    13. Nikolay Gospodinov & Masayuki Hirukawa, 2008. "Time Series Nonparametric Regression Using Asymmetric Kernels with an Application to Estimation of Scalar Diffusion Processes," CIRJE F-Series CIRJE-F-573, CIRJE, Faculty of Economics, University of Tokyo.
    14. Funke, Benedikt & Hirukawa, Masayuki, 2019. "Nonparametric estimation and testing on discontinuity of positive supported densities: a kernel truncation approach," Econometrics and Statistics, Elsevier, vol. 9(C), pages 156-170.
    15. Malec, Peter & Schienle, Melanie, 2014. "Nonparametric kernel density estimation near the boundary," Computational Statistics & Data Analysis, Elsevier, vol. 72(C), pages 57-76.
    16. Hirukawa, Masayuki, 2010. "Nonparametric multiplicative bias correction for kernel-type density estimation on the unit interval," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 473-495, February.
    17. Bouezmarni, T. & Mesfioui, M. & Rolin, J.M., 2007. "L1-rate of convergence of smoothed histogram," Statistics & Probability Letters, Elsevier, vol. 77(14), pages 1497-1504, August.
    18. Masayuki Hirukawa & Mari Sakudo, 2016. "Testing Symmetry of Unknown Densities via Smoothing with the Generalized Gamma Kernels," Econometrics, MDPI, vol. 4(2), pages 1-27, June.
    19. BOUEZMARNI, Taoufik & ROMBOUTS, Jeroen V.K., 2007. "Nonparametric density estimation for multivariate bounded data," LIDAM Discussion Papers CORE 2007065, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    20. Bouezmarni, T. & Rombouts, J.V.K., 2009. "Semiparametric multivariate density estimation for positive data using copulas," Computational Statistics & Data Analysis, Elsevier, vol. 53(6), pages 2040-2054, April.
