IDEAS home Printed from https://ideas.repec.org/a/eee/phsmap/v482y2017icp796-807.html

Feature extraction through parallel Probabilistic Principal Component Analysis for heart disease diagnosis

Author

Listed:
  • Shah, Syed Muhammad Saqlain
  • Batool, Safeera
  • Khan, Imran
  • Ashraf, Muhammad Usman
  • Abbas, Syed Hussnain
  • Hussain, Syed Adnan

Abstract

The automatic diagnosis of human diseases is mostly achieved through decision support systems. The performance of these systems depends mainly on the selection of the most relevant features, which becomes harder when the dataset contains missing values for some features. Probabilistic Principal Component Analysis (PPCA) has a reputation for handling missing attribute values. This research presents a methodology that takes the results of medical tests as input, extracts a reduced-dimensional feature subset, and provides a diagnosis of heart disease. The proposed methodology uses PPCA to extract high-impact features in a new projection. PPCA extracts the projection vectors that contribute the highest covariance, and these projection vectors are used to reduce the feature dimension. The projection vectors are selected through Parallel Analysis (PA). The reduced-dimension feature subset is passed to a Support Vector Machine (SVM) with a radial basis function (RBF) kernel, which classifies each subject into one of two categories: Heart Patient (HP) or Normal Subject (NS). The methodology is evaluated in terms of accuracy, specificity, and sensitivity on three UCI datasets: Cleveland, Switzerland, and Hungarian. The statistical results achieved by the proposed technique are compared with existing research to show its impact. The proposed technique achieved accuracies of 82.18%, 85.82%, and 91.30% on the Cleveland, Hungarian, and Switzerland datasets, respectively.
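
The feature-extraction stages named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes complete data, so it uses the closed-form maximum-likelihood PPCA solution of Tipping and Bishop (1999) rather than an EM variant that tolerates missing values. It also assumes Horn's form of Parallel Analysis with a 95th-percentile threshold. The RBF-kernel SVM stage is omitted. All function names and parameters below are hypothetical.

```python
import numpy as np


def parallel_analysis(X, n_sims=200, percentile=95, seed=0):
    """Horn's parallel analysis: how many leading components to retain.

    Assumption: eigenvalues of the correlation matrix are compared with the
    given percentile of eigenvalues obtained from uncorrelated Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    # Eigenvalues of noise data sets with the same shape as X.
    simulated = np.empty((n_sims, d))
    for i in range(n_sims):
        noise = rng.standard_normal((n, d))
        simulated[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)
    # Keep leading components until one fails to exceed its noise threshold.
    keep = observed > threshold
    return int(d if keep.all() else np.argmin(keep))


def ppca_closed_form(X, q):
    """Closed-form ML PPCA for complete data (requires q < number of columns).

    Returns the loading matrix W, the noise variance sigma2, and the
    posterior-mean latent features Z used as the reduced feature set.
    """
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False, bias=True)  # ML covariance (divide by n)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance: average of the discarded eigenvalues.
    sigma2 = float(eigvals[q:].mean())
    # W = U_q (Lambda_q - sigma2 I)^(1/2), taking the rotation R = I.
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    # Posterior mean E[z | x] = M^{-1} W^T (x - mu), with M = W^T W + sigma2 I.
    M = W.T @ W + sigma2 * np.eye(q)
    Z = Xc @ W @ np.linalg.inv(M)
    return W, sigma2, Z


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity, with label 1 as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return float(tp / (tp + fn)), float(tn / (tn + fp))
```

In such a pipeline, the value returned by `parallel_analysis` would be passed as `q` to `ppca_closed_form`, and the resulting `Z` would be the input to a classifier such as an RBF-kernel SVM. Treating Heart Patient as label 1 in `sensitivity_specificity` is likewise an assumption of this sketch.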

Suggested Citation

  • Shah, Syed Muhammad Saqlain & Batool, Safeera & Khan, Imran & Ashraf, Muhammad Usman & Abbas, Syed Hussnain & Hussain, Syed Adnan, 2017. "Feature extraction through parallel Probabilistic Principal Component Analysis for heart disease diagnosis," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 482(C), pages 796-807.
  • Handle: RePEc:eee:phsmap:v:482:y:2017:i:c:p:796-807
    DOI: 10.1016/j.physa.2017.04.113

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0378437117304260
    Download Restriction: Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.

    File URL: https://libkey.io/10.1016/j.physa.2017.04.113?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    References listed on IDEAS

    1. Michael E. Tipping & Christopher M. Bishop, 1999. "Probabilistic Principal Component Analysis," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 61(3), pages 611-622.
    2. Patil, Vivek H. & Singh, Surendra N. & Mishra, Sanjay & Todd Donavan, D., 2008. "Efficient theory development and factor retention criteria: Abandon the 'eigenvalue greater than one' criterion," Journal of Business Research, Elsevier, vol. 61(2), pages 162-170, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Eric Tate, 2012. "Social vulnerability indices: a comparative assessment using uncertainty and sensitivity analysis," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 63(2), pages 325-347, September.
    2. Xin Xu & Yang Lu & Yupeng Zhou & Zhiguo Fu & Yanjie Fu & Minghao Yin, 2021. "An Information-Explainable Random Walk Based Unsupervised Network Representation Learning Framework on Node Classification Tasks," Mathematics, MDPI, vol. 9(15), pages 1-14, July.
    3. Matteo Barigozzi & Marc Hallin, 2023. "Dynamic Factor Models: a Genealogy," Papers 2310.17278, arXiv.org, revised Jan 2024.
    4. Wang, Shao-Hsuan & Huang, Su-Yun, 2022. "Perturbation theory for cross data matrix-based PCA," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    5. Wentao Qu & Xianchao Xiu & Huangyue Chen & Lingchen Kong, 2023. "A Survey on High-Dimensional Subspace Clustering," Mathematics, MDPI, vol. 11(2), pages 1-39, January.
    6. Jiaju Miao & Pawel Polak, 2023. "Online Ensemble of Models for Optimal Predictive Performance with Applications to Sector Rotation Strategy," Papers 2304.09947, arXiv.org.
    7. Dorota Toczydlowska & Gareth W. Peters, 2018. "Financial Big Data Solutions for State Space Panel Regression in Interest Rate Dynamics," Econometrics, MDPI, vol. 6(3), pages 1-45, July.
    8. Jung, WoongHee & Taflanidis, Alexandros A., 2023. "Efficient global sensitivity analysis for high-dimensional outputs combining data-driven probability models and dimensionality reduction," Reliability Engineering and System Safety, Elsevier, vol. 231(C).
    9. Matteo Barigozzi, 2023. "Asymptotic equivalence of Principal Components and Quasi Maximum Likelihood estimators in Large Approximate Factor Models," Papers 2307.09864, arXiv.org, revised Sep 2023.
    10. Arthur Pewsey & Eduardo García-Portugués, 2021. "Recent advances in directional statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(1), pages 1-58, March.
    11. Johannes Burge & Priyank Jaini, 2017. "Accuracy Maximization Analysis for Sensory-Perceptual Tasks: Computational Improvements, Filter Robustness, and Coding Advantages for Scaled Additive Noise," PLOS Computational Biology, Public Library of Science, vol. 13(2), pages 1-32, February.
    12. Tilman M. Davies & Sudipto Banerjee & Adam P. Martin & Rose E. Turnbull, 2022. "A nearest‐neighbour Gaussian process spatial factor model for censored, multi‐depth geochemical data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(4), pages 1014-1043, August.
    13. el Bouhaddani, Said & Uh, Hae-Won & Hayward, Caroline & Jongbloed, Geurt & Houwing-Duistermaat, Jeanine, 2018. "Probabilistic partial least squares model: Identifiability, estimation and application," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 331-346.
    14. Stefan Sommer, 2019. "An Infinitesimal Probabilistic Model for Principal Component Analysis of Manifold Valued Data," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 81(1), pages 37-62, February.
    15. R Asvat & CA Bisschoff & CJ Botha, 2018. "Factors to Measure the Performance of Private Business Schools in South Africa," Journal of Economics and Behavioral Studies, AMH International, vol. 10(6), pages 50-69.
    16. Seppo Pulkkinen & Marko Mäkelä & Napsu Karmitsa, 2014. "A generative model and a generalized trust region Newton method for noise reduction," Computational Optimization and Applications, Springer, vol. 57(1), pages 129-165, January.
    17. Gen Li & Sungkyu Jung, 2017. "Incorporating covariates into integrated factor analysis of multi‐view data," Biometrics, The International Biometric Society, vol. 73(4), pages 1433-1442, December.
    18. Jeff Goldsmith & Vadim Zipunnikov & Jennifer Schrack, 2015. "Generalized multilevel function-on-scalar regression and principal component analysis," Biometrics, The International Biometric Society, vol. 71(2), pages 344-353, June.
    19. Frydman, Carola & Papanikolaou, Dimitris, 2018. "In search of ideas: Technological innovation and executive pay inequality," Journal of Financial Economics, Elsevier, vol. 130(1), pages 1-24.
    20. Joanna Radomska & Aleksandra Szpulak & Przemysław Wołczek, 2023. "A multi-item scale for open strategy measurement," DECISION: Official Journal of the Indian Institute of Management Calcutta, Springer;Indian Institute of Management Calcutta, vol. 50(1), pages 51-71, March.
