
Skewness-Kurtosis Model-Based Projection Pursuit with Application to Summarizing Gene Expression Data

Authors
  • Jorge M. Arevalillo

    (Department of Statistics and Operational Research, Universidad Nacional de Educación a Distancia (UNED), 28040 Madrid, Spain)

  • Hilario Navarro

    (Department of Statistics and Operational Research, Universidad Nacional de Educación a Distancia (UNED), 28040 Madrid, Spain)

Abstract

Non-normality is common in gene expression data, so flexible models are needed to account for the asymmetry and heavy tails of multivariate gene expression measurements. This paper addresses the issue by studying the projection pursuit problem in a flexible framework where the underlying model is assumed to follow a multivariate skew-t distribution. Under this assumption, projection pursuit with skewness and kurtosis indices arises as a natural approach to data reduction. The work examines the properties of this approach, providing theoretical insights and discussing the computational aspects of applying it to real gene expression data. The theoretical results are illustrated by a simulation study, whose outputs are combined with the theoretical insights to assess the usefulness of skewness-kurtosis projection pursuit for summarizing multivariate gene expression data. The application to gene expression measurements from patients diagnosed with triple-negative breast cancer yields promising findings that may help explain the heterogeneity of this type of tumor.
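To make the skewness side of the idea concrete, here is a minimal sketch, not the authors' algorithm. It assumes a skew-t model built from the usual stochastic representation (a skew-normal vector with scale matrix fixed to the identity, divided by the square root of an independent chi-square/ν variable), and it searches for the unit direction whose projection has maximal sample skewness (third standardized moment). The optimizer is a swapped-in technique, a shifted symmetric higher-order power iteration on the whitened data; the kurtosis of the resulting projection is only reported, not jointly optimized. The function names `simulate_skew_t`, `whiten`, `skewness_kurtosis` and `max_skewness_direction`, and the parameter values `delta`, `nu`, `shift`, are hypothetical choices for illustration.

```python
import numpy as np


def simulate_skew_t(n, delta, nu, rng):
    """Hypothetical simulator: d-variate skew-t with scale matrix = identity.

    Z = delta*|U0| + U1, U1 ~ N(0, I - delta delta^T), is skew-normal;
    dividing by sqrt(W), W ~ chi2(nu)/nu, gives a skew-t draw.
    """
    delta = np.asarray(delta, dtype=float)
    d = delta.size
    cov = np.eye(d) - np.outer(delta, delta)  # positive definite when ||delta|| < 1
    u0 = np.abs(rng.standard_normal(n))
    u1 = rng.multivariate_normal(np.zeros(d), cov, size=n)
    z = u0[:, None] * delta + u1  # skew-normal sample, shape (n, d)
    w = rng.chisquare(nu, size=n) / nu
    return z / np.sqrt(w)[:, None]


def whiten(x):
    """Center x and rescale it so its (1/n) sample covariance is the identity."""
    xc = x - x.mean(axis=0)
    cov = np.cov(xc, rowvar=False, bias=True)
    vals, vecs = np.linalg.eigh(cov)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T  # symmetric inverse square root
    return xc @ inv_sqrt, inv_sqrt


def skewness_kurtosis(p):
    """Third and fourth standardized sample moments of a 1-D projection."""
    p = (p - p.mean()) / p.std()
    return np.mean(p ** 3), np.mean(p ** 4)


def max_skewness_direction(x, n_starts=10, shift=1.0, n_iter=200, tol=1e-10, seed=0):
    """Maximize projection skewness with a shifted higher-order power iteration."""
    z, inv_sqrt = whiten(x)
    n, d = z.shape
    rng = np.random.default_rng(seed)
    best_u, best_skew = None, -np.inf
    for _ in range(n_starts):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        for _ in range(n_iter):
            # Contract the sample third-moment tensor with u twice: T(., u, u)
            g = z.T @ ((z @ u) ** 2) / n
            u_new = g + shift * u  # the shift keeps the update moving uphill
            u_new /= np.linalg.norm(u_new)
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        skew = np.mean((z @ u) ** 3)  # whitened data: projection has unit variance
        if skew > best_skew:
            best_u, best_skew = u, skew
    # z @ u equals centered x @ (inv_sqrt @ u), so map u back to original coordinates
    c = inv_sqrt @ best_u
    return c / np.linalg.norm(c), best_skew


rng = np.random.default_rng(42)
x = simulate_skew_t(5000, delta=[0.95, 0.0, 0.0], nu=20, rng=rng)
c, skew = max_skewness_direction(x)
_, kurt = skewness_kurtosis(x @ c)
print(np.round(c, 2), round(float(skew), 2), round(float(kurt), 2))
```

In this toy setup the scale matrix is the identity, so the skewness-maximizing direction should line up with `delta` (here the first coordinate axis). The whitening step makes the problem a maximization of a cubic form on the unit sphere, which is why a tensor power iteration is a reasonable, if generic, choice; any other optimizer over the sphere could be substituted.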

Suggested Citation

  • Jorge M. Arevalillo & Hilario Navarro, 2021. "Skewness-Kurtosis Model-Based Projection Pursuit with Application to Summarizing Gene Expression Data," Mathematics, MDPI, vol. 9(9), pages 1-18, April.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:9:p:954-:d:542693

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/9/954/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/9/954/
    Download Restriction: no

    References listed on IDEAS

    1. Nicola Loperfido, 2010. "Canonical transformations of skew-normal variates," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 19(1), pages 146-165, May.
    2. Loperfido, Nicola, 2014. "A note on the fourth cumulant of a finite mixture distribution," Journal of Multivariate Analysis, Elsevier, vol. 123(C), pages 386-394.
    3. Jorge M. Arevalillo & Hilario Navarro, 2019. "A stochastic ordering based on the canonical transformation of skew-normal vectors," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 475-498, June.
    4. Kim, Hyoung-Moon & Mallick, Bani K., 2003. "Moments of random vectors with skew t distribution and their quadratic forms," Statistics & Probability Letters, Elsevier, vol. 63(4), pages 417-423, July.
    5. Wang, Jin, 2009. "A family of kurtosis orderings for multivariate distributions," Journal of Multivariate Analysis, Elsevier, vol. 100(3), pages 509-517, March.
    6. A. Azzalini & A. Capitanio, 1999. "Statistical applications of the multivariate skew normal distribution," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 61(3), pages 579-602.
    7. Branco, Márcia D. & Dey, Dipak K., 2001. "A General Class of Multivariate Skew-Elliptical Distributions," Journal of Multivariate Analysis, Elsevier, vol. 79(1), pages 99-113, October.
    8. Pena D. & Prieto F.J., 2001. "Cluster Identification Using Projections," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1433-1445, December.
    9. Balakrishnan, N. & Scarpa, Bruno, 2012. "Multivariate measures of skewness for the skew-normal distribution," Journal of Multivariate Analysis, Elsevier, vol. 104(1), pages 73-87, February.
    10. Loperfido, Nicola, 2018. "Skewness-based projection pursuit: A computational approach," Computational Statistics & Data Analysis, Elsevier, vol. 120(C), pages 42-57.
    11. Jurgen A. Doornik & Henrik Hansen, 2008. "An Omnibus Test for Univariate and Multivariate Normality," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 70(s1), pages 927-939, December.
    12. Loperfido, Nicola, 2013. "Skewness and the linear discriminant function," Statistics & Probability Letters, Elsevier, vol. 83(1), pages 93-99.
    13. Jorge M. Arevalillo & Hilario Navarro, 2020. "Data projections by skewness maximization under scale mixtures of skew-normal vectors," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 435-461, June.
    14. Nicola Loperfido, 2020. "Kurtosis-based projection pursuit for outlier detection in financial time series," The European Journal of Finance, Taylor & Francis Journals, vol. 26(2-3), pages 142-164, February.
    15. Hyoung-Moon Kim & Chiwhan Kim, 2017. "Moments of scale mixtures of skew-normal distributions and their quadratic forms," Communications in Statistics - Theory and Methods, Taylor & Francis Journals, vol. 46(3), pages 1117-1126, February.
    16. Balakrishnan, N. & Capitanio, A. & Scarpa, B., 2014. "A test for multivariate skew-normality based on its canonical form," Journal of Multivariate Analysis, Elsevier, vol. 128(C), pages 19-32.
    17. Arevalillo, Jorge M. & Navarro, Hilario, 2012. "A study of the effect of kurtosis on discriminant analysis under elliptical populations," Journal of Multivariate Analysis, Elsevier, vol. 107(C), pages 53-63.
    18. Hothorn, Torsten & Hornik, Kurt & van de Wiel, Mark A. & Zeileis, Achim, 2006. "A Lego System for Conditional Inference," The American Statistician, American Statistical Association, vol. 60, pages 257-263, August.
    19. Nicholas F Marko & Robert J Weil, 2012. "Non-Gaussian Distributions Affect Identification of Expression Patterns, Functional Annotation, and Prospective Classification in Human Cancer Genomes," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-15, October.
    20. Nordhausen, Klaus & Oja, Hannu & Tyler, David E., 2008. "Tools for Exploring Multivariate Data: The Package ICS," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i06).
    21. Adelchi Azzalini & Antonella Capitanio, 2003. "Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t‐distribution," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(2), pages 367-389, May.
    22. Adelchi Azzalini, 2005. "The Skew‐normal Distribution and Related Multivariate Families," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 32(2), pages 159-188, June.
    23. Arevalillo, Jorge M. & Navarro, Hilario, 2015. "A note on the direction maximizing skewness in multivariate skew-t vectors," Statistics & Probability Letters, Elsevier, vol. 96(C), pages 328-332.
    24. Joaquim Casellas & Luis Varona, 2012. "Modeling Skewness in Human Transcriptomes," PLOS ONE, Public Library of Science, vol. 7(6), pages 1-5, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jorge M. Arevalillo & Hilario Navarro, 2020. "Data projections by skewness maximization under scale mixtures of skew-normal vectors," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 435-461, June.
    2. Jorge M. Arevalillo & Hilario Navarro, 2019. "A stochastic ordering based on the canonical transformation of skew-normal vectors," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 475-498, June.
    3. Loperfido, Nicola, 2018. "Skewness-based projection pursuit: A computational approach," Computational Statistics & Data Analysis, Elsevier, vol. 120(C), pages 42-57.
    4. Nicola Loperfido, 2019. "Finite mixtures, projection pursuit and tensor rank: a triangulation," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 145-173, March.
    5. Arevalillo, Jorge M. & Navarro, Hilario, 2015. "A note on the direction maximizing skewness in multivariate skew-t vectors," Statistics & Probability Letters, Elsevier, vol. 96(C), pages 328-332.
    6. Loperfido, Nicola, 2021. "Some theoretical properties of two kurtosis matrices, with application to invariant coordinate selection," Journal of Multivariate Analysis, Elsevier, vol. 186(C).
    7. Shushi, Tomer, 2018. "A proof for the existence of multivariate singular generalized skew-elliptical density functions," Statistics & Probability Letters, Elsevier, vol. 141(C), pages 50-55.
    8. Arellano-Valle, Reinaldo B. & Azzalini, Adelchi, 2013. "The centred parameterization and related quantities of the skew-t distribution," Journal of Multivariate Analysis, Elsevier, vol. 113(C), pages 73-90.
    9. Kahrari, F. & Rezaei, M. & Yousefzadeh, F. & Arellano-Valle, R.B., 2016. "On the multivariate skew-normal-Cauchy distribution," Statistics & Probability Letters, Elsevier, vol. 117(C), pages 80-88.
    10. Sreenivasa Rao Jammalamadaka & Emanuele Taufer & Gyorgy H. Terdik, 2021. "On Multivariate Skewness and Kurtosis," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(2), pages 607-644, August.
    11. Lee, Sharon X. & McLachlan, Geoffrey J., 2022. "An overview of skew distributions in model-based clustering," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    12. Zinoviy Landsman & Udi Makov & Tomer Shushi, 2017. "Extended Generalized Skew-Elliptical Distributions and their Moments," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 79(1), pages 76-100, February.
    13. Abdi, Me’raj & Madadi, Mohsen & Balakrishnan, Narayanaswamy & Jamalizadeh, Ahad, 2021. "Family of mean-mixtures of multivariate normal distributions: Properties, inference and assessment of multivariate skewness," Journal of Multivariate Analysis, Elsevier, vol. 181(C).
    14. Abe, Toshihiro & Fujisawa, Hironori & Kawashima, Takayuki & Ley, Christophe, 2021. "EM algorithm using overparameterization for the multivariate skew-normal distribution," Econometrics and Statistics, Elsevier, vol. 19(C), pages 151-168.
    15. Hossein Negarestani & Ahad Jamalizadeh & Sobhan Shafiei & Narayanaswamy Balakrishnan, 2019. "Mean mixtures of normal distributions: properties, inference and application," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 82(4), pages 501-528, May.
    16. Giorgi, Emanuele & McNeil, Alexander J., 2016. "On the computation of multivariate scenario sets for the skew-t and generalized hyperbolic families," Computational Statistics & Data Analysis, Elsevier, vol. 100(C), pages 205-220.
    17. Thomas J. DiCiccio & Anna Clara Monti, 2018. "Testing for sub-models of the skew t-distribution," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(1), pages 25-44, March.
    18. Chowdhury, Joydeep & Dutta, Subhajit & Arellano-Valle, Reinaldo B. & Genton, Marc G., 2022. "Sub-dimensional Mardia measures of multivariate skewness and kurtosis," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    19. Baishuai Zuo & Narayanaswamy Balakrishnan & Chuancun Yin, 2023. "An analysis of multivariate measures of skewness and kurtosis of skew-elliptical distributions," Papers 2311.18176, arXiv.org.
    20. Tsung-I Lin & Pal Wu & Geoffrey McLachlan & Sharon Lee, 2015. "A robust factor analysis model using the restricted skew-t distribution," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(3), pages 510-531, September.
