IDEAS home Printed from https://ideas.repec.org/a/spr/advdac/v17y2023i1d10.1007_s11634-022-00498-3.html

Kurtosis removal for data pre-processing

Author

  • Nicola Loperfido (Università degli Studi di Urbino “Carlo Bo”)

Abstract

Mesokurtic projections are linear projections with null fourth cumulants. They might be useful data pre-processing tools when nonnormality, as measured by the fourth cumulants, is either an opportunity or a challenge. Nonnull fourth cumulants are opportunities when projections with extreme kurtosis are used to identify interesting nonnormal features, such as clusters and outliers. Unfortunately, this approach suffers from the curse of dimensionality, which may be addressed by projecting the data onto the subspace orthogonal to mesokurtic projections. Nonnull fourth cumulants are challenges when using statistical methods whose sampling properties heavily depend on the fourth cumulants themselves. Mesokurtic projections ease the problem by allowing the use of the inferential properties of the same methods under normality. The paper shows necessary and sufficient conditions for the existence of mesokurtic projections and compares them with other gaussianization methods. Theoretical and empirical results suggest that mesokurtic transformations are particularly useful when sampling from finite normal mixtures. The practical use of mesokurtic projections is illustrated with the AIS and the RANDU datasets.
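The abstract defines a mesokurtic projection as a linear projection whose fourth cumulant vanishes. As an illustration only, and not the procedure used in the paper, the sketch below simulates a hypothetical bivariate sample with independent coordinates: the first is a balanced mixture of N(-2, 1) and N(2, 1), whose fourth cumulant is -32, and the second is a standard Laplace variable, whose fourth cumulant is 12. Because fourth cumulants of independent variables add and scale with the fourth power of the coefficient, the projection onto (cos θ, sin θ) has population fourth cumulant -32 cos⁴θ + 12 sin⁴θ, which vanishes at tan⁴θ = 8/3. The simulated data, the helper names (`fourth_cumulant`, `project`, `mesokurtic_angle`, `simulate`) and the bisection search over angles are all assumptions introduced here.

```python
import math
import random


def fourth_cumulant(values):
    """Sample fourth cumulant m4 - 3 * m2**2, using central moments with divisor n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 - 3.0 * m2 ** 2


def project(data, theta):
    """Project bivariate observations onto the unit direction (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * x1 + s * x2 for x1, x2 in data]


def mesokurtic_angle(data, lo=0.0, hi=math.pi / 2, tol=1e-6):
    """Bisection for a direction whose projection has a null sample fourth cumulant.

    Assumes the fourth cumulant has opposite signs at the directions lo and hi.
    """
    f_lo = fourth_cumulant(project(data, lo))
    f_hi = fourth_cumulant(project(data, hi))
    if f_lo * f_hi > 0:
        raise ValueError("fourth cumulant does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = fourth_cumulant(project(data, mid))
        if f_lo * f_mid <= 0:
            hi = mid  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return 0.5 * (lo + hi)


def simulate(n, seed=0):
    """Hypothetical sample with independent coordinates of opposite-signed kurtosis."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # X1: balanced mixture of N(-2, 1) and N(2, 1); population fourth cumulant -2 * 2**4 = -32
        x1 = rng.gauss(rng.choice((-2.0, 2.0)), 1.0)
        # X2: standard Laplace as a difference of two Exp(1); population fourth cumulant 12
        x2 = rng.expovariate(1.0) - rng.expovariate(1.0)
        data.append((x1, x2))
    return data


data = simulate(20000)
theta = mesokurtic_angle(data)
# Population root of -32 cos^4(theta) + 12 sin^4(theta) = 0, i.e. tan^4(theta) = 8/3
theta_pop = math.atan((32.0 / 12.0) ** 0.25)
print(f"sample angle {theta:.3f} rad, population angle {theta_pop:.3f} rad")
```

Bisection works here only because the two coordinates have fourth cumulants of opposite sign, so the fourth cumulant of the projection must cross zero between them. If every projection had a fourth cumulant of the same sign, no mesokurtic direction would exist; the paper's necessary and sufficient existence conditions are not reproduced in this sketch.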

Suggested Citation

  • Nicola Loperfido, 2023. "Kurtosis removal for data pre-processing," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 239-267, March.
  • Handle: RePEc:spr:advdac:v:17:y:2023:i:1:d:10.1007_s11634-022-00498-3
    DOI: 10.1007/s11634-022-00498-3

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-022-00498-3
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Peña, Daniel & Prieto, Francisco J., 2000. "The kurtosis coefficient and the linear discriminant function," Statistics & Probability Letters, Elsevier, vol. 49(3), pages 257-261, September.
    2. Alashwali, Fatimah & Kent, John T., 2016. "The use of a common location measure in the invariant coordinate selection and projection pursuit," Journal of Multivariate Analysis, Elsevier, vol. 152(C), pages 145-161.
    3. Arevalillo, Jorge M. & Navarro, Hilario, 2012. "A study of the effect of kurtosis on discriminant analysis under elliptical populations," Journal of Multivariate Analysis, Elsevier, vol. 107(C), pages 53-63.
    4. Eric Jondeau & Michael Rockinger, 2006. "Optimal Portfolio Allocation under Higher Moments," European Financial Management, European Financial Management Association, vol. 12(1), pages 29-55, January.
    5. Genton, Marc G. & He, Li & Liu, Xiangwei, 2001. "Moments of skew-normal random vectors and their quadratic forms," Statistics & Probability Letters, Elsevier, vol. 51(4), pages 319-325, February.
    6. Nicola Loperfido, 2019. "Finite mixtures, projection pursuit and tensor rank: a triangulation," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 145-173, March.
    7. Peña, Daniel & Prieto, Francisco J. & Viladomat, Júlia, 2010. "Eigenvectors of a kurtosis matrix as interesting directions to reveal cluster structure," Journal of Multivariate Analysis, Elsevier, vol. 101(9), pages 1995-2007, October.
    8. Yanagihara, Hirokazu & Tonda, Tetsuji & Matsumoto, Chieko, 2005. "The effects of nonnormality on asymptotic distributions of some likelihood ratio criteria for testing covariance structures under normal assumption," Journal of Multivariate Analysis, Elsevier, vol. 96(2), pages 237-264, October.
    9. Tzy-Chy Lin & Tsung-I Lin, 2010. "Supervised learning of multivariate skew normal mixture models with missing information," Computational Statistics, Springer, vol. 25(2), pages 183-201, June.
    10. Ursula Laa & Dianne Cook, 2020. "Using tours to visually investigate properties of new projection pursuit indexes with application to problems in physics," Computational Statistics, Springer, vol. 35(3), pages 1171-1205, September.
    11. Tsai, Arthur C. & Liou, Michelle & Simak, Maria & Cheng, Philip E., 2017. "On hyperbolic transformations to normality," Computational Statistics & Data Analysis, Elsevier, vol. 115(C), pages 250-266.
    12. Loperfido, Nicola, 2014. "Linear transformations to symmetry," Journal of Multivariate Analysis, Elsevier, vol. 129(C), pages 186-192.
    13. Yanagihara, Hirokazu, 2007. "A family of estimators for multivariate kurtosis in a nonnormal linear regression model," Journal of Multivariate Analysis, Elsevier, vol. 98(1), pages 1-29, January.
    14. Galeano, Pedro & Peña, Daniel & Tsay, Ruey S., 2006. "Outlier Detection in Multivariate Time Series by Projection Pursuit," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 654-669, June.
    15. David Blough, 1989. "Multivariate symmetry via projection pursuit," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 41(3), pages 461-475, September.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Nicola Loperfido & Tomer Shushi, 2023. "Optimal Portfolio Projections for Skew-Elliptically Distributed Portfolio Returns," Journal of Optimization Theory and Applications, Springer, vol. 199(1), pages 143-166, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Javed, Farrukh & Loperfido, Nicola & Mazur, Stepan, 2020. "Edgeworth Expansions for Multivariate Random Sums," Working Papers 2020:9, Örebro University, School of Business.
    2. Loperfido, Nicola, 2014. "A note on the fourth cumulant of a finite mixture distribution," Journal of Multivariate Analysis, Elsevier, vol. 123(C), pages 386-394.
    3. Loperfido, Nicola, 2020. "Some remarks on Koziol’s kurtosis," Journal of Multivariate Analysis, Elsevier, vol. 175(C).
    4. Loperfido, Nicola, 2021. "Some theoretical properties of two kurtosis matrices, with application to invariant coordinate selection," Journal of Multivariate Analysis, Elsevier, vol. 186(C).
    5. Loperfido, Nicola, 2018. "Skewness-based projection pursuit: A computational approach," Computational Statistics & Data Analysis, Elsevier, vol. 120(C), pages 42-57.
    6. Nicola Loperfido, 2019. "Finite mixtures, projection pursuit and tensor rank: a triangulation," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 145-173, March.
    7. Nordhausen, Klaus & Ruiz-Gazen, Anne, 2021. "On the usage of joint diagonalization in multivariate statistics," TSE Working Papers 21-1268, Toulouse School of Economics (TSE).
    8. Klaus Nordhausen & Anne Ruiz-Gazen, 2022. "On the usage of joint diagonalization in multivariate statistics," Post-Print hal-04296111, HAL.
    9. Nicola Loperfido & Tomer Shushi, 2023. "Optimal Portfolio Projections for Skew-Elliptically Distributed Portfolio Returns," Journal of Optimization Theory and Applications, Springer, vol. 199(1), pages 143-166, October.
    10. Loperfido, Nicola, 2015. "Vector-valued skewness for model-based clustering," Statistics & Probability Letters, Elsevier, vol. 99(C), pages 230-237.
    11. Jorge M. Arevalillo & Hilario Navarro, 2020. "Data projections by skewness maximization under scale mixtures of skew-normal vectors," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 435-461, June.
    12. Nordhausen, Klaus & Ruiz-Gazen, Anne, 2022. "On the usage of joint diagonalization in multivariate statistics," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    13. Loperfido, Nicola, 2013. "Skewness and the linear discriminant function," Statistics & Probability Letters, Elsevier, vol. 83(1), pages 93-99.
    14. François Desmoulins-Lebeault & Jean-François Gajewski & Luc Meunier, 2018. "Personality and Risk Aversion," Economics Bulletin, AccessEcon, vol. 38(1), pages 472-489.
    15. Bajgrowicz, Pierre & Scaillet, Olivier, 2012. "Technical trading revisited: False discoveries, persistence tests, and transaction costs," Journal of Financial Economics, Elsevier, vol. 106(3), pages 473-491.
    16. Sreenivasa Rao Jammalamadaka & Emanuele Taufer & György H. Terdik, 2021. "Asymptotic theory for statistics based on cumulant vectors with applications," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 48(2), pages 708-728, June.
    17. Vexler, Albert & Zou, Li, 2022. "Linear projections of joint symmetry and independence applied to exact testing treatment effects based on multidimensional outcomes," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    18. Kei Nakagawa & Yusuke Uchiyama, 2020. "GO-GJRSK Model with Application to Higher Order Risk-Based Portfolio," Mathematics, MDPI, vol. 8(11), pages 1-12, November.
    19. Lin, Edward M.H. & Sun, Edward W. & Yu, Min-Teh, 2020. "Behavioral data-driven analysis with Bayesian method for risk management of financial services," International Journal of Production Economics, Elsevier, vol. 228(C).
    20. Grané, Aurea & Veiga, Helena, 2010. "Outliers in Garch models and the estimation of risk measures," DES - Working Papers. Statistics and Econometrics. WS ws100502, Universidad Carlos III de Madrid. Departamento de Estadística.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.