On hyperbolic transformations to normality

Author

Listed:
  • Tsai, Arthur C.
  • Liou, Michelle
  • Simak, Maria
  • Cheng, Philip E.

Abstract

In biological and social sciences, it is essential to consider data transformations to normality for detecting structural effects and for better data representation and interpretation. An array of transformations to normality has been derived for data exhibiting skewed, leptokurtic and unimodal shapes, but these are less suited to data exhibiting platykurtic shapes, such as a nearly bimodal distribution. This study proposes and constructs a new family of hyperbolic power transformations for improving normality of raw data with varying degrees of skewness and kurtosis. An advantage of this new family is its effectiveness in transforming platykurtic or bimodal data distributions to normal. A simulation study and a real data example on mathematics achievement test scores are used to illustrate the wide-ranging applications of the proposed family of transformations. As a cautionary note, the usefulness and limitations of the proposed method are discussed for stabilizing the variance of DNA microarray data and for symmetrizing the data distribution towards normality. The empirical applications also provide an example of conservative t- and ANOVA F-tests when the assumption of normality is violated.

Suggested Citation

  • Tsai, Arthur C. & Liou, Michelle & Simak, Maria & Cheng, Philip E., 2017. "On hyperbolic transformations to normality," Computational Statistics & Data Analysis, Elsevier, vol. 115(C), pages 250-266.
  • Handle: RePEc:eee:csdana:v:115:y:2017:i:c:p:250-266
    DOI: 10.1016/j.csda.2017.06.001

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947317301408
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. J. A. John & N. R. Draper, 1980. "An Alternative Family of Transformations," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 29(2), pages 190-197, June.
    2. Parrish, Rudolph S. & Spencer III, Horace J. & Xu, Ping, 2009. "Distribution modeling and simulation of gene expression data," Computational Statistics & Data Analysis, Elsevier, vol. 53(5), pages 1650-1660, March.
    3. Dieter Rasch & Klaus Kubinger & Karl Moder, 2011. "The two-sample t test: pre-testing its assumptions does not pay off," Statistical Papers, Springer, vol. 52(1), pages 219-231, February.
    4. Gel, Yulia R. & Gastwirth, Joseph L., 2008. "A robust modification of the Jarque-Bera test of normality," Economics Letters, Elsevier, vol. 99(1), pages 30-32, April.
    5. Filidor Vilca & Mariana Rodrigues-Motta & Víctor Leiva, 2013. "On a variance stabilizing model and its application to genomic data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 40(11), pages 2354-2371, November.
    6. M. C. Jones & Arthur Pewsey, 2009. "Sinh-arcsinh distributions," Biometrika, Biometrika Trust, vol. 96(4), pages 761-780.
    7. Ambroise, Jerome & Bearzatto, Bertrand & Robert, Annie & Govaerts, Bernadette & Macq, Benoit & Gala, Jean-Luc, 2011. "Impact of the spotted microarray preprocessing method on fold-change compression and variance stability," LIDAM Reprints ISBA 2011054, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    8. Purdom Elizabeth & Holmes Susan P, 2005. "Error Distribution for Gene Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 4(1), pages 1-35, July.
    9. Greenacre, Michael, 2009. "Power transformations in correspondence analysis," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3107-3116, June.

    Citations


    Cited by:

    1. Michael A. Clemens & Hannah M. Postel, 2018. "Deterring Emigration with Foreign Aid: An Overview of Evidence from Low‐Income Countries," Population and Development Review, The Population Council, Inc., vol. 44(4), pages 667-693, December.
    2. Geovanny Marulanda & Antonio Bello & Jenny Cifuentes & Javier Reneses, 2020. "Wind Power Long-Term Scenario Generation Considering Spatial-Temporal Dependencies in Coupled Electricity Markets," Energies, MDPI, vol. 13(13), pages 1-19, July.
    3. Nicola Loperfido, 2023. "Kurtosis removal for data pre-processing," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(1), pages 239-267, March.
    4. Priddle, Jacob W. & Drovandi, Christopher, 2023. "Transformations in semi-parametric Bayesian synthetic likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Snezhana Gocheva-Ilieva & Iliycho Iliev, 2016. "Using Generalized PathSeeker Regularized Regression for Modeling and Prediction of Output Power of CuBr Laser," Proceedings of International Academic Conferences 4006523, International Institute of Social and Economic Sciences.
    2. Colling, Benjamin & Van Keilegom, Ingrid, 2016. "Goodness-of-fit tests in semiparametric transformation models using the integrated regression function," LIDAM Discussion Papers ISBA 2016031, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    3. Geoffrey Ngene & Kenneth A. Tah & Ali F. Darrat, 2017. "Long memory or structural breaks: Some evidence for African stock markets," Review of Financial Economics, John Wiley & Sons, vol. 34(1), pages 61-73, September.
    4. Debarsy, Nicolas & Gnabo, Jean-Yves & Kerkour, Malik, 2017. "Sovereign wealth funds’ cross-border investments: Assessing the role of country-level drivers and spatial competition," Journal of International Money and Finance, Elsevier, vol. 76(C), pages 68-87.
    5. Punathumparambath, Bindu & Kulathinal, Sangita & George, Sebastian, 2012. "Asymmetric type II compound Laplace distribution and its application to microarray gene expression," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1396-1404.
    6. McKeague, Ian W., 2015. "Central limit theorems under special relativity," Statistics & Probability Letters, Elsevier, vol. 99(C), pages 149-155.
    7. Atkinson, Anthony C. & Riani, Marco & Corbellini, Aldo, 2021. "The box-cox transformation: review and extensions," LSE Research Online Documents on Economics 103537, London School of Economics and Political Science, LSE Library.
    8. Rojas-Perilla, Natalia & Pannier, Sören & Schmid, Timo & Tzavidis, Nikos, 2017. "Data-driven transformations in small area estimation," Discussion Papers 2017/30, Free University Berlin, School of Business & Economics.
    9. Pak, Tae-Young, 2021. "What are the effects of expanding social pension on health? Evidence from the Basic Pension in South Korea," The Journal of the Economics of Ageing, Elsevier, vol. 18(C).
    10. Christophe Ley, 2014. "Flexible Modelling in Statistics: Past, present and Future," Working Papers ECARES ECARES 2014-42, ULB -- Universite Libre de Bruxelles.
    11. Ambroise Jérôme & Bearzatto Bertrand & Robert Annie & Macq Benoit & Gala Jean-Luc, 2012. "Combining Multiple Laser Scans of Spotted Microarrays by Means of a Two-Way ANOVA Model," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-20, February.
    12. Huixia Judy Wang & Leonard A. Stefanski & Zhongyi Zhu, 2012. "Corrected-loss estimation for quantile regression with covariate measurement errors," Biometrika, Biometrika Trust, vol. 99(2), pages 405-421.
    13. Alina Bărbulescu & Cristian Ștefan Dumitriu, 2021. "On the Connection between the GEP Performances and the Time Series Properties," Mathematics, MDPI, vol. 9(16), pages 1-19, August.
    14. Gomes-Gonçalves, Erika & Gzyl, Henryk & Mayoral, Silvia, 2015. "Two maxentropic approaches to determine the probability density of compound risk losses," Insurance: Mathematics and Economics, Elsevier, vol. 62(C), pages 42-53.
    15. Tom Broekel & Thomas Brenner, 2011. "Regional factors and innovativeness: an empirical analysis of four German industries," The Annals of Regional Science, Springer;Western Regional Science Association, vol. 47(1), pages 169-194, August.
    16. Daniel Hadley & Harry Joe & Natalia Nolde, 2021. "On the Selection of Loss Severity Distributions to Model Operational Risk," Papers 2107.03979, arXiv.org.
    17. Blasius, J. & Greenacre, M. & Groenen, P.J.F. & van de Velden, M., 2009. "Special issue on correspondence analysis and related methods," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3103-3106, June.
    18. Pora, Pierre & Wilner, Lionel, 2020. "A decomposition of labor earnings growth: Recovering Gaussianity?," Labour Economics, Elsevier, vol. 63(C).
    19. Sladana Babic & Laetitia Gelbgras & Marc Hallin & Christophe Ley, 2019. "Optimal tests for elliptical symmetry: specified and unspecified location," Working Papers ECARES 2019-26, ULB -- Universite Libre de Bruxelles.
    20. Mendonça, Suzielli M. & Cabella, Brenno C.T. & Martinez, Alexandre S., 2024. "A Multifractal Detrended Fluctuation Analysis approach using generalized functions," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 637(C).

