Printed from https://ideas.repec.org/p/ehl/lserod/103537.html

The Box-Cox transformation: review and extensions

Author

Listed:
  • Atkinson, Anthony C.
  • Riani, Marco
  • Corbellini, Aldo

Abstract

The Box-Cox power transformation family for non-negative responses in linear models has a long and interesting history in both statistical practice and theory, which we summarize. The relationship between generalized linear models and log transformed data is illustrated. Extensions investigated include the transform both sides model and the Yeo-Johnson transformation for observations that can be positive or negative. The paper also describes an extended Yeo-Johnson transformation that allows positive and negative responses to have different power transformations. Analyses of data show this to be necessary. Robustness enters in the fan plot for which the forward search provides an ordering of the data. Plausible transformations are checked with an extended fan plot. These procedures are used to compare parametric power transformations with nonparametric transformations produced by smoothing.
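For readers unfamiliar with the transformations named in the abstract, the following is a minimal sketch of the standard (unnormalized) Box-Cox and Yeo-Johnson formulas. The function `extended_yeo_johnson` with separate parameters `lam_pos` and `lam_neg` is an illustrative assumption based only on the abstract's description of different powers for positive and negative responses; it is not taken from the paper, which may define or normalize the transformation differently. All function and parameter names are hypothetical. The sketch also assumes strictly positive input for Box-Cox, because the logarithm at lambda = 0 is undefined at zero.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox power transformation of a strictly positive value y."""
    if y <= 0:
        # Assumption of this sketch: reject zero because log(0) is undefined.
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def extended_yeo_johnson(y, lam_pos, lam_neg):
    """Yeo-Johnson form with separate powers for y >= 0 and y < 0.

    Hypothetical illustration: lam_pos applies to non-negative values,
    lam_neg to negative values.
    """
    if y >= 0:
        if lam_pos == 0:
            return math.log1p(y)
        return ((y + 1.0) ** lam_pos - 1.0) / lam_pos
    if lam_neg == 2:
        return -math.log1p(-y)
    return -((1.0 - y) ** (2.0 - lam_neg) - 1.0) / (2.0 - lam_neg)


def yeo_johnson(y, lam):
    """Standard Yeo-Johnson transformation: one power for both signs."""
    return extended_yeo_johnson(y, lam, lam)


if __name__ == "__main__":
    print(box_cox(4.0, 0.5))
    print(yeo_johnson(-3.0, 1.0))
    print(extended_yeo_johnson(-3.0, 1.0, 0.0))
```

With lambda = 1 the Yeo-Johnson transformation leaves every value unchanged, which gives a quick sanity check on any implementation.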

Suggested Citation

  • Atkinson, Anthony C. & Riani, Marco & Corbellini, Aldo, 2021. "The box-cox transformation: review and extensions," LSE Research Online Documents on Economics 103537, London School of Economics and Political Science, LSE Library.
  • Handle: RePEc:ehl:lserod:103537

    Download full text from publisher

    File URL: http://eprints.lse.ac.uk/103537/
    File Function: Open access version.
    Download Restriction: no

    References listed on IDEAS

    1. J. A. John & N. R. Draper, 1980. "An Alternative Family of Transformations," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 29(2), pages 190-197, June.
    2. Atkinson, A. C. & Koopman, S. J. & Shephard, N., 1997. "Detecting shocks: Outliers and breaks in time series," Journal of Econometrics, Elsevier, vol. 80(2), pages 387-422, October.
    3. Anthony C. Atkinson & Marco Riani & Aldo Corbellini, 2020. "The analysis of transformations for profit‐and‐loss data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(2), pages 251-275, April.
    4. Tianxi Cai & Lu Tian & L. J. Wei, 2005. "Semiparametric Box–Cox power transformation models for censored survival observations," Biometrika, Biometrika Trust, vol. 92(3), pages 619-632, September.
    5. Marco Riani, 2009. "Robust Transformations in Univariate and Multivariate Time Series," Econometric Reviews, Taylor & Francis Journals, vol. 28(1-3), pages 262-278.
    6. Yang, Zhenlin, 2006. "A modified family of power transformations," Economics Letters, Elsevier, vol. 92(1), pages 14-19, July.
    7. Foster A. M. & Tian L. & Wei L. J., 2001. "Estimation for the Box-Cox Transformation Model Without Assuming Parametric Error Distribution," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1097-1101, September.
    8. Duolao Wang & Michael Murphy, 2005. "Identifying Nonlinear Relationships in Regression using the ACE Algorithm," Journal of Applied Statistics, Taylor & Francis Journals, vol. 32(3), pages 243-258.
    9. Breusch, T S & Pagan, A R, 1979. "A Simple Test for Heteroscedasticity and Random Coefficient Variation," Econometrica, Econometric Society, vol. 47(5), pages 1287-1294, September.
    10. Tommaso Proietti & Marco Riani, 2009. "Transformations and seasonal adjustment," Journal of Time Series Analysis, Wiley Blackwell, vol. 30(1), pages 47-69, January.
    11. Patrick Royston & Douglas G. Altman, 1994. "Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modelling," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 43(3), pages 429-453, September.
    12. Daryl Pregibon, 1980. "Goodness of Link Tests for Generalized Linear Models," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 29(1), pages 15-24, March.
    13. Domenico Perrotta & Marco Riani & Francesca Torti, 2009. "New robust dynamic plots for regression mixture detection," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 3(3), pages 263-279, December.
    14. Marazzi, Alfio & Villar, Ana J. & Yohai, Victor J., 2009. "Robust Response Transformations Based on Optimal Prediction," Journal of the American Statistical Association, American Statistical Association, vol. 104(485), pages 360-370.
    15. Marco Riani & Anthony C. Atkinson & Andrea Cerioli, 2009. "Finding an unknown number of multivariate outliers," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(2), pages 447-466, April.

    Citations



    Cited by:

    1. Riani, Marco & Atkinson, Anthony Curtis & Corbellini, Aldo & Farcomeni, Alessio & Laurini, Fabrizio, 2024. "Information Criteria for Outlier Detection Avoiding Arbitrary Significance Levels," Econometrics and Statistics, Elsevier, vol. 29(C), pages 189-205.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marco Riani & Anthony C. Atkinson & Aldo Corbellini, 2023. "Automatic robust Box–Cox and extended Yeo–Johnson transformations in regression," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(1), pages 75-102, March.
    2. Torti, Francesca & Corbellini, Aldo & Atkinson, Anthony C., 2021. "fsdaSAS: a package for robust regression for very large datasets including the batch forward search," LSE Research Online Documents on Economics 109895, London School of Economics and Political Science, LSE Library.
    3. Francesca Torti & Aldo Corbellini & Anthony C. Atkinson, 2021. "fsdaSAS: A Package for Robust Regression for Very Large Datasets Including the Batch Forward Search," Stats, MDPI, vol. 4(2), pages 1-21, April.
    4. Jones, A.M, 2010. "Models For Health Care," Health, Econometrics and Data Group (HEDG) Working Papers 10/01, HEDG, c/o Department of Economics, University of York.
    5. Anthony C. Atkinson & Marco Riani & Aldo Corbellini, 2020. "The analysis of transformations for profit‐and‐loss data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 69(2), pages 251-275, April.
    6. Pacula Rosalie Liccardo & Kilmer Beau & Grossman Michael & Chaloupka Frank J, 2010. "Risks and Prices: The Role of User Sanctions in Marijuana Markets," The B.E. Journal of Economic Analysis & Policy, De Gruyter, vol. 10(1), pages 1-38, February.
    7. Adrian Lüthge & Ulrich Pidun & Dodo zu Knyphausen-Aufseß, 2021. "Approximating relatedness from a business model perspective: towards a taxonomic approach," Review of Managerial Science, Springer, vol. 15(3), pages 813-846, April.
    8. Terence C. Cheng & Alfons Palangkaraya & Jongsay Yong, 2014. "Hospital utilization in mixed public–private system: evidence from Australian hospital data," Applied Economics, Taylor & Francis Journals, vol. 46(8), pages 859-870, March.
    9. Huapeng Li & Yukun Liu & Riquan Zhang, 2019. "Small area estimation under transformed nested-error regression models," Statistical Papers, Springer, vol. 60(4), pages 1397-1418, August.
    10. Elsa Marques & Sian Noble & Ashley W. Blom & William Hollingworth, 2014. "Disclosing Total Waiting Times For Joint Replacement: Evidence From The English NHS Using Linked HES Data," Health Economics, John Wiley & Sons, Ltd., vol. 23(7), pages 806-820, July.
    11. Natalia Rojas‐Perilla & Sören Pannier & Timo Schmid & Nikos Tzavidis, 2020. "Data‐driven transformations in small area estimation," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(1), pages 121-148, January.
    12. Riani, Marco & Atkinson, Anthony Curtis & Corbellini, Aldo & Farcomeni, Alessio & Laurini, Fabrizio, 2024. "Information Criteria for Outlier Detection Avoiding Arbitrary Significance Levels," Econometrics and Statistics, Elsevier, vol. 29(C), pages 189-205.
    13. Rojas-Perilla, Natalia & Pannier, Sören & Schmid, Timo & Tzavidis, Nikos, 2017. "Data-driven transformations in small area estimation," Discussion Papers 2017/30, Free University Berlin, School of Business & Economics.
    14. Hübler, Olaf, 2017. "Health and weight – gender-specific linkages under heterogeneity, interdependence and resilience factors," Economics & Human Biology, Elsevier, vol. 26(C), pages 96-111.
    15. Y. K. Tse & Z. L. Yang, 2004. "Tests of Functional Form and Heteroscedasticity," Econometric Society 2004 Far Eastern Meetings 424, Econometric Society.
    16. Noémi Kreif & Richard Grieve & Iván Díaz & David Harrison, 2015. "Evaluation of the Effect of a Continuous Treatment: A Machine Learning Approach with an Application to Treatment for Traumatic Brain Injury," Health Economics, John Wiley & Sons, Ltd., vol. 24(9), pages 1213-1228, September.
    17. Anika Reichert & Rowena Jacobs, 2018. "The impact of waiting time on patient outcomes: Evidence from early intervention in psychosis services in England," Health Economics, John Wiley & Sons, Ltd., vol. 27(11), pages 1772-1787, November.
    18. Mehzabin Tuli, Farzana & Mitra, Suman & Crews, Mariah B., 2021. "Factors influencing the usage of shared E-scooters in Chicago," Transportation Research Part A: Policy and Practice, Elsevier, vol. 154(C), pages 164-185.
    19. Hideki Murakami & Yukari Matsuse & Koji Mukaigawa & Yushi Tsunoda, 2013. "Product lifecycle and choice of transportation modes: Japan's evidence of import and export," Discussion Papers 2013-28, Kobe University, Graduate School of Business Administration.
    20. Proto, Eugenio & Rustichini, Aldo, 2012. "Life Satisfaction, Household Income and Personality Traits," CAGE Online Working Paper Series 86, Competitive Advantage in the Global Economy (CAGE).

    More about this item

    Keywords

    ACE; AVAS; constructed variable; extended Yeo-Johnson transformation; forward search; linked plots; robust methods; Department of Statistics;

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General

