Printed from https://ideas.repec.org/a/spr/jagbes/v28y2023i2d10.1007_s13253-023-00527-4.html

An Approach for Specifying Trimming and Winsorization Cutoffs

Author

Listed:
  • Kedai Cheng

    (University of North Carolina - Asheville)

  • Derek S. Young

    (University of Kentucky)

Abstract

Outliers and extreme values are common in the era of big data, especially in the collection of survey data and real analysis. Clearly, care needs to be taken with how such values are treated in the calculation of statistical summaries, such as those involving the sample mean and sample variance. Robust alternatives based on trimming or Winsorization are often employed to mitigate the effect of those outlying points. A critical aspect of these methods, however, is the determination of the cutoff locations. One classic approach is g-and-g-times trimming/Winsorization, which removes a proportion g from each tail. However, this method does not carry any confidence statement, such as one finds with the calculation of statistical intervals. We propose the application of nonparametric statistical tolerance intervals, which capture a specified proportion of the sampled population at a given confidence level, to determine cutoff locations for trimming and Winsorization. Extensive simulation studies show that this approach yields better coverage than the g-and-g-times method, even though the latter was not designed as a confidence procedure. Census of Agriculture data since 1982 are analyzed to highlight the impact on statistical summaries regarding farm land. Supplementary materials accompanying this paper appear online.
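The contrast drawn in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the tolerance cutoffs use the classical order-statistic (Wilks-type) construction with the two order statistics placed symmetrically, and points equal to a cutoff are kept when trimming. The paper's exact construction may differ.

```python
import math
import random


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation (fine for moderate n)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def g_cutoffs(sorted_x, g):
    """Cutoffs left after dropping floor(g * n) points from each tail."""
    n = len(sorted_x)
    k = math.floor(g * n)
    return sorted_x[k], sorted_x[n - k - 1]


def tolerance_cutoffs(sorted_x, content, conf):
    """Order-statistic tolerance interval (X_(r), X_(s)), assumed symmetric.

    The coverage of (X_(r), X_(s)) follows Beta(m, n - m + 1) with m = s - r,
    so P(coverage >= content) = P(Binomial(n, content) <= m - 1).
    The smallest m reaching the confidence level is used.
    """
    n = len(sorted_x)
    for m in range(1, n):
        if binom_cdf(m - 1, n, content) >= conf:
            break
    else:
        raise ValueError("sample too small for this content/confidence pair")
    r = (n + 1 - m) // 2  # 1-based rank of the lower cutoff (assumed placement)
    s = r + m             # 1-based rank of the upper cutoff
    return sorted_x[r - 1], sorted_x[s - 1]


def trimmed_mean(sorted_x, lo, hi):
    """Mean of the points inside [lo, hi]; points outside are discarded."""
    kept = [v for v in sorted_x if lo <= v <= hi]
    return sum(kept) / len(kept)


def winsorized_mean(sorted_x, lo, hi):
    """Mean after pulling points outside [lo, hi] back to the nearest cutoff."""
    clipped = [min(max(v, lo), hi) for v in sorted_x]
    return sum(clipped) / len(clipped)


if __name__ == "__main__":
    random.seed(1)
    # Simulated right-skewed data, purely for illustration.
    x = sorted(random.lognormvariate(0.0, 1.0) for _ in range(200))
    for label, (lo, hi) in [
        ("g = 0.05 per tail", g_cutoffs(x, 0.05)),
        ("90% content, 95% confidence", tolerance_cutoffs(x, 0.90, 0.95)),
    ]:
        print(label, lo, hi, trimmed_mean(x, lo, hi), winsorized_mean(x, lo, hi))
```

Because the tolerance cutoffs must hold at the stated confidence level, they generally sit further out in the tails than cutoffs that remove the same nominal proportion, so fewer points are trimmed or Winsorized.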

Suggested Citation

  • Kedai Cheng & Derek S. Young, 2023. "An Approach for Specifying Trimming and Winsorization Cutoffs," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 28(2), pages 299-323, June.
  • Handle: RePEc:spr:jagbes:v:28:y:2023:i:2:d:10.1007_s13253-023-00527-4
    DOI: 10.1007/s13253-023-00527-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s13253-023-00527-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ilaria Lucrezia Amerise, 2023. "A direct method for constructing distribution-free tolerance regions," Quality & Quantity: International Journal of Methodology, Springer, vol. 57(5), pages 3941-3954, October.
    2. Kyung Serk Cho & Hon Keung Tony Ng, 2021. "Tolerance intervals in statistical software and robustness under model misspecification," Journal of Statistical Distributions and Applications, Springer, vol. 8(1), pages 1-49, December.
    3. Frey, Jesse, 2014. "Shorter nonparametric prediction intervals for an order statistic from a future sample," Statistics & Probability Letters, Elsevier, vol. 91(C), pages 69-75.
    4. Coleman, Jane A. & Shaik, Saleem, 2009. "Time-Varying Estimation of Crop Insurance Program in Altering North Dakota Farm Economic Structure," 2009 Annual Meeting, July 26-28, 2009, Milwaukee, Wisconsin 49516, Agricultural and Applied Economics Association.
    5. Christiane Baumeister & Lutz Kilian, 2014. "Do oil price increases cause higher food prices? [Biofuels, binding constraints, and agricultural commodity price volatility]," Economic Policy, CEPR, CESifo, Sciences Po;CES;MSH, vol. 29(80), pages 691-747.
    6. Scott A. Carson, 2017. "Assessing Cumulative Net Nutrition and the Transition from 19th Century Bound to Free-Labor by Ethnic Status," CESifo Working Paper Series 6813, CESifo.
    7. Roberts, Michael J. & Tran, A. Nam, 2013. "Conditional Suspension of the US Ethanol Mandate using Threshold Price inside a Competitive Storage Model," 2013 Annual Meeting, August 4-6, 2013, Washington, D.C. 150717, Agricultural and Applied Economics Association.
    8. Jeremy G. Weber & Conor Wall & Jason Brown & Tom Hertz, 2015. "Crop Prices, Agricultural Revenues, and the Rural Economy," Applied Economic Perspectives and Policy, Agricultural and Applied Economics Association, vol. 37(3), pages 459-476.
    9. Dalheimer, Bernhard & Herwartz, Helmut & Lange, Alexander, 2021. "The threat of oil market turmoils to food price stability in Sub-Saharan Africa," Energy Economics, Elsevier, vol. 93(C).
    10. Glauber, Joseph W. & Effland, Anne, 2016. "United States agricultural policy: Its evolution and impact:," IFPRI discussion papers 1543, International Food Policy Research Institute (IFPRI).
    11. Michele Caivano & Andrew Harvey, 2014. "Time-series models with an EGB2 conditional distribution," Journal of Time Series Analysis, Wiley Blackwell, vol. 35(6), pages 558-571, November.
    12. Breitung, Jörg & Schmeling, Maik, 2013. "Quantifying survey expectations: What’s wrong with the probability approach?," International Journal of Forecasting, Elsevier, vol. 29(1), pages 142-154.
    13. Elanor Starmer & Aimee Witteman & Timothy A. Wise, "undated". "Feeding the Factory Farm: Implicit Subsidies to the Broiler Chicken Industry," GDAE Working Papers 06-03, GDAE, Tufts University.
    14. Jesse Frey & Yimin Zhang, 2017. "What Do Interpolated Nonparametric Confidence Intervals for Population Quantiles Guarantee?," The American Statistician, Taylor & Francis Journals, vol. 71(4), pages 305-309, October.
    15. Rachael D. Garrett & Meredith Niles & Juliana Gil & Philip Dy & Julio Reis & Judson Valentim, 2017. "Policies for Reintegrating Crop and Livestock Systems: A Comparative Analysis," Sustainability, MDPI, vol. 9(3), pages 1-22, March.
    16. Ujjayant Chakravorty & Marie‐Hélène Hubert & Beyza Ural Marchand, 2019. "Food for fuel: The effect of the US biofuel mandate on poverty in India," Quantitative Economics, Econometric Society, vol. 10(3), pages 1153-1193, July.
    17. Helena Kahiluoto & Janne Kaseva, 2016. "No Evidence of Trade-Off between Farm Efficiency and Resilience: Dependence of Resource-Use Efficiency on Land-Use Diversity," PLOS ONE, Public Library of Science, vol. 11(9), pages 1-16, September.
    18. Colin A. Carter & Gordon C. Rausser & Aaron Smith, 2017. "Commodity Storage and the Market Effects of Biofuel Policies," American Journal of Agricultural Economics, Agricultural and Applied Economics Association, vol. 99(4), pages 1027-1055.
    19. Kaufmann, Daniel & Scheufele, Rolf, 2017. "Business tendency surveys and macroeconomic fluctuations," International Journal of Forecasting, Elsevier, vol. 33(4), pages 878-893.
    20. Barnett, Barry J., 2014. "The Last Farm Bill?," Journal of Agricultural and Applied Economics, Southern Agricultural Economics Association, vol. 46(3), pages 1-9, August.
