Authors listed:
- Xiaowen Yang
(Biostatistics and Data Science, School of Public Health, Louisiana State University-Health Sciences Center, New Orleans, LA 70112, USA; these authors contributed equally to this work)
- Amjila Bam
(Biostatistics and Data Science, School of Public Health, Louisiana State University-Health Sciences Center, New Orleans, LA 70112, USA; these authors contributed equally to this work)
- Nubaira Rizvi
(Biostatistics and Data Science, School of Public Health, Louisiana State University-Health Sciences Center, New Orleans, LA 70112, USA)
- Xiao-Cheng Wu
(Louisiana Tumor Registry, Louisiana State University-Health Sciences Center, New Orleans, LA 70112, USA)
- Donald Mercante
(Biostatistics and Data Science, School of Public Health, Louisiana State University-Health Sciences Center, New Orleans, LA 70112, USA)
- Qingzhao Yu
(Biostatistics and Data Science, School of Public Health, Louisiana State University-Health Sciences Center, New Orleans, LA 70112, USA)
Abstract
Outlier detection is a fundamental component of data preprocessing and quality monitoring across diverse scientific domains, including engineering, biomedical sciences, and finance. While many variables in controlled environments approximate a normal distribution, real-world data, particularly biological, environmental, and epidemiological measures, are frequently characterized by pronounced right-skewness. To address the shortcomings of conventional methods, this study introduces the Dynamic Threshold for Outlier Detection (DTOD), which reframes outlier detection as a concrete operational workflow. The DTOD framework dynamically adjusts detection thresholds based on a functional relationship between skewness and tail morphology. Validation through large-scale simulation experiments across light-, middle-, and high-skewness levels confirms the method’s versatility. The DTOD proves particularly effective at two ends of the spectrum: enhancing sensitivity for detecting subtle anomalies in light-skewed data while serving as a conservative, high-confidence screening tool that controls false positives in high-skewness environments. In real-world application to North American Association of Central Cancer Registries (NAACCR) data, the method successfully identified outliers with abnormally high unknown tumor size rates in colorectal cancer and maintained a low misclassification rate in highly skewed lung cancer data. Ultimately, the DTOD provides a promising, interpretable solution for improving data quality in skewed scenarios.
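The abstract does not state the functional form DTOD uses, so the sketch below is not the authors' method. It only illustrates the general idea of a detection threshold that moves with skewness, borrowing the exponential fence scaling of the adjusted boxplot but, as a simplifying assumption, substituting Bowley's quartile skewness for the medcouple. The function names and the constants `k`, `a`, and `b` are hypothetical choices made for this illustration.

```python
import math
import statistics


def skew_adjusted_fences(values, k=1.5, a=3.0, b=4.0):
    """Return (lower, upper) outlier fences that widen on the longer tail.

    Illustrative assumption: skewness is measured with Bowley's quartile
    coefficient. This is NOT the DTOD formula from the paper.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return q1, q3  # degenerate case: the middle half has no spread
    # Bowley skewness lies in [-1, 1]; positive values mean right skew.
    bowley = (q3 + q1 - 2 * med) / iqr
    if bowley >= 0:
        lower = q1 - k * math.exp(-b * bowley) * iqr
        upper = q3 + k * math.exp(a * bowley) * iqr
    else:
        lower = q1 - k * math.exp(-a * bowley) * iqr
        upper = q3 + k * math.exp(b * bowley) * iqr
    return lower, upper


def flag_outliers(values, **kwargs):
    """Return the values that fall outside the skew-adjusted fences."""
    lower, upper = skew_adjusted_fences(values, **kwargs)
    return [v for v in values if v < lower or v > upper]
```

For symmetric data the fences reduce to the familiar 1.5 × IQR rule. For right-skewed data the upper fence moves outward, so fewer points in the long tail are flagged, which is the conservative behavior the abstract describes for high-skewness settings.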
Suggested Citation
Xiaowen Yang & Amjila Bam & Nubaira Rizvi & Xiao-Cheng Wu & Donald Mercante & Qingzhao Yu, 2026.
"An Adaptive Method to Identify Outliers in Skewed Observations: Application to Assess NAACCR Cancer Registry Data Usage,"
Stats, MDPI, vol. 9(2), pages 1-17, March.
Handle:
RePEc:gam:jstats:v:9:y:2026:i:2:p:33-:d:1900959