IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0196937.html

High throughput nonparametric probability density estimation

Author

Listed:
  • Jenny Farmer
  • Donald Jacobs

Abstract

In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite having only minimal information about data characteristics, and without relying on human subjectivity. An automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample-size-invariant universal scoring function. A probability density estimate is then determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function, which identifies atypical fluctuations. This criterion resists under- and overfitting the data and serves as an alternative to the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate that the method has general applicability for high throughput statistical inference.
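
The scoring idea in the abstract can be illustrated with a rough sketch. It is not the authors' implementation. It relies on one standard fact: if a trial CDF is correct, the transformed, sorted sample behaves like uniform order statistics, and the k-th of N follows a Beta(k, N+1-k) distribution with mean k/(N+1). The function names, the averaging by N, and the sqrt(N+2) residual scaling are assumptions made for illustration; the paper's exact definitions may differ.

```python
import math
import random


def log_beta_pdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)


def quasi_log_likelihood(sample, trial_cdf):
    """Average log density of each transformed order statistic.

    Each order statistic is scored separately against Beta(k, N+1-k),
    ignoring correlations between them (hence "quasi"). Dividing by N
    is an illustrative normalization, not necessarily the paper's.
    """
    u = sorted(trial_cdf(x) for x in sample)
    n = len(u)
    eps = 1e-12  # keep values strictly inside (0, 1) before taking logs
    total = 0.0
    for k, uk in enumerate(u, start=1):
        uk = min(max(uk, eps), 1.0 - eps)
        total += log_beta_pdf(uk, k, n + 1 - k)
    return total / n


def scaled_quantile_residuals(sample, trial_cdf):
    """Residuals sqrt(N+2) * (u_k - k/(N+1)) for the sorted transformed sample."""
    u = sorted(trial_cdf(x) for x in sample)
    n = len(u)
    return [math.sqrt(n + 2) * (uk - k / (n + 1))
            for k, uk in enumerate(u, start=1)]


# Hypothetical test case: exponential data scored against two trial CDFs.
random.seed(0)
data = [random.expovariate(1.0) for _ in range(500)]


def true_cdf(x):
    return 1.0 - math.exp(-x)        # matches the data


def wrong_cdf(x):
    return 1.0 - math.exp(-3.0 * x)  # rate is three times too large


score_true = quasi_log_likelihood(data, true_cdf)
score_wrong = quasi_log_likelihood(data, wrong_cdf)
max_sqr_true = max(abs(r) for r in scaled_quantile_residuals(data, true_cdf))
max_sqr_wrong = max(abs(r) for r in scaled_quantile_residuals(data, wrong_cdf))

print("score (true CDF): ", score_true)
print("score (wrong CDF):", score_wrong)
print("max |SQR| (true, wrong):", max_sqr_true, max_sqr_wrong)
```

In this sketch the correct trial CDF receives the higher score and its residuals stay close to zero, while the mismatched CDF produces large systematic residuals. Plotting the residuals against k/(N+1) would give a diagnostic in the spirit of the scaled quantile residual plots the abstract mentions.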

Suggested Citation

  • Jenny Farmer & Donald Jacobs, 2018. "High throughput nonparametric probability density estimation," PLOS ONE, Public Library of Science, vol. 13(5), pages 1-29, May.
  • Handle: RePEc:plo:pone00:0196937
    DOI: 10.1371/journal.pone.0196937

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196937
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0196937&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lee Tae-Hwy & Ullah Aman & Mao Millie Yi, 2021. "Maximum Entropy Analysis of Consumption-based Capital Asset Pricing Model and Volatility," Journal of Econometric Methods, De Gruyter, vol. 10(1), pages 1-19, January.
    2. Ouimet, Frédéric & Tolosana-Delgado, Raimon, 2022. "Asymptotic properties of Dirichlet kernel density estimators," Journal of Multivariate Analysis, Elsevier, vol. 187(C).
    3. Jesse B. Tack & David Ubilava, 2015. "Climate and agricultural risk: measuring the effect of ENSO on U.S. crop insurance," Agricultural Economics, International Association of Agricultural Economists, vol. 46(2), pages 245-257, March.
    4. Berry, Tyrus & Sauer, Timothy, 2017. "Density estimation on manifolds with boundary," Computational Statistics & Data Analysis, Elsevier, vol. 107(C), pages 1-17.
    5. Wu, Ximing & Perloff, Jeffrey M., 2004. "China's Income Distribution Over Time: Reasons for Rising Inequality," Institute for Research on Labor and Employment, Working Paper Series qt9jw2v939, Institute of Industrial Relations, UC Berkeley.
    6. Wu, Ximing & Perloff, Jeffrey M., 2007. "Information-Theoretic Deconvolution Approximation of Treatment Effect Distribution," Department of Agricultural & Resource Economics, UC Berkeley, Working Paper Series qt6bm6n30x, Department of Agricultural & Resource Economics, UC Berkeley.
    7. Jesse Tack & Ardian Harri & Keith Coble, 2012. "More than Mean Effects: Modeling the Effect of Climate on the Higher Order Moments of Crop Yields," American Journal of Agricultural Economics, Agricultural and Applied Economics Association, vol. 94(5), pages 1037-1054.
    8. Ximing Wu & Andreas Savvides & Thanasis Stengos, 2008. "The Global Joint Distribution of Income and Health," Working Papers 0807, University of Guelph, Department of Economics and Finance.
    9. Ximing Wu & Jeffrey M. Perloff, 2005. "China's Income Distribution, 1985-2001," The Review of Economics and Statistics, MIT Press, vol. 87(4), pages 763-775, November.
    10. Stephen Satchell & Susan Thorp & Oliver Williams, 2012. "Estimating Consumption Plans for Recursive Utility by Maximum Entropy Methods," Research Paper Series 300, Quantitative Finance Research Centre, University of Technology, Sydney.
    11. Thanasis Stengos & Ximing Wu, 2010. "Information-Theoretic Distribution Test with Application to Normality," Econometric Reviews, Taylor & Francis Journals, vol. 29(3), pages 307-329.
    12. Ba Chu & Stephen Satchell, 2016. "Recovering the Most Entropic Copulas from Preliminary Knowledge of Dependence," Econometrics, MDPI, vol. 4(2), pages 1-21, March.
    13. Weiran Lin & Qiuqin He, 2021. "The Influence of Potential Infection on the Relationship between Temperature and Confirmed Cases of COVID-19 in China," Sustainability, MDPI, vol. 13(15), pages 1-11, July.
    14. Millie Yi Mao & Aman Ullah, 2019. "Information Theoretic Estimation of Econometric Functions," Working Papers 201923, University of California at Riverside, Department of Economics.
    15. Masayuki Hirukawa & Mari Sakudo, 2015. "Family of the generalised gamma kernels: a generator of asymmetric kernels for nonnegative data," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 27(1), pages 41-63, March.
    16. Bao, Xing & Tang, Ou & Ji, Jianhua, 2008. "Applying the minimum relative entropy method for bimodal distribution in a remanufacturing system," International Journal of Production Economics, Elsevier, vol. 113(2), pages 969-979, June.
    17. Kairat Mynbaev & Carlos Martins-Filho, 2019. "Unified estimation of densities on bounded and unbounded domains," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(4), pages 853-887, August.
    18. Yichen Gao & Yu Zhang & Ximing Wu, 2015. "Penalized exponential series estimation of copula densities with an application to intergenerational dependence of body mass index," Empirical Economics, Springer, vol. 48(1), pages 61-81, February.
    19. repec:rim:rimwps:25-08 is not listed on IDEAS
    20. Wu, Ximing & Perloff, Jeffrey M., 2005. "GMM Estimation of a Maximum Distribution With Interval Data," Department of Agricultural & Resource Economics, UC Berkeley, Working Paper Series qt7jf5w1ht, Department of Agricultural & Resource Economics, UC Berkeley.
    21. Mohammadi, Faezeh & Izadi, Muhyiddin & Lai, Chin-Diew, 2016. "On testing whether burn-in is required under the long-run average cost," Statistics & Probability Letters, Elsevier, vol. 110(C), pages 217-224.
