
The computing of the Poisson multinomial distribution and applications in ecological inference and machine learning

Authors

  • Zhengzhi Lin

    (Virginia Tech)

  • Yueyao Wang

    (Virginia Tech)

  • Yili Hong

    (Virginia Tech)

Abstract

The Poisson multinomial distribution (PMD) describes the distribution of the sum of n independent but non-identically distributed random vectors. Each random vector has length m and 0/1-valued elements, exactly one of which equals 1; the probability that a given element is the one equal to 1 differs across the m elements and across the n random vectors. These probabilities form an n × m matrix whose rows each sum to 1. We call this n × m matrix the success probability matrix (SPM). Each SPM uniquely defines a PMD. The PMD is useful in many areas, such as voting theory, ecological inference, and machine learning. The distribution functions of the PMD, however, are usually difficult to compute, and no efficient algorithm has been available for computing them.

In this paper, we develop efficient methods to compute the probability mass function (pmf) of the PMD using the multivariate Fourier transform, normal approximation, and simulation. We study the accuracy and efficiency of these methods and give recommendations on which method to use under various scenarios. We also illustrate the use of the PMD through three applications: ecological inference, uncertainty quantification in classification, and voting probability calculation. We build an R package that implements the proposed methods and illustrate the package with examples. This paper has online supplementary materials.
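The abstract names a multivariate Fourier transform and simulation among the routes to the pmf. The Python/NumPy sketch below is not the authors' R package code; it is a minimal illustration under stated assumptions. It evaluates the PMD characteristic function φ(t) = ∏_i (Σ_{j<m} p_ij e^{i t_j} + p_im) on an (n + 1)^(m − 1) frequency grid (only the first m − 1 counts are tracked, since the last count equals n minus their sum) and inverts it with a multidimensional FFT. A direct-convolution baseline and a Monte Carlo estimate are included for comparison. The function names `pmd_pmf_dft`, `pmd_pmf_bruteforce`, and `pmd_pmf_sim` are hypothetical, and the normal approximation is not shown.

```python
import numpy as np


def pmd_pmf_dft(spm):
    """Exact PMD pmf via a multivariate DFT of the characteristic function.

    spm: n x m success probability matrix (each row sums to 1), m >= 2.
    Returns an array of shape (n+1,)*(m-1); entry [x_1, ..., x_{m-1}] is
    P(X_1 = x_1, ..., X_{m-1} = x_{m-1}), with X_m = n - sum(x) implied.
    """
    spm = np.asarray(spm, dtype=float)
    n, m = spm.shape
    if m < 2 or not np.allclose(spm.sum(axis=1), 1.0):
        raise ValueError("need m >= 2 and rows of the SPM summing to 1")
    L = n + 1  # each count lies in 0..n, so L frequencies per axis avoid aliasing
    freqs = 2 * np.pi * np.arange(L) / L
    grids = np.meshgrid(*([freqs] * (m - 1)), indexing="ij")
    # phi(t) = prod_i (sum_{j<m} p_ij * exp(i t_j) + p_im), with t_m fixed at 0
    phi = np.ones(grids[0].shape, dtype=complex)
    for row in spm:
        term = np.full(grids[0].shape, row[-1], dtype=complex)
        for j in range(m - 1):
            term += row[j] * np.exp(1j * grids[j])
        phi *= term
    # P(x) = L^-(m-1) * sum_l phi_l * exp(-2*pi*i * l.x / L), i.e. numpy's forward fftn
    pmf = np.fft.fftn(phi).real / L ** (m - 1)
    return np.clip(pmf, 0.0, None)  # drop tiny negative round-off


def pmd_pmf_bruteforce(spm):
    """Exact pmf by convolving one random vector at a time (slow baseline)."""
    m = len(spm[0])
    dist = {(0,) * (m - 1): 1.0}  # keys: counts of the first m-1 categories
    for row in spm:
        new = {}
        for counts, p in dist.items():
            for j in range(m):
                # category m-1 (0-based) is implied, so its key is unchanged
                if j == m - 1:
                    key = counts
                else:
                    key = counts[:j] + (counts[j] + 1,) + counts[j + 1:]
                new[key] = new.get(key, 0.0) + p * row[j]
        dist = new
    return dist


def pmd_pmf_sim(spm, size=100_000, seed=None):
    """Monte Carlo estimate of the pmf on the same (n+1,)*(m-1) grid."""
    spm = np.asarray(spm, dtype=float)
    n, m = spm.shape
    rng = np.random.default_rng(seed)
    totals = np.zeros((size, m), dtype=np.int64)
    for row in spm:
        # one categorical draw (a 0/1 vector with a single 1) per replicate
        totals += rng.multinomial(1, row, size=size)
    est = np.zeros((n + 1,) * (m - 1))
    keys, freq = np.unique(totals[:, : m - 1], axis=0, return_counts=True)
    est[tuple(keys.T)] = freq / size
    return est


if __name__ == "__main__":
    spm = [[0.1, 0.2, 0.7],
           [0.5, 0.2, 0.3],
           [0.3, 0.4, 0.3]]
    exact = pmd_pmf_dft(spm)
    print(round(float(exact[1, 1]), 6))  # P(X = (1, 1, 1)); prints 0.248
    approx = pmd_pmf_sim(spm, size=100_000, seed=42)
    print(np.abs(approx - exact).max())  # small Monte Carlo error
```

For this 3 × 3 SPM, P(X = (1, 1, 1)) is the permanent of the matrix, 0.248, which the FFT route reproduces up to round-off. The FFT grid has (n + 1)^(m − 1) points, so this route suits small m; the simulation route gives up exactness for a cost that grows linearly in n times the number of replicates.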

Suggested Citation

  • Zhengzhi Lin & Yueyao Wang & Yili Hong, 2023. "The computing of the Poisson multinomial distribution and applications in ecological inference and machine learning," Computational Statistics, Springer, vol. 38(4), pages 1851-1877, December.
  • Handle: RePEc:spr:compst:v:38:y:2023:i:4:d:10.1007_s00180-022-01299-0
    DOI: 10.1007/s00180-022-01299-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-022-01299-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Biscarri, William & Zhao, Sihai Dave & Brunner, Robert J., 2018. "A simple and fast method for computing the Poisson binomial distribution function," Computational Statistics & Data Analysis, Elsevier, vol. 122(C), pages 92-100.
    2. Hong, Yili, 2013. "On computing the distribution function for the Poisson binomial distribution," Computational Statistics & Data Analysis, Elsevier, vol. 59(C), pages 41-51.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Cho, Youngjin & Hong, Yili & Du, Pang, 2025. "An accurate computational approach for partial likelihood using Poisson-binomial distributions," Computational Statistics & Data Analysis, Elsevier, vol. 208(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xiaoyu Shen & Fang Fang & Chengguang Liu, 2024. "The Fourier Cosine Method for Discrete Probability Distributions," Papers 2410.04487, arXiv.org, revised Oct 2024.
    2. Cho, Youngjin & Hong, Yili & Du, Pang, 2025. "An accurate computational approach for partial likelihood using Poisson-binomial distributions," Computational Statistics & Data Analysis, Elsevier, vol. 208(C).
    3. Bos, Hayo & Baas, Stef & Boucherie, Richard J. & Hans, Erwin W. & Leeftink, Gréanne, 2025. "Bed census prediction combining expert opinion and patient statistics," Omega, Elsevier, vol. 133(C).
    4. Arun G. Chandrasekhar & Robert Townsend & Juan Pablo Xandri, 2018. "Financial Centrality and Liquidity Provision," NBER Working Papers 24406, National Bureau of Economic Research, Inc.
    5. Deligiannis, Michalis & Liberopoulos, George, 2023. "Dynamic ordering and buyer selection policies when service affects future demand," Omega, Elsevier, vol. 118(C).
    6. Neal, Zachary & Domagalski, Rachel & Yan, Xiaoqin, 2020. "Party Control as a Context for Homophily in Collaborations among US House Representatives, 1981 -- 2015," OSF Preprints qwdxs, Center for Open Science.
    7. Róbert Pethes & Levente Kovács, 2023. "An Exact and an Approximation Method to Compute the Degree Distribution of Inhomogeneous Random Graph Using Poisson Binomial Distribution," Mathematics, MDPI, vol. 11(6), pages 1-24, March.
    8. Van der Auweraer, Sarah & Boute, Robert, 2019. "Forecasting spare part demand using service maintenance information," International Journal of Production Economics, Elsevier, vol. 213(C), pages 138-149.
    9. Bahar Cennet Okumuşoğlu & Beste Basciftci & Burak Kocuk, 2024. "An Integrated Predictive Maintenance and Operations Scheduling Framework for Power Systems Under Failure Uncertainty," INFORMS Journal on Computing, INFORMS, vol. 36(5), pages 1335-1358, September.
    10. Van der Auweraer, Sarah & Zhu, Sha & Boute, Robert N., 2021. "The value of installed base information for spare part inventory control," International Journal of Production Economics, Elsevier, vol. 239(C).
    11. Damba Lkhagvasuren & Erdenebat Bataa, 2023. "Finite-State Markov Chains with Flexible Distributions," Computational Economics, Springer;Society for Computational Economics, vol. 61(2), pages 611-644, February.
    12. Arun Chandrasekhar & Robert Townsend & Juan Pablo Xandri, 2019. "Financial Centrality and the Value of Key Players," Working Papers 2019-26, Princeton University. Economics Department.
    13. Stanislao Gualdi & Giulio Cimini & Kevin Primicerio & Riccardo Di Clemente & Damien Challet, 2016. "Statistically validated network of portfolio overlaps and systemic risk," Papers 1603.05914, arXiv.org, revised Sep 2016.
    14. Samuel Davis & Nasser Fard, 2020. "Theoretical bounds and approximation of the probability mass function of future hospital bed demand," Health Care Management Science, Springer, vol. 23(1), pages 20-33, March.
    15. Alessio Farcomeni & Monia Ranalli & Sara Viviani, 2021. "Dimension reduction for longitudinal multivariate data by optimizing class separation of projected latent Markov models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(2), pages 462-480, June.
    16. Peizhou Liao & Hao Wu & Tianwei Yu, 2017. "ROC Curve Analysis in the Presence of Imperfect Reference Standards," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(1), pages 91-104, June.
    17. Musa Çağlar & Sinan Gürel, 2017. "Public R&D project portfolio selection problem with cancellations," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 39(3), pages 659-687, July.
    18. Volker Nocke & Roland Strausz, 2023. "Collective Brand Reputation," Journal of Political Economy, University of Chicago Press, vol. 131(1), pages 1-58.
    19. Toyin Clottey & W. C. Benton, 2021. "On Sharing Part Dimensions Information and Its Impact on Design Tolerances In Fixed‐Bin Selective Assembly," Production and Operations Management, Production and Operations Management Society, vol. 30(11), pages 4089-4104, November.
    20. Mauricio Romero & Álvaro Riascos & Diego Jara, 2015. "On the Optimality of Answer-Copying Indices," Journal of Educational and Behavioral Statistics, vol. 40(5), pages 435-453, October.


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:compst:v:38:y:2023:i:4:d:10.1007_s00180-022-01299-0. See general information about how to correct material in RePEc.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic, or download information, contact Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of provider: http://www.springer.com

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.