IDEAS home Printed from https://ideas.repec.org/a/jss/jstsof/v077i02.html
   My bibliography  Save this article

Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package

Author

Listed:
  • Scutari, Marco

Abstract

It is well known in the literature that the problem of learning the structure of Bayesian networks is very hard to tackle: Its computational complexity is super-exponential in the number of nodes in the worst case and polynomial in most real-world scenarios. Efficient implementations of score-based structure learning benefit from past and current research in optimization theory, which can be adapted to the task by using the network score as the objective function to maximize. This is not true for approaches based on conditional independence tests, called constraint-based learning algorithms. The only optimization in widespread use, backtracking, leverages the symmetries implied by the definitions of neighborhood and Markov blanket. In this paper we illustrate how backtracking is implemented in recent versions of the bnlearn R package, and how it degrades the stability of Bayesian network structure learning for little gain in terms of speed. As an alternative, we describe a software architecture and framework that can be used to parallelize constraint-based structure learning algorithms (also implemented in bnlearn) and we demonstrate its performance using four reference networks and two real-world data sets from genetics and systems biology. We show that on modern multi-core or multiprocessor hardware parallel implementations are preferable over backtracking, which was developed when single-processor machines were the norm.

Suggested Citation

  • Scutari, Marco, 2017. "Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 77(i02).
  • Handle: RePEc:jss:jstsof:v:077:i02
    DOI: http://hdl.handle.net/10.18637/jss.v077.i02
    as

    Download full text from publisher

    File URL: https://www.jstatsoft.org/index.php/jss/article/view/v077i02/v77i02.pdf
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v077i02/bnlearn_4.1.1.tar.gz
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v077i02/v77i02-replication.zip
    Download Restriction: no

    File URL: https://libkey.io/http://hdl.handle.net/10.18637/jss.v077.i02?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Rau Andrea & Jaffrézic Florence & Foulley Jean-Louis & Doerge Rebecca W, 2010. "An Empirical Bayesian Method for Estimating Biological Networks from Temporal Microarray Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-28, January.
    2. Kalisch, Markus & Mächler, Martin & Colombo, Diego & Maathuis, Marloes H. & Bühlmann, Peter, 2012. "Causal Inference Using Graphical Models with the R Package pcalg," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 47(i11).
    3. Scutari, Marco, 2010. "Learning Bayesian Networks with the bnlearn R Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 35(i03).
    4. Boettcher, Susanne G. & Dethlefsen, Claus, 2003. "deal: A Package for Learning Bayesian Networks," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 8(i20).
    5. Ahmed Rebai (ed.), 2010. "Bayesian Network," Books, IntechOpen, number 786.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Prabal Das & D. A. Sachindra & Kironmala Chanda, 2022. "Machine Learning-Based Rainfall Forecasting with Multiple Non-Linear Feature Selection Algorithms," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 36(15), pages 6043-6071, December.
    2. Guofa Li & Yuan Liao & Qiangqiang Guo & Caixiong Shen & Weijian Lai, 2021. "Traffic Crash Characteristics in Shenzhen, China from 2014 to 2016," IJERPH, MDPI, vol. 18(3), pages 1-24, January.
    3. Alberto Borraccino & Paola Berchialla & Paola Dalmasso & Veronica Sciannameo & Alessio Vieno & Giacomo Lazzeri & Lorena Charrier & Patrizia Lemma, 2020. "Connectedness as a protective factor in immigrant youth: results from the Health Behaviours in School-aged Children (HBSC) Italian study," International Journal of Public Health, Springer;Swiss School of Public Health (SSPH+), vol. 65(3), pages 303-312, April.
    4. Babak Fazelabdolabadi, 2019. "Uncertainty and energy-sector equity returns in Iran: a Bayesian and quasi-Monte Carlo time-varying analysis," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 5(1), pages 1-20, December.
    5. Wang, Yuhong & Zhang, Fan & Yang, Zhisen & Yang, Zaili, 2021. "Incorporation of deficiency data into the analysis of the dependency and interdependency among the risk factors influencing port state control inspection," Reliability Engineering and System Safety, Elsevier, vol. 206(C).
    6. Alberto Borraccino & Paola Berchialla & Paola Dalmasso & Veronica Sciannameo & Alessio Vieno & Giacomo Lazzeri & Lorena Charrier & Patrizia Lemma, 0. "Connectedness as a protective factor in immigrant youth: results from the Health Behaviours in School-aged Children (HBSC) Italian study," International Journal of Public Health, Springer;Swiss School of Public Health (SSPH+), vol. 0, pages 1-10.
    7. Sangsung Park & Sunghae Jun, 2020. "Patent Keyword Analysis of Disaster Artificial Intelligence Using Bayesian Network Modeling and Factor Analysis," Sustainability, MDPI, vol. 12(2), pages 1-11, January.
    8. Christopher Hagedorn & Johannes Huegle & Rainer Schlosser, 2022. "Understanding unforeseen production downtimes in manufacturing processes using log data-driven causal reasoning," Journal of Intelligent Manufacturing, Springer, vol. 33(7), pages 2027-2043, October.
    9. Babak Fazelabdolabadi, 2019. "A hybrid Bayesian-network proposition for forecasting the crude oil price," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 5(1), pages 1-21, December.
    10. Pham, Hung Vuong & Sperotto, Anna & Furlan, Elisa & Torresan, Silvia & Marcomini, Antonio & Critto, Andrea, 2021. "Integrating Bayesian Networks into ecosystem services assessment to support water management at the river basin scale," Ecosystem Services, Elsevier, vol. 50(C).
    11. Pedro Bonilla-Nadal & Andrés Cano & Manuel Gómez-Olmedo & Serafín Moral & Ofelia Paula Retamero, 2022. "Using Value-Based Potentials for Making Approximate Inference on Probabilistic Graphical Models," Mathematics, MDPI, vol. 10(14), pages 1-27, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ronja Foraita & Juliane Friemel & Kathrin Günther & Thomas Behrens & Jörn Bullerdiek & Rolf Nimzyk & Wolfgang Ahrens & Vanessa Didelez, 2020. "Causal discovery of gene regulation with incomplete data," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(4), pages 1747-1775, October.
    2. Hobæk Haff, Ingrid & Aas, Kjersti & Frigessi, Arnoldo & Lacal, Virginia, 2016. "Structure learning in Bayesian Networks using regular vines," Computational Statistics & Data Analysis, Elsevier, vol. 101(C), pages 186-208.
    3. Sagnik Datta & Ghislaine Gayraud & Eric Leclerc & Frederic Y. Bois, 2017. "Graph_sampler: a simple tool for fully Bayesian analyses of DAG-models," Computational Statistics, Springer, vol. 32(2), pages 691-716, June.
    4. Michael J McGeachie & Hsun-Hsien Chang & Scott T Weiss, 2014. "CGBayesNets: Conditional Gaussian Bayesian Network Learning and Inference with Mixed Discrete and Continuous Data," PLOS Computational Biology, Public Library of Science, vol. 10(6), pages 1-7, June.
    5. Epskamp, Sacha & Cramer, Angélique O.J. & Waldorp, Lourens J. & Schmittmann, Verena D. & Borsboom, Denny, 2012. "qgraph: Network Visualizations of Relationships in Psychometric Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 48(i04).
    6. Prabal Das & D. A. Sachindra & Kironmala Chanda, 2022. "Machine Learning-Based Rainfall Forecasting with Multiple Non-Linear Feature Selection Algorithms," Water Resources Management: An International Journal, Published for the European Water Resources Association (EWRA), Springer;European Water Resources Association (EWRA), vol. 36(15), pages 6043-6071, December.
    7. Vuong, Quan-Hoang & La, Viet-Phuong, 2019. "The bayesvl R package. User guide v0.8.1," OSF Preprints w5dx6, Center for Open Science.
    8. F. Cugnata & G. Perucca & S. Salini, 2017. "Bayesian networks and the assessment of universities' value added," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(10), pages 1785-1806, July.
    9. Roland R. Ramsahai, 2020. "Connecting actuarial judgment to probabilistic learning techniques with graph theory," Papers 2007.15475, arXiv.org.
    10. Tang, Kayu & Parsons, David J. & Jude, Simon, 2019. "Comparison of automatic and guided learning for Bayesian networks to analyse pipe failures in the water distribution system," Reliability Engineering and System Safety, Elsevier, vol. 186(C), pages 24-36.
    11. Bettendorf, Timo & Heinlein, Reinhold, 2019. "Connectedness between G10 currencies: Searching for the causal structure," Discussion Papers 06/2019, Deutsche Bundesbank.
    12. Myriam Patricia Cifuentes & Clara Mercedes Suarez & Ricardo Cifuentes & Noel Malod-Dognin & Sam Windels & Jose Fernando Valderrama & Paul D. Juarez & R. Burciaga Valdez & Cynthia Colen & Charles Phill, 2022. "Big Data to Knowledge Analytics Reveals the Zika Virus Epidemic as Only One of Multiple Factors Contributing to a Year-Over-Year 28-Fold Increase in Microcephaly Incidence," IJERPH, MDPI, vol. 19(15), pages 1-21, July.
    13. Nicholson, Ann E. & Flores, M. Julia, 2011. "Combining state and transition models with dynamic Bayesian networks," Ecological Modelling, Elsevier, vol. 222(3), pages 555-566.
    14. Silvia de Juan & Maria Dulce Subida & Andres Ospina-Alvarez & Ainara Aguilar & Miriam Fernandez, 2020. "Disentangling the socio-ecological drivers behind illegal fishing in a small-scale fishery managed by a TURF system," Papers 2012.08970, arXiv.org.
    15. Meineri, Eric & Dahlberg, C. Johan & Hylander, Kristoffer, 2015. "Using Gaussian Bayesian Networks to disentangle direct and indirect associations between landscape physiography, environmental variables and species distribution," Ecological Modelling, Elsevier, vol. 313(C), pages 127-136.
    16. Michail Tsagris, 2021. "A New Scalable Bayesian Network Learning Algorithm with Applications to Economics," Computational Economics, Springer;Society for Computational Economics, vol. 57(1), pages 341-367, January.
    17. Bouncken, Ricarda B. & Ratzmann, Martin & Kraus, Sascha, 2021. "Anti-aging: How innovation is shaped by firm age and mutual knowledge creation in an alliance," Journal of Business Research, Elsevier, vol. 137(C), pages 422-429.
    18. Leonard Henckel & Emilija Perković & Marloes H. Maathuis, 2022. "Graphical criteria for efficient total effect estimation via adjustment in causal linear models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(2), pages 579-599, April.
    19. Michael J. Brusco & Douglas Steinley & Ashley L. Watts, 2022. "Disentangling relationships in symptom networks using matrix permutation methods," Psychometrika, Springer;The Psychometric Society, vol. 87(1), pages 133-155, March.
    20. Sangsung Park & Sunghae Jun, 2020. "Patent Keyword Analysis of Disaster Artificial Intelligence Using Bayesian Network Modeling and Factor Analysis," Sustainability, MDPI, vol. 12(2), pages 1-11, January.

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:jss:jstsof:v:077:i02. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Christopher F. Baum (email available below). General contact details of provider: http://www.jstatsoft.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.