Authors
Listed:
- Fereshteh R. Dastjerdi
(School of Computing, University of Georgia, Athens, GA 30602, USA)
- Liming Cai
(School of Computing, University of Georgia, Athens, GA 30602, USA)
Abstract
The Bayesian network is a directed, acyclic graphical model that can offer a structured description of probabilistic dependencies among random variables. As powerful tools for classification tasks, Bayesian classifiers often require computing joint probability distributions, which can be computationally intractable due to potential full dependencies among feature variables. On the other hand, Naïve Bayes, which presumes zero dependencies among features, trades accuracy for efficiency and often underperforms. As a result, non-zero dependency structures, such as trees, are often used as more feasible probabilistic graph approximations; in particular, Tree Augmented Naïve Bayes (TAN) has been demonstrated to outperform Naïve Bayes and has become a popular choice. For applications where a variable is strongly influenced by multiple other features, TAN has been further extended to the k-dependency Bayesian classifier (KDB), where one feature can depend on up to k other features (for a given k ≥ 2). In such cases, however, the selection of the k parent features for each variable is often made through heuristic search methods (such as sorting), which do not guarantee an optimal approximation of the network topology. In this paper, the novel notion of k-tree Augmented Naïve Bayes (k-TAN) is introduced to augment Naïve Bayesian classifiers with k-tree topology as an approximation of Bayesian networks. It is proved that, under the Kullback–Leibler divergence measure, the k-tree topology approximation of a Bayesian classifier that loses the minimum information is the topology of a maximum spanning k-tree, where the edge weights of the graph are the mutual information between random variables conditional upon the class label. In addition, while finding a maximum spanning k-tree is in general NP-hard for fixed k ≥ 2, this work shows that the approximation problem can be solved in time O(n^{k+1}) if the spanning k-tree is also required to retain a given Hamiltonian path in the graph. Therefore, this algorithm can be employed to efficiently approximate Bayesian networks with k-tree augmented Naïve Bayesian classifiers, with a guaranteed minimum loss of information.
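To make the edge-weighting idea concrete, the sketch below illustrates the k = 1 special case (classic TAN): each feature pair is weighted by its mutual information conditional on the class label, and a maximum spanning tree is taken over those weights. This is only an illustrative Python sketch under assumed discrete, in-memory data; the function names (conditional_mutual_information, max_spanning_tree) and the toy dataset are hypothetical and are not taken from the paper, which generalizes this construction to k-trees rather than trees.

```python
# Minimal sketch of the k = 1 (TAN) case: weight each feature pair by
# I(X_i; X_j | C) and take a maximum spanning tree over those weights.
# All names and the toy data below are illustrative; this is not the
# paper's O(n^{k+1}) maximum spanning k-tree algorithm.

from collections import Counter
from itertools import combinations
from math import log


def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X; Y | C) from three aligned lists of discrete values."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))   # joint counts of (x, y, c)
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    cmi = 0.0
    for (x, y, c), count in n_xyc.items():
        # p(x,y,c) * log[ p(x,y,c) p(c) / (p(x,c) p(y,c)) ], with the 1/n
        # factors cancelling inside the logarithm.
        cmi += (count / n) * log((count * n_c[c]) / (n_xc[(x, c)] * n_yc[(y, c)]))
    return cmi


def max_spanning_tree(n_features, weights):
    """Prim-style maximum spanning tree; weights maps sorted (i, j) -> CMI."""
    in_tree = {0}
    edges = []
    while len(in_tree) < n_features:
        best = max(
            ((i, j) for i in in_tree for j in range(n_features) if j not in in_tree),
            key=lambda e: weights[tuple(sorted(e))],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Toy usage: rows are samples, columns are discrete features, `labels` is C.
data = [
    [0, 0, 1],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
]
labels = [0, 0, 1, 1, 0]
columns = list(zip(*data))
weights = {
    (i, j): conditional_mutual_information(columns[i], columns[j], labels)
    for i, j in combinations(range(len(columns)), 2)
}
print(max_spanning_tree(len(columns), weights))
```

The paper's contribution replaces the spanning tree in this k = 1 construction with a maximum spanning k-tree, and the restriction that the spanning k-tree retain a given Hamiltonian path is what makes the O(n^{k+1}) time bound attainable despite the general NP-hardness of the problem.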
Suggested Citation
Fereshteh R. Dastjerdi & Liming Cai, 2025.
"Augmenting Naïve Bayes Classifiers with k -Tree Topology,"
Mathematics, MDPI, vol. 13(13), pages 1-16, July.
Handle:
RePEc:gam:jmathe:v:13:y:2025:i:13:p:2185-:d:1694586