An Introduction to Bayesian Networks
About the Author: https://samanemami.github.io/
Abstract
In this article, I introduce some of the basic concepts of Bayesian Networks, which are rooted in probability theory. I cover the definitions of probability theory, joint probability distributions, and graph theory as they apply to Bayesian Networks. Later, inference and estimation of the variables’ probabilities are explained. I also include some examples of a Bayesian Network implementation in Python.
1. Introduction
A graphical model is a tool used to show the conditional dependencies between variables. The directed edges show the dependency between a parent and a child variable. We say that variable A is conditionally dependent on variable B, written $P(A|B)$.
Each variable in the network has its own probability distribution, which is discrete or continuous depending on the values it takes. If there is a dependency between two variables, then the probability distribution of the child variable is conditioned on the parent; in $P(A|B)$, A depends on B.
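As a concrete illustration of $P(A|B)$, the conditional distribution of a child given its parent can be estimated from a joint frequency table. The following sketch uses made-up observations of two hypothetical binary variables (not data from the heart failure example):

```python
from collections import Counter

# Hypothetical observations of two binary variables (A, B)
samples = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 0), (1, 0), (1, 1), (0, 1)]

joint = Counter(samples)  # counts of each (a, b) pair
n = len(samples)

def p_joint(a, b):
    """Empirical joint probability P(A=a, B=b)."""
    return joint[(a, b)] / n

def p_b(b):
    """Empirical marginal probability P(B=b)."""
    return sum(joint[(a, b)] for a in (0, 1)) / n

def p_a_given_b(a, b):
    """P(A=a | B=b) = P(A=a, B=b) / P(B=b)."""
    return p_joint(a, b) / p_b(b)

print(p_a_given_b(1, 1))  # fraction of the B=1 samples in which A=1
```

With these counts, three of the five samples with B=1 also have A=1, so the conditional probability is 0.6.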
2. Bayesian Network
A Bayesian Network is a specific type of graphical model: a directed acyclic graph (DAG). All the arcs in this type of graph are directed, so we have a digraph rather than an undirected graph, and there is no directed cycle, i.e., no path that returns to its starting node.
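The acyclicity requirement can be checked mechanically. The sketch below (plain Python, with illustrative node names) uses Kahn's topological-sort algorithm: if every node can be ordered, the edge list is a DAG; otherwise a cycle exists.

```python
from collections import defaultdict, deque

def is_dag(edges):
    """Return True if the directed edge list contains no cycle (Kahn's algorithm)."""
    out, indeg = defaultdict(list), defaultdict(int)
    nodes = set()
    for u, v in edges:
        out[u].append(v)
        indeg[v] += 1
        nodes |= {u, v}
    queue = deque(node for node in nodes if indeg[node] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in out[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return visited == len(nodes)  # every node ordered => no cycle

print(is_dag([("A", "B"), ("B", "C")]))              # valid DAG
print(is_dag([("A", "B"), ("B", "C"), ("C", "A")]))  # contains a cycle
```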
2.1. Example of Bayesian Network
Assume a heart failure problem with eight features, as follows:
In this dataset, Death Event is a discrete feature and our binary target. To build the Bayesian Network for this example, we need to define the dependencies between pairs of variables. There are two approaches to this: we can specify the edges by hand, or we can use a structure learning model, which tries to find the best network for the given data points. Among the structure learning approaches there are several models, such as Hill Climb search, structure scoring, Tree search, and Exhaustive search. The one I used for this example was Tree search.
2.2. Tree Search Structure learning model
This model is based on the Chow–Liu tree search method, which approximates the joint probability distribution with a second-order product approximation [Chow & Liu (1968)]. It constructs a tree over the variables such that the approximated distribution has the minimum divergence from the actual one, measured by the Kullback–Leibler divergence.
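The core of the Chow–Liu method can be sketched in plain Python: estimate the mutual information between every pair of variables, then keep the maximum-weight spanning tree (here via Kruskal's algorithm). This is a minimal illustration on toy data, not the library implementation used in the article:

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in nats."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        pj = c / n
        mi += pj * math.log(pj / ((px[x] / n) * (py[y] / n)))
    return mi

def chow_liu_tree(data):
    """data: dict {variable_name: list of observations}.
    Returns the edges of a maximum-weight spanning tree under MI."""
    weights = {(a, b): mutual_information(data[a], data[b])
               for a, b in combinations(data, 2)}
    parent = {v: v for v in data}       # union-find for Kruskal
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    tree = []
    for (a, b), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ra, rb = find(a), find(b)
        if ra != rb:                    # adding the edge keeps the tree acyclic
            parent[ra] = rb
            tree.append((a, b))
    return tree

# Toy data: B copies A, C is unrelated, so the tree should link A and B.
data = {"A": [0, 0, 1, 1, 0, 1],
        "B": [0, 0, 1, 1, 0, 1],
        "C": [0, 1, 0, 1, 1, 0]}
print(chow_liu_tree(data))
```

Because A and B are identical in the toy data, their mutual information is the highest, so the edge (A, B) is always selected first.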
2.3. DAG
Figure 1 illustrates the Bayesian Network for the heart failure dataset and its features. Between the variables there are arcs, all of them directed, which show the dependencies. For instance, there is a conditional dependency between High Blood Pressure and Death Event. Note that Death Event is the target in our example. The edges of this Bayesian Network are listed in the following table as well.
2.4. Inference
The procedure of approximating the probability of a node based on the known probabilities of the other variables is called inference. There are various approaches to inference, each with its own complexity and computational budget, such as variable elimination, belief propagation, MPLP, and Gibbs sampling.
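The idea of inference can be sketched with exact enumeration on a tiny, hypothetical chain. The variable names and probabilities below are invented for illustration and are not fitted to the heart failure dataset:

```python
from itertools import product

# Hypothetical chain: HighBP -> HeartDamage -> DeathEvent.
# All numbers below are made up for illustration only.
p_bp = {1: 0.3, 0: 0.7}                                # P(HighBP)
p_dmg = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.2, 0: 0.8}}     # P(HeartDamage | HighBP)
p_death = {1: {1: 0.5, 0: 0.5}, 0: {1: 0.1, 0: 0.9}}   # P(DeathEvent | HeartDamage)

def joint(bp, dmg, death):
    """Chain rule: P(bp, dmg, death) = P(bp) P(dmg|bp) P(death|dmg)."""
    return p_bp[bp] * p_dmg[bp][dmg] * p_death[dmg][death]

def p_death_given_bp(bp):
    """Inference by enumeration: sum out HeartDamage, then normalise."""
    num = sum(joint(bp, dmg, 1) for dmg in (0, 1))
    den = sum(joint(bp, dmg, d) for dmg, d in product((0, 1), repeat=2))
    return num / den

print(p_death_given_bp(1))  # risk given high blood pressure
print(p_death_given_bp(0))  # risk without high blood pressure
```

Enumeration is exact but exponential in the number of hidden variables, which is why the approximate methods listed above exist for larger networks.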
If you are interested in a real-life example of a Bayesian Network with Python code, you may enjoy reading my Jupyter Notebook published on Kaggle (link).
2.4.1. Belief propagation
In belief propagation, the belief (probability distribution) of each variable is updated by passing messages between neighbouring nodes until the beliefs converge.
from pgmpy.inference import BeliefPropagation

bp = BeliefPropagation(model)  # model is the Bayesian Network built above
bp.calibrate()                 # pass messages until the clique beliefs are consistent
bp.get_clique_beliefs()        # return the calibrated beliefs of each clique
Conclusion
In this article, we introduced the Bayesian Network with reference to directed graph theory. As a case study, the heart failure dataset was used to build a DAG, train the Bayesian Network, and apply inference and estimation.