I need to learn a Bayesian network structure from a dataset. I read the book "Learning Bayesian Networks" by Richard E. Neapolitan, but I still have no clear idea.
According to the book, from the data I can:
1) Create all the DAG patterns, where a DAG pattern is an equivalence class of DAGs (with respect to Markov equivalence);
2) Create the multinomial augmented Bayesian network corresponding to each equivalence class;
3) Use a scoring function to find the best multinomial augmented Bayesian network.
Now I do not understand how this scoring function works. Is there more than one in the literature? Can you help me understand precisely how the main scoring function works?
I have also read that this search is super-exponential in the number of variables $N$; is that right? And is there some other, more efficient method?
Best Answer
The score function measures how well a learnt DAG structure fits the dataset. You can of course define the score function in several ways, depending on the dataset and on the ultimate objective of learning the structure. One commonly used score function is the log-posterior.
Given dataset $D$ and a vector $\mathbf{X}$ of variables, the log-posterior score function $S(D,G)$ is defined as $$ S(D,G) := \log p_{pr}(G) + \log p(D \mid G), $$ where $p_{pr}$ is the prior over the DAGs. Letting the parameters be $\theta \in \Theta$, $p(D \mid G)$ is the marginal likelihood $$ p(D \mid G) = \int_{\Theta} p(D \mid G, \theta) \, p_{pr}(\theta) \, d\theta. $$
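In the multinomial case with Dirichlet priors on the parameters (the setting of points 2 and 3 in the question), this integral has a closed form, which gives the Bayesian–Dirichlet (BD) family of scores. Sketching the standard formula here (this is the Cooper–Herskovits / Heckerman et al. result, not something specific to this thread): $$ p(D \mid G) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}, $$ where $r_i$ is the number of states of $X_i$, $q_i$ the number of configurations of its parents in $G$, $N_{ijk}$ the number of records with $X_i$ in state $k$ and parents in configuration $j$, $N_{ij} = \sum_k N_{ijk}$, and $\alpha_{ijk}$ the Dirichlet hyperparameters with $\alpha_{ij} = \sum_k \alpha_{ijk}$. Setting all $\alpha_{ijk} = 1$ gives the K2 score; setting $\alpha_{ijk} = \alpha / (r_i q_i)$ gives the score-equivalent BDeu score.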
The bnlearn R package defines several score functions, depending on the nature of the data (categorical, continuous, or mixed); a hand-rolled sketch of one of these scores follows the lists below.
Categorical data (multinomial distribution):
- the multinomial log-likelihood;
- the Akaike Information Criterion (AIC);
- the Bayesian Information Criterion (BIC);
- a score equivalent Dirichlet posterior density (BDe);
- a sparse Dirichlet posterior density (BDs);
- a Dirichlet posterior density based on the Jeffreys prior (BDJ);
- a modified Bayesian Dirichlet for mixtures of interventional and observational data;
- the K2 score;
Continuous data (multivariate normal distribution):
- the multivariate Gaussian log-likelihood;
- the corresponding Akaike Information Criterion (AIC);
- the corresponding Bayesian Information Criterion (BIC);
- a score equivalent Gaussian posterior density (BGe);
Mixed data (conditional Gaussian distribution):
- the conditional Gaussian log-likelihood;
- the corresponding Akaike Information Criterion (AIC);
- the corresponding Bayesian Information Criterion (BIC).
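To make the first two of these concrete, here is a minimal, self-contained Python sketch (not the bnlearn implementation; the toy data, the helper `loglik_node`, and the two-node DAG are made up for illustration) that computes the multinomial log-likelihood and the BIC score of the DAG $A \rightarrow B$:

```python
import math
from collections import Counter

# Toy dataset over two binary variables A and B (hypothetical, for illustration).
data = [("a0", "b0"), ("a0", "b0"), ("a0", "b1"),
        ("a1", "b1"), ("a1", "b1"), ("a1", "b0")]
N = len(data)

def loglik_node(values, parent_values=None):
    """Multinomial log-likelihood of one node given its parents (MLE plug-in)."""
    if parent_values is None:
        counts = Counter(values)
        return sum(n * math.log(n / len(values)) for n in counts.values())
    joint = Counter(zip(parent_values, values))
    parents = Counter(parent_values)
    return sum(n * math.log(n / parents[p]) for (p, _), n in joint.items())

A = [record[0] for record in data]
B = [record[1] for record in data]

# Decomposable score of the DAG A -> B: LL(A) + LL(B | A).
ll = loglik_node(A) + loglik_node(B, parent_values=A)

# Free parameters: (|A| - 1) for P(A), plus |A| * (|B| - 1) for P(B | A).
k = (2 - 1) + 2 * (2 - 1)

bic = ll - (k / 2) * math.log(N)  # "higher is better" convention, as in bnlearn
print(f"log-likelihood = {ll:.4f}, BIC = {bic:.4f}")
```

bnlearn uses this "higher is better" convention for both AIC and BIC (AIC replaces the $\tfrac{1}{2}\log N$ penalty factor with $1$).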
For $n$ variables, the number of possible DAGs is super-exponential, so exhaustive search over structures quickly becomes infeasible; in practice, packages such as bnlearn therefore use heuristic search (e.g. greedy hill-climbing) instead of enumerating all DAGs. Here is the corresponding integer sequence: https://oeis.org/A003024. As you can see, the numbers grow very fast.
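If you want to see the growth for yourself, the OEIS page gives Robinson's recurrence $a(n) = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} 2^{k(n-k)} a(n-k)$ with $a(0) = 1$, and a few lines of Python reproduce the sequence:

```python
from math import comb

def n_dags(n, _memo={0: 1}):
    """Number of labeled DAGs on n nodes (OEIS A003024), via Robinson's recurrence."""
    if n not in _memo:
        _memo[n] = sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * n_dags(n - k)
                       for k in range(1, n + 1))
    return _memo[n]

for n in range(1, 8):
    print(n, n_dags(n))  # 1, 3, 25, 543, 29281, 3781503, 1138779265
```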