
Learning Parameters in Bayesian Networks

When you have a Bayesian network, you need to specify the conditional probability tables (CPTs) that define the relationships between variables. Parameter learning is the process of estimating these CPT entries directly from observed data. Instead of assigning probabilities by hand, you count how often each variable configuration occurs in your dataset and use these frequencies to fill in the CPTs. This approach is especially practical for discrete variables, where you can simply tally up cases for each possible parent-child combination.
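
Formally, each CPT entry is just the maximum-likelihood estimate obtained from the counts: for a variable X with parent configuration u,

P(X = x | parents(X) = u) = count(X = x, parents(X) = u) / count(parents(X) = u)

where the counts are taken over the rows of the dataset. The example below applies exactly this rule to a small synthetic dataset.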

import pandas as pd

# Example dataset: columns are Rain, Sprinkler, WetGrass (all binary: 1=True, 0=False)
data = pd.DataFrame([
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 1},
    {"Rain": 1, "Sprinkler": 1, "WetGrass": 1},
    {"Rain": 0, "Sprinkler": 1, "WetGrass": 1},
    {"Rain": 0, "Sprinkler": 1, "WetGrass": 0},
    {"Rain": 0, "Sprinkler": 0, "WetGrass": 0},
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 0},
])

# CPT for Rain (no parents)
rain_counts = data["Rain"].value_counts().sort_index()
rain_cpt = rain_counts / len(data)
print("P(Rain):")
print(rain_cpt)

# CPT for Sprinkler | Rain
sprinkler_cpt = (
    data.groupby("Rain")["Sprinkler"]
    .value_counts(normalize=True)
    .unstack()
    .fillna(0)
)
print("\nP(Sprinkler | Rain):")
print(sprinkler_cpt)

# CPT for WetGrass | Rain, Sprinkler
wetgrass_cpt = (
    data.groupby(["Rain", "Sprinkler"])["WetGrass"]
    .value_counts(normalize=True)
    .unstack()
    .fillna(0)
)
print("\nP(WetGrass | Rain, Sprinkler):")
print(wetgrass_cpt)

The code above demonstrates how to use frequency counting to estimate CPTs for each variable in a Bayesian network. For a variable with no parents, like Rain, you simply count how often each value appears and divide by the total number of samples. For variables with parents, such as Sprinkler (parent: Rain) or WetGrass (parents: Rain and Sprinkler), you count how often each child value occurs for every parent configuration, then normalize within each group to get conditional probabilities. This approach assumes that your data is fully observed (no missing values) and that the dataset is representative of the true distribution—meaning every relevant combination of parent values appears often enough to estimate probabilities reliably.
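
On the six example rows, the estimates come out to P(Rain = 1) = 0.5, P(Sprinkler = 1 | Rain = 1) = 1/3, and P(WetGrass = 1 | Rain = 1, Sprinkler = 1) = 1.0. That last entry is estimated from a single row, and P(WetGrass = 1 | Rain = 0, Sprinkler = 0) comes out as exactly 0 for the same reason, which shows how quickly raw frequency counting becomes unreliable when a parent configuration is rare or absent. A common remedy is Laplace (add-one) smoothing, which adds a small pseudocount to every cell before normalizing. The sketch below is one minimal way to do this with pandas; the helper name smoothed_cpt and the pseudocount parameter alpha are illustrative choices, not part of any particular library.

import pandas as pd

# Same toy dataset as above (Rain, Sprinkler, WetGrass are binary).
data = pd.DataFrame(
    [(1, 0, 1), (1, 1, 1), (0, 1, 1), (0, 1, 0), (0, 0, 0), (1, 0, 0)],
    columns=["Rain", "Sprinkler", "WetGrass"],
)

def smoothed_cpt(data, child, parents, child_values, alpha=1.0):
    """Estimate P(child | parents) with add-one (Laplace) smoothing.

    `alpha` is the pseudocount added to every (parent configuration, child value)
    cell, so combinations that never occur still get a small nonzero probability.
    """
    counts = (
        data.groupby(parents)[child]
        .value_counts()
        .unstack(fill_value=0)                        # rows: parent configs, columns: child values
        .reindex(columns=child_values, fill_value=0)  # ensure every child value has a column
    )
    smoothed = counts + alpha
    # Normalize each row so the probabilities for a parent configuration sum to 1.
    return smoothed.div(smoothed.sum(axis=1), axis=0)

print(smoothed_cpt(data, "WetGrass", ["Rain", "Sprinkler"], child_values=[0, 1]))

With alpha = 1, the (Rain = 0, Sprinkler = 0) row becomes (2/3, 1/3) instead of (1, 0), so a single observed row no longer forces a hard zero probability. Note that parent configurations that never appear in the data produce no row at all in this sketch; to cover them you would need to enumerate every parent combination explicitly before normalizing.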


Which of the following data patterns would make parameter learning using frequency counting unreliable?


