Learning Parameters in Bayesian Networks | Bayesian Networks: Directed Models
Probabilistic Graphical Models Essentials

Learning Parameters in Bayesian Networks

When you have a Bayesian network, you need to specify the conditional probability tables (CPTs) that define the relationships between variables. Parameter learning is the process of estimating these CPT entries directly from observed data. Instead of assigning probabilities by hand, you count how often each variable configuration occurs in your dataset and use these frequencies to fill in the CPTs. This approach is especially practical for discrete variables, where you can simply tally up cases for each possible parent-child combination.
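In symbols, this counting rule is the maximum likelihood estimate of each CPT entry: for a variable X with parents Pa(X),

P(X = x | Pa(X) = u) = N(x, u) / N(u),

where N(x, u) is the number of samples in which X = x and the parents take the configuration u, and N(u) is the number of samples with that parent configuration. For a root variable such as Rain, the denominator is simply the total number of samples.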

import pandas as pd

# Example dataset: columns are Rain, Sprinkler, WetGrass (all binary: 1=True, 0=False)
data = pd.DataFrame([
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 1},
    {"Rain": 1, "Sprinkler": 1, "WetGrass": 1},
    {"Rain": 0, "Sprinkler": 1, "WetGrass": 1},
    {"Rain": 0, "Sprinkler": 1, "WetGrass": 0},
    {"Rain": 0, "Sprinkler": 0, "WetGrass": 0},
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 0},
])

# CPT for Rain (no parents)
rain_counts = data["Rain"].value_counts().sort_index()
rain_cpt = rain_counts / len(data)
print("P(Rain):")
print(rain_cpt)

# CPT for Sprinkler | Rain
sprinkler_cpt = (
    data.groupby("Rain")["Sprinkler"]
    .value_counts(normalize=True)
    .unstack()
    .fillna(0)
)
print("\nP(Sprinkler | Rain):")
print(sprinkler_cpt)

# CPT for WetGrass | Rain, Sprinkler
wetgrass_cpt = (
    data.groupby(["Rain", "Sprinkler"])["WetGrass"]
    .value_counts(normalize=True)
    .unstack()
    .fillna(0)
)
print("\nP(WetGrass | Rain, Sprinkler):")
print(wetgrass_cpt)

The code above demonstrates how to use frequency counting to estimate CPTs for each variable in a Bayesian network. For a variable with no parents, like Rain, you simply count how often each value appears and divide by the total number of samples. For variables with parents, such as Sprinkler (parent: Rain) or WetGrass (parents: Rain and Sprinkler), you count how often each child value occurs for every parent configuration, then normalize within each group to get conditional probabilities. This approach assumes that your data is fully observed (no missing values) and that the dataset is representative of the true distribution—meaning every relevant combination of parent values appears often enough to estimate probabilities reliably.
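If you want the estimates to stay well defined even when some parent configuration never occurs in the data, a common workaround is additive (Laplace) smoothing: add a small pseudo-count to every cell of the CPT before normalizing. The sketch below shows one way to do this with plain pandas on the same toy dataset; the helper name smoothed_cpt and the pseudo-count alpha=1.0 are illustrative choices, not part of the lesson's code above.

import pandas as pd
from itertools import product

# Same toy dataset as above.
data = pd.DataFrame([
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 1},
    {"Rain": 1, "Sprinkler": 1, "WetGrass": 1},
    {"Rain": 0, "Sprinkler": 1, "WetGrass": 1},
    {"Rain": 0, "Sprinkler": 1, "WetGrass": 0},
    {"Rain": 0, "Sprinkler": 0, "WetGrass": 0},
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 0},
])

def smoothed_cpt(df, child, parents, states=(0, 1), alpha=1.0):
    # Estimate P(child | parents) with additive (Laplace) smoothing:
    # alpha pseudo-counts are added to every (parent config, child value)
    # cell, so parent configurations that never appear in the data still
    # get a well-defined (uniform) distribution instead of 0/0.
    rows = []
    for parent_values in product(states, repeat=len(parents)):
        # Select the rows matching this parent configuration.
        mask = pd.Series(True, index=df.index)
        for p, v in zip(parents, parent_values):
            mask &= df[p] == v
        counts = df.loc[mask, child].value_counts()
        # Add alpha to every child state, then normalize.
        smoothed = {s: counts.get(s, 0) + alpha for s in states}
        total = sum(smoothed.values())
        probs = {s: smoothed[s] / total for s in states}
        rows.append({**dict(zip(parents, parent_values)), **probs})
    return pd.DataFrame(rows).set_index(list(parents))

print("P(WetGrass | Rain, Sprinkler) with Laplace smoothing:")
print(smoothed_cpt(data, "WetGrass", ["Rain", "Sprinkler"]))

With alpha=1.0, a parent configuration that never appears in the data yields a uniform distribution over the child's states instead of a division by zero, and configurations with only a few samples are pulled gently toward uniform rather than taking extreme probabilities of 0 or 1.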


Which of the following data patterns would make parameter learning using frequency counting unreliable?

Select the correct answer
