Learning Parameters in Bayesian Networks
When you have a Bayesian network, you need to specify the conditional probability tables (CPTs) that define the relationships between variables. Parameter learning is the process of estimating these CPT entries directly from observed data. Instead of assigning probabilities by hand, you count how often each variable configuration occurs in your dataset and use these frequencies to fill in the CPTs. This approach is especially practical for discrete variables, where you can simply tally up cases for each possible parent-child combination.
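In symbols, this counting procedure is the maximum likelihood estimate for a discrete CPT entry:

```latex
\hat{P}(X = x \mid \mathrm{Pa}(X) = u) = \frac{N(x, u)}{N(u)}
```

where $N(x, u)$ is the number of samples in which $X = x$ and its parents take the configuration $u$, and $N(u)$ is the number of samples with parent configuration $u$ regardless of the value of $X$.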
```python
import pandas as pd

# Example dataset: columns are Rain, Sprinkler, WetGrass (all binary: 1=True, 0=False)
data = pd.DataFrame([
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 1},
    {"Rain": 1, "Sprinkler": 1, "WetGrass": 1},
    {"Rain": 0, "Sprinkler": 1, "WetGrass": 1},
    {"Rain": 0, "Sprinkler": 1, "WetGrass": 0},
    {"Rain": 0, "Sprinkler": 0, "WetGrass": 0},
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 0},
])

# CPT for Rain (no parents)
rain_counts = data["Rain"].value_counts().sort_index()
rain_cpt = rain_counts / len(data)
print("P(Rain):")
print(rain_cpt)

# CPT for Sprinkler | Rain
sprinkler_cpt = (
    data.groupby("Rain")["Sprinkler"]
    .value_counts(normalize=True)
    .unstack()
    .fillna(0)
)
print("\nP(Sprinkler | Rain):")
print(sprinkler_cpt)

# CPT for WetGrass | Rain, Sprinkler
wetgrass_cpt = (
    data.groupby(["Rain", "Sprinkler"])["WetGrass"]
    .value_counts(normalize=True)
    .unstack()
    .fillna(0)
)
print("\nP(WetGrass | Rain, Sprinkler):")
print(wetgrass_cpt)
```
The code above demonstrates how to use frequency counting to estimate CPTs for each variable in a Bayesian network. For a variable with no parents, like Rain, you simply count how often each value appears and divide by the total number of samples. For variables with parents, such as Sprinkler (parent: Rain) or WetGrass (parents: Rain and Sprinkler), you count how often each child value occurs for every parent configuration, then normalize within each group to get conditional probabilities. This approach assumes that your data is fully observed (no missing values) and that the dataset is representative of the true distribution—meaning every relevant combination of parent values appears often enough to estimate probabilities reliably.
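When some parent configurations are rare, raw frequency counting can produce hard 0 or 1 probabilities from a single observation. A common remedy is Laplace (add-one) smoothing: start every count at a small pseudo-count before normalizing. The sketch below reuses the same toy dataset; the `smoothed_cpt` helper and its `alpha` parameter are illustrative choices, not part of the original example.

```python
import pandas as pd

data = pd.DataFrame([
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 1},
    {"Rain": 1, "Sprinkler": 1, "WetGrass": 1},
    {"Rain": 0, "Sprinkler": 1, "WetGrass": 1},
    {"Rain": 0, "Sprinkler": 1, "WetGrass": 0},
    {"Rain": 0, "Sprinkler": 0, "WetGrass": 0},
    {"Rain": 1, "Sprinkler": 0, "WetGrass": 0},
])

def smoothed_cpt(df, child, parents, values=(0, 1), alpha=1.0):
    """Estimate P(child | parents) with add-alpha (Laplace) smoothing.

    Every (parent configuration, child value) cell starts with a
    pseudo-count of alpha, so a child value that was never observed
    under some parent configuration gets a small nonzero probability
    instead of exactly 0.
    """
    counts = (
        df.groupby(parents)[child]
        .value_counts()
        .unstack(fill_value=0)
        .reindex(columns=list(values), fill_value=0)  # ensure both columns exist
    )
    smoothed = counts + alpha
    # Normalize each parent-configuration row so it sums to 1
    return smoothed.div(smoothed.sum(axis=1), axis=0)

cpt = smoothed_cpt(data, "WetGrass", ["Rain", "Sprinkler"])
print("Smoothed P(WetGrass | Rain, Sprinkler):")
print(cpt)
# (Rain=1, Sprinkler=1) appears only once, with WetGrass=1, so the raw
# estimate would be 0 and 1; smoothing gives (0+1)/3 and (1+1)/3 instead.
```

Note that smoothing only softens estimates for parent configurations that appear in the data at least once; a configuration with no observations at all still produces no row, and would need pseudo-counts added for every configuration up front.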