Learn Joint Distributions and Factorization | Foundations of Probabilistic Graphical Models

When you model a collection of random variables, the joint probability distribution assigns a probability to every possible combination of their values. For three binary variables, say $A$, $B$, and $C$, the joint table must include a probability for each of the $2^3 = 8$ possible outcomes. As you add variables, the table grows exponentially: with $n$ binary variables it has $2^n$ entries, doubling with each new variable. This quickly becomes infeasible, both to store and to estimate from data.
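
To make the growth concrete, here is a small Python sketch (an illustration added for this lesson, not part of the original material) that enumerates every outcome of three binary variables with `itertools.product` and prints the table size $2^n$ for a few values of $n$. The helper name `joint_table_size` is hypothetical.

```python
from itertools import product

def joint_table_size(n):
    """Number of rows in a full joint table over n binary variables."""
    return 2 ** n

# Enumerate every outcome of three binary variables A, B, C
outcomes = list(product([0, 1], repeat=3))
print(len(outcomes))  # 8 rows: (0, 0, 0), (0, 0, 1), ..., (1, 1, 1)

# The table doubles with every extra binary variable
for n in (3, 10, 20, 30):
    print(n, joint_table_size(n))
```

At 30 binary variables the table already has more than a billion rows, which is why listing it directly stops being an option.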

Probabilistic graphical models solve this problem by letting you factorize the joint distribution according to the dependencies encoded in a graph. Instead of specifying every entry of the joint table, you break it into smaller conditional distributions that are much easier to handle. For example, if the graph says that $A$ influences $B$ and $B$ influences $C$, with no direct edge from $A$ to $C$, you can write the joint as $P(A)P(B|A)P(C|B)$. This works because the graph asserts that $C$ is conditionally independent of $A$ given $B$. The factorization dramatically reduces the number of parameters you need.
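
The step from the full joint to this product can be written out with the chain rule of probability, which holds for any distribution over $A$, $B$, and $C$. The graph contributes only the second line, the assumption that $C$ depends on $A$ only through $B$:

$$
\begin{aligned}
P(A, B, C) &= P(A)\,P(B|A)\,P(C|A, B) && \text{(chain rule, holds for any distribution)} \\
           &= P(A)\,P(B|A)\,P(C|B) && \text{(graph: } C \text{ is independent of } A \text{ given } B\text{)}
\end{aligned}
$$

More generally, for a directed graph the same reasoning gives $P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i | \mathrm{Pa}(X_i))$, where $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$ in the graph.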


```python
import pandas as pd

# Define binary variables: 0 = False, 1 = True
# Probabilities for P(A)
P_A = {0: 0.6, 1: 0.4}

# Probabilities for P(B|A)
P_B_given_A = {
    0: {0: 0.7, 1: 0.3},  # P(B|A=0)
    1: {0: 0.2, 1: 0.8}   # P(B|A=1)
}

# Probabilities for P(C|B)
P_C_given_B = {
    0: {0: 0.9, 1: 0.1},  # P(C|B=0)
    1: {0: 0.4, 1: 0.6}   # P(C|B=1)
}

# Compute the full joint table using the factorization P(A)P(B|A)P(C|B)
rows = []
for a in [0, 1]:
    for b in [0, 1]:
        for c in [0, 1]:
            prob = P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]
            rows.append({'A': a, 'B': b, 'C': c, 'P(A,B,C)': prob})

joint_table = pd.DataFrame(rows)
print(joint_table)
```

The code above constructs a joint probability table for three binary variables, but instead of listing all eight probabilities directly, it calculates each one from the factorization implied by the graph structure. In this graph, $A$ is the parent of $B$, and $B$ is the parent of $C$. This means:

The probability of $A$ is given by $P(A)$;
The probability of $B$ depends only on $A$, so you use $P(B|A)$;
Given $B$, the probability of $C$ no longer depends on $A$, so you use $P(C|B)$.
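
One way to see what the factorization encodes is to check it numerically. The Python sketch below rebuilds the same joint table from the lesson's conditional probabilities, then confirms that the eight entries sum to 1 and that conditioning on $A$ in addition to $B$ does not change the distribution of $C$, that is, $P(C|A,B) = P(C|B)$. The names `joint` and `p_c_given_ab` are illustrative and not from the original lesson.

```python
from itertools import product
import math

# Conditional probability tables from the lesson (0 = False, 1 = True)
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
P_C_given_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}

# Joint distribution via the factorization P(A)P(B|A)P(C|B)
joint = {
    (a, b, c): P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]
    for a, b, c in product([0, 1], repeat=3)
}

def p_c_given_ab(c, a, b):
    """P(C=c | A=a, B=b), computed from the joint table alone."""
    p_ab = joint[(a, b, 0)] + joint[(a, b, 1)]
    return joint[(a, b, c)] / p_ab

# The eight entries form a valid distribution
print(math.isclose(sum(joint.values()), 1.0))  # True

# Knowing A in addition to B leaves the distribution of C unchanged
for a, b, c in product([0, 1], repeat=3):
    assert math.isclose(p_c_given_ab(c, a, b), P_C_given_B[b][c])
print("P(C|A,B) == P(C|B) for every a, b, c")
```

The check passes for any choice of the three tables, because the factor $P(A)P(B|A)$ cancels when you condition on $A$ and $B$.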

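By multiplying these factors for each combination of values of $A$, $B$, and $C$, you fill out the entire joint table. Because each distribution must sum to 1, the factors need only five free parameters: one for $P(A)$, two for $P(B|A)$ (one per value of $A$), and two for $P(C|B)$ (one per value of $B$). Specifying the joint table directly needs seven, since its eight entries must also sum to 1. The saving is modest for three variables, but it grows quickly: for a chain of $n$ binary variables, the factorization needs $2n - 1$ parameters instead of $2^n - 1$. This is the power of factorization: guided by the structure of the graphical model, you reduce complexity and make modeling large systems feasible.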
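
The same counting works for any graph: a node with $k$ states needs $k - 1$ free numbers for each combination of its parents' values, while an unrestricted joint table needs one less than its number of rows. The sketch below uses hypothetical helpers `free_parameters` and `full_joint_parameters` (a parent list plus a state count per variable) to count both for the three-node chain, where the factors need 5 free parameters against 7 for the full table, and then for a 20-variable binary chain.

```python
import math

def free_parameters(parents, cardinality):
    """Free parameters of a factorized model: sum over nodes of
    (states - 1) * (number of parent configurations)."""
    total = 0
    for node, pa in parents.items():
        parent_configs = math.prod(cardinality[p] for p in pa)  # 1 if no parents
        total += (cardinality[node] - 1) * parent_configs
    return total

def full_joint_parameters(cardinality):
    """Free parameters of an unrestricted joint table (entries sum to 1)."""
    return math.prod(cardinality.values()) - 1

# The lesson's chain A -> B -> C with binary variables
chain = {"A": [], "B": ["A"], "C": ["B"]}
card = {v: 2 for v in chain}
print(free_parameters(chain, card), full_joint_parameters(card))  # 5 7

# A longer binary chain X0 -> X1 -> ... -> X19
names = [f"X{i}" for i in range(20)]
long_chain = {name: ([names[i - 1]] if i > 0 else []) for i, name in enumerate(names)}
long_card = {name: 2 for name in names}
print(free_parameters(long_chain, long_card), full_joint_parameters(long_card))  # 39 1048575
```

Passing the state counts separately from the graph keeps the helper usable for variables with more than two values.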

Section 1. Chapter 3
