Learn Joint Distributions and Factorization | Foundations of Probabilistic Graphical Models
Probabilistic Graphical Models Essentials

Joint Distributions and Factorization

When you model a collection of random variables, the joint probability distribution assigns a probability to every possible combination of their values. For three binary variables, say A, B, and C, the joint table must include a probability for each of the eight possible outcomes. As you increase the number of variables, the size of the joint table grows exponentially: it doubles with each new binary variable, so n binary variables need 2^n entries. This quickly becomes infeasible, both to store and to estimate from data.
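To make the growth concrete, here is a minimal sketch that counts the entries of a full joint table. The helper name `joint_table_size` is hypothetical and introduced only for illustration.

```python
def joint_table_size(num_vars: int, states_per_var: int = 2) -> int:
    """Number of entries in a full joint table over discrete variables.

    Hypothetical helper for illustration; assumes every variable has the
    same number of states.
    """
    return states_per_var ** num_vars


# Each additional binary variable doubles the table.
for n in (3, 10, 20, 30):
    print(f"{n:>2} binary variables -> {joint_table_size(n):,} entries")
```

At 30 binary variables the table already has more than a billion entries, which is why storing or estimating it directly is impractical.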

Probabilistic graphical models solve this problem by allowing you to factorize the joint distribution according to the dependencies shown in a graph. Instead of specifying every entry in the joint table, you break it into smaller conditional distributions that are much easier to handle. For example, if the graph structure says A influences B, and B influences C, you can write the joint as P(A, B, C) = P(A) P(B | A) P(C | B). This factorization reduces the number of parameters you need, and for sparse graphs the reduction grows quickly as more variables are added.
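The step from the general chain rule to this shorter product can be written out explicitly. The first line holds for any joint distribution; the second relies on the assumption, read off the graph A → B → C, that C is conditionally independent of A given B.

```latex
\begin{align}
P(A, B, C) &= P(A)\, P(B \mid A)\, P(C \mid A, B)
  && \text{(chain rule, always valid)} \\
           &= P(A)\, P(B \mid A)\, P(C \mid B)
  && \text{(assuming } C \perp A \mid B \text{)}
\end{align}
```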

```python
import numpy as np
import pandas as pd

# Define binary variables: 0 = False, 1 = True

# Probabilities for P(A)
P_A = {0: 0.6, 1: 0.4}

# Probabilities for P(B|A)
P_B_given_A = {
    0: {0: 0.7, 1: 0.3},  # P(B|A=0)
    1: {0: 0.2, 1: 0.8}   # P(B|A=1)
}

# Probabilities for P(C|B)
P_C_given_B = {
    0: {0: 0.9, 1: 0.1},  # P(C|B=0)
    1: {0: 0.4, 1: 0.6}   # P(C|B=1)
}

# Compute the full joint table using the factorization P(A)P(B|A)P(C|B)
rows = []
for a in [0, 1]:
    for b in [0, 1]:
        for c in [0, 1]:
            prob = P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]
            rows.append({'A': a, 'B': b, 'C': c, 'P(A,B,C)': prob})

joint_table = pd.DataFrame(rows)
print(joint_table)
```

The code above constructs a joint probability table for three binary variables, but instead of listing all eight probabilities directly, it calculates each one using the factorization implied by the graph structure. The graph here has A as a parent of B, and B as a parent of C. This means:

  • The probability of A is given by P(A);
  • The probability of B depends only on A, so you use P(B | A);
  • The probability of C depends only on B, so you use P(C | B).
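The last bullet is a conditional independence statement: once B is known, A tells you nothing more about C. A minimal sketch of how one might check this numerically is shown here, using the same illustrative probabilities as the example. The array names and the NumPy layout are assumptions made for this sketch, not part of the lesson's code.

```python
import numpy as np

# Same illustrative numbers as the lesson's example, stored as arrays.
p_a = np.array([0.6, 0.4])             # P(A), indexed [a]
p_b_given_a = np.array([[0.7, 0.3],    # P(B | A=0)
                        [0.2, 0.8]])   # P(B | A=1), indexed [a, b]
p_c_given_b = np.array([[0.9, 0.1],    # P(C | B=0)
                        [0.4, 0.6]])   # P(C | B=1), indexed [b, c]

# joint[a, b, c] = P(A=a) * P(B=b | A=a) * P(C=c | B=b), via broadcasting.
joint = p_a[:, None, None] * p_b_given_a[:, :, None] * p_c_given_b[None, :, :]

# P(C | A, B): normalise each (a, b) slice over c.
p_c_given_ab = joint / joint.sum(axis=2, keepdims=True)

# If C depends only on B, the slice for each value of A must equal P(C | B).
for a in (0, 1):
    assert np.allclose(p_c_given_ab[a], p_c_given_b)

print(p_c_given_ab)
```

The assertions pass because the joint was built from the factorization in the first place; with a joint estimated from data, the same check would only hold approximately.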

By multiplying these factors for each combination of values of A, B, and C, you fill out the whole joint table. This approach needs only five free parameters (one for P(A), two for P(B | A), and two for P(C | B)) instead of the seven needed to specify the eight joint probabilities directly, since those eight must sum to 1. This is the power of factorization: guided by the structure of the graphical model, you reduce complexity and make modeling large systems feasible.
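The same counting can be sketched for longer chains. The functions below are hypothetical helpers, and they assume the chain structure X1 → X2 → ... → Xn extends the three-variable example with every variable binary.

```python
def full_joint_params(n: int) -> int:
    """Free parameters in an unrestricted joint over n binary variables."""
    # 2**n entries, minus one because they must sum to 1.
    return 2 ** n - 1


def chain_params(n: int) -> int:
    """Free parameters for a chain X1 -> X2 -> ... -> Xn of binary variables."""
    # One for the root, plus two per child (one per value of its parent).
    return 1 + 2 * (n - 1)


for n in (3, 10, 20):
    print(f"n={n:>2}: full joint {full_joint_params(n):>9,}  chain {chain_params(n):>3}")
```

For n = 3 this gives 7 versus 5, a modest saving; the gap widens because the full joint grows exponentially in n while the chain grows linearly.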


Suppose you have a graph where A points to B, and B points to C (A → B → C), as in the example above. Which factorization correctly represents the joint probability for this graph?


Section 1. Chapter 3
