Learn Query-By-Committee (QBC) | Query Strategies
Active Learning with Python

Query-By-Committee (QBC)

Query-By-Committee (QBC) is an active learning strategy that leverages the collective wisdom of several models, called a committee, to identify which unlabeled samples are most informative. Instead of relying on a single model's uncertainty, QBC selects samples where the committee members most disagree, under the assumption that such disagreement indicates areas where the models are uncertain or lack sufficient information. The rationale is that by presenting these contentious samples to an oracle (such as a human annotator), you can quickly resolve the points of confusion and accelerate learning. Disagreement can be measured in several ways, but a common approach is to look at how committee members vote on the predicted label for each sample.
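One standard way to turn committee votes into a single disagreement score is vote entropy. As a sketch, using symbols introduced here for illustration (they do not appear in the lesson): let $C$ be the number of committee members and $V(y, x)$ the number of members that predict label $y$ for sample $x$.

```latex
\mathrm{VE}(x) = -\sum_{y} \frac{V(y, x)}{C} \, \log \frac{V(y, x)}{C}
```

The score is 0 when all members agree and is largest when the votes are spread evenly across labels. Terms with $V(y, x) = 0$ are taken to contribute 0.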

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from scipy.stats import entropy

# Generate a toy classification dataset
X, y = make_classification(n_samples=200, n_features=10, n_informative=5, random_state=42)
X_train, X_pool, y_train, y_pool = train_test_split(X, y, test_size=0.5, random_state=42)

# Initialize committee of classifiers
committee = [
    RandomForestClassifier(n_estimators=10, random_state=0),
    LogisticRegression(max_iter=1000, random_state=1),
    SVC(probability=True, random_state=2)
]

# Train each classifier on the labeled training set
for clf in committee:
    clf.fit(X_train, y_train)

# Get committee predictions on the pool set
predictions = np.array([clf.predict(X_pool) for clf in committee])  # shape: (n_committee, n_samples)

# For each sample, count votes for each class
n_classes = len(np.unique(y))
vote_counts = np.zeros((X_pool.shape[0], n_classes))
for i in range(X_pool.shape[0]):
    for pred in predictions[:, i]:
        vote_counts[i, pred] += 1

# Calculate vote entropy for each sample (higher = more disagreement)
vote_probs = vote_counts / len(committee)
vote_entropies = entropy(vote_probs.T)  # entropy is computed along axis 0, i.e. over classes

# Show top 5 samples with highest disagreement
top_indices = np.argsort(-vote_entropies)[:5]
for idx in top_indices:
    print(f"Sample {idx}: Vote Entropy = {vote_entropies[idx]:.3f}, Votes = {vote_counts[idx]}")
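The vote-counting step can also be written in vectorized form and wrapped in a helper that returns the index of the pool sample to send to the oracle. The following is a minimal sketch, not part of the original lesson: `vote_entropy` and `select_query` are hypothetical names, and the hand-made `predictions` array stands in for the hard-label output of a trained committee.

```python
import numpy as np


def vote_entropy(predictions, n_classes):
    """Vote entropy per sample.

    predictions: int array of shape (n_committee, n_samples) with hard labels.
    Returns an array of shape (n_samples,); higher means more disagreement.
    """
    predictions = np.asarray(predictions)
    n_committee = predictions.shape[0]
    # counts[c, i] = number of members that voted class c for sample i
    counts = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    probs = counts / n_committee
    # Treat 0 * log(0) as 0 by taking the log of nonzero entries only
    logs = np.zeros_like(probs)
    np.log(probs, out=logs, where=probs > 0)
    return -(probs * logs).sum(axis=0)


def select_query(predictions, n_classes):
    """Index of the pool sample with the highest vote entropy."""
    return int(np.argmax(vote_entropy(predictions, n_classes)))


# Hypothetical votes from a 3-member committee on 4 pool samples (2 classes)
predictions = np.array([
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 1, 0, 1],
])
scores = vote_entropy(predictions, n_classes=2)
query_idx = select_query(predictions, n_classes=2)
print(scores, query_idx)
```

Here only the third sample receives a split vote, so it is the one selected. With three members and two classes, the only possible nonzero score is the 2-to-1 split, so many pool samples tie for the top spot. A larger committee or more classes gives a more graded ranking.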
Note

While QBC can provide a richer measure of uncertainty by harnessing diverse model perspectives, it comes at a computational cost. Training and maintaining multiple models increases resource usage, especially as the committee grows or as models become more complex. Striking a balance between committee diversity (which improves disagreement detection) and computational efficiency is crucial for practical active learning with QBC.

1. What is the main advantage of QBC over single-model uncertainty sampling?

2. Which metric can be used to quantify committee disagreement?



Section 2. Chapter 3
