Uncertainty Sampling
Uncertainty sampling is a fundamental strategy in active learning: rather than labeling data at random, you select the points where your model is least certain about its predictions. By querying these most ambiguous instances, you can improve the model's performance with fewer labeled examples. The core idea is to identify samples whose predicted class probabilities are closest to an even split, meaning the model is unsure which class to assign. The simplest way to measure this is the least-confidence criterion: take the maximum predicted probability for each sample; the lower this value, the less confident the model is in its prediction.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load a simple dataset
X, y = load_iris(return_X_y=True)

# Shuffle first: the iris dataset is sorted by class, so the first
# few samples would otherwise all belong to a single class
rng = np.random.default_rng(42)
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]

# Assume only a small labeled set is available
n_initial = 10
X_labeled, y_labeled = X[:n_initial], y[:n_initial]
X_unlabeled = X[n_initial:]

# Train a classifier on the labeled data
clf = RandomForestClassifier(random_state=42)
clf.fit(X_labeled, y_labeled)

# Predict class probabilities for the unlabeled pool
probs = clf.predict_proba(X_unlabeled)

# Least-confidence score: the maximum predicted probability per sample
max_probs = probs.max(axis=1)

# Select the sample with the lowest maximum probability
most_uncertain_idx = np.argmin(max_probs)

print("Most uncertain sample index in unlabeled pool:", most_uncertain_idx)
print("Prediction probabilities for this sample:", probs[most_uncertain_idx])
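The max-probability rule in the snippet above is known as least confidence. Two other common uncertainty measures can be computed from the same probability matrix: margin sampling looks at the gap between the two most likely classes, and entropy sampling uses the whole predicted distribution. A minimal sketch, reusing the probs array from the code above (the eps constant is just an arbitrary guard against log(0)):

# Margin sampling: smallest gap between the top two class probabilities
sorted_probs = np.sort(probs, axis=1)  # ascending per row
margins = sorted_probs[:, -1] - sorted_probs[:, -2]
most_uncertain_by_margin = np.argmin(margins)

# Entropy sampling: highest entropy of the predicted distribution
eps = 1e-12  # avoid log(0) for zero-probability classes
entropies = -np.sum(probs * np.log(probs + eps), axis=1)
most_uncertain_by_entropy = np.argmax(entropies)

print("Most uncertain by margin:", most_uncertain_by_margin)
print("Most uncertain by entropy:", most_uncertain_by_entropy)

For binary problems all three criteria rank samples identically; they only start to differ once there are three or more classes.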
Uncertainty sampling is most effective during the early and middle stages of active learning, when the model has not yet seen enough diverse examples to make confident predictions. It works best when the model's uncertainty is a good proxy for its errors, as with well-calibrated probabilistic classifiers. If the model is systematically overconfident or underconfident, or if the data distribution is highly imbalanced, uncertainty sampling may fail to select the most informative samples.
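To make the procedure concrete, the selection step can be wrapped in a query loop: pick the most uncertain sample, obtain its label, move it into the labeled set, and retrain. A minimal sketch that continues from the variables defined in the code above; since iris is fully labeled, the held-back labels stand in for a human oracle, and the budget of 20 queries is an arbitrary illustrative choice:

y_unlabeled = y[n_initial:]  # held-back labels, used here to simulate an oracle

n_queries = 20  # illustrative labeling budget
for _ in range(n_queries):
    probs = clf.predict_proba(X_unlabeled)
    idx = np.argmin(probs.max(axis=1))  # least-confidence selection

    # "Query the oracle": move the selected sample into the labeled set
    X_labeled = np.vstack([X_labeled, X_unlabeled[idx]])
    y_labeled = np.append(y_labeled, y_unlabeled[idx])
    X_unlabeled = np.delete(X_unlabeled, idx, axis=0)
    y_unlabeled = np.delete(y_unlabeled, idx)

    # Retrain on the enlarged labeled set
    clf = RandomForestClassifier(random_state=42)
    clf.fit(X_labeled, y_labeled)

print("Labeled set size after querying:", len(X_labeled))
print("Accuracy on remaining pool:", clf.score(X_unlabeled, y_unlabeled))

In practice you would usually query samples in batches and track accuracy on a fixed held-out validation set rather than on the shrinking unlabeled pool.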