Uncertainty Sampling
Uncertainty sampling is a fundamental strategy in active learning: it selects for labeling the data points on which the model is least certain about its predictions. By querying these most ambiguous instances, you can improve the model's performance with fewer labeled examples. The core idea is to identify samples whose predicted class probabilities are closest to being equally split among the classes, meaning the model is unsure which class to assign. The simplest measure, known as least confidence, looks at the maximum predicted probability for each sample: the lower this value, the less confident the model is in its prediction. In a three-class problem, for example, a prediction of [0.34, 0.33, 0.33] (maximum 0.34) is far more uncertain than [0.90, 0.05, 0.05] (maximum 0.90).
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load a simple dataset
X, y = load_iris(return_X_y=True)

# Shuffle the data: iris is ordered by class, so an unshuffled
# initial set would contain only a single class
rng = np.random.default_rng(42)
shuffle_idx = rng.permutation(len(X))
X, y = X[shuffle_idx], y[shuffle_idx]

# Assume only a small labeled set is available
n_initial = 10
X_labeled, y_labeled = X[:n_initial], y[:n_initial]
X_unlabeled = X[n_initial:]

# Train a classifier on the labeled data
clf = RandomForestClassifier(random_state=42)
clf.fit(X_labeled, y_labeled)

# Predict class probabilities for the unlabeled pool
probs = clf.predict_proba(X_unlabeled)

# Compute the maximum probability for each sample
max_probs = probs.max(axis=1)

# Select the index of the sample with the lowest max probability
most_uncertain_idx = np.argmin(max_probs)

print("Most uncertain sample index in unlabeled pool:", most_uncertain_idx)
print("Prediction probabilities for this sample:", probs[most_uncertain_idx])
```
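The maximum-probability score above is the least-confidence measure. Since uncertainty here means predictions that are nearly equally split among classes, a common alternative is to score each sample by the entropy of its predicted distribution. Below is a minimal sketch of that variant; the helper name `prediction_entropy` is just illustrative, and the score can be computed on the same `probs` array as in the example above.

```python
import numpy as np

# Entropy of each predicted distribution: highest when probabilities
# are spread evenly across classes, near zero when the model puts
# almost all its mass on a single class.
def prediction_entropy(probs):
    # Small epsilon avoids log(0) for hard 0/1 probabilities
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

# Example: a near-uniform prediction scores higher (more uncertain)
# than a confident one.
probs = np.array([[0.34, 0.33, 0.33],   # ambiguous -> high entropy
                  [0.90, 0.05, 0.05]])  # confident -> low entropy
scores = prediction_entropy(probs)
most_uncertain_idx = int(np.argmax(scores))  # select by highest entropy

print("Entropy scores:", scores)
print("Most uncertain sample index:", most_uncertain_idx)
```

Unlike least confidence, entropy takes the full probability distribution into account rather than only its largest value, which can matter when there are many classes.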
Uncertainty sampling is most effective during the early and middle stages of active learning, especially when the model has not yet seen enough diverse examples to make confident predictions. It works best when the model's uncertainty is a good proxy for its errors, such as with well-calibrated probabilistic classifiers. However, if the model is systematically overconfident or underconfident, or if the data distribution is highly imbalanced, uncertainty sampling may not always select the most informative samples.
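One way to address the calibration caveat is to calibrate the classifier's probabilities before computing uncertainty scores, for example with scikit-learn's CalibratedClassifierCV. The sketch below assumes the labeled set contains enough examples of each class for the internal 3-fold cross-validation, so it uses a larger initial set than the example above.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
shuffle_idx = rng.permutation(len(X))
X, y = X[shuffle_idx], y[shuffle_idx]

# Use a larger labeled set so each class can appear in every CV fold
n_initial = 30
X_labeled, y_labeled = X[:n_initial], y[:n_initial]
X_unlabeled = X[n_initial:]

# Calibrate the forest's probabilities with sigmoid (Platt) scaling
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(random_state=42), method="sigmoid", cv=3
)
calibrated.fit(X_labeled, y_labeled)

# Uncertainty scores are now based on calibrated probabilities
probs = calibrated.predict_proba(X_unlabeled)
most_uncertain_idx = int(np.argmin(probs.max(axis=1)))
print("Most uncertain sample index:", most_uncertain_idx)
```

Calibration adds training cost, but it makes the maximum predicted probability a more trustworthy proxy for the model's actual error rate, which is exactly what uncertainty sampling relies on.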