Learn Uncertainty Sampling | Query Strategies
Active Learning with Python

Uncertainty Sampling

Uncertainty sampling is a fundamental strategy in active learning that focuses on selecting data points for labeling where your model is least certain about its predictions. By querying these most ambiguous instances, you can efficiently improve your model's performance with fewer labeled examples. The core idea is to identify samples for which the model's predicted probability is closest to being equally split among classes—meaning the model is unsure which class to assign. This is typically measured by looking at the maximum predicted probability for each sample: the lower this value, the less confident the model is in its prediction.
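As a quick illustration of that scoring rule, consider a few hypothetical probability vectors. The numbers below are made up for illustration, not produced by any model:

import numpy as np

# Hypothetical class probabilities for three unlabeled samples (3 classes)
probs = np.array([
    [0.90, 0.05, 0.05],   # confident: max probability 0.90
    [0.50, 0.30, 0.20],   # moderately uncertain: max probability 0.50
    [0.34, 0.33, 0.33],   # nearly an even split: max probability 0.34
])

max_probs = probs.max(axis=1)
print(np.argmin(max_probs))  # 2: the sample closest to an even split

The example below applies the same rule with a real classifier and dataset.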

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load a simple dataset
X, y = load_iris(return_X_y=True)

# Shuffle before splitting: iris is ordered by class, so taking the
# first rows unshuffled would give the model only one class to learn
# from, making every prediction trivially confident
rng = np.random.default_rng(42)
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]

# Assume only a small labeled set is available
n_initial = 10
X_labeled, y_labeled = X[:n_initial], y[:n_initial]
X_unlabeled = X[n_initial:]

# Train a classifier on the labeled data
clf = RandomForestClassifier(random_state=42)
clf.fit(X_labeled, y_labeled)

# Predict class probabilities for the unlabeled pool
probs = clf.predict_proba(X_unlabeled)

# Compute the maximum probability for each sample
max_probs = probs.max(axis=1)

# Select the index of the sample with the lowest max probability
most_uncertain_idx = np.argmin(max_probs)

print("Most uncertain sample index in unlabeled pool:", most_uncertain_idx)
print("Prediction probabilities for this sample:", probs[most_uncertain_idx])
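Selecting a single point is only one step of the process. Below is a minimal sketch of a pool-based active-learning loop built on the same least-confidence rule; since iris is fully labeled, the held-back labels stand in for a human oracle. The y_unlabeled oracle and the query budget of five are illustrative assumptions, not part of the lesson's example:

# Minimal sketch of a pool-based active-learning loop using the same
# least-confidence rule. The held-back iris labels simulate the human
# oracle; in a real project these labels would not be available.
y_unlabeled = y[n_initial:]

for _ in range(5):  # query five samples, one at a time
    clf.fit(X_labeled, y_labeled)
    probs = clf.predict_proba(X_unlabeled)
    query_idx = np.argmin(probs.max(axis=1))

    # "Ask the oracle", then move the queried sample to the labeled set
    X_labeled = np.vstack([X_labeled, X_unlabeled[query_idx]])
    y_labeled = np.append(y_labeled, y_unlabeled[query_idx])
    X_unlabeled = np.delete(X_unlabeled, query_idx, axis=0)
    y_unlabeled = np.delete(y_unlabeled, query_idx)

print("Labeled set size after querying:", len(y_labeled))

Retraining after every single query is the simplest choice; in practice, samples are often queried and labeled in small batches to amortize the cost of retraining.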
Note

Uncertainty sampling is most effective during the early and middle stages of active learning, especially when the model has not yet seen enough diverse examples to make confident predictions. It works best when the model's uncertainty is a good proxy for its errors, such as with well-calibrated probabilistic classifiers. However, if the model is systematically overconfident or underconfident, or if the data distribution is highly imbalanced, uncertainty sampling may not always select the most informative samples.
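When the raw maximum probability is a poor proxy for error, as described above, other scores computed from the same predicted distribution are commonly used in the active-learning literature. The sketch below shows the three classic uncertainty measures; the function names are illustrative, not from any library:

import numpy as np

def least_confidence(probs):
    # 1 minus the max probability: higher score = more uncertain
    return 1.0 - probs.max(axis=1)

def margin_score(probs):
    # Gap between the top two class probabilities: smaller = more uncertain
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def entropy_score(probs):
    # Shannon entropy of the predicted distribution: larger = more uncertain
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

With the pool probabilities from the example above, np.argmin(margin_score(probs)) or np.argmax(entropy_score(probs)) would replace np.argmin(max_probs) as the query rule.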

Question: In which situations is uncertainty sampling most effective in active learning?
