
Margin And Entropy Sampling

Margin sampling and entropy sampling are two widely used query strategies in active learning, both designed to identify the most informative unlabeled samples for labeling. Margin sampling focuses on the difference between the highest and the second-highest predicted class probabilities for each sample. The smaller this margin, the less confident the model is about its prediction, signaling a more uncertain and potentially informative example. In contrast, entropy sampling quantifies uncertainty using the entropy of the predicted class probability distribution for each sample. Entropy measures the amount of uncertainty or randomness; higher entropy values indicate that the model is less certain about its prediction across all possible classes, rather than just the top two. For example, the distributions [0.5, 0.3, 0.2] and [0.5, 0.3, 0.1, 0.1] have the same margin (0.2), but the second has higher entropy (about 1.17 vs. 1.03 nats), because entropy also accounts for how probability is spread across the remaining classes.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Create a more complex, noisy dataset
X, y = make_classification(
    n_samples=600,
    n_features=6,
    n_informative=3,
    n_redundant=1,
    n_clusters_per_class=2,
    flip_y=0.15,      # adds label noise → much more uncertainty
    class_sep=0.6,    # higher overlap between classes
    random_state=42
)

# Train a weaker classifier to increase uncertainty
clf = LogisticRegression(max_iter=2000)
clf.fit(X, y)

# Take a batch from the dataset
probs = clf.predict_proba(X[:5])

# Margin sampling: difference between the two highest class probabilities
margins = []
for prob in probs:
    sorted_probs = np.sort(prob)[::-1]
    margin = sorted_probs[0] - sorted_probs[1]
    margins.append(margin)

# Entropy sampling: entropy of the full predicted distribution
entropies = []
for prob in probs:
    entropy = -np.sum(prob * np.log(prob + 1e-12))
    entropies.append(entropy)

print("Class probabilities for each sample:")
print(probs.round(4))
print("\nMargin values (smaller = more uncertain):")
print([round(m, 4) for m in margins])
print("\nEntropy values (higher = more uncertain):")
print([round(e, 4) for e in entropies])
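As a minimal sketch of how these scores drive sample selection, the snippet below continues from the code above (reusing clf and X, and treating all of X as the unlabeled pool purely for illustration). Selecting five samples per strategy and the variable names used here are assumptions for this example, not part of the original lesson code.

# Sketch: choosing which samples to query, continuing from the code above
pool_probs = clf.predict_proba(X)  # treat all of X as the unlabeled pool (illustration only)

# Margin sampling selects the samples with the SMALLEST margin
sorted_pool = np.sort(pool_probs, axis=1)[:, ::-1]
pool_margins = sorted_pool[:, 0] - sorted_pool[:, 1]
margin_query_idx = np.argsort(pool_margins)[:5]           # 5 least confident by margin

# Entropy sampling selects the samples with the LARGEST entropy
pool_entropies = -np.sum(pool_probs * np.log(pool_probs + 1e-12), axis=1)
entropy_query_idx = np.argsort(pool_entropies)[::-1][:5]  # 5 most uncertain by entropy

print("Indices selected by margin sampling:", margin_query_idx)
print("Indices selected by entropy sampling:", entropy_query_idx)

In a real active learning loop, only genuinely unlabeled samples would be scored, and the selected indices would be sent to an annotator for labeling before the model is retrained.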

Which query strategy — margin sampling or entropy sampling — is generally more sensitive to the overall distribution of class probabilities?


Section 2. Chapter 2
