Active Learning with Python

Density-Weighted Sampling

Density-weighted sampling is a strategy in active learning that helps you select the most valuable data points for labeling. Unlike pure uncertainty sampling, which focuses only on how uncertain the model is about each sample, density-weighted sampling also considers how representative a sample is within the data distribution. The intuition is simple: you want to prioritize not just uncertain points, but also those that are typical of the dataset, avoiding rare outliers that may not help the model generalize. By combining informativeness (such as uncertainty) with sample density, you can focus your labeling effort on data points that both challenge the model and represent common patterns in your data.

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Generate a synthetic dataset
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=42)

# Train a simple classifier
clf = RandomForestClassifier(random_state=42)
clf.fit(X, y)

# Compute uncertainty: use predicted class probabilities
probs = clf.predict_proba(X)
uncertainty = 1 - np.max(probs, axis=1)  # Least confident score

# Estimate sample density using k-nearest neighbors
k = 5
nbrs = NearestNeighbors(n_neighbors=k + 1)  # +1 because the point itself is included
nbrs.fit(X)
distances, _ = nbrs.kneighbors(X)
density = 1 / (np.mean(distances[:, 1:], axis=1) + 1e-10)  # Avoid division by zero

# Combine uncertainty and density (simple product)
density_weighted_score = uncertainty * density

# Select the top 10 samples by density-weighted score
top_indices = np.argsort(-density_weighted_score)[:10]
print("Indices of top 10 density-weighted samples:", top_indices)
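One caveat with the raw product above is that uncertainty and density live on very different scales, so the density term can dominate the combined score. A common refinement, in the spirit of the information-density framework, is to normalize both terms and raise the density to an exponent beta that controls how much representativeness matters. The sketch below reuses the same scikit-learn setup; the `minmax` helper and the `beta` parameter are illustrative choices for this example, not part of a standard API.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same synthetic setup as before
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=42)
clf = RandomForestClassifier(random_state=42).fit(X, y)
uncertainty = 1 - clf.predict_proba(X).max(axis=1)

# Density from mean k-NN distance (k=5, skipping the point itself)
nbrs = NearestNeighbors(n_neighbors=6).fit(X)
distances, _ = nbrs.kneighbors(X)
density = 1 / (distances[:, 1:].mean(axis=1) + 1e-10)

def minmax(a):
    # Rescale to [0, 1] so neither term dominates purely by scale
    return (a - a.min()) / (a.max() - a.min() + 1e-10)

beta = 1.0  # beta > 1 favors representativeness; beta < 1 favors uncertainty
score = minmax(uncertainty) * minmax(density) ** beta

top_indices = np.argsort(-score)[:10]
print("Top 10 indices with normalized, beta-weighted scoring:", top_indices)
```

Tuning `beta` lets you interpolate between pure uncertainty sampling (beta near 0) and selection driven mostly by how typical a point is of the data distribution.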

Why is density weighting important in Active Learning?



Section 2, Chapter 4

