Density-Weighted Sampling
Density-weighted sampling is a strategy in active learning that helps you select the most valuable data points for labeling. Unlike pure uncertainty sampling, which focuses only on how uncertain the model is about each sample, density-weighted sampling also considers how representative a sample is within the data distribution. The intuition is simple: you want to prioritize not just uncertain points, but also those that are typical of the dataset, avoiding rare outliers that may not help the model generalize. By combining informativeness (such as uncertainty) with sample density, you can focus your labeling effort on data points that both challenge the model and represent common patterns in your data.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Generate a synthetic dataset
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=42)

# Train a simple classifier
clf = RandomForestClassifier(random_state=42)
clf.fit(X, y)

# Compute uncertainty: use predicted class probabilities
probs = clf.predict_proba(X)
uncertainty = 1 - np.max(probs, axis=1)  # Least confident score

# Estimate sample density using k-nearest neighbors
k = 5
nbrs = NearestNeighbors(n_neighbors=k + 1)  # +1 because the point itself is included
nbrs.fit(X)
distances, _ = nbrs.kneighbors(X)
density = 1 / (np.mean(distances[:, 1:], axis=1) + 1e-10)  # Avoid division by zero

# Combine uncertainty and density (simple product)
density_weighted_score = uncertainty * density

# Select the top 10 samples by density-weighted score
top_indices = np.argsort(-density_weighted_score)[:10]
print("Indices of top 10 density-weighted samples:", top_indices)
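The simple product above weights uncertainty and density equally. A common generalization is to raise the density term to an exponent that controls the trade-off between the two criteria. The sketch below is one way to express this; the function name and the toy values are illustrative, not part of the example above.

```python
import numpy as np

def density_weighted_score(uncertainty, density, beta=1.0):
    """Combine uncertainty with density raised to an exponent beta.

    beta > 1 emphasizes representativeness (dense regions),
    beta < 1 emphasizes uncertainty, and beta = 0 recovers
    pure uncertainty sampling.
    """
    return uncertainty * np.power(density, beta)

# Toy example: point 0 is a highly uncertain outlier (low density),
# point 1 is moderately uncertain but sits in a dense region.
uncertainty = np.array([0.9, 0.5])
density = np.array([0.1, 0.8])

print(density_weighted_score(uncertainty, density, beta=0.0))  # ranks the outlier first
print(density_weighted_score(uncertainty, density, beta=2.0))  # ranks the dense point first
```

Tuning this exponent on a validation set lets you decide how strongly outliers should be penalized for a given dataset.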