Learning Efficiency Curves
Understanding how efficiently an Active Learning (AL) system improves with more labeled data is crucial for evaluating its effectiveness. Learning curves provide a visual tool for this purpose: they plot model accuracy (or another performance metric) against the number of labeled samples acquired during AL iterations. These curves help you see how quickly your model benefits from new information, and how much data is needed to reach a desired level of performance. In AL, the goal is to achieve high accuracy with as few labeled samples as possible, so the shape and steepness of your learning curve can reveal how well your sampling strategy is working.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Seeded generator so the sampled indices (and the curve) are reproducible
rng = np.random.default_rng(42)

# Simulate a pool of unlabeled data
X, y = make_classification(n_samples=1200, n_features=20, n_informative=15,
                           n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Start with a small labeled set
labeled_idx = rng.choice(len(X_train), size=20, replace=False).tolist()
unlabeled_idx = sorted(set(range(len(X_train))) - set(labeled_idx))

accuracies = []
labeled_set_sizes = []

# Simulate AL iterations
for i in range(10):
    # Retrain on the current labeled set and score on the held-out test set
    clf = RandomForestClassifier(random_state=42)
    clf.fit(X_train[labeled_idx], y_train[labeled_idx])
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    accuracies.append(acc)
    labeled_set_sizes.append(len(labeled_idx))

    # A real AL loop would query the 20 most uncertain samples;
    # random selection stands in for that step here
    if len(unlabeled_idx) >= 20:
        new_samples = rng.choice(unlabeled_idx, size=20, replace=False).tolist()
        labeled_idx.extend(new_samples)
        unlabeled_idx = sorted(set(unlabeled_idx) - set(new_samples))

plt.plot(labeled_set_sizes, accuracies, marker='o')
plt.xlabel('Number of Labeled Samples')
plt.ylabel('Accuracy')
plt.title('Learning Curve: Accuracy vs. Labeled Set Size')
plt.grid(True)
plt.show()
```
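The loop above uses random selection as a stand-in for the query step. As a minimal sketch of what that step might look like with least-confidence uncertainty sampling, the example below assumes a classifier that exposes `predict_proba`. The helper name `select_most_uncertain` and the smaller 300-sample pool are illustrative assumptions, not part of the lesson's code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def select_most_uncertain(clf, X_pool, pool_idx, batch_size=20):
    """Return the batch_size pool indices whose top-class probability is lowest."""
    pool_idx = np.asarray(pool_idx)
    proba = clf.predict_proba(X_pool[pool_idx])
    # Least-confidence score: 1 - max class probability (higher = more uncertain)
    uncertainty = 1.0 - proba.max(axis=1)
    # argsort is ascending, so the last batch_size positions are the most uncertain
    top = np.argsort(uncertainty)[-batch_size:]
    return pool_idx[top].tolist()


# Small illustrative pool (hypothetical sizes, chosen only to keep the sketch fast)
X, y = make_classification(n_samples=300, n_features=20, n_informative=15,
                           n_classes=2, random_state=42)
rng = np.random.default_rng(42)
labeled_idx = rng.choice(len(X), size=20, replace=False).tolist()
unlabeled_idx = sorted(set(range(len(X))) - set(labeled_idx))

clf = RandomForestClassifier(random_state=42).fit(X[labeled_idx], y[labeled_idx])
new_samples = select_most_uncertain(clf, X, unlabeled_idx, batch_size=20)
print(len(new_samples), "samples queried")
```

Swapping this helper in for the random draw changes only the selection line of the loop; the bookkeeping for `labeled_idx` and `unlabeled_idx` stays the same.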
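A learning curve in Active Learning shows how efficiently a model improves as more labeled data is added. A steep curve means each new batch of labels produces a large accuracy gain, which is the ideal case. A flat curve suggests new labels add little value, either because the model has already plateaued or because the strategy is selecting uninformative samples. Comparing curves measured on the same test set helps you see which AL strategy reaches a given accuracy with fewer labels.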
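One way to turn a comparison of curves into numbers is sketched below, under stated assumptions: it runs the same simulation with random and least-confidence selection, then summarizes each curve by its normalized area (the average accuracy over the labeling budget) and by the number of labels needed to reach a target accuracy. The helper names `run_al`, `normalized_area`, and `labels_to_reach`, and the 0.75 target, are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_al(strategy, X_train, y_train, X_test, y_test, n_rounds=10, batch=20, seed=42):
    """Run a small AL simulation and return (labeled_set_sizes, accuracies)."""
    rng = np.random.default_rng(seed)
    labeled = rng.choice(len(X_train), size=batch, replace=False).tolist()
    pool = np.array(sorted(set(range(len(X_train))) - set(labeled)))
    sizes, accs = [], []
    for _ in range(n_rounds):
        clf = RandomForestClassifier(random_state=seed)
        clf.fit(X_train[labeled], y_train[labeled])
        sizes.append(len(labeled))
        accs.append(accuracy_score(y_test, clf.predict(X_test)))
        if strategy == "uncertainty":
            # Least confidence: pick the samples with the lowest top-class probability
            conf = clf.predict_proba(X_train[pool]).max(axis=1)
            pick = np.argsort(conf)[:batch]
        else:
            # Baseline: uniform random positions within the pool
            pick = rng.choice(len(pool), size=batch, replace=False)
        labeled.extend(pool[pick].tolist())
        pool = np.delete(pool, pick)
    return np.array(sizes), np.array(accs)


def normalized_area(sizes, accs):
    """Trapezoidal area under the curve divided by the x-range (mean accuracy over the budget)."""
    widths = np.diff(sizes)
    heights = (accs[:-1] + accs[1:]) / 2.0
    return float(np.sum(widths * heights) / (sizes[-1] - sizes[0]))


def labels_to_reach(sizes, accs, target):
    """Smallest labeled-set size at which accuracy meets the target, or None if never reached."""
    hits = np.nonzero(accs >= target)[0]
    return int(sizes[hits[0]]) if hits.size else None


X, y = make_classification(n_samples=1200, n_features=20, n_informative=15,
                           n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

results = {}
for name in ("random", "uncertainty"):
    results[name] = run_al(name, X_train, y_train, X_test, y_test)
    s, a = results[name]
    print(name, "| normalized area:", round(normalized_area(s, a), 3),
          "| labels to reach 0.75:", labels_to_reach(s, a, 0.75))
```

A higher normalized area or a smaller labels-to-target value points to the more label-efficient strategy on this dataset. A single seed is noisy, so averaging several runs with different seeds would give a fairer comparison.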
1. What does a steeper learning curve indicate in the context of Active Learning?
2. Which metric is most relevant for comparing AL strategies?