Learning Efficiency Curves
Understanding how efficiently an Active Learning (AL) system improves with more labeled data is crucial for evaluating its effectiveness. Learning curves provide a visual tool for this purpose: they plot model accuracy (or another performance metric) against the number of labeled samples acquired during AL iterations. These curves help you see how quickly your model benefits from new information, and how much data is needed to reach a desired level of performance. In AL, the goal is to achieve high accuracy with as few labeled samples as possible, so the shape and steepness of your learning curve can reveal how well your sampling strategy is working.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Seeded generator so the sampled indices (and the curve) are reproducible
rng = np.random.default_rng(42)

# Simulate a pool of unlabeled data
X, y = make_classification(n_samples=1200, n_features=20, n_informative=15,
                           n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Start with a small labeled set
labeled_idx = rng.choice(len(X_train), size=20, replace=False).tolist()
unlabeled_idx = list(set(range(len(X_train))) - set(labeled_idx))

accuracies = []
labeled_set_sizes = []

# Simulate AL iterations
for i in range(10):
    # Retrain on everything labeled so far and score on the held-out test set
    clf = RandomForestClassifier(random_state=42)
    clf.fit(X_train[labeled_idx], y_train[labeled_idx])
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    accuracies.append(acc)
    labeled_set_sizes.append(len(labeled_idx))

    # Query step: a real AL strategy would pick the 20 most uncertain samples;
    # random selection stands in for it here
    if len(unlabeled_idx) >= 20:
        new_samples = rng.choice(unlabeled_idx, size=20, replace=False).tolist()
        labeled_idx.extend(new_samples)
        unlabeled_idx = list(set(unlabeled_idx) - set(new_samples))

# Plot accuracy against the number of labels used at each iteration
plt.plot(labeled_set_sizes, accuracies, marker='o')
plt.xlabel('Number of Labeled Samples')
plt.ylabel('Accuracy')
plt.title('Learning Curve: Accuracy vs. Labeled Set Size')
plt.grid(True)
plt.show()
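The example above uses random selection as a stand-in for the query step. As a minimal sketch of what an actual uncertainty-based query could look like, the snippet below ranks pool samples by least confidence (the lowest top-class probability from `predict_proba`). The helper name `least_confident_indices`, the smaller 300-sample dataset, and the batch size are assumptions made for illustration; they are not part of the lesson's code, and least confidence is only one of several possible uncertainty measures.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def least_confident_indices(clf, X_pool, pool_idx, batch_size):
    """Hypothetical helper: return the pool indices the model is least sure about."""
    proba = clf.predict_proba(X_pool[pool_idx])
    # Confidence = probability assigned to the most likely class
    confidence = proba.max(axis=1)
    # Lowest confidence first, keep the first batch_size positions
    order = np.argsort(confidence)[:batch_size]
    return [pool_idx[i] for i in order]


# Small synthetic pool (assumed sizes, chosen only to keep the sketch fast)
X, y = make_classification(n_samples=300, n_features=20, n_informative=15,
                           n_classes=2, random_state=42)

rng = np.random.default_rng(0)
labeled_idx = rng.choice(len(X), size=20, replace=False).tolist()
unlabeled_idx = sorted(set(range(len(X))) - set(labeled_idx))

# Train on the current labeled set, then query the 20 least confident samples
clf = RandomForestClassifier(random_state=42).fit(X[labeled_idx], y[labeled_idx])
new_samples = least_confident_indices(clf, X, unlabeled_idx, batch_size=20)

print(len(new_samples), "samples selected for labeling")
```

In the AL loop, a call like this would replace the random draw, so the returned indices are moved from the unlabeled pool to the labeled set before the next retraining step.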
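A learning curve in Active Learning shows how efficiently a model improves as more labeled data is added. A steep curve means each new batch of labels produces a large accuracy gain, which is the ideal case. A flat curve suggests that new labels add little value, either because the model has already plateaued or because the sampling strategy is picking uninformative samples. Plotting the curves of several AL strategies on the same axes shows which one reaches a given accuracy with fewer labels.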
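To make a comparison between curves concrete, one option is to reduce each curve to a number or two. The sketch below computes a normalized area under the learning curve (trapezoidal rule) and the smallest labeled-set size at which a target accuracy is reached. The function names and the accuracy values are assumptions: the numbers are made up purely to illustrate the arithmetic and are not measured results from any strategy.

```python
import numpy as np


def area_under_learning_curve(sizes, scores):
    """Trapezoidal area under the curve, divided by the x-range so it stays on the metric's scale."""
    sizes = np.asarray(sizes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    widths = np.diff(sizes)
    heights = (scores[1:] + scores[:-1]) / 2.0
    return float(np.sum(widths * heights) / (sizes[-1] - sizes[0]))


def labels_to_reach(sizes, scores, target):
    """Smallest labeled-set size whose score meets the target, or None if never reached."""
    for n, s in zip(sizes, scores):
        if s >= target:
            return n
    return None


# Illustrative, made-up curves (NOT real results): same label budgets, two strategies
sizes = [20, 40, 60, 80, 100]
curve_a = [0.60, 0.72, 0.80, 0.84, 0.86]  # hypothetical steeper curve
curve_b = [0.60, 0.66, 0.72, 0.78, 0.82]  # hypothetical flatter curve

for name, curve in [("A", curve_a), ("B", curve_b)]:
    alc = area_under_learning_curve(sizes, curve)
    needed = labels_to_reach(sizes, curve, target=0.80)
    print(f"Strategy {name}: normalized area = {alc:.4f}, labels to reach 0.80 = {needed}")
```

A higher normalized area and a lower label count both point to the more label-efficient curve. With real experiments, these summaries would normally be averaged over several random seeds before drawing conclusions.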
1. What does a steeper learning curve indicate in the context of Active Learning?
2. Which metric is most relevant for comparing AL strategies?