Learn Learning Efficiency Curves | Applied AL Concepts
Active Learning with Python

Learning Efficiency Curves

Understanding how efficiently an Active Learning (AL) system improves with more labeled data is crucial for evaluating its effectiveness. Learning curves provide a visual tool for this purpose: they plot model accuracy (or another performance metric) against the number of labeled samples acquired during AL iterations. These curves help you see how quickly your model benefits from new information, and how much data is needed to reach a desired level of performance. In AL, the goal is to achieve high accuracy with as few labeled samples as possible, so the shape and steepness of your learning curve can reveal how well your sampling strategy is working.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Seeded generator so the sampled indices (and the curve) are reproducible
    rng = np.random.default_rng(42)

    # Simulate a pool of unlabeled data
    X, y = make_classification(n_samples=1200, n_features=20, n_informative=15,
                               n_classes=2, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Start with a small labeled set
    initial_idx = rng.choice(len(X_train), size=20, replace=False)
    labeled_idx = initial_idx.tolist()
    unlabeled_idx = sorted(set(range(len(X_train))) - set(labeled_idx))

    accuracies = []
    labeled_set_sizes = []

    # Simulate AL iterations
    for i in range(10):
        # Retrain on the current labeled set and score on the held-out test set
        clf = RandomForestClassifier(random_state=42)
        clf.fit(X_train[labeled_idx], y_train[labeled_idx])
        y_pred = clf.predict(X_test)
        acc = accuracy_score(y_test, y_pred)
        accuracies.append(acc)
        labeled_set_sizes.append(len(labeled_idx))

        # Query step: a real strategy would pick the 20 most uncertain samples;
        # random selection stands in for it here
        if len(unlabeled_idx) >= 20:
            new_samples = rng.choice(unlabeled_idx, size=20, replace=False).tolist()
            labeled_idx.extend(new_samples)
            unlabeled_idx = sorted(set(unlabeled_idx) - set(new_samples))

    plt.plot(labeled_set_sizes, accuracies, marker='o')
    plt.xlabel('Number of Labeled Samples')
    plt.ylabel('Accuracy')
    plt.title('Learning Curve: Accuracy vs. Labeled Set Size')
    plt.grid(True)
    plt.show()
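The example above stands in random selection for the "most uncertain samples" step. As a minimal sketch of what that step could look like, the helper below applies least-confidence sampling to a matrix of predicted class probabilities. The function name `least_confident_indices` and the toy probabilities are assumptions made for illustration, not part of the lesson's code.

```python
import numpy as np

def least_confident_indices(probabilities, pool_indices, batch_size=20):
    """Return the pool indices whose top predicted class probability is lowest."""
    probabilities = np.asarray(probabilities, dtype=float)
    pool_indices = np.asarray(pool_indices)
    # Confidence = probability assigned to the most likely class of each sample.
    confidence = probabilities.max(axis=1)
    # Stable sort keeps pool order on ties; take the least confident rows first.
    order = np.argsort(confidence, kind="stable")[:batch_size]
    return pool_indices[order].tolist()

# Toy example: four pool samples with hypothetical class probabilities.
toy_probs = [[0.95, 0.05], [0.55, 0.45], [0.20, 0.80], [0.50, 0.50]]
toy_pool = [10, 11, 12, 13]
selected = least_confident_indices(toy_probs, toy_pool, batch_size=2)
print(selected)
```

In the loop above, the random draw could be swapped for `least_confident_indices(clf.predict_proba(X_train[unlabeled_idx]), unlabeled_idx)`, which would let the resulting curve be compared against the random-sampling one.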
Note

A learning curve in Active Learning shows how efficiently a model improves as more labeled data is added. A steep curve means each new label yields a large accuracy gain, which is the ideal case. A flat curve suggests that new labels add little value. Comparing the curves of different AL strategies, measured on the same test set, shows which strategy reaches a given accuracy with fewer labels.
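One way to make such a comparison concrete is to reduce each curve to a number. The sketch below computes two common summaries: the normalized area under the learning curve (higher means accuracy rose earlier) and the number of labels needed to first reach a target accuracy. The function names and the two curves are hypothetical values chosen for illustration, not results from the lesson.

```python
import numpy as np

def normalized_area_under_curve(labeled_sizes, accuracies):
    """Trapezoidal area under the curve, divided by the label range covered."""
    sizes = np.asarray(labeled_sizes, dtype=float)
    accs = np.asarray(accuracies, dtype=float)
    # Trapezoidal rule: segment width times the mean accuracy over the segment.
    area = np.sum((sizes[1:] - sizes[:-1]) * (accs[1:] + accs[:-1]) / 2.0)
    return float(area / (sizes[-1] - sizes[0]))

def labels_to_reach(labeled_sizes, accuracies, target):
    """Smallest recorded labeled-set size whose accuracy meets the target, else None."""
    for size, acc in zip(labeled_sizes, accuracies):
        if acc >= target:
            return size
    return None

# Hypothetical curves for two strategies evaluated at the same labeled-set sizes.
sizes = [20, 40, 60, 80]
curve_a = [0.60, 0.80, 0.85, 0.86]  # steep early gains
curve_b = [0.60, 0.70, 0.80, 0.86]  # slower gains, same final accuracy

area_a = normalized_area_under_curve(sizes, curve_a)
area_b = normalized_area_under_curve(sizes, curve_b)
print(area_a > area_b)
print(labels_to_reach(sizes, curve_a, 0.80), labels_to_reach(sizes, curve_b, 0.80))
```

Both curves end at the same accuracy, so final accuracy alone would not separate them; the area and the labels-to-target figures capture the difference in label efficiency.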

1. What does a steeper learning curve indicate in the context of Active Learning?

2. Which metric is most relevant for comparing AL strategies?



Section 3. Chapter 2
