The Active Learning Loop
Active Learning is an iterative process designed to efficiently improve a model by selectively choosing which data points to label. The core of this approach is the Active Learning loop, which consists of four main steps: train, query, label, and update. This loop is cyclical, meaning that after completing all four steps, you return to the beginning and repeat the process with new information.
You begin by training your model on the currently labeled dataset. This is the train step, where the model learns patterns from the data you already know. Next comes the query step, a critical decision point: here, you use a strategy (such as uncertainty sampling) to select the most informative unlabeled instance(s) from the pool. These are the points your model is most unsure about, or those that are expected to improve learning the most if labeled.
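The query step can select more than one instance at a time. The sketch below is one possible way to do that with least-confidence uncertainty sampling: it scores each pool sample as 1 minus its highest predicted class probability and returns the k highest-scoring positions. The function names `least_confidence` and `query_top_k` and the probability values are illustrative assumptions, not part of the lesson's code.

```python
import numpy as np


def least_confidence(probs):
    """Uncertainty score per row: 1 minus the highest class probability."""
    probs = np.asarray(probs, dtype=float)
    return 1.0 - probs.max(axis=1)


def query_top_k(probs, k=1):
    """Return pool positions of the k most uncertain rows, most uncertain first."""
    scores = least_confidence(probs)
    k = min(k, len(scores))
    # A stable sort on the negated scores keeps the earlier row on ties.
    order = np.argsort(-scores, kind="stable")
    return order[:k]


# Hypothetical predicted probabilities for four pool samples, two classes.
pool_probs = np.array([
    [0.95, 0.05],  # confident prediction
    [0.55, 0.45],  # uncertain
    [0.70, 0.30],
    [0.51, 0.49],  # closest to a coin flip
])

queried = query_top_k(pool_probs, k=2)
print(queried)
```

In a real loop, `pool_probs` would come from something like `model.predict_proba(X_pool)`, and the returned positions index into the unlabeled pool rather than the full dataset.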
After querying, you move to the label step, where the selected instances are sent to an oracle (often a human annotator) for labeling. This step brings new information into your dataset. The final step is update, where you add the newly labeled data to your training set, remove it from the unlabeled pool, and prepare for another round. The loop then repeats, with each cycle aiming to improve the model's accuracy while keeping the number of labels low by focusing labeling effort where it matters most.
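The label and update steps can be sketched as a single helper. This is a minimal illustration, assuming the oracle is any callable that maps a sample to a label; `label_and_update` and `toy_oracle` are hypothetical names, and the toy labeling rule stands in for a human annotator.

```python
import numpy as np


def label_and_update(X_labeled, y_labeled, X_pool, query_idx, oracle):
    """Label the queried pool rows and move them into the labeled set."""
    query_idx = np.atleast_1d(query_idx)
    new_X = X_pool[query_idx]
    # The oracle is any callable that maps one sample to its label
    # (a human annotator in practice; a simple rule in this sketch).
    new_y = np.asarray([oracle(x) for x in new_X])
    X_labeled = np.vstack([X_labeled, new_X])
    y_labeled = np.concatenate([y_labeled, new_y])
    # Remove the queried rows so they cannot be selected again.
    X_pool = np.delete(X_pool, query_idx, axis=0)
    return X_labeled, y_labeled, X_pool


def toy_oracle(x):
    """Hypothetical oracle: label 1 if the first feature is positive."""
    return int(x[0] > 0)


X_labeled = np.array([[1.0, 2.0], [-1.0, 0.5]])
y_labeled = np.array([1, 0])
X_pool = np.array([[0.3, 0.1], [-0.2, 0.9], [2.0, -1.0]])

X_labeled, y_labeled, X_pool = label_and_update(
    X_labeled, y_labeled, X_pool, query_idx=[1, 2], oracle=toy_oracle
)
print(X_labeled.shape, y_labeled, X_pool.shape)
```

Returning new arrays instead of modifying the inputs in place keeps each round's state explicit, which makes it easier to log or replay a labeling session.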
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Synthetic dataset
    X, y = make_classification(n_samples=800, n_features=10, random_state=42)

    # Split into labeled (small) and unlabeled sets
    rng = np.random.default_rng(42)  # seeded so the initial split is reproducible
    labeled_idx = rng.choice(len(X), size=10, replace=False)
    unlabeled_idx = np.setdiff1d(np.arange(len(X)), labeled_idx)

    X_labeled, y_labeled = X[labeled_idx], y[labeled_idx]

    def uncertainty_sampling(model, X_pool):
        """Return the position in X_pool of the most uncertain sample."""
        probs = model.predict_proba(X_pool)
        uncertainty = 1 - np.max(probs, axis=1)  # least-confidence score
        return np.argmax(uncertainty)

    # Active Learning loop
    for step in range(10):
        # Train: fit on the currently labeled data
        model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)

        # Query: pick the 1 most uncertain unlabeled sample
        X_unlabeled = X[unlabeled_idx]
        y_unlabeled = y[unlabeled_idx]
        q = uncertainty_sampling(model, X_unlabeled)

        # Label: the oracle here is the ground truth
        new_x, new_y = X_unlabeled[q], y_unlabeled[q]

        # Update: grow the labeled set and shrink the pool
        X_labeled = np.vstack([X_labeled, new_x])
        y_labeled = np.append(y_labeled, new_y)
        unlabeled_idx = np.delete(unlabeled_idx, q)

        print(f"Step {step}: labeled={len(X_labeled)}")

    # Final model, trained on all labels collected so far
    final_model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
    # Accuracy over the full dataset (labeled points included), not a held-out estimate
    acc = accuracy_score(y, final_model.predict(X))
    print("\nFinal accuracy:", round(acc, 3))
1. Sequence the steps of the Active Learning loop:
2. Which step of the Active Learning loop involves selecting the next instance to label?