The Active Learning Loop | Foundations of Active Learning

Active Learning is an iterative approach to training a model in which the model itself helps decide which data points should be labeled next, so that labeling effort goes where it improves the model most. The core of this approach is the Active Learning loop, which consists of four steps: train, query, label, and update. The loop is cyclical: after the update step you return to training, now with more labeled data, and repeat the process until you decide to stop (for example, when the labeling budget runs out or the model is accurate enough).
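The four steps can be sketched as a generic loop. This is a minimal, framework-free sketch that assumes the model, the query strategy, and the oracle are passed in as plain functions; `active_learning_loop`, `train`, `query`, and `oracle` are hypothetical names chosen for illustration, not part of any library. The toy usage labels integers by whether they are at least 5, and the query picks the pool point closest to the current decision threshold.

```python
from typing import Any, Callable, List, Tuple


def active_learning_loop(
    labeled: List[Tuple[Any, Any]],
    pool: List[Any],
    train: Callable[[List[Tuple[Any, Any]]], Any],
    query: Callable[[Any, List[Any]], int],
    oracle: Callable[[Any], Any],
    rounds: int,
):
    """Hypothetical skeleton: repeat train -> query -> label -> update for `rounds` cycles."""
    labeled = list(labeled)  # copy so the caller's lists are not mutated
    pool = list(pool)
    for _ in range(rounds):
        if not pool:  # nothing left to label
            break
        model = train(labeled)     # 1. train on the current labeled set
        i = query(model, pool)     # 2. query: position of the chosen pool item
        y = oracle(pool[i])        # 3. label: ask the oracle for its label
        x = pool.pop(i)            # 4. update: remove it from the pool...
        labeled.append((x, y))     #    ...and add it to the labeled set
    return train(labeled), labeled, pool  # final model on everything labeled


# Toy usage: a "model" is a threshold halfway between the classes seen so far
def train(data):
    neg = [x for x, y in data if not y]
    pos = [x for x, y in data if y]
    return (max(neg) + min(pos)) / 2


def query(threshold, pool):
    # Most "uncertain" point = the one closest to the threshold
    return min(range(len(pool)), key=lambda i: abs(pool[i] - threshold))


def oracle(x):
    return x >= 5  # simulated annotator with a hidden rule


model, labeled, pool = active_learning_loop(
    labeled=[(0, False), (10, True)],
    pool=[1, 2, 3, 4, 5, 6, 7, 8, 9],
    train=train, query=query, oracle=oracle, rounds=3,
)
print(model)    # 4.0
print(labeled)  # [(0, False), (10, True), (5, True), (2, False), (3, False)]
```

Because each query lands near the current threshold, the labeled points close in on the true boundary quickly, which is the intuition behind labeling the examples the model is least sure about.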

You begin by training your model on the currently labeled dataset. This is the train step, where the model learns patterns from the examples whose labels you already know. Next comes the query step, the critical decision point of the loop: you apply a query strategy (such as uncertainty sampling) to select the most informative unlabeled instance or instances from the pool. "Informative" usually means either the points the model is least sure about or the points expected to improve the model the most once they are labeled.
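One way to make "least sure" concrete is to score each pool sample from its predicted class probabilities. The sketch below assumes a probability matrix of shape `(n_samples, n_classes)`, such as the one returned by scikit-learn's `predict_proba`. It shows the least-confidence score used in this chapter's code together with two common alternatives, margin and entropy, plus a hypothetical `query_top_k` helper for selecting several instances at once; all function names are illustrative.

```python
import numpy as np


def least_confidence(probs):
    # 1 - probability of the predicted class; larger = more uncertain
    return 1.0 - probs.max(axis=1)


def margin(probs):
    # Gap between the top-two class probabilities; a small gap means high uncertainty,
    # so the gap is negated to make larger scores = more uncertain
    top2 = np.sort(probs, axis=1)[:, -2:]
    return -(top2[:, 1] - top2[:, 0])


def entropy(probs):
    # Shannon entropy of the predicted distribution (clip avoids log(0))
    return -np.sum(probs * np.log(np.clip(probs, 1e-12, 1.0)), axis=1)


def query_top_k(scores, k):
    """Hypothetical helper: indices of the k most uncertain samples, most uncertain first."""
    return np.argsort(-scores)[:k]


probs = np.array([
    [0.95, 0.05],  # confident
    [0.55, 0.45],  # fairly unsure
    [0.70, 0.30],
    [0.50, 0.50],  # maximally unsure
])
print(query_top_k(least_confidence(probs), k=2))  # [3 1]
```

For two classes the three scores rank samples identically, because each depends only on how far the top probability is from 0.5; they can disagree once there are three or more classes.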

After querying, you move to the label step. The selected instances are sent to an oracle, often a human annotator, who provides their labels. This is typically the most expensive step of the loop, and it is where new information enters your dataset. The final step is update: you add the newly labeled instances to the training set and remove them from the unlabeled pool. The loop then repeats, and each cycle aims to make the model more accurate while spending labels only where they matter most.
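When the pool is tracked with index arrays, the label and update steps have one subtle pitfall: the query strategy returns positions within the unlabeled pool, not row indices into the full dataset. The hypothetical helper `label_and_update` below makes that translation explicit. It assumes the oracle is a function of the dataset row index; here the oracle is simulated by looking up a hidden `y_true` array, standing in for a human annotator.

```python
import numpy as np


def label_and_update(labeled_idx, unlabeled_idx, query_positions, oracle):
    """Hypothetical helper: label the queried pool positions and move them to the labeled set.

    `query_positions` index into `unlabeled_idx` (positions within the pool), while
    `labeled_idx` and `unlabeled_idx` hold row indices into the full dataset.
    """
    rows = unlabeled_idx[query_positions]                      # pool positions -> dataset rows
    new_labels = np.array([oracle(r) for r in rows])           # label step: one oracle call per row
    labeled_idx = np.concatenate([labeled_idx, rows])          # update: grow the labeled set
    unlabeled_idx = np.delete(unlabeled_idx, query_positions)  # ...and shrink the pool
    return labeled_idx, unlabeled_idx, rows, new_labels


y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])  # hidden ground truth (simulation only)
labeled_idx = np.array([0, 1])
unlabeled_idx = np.array([2, 3, 4, 5, 6, 7])

labeled_idx, unlabeled_idx, rows, labels = label_and_update(
    labeled_idx, unlabeled_idx, np.array([1, 4]), oracle=lambda row: y_true[row]
)
print(rows, labels)   # [3 6] [0 1]
print(labeled_idx)    # [0 1 3 6]
print(unlabeled_idx)  # [2 4 5 7]
```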


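import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification dataset
X, y = make_classification(n_samples=800, n_features=10, random_state=42)

# Hold out a test set so the final accuracy is not measured on training data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Seed the labeled set with 5 examples per class (10 in total), so the first
# fit always sees both classes; a purely random draw could miss one class
rng = np.random.default_rng(42)
labeled_idx = np.concatenate([
    rng.choice(np.flatnonzero(y_train == c), size=5, replace=False)
    for c in np.unique(y_train)
])
unlabeled_idx = np.setdiff1d(np.arange(len(X_train)), labeled_idx)

X_labeled, y_labeled = X_train[labeled_idx], y_train[labeled_idx]

def uncertainty_sampling(model, X_pool):
    """Return the position (within X_pool) of the least-confident sample."""
    probs = model.predict_proba(X_pool)
    uncertainty = 1 - np.max(probs, axis=1)  # least-confidence score
    return np.argmax(uncertainty)

# Active Learning loop: train -> query -> label -> update
for step in range(10):
    # Train on the current labeled set
    model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)

    # Query: pick the single most uncertain unlabeled sample
    X_unlabeled = X_train[unlabeled_idx]
    y_unlabeled = y_train[unlabeled_idx]  # hidden labels, used only by the simulated oracle
    q = uncertainty_sampling(model, X_unlabeled)

    # Label: the oracle is simulated by looking up the ground-truth label
    new_x, new_y = X_unlabeled[q], y_unlabeled[q]

    # Update: move the sample from the unlabeled pool to the labeled set
    X_labeled = np.vstack([X_labeled, new_x])
    y_labeled = np.append(y_labeled, new_y)
    unlabeled_idx = np.delete(unlabeled_idx, q)

    print(f"Step {step}: labeled={len(X_labeled)}")

# Final model, evaluated on the held-out test set
final_model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
acc = accuracy_score(y_test, final_model.predict(X_test))
print("\nFinal test accuracy:", round(acc, 3))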
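A common sanity check is to compare uncertainty sampling against a random-sampling baseline that spends the same number of labels. The sketch below is an illustration under stated assumptions rather than a benchmark: it uses the same synthetic dataset as the example above, holds out a test set, starts both strategies from the same seed set of 5 labeled examples per class, and averages test accuracy over a few seeds. `run` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_features=10, random_state=42)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)


def run(strategy, n_queries=30, seed=0):
    """Hypothetical helper: test accuracy after `n_queries` single-sample queries."""
    rng = np.random.default_rng(seed)
    # The seed set is drawn first, so both strategies start from the same labels for a given seed
    labeled = np.concatenate([
        rng.choice(np.flatnonzero(y_pool == c), size=5, replace=False)
        for c in np.unique(y_pool)
    ])
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
    for _ in range(n_queries):
        model = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])
        if strategy == "uncertainty":
            probs = model.predict_proba(X_pool[unlabeled])
            q = np.argmax(1 - probs.max(axis=1))  # least-confidence query
        else:  # "random": baseline that ignores the model
            q = rng.integers(len(unlabeled))
        labeled = np.append(labeled, unlabeled[q])
        unlabeled = np.delete(unlabeled, q)
    model = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])
    return model.score(X_test, y_test)  # accuracy on the held-out test set


for strategy in ("uncertainty", "random"):
    accs = [run(strategy, seed=s) for s in range(5)]
    print(f"{strategy:>11}: mean test accuracy {np.mean(accs):.3f}")
```

The outcome depends on the dataset, the model, the seed set, and the labeling budget, so active learning is not guaranteed to beat random sampling in every run; treat the printed numbers as a quick check, not a general result.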

1. Sequence the steps of the Active Learning loop:

2. Which step of the Active Learning loop involves selecting the next instance to label?
