Learn Pool-Based Vs Stream-Based Sampling | Foundations of Active Learning
Active Learning with Python

Pool-Based Vs Stream-Based Sampling

When you are designing an active learning system, one of the first choices you must make is how to select unlabeled data for annotation. Two fundamental strategies are pool-based sampling and stream-based sampling.

Pool-based sampling assumes you have access to a large pool of unlabeled data. In this approach, you can examine the entire pool and selectively choose which instances to query for labels. The workflow typically involves scoring every available unlabeled sample with a selection criterion, such as the model's uncertainty about its prediction, and then querying the highest-scoring ones. This method is widely used in scenarios where you can store and access the full dataset at once, such as document classification or image recognition tasks.
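The pool-based workflow described above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: it assumes a model has already produced class probabilities for each unlabeled instance, and it uses least-confidence uncertainty as the selection criterion. The names `least_confidence`, `select_from_pool`, and the sample probabilities are hypothetical.

```python
def least_confidence(probs):
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def select_from_pool(pool_probs, k):
    """Return the indices of the k most uncertain instances in the pool."""
    # Score every instance in the pool before choosing anything.
    scores = [least_confidence(p) for p in pool_probs]
    # Rank pool indices by score, highest uncertainty first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical predicted class probabilities for five unlabeled instances.
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.51, 0.49],
    [0.70, 0.30],
]

query_indices = select_from_pool(pool_probs, k=2)
print(query_indices)  # [3, 1]
```

The two instances whose predictions are closest to a coin flip are selected. Note that the whole pool is scored before any query is made, which is what distinguishes this strategy from deciding on each instance as it arrives.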

Stream-based sampling, in contrast, is suited for situations where data arrives sequentially and you cannot store or review all instances at once. Each new instance is presented to the learner one at a time. For every instance, you must immediately decide whether to query its label or discard it, often by comparing an informativeness score against a threshold. This approach is common in real-time systems, such as online monitoring or sensor data analysis, where limited storage or the need for immediate decisions rules out keeping a pool.
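The per-instance decision can be sketched as a generator that sees one instance at a time. This is a minimal illustration under stated assumptions: each stream item is taken to be a list of predicted class probabilities, the informativeness measure is least-confidence uncertainty, and the threshold of 0.4 is arbitrary. The names `stream_sampler` and `least_confidence` are hypothetical.

```python
def least_confidence(probs):
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def stream_sampler(stream, threshold):
    """Yield (position, probs) for each instance uncertain enough to query."""
    for position, probs in enumerate(stream):
        # Decide immediately: instances below the threshold are discarded
        # and never revisited.
        if least_confidence(probs) >= threshold:
            yield position, probs


# Hypothetical stream of predicted class probabilities, consumed one by one.
stream = iter([
    [0.95, 0.05],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.51, 0.49],
    [0.70, 0.30],
])

queried = [position for position, _ in stream_sampler(stream, threshold=0.4)]
print(queried)  # [1, 3]
```

Unlike the pool-based case, nothing is ranked: the sampler never holds more than one instance, so the number of queries depends on the threshold rather than on a fixed budget.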


A company receives a continuous stream of sensor data and needs to decide, in real time, which data points to label for model improvement. Which sampling approach is most appropriate?
