collect(): Gathering Stream Elements into a Collection
You are already familiar with terminal operations and have even used them in previous examples and exercises. Now it's time to take a closer look at how they work. First up is the collect() method, which is one of the key terminal operations in Stream API.
The collect() Method
It is one of the most powerful tools when working with streams, allowing us to accumulate results into a List, Set, or Map, as well as perform complex groupings and statistical calculations.
There are two overloads of the collect() method; let's explore both.
Using collect() with Functional Interfaces
The collect() method in Stream API can be used with three functional interfaces to give full control over data collection:
- Supplier<R> supplier – creates an empty collection (R) where elements will be stored. For example, ArrayList::new initializes a new list;
- BiConsumer<R, ? super T> accumulator – adds stream elements (T) to the collection (R). For instance, List::add appends items to a list;
- BiConsumer<R, R> combiner – merges two collections when parallel processing is used. For example, List::addAll combines lists into one.
All three components work together to provide flexibility in data collection. First, the supplier creates an empty collection that will be used to accumulate elements from the stream. Then, the accumulator adds each element as the stream processes it. This flow remains straightforward in a sequential stream.
However, when working with parallel streams (parallelStream()), things get more complex.
The data processing is split across multiple threads, with each thread creating its own separate collection. Once the processing is complete, these individual collections need to be merged into a single result. This is where the combiner comes in, merging the separate parts into one unified collection.
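The supplier, accumulator, and combiner roles described above can be seen in a minimal sketch that joins strings into a StringBuilder. The class name CollectThreeArgsSketch and the helper join are hypothetical names used only for illustration. The same three arguments are passed to a sequential and a parallel stream.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration
class CollectThreeArgsSketch {

    // Joins the words using the three-argument collect()
    static String join(List<String> words, boolean parallel) {
        Stream<String> stream = parallel ? words.parallelStream() : words.stream();
        return stream.collect(
                        StringBuilder::new,     // supplier: one empty container per chunk
                        StringBuilder::append,  // accumulator: adds one element to a container
                        StringBuilder::append)  // combiner: appends the right container to the left one
                .toString();
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "c");
        System.out.println(join(words, false)); // abc
        System.out.println(join(words, true));  // abc
    }
}
```

Both calls give the same result because the source list is ordered and the combiner appends partial results in encounter order. In the sequential case the combiner is typically never invoked.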
Practical Example
You work for an online store and have a list of products. Your task is to collect only the products that cost more than $500 using the collect() method with three parameters.
Main.java

package com.example;

import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        // Initial list of products
        List<Product> productList = List.of(
            new Product("Laptop", 1200.99),
            new Product("Phone", 599.49),
            new Product("Headphones", 199.99),
            new Product("Monitor", 299.99),
            new Product("Tablet", 699.99)
        );

        // Filtering and collecting products over $500 using `collect()`
        List<Product> expensiveProducts = productList.parallelStream()
            .filter(product -> product.getPrice() > 500) // Keep only expensive products
            .collect(
                ArrayList::new,                        // Create a new list
                (list, product) -> list.add(product),  // Add each product to the list
                ArrayList::addAll                      // Merge lists (if the stream is parallel)
            );

        // Print the result
        System.out.print("Products over $500: " + expensiveProducts);
    }
}

class Product {
    private String name;
    private double price;

    Product(String name, double price) {
        this.name = name;
        this.price = price;
    }

    public String getName() {
        return name;
    }

    public double getPrice() {
        return price;
    }

    @Override
    public String toString() {
        return name + " ($" + price + ")";
    }
}
The collect() method takes three arguments, each defining a different step in collecting elements into a list:
- ArrayList::new (Supplier) → creates an empty ArrayList<Product> to store the results;
- (list, product) -> list.add(product) (BiConsumer) → adds each Product that passed the filter (price > 500) to the list;
- ArrayList::addAll (BiConsumer) → merges multiple lists when using parallel streams, ensuring all filtered products are combined into a single list.
Even though the third parameter is mainly for parallel processing, it's required by collect().
Using collect() with the Collector Interface
In addition to working with three functional interfaces, the collect() method in Stream API can also be used with predefined implementations of the Collector interface.
This approach is usually more convenient: the Collectors utility class provides ready-made collectors such as toList(), toSet(), toMap(), and groupingBy() for common tasks.
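As a rough illustration of predefined collectors, the sketch below uses a few factory methods from java.util.stream.Collectors. The class name BuiltInCollectorsSketch, its helper methods, and the sample words are hypothetical and chosen only for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration
class BuiltInCollectorsSketch {

    // Collects distinct words into a Set
    static Set<String> unique(List<String> words) {
        return words.stream().collect(Collectors.toSet());
    }

    // Groups words by their length
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    // Computes the average word length
    static double averageLength(List<String> words) {
        return words.stream().collect(Collectors.averagingInt(String::length));
    }

    public static void main(String[] args) {
        List<String> words = List.of("Laptop", "Phone", "Tablet", "Phone");
        System.out.println(unique(words));
        System.out.println(byLength(words));
        System.out.println(averageLength(words)); // 5.5
    }
}
```

Each helper passes a single Collector to collect() instead of three separate functions, which is why this overload tends to read more compactly.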
The Collector<T, A, R> interface consists of several key methods:
- Supplier<A> supplier() – creates an empty container for accumulating elements;
- BiConsumer<A, T> accumulator() – defines how elements are added to the container;
- BinaryOperator<A> combiner() – merges two containers when parallel processing is used;
- Function<A, R> finisher() – transforms the container into the final result.
As you can see, this structure is similar to the collect() method that works with functional interfaces, but it introduces the finisher() method. This additional step allows for extra processing on the collected data before returning the final result, for example, sorting the list before returning it.
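The finisher step can be sketched with the Collector.of factory method, which builds a collector from the four functions without writing a class. The names SortedListCollectorSketch, toSortedList, and sortedCopy are hypothetical helpers introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;

// Hypothetical class name, used only for this illustration
class SortedListCollectorSketch {

    // Builds a collector that accumulates into a list and sorts it in the finisher
    static <T extends Comparable<? super T>> Collector<T, List<T>, List<T>> toSortedList() {
        return Collector.of(
                ArrayList::new,                                        // supplier
                List::add,                                             // accumulator
                (left, right) -> { left.addAll(right); return left; }, // combiner
                list -> { Collections.sort(list); return list; });     // finisher: sort before returning
    }

    static List<Integer> sortedCopy(List<Integer> values) {
        return values.stream().collect(toSortedList());
    }

    public static void main(String[] args) {
        System.out.println(sortedCopy(List.of(3, 1, 2))); // [1, 2, 3]
    }
}
```

The finisher runs once, after all accumulation and combining is done, so the sort is applied to the complete result rather than to each partial list.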
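Additionally, the Collector interface provides the characteristics() method, which defines properties that help optimize stream execution:
- CONCURRENT – the accumulator can be called on the same result container from multiple threads at once;
- UNORDERED – the result does not depend on the encounter order of the elements;
- IDENTITY_FINISH – the finisher is the identity function and can be skipped.
These characteristics help Stream API optimize performance. For example, if the result container is inherently unordered, such as a Set, specifying UNORDERED tells the stream it does not need to preserve encounter order, which can make parallel collection more efficient.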
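To see how characteristics are attached to a collector, here is a small sketch that builds two collectors with Collector.of and inspects what characteristics() reports. The class and method names are hypothetical. According to the Collector.of documentation, the overload without a finisher adds IDENTITY_FINISH on its own.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;

// Hypothetical class name, used only for this illustration
class CharacteristicsSketch {

    // No finisher and no explicit characteristics
    static Collector<String, List<String>, List<String>> listCollector() {
        return Collector.of(
                ArrayList::new,
                List::add,
                (left, right) -> { left.addAll(right); return left; });
    }

    // No finisher, explicitly marked as UNORDERED
    static Collector<String, Set<String>, Set<String>> unorderedSetCollector() {
        return Collector.of(
                HashSet::new,
                Set::add,
                (left, right) -> { left.addAll(right); return left; },
                Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        System.out.println(listCollector().characteristics());
        System.out.println(unorderedSetCollector().characteristics());
    }
}
```

The first collector reports IDENTITY_FINISH, and the second reports both IDENTITY_FINISH and UNORDERED.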
Practical Example
Imagine you run an online store and need to process product prices before collecting them. For instance, you want to round each price to the nearest whole number, remove duplicates, and sort the final list.
Main.java

package com.example;

import java.util.*;
import java.util.function.*;
import java.util.stream.Collector;

public class Main {
    public static void main(String[] args) {
        List<Double> prices = List.of(1200.99, 599.49, 199.99, 599.49, 1200.49, 200.0);

        // Using a custom `Collector` to round prices, remove duplicates, and sort
        List<Integer> processedPrices = prices.parallelStream()
            .collect(new RoundedSortedCollector());

        System.out.println("Processed product prices: " + processedPrices);
    }
}

// Custom `Collector` that rounds prices, removes duplicates, and sorts them
class RoundedSortedCollector implements Collector<Double, Set<Integer>, List<Integer>> {

    @Override
    public Supplier<Set<Integer>> supplier() {
        // Creates a `HashSet` to store unique rounded values
        return HashSet::new;
    }

    @Override
    public BiConsumer<Set<Integer>, Double> accumulator() {
        // Rounds price and adds to the set
        return (set, price) -> set.add((int) Math.round(price));
    }

    @Override
    public BinaryOperator<Set<Integer>> combiner() {
        return (set1, set2) -> {
            set1.addAll(set2); // Merges two sets
            return set1;
        };
    }

    @Override
    public Function<Set<Integer>, List<Integer>> finisher() {
        return set -> set.stream()
            .sorted()  // Sorts the final list
            .toList(); // Stream.toList() requires Java 16 or later
    }

    @Override
    public Set<Characteristics> characteristics() {
        // Order is not important during accumulation
        return Set.of(Characteristics.UNORDERED);
    }
}
You start processing the data by passing it into a custom Collector called RoundedSortedCollector.
This collector first accumulates all prices into a Set<Integer>, ensuring that duplicates are automatically removed. Before adding each value, it rounds the price using Math.round(price) and converts it to an int. For example, 1200.99 rounds up to 1201 and 1200.49 rounds down to 1200, while 199.99 and 200.0 both become 200, so the set keeps only one copy.
If the stream runs in parallel mode, the combiner() method merges two sets by adding all elements from one set into another. This step is crucial for multi-threaded environments.
In the final stage, after all prices are collected, the finisher() method transforms the set into a sorted list. It converts the Set<Integer> into a stream, applies sorted() to arrange values in ascending order, and then collects them into a List<Integer>.
As a result, you get a sorted list of unique, rounded prices that can be used for further calculations or display purposes.
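For comparison, the same round, deduplicate, and sort steps can be sketched with ordinary stream operations and a predefined collector. This is an alternative formulation rather than the custom collector itself, and the class name RoundedPricesSketch and the method roundDistinctSorted are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration
class RoundedPricesSketch {

    // Rounds each price, drops duplicates, and sorts in ascending order
    static List<Integer> roundDistinctSorted(List<Double> prices) {
        return prices.stream()
                .map(price -> (int) Math.round(price)) // round to the nearest whole number
                .distinct()                            // remove duplicates
                .sorted()                              // ascending order
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Double> prices = List.of(1200.99, 599.49, 199.99, 599.49, 1200.49, 200.0);
        System.out.println(roundDistinctSorted(prices)); // [200, 599, 1200, 1201]
    }
}
```

A custom Collector becomes worthwhile when the same accumulation logic is reused in many pipelines; for a one-off transformation, intermediate operations like these are often simpler.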
1. What does the collect() method do in Stream API?
2. What additional capability does the Collector interface provide compared to collect() with functional interfaces?