Core Concepts: Splitting and Merging for Parallelism
Many modern data structure libraries provide "parallel" collections that bundle an execution model (like a thread pool) with the data structure itself. While convenient, this approach tightly couples data and execution, limiting flexibility.
Bifurcan takes a different approach by separating these concerns. It provides the fundamental tools for parallelism—efficient splitting and merging—but leaves the execution model up to you.
The split()
Method
Every Bifurcan collection implements the split(k)
method. This method breaks the collection into approximately k
smaller, independent sub-collections.
import io.lacuna.bifurcan.IList;
import io.lacuna.bifurcan.List;
// Create a large list
IList<Long> largeList = List.from(1_000_000, i -> i);
// Split it into roughly 8 parts for processing
IList<? extends IList<Long>> parts = largeList.split(8);
System.out.println("Number of parts: " + parts.size());
Because of Bifurcan's underlying data structures (like Relaxed Radix Balanced Trees for List
), this operation is extremely fast, often near-constant time.
Processing and Merging
Once a collection is split, you can process each part in parallel using any execution framework you choose (e.g., java.util.concurrent.ExecutorService
, Project Loom virtual threads, etc.).
After processing, the results can be efficiently recombined using methods like:
concat(IList)
for listsmerge(IMap, mergeFn)
,union(IMap)
,intersection(IMap)
, anddifference(IMap)
for mapsunion(ISet)
,intersection(ISet)
, anddifference(ISet)
for sets
Example Workflow
Here is a conceptual example of a parallel map-reduce operation using Bifurcan's tools:
import io.lacuna.bifurcan.*;
import java.util.concurrent.*;
import java.util.stream.Collectors;
// 1. Create a large data set
IMap<String, Long> data = new Map<String, Long>().linear();
for (long i = 0; i < 1_000_000; i++) {
data.put("key_" + i, i);
}
data = data.forked();
// 2. Split the data for parallel processing
IList<? extends IMap<String, Long>> parts = data.split(8);
ExecutorService executor = Executors.newFixedThreadPool(8);
// 3. Process each part in parallel (e.g., sum the values)
Callable<Long> processPart = (IMap<String, Long> part) -> {
return part.values().stream().reduce(0L, Long::sum);
};
List<Future<Long>> futures = parts.stream()
.map(part -> executor.submit(() -> processPart.call(part)))
.collect(Collectors.toList());
// 4. Merge the results
long totalSum = 0;
for (Future<Long> future : futures) {
totalSum += future.get();
}
executor.shutdown();
By providing efficient, fundamental operations for partitioning and recombination, Bifurcan gives you the flexibility to build complex parallel workflows without being tied to a specific execution model.