Collating Iterator: Candidate Algorithms

Research Artifact - Stage 2B

October 2025

Research Question

From Stage 1 and Stage 2A: “Which algorithms (if any) achieve the \(\Omega(N \log k)\) lower bound?”

We know from Stage 2A that \(\Omega(N \log k)\) comparisons are required. Now we explore which algorithms achieve this bound.

Literature Review

Sources consulted: CLRS (Ch. 6, Problem 6-2), Knuth's TAOCP Vol 3 §5.4.1, Frigo et al. (1999) on cache-oblivious algorithms, and Grafana Labs engineering notes on production loser trees (2024).

Algorithms discovered in literature:

  1. Linear scan (naive baseline)

  2. Binary min-heap / priority queue

  3. Winner tournament tree

  4. Loser tournament tree (Knuth preferred, used in production)

  5. D-ary heap (d=4 common in practice)

  6. Pairwise merge (divide-and-conquer)

  7. Collect-and-sort (ignores pre-sorted property)

  8. Cache-oblivious funnelsort

Candidate Approaches

Without assuming the answer, we analyze eight natural approaches:

Candidate 1: Linear Scan

Source: First principles (naive baseline)

Algorithm

next():
    minVal ← ∞; minIdx ← −1
    for i ← 0 to k − 1:
        if iterators[i] is not exhausted:
            val ← iterators[i].peek()
            if val < minVal:
                minVal ← val; minIdx ← i
    iterators[minIdx].next()
    return minVal
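The pseudocode above can be sketched in Python as a generator. Python iterators have no peek(), so a one-element lookahead buffer per source stands in for it; the sentinel object is an implementation convenience, not part of the algorithm.

```python
_DONE = object()  # marks an exhausted source (avoids treating None as data)

def linear_scan_merge(iterables):
    """Lazily merge k sorted iterables; each output element costs one O(k) scan."""
    iters = [iter(it) for it in iterables]
    heads = [next(it, _DONE) for it in iters]   # current head of each source
    while True:
        min_idx = -1
        for i, h in enumerate(heads):           # scan all k heads
            if h is not _DONE and (min_idx < 0 or h < heads[min_idx]):
                min_idx = i
        if min_idx < 0:                         # every source exhausted
            return
        yield heads[min_idx]
        heads[min_idx] = next(iters[min_idx], _DONE)
```

Note that the generator yields elements one at a time and holds only the k head buffers, so the lazy-evaluation and \(O(k)\)-space constraints are respected.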

Analysis

Time Complexity: \(O(k)\) comparisons per next() (one scan over all \(k\) heads), so \(O(Nk)\) total for \(N\) elements.

Space Complexity: \(O(k)\) (iterator references only)

Comparison to Lower Bound: \[O(Nk) \gg \Omega(N \log k) \quad \text{since } k \gg \log k \text{ as } k \text{ grows}\]

Achieves bound? NO - Sub-optimal asymptotically once \(k\) grows beyond a small constant

When Competitive: For small \(k\) (say \(k \leq 8\)), the simplicity and cache locality may make this competitive despite poor asymptotics.

Candidate 2: Priority Queue (Binary Min-Heap)

Source: Standard (CLRS Ch 6, TAOCP Vol 3 §5.4.1)

Algorithm Idea

Maintain a priority queue (min-heap) containing:

- at most one entry per non-exhausted input iterator: the pair \((\text{value}, \text{iterator})\), keyed by value

Operations:

- Initialize: insert the first element of each non-empty iterator, \(O(k)\) total
- next(): extract the minimum, then insert the successor from the same iterator, \(O(\log k)\)

Binary Min-Heap Implementation

initialize():
    H ← empty min-heap
    for i ← 0 to k − 1:
        if I_i is not empty:
            H.insert((I_i.next(), I_i))

next():
    (val, iter) ← H.extractMin()
    if iter is not exhausted:
        H.insert((iter.next(), iter))
    return val
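A minimal Python sketch of this candidate, using the standard-library heapq module. The source index in each heap entry is an implementation detail: it breaks ties between equal values so iterator objects are never compared.

```python
import heapq

_DONE = object()  # exhaustion marker; avoids treating None as data

def heap_merge(iterables):
    """Lazy k-way merge with a binary min-heap: O(log k) comparisons per next()."""
    iters = [iter(it) for it in iterables]
    heap = []
    for i, it in enumerate(iters):              # initialize: one head per source
        v = next(it, _DONE)
        if v is not _DONE:
            heap.append((v, i))
    heapq.heapify(heap)                         # O(k) build
    while heap:
        v, i = heapq.heappop(heap)              # extractMin: O(log k)
        yield v
        nxt = next(iters[i], _DONE)             # refill from the same source
        if nxt is not _DONE:
            heapq.heappush(heap, (nxt, i))      # insert: O(log k)
```

Python's standard library already ships essentially this algorithm as heapq.merge, which can serve as a reference implementation for testing.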

Analysis

Heap Operations:

- insert: \(O(\log k)\) comparisons
- extractMin: \(O(\log k)\) comparisons
- initial build: \(O(k)\)

Time Complexity:

- each of the \(N\) elements is inserted and extracted exactly once, so \(O(\log k)\) per next() and \(O(N \log k)\) total

Space Complexity: \(O(k)\) (heap storage)

Comparison to Lower Bound: \(O(N \log k) = \Omega(N \log k)\)

Achieves bound? YES ✓ - Asymptotically optimal!

Correctness

Invariant 1 (Heap Invariant). At the start of each next() call:

  1. \(H\) contains at most one element from each non-exhausted iterator

  2. Each element in \(H\) is the next unconsumed element from its source iterator

  3. All elements output so far are \(\leq\) all elements in \(H\)

Correctness Proof. Initialization: Insert first element from each iterator. By definition, first elements are minimums of their iterators. Invariant holds.

Maintenance: extractMin returns the smallest element in \(H\), which by Invariants 2 and 3 is the smallest unconsumed element overall. Inserting its successor from the same iterator restores Invariants 1 and 2; because each input is sorted, the successor is \(\geq\) the extracted value, so Invariant 3 is preserved.

Termination: \(H\) becomes empty iff all iterators exhausted. All \(N\) elements extracted in sorted order. \(\square\)

Candidate 3: Winner Tournament Tree

Source: TAOCP Vol 3 §5.4.1 “tournament sort”

Algorithm Idea

Build a complete binary tree with:

- \(k\) leaves, one per input iterator, each holding that iterator's current head
- internal nodes holding the winner (smaller element) of their two children, so the root holds the global minimum

After outputting the root's element, refill the winning leaf from its source iterator and replay the matches along the leaf-to-root path.

Analysis

Time Complexity:

- \(O(\log k)\) comparisons per next() (one replayed match per level), \(O(N \log k)\) total

Space Complexity: \(O(k)\) (tree nodes)

Comparison to Lower Bound: \(O(N \log k) = \Omega(N \log k)\)

Achieves bound? YES ✓ - Asymptotically optimal!

Comparison to Heap: Same asymptotic complexity. Constant factors differ (see Stage 3).

Candidate 4: Loser Tournament Tree

Source: TAOCP Vol 3 §5.4.1 (Knuth’s preferred variant), Grafana Labs production use (2024)

Algorithm Idea

A variant of the tournament tree where:

- each internal node stores the loser of its match, while the overall winner is kept separately at the root

Key difference from winner tree: during refill, the new element is compared only against the losers stored on the leaf-to-root path (one comparison per level); no sibling lookups or subtree recomputation are needed.

Analysis

Time Complexity:

- \(O(\log k)\) comparisons per next(), \(O(N \log k)\) total

Space Complexity: \(O(k)\) (tree nodes + root pointer)

Comparison to Lower Bound: \(O(N \log k) = \Omega(N \log k)\)

Achieves bound? YES ✓ - Asymptotically optimal!

Advantages over Winner Tree:

- refill touches only the losers stored on one root path, so no sibling accesses are needed: exactly one comparison per level
- the refill loop is simpler, which is why Knuth prefers this variant and why it appears in production merge paths

Disadvantages:

- the invariant (losers in internal nodes, winner held separately) is less intuitive, making the code harder to get right
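A sketch of an array-based loser tree in the TAOCP §5.4.1 style, again padding \(k\) to a power of two and using an infinity sentinel for exhausted sources (so it assumes finite data). Note how next() reads only tree[n] at each level, never a sibling.

```python
INF = float('inf')  # sentinel for exhausted sources

class LoserTree:
    """Loser tree: tree[0] holds the overall winner's leaf index,
    tree[1..k-1] hold the losers of the internal matches."""
    def __init__(self, iterables):
        its = [iter(x) for x in iterables]
        k = 2
        while k < len(its):
            k *= 2
        self.k = k
        self.iters = its + [iter(())] * (k - len(its))
        self.head = [next(it, INF) for it in self.iters]  # current head per leaf
        self.tree = [0] * k
        self.tree[0] = self._build(1)                     # play all initial matches

    def _build(self, n):
        """Return the winning leaf of the subtree at node n; store the loser at n."""
        if n >= self.k:
            return n - self.k                             # leaf
        left, right = self._build(2 * n), self._build(2 * n + 1)
        win, lose = (left, right) if self.head[left] <= self.head[right] else (right, left)
        self.tree[n] = lose
        return win

    def next(self):
        w = self.tree[0]
        val = self.head[w]
        if val is INF:
            return None                                   # all sources exhausted
        self.head[w] = next(self.iters[w], INF)
        n = (w + self.k) // 2                             # parent of leaf w
        while n >= 1:                                     # compare only against stored losers
            if self.head[self.tree[n]] < self.head[w]:
                self.tree[n], w = w, self.tree[n]         # new element loses this match
            n //= 2
        self.tree[0] = w
        return val
```

The refill loop does one comparison and at most one swap per level, with no calls back into child subtrees.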

Candidate 5: D-ary Heap

Source: CLRS Problem 6-2, generalization of binary heap

Algorithm Idea

Generalization of binary heap where each node has \(d\) children instead of 2.

Operations:

- insert: sift up, \(O(\log_d k)\) comparisons
- extractMin: sift down, up to \(d - 1\) comparisons per level (to find the minimum child) over \(\log_d k\) levels

Analysis

Time Complexity:

- next(): roughly \((d-1)\log_d k\) comparisons, i.e. \(O(\log k)\) for constant \(d\); \(O(N \log k)\) total

Space Complexity: \(O(k)\) (array storage)

Trade-off:

- larger \(d\): shallower tree (fewer levels, fewer cache misses along the sift-down path) but more comparisons per level
- smaller \(d\): fewer comparisons per level but a deeper tree

Practical Choice: \(d=4\) commonly used (.NET PriorityQueue uses quaternary heap)

Achieves bound? YES ✓ - Still \(O(N \log k)\)
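The core of the d-ary variant is the sift-down step. This is a minimal sketch (function names are illustrative); replace_min combines extractMin with the refill insert in a single sift, which is the natural shape of next() in a merge.

```python
def sift_down(heap, i=0, d=4):
    """Restore the d-ary min-heap property below index i after its value grew.
    Each level costs at most d - 1 comparisons to find the smallest child,
    plus one comparison against the current node."""
    n = len(heap)
    while True:
        first = d * i + 1                       # first child of node i
        if first >= n:
            return                              # i is a leaf
        m = min(range(first, min(first + d, n)), key=heap.__getitem__)
        if heap[m] < heap[i]:
            heap[i], heap[m] = heap[m], heap[i]
            i = m
        else:
            return

def replace_min(heap, value, d=4):
    """Pop the minimum and push a new value with one sift: the refill step of next()."""
    old = heap[0]
    heap[0] = value
    sift_down(heap, 0, d)
    return old
```

Because all \(d\) children of a node sit contiguously in the array, one cache line typically covers an entire sibling group, which is the practical argument for \(d = 4\).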

Candidate 6: Pairwise Merge

Source: Divide-and-conquer mergesort pattern

Algorithm Idea

Recursively merge iterators in pairs: \[((I_0 \oplus I_1) \oplus (I_2 \oplus I_3)) \oplus \ldots\]

Analysis

Problem: Violates lazy evaluation requirement!

Pairwise merging forces materialization of intermediate results. For example, \((I_0 \oplus I_1)\) must complete before merging with \((I_2 \oplus I_3)\).
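The materializing variant described above can be sketched as follows; note that every round allocates fresh \(O(N)\) lists, which is exactly the space cost that disqualifies this candidate.

```python
def merge2(a, b):
    """Standard two-way merge of two sorted lists (materializes its output)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out

def pairwise_merge(runs):
    """Merge k sorted lists pairwise over ceil(log2 k) rounds: O(N log k)
    comparisons, but each round materializes O(N) intermediate elements,
    so no output is available until the final round completes."""
    runs = list(runs)
    while len(runs) > 1:
        runs = [merge2(runs[i], runs[i + 1]) if i + 1 < len(runs) else runs[i]
                for i in range(0, len(runs), 2)]
    return runs[0] if runs else []
```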

Time Complexity: \(O(N \log k)\) if we materialize intermediates.

Space Complexity: \(O(N)\) (must store intermediate merged sequences).

Achieves bound? NO - Violates problem constraints (lazy evaluation). Space requirement \(O(N) \gg O(k)\).

When Useful: Parallel merging (divide-and-conquer parallelizes naturally). Different problem domain.

Candidate 7: Collect-and-Sort

Source: First principles (ignores pre-sorted property)

Algorithm

  1. Consume all \(k\) iterators into an array \(A\)

  2. Sort \(A\)

  3. Return a sequential iterator over \(A\)
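The three steps above reduce to a few lines of Python:

```python
from itertools import chain

def collect_and_sort(iterables):
    """Drain every input into one list, then sort: O(N log N) time, O(N) space,
    and nothing can be yielded until all inputs have been fully consumed."""
    buf = list(chain.from_iterable(iterables))  # forces full materialization
    buf.sort()                                  # ignores that each input was sorted
    return iter(buf)
```

One practical caveat: CPython's sort (Timsort) detects the pre-existing sorted runs in the concatenation, so its real comparison count on this input can be well below \(N \log N\); the worst-case asymptotics, and the \(O(N)\) space and non-laziness objections, still stand.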

Analysis

Time Complexity: \(O(N \log N)\)

Space Complexity: \(O(N)\) (must store all elements)

Comparison to Lower Bound: \[O(N \log N) \gg \Omega(N \log k) \quad \text{when } k \ll N\]

Achieves bound? NO - Sub-optimal. Doesn’t exploit pre-sorted property of inputs. Violates lazy evaluation and space constraints.

Candidate 8: Cache-Oblivious Funnelsort

Source: Frigo, Leiserson, Prokop, and Ramachandran (1999)

Algorithm Idea

A cache-oblivious sorting algorithm designed for external memory:

- recursively splits the input and merges sorted runs through buffered merge trees (\(k\)-mergers or "funnels") that produce output in large batches
- achieves optimal cache complexity without knowing the block size \(B\) or memory size \(M\)

Analysis

Time Complexity: \(O(N \log N)\) comparisons (for full sort, not just k-way merge)

I/O Complexity: \(O(\frac{N}{B} \log_{M/B} \frac{N}{B})\) where \(B\) = block size, \(M\) = memory size

Relevance to Our Problem:

- the \(k\)-merger at funnelsort's core is itself a cache-efficient \(k\)-way merge, but it emits output in large batches from internal buffers, which is incompatible with element-at-a-time lazy iteration
- those buffers require super-linear space in \(k\), exceeding our \(O(k)\) target

Achieves bound? NO - Different problem domain (external memory sorting vs lazy k-way merge)

Note: Interesting for Stage 7 related work discussion

Summary of Candidate Algorithms

Comparison of candidate algorithms

  Algorithm      Time               Space      Lazy?   Optimal?   Source
  -------------  -----------------  ---------  ------  ---------  ---------------
  Linear Scan    \(O(Nk)\)          \(O(k)\)   Yes     No         Baseline
  Binary Heap    \(O(N \log k)\)    \(O(k)\)   Yes     Yes        CLRS/TAOCP
  Winner Tree    \(O(N \log k)\)    \(O(k)\)   Yes     Yes        TAOCP §5.4.1
  Loser Tree     \(O(N \log k)\)    \(O(k)\)   Yes     Yes        TAOCP §5.4.1
  D-ary Heap     \(O(N \log k)\)    \(O(k)\)   Yes     Yes        CLRS Prob 6-2
  Pairwise       \(O(N \log k)\)    \(O(N)\)   No      No         Divide-conquer
  Collect-Sort   \(O(N \log N)\)    \(O(N)\)   No      No         Naive
  Funnelsort     \(O(N \log N)\)    Varies     No      No         Frigo et al.

Constant Factor Analysis

For the four optimal candidates:

Constant factors for optimal algorithms

  Algorithm          Comparisons per next()                 Memory Pattern
  -----------------  -------------------------------------  --------------------------
  Binary Heap        \(2 \log_2 k\)                         Array (excellent cache)
  Winner Tree        \(\log_2 k\)                           Pointer-based (poor cache)
  Loser Tree         \(\log_2 k\)                           Pointer-based (poor cache)
  D-ary Heap (d=4)   \(3 \log_4 k \approx 1.5 \log_2 k\)    Array (excellent cache)

Observations:

- winner and loser trees perform the fewest comparisons per next() (\(\log_2 k\)); the binary heap performs roughly twice as many
- the \(d\)-ary heap (\(d=4\)) sits in between on comparisons and, like the binary heap, benefits from a contiguous array layout
- which candidate wins in practice therefore depends on cache effects as much as comparison counts; this is deferred to Stage 3
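Comparison counts can be measured empirically with a small counting wrapper. This is an illustrative harness (all names are ours, not from a library); it pits the binary heap against the linear scan and confirms the heap does far fewer comparisons once \(k\) is large.

```python
import heapq

class Counted:
    """Value wrapper that counts every order comparison globally."""
    count = 0
    __slots__ = ('v',)
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        Counted.count += 1
        return self.v < other.v

def heap_merge_lists(runs):
    """Binary-heap k-way merge; the run index breaks ties between equal values."""
    h = [(r[0], i, 0) for i, r in enumerate(runs) if r]
    heapq.heapify(h)
    out = []
    while h:
        v, i, j = heapq.heappop(h)
        out.append(v)
        if j + 1 < len(runs[i]):
            heapq.heappush(h, (runs[i][j + 1], i, j + 1))
    return out

def linear_merge_lists(runs):
    """Linear-scan k-way merge: scan every live head per output element."""
    idx, out = [0] * len(runs), []
    for _ in range(sum(len(r) for r in runs)):
        m = -1
        for i, r in enumerate(runs):
            if idx[i] < len(r) and (m < 0 or r[idx[i]] < runs[m][idx[m]]):
                m = i
        out.append(runs[m][idx[m]])
        idx[m] += 1
    return out

def comparisons(merge_fn, raw_runs):
    """Run merge_fn on Counted-wrapped inputs and return the comparison count."""
    Counted.count = 0
    out = merge_fn([[Counted(v) for v in r] for r in raw_runs])
    assert [c.v for c in out] == sorted(v for r in raw_runs for v in r)
    return Counted.count
```

On 64 runs of 16 elements each, the linear scan performs on the order of \(N(k-1)\) comparisons while the heap stays near \(2N \log_2 k\), matching the table above in shape if not in exact constants.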

Conclusion

Discovery: Four algorithm families achieve \(\Omega(N \log k)\) lower bound:

  1. Binary min-heap (standard, simple, good cache behavior)

  2. Winner tournament tree (fewer comparisons, pointer-based)

  3. Loser tournament tree (Knuth preferred, production-proven, simpler refill)

  4. D-ary heap (trade-off parameter \(d\), d=4 common)

Linear scan remains competitive for small \(k\) (roughly \(k \leq 8\)) despite its poor asymptotics.

Open Question: Which is better in practice? (heap vs loser tree vs d-ary heap)

This constant factor analysis is deferred to Stage 3: Design Selection.