Collating Iterator: Lower Bound Analysis

Research Artifact - Stage 2A

October 2025

Research Question

From Stage 1: “What is the minimum number of comparisons required for any comparison-based k-way merge algorithm?”

Problem Setup

We have:

\(k\) sorted sequences (iterators)
\(N = \sum_{i=0}^{k-1} n_i\) total elements across all sequences
Each sequence \(i\) has \(n_i\) elements
Goal: Merge into single sorted sequence
Model: Comparison-based (can only use compareTo() operations)

Lower Bound via Decision Tree

Decision Tree Model

Any comparison-based algorithm can be represented as a decision tree:

Internal nodes: comparisons between elements
Leaves: output permutations
Path from root to leaf: sequence of comparisons determining output order

Counting Possible Outputs

How many possible merged outputs exist?

Lemma 1 (Output Permutations). The number of distinct merged sequences is the multinomial coefficient: \[\binom{N}{n_0, n_1, \ldots, n_{k-1}} = \frac{N!}{n_0! \cdot n_1! \cdot \ldots \cdot n_{k-1}!}\]

Proof. We must choose:

Which \(n_0\) of \(N\) positions for elements from sequence 0
Which \(n_1\) of remaining \(N-n_0\) positions for sequence 1
And so on...

Since elements within each sequence maintain relative order, only the interleaving matters. \(\square\) ◻

Decision Tree Depth

Theorem 1 (Comparison Lower Bound). Any comparison-based k-way merge algorithm requires at least \[\log_2 \binom{N}{n_0, n_1, \ldots, n_{k-1}}\] comparisons in the worst case.

Proof. The decision tree must have at least \(\binom{N}{n_0, \ldots, n_{k-1}}\) leaves (one per possible output).

A binary tree of depth \(d\) has at most \(2^d\) leaves.

Therefore: \[2^d \geq \binom{N}{n_0, n_1, \ldots, n_{k-1}}\]

Taking \(\log_2\) of both sides: \[d \geq \log_2 \binom{N}{n_0, n_1, \ldots, n_{k-1}}\]

The depth \(d\) is the number of comparisons in the worst case. \(\square\) ◻

Simplified Bounds

Equal-Length Sequences

When \(n_i = N/k\) for all \(i\) (equal-length sequences):

Theorem 2 (Equal-Length Lower Bound). For \(k\) sequences of equal length \(N/k\), any comparison-based merge requires \[\Omega(N \log k)\] comparisons.

Proof. \[\begin{aligned} \log_2 \binom{N}{N/k, \ldots, N/k} &= \log_2 \frac{N!}{(N/k!)^k} \\ &= \log_2 N! - k \log_2(N/k!) \\ &\approx N \log N - k \cdot \frac{N}{k} \log \frac{N}{k} \quad \text{(Stirling)} \\ &= N \log N - N(\log N - \log k) \\ &= N \log k \end{aligned}\]

Therefore, \(\Omega(N \log k)\) comparisons required. \(\square\) ◻

Worst-Case Bound

Even without equal lengths:

Theorem 3 (General Lower Bound). For any distribution of \(N\) elements across \(k\) sequences, a comparison-based merge requires at least \(\Omega(N \log k)\) comparisons.

Proof Sketch. At each output position, we must determine which of \(k\) sequences supplies the next element.

This is a k-way decision requiring \(\log_2 k\) comparisons per position (in general).

With \(N\) positions, total: \(\Omega(N \log k)\) comparisons. \(\square\) ◻

Intuitive Argument

Another way to see this:

Per-Element Cost: For each element output, we must identify the minimum among \(k\) candidates (one from each non-empty sequence).

Finding Minimum: Finding the minimum of \(k\) elements requires \(\log_2 k\) comparisons in a comparison-based model (this is the selection problem).

Total Cost: \(N\) elements \(\times\) \(\log_2 k\) comparisons per element \(= \Omega(N \log k)\).

Tightness

This bound is tight (achievable) because:

Sorting \(N\) elements requires \(\Omega(N \log N)\) comparisons
K-way merge is easier: we know relative order within each sequence
The \(\Omega(N \log k)\) bound captures exactly the uncertainty from interleaving \(k\) sequences
We will show in subsequent analysis that algorithms achieving this bound exist

Space Lower Bound

Question from Stage 1: Can we achieve better than \(O(N)\) auxiliary space?

Answer: Yes! We only need to track progress through the \(k\) iterators.

Theorem 4 (Space Lower Bound). Merging \(k\) sorted iterators requires at least \(\Omega(k)\) auxiliary space.

Proof. We must maintain:

State for each of \(k\) iterators (e.g., current position)
At minimum, \(k\) references/pointers

Therefore, \(\Omega(k)\) space required.

Since \(k \ll N\) typically, this is much better than \(O(N)\). \(\square\) ◻

Summary of Lower Bounds

Resource	Lower Bound
Comparisons (worst-case)	\(\Omega(N \log k)\)
Comparisons (average-case)	\(\Omega(N \log k)\)
Time (assuming \(O(1)\) comparisons)	\(\Omega(N \log k)\)
Auxiliary Space	\(\Omega(k)\)

Key Insight: Any algorithm achieving \(O(N \log k)\) time and \(O(k)\) space is asymptotically optimal for this problem.

What’s Next?

Stage 2B will explore which algorithms (if any) achieve these lower bounds.