Collating Iterator: Lower Bound Analysis

Research Artifact - Stage 2A

October 2025

Research Question

From Stage 1: “What is the minimum number of comparisons required for any comparison-based k-way merge algorithm?”

Problem Setup

We have:

Lower Bound via Decision Tree

Decision Tree Model

Any comparison-based algorithm can be represented as a decision tree:

Counting Possible Outputs

How many possible merged outputs exist?

Lemma 1 (Output Permutations). The number of distinct merged sequences is the multinomial coefficient: \[\binom{N}{n_0, n_1, \ldots, n_{k-1}} = \frac{N!}{n_0! \cdot n_1! \cdot \ldots \cdot n_{k-1}!}\]

Proof. We must choose:

Since elements within each sequence maintain relative order, only the interleaving matters. \(\square\) ◻

Decision Tree Depth

Theorem 1 (Comparison Lower Bound). Any comparison-based k-way merge algorithm requires at least \[\log_2 \binom{N}{n_0, n_1, \ldots, n_{k-1}}\] comparisons in the worst case.

Proof. The decision tree must have at least \(\binom{N}{n_0, \ldots, n_{k-1}}\) leaves (one per possible output).

A binary tree of depth \(d\) has at most \(2^d\) leaves.

Therefore: \[2^d \geq \binom{N}{n_0, n_1, \ldots, n_{k-1}}\]

Taking \(\log_2\) of both sides: \[d \geq \log_2 \binom{N}{n_0, n_1, \ldots, n_{k-1}}\]

The depth \(d\) is the number of comparisons in the worst case. \(\square\) ◻

Simplified Bounds

Equal-Length Sequences

When \(n_i = N/k\) for all \(i\) (equal-length sequences):

Theorem 2 (Equal-Length Lower Bound). For \(k\) sequences of equal length \(N/k\), any comparison-based merge requires \[\Omega(N \log k)\] comparisons.

Proof. \[\begin{aligned} \log_2 \binom{N}{N/k, \ldots, N/k} &= \log_2 \frac{N!}{(N/k!)^k} \\ &= \log_2 N! - k \log_2(N/k!) \\ &\approx N \log N - k \cdot \frac{N}{k} \log \frac{N}{k} \quad \text{(Stirling)} \\ &= N \log N - N(\log N - \log k) \\ &= N \log k \end{aligned}\]

Therefore, \(\Omega(N \log k)\) comparisons required. \(\square\) ◻

Worst-Case Bound

Even without equal lengths:

Theorem 3 (General Lower Bound). For any distribution of \(N\) elements across \(k\) sequences, a comparison-based merge requires at least \(\Omega(N \log k)\) comparisons.

Proof Sketch. At each output position, we must determine which of \(k\) sequences supplies the next element.

This is a k-way decision requiring \(\log_2 k\) comparisons per position (in general).

With \(N\) positions, total: \(\Omega(N \log k)\) comparisons. \(\square\) ◻

Intuitive Argument

Another way to see this:

Per-Element Cost: For each element output, we must identify the minimum among \(k\) candidates (one from each non-empty sequence).

Finding Minimum: Finding the minimum of \(k\) elements requires \(\log_2 k\) comparisons in a comparison-based model (this is the selection problem).

Total Cost: \(N\) elements \(\times\) \(\log_2 k\) comparisons per element \(= \Omega(N \log k)\).

Tightness

This bound is tight (achievable) because:

Space Lower Bound

Question from Stage 1: Can we achieve better than \(O(N)\) auxiliary space?

Answer: Yes! We only need to track progress through the \(k\) iterators.

Theorem 4 (Space Lower Bound). Merging \(k\) sorted iterators requires at least \(\Omega(k)\) auxiliary space.

Proof. We must maintain:

Therefore, \(\Omega(k)\) space required.

Since \(k \ll N\) typically, this is much better than \(O(N)\). \(\square\) ◻

Summary of Lower Bounds

Resource Lower Bound
Comparisons (worst-case) \(\Omega(N \log k)\)
Comparisons (average-case) \(\Omega(N \log k)\)
Time (assuming \(O(1)\) comparisons) \(\Omega(N \log k)\)
Auxiliary Space \(\Omega(k)\)

Key Insight: Any algorithm achieving \(O(N \log k)\) time and \(O(k)\) space is asymptotically optimal for this problem.

What’s Next?

Stage 2B will explore which algorithms (if any) achieve these lower bounds.