Collating Iterator: Formal Specification

Research Artifact

October 2025

Problem Statement

Design and implement CollatingIterator<T>, an iterator that merges \(k\) sorted input iterators into a single sorted output stream, where \(T\) implements Comparable<T>.

Inputs

Outputs

Invariants

Let \(I = \{I_0, I_1, \ldots, I_{k-1}\}\) be the set of input iterators and \(O\) be the sequence of elements yielded so far.

  1. Sorted Output Prefix: Elements yielded so far form non-decreasing sequence \[\forall i < j \text{ where both in } O: O[i] \leq O[j]\]

  2. Conservation: Total elements yielded equals elements consumed from inputs \[|O| = \sum_{i=0}^{k-1} |\texttt{consumed}(I_i)|\]

  3. Completeness: Output exhausted if and only if all inputs exhausted \[\neg \texttt{hasNext()} \Leftrightarrow \forall i \in [0, k): \neg I_i.\texttt{hasNext()}\]

  4. Uniqueness: Each element from each input appears exactly once in output \[\texttt{multiset}(O \cup \texttt{remaining}) = \bigcup_{i=0}^{k-1} \texttt{multiset}(I_i)\] where \(\texttt{remaining}\) is the multiset of elements not yet yielded.

  5. Laziness: No element consumed from any input until output element requested \[\texttt{consumed}(I_i) \text{ monotonically increases only when } \texttt{next()} \text{ called}\]

Assumptions

  1. Total Order: compareTo() defines a total order (transitive, antisymmetric, complete)

  2. Consistency: compareTo() is consistent with equals() (though duplicates across streams allowed)

  3. No Exceptions: Input iterators do not throw exceptions during normal traversal

  4. No Concurrency: No concurrent modification of underlying collections during iteration

  5. Idempotent hasNext(): Multiple calls to hasNext() without intervening next() return same result

  6. Non-null Elements: \(T\) elements are non-null (or nulls handled consistently by comparator)

Contract

Preconditions

Postconditions

Exceptions

Research Questions

This specification defines the problem. The following questions guide the analysis and design phases:

Theoretical Complexity

  1. Lower Bound: What is the minimum number of comparisons required for any comparison-based k-way merge algorithm?

  2. Time Complexity: What is the optimal asymptotic time complexity achievable? Express in terms of \(N = \sum_{i=0}^{k-1} |I_i|\) (total elements) and \(k\) (number of iterators).

  3. Space Complexity: What is the minimum auxiliary space required? Can we achieve better than \(O(N)\)?

  4. Per-Operation Costs: What are the costs of hasNext() and next() in an optimal implementation?

Design Alternatives

  1. What data structures can support k-way merge?

  2. What are the trade-offs between alternatives (comparisons, memory access patterns, implementation complexity)?

  3. Are there crossover points where one approach becomes preferable (e.g., based on \(k\) size)?

Constant Factors (“Can We Do Better?”)

Given an asymptotically optimal algorithm:

  1. How many comparisons per element in practice (not just big-O)?

  2. What are the cache behavior characteristics?

  3. What are the constant factor differences between theoretically equivalent approaches?

  4. Can we improve constants without changing asymptotic complexity?

Implementation Considerations

  1. How do constant factors vary across languages/platforms (Java, C++, Rust)?

  2. What are the practical performance characteristics for typical \(k\) values (2, 10, 100, 1000)?

  3. When do naive approaches (e.g., linear scan) outperform sophisticated algorithms?

Goal

The analysis phase should answer these questions rigorously, comparing alternatives and justifying design choices with both asymptotic analysis and empirical measurement.