October 2025
Design and implement CollatingIterator<T>, an
iterator that merges \(k\) sorted input
iterators into a single sorted output stream, where \(T\) implements
Comparable<T>.
Input Type: \(\texttt{Collection<Iterator<T>>}\) where \(\texttt{T extends Comparable<T>}\)
Constraints:
Each input iterator yields elements in non-decreasing order per
compareTo()
\(k \geq 0\) iterators (empty collection yields empty output)
Each iterator is single-pass, forward-only
No iterator in collection is null
Output Type: \(\texttt{Iterator<T>}\), implementing the standard Java Iterator interface
Guarantees:
Elements yielded in globally non-decreasing order per
compareTo()
All elements from all input iterators appear exactly once in output
Lazy evaluation: no element retrieved until next()
called
Output exhausted \(\Leftrightarrow\) all input iterators exhausted
Let \(I = \{I_0, I_1, \ldots, I_{k-1}\}\) be the set of input iterators and \(O\) be the sequence of elements yielded so far.
Sorted Output Prefix: Elements yielded so far form a non-decreasing sequence \[\forall i < j \text{ where both in } O: O[i] \leq O[j]\]
Conservation: Total elements yielded equals elements consumed from inputs \[|O| = \sum_{i=0}^{k-1} |\texttt{consumed}(I_i)|\]
Completeness: Output exhausted if and only if all inputs exhausted \[\neg \texttt{hasNext()} \Leftrightarrow \forall i \in [0, k): \neg I_i.\texttt{hasNext()}\]
Uniqueness: Each element from each input appears exactly once in output \[\texttt{multiset}(O) \uplus \texttt{remaining} = \biguplus_{i=0}^{k-1} \texttt{multiset}(I_i)\] where \(\texttt{remaining}\) is the multiset of elements not yet yielded.
Laziness: No element consumed from any input until output element requested \[\texttt{consumed}(I_i) \text{ monotonically increases only when } \texttt{next()} \text{ called}\]
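These invariants can be pinned down by a non-lazy reference oracle: concatenate all inputs and sort. This is not a valid implementation (it violates the Laziness invariant), but it fixes the intended multiset and ordering for testing purposes; the class and method names below are illustrative, not part of the specification.

```java
import java.util.*;

// Non-lazy reference model: concatenating and sorting yields the multiset
// union of all inputs in non-decreasing order. Useful only as a test oracle
// for the Sorted Output Prefix, Conservation, and Uniqueness invariants;
// it consumes everything eagerly, so it is not a valid CollatingIterator.
final class ReferenceMerge {
    static <T extends Comparable<T>> List<T> oracle(List<List<T>> inputs) {
        List<T> all = new ArrayList<>();
        for (List<T> in : inputs) all.addAll(in);   // multiset union: duplicates kept
        Collections.sort(all);                      // non-decreasing per compareTo()
        return all;
    }
}
```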
Total Order: compareTo() defines a total order (transitive, antisymmetric, and total: any two elements are comparable)
Consistency: compareTo() is consistent with equals() (though duplicates across streams are allowed)
No Exceptions: Input iterators do not throw exceptions during normal traversal
No Concurrency: No concurrent modification of underlying collections during iteration
Idempotent hasNext(): Multiple calls to hasNext() without an intervening next() return the same result
Non-null Elements: \(T\) elements are non-null (or nulls handled consistently by comparator)
Each iterator \(I_i\) is sorted: \(\forall j < j': I_i[j] \leq I_i[j']\)
No iterator in input collection is null
Input collection itself is non-null
Elements are mutually comparable (no
ClassCastException)
Output contains exactly the multiset union of all input elements: \[\texttt{multiset}(\texttt{output}) = \biguplus_{i=0}^{k-1} \texttt{multiset}(I_i)\]
Output is sorted: \(\forall j < j': \texttt{output}[j] \leq \texttt{output}[j']\)
All input iterators exhausted when output exhausted: \[\neg \texttt{hasNext()} \Leftrightarrow \forall i: \neg I_i.\texttt{hasNext()}\]
NullPointerException: if input collection is null or contains a null iterator
ClassCastException: if elements are not mutually
comparable
NoSuchElementException: if next()
called when hasNext() returns false
UnsupportedOperationException: if
remove() called (optional operation)
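Under the assumptions above, one hedged sketch satisfying the guarantees is the classic heap-based merge: a min-heap of at most \(k\) nodes, each caching the current head of one input. Note the one-element lookahead in init(): each input's first element is pulled on first use of the output stream, a common mild relaxation of strict per-next() laziness.

```java
import java.util.*;

// Sketch of a heap-based CollatingIterator. Each still-active input iterator
// is wrapped in a Node caching its current head element; a PriorityQueue
// keyed on that head selects the global minimum in O(log k) per next().
final class CollatingIterator<T extends Comparable<T>> implements Iterator<T> {

    private static final class Node<T> {
        T head;                 // smallest not-yet-yielded element of this input
        final Iterator<T> it;   // remaining elements of this input
        Node(T head, Iterator<T> it) { this.head = head; this.it = it; }
    }

    private final Collection<Iterator<T>> inputs;
    private final PriorityQueue<Node<T>> heap;
    private boolean initialized = false;   // defers consumption until first use

    CollatingIterator(Collection<Iterator<T>> inputs) {
        this.inputs = Objects.requireNonNull(inputs);  // NPE on null collection
        this.heap = new PriorityQueue<>(Comparator.comparing((Node<T> n) -> n.head));
    }

    // Pull one head per non-empty input on first use of the output stream
    // (a common relaxation of strict laziness).
    private void init() {
        if (initialized) return;
        for (Iterator<T> it : inputs) {
            Objects.requireNonNull(it);                // NPE on null iterator
            if (it.hasNext()) heap.add(new Node<>(it.next(), it));
        }
        initialized = true;
    }

    @Override public boolean hasNext() { init(); return !heap.isEmpty(); }

    @Override public T next() {
        init();
        Node<T> min = heap.poll();
        if (min == null) throw new NoSuchElementException();
        T result = min.head;
        if (min.it.hasNext()) {        // re-insert with this input's next head
            min.head = min.it.next();
            heap.add(min);
        }
        return result;
    }
    // remove() is inherited as Iterator's default, which throws
    // UnsupportedOperationException, matching the optional-operation clause.
}
```

The design choice to re-insert the polled node rather than allocate a fresh one keeps next() allocation-free after initialization.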
This specification defines the problem. The following questions guide the analysis and design phases:
Lower Bound: What is the minimum number of comparisons required for any comparison-based k-way merge algorithm?
Time Complexity: What is the optimal asymptotic time complexity achievable? Express in terms of \(N = \sum_{i=0}^{k-1} |I_i|\) (total elements) and \(k\) (number of iterators).
Space Complexity: What is the minimum auxiliary space required? Can we achieve better than \(O(N)\)?
Per-Operation Costs: What are the costs of
hasNext() and next() in an optimal
implementation?
What data structures can support k-way merge?
What are the trade-offs between alternatives (comparisons, memory access patterns, implementation complexity)?
Are there crossover points where one approach becomes preferable (e.g., based on \(k\) size)?
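As one concrete data point for these trade-off questions, the naive linear-scan alternative is sketched below over in-memory lists (a simplification: the real problem has single-pass iterators, and the names here are illustrative). It costs \(O(k)\) comparisons per output element versus \(O(\log k)\) for a heap, but its tight, allocation-free loop can win for small \(k\).

```java
import java.util.*;

// Linear-scan k-way merge: select the minimum head by scanning all k cursors.
// O(N * k) comparisons total versus O(N log k) for a heap, but with no heap
// bookkeeping -- a plausible crossover candidate for small k.
final class ScanMerge {
    static <T extends Comparable<T>> List<T> merge(List<List<T>> inputs) {
        int k = inputs.size();
        int[] pos = new int[k];            // cursor into each input list
        List<T> out = new ArrayList<>();
        while (true) {
            int argmin = -1;
            for (int i = 0; i < k; i++) {  // linear scan for the smallest head
                if (pos[i] == inputs.get(i).size()) continue;   // exhausted input
                T head = inputs.get(i).get(pos[i]);
                if (argmin < 0 || head.compareTo(inputs.get(argmin).get(pos[argmin])) < 0)
                    argmin = i;
            }
            if (argmin < 0) return out;    // all inputs exhausted
            out.add(inputs.get(argmin).get(pos[argmin]++));
        }
    }
}
```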
Given an asymptotically optimal algorithm:
How many comparisons per element in practice (not just big-O)?
What are the cache behavior characteristics?
What are the constant factor differences between theoretically equivalent approaches?
Can we improve constants without changing asymptotic complexity?
How do constant factors vary across languages/platforms (Java, C++, Rust)?
What are the practical performance characteristics for typical \(k\) values (2, 10, 100, 1000)?
When do naive approaches (e.g., linear scan) outperform sophisticated algorithms?
The analysis phase should answer these questions rigorously, comparing alternatives and justifying design choices with both asymptotic analysis and empirical measurement.
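One way to make the comparison-count questions empirical rather than asymptotic is to instrument the comparator. The sketch below (an illustrative measurement harness, not part of the specification) counts compareTo() calls performed by a PriorityQueue-based k-way merge:

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicLong;

// Instrumented merge: counts comparisons performed by a PriorityQueue-based
// k-way merge, so comparisons-per-element can be measured rather than
// estimated from big-O bounds. Heap entries are {inputIndex, position} pairs.
final class CountingMerge {
    static <T extends Comparable<T>> long countComparisons(List<List<T>> inputs) {
        AtomicLong comparisons = new AtomicLong();
        Comparator<int[]> byHead = (a, b) -> {
            comparisons.incrementAndGet();   // tally every compareTo() the heap makes
            return inputs.get(a[0]).get(a[1]).compareTo(inputs.get(b[0]).get(b[1]));
        };
        PriorityQueue<int[]> heap = new PriorityQueue<>(byHead);
        for (int i = 0; i < inputs.size(); i++)
            if (!inputs.get(i).isEmpty()) heap.add(new int[]{i, 0});
        while (!heap.isEmpty()) {            // drain: poll the minimum, advance its cursor
            int[] top = heap.poll();
            if (top[1] + 1 < inputs.get(top[0]).size())
                heap.add(new int[]{top[0], top[1] + 1});
        }
        return comparisons.get();
    }
}
```

The same harness, swept over \(k \in \{2, 10, 100, 1000\}\), would supply the per-element comparison counts the questions above ask for.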