LeetCode MEDIUM 444 Sequence Reconstruction Summary

Generated by Codex with GPT-5

Quick facts

Difficulty: MEDIUM
Problem: Sequence Reconstruction
Topics: Array, Graph Theory, Topological Sort

What the problem is asking

The input gives a target sequence nums and a list of shorter sequences. Each shorter sequence says, “these numbers must appear in this relative order.”

The task is not just to ask whether nums can be built from those constraints. It asks whether nums is the only shortest sequence that satisfies every constraint. If another sequence of the same shortest length can also satisfy the constraints, the answer is False.

That makes this a uniqueness problem. The constraints form a directed graph: whenever two numbers appear next to each other in one of the shorter sequences, the first must come before the second. The question becomes:

Can this graph have exactly one topological ordering, and is that ordering exactly nums?

The core idea

A topological sort repeatedly chooses a node with zero incoming edges. A zero-indegree node has no remaining prerequisite, so it can legally be placed next.

For ordinary topological sorting, several zero-indegree nodes are fine because any of them can come next. For this problem, several choices are a problem. If there are two available nodes at the same time, then the reconstruction is not unique.

So the algorithm is:

Build a directed graph from adjacent pairs in sequences.
Count each value’s indegree.
Start with all values that currently have indegree zero.
At every step, require exactly one available value.
Require that this one value is the next value in nums.
Remove it from the graph and continue.

If the process consumes all values in nums, the constraints force exactly that sequence. If the process gets stuck, branches, or produces a different next value, the reconstruction is not unique or not equal to nums.

How to derive it in an interview

Start by translating every local ordering fact into an edge. For example, the subsequence [1, 3, 5] contributes 1 -> 3 and 3 -> 5. Longer-distance facts such as 1 before 5 are implied by those adjacent edges, so adjacent pairs are enough.

Then focus on the first element of the reconstructed sequence. If the constraints uniquely determine the answer, there must be exactly one value that can appear first. In graph terms, there must be exactly one zero-indegree node. If there are zero, the constraints contain a cycle. If there are two or more, either one could start a valid reconstruction, so the answer is not unique.

After choosing that one forced value, remove its outgoing edges. The same reasoning applies to the next position. This gives a direct way to compare the forced topological order against nums as the algorithm runs, instead of building a result first and checking later.

Python solution

from collections import deque
from typing import Sequence


def build_precedence_graph(
    nums: Sequence[int],
    sequences: Sequence[Sequence[int]],
) -> tuple[dict[int, set[int]], dict[int, int], bool]:
    """
    Build the graph implied by adjacent pairs in sequences.

    Returns (graph, indegree, is_valid_input). Duplicate edges are ignored so
    repeated evidence for the same ordering does not inflate indegrees.
    """
    allowed_values = set(nums)
    graph: dict[int, set[int]] = {value: set() for value in nums}
    indegree: dict[int, int] = {value: 0 for value in nums}
    seen_values: set[int] = set()

    for sequence in sequences:
        for value in sequence:
            if value not in allowed_values:
                return graph, indegree, False
            seen_values.add(value)

        for before, after in zip(sequence, sequence[1:]):
            if before == after:
                return graph, indegree, False

            if after not in graph[before]:
                graph[before].add(after)
                indegree[after] += 1

    if seen_values != allowed_values:
        return graph, indegree, False

    return graph, indegree, True


def is_unique_reconstruction(
    nums: Sequence[int],
    sequences: Sequence[Sequence[int]],
) -> bool:
    """
    Return True only when nums is the one forced shortest supersequence.

    The check is Kahn's topological sort with an extra uniqueness rule:
    each step must have exactly one available zero-indegree value, and that
    value must match the next target value in nums.
    """
    if not nums:
        return not any(sequences)

    graph, indegree, is_valid_input = build_precedence_graph(nums, sequences)
    if not is_valid_input:
        return False

    available_values = deque(
        value for value in nums if indegree[value] == 0
    )

    for expected_value in nums:
        if len(available_values) != 1:
            return False

        current_value = available_values.popleft()
        if current_value != expected_value:
            return False

        for next_value in graph[current_value]:
            indegree[next_value] -= 1
            if indegree[next_value] == 0:
                available_values.append(next_value)

    return True


class Solution:
    def sequenceReconstruction(
        self,
        nums: list[int],
        sequences: list[list[int]],
    ) -> bool:
        """LeetCode entry point."""
        return is_unique_reconstruction(nums, sequences)

Why this works

The graph stores every direct “must come before” relationship from the input sequences. A topological ordering of that graph is a sequence that respects all those relationships.

The key invariant is that available_values contains exactly the remaining values whose prerequisites have all been satisfied. If it is empty, no next value can be chosen. If it has more than one value, there is more than one legal next choice, so the reconstruction is not unique. If it has exactly one value, that value is forced.

The algorithm compares each forced value with the corresponding value in nums. If the forced value ever differs, the constraints may define a sequence, but not the target sequence. If every forced value matches and every value is consumed, the graph has exactly one topological order and that order is nums.

The seen_values check matters because missing values cannot be forced by the sequences. If a value from nums never appears in any sequence, it could be omitted from a shorter common supersequence, so nums is not the unique shortest reconstruction.

The time complexity is O(n + L), where n is len(nums) and L is the total number of values across all sequences. Each adjacent pair is inspected once, and each graph edge is processed once. The space complexity is O(n + E), where E is the number of distinct adjacent ordering edges.

Interview follow-ups

Can this be solved without an explicit topological sort?

Yes, if the interviewer guarantees that every sequence is already a valid subsequence of nums. Under that stronger condition, it is enough to verify that every adjacent pair in nums appears as an adjacent pair somewhere in sequences, and that every value in nums appears at least once.

This works because the pairs (nums[0], nums[1]), (nums[1], nums[2]), and so on force a single chain through the whole target sequence. Once that chain is fully present, no other order can satisfy all those pair constraints. The tradeoff is that this shortcut depends on the input guarantee. The topological-sort version is more defensive because it also detects contradictions and invalid values.

What if duplicate sequences or duplicate adjacent pairs appear?

Duplicate evidence should not change the graph. If 1 -> 2 appears ten times, it is still one ordering constraint. That is why the implementation stores each adjacency list as a set and increments indegree[2] only when the edge is first inserted.

Without this deduplication, the algorithm could count the same prerequisite multiple times and fail to release a node when it should. The memory cost is the set of distinct edges, which is appropriate because the logical graph itself also has that many edges.

What if the interviewer asks for the reconstructed sequence instead of a boolean?

Run the same unique topological sort, but append each forced value to an answer list. If the queue ever has anything other than one value, return an empty list or raise an ambiguity error depending on the API. If the process finishes, return the answer.

This is essentially the same algorithm because the boolean problem is already reconstructing the sequence one forced value at a time. Returning the sequence does not change the asymptotic complexity.

How would the answer change if uniqueness were not required?

If uniqueness is not required, the queue may contain multiple zero-indegree values. The algorithm can choose any of them and still produce a valid topological ordering, as long as it eventually processes every node.

The success condition changes from “the queue size must be exactly one at every step” to “the graph must not contain a cycle.” Kahn’s algorithm can detect that by counting how many nodes were processed. If fewer than n nodes are processed, some cycle prevented the remaining nodes from ever reaching indegree zero.

How would this scale for very large input?

The important scaling choice is to avoid expanding implied relationships. A sequence of length k should contribute only k - 1 adjacent edges, not every pair of values inside the sequence. Adjacent edges already preserve the sequence’s full relative order through transitivity.

For very large inputs, the graph can be built incrementally while streaming sequences, with the same deduplicated edge sets and indegree map. The algorithm still needs the final zero-indegree queue before reconstruction begins, so it remains linear in the total input size, but it does not need quadratic pair generation.

Practical takeaway

Sequence Reconstruction is a topological-sort uniqueness problem. The usual topological-sort queue tells more than just whether an order exists: its size tells whether the next element is forced. If exactly one value is available at every step, and those forced values match nums, then the shorter sequences determine nums uniquely.