LeetCode MEDIUM 442 Find All Duplicates in an Array Summary

Generated by Codex with GPT-5

Quick facts

Difficulty: MEDIUM
Problem: Find All Duplicates in an Array
Topics: Array, Hash Table, Sorting

Problem gist

The input is an array nums of length n. Every value is in the range 1 through n, and each value appears either once or twice. The task is to return every value that appears twice.

The obvious way is to keep a set of values already seen. If a number appears again, add it to the answer. That is simple and correct, but it uses O(n) extra space. Sorting also works, because duplicates become adjacent, but sorting costs O(n log n) time or mutates the input heavily.

The key constraint is the range: every value can point to a valid index. Value x maps to index x - 1. That means the array itself can act like the seen set.

Deriving the optimal approach

Think of each number as asking, “Have I visited my own slot before?”

For a value x, inspect nums[x - 1].

If that slot is positive, this is the first time x has been seen. Mark the slot by negating it.

If that slot is already negative, x was seen earlier. Because the problem guarantees each number appears at most twice, this second visit means x is a duplicate.

The only subtle point is that the scan mutates values that may be read later. So each step uses abs(value) before computing the index. The sign is metadata; the absolute value is still the original number.

This works because the input range creates a perfect mapping from values to positions. No hash table is needed because every possible key already has a home slot in the array.

Python solution

from typing import List


def find_duplicates_in_range(nums: List[int]) -> List[int]:
    """
    Return all values that appear twice in an array of length n where every
    value is in [1, n].

    The algorithm temporarily uses the sign of nums[value - 1] as a visited
    marker, then restores the input to avoid surprising callers outside
    LeetCode-style environments.
    """
    duplicates: List[int] = []

    for value in nums:
        original_value = abs(value)
        marker_index = original_value - 1

        if nums[marker_index] < 0:
            duplicates.append(original_value)
        else:
            nums[marker_index] = -nums[marker_index]

    for index, value in enumerate(nums):
        nums[index] = abs(value)

    return duplicates


class Solution:
    def findDuplicates(self, nums: List[int]) -> List[int]:
        return find_duplicates_in_range(nums)

The running time is O(n) because the array is scanned once to detect duplicates and once to restore signs. The extra working space is O(1) beyond the returned answer. The answer list itself is not counted as extra space in the usual LeetCode convention.

If the interviewer allows the input to remain mutated, the restore pass can be removed. The asymptotic complexity stays the same either way.

Why the sign trick is safe

The method depends on three guarantees.

First, every value is between 1 and n, so value - 1 is always a valid index.

Second, there are no zeroes. That matters because negating zero would not change its sign, so zero would make this marker technique ambiguous.

Third, every value appears once or twice. When a slot is already negative, the current value is not just “seen before”; it is exactly the duplicate occurrence the problem asks for. If a value could appear three or more times, the code would need a different rule for how many times to report it.

Common pitfalls

The most common bug is indexing with value - 1 directly after earlier values have been negated. Always use abs(value) - 1.

Another mistake is appending nums[marker_index] when a duplicate is found. That slot is negative metadata, not the duplicate value. Append original_value.

A third pitfall is forgetting that this solution mutates the array. In a pure LeetCode answer that is usually acceptable. In production-style code, restoring the array after the scan is a small cost that makes the helper safer to reuse.

Interview follow-ups

What if numbers could appear more than twice?

The sign-marking solution can still detect repeated visits, but it would append the same value on every visit after the first. If the required output is “all values that appear at least twice, reported once,” the array needs a marker with more than two states.

One in-place option is to add n to the bucket for each value instead of flipping signs. For each value x, increment nums[(x - 1) % n] by n. After the scan, the count of value i + 1 is nums[i] // n after accounting for the original value. Any count greater than one is a duplicate. This keeps O(n) time and O(1) extra space, but it is more error-prone and must be written carefully to avoid mixing original values with accumulated counts.

What if the input array must not be modified at all?

If no mutation is allowed, the cleanest answer is a hash set. Scan the array, store each seen value, and add a value to the result when it is encountered again. This is still O(n) time, but it uses O(n) extra space.

There is also a sorting-based approach if the caller permits sorting a copy: copy the array, sort it, then scan adjacent pairs. That avoids a hash set but still uses O(n) space for the copy and costs O(n log n) time. In most interviews, the hash set is the practical choice when immutability is strict.

What if values are not guaranteed to be in the range 1 through n?

The sign-marking idea no longer applies directly because a value might not map to a valid array index. The solution should switch to a general-purpose data structure such as a hash set or hash map.

If the values are arbitrary integers and the task is just to find duplicates, a set-based scan gives O(n) expected time and O(n) space. If the interviewer asks for all counts, use a frequency map instead. The tradeoff is that the algorithm becomes less space efficient, but it works without relying on special value-range constraints.

Can this be solved by cyclic sort?

Yes. Since each value belongs at index value - 1, cyclic sort repeatedly swaps each value toward its correct position. After the rearrangement, any index i where nums[i] != i + 1 reveals that nums[i] is a duplicate.

This also runs in O(n) time and O(1) extra space, but the implementation has more pointer movement and is easier to get wrong when duplicates block a value from reaching its home slot. The sign-marking method is usually the better interview implementation because the invariant is simpler: a negative slot means “this value has already been seen.”

How would the answer change if the array contained one duplicate and one missing number?

That is the “set mismatch” variant. Sign marking can find the duplicate exactly the same way. After the marking pass, the index that remains positive corresponds to the missing value, because that position was never visited by any number.

The algorithm still runs in O(n) time and O(1) extra space. The output changes from a list of duplicates to a pair: the duplicate value and the missing value. If the input must be restored, run the same final pass that converts every entry back to positive.

Takeaway

The problem is less about duplicate detection in general and more about exploiting the unusually strong constraints. Since every value already names a legal index, the array can store its own visited bits. Once that mapping is clear, the optimal solution becomes a short linear scan with constant extra working space.