LeetCode MEDIUM 287 Find the Duplicate Number Summary

Generated by Codex with GPT-5

Quick facts

Difficulty: MEDIUM
Problem: Find the Duplicate Number
Main tags: Array, Two Pointers, Binary Search

What the problem is really asking

The input has length n + 1, but every value is guaranteed to be in the range 1 through n.

That guarantee means one value must appear more than once. So the duplicate is not the hard part. The hard part is satisfying the interview constraints at the same time:

return the repeated value
do not modify the array
use only constant extra space

That immediately eliminates the most obvious ideas. A hash set is easy but uses O(n) extra space. Sorting is easy but changes the array unless a copy is made, which also costs O(n) space.

So the real question is this: what hidden structure inside the array can expose the duplicate without extra storage?

The key insight

Treat each index as a node, and treat nums[i] as the “next pointer” from that node.

For example, if nums[3] = 5, then node 3 points to node 5. Because every value is in 1..n, every pointer always lands on a valid node. Once the array is viewed this way, it behaves like a linked structure.

Now the important observation appears:

There are n + 1 nodes being followed through only n possible non-zero destinations. That forces a collision. Two different positions eventually point into the same path, and that merge creates a cycle. The duplicate value is exactly the entry point of that cycle.

That is why Floyd’s tortoise-and-hare algorithm is the optimal solution here. It detects a cycle in O(n) time and O(1) extra space, without modifying the array.

How to derive the optimal solution

The clean way to derive the answer is to ask a sequence of smaller questions.

First, what does a duplicate number mean structurally? It means at least two different indices point to the same next node.

Second, what happens if a path in a directed graph merges? After the merge, both paths share the same future. If the graph keeps routing inside a fixed set of nodes, a cycle must eventually appear.

Third, how does one locate the exact duplicate rather than just prove a cycle exists? Floyd’s algorithm solves that too:

Move a slow pointer one step at a time and a fast pointer two steps at a time until they meet somewhere inside the cycle.
Start one pointer back at 0 and leave the other at the meeting point.
Move both one step at a time. The node where they meet is the cycle entrance.
That entrance value is the duplicate number.

This works for the same reason it works in linked lists: the distance math forces the two pointers to meet at the cycle entry in phase two.

Best solution: Floyd’s cycle detection

This is the solution interviewers usually want when they include both “do not modify the array” and “constant extra space.”

Its strengths are simple:

O(n) time
O(1) extra space
no changes to the input

The only part that feels unusual at first is the modeling step. Once the array is reinterpreted as pointers, the rest is standard cycle detection.

Another useful solution: binary search on the value range

There is a second strong solution that is often easier to derive on the spot, even though it is not as fast asymptotically.

Pick a midpoint mid in the value range 1..n, then count how many numbers are <= mid.

If there were no duplicate in that lower half, at most mid values could fall there. If the count is larger than mid, the duplicate must be in the lower half. Otherwise it must be in the upper half.

That gives a binary search on values, not on indices. It uses O(1) extra space and does not modify the array, but it takes O(n log n) time because each binary-search step scans the whole array.

It is a good backup answer when the cycle trick is not immediately obvious.

Python solution

from typing import List


class Solution:
    def findDuplicate(self, numbers: List[int]) -> int:
        """
        Return the duplicated value without modifying the input array.

        The array is treated as a linked structure where index i points to
        index numbers[i]. The duplicate value is the entrance to the cycle.
        """
        intersection = self._find_cycle_intersection(numbers)
        return self._find_cycle_entrance(numbers, intersection)

    def _find_cycle_intersection(self, numbers: List[int]) -> int:
        slow_index = 0
        fast_index = 0

        # Phase 1: find any meeting point inside the cycle.
        while True:
            slow_index = numbers[slow_index]
            fast_index = numbers[numbers[fast_index]]

            if slow_index == fast_index:
                return slow_index

    def _find_cycle_entrance(self, numbers: List[int], intersection: int) -> int:
        seeker_from_start = 0
        seeker_from_cycle = intersection

        # Phase 2: walk from the start and from the intersection at the same
        # speed. They meet at the duplicate value, which is the cycle entry.
        while seeker_from_start != seeker_from_cycle:
            seeker_from_start = numbers[seeker_from_start]
            seeker_from_cycle = numbers[seeker_from_cycle]

        return seeker_from_start


def find_duplicate_via_value_binary_search(numbers: List[int]) -> int:
    """
    Alternative interview-friendly approach.

    Binary search the value range 1..n and use counting to decide which half
    must contain the duplicate.
    """
    left = 1
    right = len(numbers) - 1

    while left < right:
        midpoint = (left + right) // 2
        count_at_or_below_midpoint = sum(value <= midpoint for value in numbers)

        if count_at_or_below_midpoint > midpoint:
            right = midpoint
        else:
            left = midpoint + 1

    return left

The main Solution.findDuplicate method is the one that matches the strongest version of the problem. The binary-search helper is included because it is a realistic alternative an interviewer may ask about once the primary solution is done.

Interview follow-ups

Why does Floyd’s algorithm work on an array instead of a linked list?

The trick is that the array is only pretending to be an array problem. Each value acts like a next pointer, so the structure behaves exactly like a linked graph where every node has one outgoing edge. Since the values are constrained to 1..n, following pointers never leaves the valid range. The duplicate creates a merge, and that merge forces a cycle. Floyd’s algorithm only needs a deterministic “next” operation, so it works here just as well as it does on an actual linked list. The payoff is O(n) time and O(1) extra space.

What if modifying the array were allowed?

Then the solution space becomes much simpler. One common option is cyclic placement: keep swapping each value toward its “correct” index until a value wants to go to a position that already contains the same number. That repeated value is the duplicate. Another common option is to sort in place and scan for equal neighbors. Both approaches are easier to explain than Floyd’s algorithm, but they violate the original constraint because they mutate the input. Their main tradeoff is simplicity versus preserving the array exactly as given.

What if the cycle trick does not come to mind during the interview?

The best fallback is binary search on the value range. For any midpoint mid, count how many numbers are <= mid. If that count is greater than mid, the duplicate must lie in the lower half by the pigeonhole principle; otherwise it lies in the upper half. This approach still uses constant extra space and keeps the array unchanged, which makes it a strong answer. The tradeoff is runtime: it takes O(n log n) instead of O(n).

What if there can be multiple distinct duplicate values instead of exactly one?

Then the elegant cycle-entry interpretation breaks down, because the problem is no longer guaranteed to collapse into one well-defined entrance that represents the single answer. At that point the right approach depends on which constraints remain. If modifying the array is allowed, sign marking or cyclic placement can report duplicates efficiently. If the array must stay unchanged, a hash set is the most direct solution, but it costs O(n) extra space. This is a good follow-up because it tests whether the candidate understands exactly which assumptions make the Floyd solution possible.

Takeaway

This problem looks like it should be solved with a set, but the interview version is really about recognizing hidden structure.

Once the array is reinterpreted as pointers, the duplicate stops being a counting problem and becomes a cycle-entry problem. That modeling step is the whole interview. After that, Floyd’s algorithm is just the cleanest tool for the job.