Generated by Codex with GPT-5

Quick facts

  • Difficulty: MEDIUM
  • Problem: Subsets II
  • Main tags: Array, Backtracking, Bit Manipulation

What the problem is really asking

This is the power set problem with one extra constraint that changes everything: the input can contain duplicate values, but the output cannot contain duplicate subsets.

If the array were all distinct, the job would be straightforward. At each position, either include the number or skip it, and every path through that decision tree would produce a unique subset.

Duplicates break that reasoning. If the input is [1, 2, 2], then the subset [2] can be formed by choosing the first 2 or the second 2. Those are different execution paths, but they represent the same subset, so only one of them should appear in the answer.

So the real question is:

how can the search generate every valid subset while guaranteeing that equivalent choices among equal numbers do not create duplicate results?

The key insight

Sorting is the move that makes the problem manageable.

Once the array is sorted, equal values sit next to each other. That means the algorithm can detect when it is about to start a branch that is indistinguishable from a branch it has already explored at the same recursion depth.

That leads to the core rule:

when iterating choices from a given start position, if the current value is the same as the previous value and the previous value was skipped at this depth, skip the current one too.

In the common backtracking implementation, that rule becomes:

if candidate_index > start_index and nums[candidate_index] == nums[candidate_index - 1]: continue

This does not remove valid subsets. It only removes duplicate ways of producing the same subset.

How to derive the optimal solution

The cleanest way to build intuition is to start from the distinct-elements version.

For a normal subsets problem, the standard depth-first search keeps a growing current_subset, adds a copy of it to the answer, and then tries each remaining element as the next choice. That already generates all subsets in output-sensitive time.

Now add duplicates.

The brute-force version still works if the algorithm generates all subsets and then stores them in a set of tuples to deduplicate later, but that is wasteful. It spends time building repeated subsets and extra space hashing them away afterward.

A better strategy is to prevent duplicates during generation.

After sorting, consider a run like 2, 2, 2. At any fixed recursion level, only the first 2 should be allowed to start a new branch. Once that first 2 is chosen and the recursion goes deeper, the next 2 is allowed there, because [2, 2] is a genuinely different subset from [2]. What must be avoided is starting multiple sibling branches with identical values at the same depth.

That gives the standard optimal interview solution:

  1. sort the input
  2. run DFS/backtracking
  3. append the current subset at every call
  4. skip duplicate sibling choices with the sorted-array check

The complexity is:

  • Time: O(n log n + n * 2^n) in the worst case
  • Extra space: O(n) for the recursion stack, excluding the output

The n * 2^n factor is unavoidable in the worst case because the output itself can contain 2^n subsets, and copying subsets into the result costs proportional time overall.

Best approaches

The most common interview solution is sorted backtracking with duplicate skipping. It is short, easy to reason about, and directly mirrors the structure of the search space.

Another strong approach is to compress duplicates into value-count pairs and choose how many copies of each value to take: zero copies, one copy, two copies, and so on. That is especially elegant when there are many repeated numbers because it models the real choice more explicitly. Both approaches are output-sensitive and effectively optimal for this problem.

The Python implementation below uses sorted backtracking because it is the version most interviewers expect first, and it keeps the duplicate-handling rule extremely visible.

Python solution

from typing import List


class Solution:
    def subsetsWithDup(self, nums: List[int]) -> List[List[int]]:
        sorted_nums = sorted(nums)
        unique_subsets: List[List[int]] = []
        subset_builder: List[int] = []

        def explore_from(start_index: int) -> None:
            # Every recursion state represents one valid subset.
            unique_subsets.append(subset_builder.copy())

            for candidate_index in range(start_index, len(sorted_nums)):
                # At a fixed recursion depth, identical sibling choices would
                # generate duplicate subsets, so only the first one is kept.
                if (
                    candidate_index > start_index
                    and sorted_nums[candidate_index] == sorted_nums[candidate_index - 1]
                ):
                    continue

                subset_builder.append(sorted_nums[candidate_index])
                explore_from(candidate_index + 1)
                subset_builder.pop()

        explore_from(0)
        return unique_subsets

Why this works

The algorithm works because it separates two situations that look similar but are actually different.

If two equal values appear as sibling choices at the same recursion depth, choosing either one first produces the same family of subsets. That is why the later sibling is skipped.

If one equal value has already been chosen and the recursion goes deeper, choosing the next equal value is valid, because the subset now contains one more copy of that number. That produces a genuinely new subset, not a duplicate.

Sorting makes those two cases easy to detect. Equal numbers become adjacent, so the DFS can tell whether it is looking at a repeated sibling choice or a deeper extension of an already chosen value.

Because every unique subset corresponds to exactly one non-skipped search path, the algorithm generates all valid subsets once and only once.

Interview follow-ups

Can you do it iteratively instead of recursively?

Yes. The iterative version starts with [[]] and processes the sorted numbers from left to right. For each number, it extends some existing subsets by appending that number. The duplicate-handling trick is the important part: if the current number is different from the previous one, it can extend every subset built so far; if it is the same as the previous one, it should extend only the subsets that were created in the previous step. That prevents duplicate subsets from being regenerated. The asymptotic complexity is still output-sensitive, but the bookkeeping is slightly less intuitive than the DFS version.

What if there are many duplicates and only a few distinct values?

In that case, it can be cleaner to compress the input into (value, count) pairs. Then the recursion decides how many copies of each distinct value to include, from 0 through count. That approach matches the real structure of the problem very closely and can be easier to reason about when duplicate runs are long. Its running time is proportional to the number of unique subsets, which is the product of (count + 1) across all distinct values, multiplied by the work needed to materialize those subsets.

What if the interviewer asks for only subsets of size k?

The same backtracking framework still works. The recursion simply tracks how many elements have already been chosen and stops descending once the subset reaches size k. It is also useful to prune when there are not enough remaining elements to fill the subset. If duplicates are present, the same sorted-array skip rule still applies at each depth. This version can be much faster than generating the entire power set first because it visits only states that could still lead to a size-k answer.

Practical takeaway

The easiest way to think about this problem is:

sort first, then allow only one branch per distinct value at each recursion depth.

That single rule removes duplicates at the source instead of cleaning them up later. Once that idea clicks, Subsets II becomes a very standard backtracking problem rather than a tricky special case.