Generated by Codex with GPT-5
Difficulty: MEDIUM
Problem: Remove Duplicates from Sorted Array II
Problem gist
The input is a sorted array nums. The task is to modify it in place so every distinct value appears at most two times, while keeping the original relative order of the remaining values. The function returns k, the length of the valid prefix. Anything after index k - 1 does not matter.
For example, [0, 0, 1, 1, 1, 1, 2, 3, 3] should become a prefix like [0, 0, 1, 1, 2, 3, 3], and the function should return 7.
The sorted property is the whole reason the problem is manageable in one pass. Equal values are grouped together, so deciding whether the current value can be kept only requires looking at the compacted prefix, not searching the whole array.
Key idea
Keep a write_index that marks where the next accepted value should be written. As the algorithm scans from left to right, the prefix nums[:write_index] is always the cleaned version of everything seen so far.
When each value may appear at most twice, a new value is safe to keep when either:
- fewer than two values have been written so far, or
- the value two positions behind the write cursor is different from the current value.
That second test is the important trick. If nums[write_index - 2] == value, then the compacted prefix already ends with two copies of value, so keeping another one would create three copies. If it is different, keeping the current value still respects the limit.
How to derive it
A brute-force mental model is to count each run of equal values and keep the first two from each run. That works because the array is sorted, but it requires explicitly tracking the current value and how many times it has appeared.
The write-pointer version is the same idea with less state. Instead of storing a separate count, it asks: “Would appending this value make the prefix contain too many copies?” Since duplicates are contiguous, checking the value two kept positions back answers that question exactly.
This gives a clean invariant:
After processing each input position, nums[:write_index] is the correct compressed version of the values scanned so far.
If the current value passes the keep test, write it at write_index and advance write_index. If it fails, skip it. Because write_index never moves ahead of the read position, overwriting earlier positions does not destroy unread input.
Python solution
from collections.abc import MutableSequence
from typing import List
MAX_ALLOWED_COPIES = 2
def can_append_without_exceeding_limit(
values: MutableSequence[int],
write_index: int,
candidate: int,
max_allowed_copies: int,
) -> bool:
"""Return whether candidate can be appended to the compacted prefix."""
if write_index < max_allowed_copies:
return True
# In a sorted array, matching the value this far back means the prefix
# already contains max_allowed_copies copies of candidate.
return values[write_index - max_allowed_copies] != candidate
def compress_sorted_array(
values: MutableSequence[int],
max_allowed_copies: int,
) -> int:
"""Compress a sorted array in place and return the valid prefix length."""
if max_allowed_copies <= 0:
return 0
write_index = 0
for read_index in range(len(values)):
candidate = values[read_index]
if can_append_without_exceeding_limit(
values,
write_index,
candidate,
max_allowed_copies,
):
values[write_index] = candidate
write_index += 1
return write_index
class Solution:
def removeDuplicates(self, nums: List[int]) -> int:
return compress_sorted_array(nums, MAX_ALLOWED_COPIES)The time complexity is O(n) because each element is read once. The extra space complexity is O(1) because the array is edited in place and the algorithm only stores a few indexes and constants.
Interview follow-ups
What if each value can appear at most m times instead of at most two?
Use the same write-pointer pattern and replace 2 with m. A value is safe to append when write_index < m or when nums[write_index - m] != value. The reasoning is identical: if the value m kept positions back is the same, then the compacted prefix already contains m copies of the current value.
This still runs in O(n) time and O(1) extra space for any fixed m. The only extra edge case is m <= 0, where the valid prefix length is 0.
What if the array is not sorted?
The write_index - 2 trick no longer works because equal values are not grouped together. In an unsorted array, seeing a value two positions back says very little about how many times that value has appeared in the kept prefix.
The usual approach is to keep a hash map from value to count. Scan the array once, append a value to the write prefix only while its count is below the allowed limit, and increment the count when it is kept. That preserves the original relative order, but it uses O(u) extra space where u is the number of distinct values. The time complexity remains O(n) on average.
What if the interviewer asks for exactly one copy of each value?
This becomes the classic “remove duplicates from sorted array” problem. The same generalized solution works with max_allowed_copies = 1. The keep condition becomes write_index < 1 or nums[write_index - 1] != value, which says: keep the first value, then only keep a value when it differs from the last kept value.
The complexity is still O(n) time and O(1) extra space. This is a useful way to show that the two-copy version is not a separate trick, just a small generalization of the one-copy version.
What if the function must return a new array instead of modifying in place?
Build a result list using the same keep condition against the result prefix. Append the current value when the result has fewer than two items or when result[-2] != value. This is often clearer in production code when callers should not observe mutation.
The tradeoff is memory. Returning a new array takes O(k) extra space for the kept values, where k is the result length. The time complexity stays O(n).
What if the input is a stream of sorted values?
If the stream is sorted, no random access is required. Track the previous value and how many times it has been emitted for the current run. When a new value arrives, reset the count to one and emit it. When the same value arrives, emit it only if the count is still below two.
This uses O(1) memory and works well for large data because it never needs the full array. The tradeoff is that it produces output incrementally rather than modifying an existing array prefix.