Generated by Codex with GPT-5
Quick facts
- Difficulty:
HARD - Problem: Longest Balanced Subarray II
- Topics:
Array,Hash Table,Divide and Conquer,Segment Tree,Prefix Sum
Problem gist
The input is an integer array. A subarray is balanced when the number of distinct even values inside it equals the number of distinct odd values inside it. Repeated copies of the same number only count once, so [3, 2, 2, 5, 4] has two distinct odd values, 3 and 5, and two distinct even values, 2 and 4.
The task is to return the length of the longest balanced subarray. The hard part is not checking parity; it is handling distinct values over every possible subarray without rebuilding sets for each left and right boundary.
Deriving the optimal approach
If each distinct odd value contributes +1 and each distinct even value contributes -1, then a subarray is balanced exactly when its distinct-value contribution sum is zero.
For a fixed right endpoint, imagine asking every possible left boundary what the contribution sum of nums[left:right] would be. If the array had no duplicates, this would be a normal prefix-sum problem: equal prefix balances mark a zero-sum interval. Duplicates are what make it harder, because a value should contribute only from its most recent occurrence within the chosen subarray.
The useful viewpoint is to track each value’s latest position while sweeping the right endpoint from left to right. When a new value appears at position i, its parity contribution starts affecting all prefix positions from i onward. When the same value appears again, its old latest occurrence stops being the latest one, so its old contribution must be removed from a suffix of prefix positions before the new contribution is added.
That is a range-add problem over prefix positions. A segment tree stores the minimum and maximum prefix balance in each range and supports lazy range additions. After processing each right endpoint, the current total balance is known. The earliest prefix position with that same balance gives the longest balanced subarray ending at this right endpoint, because subtracting equal balances leaves zero distinct odd-minus-even contribution.
The segment tree can find that earliest position by walking left first whenever the target balance is within the left child’s minimum-to-maximum range. Prefix balances change by one contribution step at a time, so if the target lies between a segment’s min and max, that segment contains a position with the target balance.
Python solution
from typing import Dict, List
class RangeAddSegmentTree:
def __init__(self, highest_index: int) -> None:
self.highest_index = highest_index
capacity = 4 * (highest_index + 1)
self.min_balance = [0] * capacity
self.max_balance = [0] * capacity
self.lazy_delta = [0] * capacity
def add(self, left: int, right: int, delta: int) -> None:
if left > right:
return
self._add(node=1, node_left=0, node_right=self.highest_index,
query_left=left, query_right=right, delta=delta)
def find_first(self, target: int) -> int:
return self._find_first(
node=1,
node_left=0,
node_right=self.highest_index,
target=target,
)
def _apply(self, node: int, delta: int) -> None:
self.min_balance[node] += delta
self.max_balance[node] += delta
self.lazy_delta[node] += delta
def _push(self, node: int) -> None:
delta = self.lazy_delta[node]
if delta == 0:
return
self._apply(node * 2, delta)
self._apply(node * 2 + 1, delta)
self.lazy_delta[node] = 0
def _pull(self, node: int) -> None:
left_child = node * 2
right_child = left_child + 1
self.min_balance[node] = min(
self.min_balance[left_child],
self.min_balance[right_child],
)
self.max_balance[node] = max(
self.max_balance[left_child],
self.max_balance[right_child],
)
def _add(
self,
node: int,
node_left: int,
node_right: int,
query_left: int,
query_right: int,
delta: int,
) -> None:
if query_left <= node_left and node_right <= query_right:
self._apply(node, delta)
return
self._push(node)
midpoint = (node_left + node_right) // 2
if query_left <= midpoint:
self._add(node * 2, node_left, midpoint, query_left, query_right, delta)
if midpoint < query_right:
self._add(
node * 2 + 1,
midpoint + 1,
node_right,
query_left,
query_right,
delta,
)
self._pull(node)
def _find_first(
self,
node: int,
node_left: int,
node_right: int,
target: int,
) -> int:
if node_left == node_right:
return node_left
self._push(node)
midpoint = (node_left + node_right) // 2
left_child = node * 2
if self.min_balance[left_child] <= target <= self.max_balance[left_child]:
return self._find_first(left_child, node_left, midpoint, target)
return self._find_first(left_child + 1, midpoint + 1, node_right, target)
class Solution:
def longestBalanced(self, nums: List[int]) -> int:
n = len(nums)
segment_tree = RangeAddSegmentTree(n)
last_position_by_value: Dict[int, int] = {}
total_balance = 0
best_length = 0
for right_position, value in enumerate(nums, start=1):
contribution = 1 if value % 2 == 1 else -1
previous_position = last_position_by_value.get(value)
if previous_position is not None:
# The old latest occurrence no longer represents this value.
segment_tree.add(previous_position, n, -contribution)
total_balance -= contribution
last_position_by_value[value] = right_position
# The new latest occurrence contributes to every prefix at or after it.
segment_tree.add(right_position, n, contribution)
total_balance += contribution
earliest_same_balance = segment_tree.find_first(total_balance)
best_length = max(best_length, right_position - earliest_same_balance)
return best_lengthEach value occurrence causes at most two lazy range additions and one segment-tree search. The time complexity is O(n log n), and the space complexity is O(n) for the tree and the latest-position hash table.
Common mistakes
The most common wrong solution is a sliding window with counts of distinct odds and evens. It is tempting because the target is a longest subarray, but the condition is not monotonic: adding or removing a value can move the balance in either direction, and duplicates can change nothing at all. There is no simple “shrink until valid” rule.
Another mistake is treating every odd or even occurrence as a contribution. The problem counts distinct values, so [2, 2, 2, 3] has one distinct even value and one distinct odd value, not three evens and one odd.
A subtler bug is failing to revoke a duplicate’s old contribution. If value x last appeared at position p and appears again at i, then prefixes from p through i - 1 should no longer see x as already included before the current start. That is exactly the suffix update the segment tree handles.
Interview follow-ups
How would the solution change if every occurrence counted instead of every distinct value?
The problem becomes a standard prefix-sum question. Map odd numbers to +1 and even numbers to -1, keep the earliest index where each balance value appeared, and update the answer whenever the same balance appears again. Equal balances mean the subarray between them has total contribution zero.
This version needs no segment tree because contributions never move when a duplicate appears. Each element adds one fixed prefix-sum step, so the algorithm runs in O(n) time and uses O(n) space for the earliest-balance map.
Can the hard version be solved with a normal sliding window?
Not cleanly. Sliding windows work best when a condition becomes easier or harder in one direction as the window expands. Here, adding one number can make the window balanced, unbalanced, or unchanged depending on whether that exact value is already present. Removing from the left has similarly non-monotonic effects.
An interviewer may expect the candidate to reject the sliding-window instinct and explain why the segment tree is needed: for each right endpoint, all left endpoints remain plausible, and duplicate revocations update a whole range of those possibilities at once.
What if the interviewer asks for the actual subarray, not just its length?
Store the best pair of boundaries whenever right_position - earliest_same_balance improves the answer. With the one-based prefix indexing used in the code, the subarray is nums[earliest_same_balance:right_position] in Python slice form.
The segment tree search already returns the earliest valid prefix for the current right endpoint, so recording boundaries does not change the asymptotic cost. The algorithm still runs in O(n log n) time and O(n) space.
How can the code be adapted if values are much larger than 10^5?
Nothing essential changes because the algorithm never indexes directly by value. It stores latest positions in a hash table keyed by the actual integer value. Large values only affect hash keys, not segment-tree size.
The segment tree size depends on n, the number of array positions, because it manages prefix indices from 0 through n. The complexity remains O(n log n) time and O(n) space.
Why does the segment-tree query use min and max instead of storing every balance position?
The query only needs the earliest prefix index whose current balance equals a target. Since adjacent prefix balances differ by a single active contribution step, a segment whose minimum is at most the target and whose maximum is at least the target must contain some prefix position with that target balance.
That lets the tree descend left first using only range min and range max. Storing every balance value per node would make updates too expensive, while min and max stay cheap under lazy range addition.