Generated by Codex with GPT-5
Quick facts
- Difficulty:
MEDIUM - Problem: Kth Largest Element in an Array
- Main tags:
Array,Divide and Conquer,Sorting,Heap (Priority Queue),Quickselect
What the problem is really asking
This problem asks for the value that would appear in position k if the array
were sorted in descending order.
That sounds simple, but the important detail is that the task does not ask for the whole sorted array. It only asks for one rank.
That distinction is where the better solutions come from:
- full sorting does more work than necessary
- the answer depends only on relative order around one pivot position
- duplicates still count as separate positions
So if the array is [3, 2, 3, 1, 2, 4, 5, 5, 6] and k = 4, the answer is
4, because the descending order is [6, 5, 5, 4, 3, 3, 2, 2, 1].
Two practical solution paths
There are two standard ways to think about this problem.
The first is a size-k min-heap. Keep only the k largest values seen so far.
Whenever the heap grows larger than k, remove the smallest value in the heap.
At the end, the heap contains exactly the k largest elements, so its smallest
element is the kth largest overall.
The second is quickselect. Instead of fully sorting, quickselect partitions the
array around a pivot, then discards the half that cannot contain the target
rank. That gives expected O(n) time with randomized pivot selection, which is
better than sorting and is the usual optimal answer for the one-shot version of
the problem.
In interviews, both are valid. The min-heap solution is extremely reliable and easy to explain. Quickselect is the better asymptotic choice when the interviewer asks for the best expected runtime.
How to derive this in an interview
A clean interview path usually looks like this:
Start with sorting. If the array were sorted descending, the answer would be at
index k - 1. That immediately gives a correct O(n log n) baseline.
Then ask whether the whole sorted order is really needed. It is not. Only the element with one specific rank matters.
That observation leads naturally to two improvements:
- if only the top
kvalues matter, keep a min-heap of sizek - if only one final rank matters, partition like quicksort and recurse into the side that contains the answer
That is the key conceptual step: move from “sort everything” to “preserve only the information needed to identify one rank.”
Python solution
from __future__ import annotations
import heapq
import random
from typing import List
class KthLargestFinder:
"""Provide practical strategies for finding the kth largest element."""
def find_with_min_heap(self, values: List[int], k: int) -> int:
"""
Keep only the k largest values seen so far.
Time: O(n log k)
Space: O(k)
"""
self._validate_k(values, k)
largest_values_heap: List[int] = []
for value in values:
heapq.heappush(largest_values_heap, value)
if len(largest_values_heap) > k:
heapq.heappop(largest_values_heap)
return largest_values_heap[0]
def find_with_quickselect(self, values: List[int], k: int) -> int:
"""
Use in-place quickselect to find the target rank.
Time: O(n) expected, O(n^2) in the worst case
Space: O(1) extra beyond the copied input
"""
self._validate_k(values, k)
# Copy the input so callers do not observe the partitioning mutations.
working_values = list(values)
target_index = len(working_values) - k
left = 0
right = len(working_values) - 1
while left <= right:
pivot_index = self._partition(working_values, left, right)
if pivot_index == target_index:
return working_values[pivot_index]
if pivot_index < target_index:
left = pivot_index + 1
else:
right = pivot_index - 1
raise RuntimeError("Quickselect failed to locate the target index.")
def _partition(self, values: List[int], left: int, right: int) -> int:
"""
Partition the range so every value < pivot is moved to the left side.
"""
pivot_index = random.randint(left, right)
values[pivot_index], values[right] = values[right], values[pivot_index]
pivot_value = values[right]
store_index = left
for scan_index in range(left, right):
if values[scan_index] < pivot_value:
values[store_index], values[scan_index] = (
values[scan_index],
values[store_index],
)
store_index += 1
values[store_index], values[right] = values[right], values[store_index]
return store_index
def _validate_k(self, values: List[int], k: int) -> None:
if not 1 <= k <= len(values):
raise ValueError("k must be between 1 and the array length.")
class Solution:
def findKthLargest(self, nums: List[int], k: int) -> int:
finder = KthLargestFinder()
return finder.find_with_quickselect(nums, k)Why this works
The heap solution works because it maintains a very specific invariant: after
processing each number, the heap contains the k largest values seen so far.
If a smaller value arrives, it is either discarded immediately or removed later
when the heap exceeds size k. That means the heap’s root is always the
smallest among the retained top k values, which is exactly the kth largest
element.
Quickselect works because partitioning places one pivot into its final sorted position. After partitioning, every value to the left is smaller and every value to the right is greater than or equal to the pivot under this implementation. If the pivot lands at the target index, the answer has been found. Otherwise, the answer must lie entirely on one side, so the other side can be discarded.
That is why quickselect avoids the full cost of sorting: every step narrows the
search to the only partition that can still contain the target rank. Randomizing
the pivot is what gives the standard expected O(n) runtime on arbitrary input.
Interview follow-ups
What if the interviewer asks for the best worst-case guarantee?
Plain quickselect is expected O(n) but worst-case O(n^2) if the pivots are
consistently poor. If the interviewer explicitly wants a better worst-case
guarantee, there are two practical answers.
The simpler answer is to use a min-heap of size k, which guarantees
O(n log k) time and O(k) space. The more theoretical answer is
median-of-medians quickselect, which achieves worst-case O(n) but is much
more complex and rarely needed in production interviews.
The tradeoff is clarity versus optimal theory. In most interviews, saying
“randomized quickselect for expected O(n), heap for a simpler deterministic
bound” is the strongest practical response.
What changes if the numbers arrive as a stream instead of one fixed array?
In a streaming version, quickselect is no longer a good fit because it expects
all values to be present so it can partition them. The right tool is a min-heap
of size k.
Each new number is pushed into the heap. If the heap grows past size k, the
smallest element is removed. At every moment, the heap still contains the k
largest values seen so far, so the root remains the current kth largest.
This works well because the state stays small and incremental. The complexity is
O(log k) per incoming element with O(k) memory, which is exactly what a
streaming system wants.
What if the interviewer asks for the kth smallest instead?
That version is almost the same problem with the rank interpreted from the other direction.
For a heap-based answer, use a max-heap of size k over the smallest values
seen so far, or in Python simulate that with negated numbers. For quickselect,
change the target index from len(nums) - k to k - 1 if partitioning by
ascending order.
The important idea is that the machinery does not change. Only the target rank and heap orientation change. This is a good follow-up because it tests whether the candidate understands the invariant instead of memorizing one exact implementation.
What if the array will be queried many times for different values of k?
If many k queries will be asked against the same fixed array, repeated
quickselect calls are usually the wrong choice because each query repeats a lot
of work.
In that setting, sorting once in O(n log n) can become the better overall
strategy. After sorting, each query is answered in O(1) by indexing into the
sorted array. If updates are also required, then a more advanced data structure
such as an order-statistics tree would be appropriate, though that is beyond
what most LeetCode-style interviews expect.
The tradeoff is between one-shot optimality and repeated-query efficiency. Quickselect shines for a single query. Preprocessing shines when many queries share the same data.
Practical takeaway
This problem is a good reminder that rank-selection problems are not full sorting problems.
If the goal is one answer from one array, randomized quickselect is the best
expected-time tool. If the goal is robustness, streaming support, or a simpler
explanation, a size-k min-heap is often the better interview choice.