LeetCode MEDIUM 215 Kth Largest Element in an Array Summary

Generated by Codex with GPT-5

Quick facts

Difficulty: MEDIUM
Problem: Kth Largest Element in an Array
Main tags: Array, Divide and Conquer, Sorting, Heap (Priority Queue), Quickselect

What the problem is really asking

This problem asks for the value that would appear in position k if the array were sorted in descending order.

That sounds simple, but the important detail is that the task does not ask for the whole sorted array. It only asks for one rank.

That distinction is where the better solutions come from:

full sorting does more work than necessary
the answer depends only on relative order around one pivot position
duplicates still count as separate positions

So if the array is [3, 2, 3, 1, 2, 4, 5, 5, 6] and k = 4, the answer is 4, because the descending order is [6, 5, 5, 4, 3, 3, 2, 2, 1].

Two practical solution paths

There are two standard ways to think about this problem.

The first is a size-k min-heap. Keep only the k largest values seen so far. Whenever the heap grows larger than k, remove the smallest value in the heap. At the end, the heap contains exactly the k largest elements, so its smallest element is the kth largest overall.

The second is quickselect. Instead of fully sorting, quickselect partitions the array around a pivot, then discards the half that cannot contain the target rank. That gives expected O(n) time with randomized pivot selection, which is better than sorting and is the usual optimal answer for the one-shot version of the problem.

In interviews, both are valid. The min-heap solution is extremely reliable and easy to explain. Quickselect is the better asymptotic choice when the interviewer asks for the best expected runtime.

How to derive this in an interview

A clean interview path usually looks like this:

Start with sorting. If the array were sorted descending, the answer would be at index k - 1. That immediately gives a correct O(n log n) baseline.

Then ask whether the whole sorted order is really needed. It is not. Only the element with one specific rank matters.

That observation leads naturally to two improvements:

if only the top k values matter, keep a min-heap of size k
if only one final rank matters, partition like quicksort and recurse into the side that contains the answer

That is the key conceptual step: move from “sort everything” to “preserve only the information needed to identify one rank.”

Python solution

from __future__ import annotations

import heapq
import random
from typing import List


class KthLargestFinder:
    """Provide practical strategies for finding the kth largest element."""

    def find_with_min_heap(self, values: List[int], k: int) -> int:
        """
        Keep only the k largest values seen so far.

        Time: O(n log k)
        Space: O(k)
        """
        self._validate_k(values, k)

        largest_values_heap: List[int] = []
        for value in values:
            heapq.heappush(largest_values_heap, value)
            if len(largest_values_heap) > k:
                heapq.heappop(largest_values_heap)

        return largest_values_heap[0]

    def find_with_quickselect(self, values: List[int], k: int) -> int:
        """
        Use in-place quickselect to find the target rank.

        Time: O(n) expected, O(n^2) in the worst case
        Space: O(1) extra beyond the copied input
        """
        self._validate_k(values, k)

        # Copy the input so callers do not observe the partitioning mutations.
        working_values = list(values)
        target_index = len(working_values) - k

        left = 0
        right = len(working_values) - 1

        while left <= right:
            pivot_index = self._partition(working_values, left, right)

            if pivot_index == target_index:
                return working_values[pivot_index]

            if pivot_index < target_index:
                left = pivot_index + 1
            else:
                right = pivot_index - 1

        raise RuntimeError("Quickselect failed to locate the target index.")

    def _partition(self, values: List[int], left: int, right: int) -> int:
        """
        Partition the range so every value < pivot is moved to the left side.
        """
        pivot_index = random.randint(left, right)
        values[pivot_index], values[right] = values[right], values[pivot_index]

        pivot_value = values[right]
        store_index = left

        for scan_index in range(left, right):
            if values[scan_index] < pivot_value:
                values[store_index], values[scan_index] = (
                    values[scan_index],
                    values[store_index],
                )
                store_index += 1

        values[store_index], values[right] = values[right], values[store_index]
        return store_index

    def _validate_k(self, values: List[int], k: int) -> None:
        if not 1 <= k <= len(values):
            raise ValueError("k must be between 1 and the array length.")


class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:
        finder = KthLargestFinder()
        return finder.find_with_quickselect(nums, k)

Why this works

The heap solution works because it maintains a very specific invariant: after processing each number, the heap contains the k largest values seen so far. If a smaller value arrives, it is either discarded immediately or removed later when the heap exceeds size k. That means the heap’s root is always the smallest among the retained top k values, which is exactly the kth largest element.

Quickselect works because partitioning places one pivot into its final sorted position. After partitioning, every value to the left is smaller and every value to the right is greater than or equal to the pivot under this implementation. If the pivot lands at the target index, the answer has been found. Otherwise, the answer must lie entirely on one side, so the other side can be discarded.

That is why quickselect avoids the full cost of sorting: every step narrows the search to the only partition that can still contain the target rank. Randomizing the pivot is what gives the standard expected O(n) runtime on arbitrary input.

Interview follow-ups

What if the interviewer asks for the best worst-case guarantee?

Plain quickselect is expected O(n) but worst-case O(n^2) if the pivots are consistently poor. If the interviewer explicitly wants a better worst-case guarantee, there are two practical answers.

The simpler answer is to use a min-heap of size k, which guarantees O(n log k) time and O(k) space. The more theoretical answer is median-of-medians quickselect, which achieves worst-case O(n) but is much more complex and rarely needed in production interviews.

The tradeoff is clarity versus optimal theory. In most interviews, saying “randomized quickselect for expected O(n), heap for a simpler deterministic bound” is the strongest practical response.

What changes if the numbers arrive as a stream instead of one fixed array?

In a streaming version, quickselect is no longer a good fit because it expects all values to be present so it can partition them. The right tool is a min-heap of size k.

Each new number is pushed into the heap. If the heap grows past size k, the smallest element is removed. At every moment, the heap still contains the k largest values seen so far, so the root remains the current kth largest.

This works well because the state stays small and incremental. The complexity is O(log k) per incoming element with O(k) memory, which is exactly what a streaming system wants.

What if the interviewer asks for the kth smallest instead?

That version is almost the same problem with the rank interpreted from the other direction.

For a heap-based answer, use a max-heap of size k over the smallest values seen so far, or in Python simulate that with negated numbers. For quickselect, change the target index from len(nums) - k to k - 1 if partitioning by ascending order.

The important idea is that the machinery does not change. Only the target rank and heap orientation change. This is a good follow-up because it tests whether the candidate understands the invariant instead of memorizing one exact implementation.

What if the array will be queried many times for different values of k?

If many k queries will be asked against the same fixed array, repeated quickselect calls are usually the wrong choice because each query repeats a lot of work.

In that setting, sorting once in O(n log n) can become the better overall strategy. After sorting, each query is answered in O(1) by indexing into the sorted array. If updates are also required, then a more advanced data structure such as an order-statistics tree would be appropriate, though that is beyond what most LeetCode-style interviews expect.

The tradeoff is between one-shot optimality and repeated-query efficiency. Quickselect shines for a single query. Preprocessing shines when many queries share the same data.

Practical takeaway

This problem is a good reminder that rank-selection problems are not full sorting problems.

If the goal is one answer from one array, randomized quickselect is the best expected-time tool. If the goal is robustness, streaming support, or a simpler explanation, a size-k min-heap is often the better interview choice.