LeetCode MEDIUM 36 Valid Sudoku Summary

Generated by Codex with GPT-5

Quick facts

Difficulty: MEDIUM
Problem: Valid Sudoku
Main tags: Array, Hash Table, Matrix

What the problem is really asking

This problem is not asking to solve the Sudoku board.

It is only asking whether the current partially filled board is still legal. That means every filled digit must satisfy three rules at the same time:

it has not already appeared in the same row
it has not already appeared in the same column
it has not already appeared in the same 3 x 3 box

Empty cells, written as ., do not matter.

So the real task is to scan the board once and answer one question for each digit:

“Have I already seen this digit in any place that would make it illegal here?”

The cleanest way to derive the solution

Start from the most direct interpretation of the rules.

For every filled cell, the algorithm needs to remember which digits have already been seen in:

each row
each column
each 3 x 3 box

That immediately suggests three tracking structures:

rows[row_index]
columns[column_index]
boxes[box_index]

Each one stores the digits that have appeared so far.

Then the scan becomes simple:

skip . cells
compute which row, column, and box the digit belongs to
if the digit is already present in any of those trackers, the board is invalid
otherwise record it in all three places and continue

That is already the optimal interview solution in asymptotic terms. The board is visited once, and each lookup or insert is constant time on average.

Best approaches

The most practical solution uses three arrays of sets. It is easy to explain, easy to code correctly, and runs in O(n^2) time for an n x n board.

A slightly more optimized variant replaces sets with bitmasks. The idea is the same: instead of storing "5" in a set, it turns on bit 5 inside an integer mask for the row, column, and box. That saves some constant-factor overhead, but it is a little less readable. In most interviews, the set-based solution is the better default unless the interviewer explicitly asks for tighter constant factors.

Why the box index formula works

The only part that can feel non-obvious is mapping a cell to one of the nine 3 x 3 boxes.

The trick is to group rows and columns in chunks of three:

rows 0-2 belong to the top band
rows 3-5 belong to the middle band
rows 6-8 belong to the bottom band

The same idea applies to columns. So the box number is:

(row_index // 3) * 3 + (column_index // 3)

That converts a two-dimensional box position into one index from 0 to 8.

Python solution

from typing import List, Set


class Solution:
    BOARD_SIZE = 9
    BOX_SIZE = 3
    EMPTY_CELL = "."

    def isValidSudoku(self, board: List[List[str]]) -> bool:
        rows_seen: List[Set[str]] = [set() for _ in range(self.BOARD_SIZE)]
        columns_seen: List[Set[str]] = [set() for _ in range(self.BOARD_SIZE)]
        boxes_seen: List[Set[str]] = [set() for _ in range(self.BOARD_SIZE)]

        for row_index in range(self.BOARD_SIZE):
            for column_index in range(self.BOARD_SIZE):
                value = board[row_index][column_index]

                if value == self.EMPTY_CELL:
                    continue

                box_index = self._box_index(
                    row_index=row_index,
                    column_index=column_index,
                )

                if self._creates_duplicate(
                    value=value,
                    row_seen=rows_seen[row_index],
                    column_seen=columns_seen[column_index],
                    box_seen=boxes_seen[box_index],
                ):
                    return False

                rows_seen[row_index].add(value)
                columns_seen[column_index].add(value)
                boxes_seen[box_index].add(value)

        return True

    def _box_index(self, row_index: int, column_index: int) -> int:
        return (row_index // self.BOX_SIZE) * self.BOX_SIZE + (
            column_index // self.BOX_SIZE
        )

    @staticmethod
    def _creates_duplicate(
        value: str,
        row_seen: Set[str],
        column_seen: Set[str],
        box_seen: Set[str],
    ) -> bool:
        return (
            value in row_seen
            or value in column_seen
            or value in box_seen
        )

This implementation keeps the logic very direct: every digit is checked against the three Sudoku constraints exactly once, and the helper methods isolate the only two pieces of reusable logic.

For the fixed 9 x 9 LeetCode board, both time and space are effectively constant. If the board is treated as a generalized n x n board, the algorithm is O(n^2) time and O(n^2) space.

Interview follow-ups

Can this be done with less memory?

Yes. The standard follow-up is to replace the sets with integer bitmasks. Each row, column, and box keeps one integer where bit 1 means digit 1 has been seen, bit 2 means digit 2 has been seen, and so on. For each cell, the algorithm computes the digit’s bit and checks whether that bit is already set in the row, column, or box mask. If it is, the board is invalid; otherwise the bit is turned on in all three masks.

Why it works: it is exactly the same bookkeeping as the set-based solution, just stored more compactly. Membership testing becomes a bitwise AND, and insertion becomes a bitwise OR.

Complexity tradeoff: the asymptotic complexity stays the same, but the constant factors improve. The downside is readability. For most interviews, sets are easier to explain and less error-prone, while bitmasks are a good optimization follow-up.

What if the board were updated many times and each update needed to be validated quickly?

In that version, it is better to keep persistent row, column, and box state instead of rebuilding it from scratch after every edit. An initialization pass can build those trackers once. Then when a cell changes, the algorithm removes the old digit from its row, column, and box counts, checks whether the new digit would conflict, and inserts it if the move is legal.

Why it works: the Sudoku constraints are local to one row, one column, and one box. When one cell changes, only those three structures need to be updated.

Complexity tradeoff: the one-time setup still costs O(n^2), but each later edit can be validated in O(1) time. The tradeoff is more state to maintain carefully, especially when removing digits or supporting undo operations.

What if the interviewer generalizes the puzzle to an `n x n` board instead of `9 x 9`?

The same strategy still works as long as the board is divided into equal square boxes. Instead of hardcoding 9 and 3, compute board_size = len(board) and box_size = int(sqrt(board_size)). Then build row, column, and box trackers sized to that board. The box index formula stays the same shape:

(row_index // box_size) * box_size + (column_index // box_size)

Why it works: the legality rule has not changed. A symbol is still invalid exactly when it repeats within its row, column, or box, so the same tracking idea scales naturally.

Complexity tradeoff: for a generalized board_size x board_size board, validation is O(board_size^2) time. Space is also O(board_size^2) if trackers are stored explicitly. The main extra care is validating that the box structure is actually square and compatible with the board dimensions.

Takeaway

The shortest path to the answer is to stop thinking about Sudoku as a puzzle and start thinking about it as a duplicate-detection problem across three overlapping views of the same cell: row, column, and box.

Once that framing is clear, the solution becomes a straightforward single pass.