LeetCode HARD 212 Word Search II Summary

Generated by Codex with GPT-5

Quick facts

Difficulty: HARD
Problem: Word Search II
Main tags: Array, String, Backtracking, Trie, Matrix

What the problem is really asking

The input gives a letter grid and a list of candidate words. A word is valid if it can be traced through horizontally or vertically adjacent cells without reusing the same cell inside that one word.

The important detail is that the board is shared across all words. The task is not “check one word.” It is “find every dictionary word that appears anywhere in the board.”

That changes the problem completely. Solving each word independently repeats the same board exploration over and over again.

The straightforward idea and why it is too slow

The first correct approach is:

Take one word.
Run a DFS from every board cell to see whether that word can be formed.
Repeat for every other word.

That works logically, but it scales badly. If there are many words with shared prefixes such as "oa", "oat", and "oath", the algorithm keeps rediscovering the same partial paths from scratch.

That repeated work is the main clue. The best solution should share prefix work across words instead of treating each word as a completely separate search.

The optimal idea: trie plus backtracking

The standard optimal approach is to store the dictionary in a trie, then run one DFS over the board while walking that trie at the same time.

The trie does two jobs:

it tells the search whether the current path is still a prefix of any word
it tells the search when a full word has been completed

That means the board DFS can stop immediately when a path no longer matches any dictionary prefix. This is the key optimization. Instead of exploring every possible character path, the search explores only paths that could still become a real word from the input list.

The full strategy is:

Build a trie from all words.
Start a DFS from each board cell whose letter exists as a trie root child.
During DFS, move through the board and through the trie together.
If a trie node stores a complete word, record it as found.
Mark the current board cell as visited, recurse to the four neighbors, then restore the character when backtracking finishes.
After exploring a branch, prune empty trie nodes so future searches skip dead prefixes even faster.

How to derive this in an interview

A clean derivation usually looks like this:

First, solve the easier problem of checking one word with DFS and backtracking. That establishes the board-search pattern and the “cannot reuse a cell in the same path” constraint.

Then ask what becomes expensive when the input contains many words. The answer is repeated prefix work. Many words begin the same way, so the search should share those prefixes.

Once that is clear, a trie becomes the natural data structure. It is built exactly for prefix sharing. From there, the combined DFS is the obvious next step: each move in the grid also moves one step deeper in the trie.

Why the trie-based search is correct

The core invariant is simple:

at every DFS step, the current board path exactly matches the prefix represented by the current trie node.

If the next board character is not a child in the trie, no word in the dictionary can continue from that path, so stopping immediately is safe. If the trie node stores a full word, that word has been spelled by a valid board path and should be added to the answer.

Backtracking preserves correctness because each recursive call temporarily reserves one cell for the current path and then restores it before other paths are explored. That guarantees a cell is never reused within one candidate word, but it can still be reused by completely different searches later.

The branch-pruning optimization is also safe. Once a trie node has no children and no stored word, no remaining search can ever use that prefix again, so removing it cannot hide a valid answer.

Complexity and practical tradeoffs

Building the trie costs O(total_characters_in_words).

The DFS portion is usually described as O(rows * cols * 4^L) in the worst case, where L is the maximum word length, because each search path can branch in up to four directions. In practice, the trie cuts that down sharply by rejecting dead prefixes early, and pruning makes later searches cheaper after words are found.

The extra space is O(total_characters_in_words) for the trie plus O(L) recursion depth for the current DFS path, ignoring the output list itself.

Python solution

The implementation below uses a trie to share prefixes across all candidate words. During the DFS, it marks cells in-place, records completed words once, and prunes exhausted trie branches so later searches do less work.

from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    complete_word: Optional[str] = None


def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()

    # Deduplicate defensively so the same word is not inserted twice.
    for word in dict.fromkeys(words):
        current_node = root
        for character in word:
            if character not in current_node.children:
                current_node.children[character] = TrieNode()
            current_node = current_node.children[character]
        current_node.complete_word = word

    return root


def find_dictionary_words(board: List[List[str]], words: List[str]) -> List[str]:
    if not board or not board[0] or not words:
        return []

    row_count = len(board)
    column_count = len(board[0])
    trie_root = build_trie(words)
    matched_words: List[str] = []

    def backtrack(row_index: int, column_index: int, parent_node: TrieNode) -> None:
        current_character = board[row_index][column_index]
        current_node = parent_node.children.get(current_character)

        if current_node is None:
            return

        if current_node.complete_word is not None:
            matched_words.append(current_node.complete_word)
            # Clear the terminal marker so the same word is reported once.
            current_node.complete_word = None

        board[row_index][column_index] = "#"

        for row_offset, column_offset in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            next_row = row_index + row_offset
            next_column = column_index + column_offset

            if not (0 <= next_row < row_count and 0 <= next_column < column_count):
                continue

            next_character = board[next_row][next_column]
            if next_character == "#":
                continue

            if next_character in current_node.children:
                backtrack(next_row, next_column, current_node)

        board[row_index][column_index] = current_character

        # Remove exhausted branches so later DFS calls skip them entirely.
        if not current_node.children and current_node.complete_word is None:
            del parent_node.children[current_character]

    for row_index in range(row_count):
        for column_index in range(column_count):
            if board[row_index][column_index] in trie_root.children:
                backtrack(row_index, column_index, trie_root)

    return matched_words


class Solution:
    def findWords(self, board: List[List[str]], words: List[str]) -> List[str]:
        return find_dictionary_words(board, words)

Interview follow-ups

What changes if the interviewer first asks for Word Search I and then extends it to many words?

For a single target word, plain DFS and backtracking from each cell is enough, because the search only needs to follow one fixed string. The complexity is roughly O(rows * cols * 4^L) for that one word.

Once the problem changes to many words, repeating that single-word DFS for every dictionary entry becomes wasteful because the same prefixes get explored again and again. The trie-based solution fixes exactly that issue by merging shared prefixes into one structure and scanning the board once. The tradeoff is extra trie memory, but that is usually well worth it when the dictionary is large.

How would the solution change if the interviewer wanted the path coordinates for each found word?

The DFS can carry a current_path list of coordinates alongside the trie traversal. Each recursive step appends the current cell before exploring neighbors and pops it during backtracking. When a complete word is found, the algorithm copies the current path into the result for that word.

The search logic and correctness argument stay the same because the trie still controls which prefixes are worth exploring. The main tradeoff is output size: copying paths adds extra work proportional to the number of coordinates returned, and the result structure becomes larger than a simple word list.

What if diagonal moves are allowed too?

The overall design does not change. The DFS still walks the board and the trie together, but the neighbor list expands from four directions to eight. Correctness is preserved because the same invariant still holds: the current board path must match the current trie prefix, and each cell can still be marked as visited for the duration of one path.

The tradeoff is the branching factor. With diagonal moves, the worst-case search space grows substantially, so the practical value of trie prefix pruning becomes even more important.

Can the search be optimized further when the word list is huge but the board is small?

Yes. A useful prefilter is to count board characters first and discard any word that asks for some letter more times than the board contains it. Another small optimization is to skip inserting words whose first letter does not appear on the board at all.

These checks do not change the asymptotic big-picture idea, but they can reduce trie size and cut unnecessary DFS calls in practice. The tradeoff is a bit more preprocessing code, so they are best treated as optional enhancements rather than the core interview solution.