LeetCode HARD 10 Regular Expression Matching Summary

Generated by Codex with GPT-5

Quick facts

Difficulty: HARD
Problem: Regular Expression Matching
Main tags: String, Dynamic Programming, Recursion

What the problem is really asking

This problem is not asking for a full regex engine.

It asks for one very specific kind of full-string match:

. matches exactly one arbitrary character
* means “repeat the immediately previous token zero or more times”
the whole string must match the whole pattern

That last rule is the detail that makes many first attempts fail. A match is not enough if there are leftover characters in either the text or the pattern.

So the real question is:

can the suffix of s starting at position i be matched by the suffix of p starting at position j?

Once the problem is phrased that way, it becomes a dynamic programming problem over pairs of indices.

Why brute force becomes expensive

The natural recursive idea is correct:

if the current characters match, move both pointers
if the next pattern token is *, either skip that token entirely or use it to consume one character

The problem is that the same suffix pair gets recomputed many times.

For example, different recursive paths can easily end up asking whether s[i:] matches p[j:]. Without caching, the search tree branches heavily and can grow exponentially. That repeated work is the signal that memoization or tabulation is the right fix.

The key DP insight

Define matches_from(i, j) as:

whether s[i:] matches p[j:]

Then every state has only a few cases.

If j has reached the end of the pattern, the answer is true only if i has also reached the end of the string.

Otherwise, first decide whether the current characters are compatible:

current_char_matches = i < len(s) and (p[j] == s[i] or p[j] == ".")

Now check whether the next pattern character is *.

If it is, there are exactly two valid choices:

Skip x* entirely and move to j + 2
If the current character matches x, consume one character from the string and stay at j

That “stay at j” part is the important insight. Staying there means the same x* token is still available to match more copies.

If there is no *, then the only option is a one-character match and advancing both pointers.

Two optimal ways to implement it

There are two standard O(m * n) solutions, where m is the string length and n is the pattern length.

The cleanest interview answer is top-down DP with memoization. It mirrors the problem statement directly, and the cache guarantees that each (i, j) pair is solved once.

The equally correct alternative is bottom-up DP. That version fills a table where dp[i][j] means the same thing as matches_from(i, j). It avoids recursion, but many candidates find the top-down version easier to derive and explain under time pressure.

How to derive the solution in an interview

A strong interview path is:

State clearly that this is full-string matching, not substring search.
Write the recursive meaning of matches_from(i, j).
Notice that * creates branching: skip or consume.
Point out that the same suffix pairs repeat, so a cache turns the recursion into DP.
Finish by giving the O(m * n) time and O(m * n) space bound for the memo table.

That sequence shows both problem-solving and control of the complexity argument.

Why the recurrence is correct

Every valid match of s[i:] against p[j:] must start in one of two ways.

If p[j + 1] is *, then either the x* token is used zero times, or it is used at least once. There is no third possibility. Skipping it corresponds to matches_from(i, j + 2). Using it once is only legal when the current characters match, and after consuming one character from the string the same x* token may still be used again, which gives matches_from(i + 1, j).

If p[j + 1] is not *, then the first characters must match and both pointers must advance together. Again, there is no other legal move.

Because the recurrence covers all legal first moves and only legal first moves, the DP is correct.

Python solution

The implementation below uses memoized depth-first search because it is the clearest way to encode the recurrence. The helper names are explicit, the * behavior is isolated in one decision point, and comments call out the two branches that matter.

from functools import lru_cache


def characters_match(text: str, pattern: str, text_index: int, pattern_index: int) -> bool:
    """Return whether the current text and pattern characters are compatible."""
    if text_index >= len(text):
        return False
    return pattern[pattern_index] == "." or text[text_index] == pattern[pattern_index]


def is_regular_expression_match(text: str, pattern: str) -> bool:
    """
    Return whether the entire text matches the entire pattern.

    Supported pattern operators:
    - '.' matches any single character
    - '*' repeats the previous token zero or more times
    """

    @lru_cache(maxsize=None)
    def matches_from(text_index: int, pattern_index: int) -> bool:
        # A pattern can match the text only if both are fully consumed.
        if pattern_index == len(pattern):
            return text_index == len(text)

        current_char_matches = characters_match(
            text=text,
            pattern=pattern,
            text_index=text_index,
            pattern_index=pattern_index,
        )

        next_token_is_star = (
            pattern_index + 1 < len(pattern) and pattern[pattern_index + 1] == "*"
        )

        if next_token_is_star:
            # Option 1: skip the "x*" token completely.
            skip_star_token = matches_from(text_index, pattern_index + 2)

            # Option 2: consume one matching character and keep the same pattern token.
            consume_one_character = current_char_matches and matches_from(
                text_index + 1,
                pattern_index,
            )

            return skip_star_token or consume_one_character

        return current_char_matches and matches_from(
            text_index + 1,
            pattern_index + 1,
        )

    return matches_from(0, 0)


class Solution:
    def isMatch(self, s: str, p: str) -> bool:
        """LeetCode entry point."""
        return is_regular_expression_match(text=s, pattern=p)

This runs in O(m * n) time because there are only m + 1 possible text positions and n + 1 possible pattern positions, and each state is computed once. The memo table also uses O(m * n) space.

Interview follow-ups

How would you write this bottom-up instead of top-down?

Use a table where dp[i][j] means whether s[i:] matches p[j:]. Fill the table from the end toward the front so that the states needed by the recurrence are already known when each cell is computed. The * case uses the same two choices as the memoized recursion: skip x*, or consume one character and stay on the same pattern token. This works because the bottom-up table is just the memoized state graph written explicitly. The complexity stays O(m * n) time and O(m * n) space. The tradeoff is mostly readability: bottom-up avoids recursion depth issues, but many people find it less intuitive to derive.

What if recursion depth is a concern?

Then the bottom-up DP is the safest answer. Python’s recursion limit can become a practical problem if the input is large enough, even though the algorithmic complexity is still fine. An iterative table removes that runtime risk while preserving the same recurrence and correctness argument. The tradeoff is implementation overhead: the memoized DFS is often shorter and easier to explain, while the iterative table is more operationally robust.

How would you support `+` and `?` in the pattern?

Keep the same DP-state idea, but add transitions for the new operators. For x?, the choices are “skip this token” or “use it exactly once.” For x+, the choices are “use it once and move on” or “use it once and stay on the same token so more copies can be consumed.” This works because the core DP state is still “which text suffix and which pattern suffix remain.” The complexity remains polynomial in the lengths of the text and pattern as long as the operator set stays local and the recurrence only depends on nearby pattern positions. The tradeoff is that the transition logic gets more detailed, so clear helper functions become more important.

What changes if the problem is substring matching instead of full-string matching?

Then the acceptance condition changes. In the current problem, success requires consuming both the text and the pattern completely. For substring matching, the engine would need to try starting the pattern at multiple text positions, or equivalently allow unmatched text before and after a successful local match. The DP core still works, but the outer definition of success is different. That makes the original problem simpler than a general regex search engine, which is why this LeetCode version is manageable with straightforward dynamic programming.