LeetCode MEDIUM 139 Word Break Summary

Generated by Codex with GPT-5

Quick facts

Difficulty: MEDIUM
Problem: Word Break
Main tags: String, Hash Table, Dynamic Programming, Trie

What the problem is really asking

The input is a string s and a dictionary of valid words. The task is to decide whether s can be split into one or more dictionary words. Words may be reused, so the question is not about consuming the dictionary; it is about whether every character in the string can be covered by valid dictionary chunks.

For example, if s = "leetcode" and the dictionary contains "leet" and "code", the answer is true because the string can be split as "leet" + "code". If one middle portion cannot be matched by any dictionary word, the whole split fails.

The main trap is trying to greedily take the first word that fits. A local choice can easily block a later split even when another earlier word would have worked. The reliable way to solve the problem is to treat each index as a boundary and ask whether the prefix ending there is segmentable.

The dynamic programming idea

Define can_segment[end] as:

True if s[:end] can be built from dictionary words.

The base case is can_segment[0] = True, because the empty prefix before the first character is already fully segmented.

For every ending position end, try possible word lengths. If a word starts at start and ends at end, then the split works when both conditions are true:

s[start:end] is in the dictionary
can_segment[start] is already true

That means the prefix before the word is valid, and the current word covers the next chunk. Together, they make s[:end] valid.

How to derive it in an interview

Start with the final character of a valid split. If the full string can be segmented, then the last piece must be some dictionary word. Suppose that last word is s[start:n]. Everything before it, s[:start], must also be segmentable.

That gives the recurrence:

can_segment[end] = any(can_segment[start] and s[start:end] in word_set)

The only remaining implementation question is which start values to try. Instead of trying every start from 0 to end, the code below tries only lengths that actually appear in the dictionary. This keeps the same idea but avoids checking impossible substring sizes.

Python solution

from typing import List, Set


def can_break_into_words(text: str, dictionary_words: List[str]) -> bool:
    """
    Return True when text can be segmented into one or more dictionary words.

    The DP state can_segment[end] means text[:end] can be segmented. For each
    end index, only real dictionary word lengths are considered.
    """
    if text == "":
        return True

    word_set: Set[str] = set(dictionary_words)
    if not word_set:
        return False

    word_lengths = sorted({len(word) for word in word_set})
    can_segment = [False] * (len(text) + 1)
    can_segment[0] = True

    for end in range(1, len(text) + 1):
        for word_length in word_lengths:
            start = end - word_length
            if start < 0:
                break

            if can_segment[start] and text[start:end] in word_set:
                can_segment[end] = True
                break

    return can_segment[len(text)]


class Solution:
    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
        return can_break_into_words(s, wordDict)

Why this works

The invariant is:

after processing index end, can_segment[end] correctly tells whether the prefix s[:end] can be split into dictionary words.

The base case is true because the empty prefix needs no words. For every later end, the algorithm checks whether some dictionary word can be the final word of the prefix. If such a word is s[start:end], then the prefix is valid exactly when s[:start] was already valid. That earlier result is stored in can_segment[start].

If the algorithm marks can_segment[end] true, it has found a valid previous prefix plus a valid final word. If it leaves the value false, then no possible dictionary word length produced a valid final split, so there is no valid segmentation ending there.

The runtime is O(n * u * L) in Python, where n is the string length, u is the number of unique word lengths, and L is the cost of slicing and hashing a checked substring. With LeetCode’s usual constraints, this is practical. The DP array uses O(n) extra space, and the dictionary set uses O(total dictionary characters) storage.

Interview follow-ups

How would you return one valid segmentation instead of only true or false?

Store a predecessor pointer whenever an index first becomes reachable. Instead of only setting can_segment[end] = True, also save previous_index[end] = start. After the DP finishes, walk backward from n through those saved starts and reverse the collected words.

This works because each saved pointer records a valid final word for a valid prefix. The runtime stays close to the boolean version because the same transitions are checked. The extra space is still O(n), plus the output words.

How would you return every possible sentence?

Use memoized DFS from each start index. For a given start, try every dictionary word that matches s[start:end], recursively collect all segmentations from end, and combine the current word with each suffix sentence.

Memoization is essential because many different prefixes can lead to the same suffix. It prevents recomputing the same suffix results over and over. The important tradeoff is that the output itself can be exponential, so no algorithm can avoid exponential time in the worst case when all valid sentences must be returned.

How would you optimize many repeated queries against the same dictionary?

Build a trie from the dictionary. For each reachable start index, walk forward through the trie character by character. Every time the trie reaches a terminal word, mark that end index reachable.

The trie avoids creating many substrings and lets the scan stop as soon as a character path is impossible. This is most useful when the same dictionary is reused across many strings or when the dictionary contains many words with shared prefixes. The tradeoff is extra implementation complexity and trie memory.

What if the interviewer asks for the minimum number of words?

Change the boolean DP into a shortest-path style DP. Let minimum_words[end] store the fewest dictionary words needed to build s[:end], with minimum_words[0] = 0. For each valid transition from start to end, update minimum_words[end] = min(minimum_words[end], minimum_words[start] + 1).

This works for the same reason as the boolean recurrence, but instead of remembering whether a prefix is reachable, it remembers the cheapest way to reach it. The runtime is the same order as the standard DP, and the space remains O(n).

Practical takeaway

The clean way to recognize Word Break is to think in prefix boundaries. A split is valid only if its final word is in the dictionary and everything before that word was already valid. Turning that sentence into can_segment[start] and s[start:end] in word_set gives the full dynamic programming solution.