LeetCode MEDIUM 249 Group Shifted Strings Summary

Generated by Codex with GPT-5

Difficulty: MEDIUM

Problem gist

Each string can be shifted forward through the alphabet, wrapping from z to a. For example, shifting "abc" once gives "bcd", and shifting "az" once gives "ba". The task is to group strings that can become each other by applying the same shift amount to every character.

The important constraint is that only relative spacing matters. "abc", "bcd", and "xyz" are in one group because their letters have the same shape: each character is one step after the previous one. "az" and "ba" are also in one group because wraparound keeps the same relative difference.

Optimal idea

Give every string a canonical signature that ignores its starting letter. Shift the first character to a, then shift every other character by the same amount. Strings with the same normalized pattern belong in the same group.

For "bcd", subtract the value of b from every character:

b -> a, c -> b, d -> c, so the signature is (0, 1, 2).

For "xyz", subtract the value of x:

x -> a, y -> b, z -> c, so the signature is also (0, 1, 2).

For "ba", subtract the value of b and wrap modulo 26:

b -> a, a -> z, so the signature is (0, 25), which matches "az".

Once the signature exists, the rest is a hash table problem: map each signature to the strings that share it.

How to derive it

Start by asking what should stay unchanged after a uniform shift. Absolute letters do not stay unchanged: "abc" and "bcd" use different letters. Distances between letters do stay unchanged modulo 26. That suggests either comparing adjacent differences or normalizing all characters relative to the first character.

Normalizing relative to the first character is usually easiest to explain and implement. The first character always becomes 0, and every later character becomes its circular distance from the first character. That signature is identical exactly when two strings are in the same shifting sequence.

The algorithm is:

For each string, compute a tuple of normalized character offsets.
Use that tuple as a dictionary key.
Append the original string to the matching group.
Return all dictionary values.

The runtime is O(total characters) because each character is processed once. The extra space is also O(total characters) for the grouped output and signatures.

Python solution

from collections import defaultdict
from typing import DefaultDict, List, Tuple


class Solution:
    def groupStrings(self, strings: List[str]) -> List[List[str]]:
        groups_by_signature: DefaultDict[Tuple[int, ...], List[str]] = defaultdict(list)

        for word in strings:
            signature = self._normalized_shift_signature(word)
            groups_by_signature[signature].append(word)

        return list(groups_by_signature.values())

    @staticmethod
    def _normalized_shift_signature(word: str) -> Tuple[int, ...]:
        if not word:
            return tuple()

        first_letter_offset = ord(word[0]) - ord("a")

        # Normalize every character as if the first character were shifted to "a".
        return tuple(
            (ord(character) - ord("a") - first_letter_offset) % 26
            for character in word
        )

Interview follow-ups

How can the solution use adjacent differences instead?

Instead of normalizing every character against the first character, compute the circular difference between each adjacent pair. "abc" becomes (1, 1), "bcd" also becomes (1, 1), and "az" becomes (25).

This works because a uniform shift changes every absolute character value by the same amount, so subtracting neighboring characters cancels out the shift. The implementation is just as efficient: O(total characters) time and O(total characters) space. The main tradeoff is readability. The first-character normalization often makes the group identity easier to visualize, while adjacent differences create slightly shorter signatures.

What if the interviewer asks for sorted output?

The grouping logic does not change. After building the hash map, sort each group lexicographically, then sort the list of groups by a chosen key such as the first string in each sorted group.

Sorting adds extra cost. If there are n strings, sorting the strings inside groups costs up to O(n log n) comparisons overall, and each string comparison may inspect characters. The hash signature pass remains linear in the total input size.

What if the alphabet is not limited to lowercase English letters?

Replace the fixed modulo 26 with the alphabet size, and replace ord(character) - ord("a") with a lookup table from character to alphabet index. The same invariant still works as long as the alphabet has a fixed circular order.

The algorithm remains linear after the lookup table is built. The main tradeoff is validation: production code should reject characters that are not in the configured alphabet, while interview code can usually state the input assumptions and keep the implementation focused.

What if the input is too large to keep fully in memory?

If the final output must contain every original string grouped together, the output itself may require large memory, so the problem cannot be solved with tiny memory in the worst case. The practical approach is to stream input strings, compute each signature, and write each (signature, string) pair to external storage. Then sort or partition by signature and emit one group at a time.

If the interviewer only needs group counts, not the actual grouped strings, memory can drop to O(number of distinct signatures) because each hash entry stores a count instead of a list.

How should empty strings or one-character strings be handled?

One-character strings all belong together because any single lowercase letter can shift to any other single lowercase letter. The normalization code gives each one-character string the signature (0).

Empty strings are not part of the usual LeetCode constraints, but robust code can still handle them with a distinct empty tuple signature. That keeps empty strings grouped with other empty strings and separate from one-character strings.