LeetCode MEDIUM 49 Group Anagrams Summary

Generated by Codex with GPT-5

Quick facts

Difficulty: MEDIUM
Problem: Group Anagrams
Main tags: Array, Hash Table, String, Sorting

What the problem is really asking

Two strings belong in the same group if they use the exact same letters with the exact same counts, just in a different order.

So the problem is not really about comparing every string against every other string. It is about finding a stable “signature” for each word, so all words with the same signature land in the same bucket.

For example:

"eat", "tea", and "ate" should share one bucket
"tan" and "nat" should share another
"bat" should stand alone

Once that idea is clear, the rest of the problem becomes a grouping exercise.

Why the obvious approach is wasteful

The brute-force instinct is to compare each string with every other string and ask, “Are these anagrams?”

That works, but it repeats far too much work:

there are O(n^2) string pairs
each comparison needs extra logic to verify matching letters

This is the wrong direction because anagrams are an equivalence-class problem. Instead of repeatedly checking pairs, it is much better to compute one canonical key per string and group by that key.

Best solutions

There are two common ways to build the grouping key.

Solution 1: sorted-string signature

Sort the letters inside each word. Every anagram becomes the same sorted string:

"eat" -> "aet"
"tea" -> "aet"
"ate" -> "aet"

Then use a hash map from sorted string to list of original words.

This is very easy to explain and implement:

Time: O(n * k log k)
Space: O(n * k)

Here, n is the number of strings and k is the maximum string length.

Solution 2: letter-count signature

If the input only contains lowercase English letters, there is an even better key.

Build a 26-length count array for each word:

count how many 'a', 'b', 'c', and so on appear
convert that count array into a tuple so it can be used as a dictionary key

Now two words land in the same group exactly when their 26 counts match.

This avoids sorting each word:

Time: O(n * k)
Space: O(n * k)

This count-signature approach is the optimal solution for the usual LeetCode constraints.

How to derive the optimal solution

A practical way to derive the final algorithm is:

Notice that order does not matter, only letter multiset matters.
Ask what representation stays identical for all anagrams.
Try sorted letters first, because it is the most direct canonical form.
Notice sorting is extra work, and replace it with raw letter frequencies.
Group strings in a hash map keyed by that frequency signature.

That line of thinking turns the problem from “compare strings” into “compute a key once and append to a bucket.”

Python solution

from collections import defaultdict
from typing import DefaultDict, List, Tuple


LetterSignature = Tuple[int, ...]


def build_letter_signature(word: str) -> LetterSignature:
    """Return a hashable 26-letter frequency signature for a lowercase word."""
    letter_counts = [0] * 26

    for character in word:
        character_index = ord(character) - ord("a")
        letter_counts[character_index] += 1

    return tuple(letter_counts)


def group_anagrams(words: List[str]) -> List[List[str]]:
    """Group words that share the same letter-frequency signature."""
    words_by_signature: DefaultDict[LetterSignature, List[str]] = defaultdict(list)

    for word in words:
        signature = build_letter_signature(word)
        words_by_signature[signature].append(word)

    return list(words_by_signature.values())


class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        return group_anagrams(strs)

Why this works

The helper computes a signature that records the exact number of times each letter appears.

That means:

if two words are anagrams, they must produce the same signature
if two words produce the same signature, they must be anagrams

So the signature is both necessary and sufficient for grouping. The dictionary simply collects all words with the same signature into the same output list.

Practical takeaway

This problem becomes easy once it is reframed as:

find a canonical representation
use that representation as a hash-map key
group instead of pairwise compare

If the interviewer wants the simplest version, use sorted strings. If they want the optimal version, use the 26-count signature.