Generated by Codex with GPT-5
Quick facts
- Difficulty:
MEDIUM - Problem: Group Anagrams
- Main tags:
Array,Hash Table,String,Sorting
What the problem is really asking
Two strings belong in the same group if they use the exact same letters with the exact same counts, just in a different order.
So the problem is not really about comparing every string against every other string. It is about finding a stable “signature” for each word, so all words with the same signature land in the same bucket.
For example:
"eat","tea", and"ate"should share one bucket"tan"and"nat"should share another"bat"should stand alone
Once that idea is clear, the rest of the problem becomes a grouping exercise.
Why the obvious approach is wasteful
The brute-force instinct is to compare each string with every other string and ask, “Are these anagrams?”
That works, but it repeats far too much work:
- there are
O(n^2)string pairs - each comparison needs extra logic to verify matching letters
This is the wrong direction because anagrams are an equivalence-class problem. Instead of repeatedly checking pairs, it is much better to compute one canonical key per string and group by that key.
Best solutions
There are two common ways to build the grouping key.
Solution 1: sorted-string signature
Sort the letters inside each word. Every anagram becomes the same sorted string:
"eat"->"aet""tea"->"aet""ate"->"aet"
Then use a hash map from sorted string to list of original words.
This is very easy to explain and implement:
- Time:
O(n * k log k) - Space:
O(n * k)
Here, n is the number of strings and k is the maximum string length.
Solution 2: letter-count signature
If the input only contains lowercase English letters, there is an even better key.
Build a 26-length count array for each word:
- count how many
'a','b','c', and so on appear - convert that count array into a tuple so it can be used as a dictionary key
Now two words land in the same group exactly when their 26 counts match.
This avoids sorting each word:
- Time:
O(n * k) - Space:
O(n * k)
This count-signature approach is the optimal solution for the usual LeetCode constraints.
How to derive the optimal solution
A practical way to derive the final algorithm is:
- Notice that order does not matter, only letter multiset matters.
- Ask what representation stays identical for all anagrams.
- Try sorted letters first, because it is the most direct canonical form.
- Notice sorting is extra work, and replace it with raw letter frequencies.
- Group strings in a hash map keyed by that frequency signature.
That line of thinking turns the problem from “compare strings” into “compute a key once and append to a bucket.”
Python solution
from collections import defaultdict
from typing import DefaultDict, List, Tuple
LetterSignature = Tuple[int, ...]
def build_letter_signature(word: str) -> LetterSignature:
"""Return a hashable 26-letter frequency signature for a lowercase word."""
letter_counts = [0] * 26
for character in word:
character_index = ord(character) - ord("a")
letter_counts[character_index] += 1
return tuple(letter_counts)
def group_anagrams(words: List[str]) -> List[List[str]]:
"""Group words that share the same letter-frequency signature."""
words_by_signature: DefaultDict[LetterSignature, List[str]] = defaultdict(list)
for word in words:
signature = build_letter_signature(word)
words_by_signature[signature].append(word)
return list(words_by_signature.values())
class Solution:
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
return group_anagrams(strs)Why this works
The helper computes a signature that records the exact number of times each letter appears.
That means:
- if two words are anagrams, they must produce the same signature
- if two words produce the same signature, they must be anagrams
So the signature is both necessary and sufficient for grouping. The dictionary simply collects all words with the same signature into the same output list.
Practical takeaway
This problem becomes easy once it is reframed as:
- find a canonical representation
- use that representation as a hash-map key
- group instead of pairwise compare
If the interviewer wants the simplest version, use sorted strings. If they want the optimal version, use the 26-count signature.