LeetCode Problem Workspace
Top K Frequent Words
Solve Top K Frequent Words by counting each word with a hash map, then ranking by descending frequency and breaking ties with ascending lexicographical order.
Practice Focus
Medium · Array scanning plus hash lookup
Answer-first summary
Count each word with a single hash-map pass, sort the unique words by descending frequency with ascending lexicographical order breaking ties, and return the first k.
For Top K Frequent Words, the core move is to scan the array once, count occurrences with a hash map, then rank words by higher frequency and smaller lexicographical order. The interview trap is not the counting step but the comparator: ties must place alphabetically smaller words first. You can solve it cleanly with full sorting or use a heap when you want to cap work around the top k results.
Problem Statement
You are given a list of lowercase words and an integer k. Your job is to return the k words that appear most often in the array. The result cannot be in arbitrary order: words with larger counts must come first, and when two words have the same count, the alphabetically smaller word must appear earlier.
A quick check is words = ["i","love","leetcode","i","love","coding"] with k = 2. Both "i" and "love" appear twice, while the other words appear once, so the answer is ["i","love"] because the tie is broken by lexicographical order. That tie rule is the part that usually breaks otherwise correct Top K Frequent Words solutions.
Examples
Example 1
Input: words = ["i","love","leetcode","i","love","coding"], k = 2
Output: ["i","love"]
"i" and "love" are the two most frequent words. Note that "i" comes before "love" due to a lower alphabetical order.
Example 2
Input: words = ["the","day","is","sunny","the","the","the","sunny","is","is"], k = 4
Output: ["the","is","sunny","day"]
"the", "is", "sunny" and "day" are the four most frequent words, with the number of occurrence being 4, 3, 2 and 1 respectively.
Constraints
- 1 <= words.length <= 500
- 1 <= words[i].length <= 10
- words[i] consists of lowercase English letters.
- k is in the range [1, number of unique words]
Solution Approach
Count frequencies with one array pass
Start with the problem's simplest pattern: scan the words array once and store each word's frequency in a hash map. This gives direct lookup for every repeated word and turns the rest of the problem into ranking unique words instead of repeatedly rescanning the full array.
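The counting pass can be sketched with Python's standard `collections.Counter` (a plain dict loop works the same way):

```python
from collections import Counter

words = ["i", "love", "leetcode", "i", "love", "coding"]

# One pass over the array: each word maps to its occurrence count.
cnt = Counter(words)
print(cnt["i"], cnt["love"], cnt["leetcode"])  # 2 2 1
```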
Sort by frequency first, lexicographical order second
Take the unique words and sort them with a comparator that puts larger counts first. When two counts match, compare the strings directly so the alphabetically smaller word comes first. In Top K Frequent Words, this comparator is the real logic, because a reversed tie-break silently returns the wrong answer even when the counts are correct.
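One way to encode that comparator in Python is the sort key `(-count, word)`: negating the count makes larger frequencies sort first under an ascending sort, while the word itself breaks ties alphabetically. A minimal sketch, assuming the counts are already in a dict `cnt`:

```python
cnt = {"i": 2, "love": 2, "leetcode": 1, "coding": 1}

# Higher count first (negated), then ascending alphabetical order on ties.
ranked = sorted(cnt, key=lambda w: (-cnt[w], w))
print(ranked)  # ['i', 'love', 'coding', 'leetcode']
```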
Use a heap when you want to focus on top k
A heap approach is also valid after counting frequencies. Order the heap so the lowest-priority entry sits at the root and is evicted first; this demands extra care, because the heap comparator often looks inverted compared with the final output order. This version is useful when discussing bounded top-k extraction, but for LeetCode 692 many candidates choose sorting because the tie rule is easier to express correctly.
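A simple heap variant, sketched with Python's `heapq`: because `heapq` is a min-heap, pushing `(-count, word)` makes the highest-count, alphabetically smallest word pop first. This version pops k entries from a heap of all unique words rather than maintaining a size-k heap:

```python
import heapq
from collections import Counter

def top_k_frequent(words, k):
    cnt = Counter(words)
    # (-count, word): the min-heap yields the largest count first, and
    # for equal counts the lexicographically smaller word first.
    heap = [(-c, w) for w, c in cnt.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(k)]

print(top_k_frequent(["i", "love", "leetcode", "i", "love", "coding"], 2))
# ['i', 'love']
```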
Complexity Analysis
| Metric | Value |
|---|---|
| Time | O(n + u log u) with sorting; O(n + u log k) with a size-k heap |
| Space | O(u) for the counts, plus O(k) for the heap variant |
Let u be the number of unique words. Counting takes O(n) time with O(u) space. A full sort of unique words costs O(u log u), so the total is O(n + u log u) time and O(u) space. With a size-k heap, a common version runs in O(n + u log k) time with O(u) counting space plus O(k) heap space, but the comparator is easier to get wrong because lexicographical ties must be inverted so the wrong word is not evicted first.
What Interviewers Usually Probe
- They want to see whether you separate counting from ranking instead of trying to compare words during the initial scan.
- They are checking whether you implement the tie-break exactly: same frequency means lexicographically smaller word first.
- They may ask for a heap follow-up to test whether you understand why heap ordering for Top K Frequent Words can look opposite from final output order.
Common Pitfalls or Variants
Common pitfalls
- Sorting tied words in descending alphabetical order, which breaks examples like "i" versus "love" immediately.
- Building a min-heap with the same comparator as the final answer, then popping the wrong word when frequencies tie.
- Forgetting that only unique words should be ranked after counting, which leads to unnecessary repeated comparisons across duplicates.
Follow-up variants
- Return the top k frequent numbers instead of words, which removes the lexicographical tie-break but keeps counting plus ranking.
- Return all words grouped by frequency, which shifts the task toward bucket organization after the hash count.
- Stream words one by one and keep the current top k, which turns Top K Frequent Words into an incremental heap maintenance problem.
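The streaming variant can be sketched naively by updating the counter per incoming word and re-deriving the top k on demand (a sketch with illustrative names; a production stream would maintain the heap incrementally instead of re-sorting):

```python
from collections import Counter

class StreamingTopK:
    """Naive streaming top-k: O(1) per add, O(u log u) per query."""

    def __init__(self, k):
        self.k = k
        self.cnt = Counter()

    def add(self, word):
        self.cnt[word] += 1

    def top_k(self):
        # Same ranking rule as the offline problem.
        return sorted(self.cnt, key=lambda w: (-self.cnt[w], w))[:self.k]

s = StreamingTopK(2)
for w in ["i", "love", "leetcode", "i", "love", "coding"]:
    s.add(w)
print(s.top_k())  # ['i', 'love']
```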
FAQ
What is the best approach for Top K Frequent Words?
The most direct approach is to count each word with a hash map, then sort the unique words by descending frequency and ascending lexicographical order. It is easy to explain, easy to verify on examples, and makes the tie-break rule explicit.
Why does lexicographical order matter in this problem?
Because Top K Frequent Words does not allow arbitrary ordering when counts match. If two words appear the same number of times, the alphabetically smaller one must come first, so your comparator must encode that rule exactly.
Can I use a heap instead of sorting?
Yes. After counting, you can keep a heap of size k to track the best candidates. The tricky part is that the heap comparator often needs to treat lexicographically larger tied words as lower priority so they get removed first.
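To actually cap the heap at size k in Python, one common trick is a small wrapper class whose comparison inverts the word order: on tied counts the lexicographically larger word compares as smaller, sits at the root, and is evicted first. A sketch under that assumption (class and function names are illustrative):

```python
import heapq
from collections import Counter

class Entry:
    """Heap entry ordered so the *worst* candidate sits at the root:
    lower count first; on tied counts, the lexicographically larger
    word first, so it is evicted before the smaller one."""

    def __init__(self, count, word):
        self.count = count
        self.word = word

    def __lt__(self, other):
        if self.count != other.count:
            return self.count < other.count
        return self.word > other.word  # inverted tie-break

def top_k_frequent(words, k):
    cnt = Counter(words)
    heap = []
    for word, count in cnt.items():
        heapq.heappush(heap, Entry(count, word))
        if len(heap) > k:
            heapq.heappop(heap)  # drop the current worst candidate
    # Pop the survivors (worst first), then reverse into output order.
    out = []
    while heap:
        out.append(heapq.heappop(heap).word)
    return out[::-1]

print(top_k_frequent(["i", "love", "leetcode", "i", "love", "coding"], 2))
# ['i', 'love']
```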
What pattern should I recognize here?
Recognize array scanning plus hash lookup, followed by ranking of unique keys. The scan builds frequencies in O(n), and the second phase decides the top k with either sorting or a heap.
Why do correct counts still lead to wrong answers sometimes?
Because the failure is often in the ordering step, not the counting step. A solution can count every word perfectly and still fail if the tie-break comparator is reversed for sorting or heap removal.
Solution
Solution 1: Hash Table + Sorting
We can use a hash table $\textit{cnt}$ to record the frequency of each word. Then, we sort the key-value pairs in the hash table by value, and if the values are the same, we sort by key.
from collections import Counter
from typing import List


class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        cnt = Counter(words)
        # Sort unique words by descending count, then ascending word.
        return sorted(cnt, key=lambda x: (-cnt[x], x))[:k]