LeetCode Problem Workspace
Top K Frequent Words
Solve Top K Frequent Words by counting each word with a hash map, then ranking by descending frequency and breaking ties with ascending lexicographical order.
Practice Focus
Medium · Array scanning plus hash lookup
Answer-first summary
Count each word with a single hash-map pass, sort the unique words by descending frequency with ascending lexicographical order breaking ties, and return the first k.
For Top K Frequent Words, the core move is to scan the array once, count occurrences with a hash map, then rank words by higher frequency and smaller lexicographical order. The interview trap is not the counting step but the comparator: ties must place alphabetically smaller words first. You can solve it cleanly with full sorting or use a heap when you want to cap work around the top k results.
Problem Statement
You are given a list of lowercase words and an integer k. Your job is to return the k words that appear most often in the array. The result cannot be in arbitrary order: words with larger counts must come first, and when two words have the same count, the alphabetically smaller word must appear earlier.
A quick check is words = ["i","love","leetcode","i","love","coding"] with k = 2. Both "i" and "love" appear twice, while the other words appear once, so the answer is ["i","love"] because the tie is broken by lexicographical order. That tie rule is the part that usually breaks otherwise correct Top K Frequent Words solutions.
Examples
Example 1
Input: words = ["i","love","leetcode","i","love","coding"], k = 2
Output: ["i","love"]
"i" and "love" are the two most frequent words. Note that "i" comes before "love" due to a lower alphabetical order.
Example 2
Input: words = ["the","day","is","sunny","the","the","the","sunny","is","is"], k = 4
Output: ["the","is","sunny","day"]
"the", "is", "sunny" and "day" are the four most frequent words, with the number of occurrence being 4, 3, 2 and 1 respectively.
Constraints
- 1 <= words.length <= 500
- 1 <= words[i].length <= 10
- words[i] consists of lowercase English letters.
- k is in the range [1, number of unique words]
Solution Approach
Count frequencies with one array pass
Start with the problem's simplest pattern: scan the words array once and store each word's frequency in a hash map. This gives direct lookup for every repeated word and turns the rest of the problem into ranking unique words instead of repeatedly rescanning the full array.
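The counting pass can be sketched with Python's standard `collections.Counter` (a plain dict loop works the same way):

```python
from collections import Counter

words = ["i", "love", "leetcode", "i", "love", "coding"]

# One pass over the array: each word maps to its occurrence count.
cnt = Counter(words)
print(cnt["i"], cnt["love"], cnt["leetcode"])  # 2 2 1
```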
Sort by frequency first, lexicographical order second
Take the unique words and sort them with a comparator that puts larger counts first. When two counts match, compare the strings directly so the alphabetically smaller word comes first. In Top K Frequent Words, this comparator is the real logic, because a reversed tie-break silently returns the wrong answer even when the counts are correct.
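One way to encode that comparator in Python is the sort key `(-count, word)`: negating the count makes larger frequencies sort first under an ascending sort, while the word itself breaks ties alphabetically. A minimal sketch, assuming the counts are already in a dict `cnt`:

```python
cnt = {"i": 2, "love": 2, "leetcode": 1, "coding": 1}

# Higher count first (negated), then ascending alphabetical order on ties.
ranked = sorted(cnt, key=lambda w: (-cnt[w], w))
print(ranked)  # ['i', 'love', 'coding', 'leetcode']
```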
Use a heap when you want to focus on top k
A heap approach is also valid after counting frequencies. Order the heap so the lowest-priority entry sits at the root and is evicted first; this demands extra care, because the heap comparator often looks inverted compared with the final output order. This version is useful when discussing bounded top-k extraction, but for LeetCode 692 many candidates choose sorting because the tie rule is easier to express correctly.
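A simple heap variant, sketched with Python's `heapq`: because `heapq` is a min-heap, pushing `(-count, word)` makes the highest-count, alphabetically smallest word pop first. This version pops k entries from a heap of all unique words rather than maintaining a size-k heap:

```python
import heapq
from collections import Counter

def top_k_frequent(words, k):
    cnt = Counter(words)
    # (-count, word): the min-heap yields the largest count first, and
    # for equal counts the lexicographically smaller word first.
    heap = [(-c, w) for w, c in cnt.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(k)]

print(top_k_frequent(["i", "love", "leetcode", "i", "love", "coding"], 2))
# ['i', 'love']
```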
Complexity Analysis
| Metric | Value |
|---|---|
| Time | O(n + u log u) with sorting; O(n + u log k) with a size-k heap |
| Space | O(u) for the counts, plus O(k) for the heap variant |
Let u be the number of unique words. Counting takes O(n) time with O(u) space. A full sort of unique words costs O(u log u), so the total is O(n + u log u) time and O(u) space. With a size-k heap, a common version runs in O(n + u log k) time with O(u) counting space plus O(k) heap space, but the comparator is easier to get wrong because lexicographical ties must be inverted so the wrong word is not evicted first.
What Interviewers Usually Probe
- They want to see whether you separate counting from ranking instead of trying to compare words during the initial scan.
- They are checking whether you implement the tie-break exactly: same frequency means lexicographically smaller word first.
- They may ask for a heap follow-up to test whether you understand why heap ordering for Top K Frequent Words can look opposite from final output order.
Common Pitfalls or Variants
Common pitfalls
- Sorting tied words in descending alphabetical order, which breaks examples like "i" versus "love" immediately.
- Building a min-heap with the same comparator as the final answer, then popping the wrong word when frequencies tie.
- Forgetting that only unique words should be ranked after counting, which leads to unnecessary repeated comparisons across duplicates.
Follow-up variants
- Return the top k frequent numbers instead of words, which removes the lexicographical tie-break but keeps counting plus ranking.
- Return all words grouped by frequency, which shifts the task toward bucket organization after the hash count.
- Stream words one by one and keep the current top k, which turns Top K Frequent Words into an incremental heap maintenance problem.
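The streaming variant can be sketched naively by updating the counter per incoming word and re-deriving the top k on demand (a sketch with illustrative names; a production stream would maintain the heap incrementally instead of re-sorting):

```python
from collections import Counter

class StreamingTopK:
    """Naive streaming top-k: O(1) per add, O(u log u) per query."""

    def __init__(self, k):
        self.k = k
        self.cnt = Counter()

    def add(self, word):
        self.cnt[word] += 1

    def top_k(self):
        # Same ranking rule as the offline problem.
        return sorted(self.cnt, key=lambda w: (-self.cnt[w], w))[:self.k]

s = StreamingTopK(2)
for w in ["i", "love", "leetcode", "i", "love", "coding"]:
    s.add(w)
print(s.top_k())  # ['i', 'love']
```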
FAQ
What is the best approach for Top K Frequent Words?
The most direct approach is to count each word with a hash map, then sort the unique words by descending frequency and ascending lexicographical order. It is easy to explain, easy to verify on examples, and makes the tie-break rule explicit.
Why does lexicographical order matter in this problem?
Because Top K Frequent Words does not allow arbitrary ordering when counts match. If two words appear the same number of times, the alphabetically smaller one must come first, so your comparator must encode that rule exactly.
Can I use a heap instead of sorting?
Yes. After counting, you can keep a heap of size k to track the best candidates. The tricky part is that the heap comparator often needs to treat lexicographically larger tied words as lower priority so they get removed first.
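To actually cap the heap at size k in Python, one common trick is a small wrapper class whose comparison inverts the word order: on tied counts the lexicographically larger word compares as smaller, sits at the root, and is evicted first. A sketch under that assumption (class and function names are illustrative):

```python
import heapq
from collections import Counter

class Entry:
    """Heap entry ordered so the *worst* candidate sits at the root:
    lower count first; on tied counts, the lexicographically larger
    word first, so it is evicted before the smaller one."""

    def __init__(self, count, word):
        self.count = count
        self.word = word

    def __lt__(self, other):
        if self.count != other.count:
            return self.count < other.count
        return self.word > other.word  # inverted tie-break

def top_k_frequent(words, k):
    cnt = Counter(words)
    heap = []
    for word, count in cnt.items():
        heapq.heappush(heap, Entry(count, word))
        if len(heap) > k:
            heapq.heappop(heap)  # drop the current worst candidate
    # Pop the survivors (worst first), then reverse into output order.
    out = []
    while heap:
        out.append(heapq.heappop(heap).word)
    return out[::-1]

print(top_k_frequent(["i", "love", "leetcode", "i", "love", "coding"], 2))
# ['i', 'love']
```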
What pattern should I recognize here?
Recognize array scanning plus hash lookup, followed by ranking of unique keys. The scan builds frequencies in O(n), and the second phase decides the top k with either sorting or a heap.
Why do correct counts still lead to wrong answers sometimes?
Because the failure is often in the ordering step, not the counting step. A solution can count every word perfectly and still fail if the tie-break comparator is reversed for sorting or heap removal.
Solution
Solution 1: Hash Table + Sorting
We can use a hash table $\textit{cnt}$ to record the frequency of each word. Then, we sort the key-value pairs in the hash table by value, and if the values are the same, we sort by key.
from collections import Counter
from typing import List


class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        cnt = Counter(words)
        # Sort unique words by descending count, then ascending word.
        return sorted(cnt, key=lambda x: (-cnt[x], x))[:k]