What is the best approach for Top K Frequent Words?

The most direct approach is to count each word with a hash map, then sort the unique words by descending frequency and ascending lexicographical order. It is easy to explain, easy to verify on examples, and makes the tie-break rule explicit.

Why does lexicographical order matter in this problem?

Because Top K Frequent Words does not allow arbitrary ordering when counts match. If two words appear the same number of times, the alphabetically smaller one must come first, so your comparator must encode that rule exactly.

Can I use a heap instead of sorting?

Yes. After counting, you can keep a heap of size k to track the best candidates. The tricky part is that a size-k min-heap must evict the worst candidate first, so among words with equal counts the lexicographically larger word has to sit at the top of the heap and be removed first.
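The heap idea above can be sketched as follows. This is a minimal illustration, assuming Python's standard `heapq` module; the `Candidate` wrapper and the function name `top_k_frequent_heap` are hypothetical names introduced here, not part of the original solution.

```python
import heapq
from collections import Counter
from typing import List


class Candidate:
    """Heap entry ordered so that the worst candidate compares smallest."""

    def __init__(self, word: str, count: int) -> None:
        self.word = word
        self.count = count

    def __lt__(self, other: "Candidate") -> bool:
        # A lower count is worse; on a tie, the lexicographically
        # larger word is worse and should be evicted first.
        if self.count != other.count:
            return self.count < other.count
        return self.word > other.word


def top_k_frequent_heap(words: List[str], k: int) -> List[str]:
    counts = Counter(words)
    heap: List[Candidate] = []
    for word, count in counts.items():
        heapq.heappush(heap, Candidate(word, count))
        if len(heap) > k:
            heapq.heappop(heap)  # evict the current worst candidate
    # Popping yields worst-to-best, so reverse to get the final order.
    result = [heapq.heappop(heap).word for _ in range(len(heap))]
    return result[::-1]
```

Because the heap never holds more than k + 1 entries, each push or pop costs O(log k), which is where the O(n log k) bound comes from. Note that the heap ordering is the reverse of the output ordering, which is why the result is reversed at the end.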

What pattern should I recognize here?

Recognize array scanning plus hash lookup, followed by ranking of unique keys. The scan builds frequencies in O(n), and the second phase decides the top k with either sorting or a heap.

Why do correct counts still lead to wrong answers sometimes?

Because the failure is often in the ordering step, not the counting step. A solution can count every word perfectly and still fail if the tie-break comparator is reversed for sorting or heap removal.
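A small sketch of how this goes wrong, using the words from Example 1. The variable names `buggy` and `correct` are illustrative only. Both versions count correctly; only the sort key differs.

```python
from collections import Counter

words = ["i", "love", "leetcode", "i", "love", "coding"]
counts = Counter(words)

# Buggy: reverse=True flips the tie-break as well as the count ordering,
# so tied words come out in descending alphabetical order.
buggy = sorted(counts, key=lambda w: (counts[w], w), reverse=True)[:2]

# Correct: negate only the count so ties stay in ascending alphabetical order.
correct = sorted(counts, key=lambda w: (-counts[w], w))[:2]
```

Here `buggy` puts "love" before "i", while `correct` keeps "i" first as the problem requires.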

#692

Medium

Array · Hash · Scan

LeetCode Solution Workbench

Top K Frequent Words

Array, Hash Table, String, Trie, Sorting

Problem Description

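Given a list of words `words` and an integer k, return the k most frequent words.

The answer must be sorted by frequency from highest to lowest. If different words have the same frequency, sort them in lexicographical order.

Example 1:

Input: words = ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]
Explanation: "i" and "love" are the two most frequent words, each appearing 2 times.
    Note that "i" comes before "love" in alphabetical order.

Example 2:

Input: words = ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words,
    appearing 4, 3, 2 and 1 times respectively.

Constraints:

1 <= words.length <= 500
1 <= words[i].length <= 10
words[i] consists of lowercase English letters.
k is in the range [1, the number of distinct words[i]]

Follow-up: try to solve it in O(n log k) time and O(n) space.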

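Solution Approach

Approach 1: Hash Table + Sorting

We can use a hash table $\textit{cnt}$ to record how many times each word appears, then sort the key-value pairs in the hash table by value in descending order. When two values are equal, sort by key in ascending lexicographical order.

Finally, take the first $k$ keys.

The time complexity is $O(n \times \log n)$ and the space complexity is $O(n)$, where $n$ is the number of words.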

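from collections import Counter
from typing import List

class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        cnt = Counter(words)  # word -> number of occurrences
        # Sort unique words by descending count, then ascending alphabetical order.
        return sorted(cnt, key=lambda x: (-cnt[x], x))[:k]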

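Complexity Analysis

Metric	Value
Time	Depends on the final approach: $O(n \times \log n)$ with sorting, $O(n \times \log k)$ with a heap of size $k$
Space	$O(n)$ for the frequency table in either approach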

Common Interviewer Follow-ups

International company interviews

- They want to see whether you separate counting from ranking instead of trying to compare words during the initial scan.
- They are checking whether you implement the tie-break exactly: same frequency means lexicographically smaller word first.
- They may ask for a heap follow-up to test whether you understand why heap ordering for Top K Frequent Words can look opposite from final output order.

Common Pitfalls

- Sorting tied words in descending alphabetical order, which breaks examples like "i" versus "love" immediately.
- Building a min-heap with the same comparator as the final answer, then popping the wrong word when frequencies tie.
- Forgetting that only unique words should be ranked after counting, which leads to unnecessary repeated comparisons across duplicates.

Advanced Variants

- Return the top k frequent numbers instead of words, which removes the lexicographical tie-break but keeps counting plus ranking.
- Return all words grouped by frequency, which shifts the task toward bucket organization after the hash count.
- Stream words one by one and keep the current top k, which turns Top K Frequent Words into an incremental heap maintenance problem.
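As a rough illustration of the streaming variant, the sketch below keeps a running count and extracts the current top k on demand with `heapq.nsmallest`, which maintains a bounded heap internally. The class name `StreamingTopK` and its methods are hypothetical, and this simplification recomputes the ranking on each query rather than maintaining a persistent heap across updates.

```python
import heapq
from collections import Counter
from typing import List


class StreamingTopK:
    """Tracks word counts as they arrive and reports the current top k."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.counts: Counter = Counter()

    def add(self, word: str) -> None:
        # O(1) update of the frequency table.
        self.counts[word] += 1

    def top(self) -> List[str]:
        # The smallest keys under (-count, word) are the most frequent words,
        # with ties broken in ascending alphabetical order.
        return heapq.nsmallest(
            self.k, self.counts, key=lambda w: (-self.counts[w], w)
        )
```

Calling `add` for each incoming word and `top` whenever a ranking is needed gives the same ordering rule as the batch solution.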

Further Practice

#451 Sort Characters By Frequency, #347 Top K Frequent Elements, #720 Longest Word in Dictionary