LeetCode 题解工作台
最常见的单词
给你一个字符串 paragraph 和一个表示禁用词的字符串数组 banned ,返回出现频率最高的非禁用词。题目数据 保证 至少存在一个非禁用词,且答案 唯一 。 paragraph 中的单词 不区分大小写 ,答案应以 小写 形式返回。 注意 单词不包含标点符号。 示例 1: 输入: paragr…
4
题型
6
代码语言
3
相关题
当前训练重点
简单 · 数组·哈希·扫描
答案摘要
正则匹配(或双指针)找出所有单词,用哈希表统计每个单词出现的频率,找到出现未在 banned 中出现且频率最大的单词。 class Solution:
Interview AiBoxInterview AiBox 实时 AI 助手,陪你讲清 数组·哈希·扫描 题型思路
题目描述
给你一个字符串 paragraph 和一个表示禁用词的字符串数组 banned ,返回出现频率最高的非禁用词。题目数据 保证 至少存在一个非禁用词,且答案 唯一 。
paragraph 中的单词 不区分大小写 ,答案应以 小写 形式返回。
注意 单词不包含标点符号。
示例 1:
输入:paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"] 输出:"ball" 解释: "hit" 出现了 3 次,但它是禁用词。 "ball" 出现了两次(没有其他单词出现这么多次),因此它是段落中出现频率最高的非禁用词。 请注意,段落中的单词不区分大小写, 标点符号会被忽略(即使它们紧挨着单词,如 "ball,"), 并且尽管 "hit" 出现的次数更多,但它不能作为答案,因为它是禁用词。
示例 2:
输入:paragraph = "a.", banned = [] 输出:"a"
提示:
1 <= paragraph.length <= 1000paragraph由英文字母、空格' '、和以下符号组成:"!?',;."0 <= banned.length <= 1001 <= banned[i].length <= 10banned[i]仅由小写英文字母组成
解题思路
方法一:正则匹配/双指针 + 哈希表
正则匹配(或双指针)找出所有单词,用哈希表统计每个单词出现的频率,找到出现未在 banned 中出现且频率最大的单词。
class Solution:
def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
s = set(banned)
p = Counter(re.findall('[a-z]+', paragraph.lower()))
return next(word for word, _ in p.most_common() if word not in s)
复杂度分析
| 指标 | 值 |
|---|---|
| 时间 | \mathcal{O}(N + M) |
| 空间 | \mathcal{O}(N + M) |
面试官常问的追问
外企场景- question_mark
Evaluating the candidate's ability to manage edge cases like punctuation and case sensitivity.
- question_mark
Assessing how well the candidate uses hash tables for efficient word frequency counting.
- question_mark
Looking for the candidate's approach to ensuring that banned words are properly excluded from consideration.
常见陷阱
外企场景- error
Not properly removing punctuation from words, which could lead to incorrect frequency counts.
- error
Ignoring case sensitivity or failing to convert all words to lowercase, leading to mismatches in frequency counting.
- error
Not correctly handling the scenario where all words are banned or there are no banned words.
进阶变体
外企场景- arrow_right_alt
Implement the solution in a case-sensitive manner, which requires additional logic to distinguish words based on case.
- arrow_right_alt
Modify the problem to allow multiple words with the same frequency; return any one of them.
- arrow_right_alt
Extend the problem to handle a larger range of punctuation marks and symbols.