LeetCode 题解工作台

最常见的单词

给你一个字符串 paragraph 和一个表示禁用词的字符串数组 banned ,返回出现频率最高的非禁用词。题目数据 保证 至少存在一个非禁用词,且答案 唯一 。 paragraph 中的单词 不区分大小写 ,答案应以 小写 形式返回。 注意 单词不包含标点符号。 示例 1: 输入: paragr…

category

4

题型

code_blocks

6

代码语言

hub

3

相关题

当前训练重点

简单 · 数组·哈希·扫描

bolt

答案摘要

正则匹配(或双指针)找出所有单词,用哈希表统计每个单词出现的频率,找到出现未在 banned 中出现且频率最大的单词。 class Solution:

Interview AiBox logo

Interview AiBox 实时 AI 助手,陪你讲清 数组·哈希·扫描 题型思路

试试 AiBox 面试助手arrow_forward
description

题目描述

给你一个字符串 paragraph 和一个表示禁用词的字符串数组 banned ,返回出现频率最高的非禁用词。题目数据 保证 至少存在一个非禁用词,且答案 唯一

paragraph 中的单词 不区分大小写 ,答案应以 小写 形式返回。

注意 单词不包含标点符号。

 

示例 1:

输入:paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
输出:"ball"
解释:
"hit" 出现了 3 次,但它是禁用词。
"ball" 出现了两次(没有其他单词出现这么多次),因此它是段落中出现频率最高的非禁用词。
请注意,段落中的单词不区分大小写,
标点符号会被忽略(即使它们紧挨着单词,如 "ball,"),
并且尽管 "hit" 出现的次数更多,但它不能作为答案,因为它是禁用词。

示例 2:

输入:paragraph = "a.", banned = []
输出:"a"

 

提示:

  • 1 <= paragraph.length <= 1000
  • paragraph 由英文字母、空格 ' '、和以下符号组成:"!?',;."
  • 0 <= banned.length <= 100
  • 1 <= banned[i].length <= 10
  • banned[i] 仅由小写英文字母组成
lightbulb

解题思路

方法一:正则匹配/双指针 + 哈希表

正则匹配(或双指针)找出所有单词,用哈希表统计每个单词出现的频率,找到出现未在 banned 中出现且频率最大的单词。

1
2
3
4
5
6
class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        s = set(banned)
        p = Counter(re.findall('[a-z]+', paragraph.lower()))
        return next(word for word, _ in p.most_common() if word not in s)
speed

复杂度分析

指标
时间\mathcal{O}(N + M)
空间\mathcal{O}(N + M)
psychology

面试官常问的追问

外企场景
  • question_mark

    Evaluating the candidate's ability to manage edge cases like punctuation and case sensitivity.

  • question_mark

    Assessing how well the candidate uses hash tables for efficient word frequency counting.

  • question_mark

    Looking for the candidate's approach to ensuring that banned words are properly excluded from consideration.

warning

常见陷阱

外企场景
  • error

    Not properly removing punctuation from words, which could lead to incorrect frequency counts.

  • error

    Ignoring case sensitivity or failing to convert all words to lowercase, leading to mismatches in frequency counting.

  • error

    Not correctly handling the scenario where all words are banned or there are no banned words.

swap_horiz

进阶变体

外企场景
  • arrow_right_alt

    Implement the solution in a case-sensitive manner, which requires additional logic to distinguish words based on case.

  • arrow_right_alt

    Modify the problem to allow multiple words with the same frequency; return any one of them.

  • arrow_right_alt

    Extend the problem to handle a larger range of punctuation marks and symbols.

help

常见问题

外企场景

最常见的单词题解:数组·哈希·扫描 | LeetCode #819 简单