LeetCode Problem Workspace
Most Common Word
Find the most frequent non-banned word in a paragraph, excluding punctuation, using an efficient array scan and hash table approach.
4
Topics
6
Code langs
3
Related
Practice Focus
Easy · Array scanning plus hash lookup
Answer-first summary
Find the most frequent non-banned word in a paragraph, excluding punctuation, using an efficient array scan and hash table approach.
Ace coding interviews with Interview AiBoxInterview AiBox guidance for Array scanning plus hash lookup
This problem requires scanning a paragraph for the most frequent non-banned word. Words should be counted without considering case or punctuation. Using hash tables allows efficient tracking and exclusion of banned words, ultimately identifying the correct answer.
Problem Statement
Given a string paragraph and a string array banned, return the most frequent word from the paragraph that is not banned. Words in the paragraph are case-insensitive and punctuation marks should be ignored.
You can assume there is at least one word that is not banned, and the answer is unique. The result should be returned in lowercase, and no punctuation marks should be considered when determining the most frequent non-banned word.
Examples
Example 1
Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
Output: "ball"
"hit" occurs 3 times, but it is a banned word. "ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. Note that words in the paragraph are not case sensitive, that punctuation is ignored (even if adjacent to words, such as "ball,"), and that "hit" isn't the answer even though it occurs more because it is banned.
Example 2
Input: paragraph = "a.", banned = []
Output: "a"
Example details omitted.
Constraints
- 1 <= paragraph.length <= 1000
- paragraph consists of English letters, space ' ', or one of the symbols: "!?',;.".
- 0 <= banned.length <= 100
- 1 <= banned[i].length <= 10
- banned[i] consists of only lowercase English letters.
Solution Approach
Array Scanning and Hash Table Lookup
The key to solving this problem efficiently is scanning through the paragraph array, normalizing each word by removing punctuation and converting it to lowercase. This allows us to maintain a hash table to count the frequency of words, skipping any banned words in the process.
Handling Punctuation and Case Sensitivity
It is crucial to normalize the input words. This means converting them to lowercase and removing punctuation marks. The re.sub() method can be helpful in this case to strip unwanted characters while processing the paragraph string.
Efficient Hash Table Updates
The hash table stores word counts, and when iterating through the words in the paragraph, check if they are in the banned list. If not, increment their count in the table. Finally, return the word with the highest count.
Complexity Analysis
| Metric | Value |
|---|---|
| Time | \mathcal{O}(N + M) |
| Space | \mathcal{O}(N + M) |
The time complexity of this solution is O(N + M), where N is the length of the paragraph and M is the number of banned words. The space complexity is O(N + M) because we store each word's frequency in the hash table and the banned list.
What Interviewers Usually Probe
- Evaluating the candidate's ability to manage edge cases like punctuation and case sensitivity.
- Assessing how well the candidate uses hash tables for efficient word frequency counting.
- Looking for the candidate's approach to ensuring that banned words are properly excluded from consideration.
Common Pitfalls or Variants
Common pitfalls
- Not properly removing punctuation from words, which could lead to incorrect frequency counts.
- Ignoring case sensitivity or failing to convert all words to lowercase, leading to mismatches in frequency counting.
- Not correctly handling the scenario where all words are banned or there are no banned words.
Follow-up variants
- Implement the solution in a case-sensitive manner, which requires additional logic to distinguish words based on case.
- Modify the problem to allow multiple words with the same frequency; return any one of them.
- Extend the problem to handle a larger range of punctuation marks and symbols.
FAQ
How can I efficiently handle punctuation in this problem?
You can use a regular expression to remove punctuation from each word before processing it, ensuring only valid characters are counted.
What is the time complexity of the solution?
The time complexity is O(N + M), where N is the length of the paragraph and M is the number of banned words.
How should case sensitivity be handled?
Convert all words to lowercase before processing to ensure case-insensitivity, which is a requirement of the problem.
What happens if all words in the paragraph are banned?
The problem guarantees that there is at least one non-banned word, so this case won't occur.
Can I use other data structures like arrays instead of hash tables?
While arrays can be used for simple counts, hash tables provide more efficient lookups and are better suited for this problem.
Solution
Solution 1
#### Python3
class Solution:
def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
s = set(banned)
p = Counter(re.findall('[a-z]+', paragraph.lower()))
return next(word for word, _ in p.most_common() if word not in s)Continue Topic
array
Practice more edge cases under the same topic.
arrow_forwardauto_awesomeContinue Pattern
Array scanning plus hash lookup
Expand the same solving frame across more problems.
arrow_forwardsignal_cellular_altSame Difficulty Track
Easy
Stay on this level to stabilize interview delivery.
arrow_forward