#819

Easy

auto_awesomeArray scanning plus hash lookup

LeetCode Problem Workspace

Most Common Word

Find the most frequent non-banned word in a paragraph, excluding punctuation, using an efficient array scan and hash table approach.

array hash table string counting

Problem Statement

Given a string paragraph and a string array banned, return the most frequent word from the paragraph that is not banned. Words in the paragraph are case-insensitive and punctuation marks should be ignored.

You can assume there is at least one word that is not banned, and the answer is unique. The result should be returned in lowercase, and no punctuation marks should be considered when determining the most frequent non-banned word.

Examples

Example 1

Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]

Output: "ball"

"hit" occurs 3 times, but it is a banned word. "ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. Note that words in the paragraph are not case sensitive, that punctuation is ignored (even if adjacent to words, such as "ball,"), and that "hit" isn't the answer even though it occurs more because it is banned.

Example 2

Input: paragraph = "a.", banned = []

Output: "a"

Example details omitted.

Constraints

1 <= paragraph.length <= 1000
paragraph consists of English letters, space ' ', or one of the symbols: "!?',;.".
0 <= banned.length <= 100
1 <= banned[i].length <= 10
banned[i] consists of only lowercase English letters.

Solution Approach

Array Scanning and Hash Table Lookup

The key to solving this problem efficiently is scanning through the paragraph array, normalizing each word by removing punctuation and converting it to lowercase. This allows us to maintain a hash table to count the frequency of words, skipping any banned words in the process.

Handling Punctuation and Case Sensitivity

It is crucial to normalize the input words. This means converting them to lowercase and removing punctuation marks. The re.sub() method can be helpful in this case to strip unwanted characters while processing the paragraph string.

Efficient Hash Table Updates

The hash table stores word counts, and when iterating through the words in the paragraph, check if they are in the banned list. If not, increment their count in the table. Finally, return the word with the highest count.

Complexity Analysis

Metric	Value
Time	\mathcal{O}(N + M)
Space	\mathcal{O}(N + M)

The time complexity of this solution is O(N + M), where N is the length of the paragraph and M is the number of banned words. The space complexity is O(N + M) because we store each word's frequency in the hash table and the banned list.

What Interviewers Usually Probe

Evaluating the candidate's ability to manage edge cases like punctuation and case sensitivity.
Assessing how well the candidate uses hash tables for efficient word frequency counting.
Looking for the candidate's approach to ensuring that banned words are properly excluded from consideration.

Common Pitfalls or Variants

Common pitfalls

Not properly removing punctuation from words, which could lead to incorrect frequency counts.
Ignoring case sensitivity or failing to convert all words to lowercase, leading to mismatches in frequency counting.
Not correctly handling the scenario where all words are banned or there are no banned words.

Follow-up variants

Implement the solution in a case-sensitive manner, which requires additional logic to distinguish words based on case.
Modify the problem to allow multiple words with the same frequency; return any one of them.
Extend the problem to handle a larger range of punctuation marks and symbols.

FAQ

How can I efficiently handle punctuation in this problem?

You can use a regular expression to remove punctuation from each word before processing it, ensuring only valid characters are counted.

What is the time complexity of the solution?

The time complexity is O(N + M), where N is the length of the paragraph and M is the number of banned words.

How should case sensitivity be handled?

Convert all words to lowercase before processing to ensure case-insensitivity, which is a requirement of the problem.

What happens if all words in the paragraph are banned?

The problem guarantees that there is at least one non-banned word, so this case won't occur.

Can I use other data structures like arrays instead of hash tables?

While arrays can be used for simple counts, hash tables provide more efficient lookups and are better suited for this problem.

terminal

Solution

Solution 1

#### Python3

1

2

3

4

5

class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        s = set(banned)
        p = Counter(re.findall('[a-z]+', paragraph.lower()))
        return next(word for word, _ in p.most_common() if word not in s)

Continue Practicing

#811 subdomain visit count #692 top k frequent words #1160 find words that can be formed by characters