What is the primary approach for solving the Number of Matching Subsequences problem?

The primary approach is to combine array scanning with hash lookup, allowing for efficient subsequence matching in string `s`.

How do you avoid brute-force solutions in this problem?

You avoid brute force by using a hash table or binary search, significantly reducing the time complexity when checking if a word is a subsequence.

Can this problem be solved with a trie?

Yes, a trie-based approach can be used to store the words and scan string `s`, offering faster subsequence matching.

What are the time and space complexities for this problem?

The time complexity is O(N + M) with hash lookup, where N is the length of `s` and M is the total length of all words. The space complexity is O(M).

What is the key pattern in solving this problem?

The key pattern is using efficient array scanning combined with hash lookup to determine subsequences in a time-efficient manner.

#792

Medium

auto_awesome数组·哈希·扫描

LeetCode 题解工作台

匹配子序列的单词数

给定字符串 s 和字符串数组 words , 返回 words[i] 中是 s 的子序列的单词个数。字符串的子序列是从原始字符串中生成的新字符串，可以从中删去一些字符(可以是none)，而不改变其余字符的相对顺序。例如， “ace” 是 “abcde” 的子序列。示例 1: 输入: s …

数组哈希表字符串二分查找动态规划

题目描述

给定字符串 s 和字符串数组 words, 返回 words[i] 中是s的子序列的单词个数 。

字符串的 子序列 是从原始字符串中生成的新字符串，可以从中删去一些字符(可以是none)，而不改变其余字符的相对顺序。

例如， “ace” 是 “abcde” 的子序列。

示例 1:

输入: s = "abcde", words = ["a","bb","acd","ace"]
输出: 3
解释: 有三个是 s 的子序列的单词: "a", "acd", "ace"。

Example 2:

输入: s = "dsahjpjauf", words = ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"]
输出: 2

提示:

1 <= s.length <= 5 * 10⁴
1 <= words.length <= 5000
1 <= words[i].length <= 50
words[i]和 s 都只由小写字母组成。

lightbulb

解题思路

方法一：分桶

题目中字符串 $s$ 的数据规模最高达到 $5 \times 10^4$ ，如果暴力枚举 $words$ 中的每个字符串 $w$ ，判断其是否为 $s$ 的子序列，很有可能会超时。

我们不妨将 $words$ 中的所有单词根据首字母来分桶，即：把所有单词按照首字母分到 $26$ 个桶中，每个桶中存储的是所有以该字母开头的所有单词。

比如对于 words = ["a", "bb", "acd", "ace"]，我们得到以下的分桶结果：

a: ["a", "acd", "ace"]
b: ["bb"]

然后我们从 $s$ 的第一个字符开始遍历，假设当前字符为 'a'，我们从 'a' 开头的桶中取出所有单词。对于取出的每个单词，如果此时单词长度为 $1$ ，说明该单词已经匹配完毕，我们将答案加 $1$ ；否则我们将单词的首字母去掉，然后放入下一个字母开头的桶中，比如对于单词 "acd"，去掉首字母 'a' 后，我们将其放入 'c' 开头的桶中。这一轮结束后，分桶结果变为：

c: ["cd", "ce"]
b: ["bb"]

遍历完 $s$ 后，我们就得到了答案。

实际上，每个桶可以只存储单词的下标 $i$ 以及该单词当前匹配到的位置 $j$ ，这样可以节省空间。

时间复杂度 $O(n + \sum_{i=0}^{m-1} |w_i|)$ ，空间复杂度 $O(m)$ 。其中 $n$ 和 $m$ 分别为 $s$ 和 $words$ 的长度，而 $|w_i|$ 为 $words[i]$ 的长度。

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

class Solution:
    def numMatchingSubseq(self, s: str, words: List[str]) -> int:
        d = defaultdict(deque)
        for w in words:
            d[w[0]].append(w)
        ans = 0
        for c in s:
            for _ in range(len(d[c])):
                t = d[c].popleft()
                if len(t) == 1:
                    ans += 1
                else:
                    d[t[1]].append(t[1:])
        return ans

speed

复杂度分析

指标	值
时间	Depends on the final approach
空间	Depends on the final approach

psychology

面试官常问的追问

外企场景

question_mark
The candidate demonstrates an understanding of subsequence matching and optimizes using hash maps or binary search.
question_mark
The candidate is able to apply different data structures, such as hash maps, binary search, or tries, to optimize subsequence search.
question_mark
The candidate is able to reduce time complexity efficiently and handle large input sizes within constraints.

warning

常见陷阱

外企场景

error
Brute force checking of each word as a subsequence could lead to time complexity that exceeds the problem's limits.
error
Not using an efficient data structure like a hash table or trie might result in slow performance for larger inputs.
error
Overlooking edge cases where words might not match the sequence order in `s` or contain extra characters.

swap_horiz

进阶变体

外企场景

arrow_right_alt
Use dynamic programming to store partial subsequences for more complex scenarios.
arrow_right_alt
Instead of checking each word individually, preprocess the input string `s` to speed up subsequence checks.
arrow_right_alt
Modify the approach to handle multiple strings where each has different character sets or sequences.

help

常见问题

外企场景

继续练习

#720 词典中最长的单词 #692 前K个高频单词 #1048 最长字符串链