LeetCode 题解工作台
单词的压缩编码
单词数组 words 的 有效编码 由任意助记字符串 s 和下标数组 indices 组成,且满足: words.length == indices.length 助记字符串 s 以 '#' 字符结尾 对于每个下标 indices[i] , s 的一个从 indices[i] 开始、到下一个 '#'…
4
题型
4
代码语言
3
相关题
当前训练重点
中等 · 数组·哈希·扫描
答案摘要
题目大意:充分利用重叠的后缀,使有效编码尽可能短。 判断当前单词是否是其他单词的后缀,若是,就不用写入助记字符串中,否则需要写入并且加上一个 # 后缀。
Interview AiBoxInterview AiBox 实时 AI 助手,陪你讲清 数组·哈希·扫描 题型思路
题目描述
单词数组 words 的 有效编码 由任意助记字符串 s 和下标数组 indices 组成,且满足:
words.length == indices.length- 助记字符串
s以'#'字符结尾 - 对于每个下标
indices[i],s的一个从indices[i]开始、到下一个'#'字符结束(但不包括'#')的 子字符串 恰好与words[i]相等
给你一个单词数组 words ,返回成功对 words 进行编码的最小助记字符串 s 的长度 。
示例 1:
输入:words = ["time", "me", "bell"]
输出:10
解释:一组有效编码为 s = "time#bell#" 和 indices = [0, 2, 5] 。
words[0] = "time" ,s 开始于 indices[0] = 0 到下一个 '#' 结束的子字符串,如加粗部分所示 "time#bell#"
words[1] = "me" ,s 开始于 indices[1] = 2 到下一个 '#' 结束的子字符串,如加粗部分所示 "time#bell#"
words[2] = "bell" ,s 开始于 indices[2] = 5 到下一个 '#' 结束的子字符串,如加粗部分所示 "time#bell#"
示例 2:
输入:words = ["t"] 输出:2 解释:一组有效编码为 s = "t#" 和 indices = [0] 。
提示:
1 <= words.length <= 20001 <= words[i].length <= 7words[i]仅由小写字母组成
解题思路
方法一:前缀树
题目大意:充分利用重叠的后缀,使有效编码尽可能短。
判断当前单词是否是其他单词的后缀,若是,就不用写入助记字符串中,否则需要写入并且加上一个 # 后缀。
class Trie:
def __init__(self) -> None:
self.children = [None] * 26
class Solution:
def minimumLengthEncoding(self, words: List[str]) -> int:
root = Trie()
for w in words:
cur = root
for c in w[::-1]:
idx = ord(c) - ord("a")
if cur.children[idx] == None:
cur.children[idx] = Trie()
cur = cur.children[idx]
return self.dfs(root, 1)
def dfs(self, cur: Trie, l: int) -> int:
isLeaf, ans = True, 0
for i in range(26):
if cur.children[i] != None:
isLeaf = False
ans += self.dfs(cur.children[i], l + 1)
if isLeaf:
ans += l
return ans
复杂度分析
| 指标 | 值 |
|---|---|
| 时间 | complexity is O(sum of all word lengths) because each character is processed once during reversal and set operations. Space complexity is also O(sum of all word lengths) for storing reversed words in the set. |
| 空间 | O(\sum w_i) |
面试官常问的追问
外企场景- question_mark
Notice if the candidate correctly identifies redundant suffixes among words.
- question_mark
Check if they efficiently remove duplicates using hashing rather than nested loops.
- question_mark
Watch whether they compute the length correctly by accounting for '#' delimiters.
常见陷阱
外企场景- error
Failing to remove words that are suffixes of others, leading to longer encoding.
- error
Ignoring the '#' characters when summing the total length.
- error
Using nested loops to compare all word pairs, causing timeouts on large inputs.
进阶变体
外企场景- arrow_right_alt
Compute minimal encoding when words can include uppercase letters or digits.
- arrow_right_alt
Return the actual reference string s instead of just its length.
- arrow_right_alt
Extend to support dynamic insertion of words and updating the minimal encoding efficiently.