What is the main pattern used in Short Encoding of Words?

The core pattern is array scanning plus hash lookup to detect and remove redundant suffixes efficiently.

Do I need to reverse the words for the solution?

Yes, reversing words makes suffixes become prefixes, simplifying detection using a set.

How do I handle the '#' characters in computing the length?

Add one per word for the '#' delimiter when summing lengths of the filtered set of words.

Can this solution handle large arrays of words?

Yes, using array scanning with hash lookup reduces redundant checks, keeping time complexity linear in total characters.

Is it necessary to store the actual indices of words in s?

Not for computing length, but indices are part of a valid encoding; length calculation only needs filtered words and '#' counts.

#820

Medium

auto_awesome数组·哈希·扫描

LeetCode 题解工作台

单词的压缩编码

单词数组 words 的有效编码由任意助记字符串 s 和下标数组 indices 组成，且满足： words.length == indices.length 助记字符串 s 以 '#' 字符结尾对于每个下标 indices[i] ， s 的一个从 indices[i] 开始、到下一个 '#'…

数组哈希表字符串字典树

题目描述

单词数组 words 的 有效编码 由任意助记字符串 s 和下标数组 indices 组成，且满足：

words.length == indices.length
助记字符串 s 以 '#' 字符结尾
对于每个下标 indices[i] ，s 的一个从 indices[i] 开始、到下一个 '#' 字符结束（但不包括 '#'）的 子字符串 恰好与 words[i] 相等

给你一个单词数组 words ，返回成功对 words 进行编码的最小助记字符串 s 的长度。

示例 1：

输入：words = ["time", "me", "bell"]
输出：10
解释：一组有效编码为 s = "time#bell#" 和 indices = [0, 2, 5] 。
words[0] = "time" ，s 开始于 indices[0] = 0 到下一个 '#' 结束的子字符串，如加粗部分所示 "time#bell#"
words[1] = "me" ，s 开始于 indices[1] = 2 到下一个 '#' 结束的子字符串，如加粗部分所示 "time#bell#"
words[2] = "bell" ，s 开始于 indices[2] = 5 到下一个 '#' 结束的子字符串，如加粗部分所示 "time#bell#"

示例 2：

输入：words = ["t"]
输出：2
解释：一组有效编码为 s = "t#" 和 indices = [0] 。

提示：

1 <= words.length <= 2000
1 <= words[i].length <= 7
words[i] 仅由小写字母组成

lightbulb

解题思路

方法一：前缀树

题目大意：充分利用重叠的后缀，使有效编码尽可能短。

判断当前单词是否是其他单词的后缀，若是，就不用写入助记字符串中，否则需要写入并且加上一个 # 后缀。

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

class Trie:
    def __init__(self) -> None:
        self.children = [None] * 26


class Solution:
    def minimumLengthEncoding(self, words: List[str]) -> int:
        root = Trie()
        for w in words:
            cur = root
            for c in w[::-1]:
                idx = ord(c) - ord("a")
                if cur.children[idx] == None:
                    cur.children[idx] = Trie()
                cur = cur.children[idx]
        return self.dfs(root, 1)

    def dfs(self, cur: Trie, l: int) -> int:
        isLeaf, ans = True, 0
        for i in range(26):
            if cur.children[i] != None:
                isLeaf = False
                ans += self.dfs(cur.children[i], l + 1)
        if isLeaf:
            ans += l
        return ans

speed

复杂度分析

指标	值
时间	complexity is O(sum of all word lengths) because each character is processed once during reversal and set operations. Space complexity is also O(sum of all word lengths) for storing reversed words in the set.
空间	O(\sum w_i)

psychology

面试官常问的追问

外企场景

question_mark
Notice if the candidate correctly identifies redundant suffixes among words.
question_mark
Check if they efficiently remove duplicates using hashing rather than nested loops.
question_mark
Watch whether they compute the length correctly by accounting for '#' delimiters.

warning

常见陷阱

外企场景

error
Failing to remove words that are suffixes of others, leading to longer encoding.
error
Ignoring the '#' characters when summing the total length.
error
Using nested loops to compare all word pairs, causing timeouts on large inputs.

swap_horiz

进阶变体

外企场景

arrow_right_alt
Compute minimal encoding when words can include uppercase letters or digits.
arrow_right_alt
Return the actual reference string s instead of just its length.
arrow_right_alt
Extend to support dynamic insertion of words and updating the minimal encoding efficiently.

help

常见问题

外企场景

继续练习

#792 匹配子序列的单词数 #745 前缀和后缀搜索 #720 词典中最长的单词