#820

Medium

auto_awesomeArray scanning plus hash lookup

LeetCode Problem Workspace

Short Encoding of Words

Find the minimum length of a reference string encoding an array of words using array scanning and hash lookup efficiently.

array hash table string trie

Problem Statement

Given an array of lowercase words, compute the minimal length of a reference string s that encodes all words. Each word must be represented as a substring ending with '#' in s, ensuring no redundant storage of suffixes.

Return the length of the shortest valid reference string s. For example, words = ["time", "me", "bell"] can be encoded as s = "time#bell#" with indices [0,2,5], yielding length 10.

Examples

Example 1

Input: words = ["time", "me", "bell"]

Output: 10

A valid encoding would be s = "time#bell#" and indices = [0, 2, 5]. words[0] = "time", the substring of s starting from indices[0] = 0 to the next '#' is underlined in "time#bell#" words[1] = "me", the substring of s starting from indices[1] = 2 to the next '#' is underlined in "time#bell#" words[2] = "bell", the substring of s starting from indices[2] = 5 to the next '#' is underlined in "time#bell#"

Example 2

Input: words = ["t"]

Output: 2

A valid encoding would be s = "t#" and indices = [0].

Constraints

1 <= words.length <= 2000
1 <= words[i].length <= 7
words[i] consists of only lowercase letters.

Solution Approach

Reverse words and use a set for suffix elimination

Reverse each word so that suffixes become prefixes. Insert all reversed words into a set, then remove any word that is a prefix of another. The remaining words correspond to unique paths in the final encoding.

Sum lengths with '#' for final encoding

After filtering, compute the total length by adding each remaining word's length plus one for the '#' character. This sum represents the minimal reference string length.

Array scanning plus hash lookup ensures efficiency

Scan through the reversed words array while using a hash set to quickly check if a word is a suffix of another. This prevents duplicate encoding and reduces the complexity to O(total characters).

Complexity Analysis

Metric	Value
Time	O(\sum w_i)
Space	O(\sum w_i)

Time complexity is O(sum of all word lengths) because each character is processed once during reversal and set operations. Space complexity is also O(sum of all word lengths) for storing reversed words in the set.

What Interviewers Usually Probe

Notice if the candidate correctly identifies redundant suffixes among words.
Check if they efficiently remove duplicates using hashing rather than nested loops.
Watch whether they compute the length correctly by accounting for '#' delimiters.

Common Pitfalls or Variants

Common pitfalls

Failing to remove words that are suffixes of others, leading to longer encoding.
Ignoring the '#' characters when summing the total length.
Using nested loops to compare all word pairs, causing timeouts on large inputs.

Follow-up variants

Compute minimal encoding when words can include uppercase letters or digits.
Return the actual reference string s instead of just its length.
Extend to support dynamic insertion of words and updating the minimal encoding efficiently.

FAQ

What is the main pattern used in Short Encoding of Words?

The core pattern is array scanning plus hash lookup to detect and remove redundant suffixes efficiently.

Do I need to reverse the words for the solution?

Yes, reversing words makes suffixes become prefixes, simplifying detection using a set.

How do I handle the '#' characters in computing the length?

Add one per word for the '#' delimiter when summing lengths of the filtered set of words.

Can this solution handle large arrays of words?

Yes, using array scanning with hash lookup reduces redundant checks, keeping time complexity linear in total characters.

Is it necessary to store the actual indices of words in s?

Not for computing length, but indices are part of a valid encoding; length calculation only needs filtered words and '#' counts.

terminal

Solution

Solution 1

#### Python3

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

class Trie:
    def __init__(self) -> None:
        self.children = [None] * 26


class Solution:
    def minimumLengthEncoding(self, words: List[str]) -> int:
        root = Trie()
        for w in words:
            cur = root
            for c in w[::-1]:
                idx = ord(c) - ord("a")
                if cur.children[idx] == None:
                    cur.children[idx] = Trie()
                cur = cur.children[idx]
        return self.dfs(root, 1)

    def dfs(self, cur: Trie, l: int) -> int:
        isLeaf, ans = True, 0
        for i in range(26):
            if cur.children[i] != None:
                isLeaf = False
                ans += self.dfs(cur.children[i], l + 1)
        if isLeaf:
            ans += l
        return ans

Solution 2

#### Python3

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

class Trie:
    def __init__(self) -> None:
        self.children = [None] * 26


class Solution:
    def minimumLengthEncoding(self, words: List[str]) -> int:
        root = Trie()
        for w in words:
            cur = root
            for c in w[::-1]:
                idx = ord(c) - ord("a")
                if cur.children[idx] == None:
                    cur.children[idx] = Trie()
                cur = cur.children[idx]
        return self.dfs(root, 1)

    def dfs(self, cur: Trie, l: int) -> int:
        isLeaf, ans = True, 0
        for i in range(26):
            if cur.children[i] != None:
                isLeaf = False
                ans += self.dfs(cur.children[i], l + 1)
        if isLeaf:
            ans += l
        return ans

Continue Practicing

#792 number of matching subsequences #745 prefix and suffix search #720 longest word in dictionary