LeetCode Problem Workspace

Short Encoding of Words

Find the minimum length of a reference string encoding an array of words using array scanning and hash lookup efficiently.

category

4

Topics

code_blocks

4

Code langs

hub

3

Related

Practice Focus

Medium · Array scanning plus hash lookup

bolt

Answer-first summary

Find the minimum length of a reference string encoding an array of words using array scanning and hash lookup efficiently.

Interview AiBox logo

Ace coding interviews with Interview AiBox guidance for Array scanning plus hash lookup

Try AiBox Copilotarrow_forward

This problem requires constructing a minimal reference string that encodes all words so each can be retrieved as a substring. Using array scanning with hash lookup ensures duplicates and suffixes are efficiently removed, avoiding redundant storage. The approach leverages string reversal and set operations to detect overlapping word endings, producing the shortest total length.

Problem Statement

Given an array of lowercase words, compute the minimal length of a reference string s that encodes all words. Each word must be represented as a substring ending with '#' in s, ensuring no redundant storage of suffixes.

Return the length of the shortest valid reference string s. For example, words = ["time", "me", "bell"] can be encoded as s = "time#bell#" with indices [0,2,5], yielding length 10.

Examples

Example 1

Input: words = ["time", "me", "bell"]

Output: 10

A valid encoding would be s = "time#bell#" and indices = [0, 2, 5]. words[0] = "time", the substring of s starting from indices[0] = 0 to the next '#' is underlined in "time#bell#" words[1] = "me", the substring of s starting from indices[1] = 2 to the next '#' is underlined in "time#bell#" words[2] = "bell", the substring of s starting from indices[2] = 5 to the next '#' is underlined in "time#bell#"

Example 2

Input: words = ["t"]

Output: 2

A valid encoding would be s = "t#" and indices = [0].

Constraints

  • 1 <= words.length <= 2000
  • 1 <= words[i].length <= 7
  • words[i] consists of only lowercase letters.

Solution Approach

Reverse words and use a set for suffix elimination

Reverse each word so that suffixes become prefixes. Insert all reversed words into a set, then remove any word that is a prefix of another. The remaining words correspond to unique paths in the final encoding.

Sum lengths with '#' for final encoding

After filtering, compute the total length by adding each remaining word's length plus one for the '#' character. This sum represents the minimal reference string length.

Array scanning plus hash lookup ensures efficiency

Scan through the reversed words array while using a hash set to quickly check if a word is a suffix of another. This prevents duplicate encoding and reduces the complexity to O(total characters).

Complexity Analysis

Metric Value
Time O(\sum w_i)
Space O(\sum w_i)

Time complexity is O(sum of all word lengths) because each character is processed once during reversal and set operations. Space complexity is also O(sum of all word lengths) for storing reversed words in the set.

What Interviewers Usually Probe

  • Notice if the candidate correctly identifies redundant suffixes among words.
  • Check if they efficiently remove duplicates using hashing rather than nested loops.
  • Watch whether they compute the length correctly by accounting for '#' delimiters.

Common Pitfalls or Variants

Common pitfalls

  • Failing to remove words that are suffixes of others, leading to longer encoding.
  • Ignoring the '#' characters when summing the total length.
  • Using nested loops to compare all word pairs, causing timeouts on large inputs.

Follow-up variants

  • Compute minimal encoding when words can include uppercase letters or digits.
  • Return the actual reference string s instead of just its length.
  • Extend to support dynamic insertion of words and updating the minimal encoding efficiently.

FAQ

What is the main pattern used in Short Encoding of Words?

The core pattern is array scanning plus hash lookup to detect and remove redundant suffixes efficiently.

Do I need to reverse the words for the solution?

Yes, reversing words makes suffixes become prefixes, simplifying detection using a set.

How do I handle the '#' characters in computing the length?

Add one per word for the '#' delimiter when summing lengths of the filtered set of words.

Can this solution handle large arrays of words?

Yes, using array scanning with hash lookup reduces redundant checks, keeping time complexity linear in total characters.

Is it necessary to store the actual indices of words in s?

Not for computing length, but indices are part of a valid encoding; length calculation only needs filtered words and '#' counts.

terminal

Solution

Solution 1

#### Python3

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
class Trie:
    def __init__(self) -> None:
        self.children = [None] * 26


class Solution:
    def minimumLengthEncoding(self, words: List[str]) -> int:
        root = Trie()
        for w in words:
            cur = root
            for c in w[::-1]:
                idx = ord(c) - ord("a")
                if cur.children[idx] == None:
                    cur.children[idx] = Trie()
                cur = cur.children[idx]
        return self.dfs(root, 1)

    def dfs(self, cur: Trie, l: int) -> int:
        isLeaf, ans = True, 0
        for i in range(26):
            if cur.children[i] != None:
                isLeaf = False
                ans += self.dfs(cur.children[i], l + 1)
        if isLeaf:
            ans += l
        return ans

Solution 2

#### Python3

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
class Trie:
    def __init__(self) -> None:
        self.children = [None] * 26


class Solution:
    def minimumLengthEncoding(self, words: List[str]) -> int:
        root = Trie()
        for w in words:
            cur = root
            for c in w[::-1]:
                idx = ord(c) - ord("a")
                if cur.children[idx] == None:
                    cur.children[idx] = Trie()
                cur = cur.children[idx]
        return self.dfs(root, 1)

    def dfs(self, cur: Trie, l: int) -> int:
        isLeaf, ans = True, 0
        for i in range(26):
            if cur.children[i] != None:
                isLeaf = False
                ans += self.dfs(cur.children[i], l + 1)
        if isLeaf:
            ans += l
        return ans
Short Encoding of Words Solution: Array scanning plus hash lookup | LeetCode #820 Medium