LeetCode Problem Workspace
Delete Duplicate Folders in System
Practice Focus
Hard · Array scanning plus hash lookup
Answer-first summary
Solve Delete Duplicate Folders in System by building a trie, serializing child subtrees, and deleting repeated non-empty signatures.
Ace coding interviews with Interview AiBox guidance for Array scanning plus hash lookup
The key to Delete Duplicate Folders in System is to compare folders by their full child subtree structure, not by path position. A trie models the file system cleanly, then a postorder serialization turns each folder's children into a canonical signature for hash lookup. Any non-empty signature seen more than once marks every matching folder for one-time deletion, and a final DFS collects the remaining paths.
Problem Statement
You are given absolute folder paths from a buggy file system, and some folders should be deleted because they represent the same subtree. Two folders count as duplicates when their child folder sets are the same and those children recursively have the same structure. The match is structural, so duplicate folders can appear under different parents and at different depths.
The deletion happens only once after all duplicate folders are identified from the original tree. That detail changes the result in cases where removing one duplicate pair would make a new pair look identical later, because those newly matching folders must stay. The practical task is to rebuild the folder tree from the path list, detect repeated non-empty subtree shapes, remove every marked folder and its descendants, and return the surviving paths in any order.
Examples
Example 1
Input: paths = [["a"],["c"],["d"],["a","b"],["c","b"],["d","a"]]
Output: [["d"],["d","a"]]
Folders "/a" and "/c" (and their subfolders) are marked for deletion because they both contain an empty folder named "b".
Example 2
Input: paths = [["a"],["c"],["a","b"],["c","b"],["a","b","x"],["a","b","x","y"],["w"],["w","y"]]
Output: [["c"],["c","b"],["a"],["a","b"]]
Folders "/a/b/x" and "/w" (and their subfolders) are marked for deletion because they both contain an empty folder named "y". Note that folders "/a" and "/c" are identical after the deletion, but they are not deleted because they were not marked beforehand.
Example 3
Input: paths = [["a","b"],["c","d"],["c"],["a"]]
Output: [["c"],["c","d"],["a"],["a","b"]]
All folders are unique in the file system. Note that the returned array can be in a different order as the order does not matter.
Constraints
- 1 <= paths.length <= 2 * 10^4
- 1 <= paths[i].length <= 500
- 1 <= paths[i][j].length <= 10
- 1 <= sum(paths[i][j].length) <= 2 * 10^5
- paths[i][j] consists of lowercase English letters.
- No two paths lead to the same folder.
- For any folder not at the root level, its parent folder will also be in the input.
Solution Approach
Build the folder trie from the path list
Insert each path into a trie where every node is a folder and each edge is a folder name. This step converts flat arrays into the exact parent-child structure the duplicate rule depends on. Plain array scanning is not enough by itself because duplicates are defined by recursive subtree shape, not by adjacent paths or shared prefixes alone.
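A minimal sketch of this step, assuming a hypothetical `Node` class with a plain dict of children (the full solution below uses `defaultdict` instead):

```python
class Node:
    def __init__(self):
        self.children = {}  # folder name -> child Node

def build_trie(paths):
    root = Node()
    for path in paths:
        cur = root
        for name in path:
            # Create the child on first sight, then descend into it.
            cur = cur.children.setdefault(name, Node())
    return root
```

Because every non-root folder's parent is guaranteed to appear in the input, insertion order does not matter.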
Serialize each subtree and count repeated signatures
Run a postorder DFS. For each folder, compute a canonical string from its children, for example by sorting each child's name paired with that child's serialization and concatenating the results. Leaf folders get an empty child signature, but only folders with a non-empty child set are eligible for duplicate removal. Store each serialization in a hash table and count how many times it appears. This is the critical lookup step: two folders are duplicates only when their serialized child structures match exactly.
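A sketch of this pass, reusing the hypothetical `Node` trie from the previous step; sorting before joining is what makes the signature canonical:

```python
from collections import Counter

def count_signatures(root):
    freq = Counter()  # signature -> how many folders produced it

    def serialize(node):
        if not node.children:
            return ""  # leaf: empty signature, never counted
        parts = sorted(
            f"{name}({serialize(child)})"
            for name, child in node.children.items()
        )
        sig = "".join(parts)
        freq[sig] += 1   # only folders with children reach this line
        node.sig = sig   # cache for the marking pass (attribute added here)
        return sig

    serialize(root)
    return freq
```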
Mark duplicates once and collect remaining paths
After counting signatures, run another DFS and mark any folder whose non-empty serialization appears more than once. Do not recompute deletions after removing nodes, because the problem explicitly says the system deletes only once. Finally traverse from the root, skip marked nodes, and record every remaining path. This separation between identify first and delete later prevents the common mistake of cascading extra deletions.
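Continuing the sketch, the collection pass only reads frequencies computed from the original tree, so no cascading deletions can occur:

```python
def collect_surviving(root, freq):
    ans = []

    def walk(node, path):
        # A folder is skipped, along with its whole subtree, exactly when
        # its non-empty signature occurred more than once.
        if node.children and freq[node.sig] > 1:
            return
        if path:  # the root itself is not a folder
            ans.append(path[:])
        for name, child in node.children.items():
            path.append(name)
            walk(child, path)
            path.pop()

    walk(root, [])
    return ans
```

Chaining the sketches, `collect_surviving(root, count_signatures(root))` yields the answer for `root = build_trie(paths)`.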
Complexity Analysis
| Metric | Value |
|---|---|
| Time | O(S) for the DFS and hash lookups, where S is the total serialized size |
| Space | O(N + S), where N is the number of folders |
Let N be the number of folders and let S be the total serialized content size across all trie nodes. Building the trie takes O(total path segments). The subtree DFS and hash lookup are typically O(S), although canonical string construction can add overhead if child signatures are concatenated naively. Space is O(N + S) for the trie, stored serializations, frequency map, and output. In practice, the main trade-off is between simple string serialization and a more compact integer hashing scheme that reduces repeated string cost.
What Interviewers Usually Probe
- They want you to notice that duplicate detection is based on subtree structure, so rebuilding the hierarchy is more important than sorting raw paths.
- If they ask about a trie, they are steering you away from pairwise folder comparison, which is too slow and misses recursive structure.
- If they stress that deletion runs once, they are testing whether you separate detection from removal instead of deleting during the first DFS.
Common Pitfalls or Variants
Common pitfalls
- Treating empty folders as duplicate signatures to delete automatically. In this problem, duplicate folders must have a non-empty set of subfolders.
- Serializing children without a stable canonical order, which can make identical subtrees produce different signatures depending on insertion order.
- Deleting folders as soon as a duplicate is found, which incorrectly removes folders that only become identical after the initial deletion pass.
Follow-up variants
- Return the number of deleted folders instead of the remaining paths, which still uses the same trie plus subtree signature pass (a sketch follows this list).
- Treat files as labeled leaves in addition to folders, so the serialization must include both file names and child folder signatures.
- Support repeated cleanup rounds until no duplicates remain, which changes the one-pass rule and requires iterative rebuild or repeated marking.
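For the first variant, a sketch that counts deleted folders instead of listing survivors, reusing `freq` and the cached `sig` attributes from the earlier sketches:

```python
def count_deleted(root, freq):
    def subtree_size(node):
        # A marked folder takes its entire subtree with it.
        return 1 + sum(subtree_size(c) for c in node.children.values())

    def walk(node):
        if node.children and freq[node.sig] > 1:
            return subtree_size(node)
        return sum(walk(c) for c in node.children.values())

    return walk(root)
```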
FAQ
What is the main pattern for Delete Duplicate Folders in System?
The reliable pattern is trie construction plus postorder subtree serialization with hash lookup. The trie captures parent-child structure, and the hash table detects repeated non-empty child signatures without expensive pairwise subtree comparison.
Why is a trie better than comparing the input paths directly?
The input is a flat list of absolute paths, but duplicate folders are defined by recursive descendants. A trie rebuilds the hierarchy so each folder can be evaluated from its children upward, which is exactly what the duplicate rule needs.
Why are folders that become identical after deletion not removed?
Because the system marks duplicates from the original tree and performs deletion only once. That means you must finish the full signature-counting pass before removing anything, then skip marked nodes in the final collection pass.
Can I use hashing instead of full string serialization?
Yes. Many solutions serialize each subtree into a canonical string first because it is easy to reason about, then optimize with integer IDs or rolling hashes if repeated string construction becomes costly. The important part is that equal subtrees always map to the same canonical representation.
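A sketch of that optimization: intern each canonical child tuple as a small integer ID so deep trees do not rebuild long strings at every level (all names here are illustrative):

```python
def intern_signatures(root):
    ids = {}   # canonical child tuple -> integer ID
    freq = {}  # integer ID -> occurrence count (non-leaf folders only)

    def dfs(node):
        if not node.children:
            return 0  # ID 0 is reserved for the empty subtree
        # Each child contributes a (name, child_id) pair; sorting keeps
        # the tuple canonical regardless of insertion order.
        key = tuple(sorted((name, dfs(child))
                           for name, child in node.children.items()))
        if key not in ids:
            ids[key] = len(ids) + 1
        sid = ids[key]
        freq[sid] = freq.get(sid, 0) + 1
        return sid

    dfs(root)
    return freq
```

Because exact tuples key the map, this avoids the collision risk of rolling hashes while keeping each node's signature proportional to its child count.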
What usually breaks solutions on this problem?
Three mistakes show up often: deleting empty folders as duplicates, building signatures in unstable child order, and deleting during detection instead of after all duplicate signatures are known. Each one produces wrong answers on the official edge cases.
Solution
Solution 1: Trie + DFS
We can use a trie to store the folder structure, where each node in the trie contains the following data:
```python
from collections import defaultdict
from typing import Dict, List


class Trie:
    def __init__(self):
        # Folder name -> child node; defaultdict creates missing children
        # on first access, so no explicit existence check is needed.
        self.children: Dict[str, "Trie"] = defaultdict(Trie)
        self.deleted: bool = False


class Solution:
    def deleteDuplicateFolder(self, paths: List[List[str]]) -> List[List[str]]:
        root = Trie()
        for path in paths:
            cur = root
            for name in path:
                cur = cur.children[name]  # auto-created by defaultdict

        # Signature -> first node that produced it.
        g: Dict[str, Trie] = {}

        def dfs(node: Trie) -> str:
            # Postorder serialization; leaves return the empty signature
            # and are never candidates for deletion.
            if not node.children:
                return ""
            subs: List[str] = []
            for name, child in node.children.items():
                subs.append(f"{name}({dfs(child)})")
            # Sorting makes the signature canonical regardless of
            # insertion order.
            s = "".join(sorted(subs))
            if s in g:
                # Mark both this node and the earlier identical one.
                node.deleted = g[s].deleted = True
            else:
                g[s] = node
            return s

        ans: List[List[str]] = []
        path: List[str] = []

        def dfs2(node: Trie) -> None:
            # Skip marked folders, which implicitly drops their subtrees.
            if node.deleted:
                return
            if path:  # the root itself carries no folder name
                ans.append(path[:])
            for name, child in node.children.items():
                path.append(name)
                dfs2(child)
                path.pop()

        dfs(root)
        dfs2(root)
        return ans
```