#839

Hard

auto_awesomeArray scanning plus hash lookup

LeetCode Problem Workspace

Similar String Groups

Determine the number of connected groups of similar strings by swapping at most two letters using array scanning and hash lookup techniques.

array hash table string depth first search breadth first search

Problem Statement

Given a list of strings where each string is an anagram of the others, two strings are considered similar if they are identical or can become identical by swapping exactly two letters in one string. Your task is to determine how many groups of strings are connected by similarity, where each string belongs to a group if it is similar to at least one other string in the same group.

For example, consider strs = ["tars","rats","arts","star"]. "tars" is similar to "rats" and "rats" is similar to "arts", forming one connected group {"tars","rats","arts"}. "star" forms a separate group as it is not similar to any of the other strings. Return the total number of such groups.

Examples

Example 1

Input: strs = ["tars","rats","arts","star"]

Output: 2

Example details omitted.

Example 2

Input: strs = ["omv","ovm"]

Output: 1

Example details omitted.

Constraints

1 <= strs.length <= 300
1 <= strs[i].length <= 300
strs[i] consists of lowercase letters only.
All words in strs have the same length and are anagrams of each other.

Solution Approach

Brute Force DFS with Similarity Check

Iterate through each string and perform DFS to mark all strings reachable through similarity. Use a helper function to check if two strings differ by at most two swaps. This approach ensures every group is discovered but can be O(n^2 * m) in the worst case, where n is the number of strings and m is string length.

Union-Find Optimization

Initialize each string as its own parent in a union-find structure. For every pair of strings, union them if they are similar using the two-swap check. After processing all pairs, count the unique roots to determine the number of groups. This reduces repeated DFS visits and leverages efficient merging.

Hash-Based Similarity Candidate Reduction

Instead of comparing all pairs, use a hash map to store string signatures or sorted forms to quickly locate potential similar strings. Only perform the detailed two-swap check on likely candidates, reducing unnecessary comparisons and improving runtime on large arrays.

Complexity Analysis

Metric	Value
Time	Depends on the final approach
Space	Depends on the final approach

Time complexity depends on approach: naive DFS is O(n^2 * m) for n strings of length m. Union-Find reduces repeated checks, but still requires pairwise similarity verification in worst case. Space complexity is O(n * m) to store parent pointers or visited flags and possible hash signatures.

What Interviewers Usually Probe

Asks for an efficient method to count string similarity groups.
Checks if you can identify similarity using at most two swaps correctly.
Wants an optimized union-find or DFS implementation to handle up to 300 strings.

Common Pitfalls or Variants

Common pitfalls

Failing to correctly implement the two-letter swap similarity check.
Performing unnecessary full pairwise comparisons without candidate reduction.
Merging groups incorrectly, leading to undercounting or overcounting groups.

Follow-up variants

Allowing similarity defined by one-letter swap only.
Strings may have different lengths requiring substring comparisons.
Counting largest group size instead of total number of groups.

FAQ

What defines similarity in the Similar String Groups problem?

Two strings are similar if they are identical or can become identical by swapping exactly two letters.

Can GhostInterview handle large string arrays efficiently?

Yes, by leveraging union-find and hash candidate reduction, it avoids redundant pairwise checks for arrays up to 300 strings.

Why is union-find preferred over plain DFS?

Union-find efficiently merges connected groups without revisiting strings multiple times, reducing overhead in large datasets.

How do I implement the two-letter swap similarity check?

Compare the strings character by character, count differences, and return true only if there are zero or exactly two mismatched positions that can be swapped.

Are there optimizations using string hashing?

Yes, hashing or canonical sorted forms can quickly locate candidate strings likely to be similar, minimizing unnecessary detailed comparisons.

terminal

Solution

Solution 1: Union-Find

We can enumerate any two strings $s$ and $t$ in the list of strings. Since $s$ and $t$ are anagrams, if the number of differing characters at corresponding positions between $s$ and $t$ does not exceed $2$, then $s$ and $t$ are similar. We can use the union-find data structure to merge $s$ and $t$. If the merge is successful, the number of similar string groups decreases by $1$.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

class UnionFind:
    def __init__(self, n):
        self.p = list(range(n))
        self.size = [1] * n

    def find(self, x):
        if self.p[x] != x:
            self.p[x] = self.find(self.p[x])
        return self.p[x]

    def union(self, a, b):
        pa, pb = self.find(a), self.find(b)
        if pa == pb:
            return False
        if self.size[pa] > self.size[pb]:
            self.p[pb] = pa
            self.size[pa] += self.size[pb]
        else:
            self.p[pa] = pb
            self.size[pb] += self.size[pa]
        return True


class Solution:
    def numSimilarGroups(self, strs: List[str]) -> int:
        n, m = len(strs), len(strs[0])
        uf = UnionFind(n)
        for i, s in enumerate(strs):
            for j, t in enumerate(strs[:i]):
                if sum(s[k] != t[k] for k in range(m)) <= 2 and uf.union(i, j):
                    n -= 1
        return n

Continue Practicing

#721 accounts merge #1202 smallest string with swaps #924 minimize malware spread