What defines similarity in the Similar String Groups problem?

Two strings are similar if they are identical or can be made identical by swapping the letters at two different positions in one of them.

Can GhostInterview handle large string arrays efficiently?

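Yes. It combines union-find with hash-based candidate reduction to avoid redundant pairwise checks. The input is also capped at 300 strings of at most 300 characters each, so even comparing every pair (roughly 45,000 pairs) stays cheap.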

Why is union-find preferred over plain DFS?

Union-find efficiently merges connected groups without revisiting strings multiple times, reducing overhead in large datasets.

How do I implement the two-letter swap similarity check?

Compare the strings character by character, count differences, and return true only if there are zero or exactly two mismatched positions that can be swapped.
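The check described above can be sketched as follows. This is a minimal illustration, not the page's reference code: the function name `is_similar` is hypothetical, and it does not assume the inputs are anagrams, so it verifies that the two mismatched positions really hold swapped letters.

```python
def is_similar(s: str, t: str) -> bool:
    """Return True if s equals t, or one swap of two positions in s yields t."""
    if len(s) != len(t):
        return False
    # Collect mismatched indices, giving up as soon as a third one appears.
    diff = []
    for i, (a, b) in enumerate(zip(s, t)):
        if a != b:
            diff.append(i)
            if len(diff) > 2:
                return False
    if not diff:
        return True  # identical strings count as similar
    if len(diff) != 2:
        return False  # a single mismatch cannot be fixed by a swap
    i, j = diff
    # The two mismatched positions must hold each other's letters.
    return s[i] == t[j] and s[j] == t[i]
```

When the inputs are guaranteed to be anagrams of each other, the final letter comparison is redundant, and counting at most two mismatches is enough.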

Are there optimizations using string hashing?

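Partly. Because every string in the input is an anagram of every other, a canonical sorted form is the same for all of them and cannot separate candidates. A hash set of the strings can still help when strings are short: generate each string's one-swap neighbors and look them up, instead of comparing every pair in detail.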
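One way the hash-set idea could look is sketched below. It is an assumption about how candidate reduction might be done, not the page's method, and the name `num_similar_groups_by_neighbors` is hypothetical. For each string it builds every one-swap neighbor and looks it up in a dictionary, which costs about n × m² lookups of length-m strings. That beats all-pairs comparison only when the strings are short relative to the number of strings.

```python
from itertools import combinations
from typing import List


def num_similar_groups_by_neighbors(strs: List[str]) -> int:
    # Map each distinct string to an index; equal strings share a group anyway.
    index = {}
    for s in strs:
        index.setdefault(s, len(index))

    parent = list(range(len(index)))

    def find(x: int) -> int:
        # Iterative find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    groups = len(index)
    for s, i in index.items():
        chars = list(s)
        for a, b in combinations(range(len(chars)), 2):
            if chars[a] == chars[b]:
                continue  # swapping equal letters reproduces s itself
            chars[a], chars[b] = chars[b], chars[a]
            j = index.get("".join(chars))
            chars[a], chars[b] = chars[b], chars[a]  # undo the swap
            if j is not None:
                ra, rb = find(i), find(j)
                if ra != rb:
                    parent[ra] = rb
                    groups -= 1
    return groups
```

With strings up to 300 characters long, as the constraints allow, the plain pairwise comparison is usually the better choice.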

#839

Hard

Array · Hash · Scan

LeetCode Solution Workbench

Similar String Groups

Array · Hash Table · String · Depth-First Search · Breadth-First Search

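Problem Description

Two strings X and Y are similar if swapping the letters at two different positions of X makes it equal to Y. Two strings that are already equal are also similar.

For example, "tars" and "rats" are similar (swap positions 0 and 2), and "rats" and "arts" are similar, but "star" is not similar to "tars", "rats", or "arts".

Together, these form two connected groups by similarity: {"tars", "rats", "arts"} and {"star"}. Note that "tars" and "arts" are in the same group even though they are not similar to each other. Formally, a word belongs to a group if it is similar to at least one other word in that group.

You are given a list of strings strs, in which every string is an anagram of every other string. How many similar string groups are there in strs?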

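Example 1:

Input: strs = ["tars","rats","arts","star"]
Output: 2

Example 2:

Input: strs = ["omv","ovm"]
Output: 1

Constraints:

1 <= strs.length <= 300
1 <= strs[i].length <= 300
strs[i] consists of lowercase letters only.
All words in strs have the same length and are anagrams of each other.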

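Solution Approach

Approach 1: Union-Find

We can enumerate every pair of strings $s$ and $t$ in the list. Because $s$ and $t$ are anagrams, they are similar whenever they differ in at most $2$ positions: they cannot differ in exactly one position, and two mismatched positions must hold the same two letters in opposite order. In that case we merge $s$ and $t$ with union-find, and each successful merge reduces the number of similar string groups by $1$.

The final number of similar string groups is the number of connected components in the union-find structure.

The time complexity is $O(n^2 \times (m + \alpha(n)))$ and the space complexity is $O(n)$, where $n$ and $m$ are the length of the string list and the length of each string, and $\alpha(n)$ is the inverse Ackermann function, which can be treated as a very small constant.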

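from typing import List


class UnionFind:
    def __init__(self, n):
        # Each element starts as its own root with component size 1.
        self.p = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Path compression: point x directly at its root.
        if self.p[x] != x:
            self.p[x] = self.find(self.p[x])
        return self.p[x]

    def union(self, a, b):
        # Merge the components of a and b; return False if already merged.
        pa, pb = self.find(a), self.find(b)
        if pa == pb:
            return False
        # Union by size: attach the smaller tree under the larger one.
        if self.size[pa] > self.size[pb]:
            self.p[pb] = pa
            self.size[pa] += self.size[pb]
        else:
            self.p[pa] = pb
            self.size[pb] += self.size[pa]
        return True


class Solution:
    def numSimilarGroups(self, strs: List[str]) -> int:
        n, m = len(strs), len(strs[0])
        uf = UnionFind(n)
        for i, s in enumerate(strs):
            for j, t in enumerate(strs[:i]):
                # The inputs are anagrams, so at most 2 mismatches means similar.
                # Each successful merge removes one group from the count.
                if sum(s[k] != t[k] for k in range(m)) <= 2 and uf.union(i, j):
                    n -= 1
        return n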

Complexity Analysis

Metric	Value
Time	$O(n^2 \times (m + \alpha(n)))$ for the union-find approach and $O(n^2 \times m)$ for DFS, with $n$ strings of length $m$. Union-find avoids repeated merging work, but both still verify similarity for every pair in the worst case.
Space	$O(n)$ for parent pointers or visited flags; storing hash signatures or a hash set of the strings raises this to $O(n \times m)$.
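For comparison with the union-find solution, a DFS version over the same pairwise check might look like the sketch below. The function name `num_similar_groups_dfs` and the helper `similar` are hypothetical, and the helper relies on the problem's guarantee that all strings are anagrams of each other.

```python
from typing import List


def num_similar_groups_dfs(strs: List[str]) -> int:
    n = len(strs)

    def similar(s: str, t: str) -> bool:
        # Assumes s and t are anagrams, so at most 2 mismatches means similar.
        diff = 0
        for a, b in zip(s, t):
            if a != b:
                diff += 1
                if diff > 2:
                    return False
        return True

    visited = [False] * n
    groups = 0
    for start in range(n):
        if visited[start]:
            continue
        # Every unvisited string starts a new group; flood-fill its component.
        groups += 1
        visited[start] = True
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if not visited[j] and similar(strs[i], strs[j]):
                    visited[j] = True
                    stack.append(j)
    return groups
```

Each string is popped from the stack once and compared against all others, which gives the $O(n^2 \times m)$ time and $O(n)$ extra space listed in the table.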

Common Interviewer Follow-ups

International company setting

Asks for an efficient method to count string similarity groups.
Checks whether you can correctly identify similarity as at most one swap of two letters.
Wants an optimized union-find or DFS implementation to handle up to 300 strings.

Common Pitfalls

Failing to correctly implement the two-letter swap similarity check.
Performing unnecessary full pairwise comparisons without candidate reduction.
Merging groups incorrectly, leading to undercounting or overcounting groups.

Advanced Variants

Allowing similarity defined by a one-letter swap only.
Strings may have different lengths, requiring substring comparisons.
Counting the largest group size instead of the total number of groups.
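The last variant needs only a small change to the union-find idea: merge as before, then count how many strings share each root. A minimal sketch, assuming the same anagram guarantee as the original problem (the name `largest_similar_group` is hypothetical):

```python
from collections import Counter
from typing import List


def largest_similar_group(strs: List[str]) -> int:
    n = len(strs)
    parent = list(range(n))

    def find(x: int) -> int:
        # Iterative find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i):
            # Assumes anagram inputs: at most 2 mismatches means similar.
            if sum(a != b for a, b in zip(strs[i], strs[j])) <= 2:
                parent[find(i)] = find(j)

    # Count the members under each root and return the largest count.
    return max(Counter(find(i) for i in range(n)).values())
```

For Example 1 the groups are {"tars", "rats", "arts"} and {"star"}, so the largest group has three members.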

Continue Practicing

#721 Accounts Merge · #1202 Smallest String With Swaps · #924 Minimize Malware Spread