What is the main pattern in Count Pairs Of Similar Strings?

The main pattern is array scanning plus hash lookup. Convert each word into a signature of its distinct letters, then count how many earlier words already have that same signature.

Why is a bitmask a good fit here?

The words use only lowercase English letters, so each letter can map to one of 26 bits. That makes it easy to represent the full character set of a word in one integer and compare similar words quickly.
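As a rough illustration of that mapping, the sketch below builds the mask for a single word. The helper name `letter_mask` is an assumption made for this example, and it presumes the input contains only lowercase English letters.

```python
def letter_mask(word: str) -> int:
    """Return an integer whose bit i is set when the i-th letter (a = 0) occurs in word."""
    mask = 0
    for ch in word:
        # Setting a bit that is already set changes nothing, so repeats collapse.
        mask |= 1 << (ord(ch) - ord("a"))
    return mask


# "abca" and "cba" both contain exactly a, b and c, so their masks are equal.
print(bin(letter_mask("abca")))                    # 0b111
print(letter_mask("abca") == letter_mask("cba"))   # True
```

Comparing two words then reduces to a single integer comparison, and the integer is hashable, so it can serve directly as a dictionary key.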

Do repeated letters matter in this problem?

No. Similarity depends only on whether a letter appears at least once. That is why aba, aabb, and ab all share the same signature.

Could I solve it by sorting each word?

You could sort and deduplicate the characters of each word to build a canonical key, and that gives the same grouping. It builds and sorts an intermediate string for every word, though, which is extra work compared with a direct presence-based mask. The bitmask approach also matches the problem definition more directly.
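For comparison, here is a minimal sketch of the sort-and-deduplicate key next to the mask. Both function names are hypothetical and exist only for this illustration.

```python
def sorted_key(word: str) -> str:
    """Canonical key: the word's distinct letters joined in sorted order."""
    return "".join(sorted(set(word)))


def letter_mask(word: str) -> int:
    """Presence mask: bit i is set when the i-th lowercase letter occurs."""
    mask = 0
    for ch in word:
        mask |= 1 << (ord(ch) - ord("a"))
    return mask


# Both keys group the same words together; only the key type differs.
print(sorted_key("aabb"), sorted_key("ba"))    # ab ab
print(letter_mask("aabb"), letter_mask("ba"))  # 3 3
```

Either key works as a hash-map key. The mask avoids the temporary set, list and string that the sorted key creates per word.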

Why not compare every pair of words directly?

Direct comparison gives the correct answer, and with at most 100 words it is fast enough, but it rebuilds the same character sets for many pairs. Grouping by signature is cleaner because each word is processed once, and every match is counted through the hash map.
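To make the repeated work concrete, a pairwise version might look like the sketch below. The function name `similar_pairs_brute_force` is an assumption for this example; it is shown only as the baseline that the signature approach improves on.

```python
from typing import List


def similar_pairs_brute_force(words: List[str]) -> int:
    """Count similar pairs by comparing every pair of words directly."""
    count = 0
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            # A fresh set is built for both words on every comparison.
            if set(words[i]) == set(words[j]):
                count += 1
    return count


print(similar_pairs_brute_force(["aba", "aabb", "abcd", "bac", "aabc"]))  # 2
```

Each word's set is rebuilt once for every pair it takes part in, which is the work that a precomputed signature removes.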

#2506

Easy

Array · Hash · Scan

Count Pairs Of Similar Strings

Topics: Array, Hash Table, String, Bit Manipulation, Counting

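Problem description

You are given a 0-indexed string array words.

Two strings are similar if they consist of the same characters.

For example, "abca" and "cba" are similar since both consist of the characters 'a', 'b', and 'c'.
However, "abacba" and "bcfd" are not similar since they do not consist of the same characters.

Return the number of pairs (i, j) such that 0 <= i < j <= words.length - 1 and the two strings words[i] and words[j] are similar.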

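Example 1:

Input: words = ["aba","aabb","abcd","bac","aabc"]
Output: 2
Explanation: There are 2 pairs that satisfy the conditions:
- i = 0 and j = 1: both words[0] and words[1] consist only of the characters 'a' and 'b'.
- i = 3 and j = 4: both words[3] and words[4] consist only of the characters 'a', 'b', and 'c'.

Example 2:

Input: words = ["aabb","ab","ba"]
Output: 3
Explanation: There are 3 pairs that satisfy the conditions:
- i = 0 and j = 1: both words[0] and words[1] consist only of the characters 'a' and 'b'.
- i = 0 and j = 2: both words[0] and words[2] consist only of the characters 'a' and 'b'.
- i = 1 and j = 2: both words[1] and words[2] consist only of the characters 'a' and 'b'.

Example 3:

Input: words = ["nba","cba","dba"]
Output: 0
Explanation: No pair satisfies the conditions, so return 0.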

Constraints:

1 <= words.length <= 100
1 <= words[i].length <= 100
words[i] consists of only lowercase English letters

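Approach

Method 1: Hash table + bit manipulation

Each string can be converted into a $26$-bit binary number in which bit $i$ is $1$ when the string contains the $i$-th letter.

Two strings that contain the same letters produce the same binary number. So for each string, we look up in a hash table how many times its binary number has already appeared, add that count to the answer, and then increase the stored count by $1$.

The time complexity is $O(L)$ and the space complexity is $O(n)$, where $L$ is the total length of all strings and $n$ is the number of strings.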

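from collections import Counter
from typing import List


class Solution:
    def similarPairs(self, words: List[str]) -> int:
        ans = 0
        cnt = Counter()  # signature mask -> number of earlier words with that mask
        for s in words:
            x = 0
            for c in map(ord, s):
                # Set the bit for this letter; repeated letters leave x unchanged.
                x |= 1 << (c - ord("a"))
            # Every earlier word with the same mask forms one pair with s.
            ans += cnt[x]
            cnt[x] += 1
        return ans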
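
The running count above can also be expressed as a two-step computation: tally how many words share each mask, then add up the pairs inside every group. The sketch below is an alternative formulation, not the method given in the solution, and the name `similar_pairs_by_groups` is an assumption for this example.

```python
from collections import Counter
from typing import List


def similar_pairs_by_groups(words: List[str]) -> int:
    """Count similar pairs by grouping words on their letter mask first."""
    groups = Counter()
    for word in words:
        mask = 0
        for ch in word:
            mask |= 1 << (ord(ch) - ord("a"))
        groups[mask] += 1
    # A group of k words contributes k * (k - 1) / 2 unordered pairs.
    return sum(k * (k - 1) // 2 for k in groups.values())


print(similar_pairs_by_groups(["aabb", "ab", "ba"]))  # 3
```

Both versions do the same amount of work. The running count needs only one pass, while the grouped form makes the "pairs within a group" reasoning explicit.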

Complexity analysis

Metric	Value
Time	$O(L)$, where $L$ is the total length of all words
Space	$O(n)$, where $n$ is the number of words

Follow-ups interviewers often ask

- They ask how to check whether two strings are similar without sorting entire words.
- They hint that character frequency is irrelevant, so duplicates inside one word should collapse away.
- They want you to replace repeated pairwise set construction with a reusable signature plus hash counting.

Common pitfalls

- Using raw words as map keys instead of a character-set signature, which misses pairs like ab and ba.
- Counting letter frequency instead of letter presence, which incorrectly separates aba from aabb.
- Running nested comparisons over all pairs and rebuilding sets every time, which adds avoidable work for this grouping problem.
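
The first two pitfalls are easy to reproduce. In the sketch below, `frequency_key` and `presence_key` are hypothetical helpers written for this illustration: one keeps letter counts and one keeps only which letters occur.

```python
from collections import Counter


def frequency_key(word: str) -> frozenset:
    """Pitfall: the key keeps letter counts, so "aba" and "aabb" differ."""
    return frozenset(Counter(word).items())


def presence_key(word: str) -> frozenset:
    """Intended key: only the set of letters that appear at least once."""
    return frozenset(word)


print(frequency_key("aba") == frequency_key("aabb"))  # False
print(presence_key("aba") == presence_key("aabb"))    # True
# Raw words as keys: "ab" and "ba" are different strings but the same letter set.
print("ab" == "ba", presence_key("ab") == presence_key("ba"))  # False True
```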

Advanced variants

- Return the grouped words instead of only the pair count by storing index lists for each signature.
- Change the alphabet size, which keeps the same hash-grouping idea but may replace the 26-bit mask representation.
- Count pairs where one word's character set is a subset of another, which changes equality lookup into subset matching.
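
Two of these variants can be sketched with the same mask. All names below are assumptions for this example. The subset version also assumes that "subset" is non-strict, so equal sets count, and it uses a plain quadratic scan over masks, not an optimized subset lookup.

```python
from collections import defaultdict
from typing import Dict, List


def letter_mask(word: str) -> int:
    """Presence mask over lowercase letters (bit 0 = 'a')."""
    mask = 0
    for ch in word:
        mask |= 1 << (ord(ch) - ord("a"))
    return mask


def group_indices_by_signature(words: List[str]) -> Dict[int, List[int]]:
    """Map each mask to the list of indices whose words share it."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for i, word in enumerate(words):
        groups[letter_mask(word)].append(i)
    return dict(groups)


def count_subset_pairs(words: List[str]) -> int:
    """Count pairs i < j where one word's letter set is contained in the other's."""
    masks = [letter_mask(w) for w in words]
    count = 0
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            common = masks[i] & masks[j]
            # If the shared letters equal one of the masks, that mask is a subset.
            if common == masks[i] or common == masks[j]:
                count += 1
    return count


print(group_indices_by_signature(["aba", "aabb", "abcd", "bac", "aabc"]))
# {3: [0, 1], 15: [2], 7: [3, 4]}
print(count_subset_pairs(["ab", "abc", "d"]))  # 1
```

For a larger alphabet in Python, the same functions would still work if the bit index came from a lookup table, since Python integers are not limited to a fixed width. A `frozenset` key is another option.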

Keep practicing

#1684 Count the Number of Consistent Strings, #2564 Substring XOR Queries, #2351 First Letter to Appear Twice