#2506

Easy

auto_awesomeArray scanning plus hash lookup

LeetCode Problem Workspace

Count Pairs Of Similar Strings

Count similar string pairs by converting each word into a character-set signature and counting matching signatures efficiently.

array hash table string bit manipulation counting

Problem Statement

You get a 0-indexed array called words. Two words are considered similar when they contain exactly the same distinct letters, even if the counts and order are different. For example, aba and aabb are similar because both use only a and b, while abcd is not similar to bac because one has d and the other does not.

Your task is to count how many index pairs (i, j) satisfy i < j and produce similar words. A direct comparison of every pair works on small input, but this problem is designed to reward collapsing each word into a reusable signature of its character set before counting matches.

Examples

Example 1

Input: words = ["aba","aabb","abcd","bac","aabc"]

Output: 2

There are 2 pairs that satisfy the conditions:

i = 0 and j = 1 : both words[0] and words[1] only consist of characters 'a' and 'b'.
i = 3 and j = 4 : both words[3] and words[4] only consist of characters 'a', 'b', and 'c'.

Example 2

Input: words = ["aabb","ab","ba"]

Output: 3

There are 3 pairs that satisfy the conditions:

i = 0 and j = 1 : both words[0] and words[1] only consist of characters 'a' and 'b'.
i = 0 and j = 2 : both words[0] and words[2] only consist of characters 'a' and 'b'.
i = 1 and j = 2 : both words[1] and words[2] only consist of characters 'a' and 'b'.

Example 3

Input: words = ["nba","cba","dba"]

Output: 0

Since there does not exist any pair that satisfies the conditions, we return 0.

Constraints

1 <= words.length <= 100
1 <= words[i].length <= 100
words[i] consist of only lowercase English letters.

Solution Approach

Build a signature for each word

Scan each word and record which lowercase letters appear. The key detail is that frequency does not matter in Count Pairs Of Similar Strings, only presence. You can store the set as a 26-bit mask or as a normalized character-set string, but the mask is simpler and faster.

Count matches while scanning the array

For each word, compute its signature and look it up in a hash map. If that signature has already appeared k times, then the current word forms k new similar pairs with earlier words. After adding k to the answer, increment the stored count for that signature.

Why this beats pairwise comparison

The common failure mode is comparing every word against every other word and rebuilding sets again and again. That repeats the same work. By converting each word once and using hash lookup, you turn the problem into grouping equal signatures during one pass through the array.

Complexity Analysis

Metric	Value
Time	Depends on the final approach
Space	Depends on the final approach

Let n be the number of words and m be the maximum word length. Computing one signature takes O(m), so the full scan costs O(n * m). Hash lookups are O(1) on average, and the extra space is O(n) in the worst case when every word produces a different signature.

What Interviewers Usually Probe

They ask how to check whether two strings are similar without sorting entire words.
They hint that character frequency is irrelevant, so duplicates inside one word should collapse away.
They want you to replace repeated pairwise set construction with a reusable signature plus hash counting.

Common Pitfalls or Variants

Common pitfalls

Using raw words as map keys instead of a character-set signature, which misses pairs like ab and ba.
Counting letter frequency instead of letter presence, which incorrectly separates aba from aabb.
Running nested comparisons over all pairs and rebuilding sets every time, which adds avoidable work for this grouping problem.

Follow-up variants

Return the grouped words instead of only the pair count by storing index lists for each signature.
Change the alphabet size, which keeps the same hash-grouping idea but may replace the 26-bit mask representation.
Count pairs where one word's character set is a subset of another, which changes equality lookup into subset matching.

FAQ

What is the main pattern in Count Pairs Of Similar Strings?

The main pattern is array scanning plus hash lookup. Convert each word into a signature of its distinct letters, then count how many earlier words already have that same signature.

Why is a bitmask a good fit here?

The words use only lowercase English letters, so each letter can map to one of 26 bits. That makes it easy to represent the full character set of a word in one integer and compare similar words quickly.

Do repeated letters matter in this problem?

No. Similarity depends only on whether a letter appears at least once. That is why aba, aabb, and ab all share the same signature.

Could I solve it by sorting each word?

You could sort and deduplicate characters in each word to build a canonical key, but that does extra work compared with a direct presence-based mask. The bitmask approach matches the problem definition more directly.

Why not compare every pair of words directly?

Direct comparison works functionally, but it repeats character-set construction across many pairs. Grouping by signature is cleaner because each word is processed once, and every match is counted through the hash map.

terminal

Solution

Solution 1: Hash Table + Bit Manipulation

For each string, we can convert it into a binary number of length $26$, where the $i$-th bit being $1$ indicates that the string contains the $i$-th letter.

1

2

3

4

5

6

7

8

9

10

11

class Solution:
    def similarPairs(self, words: List[str]) -> int:
        ans = 0
        cnt = Counter()
        for s in words:
            x = 0
            for c in map(ord, s):
                x |= 1 << (c - ord("a"))
            ans += cnt[x]
            cnt[x] += 1
        return ans

Continue Practicing

#1684 count the number of consistent strings #2564 substring xor queries #2351 first letter to appear twice