#929

Easy

auto_awesomeArray scanning plus hash lookup

LeetCode Problem Workspace

Unique Email Addresses

Identify the count of unique email addresses by normalizing local names and using hash-based lookups efficiently.

Problem Statement

Given a list of email addresses, compute the number of distinct addresses that actually receive emails. Each email consists of a local name and a domain name separated by '@'. In the local name, periods '.' are ignored and everything after the first plus '+' is discarded. Domain names remain unchanged.

Return the count of unique email addresses after applying these normalization rules. For example, 'test.email+alex@leetcode.com' and 'test.e.mail@leetcode.com' are considered the same. The input array can contain up to 100 emails, each up to 100 characters long, consisting of lowercase letters, '.', '+', and '@'.

Examples

Example 1

Input: emails = ["test.email+alex@leetcode.com","test.e.mail+bob.cathy@leetcode.com","testemail+david@lee.tcode.com"]

Output: 2

"testemail@leetcode.com" and "testemail@lee.tcode.com" actually receive mails.

Example 2

Input: emails = ["a@leetcode.com","b@leetcode.com","c@leetcode.com"]

Output: 3

Example details omitted.

Constraints

1 <= emails.length <= 100
1 <= emails[i].length <= 100
emails[i] consist of lowercase English letters, '+', '.' and '@'.
Each emails[i] contains exactly one '@' character.
All local and domain names are non-empty.
Local names do not start with a '+' character.
Domain names end with the ".com" suffix.
Domain names must contain at least one character before ".com" suffix.

Solution Approach

Normalize each email

Iterate through each email string. Split it into local and domain parts. Remove all '.' from the local part and truncate at the first '+'. Recombine with the domain to produce the normalized email.

Use a hash set for uniqueness

Insert each normalized email into a hash set. This ensures duplicates are automatically ignored. At the end, the size of the hash set represents the number of unique emails.

Optimize scanning and string operations

Avoid unnecessary string concatenations by using string builders or slicing carefully. Process each character only once to keep the algorithm efficient for up to 100 emails of 100 characters each.

Complexity Analysis

Metric	Value
Time	Depends on the final approach
Space	Depends on the final approach

Time complexity is O(N * L) where N is the number of emails and L is the average length, due to scanning and normalizing each email. Space complexity is O(N * L) to store normalized emails in a hash set.

What Interviewers Usually Probe

Look for O(N) array scanning with string manipulation.
Expect correct handling of '+' and '.' in local names.
Check if duplicates are correctly ignored using a hash set.

Common Pitfalls or Variants

Common pitfalls

Forgetting to ignore characters after '+' in the local name.
Removing dots from domain names instead of only the local part.
Recomputing strings inefficiently leading to unnecessary time or memory use.

Follow-up variants

Return the list of normalized emails instead of just counting.
Handle uppercase letters by converting all to lowercase before normalization.
Allow arbitrary domain suffixes, not just '.com', while applying the same local name rules.

FAQ

How does the plus '+' affect the Unique Email Addresses problem?

In the local name, any character after the first '+' is ignored. This ensures filtered emails count as duplicates if their normalized local names match.

Should dots '.' in domain names be removed?

No, dots are removed only in the local part. Domain names remain intact to distinguish unique addresses correctly.

What is the recommended data structure for counting unique emails?

A hash set is ideal, as it automatically ignores duplicates and provides O(1) insertion and lookup.

Can emails have multiple '@' characters?

No, each email must contain exactly one '@' separating local and domain names according to the problem constraints.

What pattern does this problem follow in GhostInterview?

This problem follows the array scanning plus hash lookup pattern, emphasizing string normalization and duplicate elimination.

terminal

Solution

Solution 1: Hash Table

We can use a hash table $s$ to store all unique email addresses. Then, we traverse the array $\textit{emails}$. For each email address, we split it into the local part and the domain part. We process the local part by removing all dots and ignoring characters after a plus sign. Finally, we concatenate the processed local part with the domain part and add it to the hash table $s$.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

class Solution:
    def numUniqueEmails(self, emails: List[str]) -> int:
        s = set()
        for email in emails:
            local, domain = email.split("@")
            t = []
            for c in local:
                if c == ".":
                    continue
                if c == "+":
                    break
                t.append(c)
            s.add("".join(t) + "@" + domain)
        return len(s)

Continue Practicing

#916 word subsets #953 verifying an alien dictionary #893 groups of special equivalent strings