LeetCode Solution Workbench
Drop Duplicate Rows
Current training focus
Easy · Drop Duplicate Rows core interview pattern
Problem Description
DataFrame customers

| Column Name | Type |
|---|---|
| customer_id | int |
| name | object |
| email | object |

Some rows in the DataFrame are duplicates based on the email column.
Write a solution that removes these duplicate rows and keeps only the first occurrence of each email.
The result format is shown in the example below.
Example 1:
Input:

| customer_id | name | email |
|---|---|---|
| 1 | Ella | emily@example.com |
| 2 | David | michael@example.com |
| 3 | Zachary | sarah@example.com |
| 4 | Alice | john@example.com |
| 5 | Finn | john@example.com |
| 6 | Violet | alice@example.com |

Output:

| customer_id | name | email |
|---|---|---|
| 1 | Ella | emily@example.com |
| 2 | David | michael@example.com |
| 3 | Zachary | sarah@example.com |
| 4 | Alice | john@example.com |
| 6 | Violet | alice@example.com |

Explanation: Alice (customer_id = 4) and Finn (customer_id = 5) both use john@example.com, so only the first occurrence of that email address is kept.
Approach
Approach 1
import pandas as pd

def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
    return customers.drop_duplicates(subset=['email'])
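As a quick check, the function above can be run against the data from Example 1. This is a minimal sketch: the `customers` frame is rebuilt by hand from the example table, and `keep="first"` is written out explicitly even though it is the pandas default.

```python
import pandas as pd


def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
    # keep="first" is the default; it is spelled out to match the problem statement.
    return customers.drop_duplicates(subset=["email"], keep="first")


# Input rebuilt by hand from Example 1.
customers = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5, 6],
        "name": ["Ella", "David", "Zachary", "Alice", "Finn", "Violet"],
        "email": [
            "emily@example.com",
            "michael@example.com",
            "sarah@example.com",
            "john@example.com",
            "john@example.com",
            "alice@example.com",
        ],
    }
)

result = dropDuplicateEmails(customers)
print(result)
# customer_id values kept: [1, 2, 3, 4, 6]
```

`drop_duplicates` returns a new DataFrame by default, so `customers` still holds all six rows afterwards.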
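Complexity Analysis
| Metric | Value |
|---|---|
| Time | O(n) expected for n rows, since `drop_duplicates` finds repeated `email` values by hashing |
| Space | O(n) for the hash table and the returned copy of the DataFrame |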
Common Interviewer Follow-ups
- The candidate demonstrates proficiency with pandas functions.
- They understand how to manipulate DataFrames effectively for cleaning tasks.
- The candidate can optimize solutions based on the size of the dataset.
Common Pitfalls
- Omitting the `subset` argument makes `drop_duplicates()` compare entire rows, so rows that share an email but differ in `customer_id` or `name` are not removed.
- Sorting the DataFrame before dropping duplicates changes which row counts as the first occurrence, so the wrong rows may be kept.
- Iterating over rows manually instead of using vectorized pandas operations can be slow on large datasets.
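The pitfalls above can be reproduced on the two rows from the example that share an email. This is only an illustrative sketch: the variable names and the `drop_duplicates_by_loop` helper are hypothetical and exist purely to show the contrast with the vectorized call.

```python
import pandas as pd

# The two rows from Example 1 that share an email address.
customers = pd.DataFrame(
    {
        "customer_id": [4, 5],
        "name": ["Alice", "Finn"],
        "email": ["john@example.com", "john@example.com"],
    }
)

# Pitfall 1: without subset, whole rows are compared. These rows differ in
# customer_id and name, so neither is treated as a duplicate.
no_subset = customers.drop_duplicates()
with_subset = customers.drop_duplicates(subset=["email"])

# Pitfall 2: sorting first changes which row is the "first" occurrence.
# After a descending sort, Finn (customer_id = 5) comes first and is kept.
sorted_first = customers.sort_values("customer_id", ascending=False).drop_duplicates(
    subset=["email"]
)


# Pitfall 3: a hypothetical row-by-row version. It yields the same rows as
# the vectorized call but does the work in a Python-level loop.
def drop_duplicates_by_loop(df: pd.DataFrame) -> pd.DataFrame:
    seen = set()
    kept_rows = []
    for _, row in df.iterrows():
        if row["email"] not in seen:
            seen.add(row["email"])
            kept_rows.append(row)
    return pd.DataFrame(kept_rows)


looped = drop_duplicates_by_loop(customers)

print(len(no_subset), with_subset["name"].tolist(), sorted_first["name"].tolist())
```

If a sort is needed for other reasons, one option is to deduplicate first and sort afterwards, so that the original row order decides which occurrence is kept.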
Advanced Variants
- Consider cases where additional columns besides email need to be unique.
- Handle scenarios where the DataFrame has missing or null values in the email column.
- Optimize for cases where the dataset is very large, ensuring the solution scales efficiently.
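The variants above could be approached along the following lines. This is a sketch under stated assumptions rather than a reference solution: the sample data is made up, `drop_duplicate_emails_chunked` is a hypothetical helper, and the chunked version assumes at least one chunk and no null emails.

```python
import pandas as pd

# Made-up sample with a repeated (name, email) pair and two missing emails.
customers = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "name": ["Ella", "Ella", "Finn", "Gia", "Hal"],
        "email": ["a@example.com", "a@example.com", None, None, "b@example.com"],
    }
)

# Variant 1: uniqueness defined over a combination of columns.
by_name_and_email = customers.drop_duplicates(subset=["name", "email"])

# Variant 2: missing emails compare equal to each other, so the default call
# keeps only the first row with a missing email.
default_nulls = customers.drop_duplicates(subset=["email"])

# One way to keep every row whose email is missing: only drop a row when it
# is a duplicate AND its email is present.
is_duplicate = customers.duplicated(subset=["email"])
keep_nulls = customers[customers["email"].isna() | ~is_duplicate]


# Variant 3: a hypothetical chunked version for data that does not fit in
# memory at once. It takes any iterable of DataFrames (for example the
# chunks produced by pd.read_csv with chunksize set).
def drop_duplicate_emails_chunked(chunks) -> pd.DataFrame:
    seen = set()  # emails already emitted by earlier chunks
    kept = []
    for chunk in chunks:
        unique_in_chunk = chunk.drop_duplicates(subset=["email"])
        fresh = unique_in_chunk[~unique_in_chunk["email"].isin(seen)]
        seen.update(fresh["email"])
        kept.append(fresh)
    return pd.concat(kept)


chunk_a = pd.DataFrame({"customer_id": [1, 2], "email": ["a@example.com", "b@example.com"]})
chunk_b = pd.DataFrame({"customer_id": [3, 4], "email": ["b@example.com", "c@example.com"]})
chunked = drop_duplicate_emails_chunked([chunk_a, chunk_b])
```

The chunked version still keeps every distinct email in the `seen` set, so it reduces peak memory for the rows but not for the set of keys.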