LeetCode Solution Workbench

Drop Duplicate Rows

Current practice focus

Easy · Drop Duplicate Rows core interview pattern

Problem description

DataFrame customers
+-------------+--------+
| Column Name | Type   |
+-------------+--------+
| customer_id | int    |
| name        | object |
| email       | object |
+-------------+--------+

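The DataFrame contains some duplicate rows based on the email column.

Write a solution that removes these duplicate rows and keeps only the first occurrence.

The result format is shown in the following example.

Example 1:

Input: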
+-------------+---------+---------------------+
| customer_id | name    | email               |
+-------------+---------+---------------------+
| 1           | Ella    | emily@example.com   |
| 2           | David   | michael@example.com |
| 3           | Zachary | sarah@example.com   |
| 4           | Alice   | john@example.com    |
| 5           | Finn    | john@example.com    |
| 6           | Violet  | alice@example.com   |
+-------------+---------+---------------------+
Output:
+-------------+---------+---------------------+
| customer_id | name    | email               |
+-------------+---------+---------------------+
| 1           | Ella    | emily@example.com   |
| 2           | David   | michael@example.com |
| 3           | Zachary | sarah@example.com   |
| 4           | Alice   | john@example.com    |
| 6           | Violet  | alice@example.com   |
+-------------+---------+---------------------+
Explanation:
Alice (customer_id = 4) and Finn (customer_id = 5) both use john@example.com, so only the first occurrence of that email address is kept.
Approach

Method 1

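import pandas as pd


def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
    # Rows are duplicates when their email matches; keep='first' is the
    # pandas default and is spelled out here to mirror the problem statement.
    return customers.drop_duplicates(subset=['email'], keep='first')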
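To check the solution against Example 1, the input table can be rebuilt by hand and passed to the function. This is a minimal sketch; the variable names `customers` and `result` are illustrative only.

```python
import pandas as pd


def dropDuplicateEmails(customers: pd.DataFrame) -> pd.DataFrame:
    return customers.drop_duplicates(subset=['email'])


# Input table from Example 1
customers = pd.DataFrame({
    'customer_id': [1, 2, 3, 4, 5, 6],
    'name': ['Ella', 'David', 'Zachary', 'Alice', 'Finn', 'Violet'],
    'email': ['emily@example.com', 'michael@example.com', 'sarah@example.com',
              'john@example.com', 'john@example.com', 'alice@example.com'],
})

result = dropDuplicateEmails(customers)
print(result['customer_id'].tolist())  # [1, 2, 3, 4, 6]
```

The returned frame keeps the original index labels of the surviving rows, and `customers` itself is left unchanged. If a fresh 0..n-1 index is wanted, passing `ignore_index=True` to `drop_duplicates` (pandas 1.0 or later) is one option.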
Complexity analysis

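Metric
Time: Depends on the final approach. For the drop_duplicates call above, expect roughly O(n) in the number of rows, since pandas detects duplicates with hash tables.
Space: Depends on the final approach. drop_duplicates returns a new DataFrame by default, so expect O(n) extra space.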
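When the final approach relies on hashing, as is typical for de-duplication, the cost is one pass over the rows with an average constant-time set lookup per row. The sketch below illustrates that idea in plain Python. It is not how pandas is implemented, and the helper name `first_occurrence_positions` is hypothetical.

```python
from typing import Hashable, Iterable, List


def first_occurrence_positions(keys: Iterable[Hashable]) -> List[int]:
    """Return the positions of the first occurrence of each key, in input order."""
    seen = set()
    positions = []
    for i, key in enumerate(keys):
        if key not in seen:  # average O(1) membership test
            seen.add(key)
            positions.append(i)
    return positions


emails = ['emily@example.com', 'michael@example.com', 'sarah@example.com',
          'john@example.com', 'john@example.com', 'alice@example.com']
print(first_occurrence_positions(emails))  # [0, 1, 2, 3, 5]
```

The set holds at most one entry per distinct key, which is where the O(n) space bound comes from.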
Common interviewer follow-ups

  • The candidate demonstrates proficiency with pandas functions.
  • They understand how to manipulate DataFrames effectively for cleaning tasks.
  • The candidate can optimize solutions based on the size of the dataset.

Common pitfalls

  • Omitting the `subset` argument makes `drop_duplicates()` compare rows across all columns, so two rows that share an email but differ in `customer_id` or `name` are both kept.
  • Sorting the DataFrame before dropping duplicates changes which row counts as the first occurrence, so the wrong rows may be kept.
  • Using inefficient methods such as manual row-by-row iteration on large datasets may lead to performance issues.
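
The first two pitfalls can be reproduced on the two rows from Example 1 that share john@example.com. This is a small sketch; the variable names are illustrative only.

```python
import pandas as pd

customers = pd.DataFrame({
    'customer_id': [4, 5],
    'name': ['Alice', 'Finn'],
    'email': ['john@example.com', 'john@example.com'],
})

# Without subset, rows are compared across all columns. These two rows differ
# in customer_id and name, so nothing is dropped.
no_subset = customers.drop_duplicates()

# With subset=['email'], only the email column decides what is a duplicate.
by_email = customers.drop_duplicates(subset=['email'])

# Sorting first changes which row is the "first" occurrence of each email.
sorted_first = (
    customers.sort_values('name', ascending=False)
             .drop_duplicates(subset=['email'])
)

print(len(no_subset))                 # 2
print(by_email['name'].tolist())      # ['Alice']
print(sorted_first['name'].tolist())  # ['Finn']
```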

Advanced variants

  • Consider cases where additional columns besides email need to be unique.
  • Handle scenarios where the DataFrame has missing or null values in the email column.
  • Optimize for cases where the dataset is very large, ensuring the solution scales efficiently.
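
The first two variants can be sketched as follows. The sample data is invented for illustration, and the policy of keeping every row with a missing email is an assumption, since the problem statement does not say how nulls should be handled.

```python
import pandas as pd

# Hypothetical data: three rows share an email, two rows have no email.
customers = pd.DataFrame({
    'customer_id': [1, 2, 3, 4, 5],
    'name': ['Ann', 'Ann', 'Bob', 'Cy', 'Dee'],
    'email': ['a@example.com', 'a@example.com', 'a@example.com', None, None],
})

# Variant 1: a row is a duplicate only if both name and email repeat.
by_name_and_email = customers.drop_duplicates(subset=['name', 'email'])

# Variant 2: pandas treats missing values as equal to each other, so a plain
# drop_duplicates keeps only one of the rows with a missing email.
plain = customers.drop_duplicates(subset=['email'])

# Assumed policy: keep every row with a missing email and de-duplicate
# only the rows that actually have one.
keep_mask = customers['email'].isna() | ~customers.duplicated(subset=['email'])
keep_missing = customers[keep_mask]

print(by_name_and_email['customer_id'].tolist())  # [1, 3, 4, 5]
print(plain['customer_id'].tolist())              # [1, 4]
print(keep_missing['customer_id'].tolist())       # [1, 4, 5]
```

Whether missing emails should collapse into one row or all be kept is a requirements question worth raising with the interviewer before choosing between the last two results.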

Drop Duplicate Rows solution: Drop Duplicate Rows cor… | LeetCode #2882, Easy