2882\. Drop Duplicate Rows

Easy

DataFrame customers 

    +-------------+--------+ 
    | Column Name | Type   | 
    +-------------+--------+ 
    | customer_id | int    | 
    | name        | object | 
    | email       | object | 
    +-------------+--------+

There are some duplicate rows in the DataFrame based on the `email` column.

Write a solution to remove these duplicate rows and keep only the **first** occurrence.

The result format is in the following example.

**Example 1:** 

**Input:** 

    +-------------+---------+---------------------+ 
    | customer_id | name    | email               | 
    +-------------+---------+---------------------+ 
    | 1           | Ella    | [email protected]   | 
    | 2           | David   | [email protected] | 
    | 3           | Zachary | [email protected]   | 
    | 4           | Alice   | [email protected]    | 
    | 5           | Finn    | [email protected]    | 
    | 6           | Violet  | [email protected]   | 
    +-------------+---------+---------------------+ 

**Output:** 

    +-------------+---------+---------------------+ 
    | customer_id | name    | email               | 
    +-------------+---------+---------------------+ 
    | 1           | Ella    | [email protected]   | 
    | 2           | David   | [email protected] | 
    | 3           | Zachary | [email protected]   | 
    | 4           | Alice   | [email protected]    | 
    | 6           | Violet  | [email protected]   | 
    +-------------+---------+---------------------+

**Explanation:** Alice (customer_id = 4) and Finn (customer_id = 5) both use [email protected], so only the first occurrence of this email is retained.
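
One possible way to approach this is with pandas' `DataFrame.drop_duplicates`, restricted to the `email` column. The following is a minimal sketch, not a reference solution: the function name `drop_duplicate_emails` is an assumption, and the sample e-mail addresses are hypothetical placeholders, since the addresses in the example tables above are obscured.

```python
import pandas as pd


def drop_duplicate_emails(customers: pd.DataFrame) -> pd.DataFrame:
    """Return customers with duplicate emails removed, keeping the first row.

    The function name is hypothetical; the problem statement does not fix one.
    """
    # subset="email" compares rows on the email column only;
    # keep="first" retains the earliest occurrence of each email.
    return customers.drop_duplicates(subset="email", keep="first")


# Sample data mirroring Example 1. The addresses are placeholders (assumed),
# chosen so that customers 4 and 5 share the same email.
customers = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5, 6],
        "name": ["Ella", "David", "Zachary", "Alice", "Finn", "Violet"],
        "email": [
            "ella@example.com",
            "david@example.com",
            "zachary@example.com",
            "shared@example.com",
            "shared@example.com",
            "violet@example.com",
        ],
    }
)

result = drop_duplicate_emails(customers)
print(result["customer_id"].tolist())  # [1, 2, 3, 4, 6]
```

Note that `drop_duplicates` returns a new DataFrame and preserves the original row order and index labels, so the row for Finn (customer_id = 5) is simply absent from the result.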



