Freedom from redundancy
Description
It is essential to identify duplicate data, but doing so can be very difficult. In numerical measurement data, a single repeated value says little, because the same number can legitimately occur many times. It is therefore better to compare complete data series and decide case by case whether one is a duplicate recording of another.
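The idea of comparing complete data series rather than single values can be sketched as follows. This is a minimal illustration, not a prescribed method: the column names, the sample values, and the helper `find_duplicate_series` are hypothetical, and it assumes the series have equal length and are stored as columns of one DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical measurement series; names and values are illustrative only.
recordings = pd.DataFrame({
    "run_a": [20.1, 20.4, 20.9, 21.3],
    "run_b": [20.1, 20.5, 20.8, 21.3],  # shares single values with run_a, but differs as a series
    "run_c": [20.1, 20.4, 20.9, 21.3],  # identical to run_a
})


def find_duplicate_series(df, rtol=0.0, atol=0.0):
    """Return pairs of column names whose complete value sequences match.

    With rtol=0 and atol=0 the comparison is exact; a small atol can be
    passed to tolerate rounding differences between recordings.
    """
    columns = list(df.columns)
    pairs = []
    for i, first in enumerate(columns):
        for second in columns[i + 1:]:
            if np.allclose(df[first], df[second], rtol=rtol, atol=atol):
                pairs.append((first, second))
    return pairs


duplicate_pairs = find_duplicate_series(recordings)
print(duplicate_pairs)
```

The pairs returned are only candidates. Whether a matching series really is a duplicate recording still has to be decided individually, as described above.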
Tools and Libraries
Install pandas
pip install pandas
Python
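The pandas.DataFrame.duplicated() method finds duplicate rows in a DataFrame. It returns a boolean Series with one entry per row: True if the row repeats a row that appeared earlier, False otherwise. By default (keep="first"), the first occurrence of each row is not marked as a duplicate.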
# Import pandas
import pandas as pd

# Sample data: rows 0 and 4 are identical, as are rows 1 and 6
data = {
    "Name": ["Arpit", "Riya", "Priyanka", "Aman", "Arpit", "Rohan", "Riya", "Sakshi"],
    "Employment Type": [
        "Full-time Employee",
        "Part-time Employee",
        "Intern",
        "Intern",
        "Full-time Employee",
        "Part-time Employee",
        "Part-time Employee",
        "Full-time Employee",
    ],
    "Department": [
        "Administration",
        "Marketing",
        "Technical",
        "Marketing",
        "Administration",
        "Technical",
        "Marketing",
        "Administration",
    ],
}
df = pd.DataFrame(data)

# DataFrame.duplicated() returns a boolean Series; here it is True for rows 4 and 6,
# which repeat rows 0 and 1
bool_series = df.duplicated()
print(bool_series)
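The boolean Series can then be used to inspect or remove the duplicates. The following is a short sketch of the `keep` and `subset` parameters of `duplicated()` and of `DataFrame.drop_duplicates()`. It uses a reduced, hypothetical version of the employee table so that it runs on its own.

```python
import pandas as pd

# Reduced, illustrative table: rows 0 and 2 are identical in every column.
df = pd.DataFrame({
    "Name": ["Arpit", "Riya", "Arpit", "Riya"],
    "Employment Type": ["Full-time Employee", "Part-time Employee",
                        "Full-time Employee", "Intern"],
    "Department": ["Administration", "Marketing", "Administration", "Marketing"],
})

# Default keep="first": only later repeats are marked.
first_kept = df.duplicated()

# keep=False marks every occurrence of a duplicated row, including the first.
all_marked = df.duplicated(keep=False)

# subset restricts the comparison to the given columns.
by_name = df.duplicated(subset=["Name"])

# Boolean indexing shows the duplicated rows for manual review.
print(df[all_marked])

# drop_duplicates() returns a new DataFrame without the repeated rows.
deduplicated = df.drop_duplicates()
print(deduplicated)
```

Reviewing the rows selected with `keep=False` before calling `drop_duplicates()` keeps the decision about what counts as a duplicate in human hands.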