Uniformity
Description
The information in a data set must be structured uniformly: values of the same type should be expressed in the same unit of measurement.
Uniformity therefore concerns metrics and units of measurement, and it is particularly important when data comes from different sources.
In the following examples, data is checked for uniformity and, where necessary, converted to a common unit.
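As a minimal illustration of why mixed sources matter, the sketch below combines records from two hypothetical sources that report height in different units and normalises them to centimetres. The record layout, the `unit` field and the `TO_CM` table are assumptions made for this example only.

```python
# Hypothetical records from two sources that report height in different units.
source_a = [{"name": "Jon", "height": 195.58, "unit": "cm"}]
source_b = [{"name": "Rob", "height": 79.5, "unit": "in"}]

# Conversion factors to the common target unit (centimetres).
TO_CM = {"cm": 1.0, "m": 100.0, "in": 2.54, "ft": 30.48}


def to_cm(record):
    """Return a copy of the record with its height expressed in centimetres."""
    factor = TO_CM[record["unit"]]
    return {
        "name": record["name"],
        "height": round(record["height"] * factor, 2),
        "unit": "cm",
    }


uniform = [to_cm(r) for r in source_a + source_b]

# After normalisation only one unit should remain in the combined data.
units = {r["unit"] for r in uniform}
print(units)
```

Keeping the unit next to each value until the conversion step makes it possible to detect and convert non-uniform entries instead of silently mixing them.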
Tools and Libraries
Install pandas
pip install pandas
Python
Load the required dataset as shown in the following code snippet:
# import pandas
import pandas as pd
# load dataset
df = pd.read_csv(r"C:\Users\Datasets\basketballteam.csv", delimiter=",")
In [1]: print(df.head())
Out[1]:
      Name  Height Handedness
0      Jon    6'5"      Right
1      Rob  6'7.5"       Left
2   Sharon    6'3"      Right
3     Alex    6'2"      Right
4  Rebecca      7'      Right
As you can see, the Height column contains imperial units. For further processing or comparison with metric data, the values must be converted from feet and inches to a metric unit such as centimetres.
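The arithmetic behind the conversion can be sketched for a single value. One foot is 30.48 cm and one inch is 2.54 cm; the constant names below are chosen for this example only.

```python
# Conversion factors: both are exact by definition of the international foot and inch.
CM_PER_FOOT = 30.48
CM_PER_INCH = 2.54

# Jon's height from the table above: 6 feet 5 inches.
feet, inches = 6, 5

# Convert each part separately and add the results.
height_cm = feet * CM_PER_FOOT + inches * CM_PER_INCH
print(round(height_cm, 2))  # 195.58
```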
The following function converts a height given in feet and inches to centimetres.
import re
from fractions import Fraction

import pandas as pd

# matches heights such as 6'5", 6'7.5" or 7'
r = re.compile(r"(?:^(?:(\d+)')?(?:[-| ]*)(?:(\d*(?: ?\d+\/\d+)?|(?:\d*\.\d+)?)?\")?$)")

def height_to_cm(height):
    """Converts an imperial height to centimetres.

    Args:
        height (str): height in feet and inches, e.g. 6'5" or 7'

    Returns:
        float: height in centimetres, NaN if the string cannot be parsed
    """
    m = r.match(height)
    # check for a failed match before accessing any groups
    if m is None or not (m.group(1) or m.group(2)):
        return float("NaN")
    feet = int(m.group(1) or 0)
    # the pattern also accepts fractions such as 7 1/2, so sum the parts
    inches = sum(float(Fraction(part)) for part in (m.group(2) or "0").split())
    return feet * 30.48 + inches * 2.54

# load dataset
df = pd.read_csv(r"C:\Users\Datasets\basketballteam.csv", delimiter=",")
Now you can apply this function to the whole column to convert every height to centimetres:
In [2]: df["height_new"] = df["Height"].apply(height_to_cm)
   ...: print(df)
Out[2]:
      Name  Height Handedness  height_new
0      Jon    6'5"      Right      195.58
1      Rob  6'7.5"       Left      201.93
2   Sharon    6'3"      Right      190.50
3     Alex    6'2"      Right      187.96
4  Rebecca      7'      Right      213.36
5   Ariane    5'8"       Left      172.72
6    Bryon      7'      Right      213.36
7    Brett      6'      Right      182.88
8     Matt    5'5"      Right      165.10
After applying the function, all heights have been converted from the imperial system to the metric system and are expressed in centimetres.
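One possible way to confirm that the converted column is uniform is sketched below. It rebuilds a small DataFrame from three rows of the table above so that it runs on its own; the plausibility range of 100 to 250 cm and the `height_m` column are assumptions made for this example, not part of the original workflow.

```python
import pandas as pd

# Three rows copied from the converted table; in practice use the full DataFrame.
df = pd.DataFrame({
    "Name": ["Jon", "Rob", "Rebecca"],
    "height_new": [195.58, 201.93, 213.36],
})

# Every value should be numeric and present (NaN would indicate a failed parse).
is_numeric = pd.api.types.is_numeric_dtype(df["height_new"])
has_no_gaps = df["height_new"].notna().all()

# Assumed plausibility range for adult heights in centimetres; values outside it
# suggest that an entry is still expressed in another unit.
plausible = df["height_new"].between(100, 250)
print(df[~plausible])  # empty if every height looks like centimetres

# Optionally derive metres from the uniform centimetre column.
df["height_m"] = (df["height_new"] / 100).round(2)
print(df)
```

A range check like this does not prove correctness, but it quickly flags rows where a value in inches or metres slipped through unconverted.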