Data accuracy is a key characteristic of data quality. It describes whether the values recorded for a specific object are stored in the correct format and size. The metric emphasises the correctness and consistency of data and helps prevent ambiguity.
Accuracy answers the question of whether the recorded data reflects the real world. It can refer to something as specific as the number of decimal places in a decimal number, or to something as trivial as a misspelled name or address.
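The two kinds of problems mentioned above, wrong precision and misspelled values, can be checked mechanically. The following is a minimal sketch, assuming a hypothetical rule that prices carry exactly two decimal places and that city names must come from a hypothetical reference list (`EXPECTED_DECIMALS`, `REFERENCE_CITIES`, `decimal_places` and `check_record` are illustrative names, not part of any library). It uses only Python's standard `decimal` and `difflib` modules.

```python
from decimal import Decimal
from difflib import get_close_matches

# Hypothetical rules (assumptions for illustration only):
# prices must have exactly two decimal places, and city names
# must match an agreed reference list.
EXPECTED_DECIMALS = 2
REFERENCE_CITIES = ["Berlin", "Hamburg", "Munich", "Cologne"]


def decimal_places(value: str) -> int:
    """Count the digits after the decimal point of a numeric string."""
    exponent = Decimal(value).as_tuple().exponent
    return -exponent if exponent < 0 else 0


def check_record(price: str, city: str) -> list:
    """Return a list of accuracy problems found in one record."""
    problems = []
    places = decimal_places(price)
    if places != EXPECTED_DECIMALS:
        problems.append(f"price {price!r} has {places} decimal places")
    if city not in REFERENCE_CITIES:
        # Suggest the closest reference spelling, if one is similar enough
        suggestion = get_close_matches(city, REFERENCE_CITIES, n=1)
        hint = f" (did you mean {suggestion[0]!r}?)" if suggestion else ""
        problems.append(f"unknown city {city!r}{hint}")
    return problems


records = [("19.99", "Berlin"), ("5.5", "Munchen"), ("12.00", "Hamburgg")]
for price, city in records:
    print(price, city, check_record(price, city))
```

Keeping the prices as strings matters here: converting them to `float` first would lose the information about how many decimal places were actually recorded.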
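Accuracy can be determined by comparing the data against a ground truth, that is, an independent recording of the same values that is known to be correct. Data accuracy can also be described by the standard error; for a sample mean this is the standard error of the mean, s/√n, where s is the sample standard deviation and n is the number of observations.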
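A comparison against ground truth can be as simple as measuring how far each recorded value lies from its reference value. The sketch below is illustrative only: the sensor readings are hypothetical, and the tolerance of 0.2 is an assumed threshold, not a standard. It reports the share of values within tolerance and the mean absolute error.

```python
# Hypothetical data: the same three measurements, once as recorded in
# our system and once from an independent reference (the ground truth).
recorded = {"sensor_a": 20.4, "sensor_b": 18.9, "sensor_c": 22.0}
ground_truth = {"sensor_a": 20.5, "sensor_b": 18.9, "sensor_c": 21.2}

# Absolute deviation of each recorded value from its reference value
errors = {key: abs(recorded[key] - ground_truth[key]) for key in ground_truth}

# Share of values within an assumed tolerance
TOLERANCE = 0.2
within = sum(err <= TOLERANCE for err in errors.values())
accuracy_rate = within / len(errors)

# Average size of the deviation, in the unit of the data
mean_abs_error = sum(errors.values()) / len(errors)

print(f"accuracy rate: {accuracy_rate:.2f}, mean absolute error: {mean_abs_error:.3f}")
```

The accuracy rate answers "how many values are acceptable?", while the mean absolute error answers "how wrong are the values on average?"; which one is more useful depends on the tolerance the application can accept.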
Tools and Libraries
Install NumPy and SciPy with pip:
```shell
pip install numpy
pip install scipy
```
Using numpy to calculate standard error
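```python
# import numpy
import numpy as np

# define dataset
data = np.array(
    [3, 4, 4, 5, 7, 8, 12, 14, 14, 15, 17, 19, 22, 24, 24, 24, 25, 28, 28, 29]
)

# calculate the standard error of the mean:
# sample standard deviation (ddof=1, i.e. divide by n - 1) over sqrt(n)
sem = np.std(data, ddof=1) / np.sqrt(np.size(data))
print(sem)  # ≈ 2.0014
```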
Using scipy.stats to calculate standard error
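```python
# import the sem function from scipy.stats
from scipy.stats import sem

# define dataset
data = [3, 4, 4, 5, 7, 8, 12, 14, 14, 15, 17, 19, 22, 24, 24, 24, 25, 28, 28, 29]

# calculate the standard error of the mean;
# sem() uses ddof=1 by default, so it matches the NumPy calculation
print(sem(data))  # ≈ 2.0014
```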
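For comparison, the same formula, SE = s/√n, can be computed with only Python's standard library, which makes each step explicit. This is a minimal sketch for illustration; `standard_error` is a hypothetical helper name. `statistics.stdev` computes the sample standard deviation with the n − 1 denominator, which corresponds to `ddof=1` in NumPy and SciPy.

```python
import math
import statistics

data = [3, 4, 4, 5, 7, 8, 12, 14, 14, 15, 17, 19, 22, 24, 24, 24, 25, 28, 28, 29]


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    # statistics.stdev divides by n - 1 (sample standard deviation)
    return statistics.stdev(values) / math.sqrt(len(values))


print(standard_error(data))  # ≈ 2.0014
```

Using the n − 1 denominator matters for small samples: with `ddof=0` (NumPy's default for `np.std`) the result would be slightly smaller and would no longer match `scipy.stats.sem`.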