Accuracy
Description
Data accuracy is a key characteristic of data quality. The metric describes whether the values recorded for a specific object are represented in the correct format and with the correct size (for example, length or precision). It emphasises the correctness and consistency of data and helps prevent ambiguity.
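A format-and-size check can be sketched as a small set of validation rules applied to each record. The sketch below assumes a hypothetical customer record with a `name` field limited to 50 characters and a `postal_code` field of exactly five digits; both rules are illustrative assumptions, not taken from any standard.

```python
import re

# Hypothetical rules (assumptions for illustration): a five-digit postal
# code and a non-empty name of at most 50 characters.
POSTAL_CODE = re.compile(r"\d{5}")
MAX_NAME_LENGTH = 50


def check_format_and_size(record: dict) -> list:
    """Return a list of format/size violations found in one record."""
    problems = []
    if not POSTAL_CODE.fullmatch(record.get("postal_code", "")):
        problems.append("postal_code: expected exactly five digits")
    name = record.get("name", "")
    if not name or len(name) > MAX_NAME_LENGTH:
        problems.append(f"name: expected 1 to {MAX_NAME_LENGTH} characters")
    return problems


records = [
    {"name": "Anna Schmidt", "postal_code": "10115"},
    {"name": "", "postal_code": "1011"},
]
for r in records:
    print(r, check_format_and_size(r))
```

Records that return an empty list satisfy the assumed rules; the share of such records is one simple way to report this aspect of accuracy.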
Accuracy asks whether the recorded data reflects the real world. Depending on the context, it can refer to the number of decimal places of a numeric value or, more trivially, to misspelled names and addresses.
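Both cases mentioned above can be checked programmatically. The sketch below uses only the Python standard library: `decimal.Decimal` counts the decimal places of a value recorded as a string, and `difflib.get_close_matches` suggests a likely correct spelling from a reference list. The required precision of two decimal places, the reference names, and the 0.8 similarity cutoff are assumptions chosen for illustration.

```python
import difflib
from decimal import Decimal
from typing import List, Optional


def decimal_places(value: str) -> int:
    """Digits after the decimal point of a finite number recorded as a string."""
    exponent = Decimal(value).as_tuple().exponent
    return -exponent if exponent < 0 else 0


def suggest_spelling(name: str, reference: List[str]) -> Optional[str]:
    """Return the closest reference spelling if `name` is not an exact match."""
    if name in reference:
        return None  # already spelled as in the reference list
    matches = difflib.get_close_matches(name, reference, n=1, cutoff=0.8)
    return matches[0] if matches else None


# Assumed rule: prices must be recorded with exactly two decimal places.
for price in ["19.99", "19.9", "20"]:
    print(price, decimal_places(price) == 2)

# Hypothetical reference list of correctly spelled surnames.
print(suggest_spelling("Muller", ["Mueller", "Schmidt"]))
```

Trailing zeros are preserved by `Decimal`, so "2.500" counts as three decimal places; this matters when precision itself is part of the accuracy requirement.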
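Accuracy can be determined by comparing the data with a ground truth, i.e. an independent, trusted recording of the same values. Data accuracy can also be described by the standard error of the mean, which is the sample standard deviation divided by the square root of the number of observations.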
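One way to use a ground truth is to compare each recorded value with the trusted reference and report the share of values that agree. The sketch below assumes both the recorded data and the ground truth are Python dictionaries keyed by field name; the field names, values, and the numeric tolerance are hypothetical.

```python
def accuracy_against_ground_truth(recorded: dict, ground_truth: dict,
                                  tolerance: float = 0.0) -> float:
    """Share of ground-truth fields whose recorded value agrees with it.

    Numbers agree if they differ by at most `tolerance`; all other values
    must be exactly equal.
    """
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    correct = 0
    for key, true_value in ground_truth.items():
        value = recorded.get(key)
        if isinstance(true_value, (int, float)) and isinstance(value, (int, float)):
            matches = abs(value - true_value) <= tolerance
        else:
            matches = value == true_value
        if matches:
            correct += 1
    return correct / len(ground_truth)


# Hypothetical record and its ground truth.
recorded_values = {"height_cm": 180.2, "city": "Berlin", "name": "Jon"}
true_values = {"height_cm": 180.0, "city": "Berlin", "name": "John"}

print(accuracy_against_ground_truth(recorded_values, true_values, tolerance=0.5))
print(accuracy_against_ground_truth(recorded_values, true_values))
```

With a tolerance of 0.5 the height counts as correct and two of three fields agree; with the default tolerance of zero only the city agrees. The tolerance is a design choice that should follow the precision the data is required to have.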
Tools and Libraries
Python
Install numpy and scipy with pip:
pip install numpy
pip install scipy
Using numpy to calculate the standard error of the mean
# import numpy
import numpy as np
# define dataset
data = np.array(
[3, 4, 4, 5, 7, 8, 12, 14, 14, 15, 17, 19, 22, 24, 24, 24, 25, 28, 28, 29]
)
# standard error of the mean: sample standard deviation (ddof=1) divided by sqrt(n)
print(np.std(data, ddof=1) / np.sqrt(np.size(data)))
Using scipy.stats to calculate the standard error of the mean
# import the sem (standard error of the mean) function from scipy.stats
from scipy.stats import sem
# define dataset
data = [3, 4, 4, 5, 7, 8, 12, 14, 14, 15, 17, 19, 22, 24, 24, 24, 25, 28, 28, 29]
# calculate the standard error of the mean (sem uses ddof=1 by default)
print(sem(data))
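The same value can be reproduced without third-party packages, which makes the formula behind the numpy and scipy examples explicit. This sketch uses Python's `statistics.stdev`, which, like `ddof=1`, divides by n - 1; the helper name `standard_error` is hypothetical.

```python
import math
import statistics


def standard_error(values: list) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # statistics.stdev uses the n - 1 denominator, matching ddof=1
    return statistics.stdev(values) / math.sqrt(len(values))


data = [3, 4, 4, 5, 7, 8, 12, 14, 14, 15, 17, 19, 22, 24, 24, 24, 25, 28, 28, 29]
print(round(standard_error(data), 4))  # prints 2.0014
```

All three approaches should agree for this dataset, so the pure-Python version is a convenient cross-check when numpy or scipy is not available.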