Precision and Recall

Description

Precision is the ratio of correctly predicted positive observations to the total predicted positives, while recall is the ratio of correctly predicted positive observations to all actual positives. The two metrics are defined as:

\[\text{Precision} = \frac{\text{True Positives}}{\text{True Positives + False Positives}}\]
\[\text{Recall} = \frac{\text{True Positives}}{\text{True Positives + False Negatives}}\]
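
For instance, consider a classifier that produces 8 true positives, 2 false positives, and 4 false negatives (hypothetical counts chosen only for illustration). The two metrics then evaluate to:

\[\text{Precision} = \frac{8}{8 + 2} = 0.8, \qquad \text{Recall} = \frac{8}{8 + 4} \approx 0.67\]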

Tools and Libraries

To illustrate the classification metrics, the digits dataset provided by the scikit-learn library will be used. It contains 1797 samples of hand-written digits, each an 8x8 image, spread over 10 classes (corresponding to the digits 0 to 9). The dataset will be loaded and split into training (80%) and test (20%) sets. A logistic regression model will be used as the classifier; after training, its performance will be evaluated.
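
As a quick sanity check, these dimensions can be inspected directly after loading the dataset (a minimal sketch; the commented outputs show the expected shapes):

from sklearn.datasets import load_digits
import numpy as np

digits = load_digits()
print(digits.data.shape)         # (1797, 64): 1797 samples, each flattened to 8x8 = 64 features
print(digits.images.shape)       # (1797, 8, 8): the same samples kept as 8x8 images
print(np.unique(digits.target))  # [0 1 2 3 4 5 6 7 8 9]: the 10 digit classes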

Python

To compute the precision and recall for a classification problem, the following example can be used.

Install scikit-learn using this command:

pip install -U scikit-learn

The script below loads the dataset, trains the logistic regression classifier, and reports the macro-averaged precision and recall on the test set:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Load the hand-written digits dataset (1797 samples of 8x8 images)
digits = load_digits()

# Hold out 20% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# Train a logistic regression classifier on the training set
clf = LogisticRegression(max_iter=10000)
clf.fit(X_train, y_train)

# Predict the labels of the test set
y_pred = clf.predict(X_test)

# 'macro' computes each metric per class and takes the unweighted mean over the 10 classes
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
print(f"Precision: {precision}")
print(f"Recall: {recall}")
