The F1-score combines precision and recall into a single metric, defined as the harmonic mean of the two:

\[\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\]
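As a quick worked illustration of the formula, the sketch below computes the F1-score from raw prediction counts. The helper name `f1_from_counts` and the counts passed to it are hypothetical, chosen only for this example, and returning 0.0 when a denominator is zero is an assumption of this sketch rather than part of the definition.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score from true positives, false positives, and false negatives.

    Hypothetical helper for illustration; returns 0.0 when a denominator is zero.
    """
    # Precision: fraction of predicted positives that are actually positive.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: fraction of actual positives that were predicted positive.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
# Precision = 8/10 = 0.8, recall = 8/12 ≈ 0.667.
score = f1_from_counts(tp=8, fp=2, fn=4)
print(f"F1-score: {score:.3f}")  # prints "F1-score: 0.727"
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the result (about 0.727) sits closer to the recall than the arithmetic mean of 0.8 and 0.667 would.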

Tools and Libraries

To illustrate the classification metrics, the digits dataset provided by the scikit-learn library will be used. It is a dataset of handwritten digits that contains 1797 samples, each of which is an 8x8 image. The dataset has 10 classes (corresponding to the digits from 0 to 9). It will be loaded and split into a training set and a test set (20% of the samples). A logistic regression model will be trained as the classifier, and its performance will then be evaluated on the test set.


Install scikit-learn using this command:

pip install -U scikit-learn

The following example computes the macro-averaged F1-score for this classification problem:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

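# Load the digits dataset and hold out 20% of the samples for testing.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# Train a logistic regression classifier and predict the test labels.
clf = LogisticRegression(max_iter=10000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Macro averaging: compute the F1-score per class, then take the unweighted mean.
f1 = f1_score(y_test, y_pred, average='macro')
print(f"F1-score: {f1}")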
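The example above passes average='macro' because the digits problem has more than two classes. To make that averaging explicit, here is a minimal pure-Python sketch of a macro-averaged F1-score. The function `macro_f1` and the toy labels are hypothetical and exist only for this illustration. It is meant to mirror the idea behind macro averaging, not to reproduce scikit-learn's implementation, and scoring a class as 0.0 when it has no true or predicted samples is an assumption of the sketch.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        # Treat `label` as the positive class and every other label as negative.
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # 2*TP / (2*TP + FP + FN) is algebraically equal to the harmonic
        # mean of precision and recall.
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom > 0 else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical toy labels for three classes.
y_true_toy = [0, 0, 1, 1, 2, 2]
y_pred_toy = [0, 1, 1, 1, 2, 0]

# Per-class F1: class 0 -> 0.5, class 1 -> 0.8, class 2 -> 2/3.
toy_score = macro_f1(y_true_toy, y_pred_toy)
print(f"Macro F1-score: {toy_score:.3f}")  # prints "Macro F1-score: 0.656"
```

Each class contributes equally to the macro average regardless of how many samples it has, so a poorly predicted rare class lowers the score as much as a poorly predicted common one.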