
[Machine Learning] Introduction to Three Model Evaluation Metrics: Precision, Recall, and F1-score

Precision, Recall, and F1-score are three fairly well-known model evaluation metrics, mostly used for binary classification (for multi-class problems, the macro- or micro-averaged versions are used instead). The following is a brief description of each of these metrics:

                      Actual Positive        Actual Negative
Prediction Positive   TP (True Positive)     FP (False Positive)
Prediction Negative   FN (False Negative)    TN (True Negative)
  • Precision = TP / (TP + FP)
  • Recall = TP / (TP + FN)
  • F1-score = (2 x Precision x Recall) / (Precision + Recall)

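To make the formulas concrete, here is a minimal sketch that computes all three metrics by hand from hypothetical confusion-matrix counts (TP = 7, FP = 2, FN = 1, which happen to match the scikit-learn example output further down):

# Hypothetical confusion-matrix counts, purely for illustration
tp, fp, fn = 7, 2, 1

precision = tp / (tp + fp)                            # 7 / 9 ≈ 0.778
recall = tp / (tp + fn)                               # 7 / 8 = 0.875
f1 = 2 * precision * recall / (precision + recall)    # ≈ 0.824

print('Precision:', precision)
print('Recall:', recall)
print('F1:', f1)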

The advantage of evaluating a model with several different metrics is that, if the training data is imbalanced, the model may learn to predict only the majority label, which is of course undesirable.

By comparing these different metrics against each other, we can quickly see whether our model actually generalizes instead of just guessing one class.

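As an illustration of that pitfall, consider a made-up imbalanced dataset and a lazy model that always predicts the majority class: accuracy looks great, yet the model never finds a single positive example.

# Illustrative sketch with made-up imbalanced data: 9 negatives, 1 positive
true = [0] * 9 + [1]
# A lazy model that always predicts the majority class
pred = [0] * 10

accuracy = sum(t == p for t, p in zip(true, pred)) / len(true)
tp = sum(t == 1 and p == 1 for t, p in zip(true, pred))
fp = sum(t == 0 and p == 1 for t, p in zip(true, pred))
fn = sum(t == 1 and p == 0 for t, p in zip(true, pred))

print('Accuracy:', accuracy)        # 0.9, looks great
print('TP / FP / FN:', tp, fp, fn)  # 0, 0, 1
# Precision is undefined (no positive predictions) and Recall is 0,
# so the seemingly accurate model is useless for the positive class
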
If the scikit-learn package is not already in our environment, we can install it:

pip3 install scikit-learn

Then we can try it out and see how our classifier scores.

import random
from sklearn import metrics


# Randomly generate 10 "true" labels, each either 1 or 2
true = [random.randint(1, 2) for _ in range(10)]

# A dummy prediction: nine 1s followed by a single 2
prediction = [1 for _ in range(9)]
prediction.append(2)

print('True:', true)
print('Pred:', prediction)

# With labels {1, 2}, scikit-learn treats 1 as the positive class by default
print('Precision:', metrics.precision_score(true, prediction))
print('Recall:', metrics.recall_score(true, prediction))
print('F1:', metrics.f1_score(true, prediction))



Output:

True: [1, 1, 2, 1, 2, 1, 1, 1, 1, 1]
Pred: [1, 1, 1, 1, 1, 1, 1, 1, 1, 2]
Precision: 0.7777777777777778
Recall: 0.875
F1: 0.823529411764706

Because the true values are generated randomly, it is perfectly reasonable for your results to differ from mine.

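As mentioned at the beginning, multi-class problems call for macro or micro averaging. Below is a minimal sketch with made-up three-class labels, using the average argument of the scikit-learn metric functions:

from sklearn import metrics

# Made-up three-class labels, purely for illustration
true = [0, 1, 2, 0, 1, 2, 0, 2]
pred = [0, 2, 2, 0, 1, 1, 0, 2]

# 'macro' averages the per-class scores equally;
# 'micro' pools all TP/FP/FN counts before computing the score
print('Macro F1:', metrics.f1_score(true, pred, average='macro'))
print('Micro F1:', metrics.f1_score(true, pred, average='micro'))

# Per-class precision, recall, F1, and support in one call
print(metrics.precision_recall_fscore_support(true, pred, average=None))
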
For the related Scikit-Learn usage, please refer to: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html
