Precision, Recall, and F1-score are three well-known model evaluation metrics, used mostly for binary classification (for multi-class problems they are combined with macro or micro averaging). The following is a brief description of each of them, based on the confusion matrix below:
| | Actual Positive | Actual Negative |
| --- | --- | --- |
| Prediction Positive | TP (True Positive) | FP (False Positive) |
| Prediction Negative | FN (False Negative) | TN (True Negative) |
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- F1-score = (2 x Precision x Recall) / (Precision + Recall)
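To make the formulas concrete, here is a minimal sketch that computes the three metrics by hand from confusion matrix counts (the TP/FP/FN numbers are arbitrary and chosen only for illustration):

```python
# Arbitrary confusion matrix counts, purely for illustration
tp, fp, fn = 7, 2, 1

precision = tp / (tp + fp)  # fraction of predicted positives that are correct
recall = tp / (tp + fn)     # fraction of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)

print('Precision:', precision)
print('Recall:', recall)
print('F1:', f1)
```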
The advantage of using several different metrics to evaluate a model is that, if our training data is imbalanced, the model may simply learn to always guess the same label, which is of course undesirable. By comparing these different metrics, we can quickly see whether the model truly generalizes.
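For example, the following sketch (using made-up, heavily imbalanced labels) shows how a model that only guesses the majority class can reach 95% accuracy while its recall for the positive class is zero:

```python
# Made-up, heavily imbalanced data: 95 negatives and 5 positives
true = [0] * 95 + [1] * 5
pred = [0] * 100  # a "lazy" model that always guesses the majority label

accuracy = sum(t == p for t, p in zip(true, pred)) / len(true)
tp = sum(t == 1 and p == 1 for t, p in zip(true, pred))
fn = sum(t == 1 and p == 0 for t, p in zip(true, pred))

print('Accuracy:', accuracy)      # 0.95, which looks great
print('Recall:', tp / (tp + fn))  # 0.0, the positives are never found
```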
If the scikit-learn package is not already in our environment, we can install it first:

```bash
pip3 install scikit-learn
```
Then we can try a quick example and see how our classification result is evaluated.
```python
import random

from sklearn import metrics

# Randomly generate 10 ground-truth labels (1 or 2) and a nearly constant prediction
true = [random.randint(1, 2) for _ in range(10)]
prediction = [1 for _ in range(9)]
prediction.append(2)

print('True:', true)
print('Pred:', prediction)
print('Precision:', metrics.precision_score(true, prediction))
print('Recall:', metrics.recall_score(true, prediction))
print('F1:', metrics.f1_score(true, prediction))
```
Output:
```
True: [1, 1, 2, 1, 2, 1, 1, 1, 1, 1]
Pred: [1, 1, 1, 1, 1, 1, 1, 1, 1, 2]
Precision: 0.7777777777777778
Recall: 0.875
F1: 0.823529411764706
```
Because the true values are randomly generated, it is entirely reasonable if your results differ from mine.
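As mentioned at the beginning, these metrics also extend to multi-class problems through macro or micro averaging. Below is a minimal sketch of what that looks like in scikit-learn, with made-up labels purely for illustration:

```python
from sklearn import metrics

# Made-up three-class labels, just to demonstrate the averaging options
true = [0, 1, 2, 2, 1, 0, 2, 1, 0, 2]
pred = [0, 1, 1, 2, 1, 0, 2, 2, 0, 2]

# 'macro' averages the per-class F1 scores; 'micro' pools all decisions together
print('Macro F1:', metrics.f1_score(true, pred, average='macro'))
print('Micro F1:', metrics.f1_score(true, pred, average='micro'))
```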
For more details on the related scikit-learn functions, please refer to: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html