Object Detection Evaluation#

This notebook aims at showing what kind of graph you can draw thank’s to Lours evaluator

[1]:

%load_ext autoreload

%autoreload 2
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

from lours.dataset import from_coco
from lours.evaluation.detection import DetectionEvaluator as de
from lours.evaluation.detection.util import display_confusion_matrix
from lours.utils.grouper import ContinuousGroup

warnings.simplefilter(action="ignore", category=FutureWarning)

Loading the dataset and the predictions#

Note that they are both treated as datasets at first, and only when creating the eval object we have a detection evaluator

As a second Note, you can add several prediction datasets at the same time

[2]:

coco_eval = from_coco("notebook_data/coco_valid.json").remap_from_preset(
    "coco", "supercategory"
)
coco_darknet = from_coco(
    "notebook_data/yolov4_prediction_coco_eval.json"
).remap_from_preset("coco", "supercategory")
evaluator = de(
    groundtruth=coco_eval, predictions=coco_darknet, predictions2=coco_darknet
)

[3]:

evaluator

Compute the matches#

This is arguably the slowest part.

Hopefully, we can multiprocess it in the future

You can compute them by taking category into account or not.

The category agnostic is useful for e.g. computing confusion matrices
The category specific is useful for e.g. computing precision-recall curves

[4]:

matches = evaluator.compute_matches("predictions", category_agnostic=True)
display(matches["predictions"])
matches = evaluator.compute_matches("predictions", category_agnostic=False)
display(matches["predictions"])

computing matches between groundtruth and predictions (category agnostic)

	prediction_id	iou	groundtruth_id
0	48019	0.954652	34646
1	48033	0.912193	104368
2	48034	0.939746	103487
3	48042	0.895466	230831
4	48020	0.922641	35802
...	...	...	...
60	17979	0.000000	<NA>
61	17980	0.000000	<NA>
62	17981	0.000000	<NA>
63	17982	0.000000	<NA>
64	17983	0.000000	<NA>

85388 rows × 3 columns

computing matches between groundtruth and predictions (category specific)

	prediction_id	iou	groundtruth_id
0	48042	0.895466	230831
1	48030	0.840631	233201
0	47998	0.000000	<NA>
1	48009	0.000000	<NA>
2	48008	0.000000	<NA>
...	...	...	...
60	17979	0.000000	<NA>
61	17980	0.000000	<NA>
62	17981	0.000000	<NA>
63	17982	0.000000	<NA>
64	17983	0.000000	<NA>

86339 rows × 3 columns

See how two new tabs have been added to the dataset widget

[5]:

evaluator

Here, we just plot the IOU distribution. As you can see more than half the detections have a IoU of 0. These predictions typically have a very low confidence as well, which means they will be easily filtered and won’t have a great influence on evaluation.

[6]:

plt.plot(
    evaluator.matches["category_specific"]["predictions"]["iou"].sort_values().values
)

[6]:

[<matplotlib.lines.Line2D at 0x10387e780>]

../_images/notebooks_3_demo_evaluation_detection_10_1.png

Computing confusion matrix#

The confusion matrix can be computed for all matches or by groups if the argument groups is defined.

The values are normalized over the groundtruth.

Notes:

The class None corresponds to the False Positive and False Negative.
The model indicated the name of given predictions. Here, we get the data for confusion for predictions named predictions and predictions2 (which are the same, for the sake of the example)
Since matches have already been computed for predictions we only have to compute them for predictions2

[7]:

confusion_data = evaluator.compute_confusion_matrix()
confusion_data

computing matches between groundtruth and predictions2 (category agnostic)

Processing confusion matrix for model=predictions
Processing confusion matrix for model=predictions2

[7]:

	accessory	animal	appliance	electronic	food	furniture	indoor	kitchen	outdoor	person	sports	vehicle	None	model
label
accessory	0.798511	0.002127	0.000532	0.001063	0.002658	0.010101	0.003721	0.003190	0.007443	0.114833	0.005316	0.018075	0.032430	predictions
animal	0.000741	0.938148	0.000370	0.002222	0.000000	0.001852	0.001481	0.000741	0.001481	0.011852	0.001111	0.003333	0.036667	predictions
appliance	0.000000	0.000000	0.862007	0.007168	0.010753	0.026882	0.001792	0.055556	0.000000	0.010753	0.000000	0.001792	0.023297	predictions
electronic	0.005291	0.002268	0.001512	0.876039	0.002268	0.024943	0.012850	0.008314	0.001512	0.020408	0.000000	0.006047	0.038549	predictions
food	0.000000	0.000000	0.004233	0.000000	0.901587	0.029277	0.001411	0.029982	0.000000	0.004586	0.000353	0.001764	0.026808	predictions
furniture	0.008154	0.002912	0.003203	0.007863	0.018637	0.803727	0.015143	0.034362	0.008154	0.055329	0.001165	0.003494	0.037857	predictions
indoor	0.005500	0.002000	0.002500	0.010000	0.002000	0.033500	0.869500	0.014000	0.001500	0.011000	0.000500	0.002500	0.045500	predictions
kitchen	0.002441	0.001627	0.010035	0.002983	0.021969	0.034445	0.009493	0.862219	0.001899	0.009493	0.001356	0.001627	0.040412	predictions
outdoor	0.008554	0.001555	0.000000	0.001555	0.000000	0.020995	0.001555	0.000778	0.797045	0.048212	0.002333	0.017107	0.100311	predictions
person	0.009269	0.002181	0.000273	0.001363	0.001727	0.009724	0.001091	0.002635	0.001272	0.927208	0.004271	0.013086	0.025900	predictions
sports	0.002511	0.001005	0.000000	0.000502	0.000000	0.004520	0.000502	0.000502	0.001507	0.037167	0.898041	0.006027	0.047715	predictions
vehicle	0.004900	0.001960	0.000000	0.000245	0.000000	0.002205	0.000735	0.000980	0.003675	0.033072	0.001225	0.914748	0.036257	predictions
None	0.075421	0.048265	0.012344	0.025860	0.107330	0.114963	0.091818	0.108297	0.035406	0.230337	0.039706	0.110252	0.000000	predictions
accessory	0.798511	0.002127	0.000532	0.001063	0.002658	0.010101	0.003721	0.003190	0.007443	0.114833	0.005316	0.018075	0.032430	predictions2
animal	0.000741	0.938148	0.000370	0.002222	0.000000	0.001852	0.001481	0.000741	0.001481	0.011852	0.001111	0.003333	0.036667	predictions2
appliance	0.000000	0.000000	0.862007	0.007168	0.010753	0.026882	0.001792	0.055556	0.000000	0.010753	0.000000	0.001792	0.023297	predictions2
electronic	0.005291	0.002268	0.001512	0.876039	0.002268	0.024943	0.012850	0.008314	0.001512	0.020408	0.000000	0.006047	0.038549	predictions2
food	0.000000	0.000000	0.004233	0.000000	0.901587	0.029277	0.001411	0.029982	0.000000	0.004586	0.000353	0.001764	0.026808	predictions2
furniture	0.008154	0.002912	0.003203	0.007863	0.018637	0.803727	0.015143	0.034362	0.008154	0.055329	0.001165	0.003494	0.037857	predictions2
indoor	0.005500	0.002000	0.002500	0.010000	0.002000	0.033500	0.869500	0.014000	0.001500	0.011000	0.000500	0.002500	0.045500	predictions2
kitchen	0.002441	0.001627	0.010035	0.002983	0.021969	0.034445	0.009493	0.862219	0.001899	0.009493	0.001356	0.001627	0.040412	predictions2
outdoor	0.008554	0.001555	0.000000	0.001555	0.000000	0.020995	0.001555	0.000778	0.797045	0.048212	0.002333	0.017107	0.100311	predictions2
person	0.009269	0.002181	0.000273	0.001363	0.001727	0.009724	0.001091	0.002635	0.001272	0.927208	0.004271	0.013086	0.025900	predictions2
sports	0.002511	0.001005	0.000000	0.000502	0.000000	0.004520	0.000502	0.000502	0.001507	0.037167	0.898041	0.006027	0.047715	predictions2
vehicle	0.004900	0.001960	0.000000	0.000245	0.000000	0.002205	0.000735	0.000980	0.003675	0.033072	0.001225	0.914748	0.036257	predictions2
None	0.075421	0.048265	0.012344	0.025860	0.107330	0.114963	0.091818	0.108297	0.035406	0.230337	0.039706	0.110252	0.000000	predictions2

Display confusion matrix for prediction dataframe named “predictions”#

[8]:

display_confusion_matrix(
    confusion_data.loc[confusion_data["model"] == "predictions"].drop(columns="model"),
    title="All data",
)

../_images/notebooks_3_demo_evaluation_detection_14_0.png

Display confusion matrix for a specific group of prediction dataframe named “predictions”#

Here, we divide the evaluation dataset in 3 groups of equal size based on box_height

[9]:

box_height_group = ContinuousGroup(name="box_height", bins=3, qcut=True)
confusion_data = evaluator.compute_confusion_matrix(
    "predictions", groups=[box_height_group]
)
for (range_data, data), name in zip(
    confusion_data.groupby("box_height"), ["small", "medium", "big"]
):
    display_confusion_matrix(
        data.drop("model", axis=1),
        title=(
            f"Confusion for {name} bounding boxes ({range_data.left:.1f}px to"
            f" {range_data.right:.1f}px)"
        ),
    )

Processing confusion matrix for model=predictions

../_images/notebooks_3_demo_evaluation_detection_16_1.png

../_images/notebooks_3_demo_evaluation_detection_16_2.png

../_images/notebooks_3_demo_evaluation_detection_16_3.png

Computing AP + Yolov5 metrics#

Here, we follow usual convention, by computing Average precision per class and per iou threshold.

The we get the AP per category, the AP@0.5:0.95 per class and finally the mAP and the mAP@0.5:0.95

see original code for yolov5 (if you dare) here : ultralytics/yolov5

Namely, In addition to AP and mAP, we want the precision@0.5 at best F1 score averaged over categories, and the recall@0.5 at best F1 score averaged over categories

Notice, how we use the “index column” and “index_values” argument, to enforce that every category has the same confidence_threshold coordinates, i.e. 100 evenly spaced points between 0 and 1

ious are the different minimum iou values to consider a detection valid
index_column is the name of the value we want to use as index. This will force all values in the PR curve to be aligned. If not set, the resulting PR dataframe will no longer have aligned values, only where it actually changes, which depends on the category. This value can be recall, precision or confidence_threshold.
index_values are the values we want the curves to be aligned on. Typically, a set of increasing values between 0 and 1

[10]:

pr, ap = evaluator.compute_precision_recall(
    predictions_names="predictions",
    ious=np.linspace(0.5, 0.95, 10).round(3),
    index_column=None,
)

print(f"mAP@0.5 = {ap[ap['iou_threshold'] == 0.5]['AP'].mean()}")
print(f"mAP@0.5:0.95 = {ap['AP'].mean()}")

pr50, ap50 = evaluator.compute_precision_recall(
    predictions_names="predictions",
    ious=0.5,
    index_column="confidence_threshold",
    index_values=np.linspace(0, 1, 101),
    f_scores_betas=(0.5, 1, 2),
)


# Note that next line would be invalid if we did not force the data points
# to be aligned on the same confidence thresholds
mean_f1 = pr50.groupby("confidence_threshold").mean(numeric_only=True)
best_mean_f1_score = mean_f1.loc[mean_f1["f1_score"].idxmax()]
print("F1 scores averaged over classes")
print(f"best F1 = {best_mean_f1_score['f1_score']}")
print(f"precision @ best F1 = {best_mean_f1_score['precision']}")
print(f"recall @ best F1 = {best_mean_f1_score['recall']}")

Processing PR curves for 10 IoU values and 1 prediction set

Processing PR curve for model=predictions and IOU=0.5

Processing PR curve for model=predictions and IOU=0.55

Processing PR curve for model=predictions and IOU=0.6

Processing PR curve for model=predictions and IOU=0.65

Processing PR curve for model=predictions and IOU=0.7

Processing PR curve for model=predictions and IOU=0.75

Processing PR curve for model=predictions and IOU=0.8

Processing PR curve for model=predictions and IOU=0.85

Processing PR curve for model=predictions and IOU=0.9

Processing PR curve for model=predictions and IOU=0.95

mAP@0.5 = 0.6498968780927252
mAP@0.5:0.95 = 0.42891943233976043
Processing PR curves for 1 IoU value and 1 prediction set

Processing PR curve for model=predictions and IOU=0.5

F1 scores averaged over classes
best F1 = 0.6796615734692385
precision @ best F1 = 0.7775421755626403
recall @ best F1 = 0.6053440216989199

Detailed view of Average Precision

[11]:

display(ap)
ap_consolidated = pd.pivot_table(
    ap, values=["AP"], index="category_id", columns="iou_threshold"
)
ap_consolidated["mean"] = ap_consolidated["AP"].mean(axis=1)
ap_consolidated

	category_id	iou_threshold	model	AP	category_str
0	3	0.50	predictions	0.597134	outdoor
1	6	0.50	predictions	0.721360	sports
2	10	0.50	predictions	0.764937	electronic
3	1	0.50	predictions	0.753037	person
4	2	0.50	predictions	0.705385	vehicle
...	...	...	...	...	...
115	7	0.95	predictions	0.007749	kitchen
116	8	0.95	predictions	0.008157	food
117	5	0.95	predictions	0.004675	accessory
118	9	0.95	predictions	0.010017	furniture
119	12	0.95	predictions	0.005585	indoor

120 rows × 5 columns

[11]:

	AP										mean
iou_threshold	0.5	0.55	0.6	0.65	0.7	0.75	0.8	0.85	0.9	0.95
category_id
1	0.753037	0.733141	0.706142	0.675050	0.633008	0.572555	0.480146	0.347291	0.166231	0.016167	0.508277
2	0.705385	0.684499	0.654428	0.617852	0.568129	0.509032	0.430359	0.300238	0.145796	0.016360	0.463208
3	0.597134	0.573298	0.542712	0.505465	0.460946	0.402181	0.321357	0.227830	0.112498	0.014179	0.375760
4	0.809120	0.792082	0.771095	0.752684	0.710266	0.661858	0.587514	0.472446	0.279467	0.028271	0.586480
5	0.545883	0.522724	0.496112	0.451250	0.401432	0.351153	0.273807	0.163924	0.058987	0.004675	0.326995
6	0.721360	0.698412	0.675466	0.631736	0.585244	0.511800	0.397510	0.275971	0.114413	0.008772	0.462068
7	0.633795	0.608658	0.589705	0.553673	0.513385	0.459115	0.378472	0.265622	0.115670	0.007749	0.412584
8	0.534568	0.515136	0.497975	0.475205	0.443999	0.404173	0.339061	0.243377	0.119867	0.008157	0.358152
9	0.559138	0.535121	0.509976	0.478727	0.438717	0.381115	0.315271	0.215113	0.094218	0.010017	0.353741
10	0.764937	0.741044	0.722345	0.699577	0.662266	0.607944	0.524356	0.387244	0.176262	0.012471	0.529845
11	0.708608	0.685999	0.669677	0.637319	0.611345	0.545455	0.458379	0.354014	0.180022	0.020759	0.487158
12	0.465797	0.441133	0.409100	0.377720	0.347016	0.307648	0.242834	0.159999	0.070817	0.005585	0.282765

[12]:

mAP = ap_consolidated.mean(axis=0)
mAP

[12]:

      iou_threshold
AP    0.5              0.649897
      0.55             0.627604
      0.6              0.603728
      0.65             0.571355
      0.7              0.531313
      0.75             0.476169
      0.8              0.395755
      0.85             0.284422
      0.9              0.136187
      0.95             0.012763
mean                   0.428919
dtype: float64

mAP@0.5:0.95 is thus equal to \(0.510150\)

Showing Curves#

Now we can show the PR curve to have a look at the precision vs recall for a particular class and different IOU values. Here is an example with class 2 (persons)

First, we plot the different PR curves for different IOU threshold values,

and then we plot the f1 score vs confidence_threshold.

Finally, for an IoU threshold of 0.5, we plot recall, precision and F1_score vs confidence threshold

Recall vs Precision vs IoU threshold#

[13]:

pr_persons = pr[pr["category_id"] == 2]
sns.relplot(
    data=pr_persons,
    x="recall",
    y="precision",
    hue="iou_threshold",
    kind="line",
    estimator=None,
    sort=False,
)
plt.show()

../_images/notebooks_3_demo_evaluation_detection_23_0.png

F1 score vs confidence_threshold vs IoU threshold#

Notice how the optimal confidence threshold is lower with the IoU

[14]:

sns.relplot(
    data=pr_persons,
    x="confidence_threshold",
    y="f1_score",
    hue="iou_threshold",
    kind="line",
    estimator=None,
    sort=False,
)
plt.show()

../_images/notebooks_3_demo_evaluation_detection_25_0.png

Precision, recall, \(F_\beta\) score @0.5 vs confidence threshold for persons#

Here, we graph recall, precision and F0.5, F1, and F2 with respect to confidence_threshold, for an IoU threshold of 0.5

In addition, we annotate the confidence values where the F05, F1 and F2 scores are the highest, to show how each score weights precision and recall.

Note that we don’t use seaborn for this plot

Side note, We can very clearly see that this set of predictions was cut off at a confidence threshold of 0.05

We could lower that threshold, but it would dramatically increase the number of predictions without adding much information to the plot.

[15]:

to_plot = pr50[pr50["category_id"] == 2].set_index("confidence_threshold")

f_scores = to_plot[["f1_score", "f0.5_score", "f2_score"]]
best_confidences = f_scores.idxmax()

fig, ax = plt.subplots()
to_plot[["precision", "recall"]].plot(ax=ax)
to_plot[["f1_score", "f0.5_score", "f2_score"]].plot(
    style=["r--", "b--", "g--"], ax=ax, linewidth=0.5
)
plt.scatter(f_scores.idxmax(), f_scores.max(), marker="+")
for x, y in zip(f_scores.idxmax(), f_scores.max()):
    ax.annotate(
        f"{x:.2f}",
        [x + 0.01, y + 0.01],
    )
plt.show()

../_images/notebooks_3_demo_evaluation_detection_27_0.png

Computing grouped pr and ap curves#

Now is time to make things more interesting

box_group is how we want to split the data. Most usual group is category_id, but here we add the box_height group with 10 bins. Be careful, the more groups you add, the more granular your curves become but the less data you have for each.
image_group is not used here but could be used the same as box_groups with e.g. weather condition or focal length

Notice we don’t use index alignment anymore

[16]:

from lours.utils.grouper import ContinuousGroup

box_height_group = ContinuousGroup(name="box_height", bins=10, qcut=True)
pr, ap = evaluator.compute_precision_recall(
    predictions_names="predictions",
    ious=(0.3, 0.5, 0.7, 0.9),
    groups=["category_id", box_height_group],
    index_column=None,
)

Processing PR curves for 4 IoU values and 1 prediction set

Processing PR curve for model=predictions and IOU=0.3

Processing PR curve for model=predictions and IOU=0.5

Processing PR curve for model=predictions and IOU=0.7

Processing PR curve for model=predictions and IOU=0.9

Exploring the `pr` and `ap` DataFrames#

Each given group in the former function call will have its dedicated column

[17]:

ap[(ap["iou_threshold"] == 0.5) & (ap["category_id"] == 1)].sort_values(
    by="AP"
).reset_index()

[17]:

	index	category_id	box_height	iou_threshold	model	AP	category_str
0	224	1	(0.859, 12.196]	0.5	predictions	0.185249	person
1	216	1	(12.196, 18.645]	0.5	predictions	0.391076	person
2	208	1	(18.645, 25.596]	0.5	predictions	0.509320	person
3	203	1	(25.596, 33.533]	0.5	predictions	0.565361	person
4	183	1	(33.533, 43.947]	0.5	predictions	0.661096	person
5	170	1	(43.947, 59.039]	0.5	predictions	0.733885	person
6	134	1	(59.039, 83.073]	0.5	predictions	0.776807	person
7	124	1	(83.073, 124.26]	0.5	predictions	0.826248	person
8	133	1	(124.26, 209.234]	0.5	predictions	0.855516	person
9	128	1	(209.234, 773.969]	0.5	predictions	0.911366	person

[18]:

pr[
    (pr["category_id"] == 2)
    & (pr["iou_threshold"] == 0.5)
    & (pr["box_height"].apply(lambda x: x.left) == 12.196)
]

[18]:

	category_id	box_height	precision	recall	confidence_threshold	f1_score	iou_threshold	model	category_str
25369	2	(12.196, 18.645]	1.000000	0.000000	1.000000	0.000000	0.5	predictions	vehicle
25370	2	(12.196, 18.645]	1.000000	0.154489	0.830277	0.267629	0.5	predictions	vehicle
25371	2	(12.196, 18.645]	0.988764	0.154489	0.830062	0.267222	0.5	predictions	vehicle
25372	2	(12.196, 18.645]	0.988764	0.183716	0.798789	0.309857	0.5	predictions	vehicle
25373	2	(12.196, 18.645]	0.979592	0.183716	0.797168	0.309403	0.5	predictions	vehicle
...	...	...	...	...	...	...	...	...	...
25604	2	(12.196, 18.645]	0.250206	0.634656	0.052245	0.358910	0.5	predictions	vehicle
25605	2	(12.196, 18.645]	0.248980	0.634656	0.052017	0.357646	0.5	predictions	vehicle
25606	2	(12.196, 18.645]	0.248980	0.636743	0.051700	0.357977	0.5	predictions	vehicle
25607	2	(12.196, 18.645]	0.000000	0.636743	0.000000	0.000000	0.5	predictions	vehicle
25608	2	(12.196, 18.645]	0.000000	1.000000	0.000000	0.000000	0.5	predictions	vehicle

240 rows × 9 columns

Plotting Precision - Recall curves#

Here we used a filtered dataframe with only the 41 category and the easiest iou_threshold (0.5) notice the parameters estimator=None and sort=False to be able to plot vertical lines

[19]:

sns.relplot(
    data=pr[(pr["category_id"] == 1) & (pr["iou_threshold"] == 0.5)],
    x="recall",
    y="precision",
    hue="box_height",
    kind="line",
    estimator=None,
    sort=False,
)
plt.show()

../_images/notebooks_3_demo_evaluation_detection_34_0.png

Here is a more complicated example for Persons (class id = 1, the most represented class, by far)

colors and line styles can help you understand strengths and weakness of the network

[20]:

sns.relplot(
    data=pr[(pr["category_id"] == 1)],
    x="recall",
    y="precision",
    hue="box_height",
    style="iou_threshold",
    kind="line",
    estimator=None,
    sort=False,
)
plt.show()

../_images/notebooks_3_demo_evaluation_detection_36_0.png

Getting Average Precision wrt to other parameters#

Usually, mean AP is just a single number giving you a general idea of the network quality.

Here, we try to have a better understanding of the influence of some parameters.

Namely here, we want to know if the network is better with small or large targets.

Seaborn can let us visualise several dimensions at the same time like in the following graph

[21]:

data = ap.copy()
data["box_mean_height"] = data["box_height"].apply(lambda x: x.mid)
data["category_str"] = data["category_id"].replace(evaluator.label_map)
display(data)
g = sns.relplot(
    data=data, x="box_mean_height", y="AP", kind="line", hue="iou_threshold"
)
g.set(xscale="log")
plt.show()

	category_id	box_height	iou_threshold	model	AP	category_str	box_mean_height
0	3	(124.26, 209.234]	0.3	predictions	0.799908	outdoor	166.7470
1	6	(124.26, 209.234]	0.3	predictions	0.874707	sports	166.7470
2	10	(83.073, 124.26]	0.3	predictions	0.897088	electronic	103.6665
3	10	(124.26, 209.234]	0.3	predictions	0.915319	electronic	166.7470
4	1	(83.073, 124.26]	0.3	predictions	0.863140	person	103.6665
...	...	...	...	...	...	...	...
475	3	(0.859, 12.196]	0.9	predictions	0.000000	outdoor	6.5275
476	7	(0.859, 12.196]	0.9	predictions	0.004310	kitchen	6.5275
477	9	(0.859, 12.196]	0.9	predictions	0.004237	furniture	6.5275
478	11	(12.196, 18.645]	0.9	predictions	0.017868	appliance	15.4205
479	11	(0.859, 12.196]	0.9	predictions	0.000000	appliance	6.5275

480 rows × 7 columns

../_images/notebooks_3_demo_evaluation_detection_38_1.png

Former plot would present mean AP across all categories.

The next (very large !) grid will let you see AP vs box height for each class.

[22]:

g = sns.relplot(
    data=data[data["iou_threshold"] == 0.5],
    x="box_mean_height",
    y="AP",
    col="category_str",
    col_wrap=4,
    kind="line",
)
g.set(xscale="log")
for axis in g.axes.flat:
    axis.tick_params(labelbottom=True)
plt.subplots_adjust(hspace=0.15)
plt.show()

../_images/notebooks_3_demo_evaluation_detection_40_0.png

Dealing with more absolute metrics : target precision#

The next usecase aims at being closer to real life metrics than AP.

In real world, AP is not that interesting because you ultimately have to choose a confidence threshold and thus a single point in the Precision/Recall curve. You will then have to make compromises between precision and recall.

Here we are interested in a target precision. Given a wanted precision (because I want to minimize the fals positive) what Recall can I hope for ? Of course this problem can easily be transposed with a target recall and the corresponding precisions

Next graphs shows an example where we want a precision of 60%. The recall values are where the different curves cross the horizontal line of value 0.6

[23]:

persons = pr[(pr["category_id"] == 1) & (pr["iou_threshold"] == 0.5)]
plt.figure(figsize=(7, 7))
precision = plt.plot([0, 1], [0.6, 0.6], label="precision @0.6", linestyle="--")
pl = sns.lineplot(
    data=persons,
    x="recall",
    y="precision",
    hue="box_height",
    estimator=None,
    sort=False,
    palette="bright",
)
plt.show()

../_images/notebooks_3_demo_evaluation_detection_42_0.png

For this example, we want the recall values for 10 different wanted precisions

[24]:

from functools import partial


def interpolate_precision(data, value):
    if isinstance(value, float):
        value = [value]
    recall_values = np.interp(
        value, xp=data["precision"][::-1], fp=data["recall"][::-1]
    )
    recall_values = pd.Series(
        recall_values, index=pd.Index(value, name="target_precision"), name="recall"
    ).to_frame()
    return recall_values

[25]:

recall_at_precision_persons = persons.groupby("box_height").apply(
    partial(interpolate_precision, value=np.linspace(0.1, 0.9, 5).round(3))
)
recall_at_precision_persons = recall_at_precision_persons.reset_index()
recall_at_precision_persons["box_mean_height"] = recall_at_precision_persons[
    "box_height"
].apply(lambda x: x.mid)

[26]:

g = sns.relplot(
    data=recall_at_precision_persons,
    x="box_mean_height",
    hue="target_precision",
    y="recall",
    kind="line",
)
g.set(xscale="log")
plt.show()

../_images/notebooks_3_demo_evaluation_detection_46_0.png

[27]:

sns.relplot(
    data=recall_at_precision_persons,
    x="target_precision",
    hue="box_height",
    y="recall",
    kind="line",
    palette="bright",
)
plt.show()

../_images/notebooks_3_demo_evaluation_detection_47_0.png

Next example covers all classes

[28]:

all_classes_iou_05 = pr[pr["iou_threshold"] == 0.5]
recall_at_precision = all_classes_iou_05.groupby(["box_height", "category_id"]).apply(
    partial(interpolate_precision, value=np.linspace(0.1, 0.9, 5).round(2))
)
recall_at_precision = recall_at_precision.reset_index()
recall_at_precision["box_mean_height"] = recall_at_precision["box_height"].apply(
    lambda x: x.mid
)
recall_at_precision["category_str"] = recall_at_precision["category_id"].replace(
    evaluator.label_map
)

[29]:

sns.relplot(
    data=recall_at_precision,
    x="target_precision",
    hue="box_height",
    y="recall",
    kind="line",
    palette="bright",
)
plt.show()

../_images/notebooks_3_demo_evaluation_detection_50_0.png

[30]:

g = sns.relplot(
    data=recall_at_precision,
    x="box_mean_height",
    hue="target_precision",
    y="recall",
    kind="line",
    palette="bright",
)
g.set(xscale="log")
plt.show()

../_images/notebooks_3_demo_evaluation_detection_51_0.png

[31]:

g = sns.relplot(
    data=recall_at_precision,
    x="box_mean_height",
    hue="target_precision",
    y="recall",
    col="category_str",
    col_wrap=4,
    kind="line",
    palette="bright",
)
g.set(xscale="log")
for axis in g.axes.flat:
    axis.tick_params(labelbottom=True)
plt.subplots_adjust(hspace=0.15)
plt.show()

../_images/notebooks_3_demo_evaluation_detection_52_0.png