compute_precision_recall

DetectionEvaluator.compute_precision_recall(predictions_names: str | Iterable[str] | None = None, groups: str | ContinuousGroup | Sequence[str | ContinuousGroup] = ('category_id',), ious: float | Iterable[float] = (0.0,), index_column: str | None = 'recall', index_values: Iterable[float] | None = None, f_scores_betas: Iterable[float] = (1,)) → tuple[DataFrame, DataFrame]

Compute precision-recall (PR) curves, along with the average precision (AP) with respect to recall, for different minimum IoU values.
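As a reminder of what a minimum IoU value acts on, here is a minimal sketch of the IoU test between a prediction and a target. The `(x_min, y_min, width, height)` box layout, the helper names, and the strict "above" comparison are assumptions made for illustration; this is not the library's matching code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes.

    Boxes are (x_min, y_min, width, height) tuples; this layout is an
    assumption for the sketch, not necessarily the library's format.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (0 if the boxes are disjoint)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def is_valid_detection(pred_box, target_box, min_iou):
    """Hypothetical helper: a detection counts only if its IoU is above min_iou."""
    return iou(pred_box, target_box) > min_iou


# Two 2x2 boxes overlapping on a 1x1 square: IoU = 1 / (4 + 4 - 1)
overlap = iou((0, 0, 2, 2), (1, 1, 2, 2))
```

With a minimum IoU of 0, any overlap is enough for the pair above to count; with 0.5 the same pair would be rejected, which is why higher values make detections harder to validate.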

The dataset can be grouped so that you get multiple PR curves in the end.

Groups can be either groups of images (applied on self.images) or groups of bounding boxes (applied on self.groundtruth and self.predictions_dictionary).

If the data is not categorical, you must provide either the desired number of bins or the desired bin boundaries, and the cut method will be used to construct groups.
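To illustrate the difference between asking for a number of bins and getting equal-width or quantile-based boundaries, here is a standard-library sketch of both binning strategies. The function names are hypothetical and the code only mimics the idea of a cut; it is not how the library builds its groups.

```python
from bisect import bisect_left


def equal_width_edges(values, n_bins):
    """Bin boundaries splitting the value range into n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def quantile_edges(values, n_bins):
    """Bin boundaries holding roughly the same number of values per bin
    (linear interpolation between order statistics)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    edges = []
    for i in range(n_bins + 1):
        pos = i * last / n_bins
        lower = int(pos)
        upper = min(lower + 1, last)
        frac = pos - lower
        edges.append(ordered[lower] + frac * (ordered[upper] - ordered[lower]))
    return edges


def assign_bin(value, edges):
    """Index of the right-closed bin (edges[i], edges[i + 1]] containing value.
    The lowest edge is included in the first bin."""
    idx = bisect_left(edges, value) - 1
    return min(max(idx, 0), len(edges) - 2)


# Made-up box heights with a long tail
heights = [10, 20, 30, 40, 400, 1000]
width_bins = [assign_bin(h, equal_width_edges(heights, 2)) for h in heights]
quantile_bins = [assign_bin(h, quantile_edges(heights, 2)) for h in heights]
```

On skewed data such as box sizes, equal-width bins tend to leave most boxes in the first bin, whereas quantile bins keep the groups balanced, so each PR curve is computed on a comparable number of boxes.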

See also

Note

For bbox groups, the value used is the target's, except for false positives (no matching target), where the prediction data is used. For example, the bbox size used for grouping is the target's and not the prediction's. So even if the prediction is out of bounds for the group, the detection is considered valid as long as the IoU is high enough. However, for a false positive, the size of the prediction decides in which group the precision is decreased.
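The rule in this note can be sketched as a small selection function. The name `grouping_value` and the use of `None` to mark an unmatched prediction are assumptions for illustration only.

```python
def grouping_value(target_value, prediction_value):
    """Pick the attribute value used to place one prediction in a group.

    target_value is None when the prediction matched no target
    (false positive); this convention is an assumption of the sketch.
    """
    if target_value is not None:
        # Matched detection: the target decides the group
        return target_value
    # False positive: fall back to the prediction's own value
    return prediction_value


# A 300 px tall prediction matched to a 120 px tall target is grouped as 120
matched = grouping_value(target_value=120, prediction_value=300)
# The same prediction without a matching target is grouped as 300
unmatched = grouping_value(target_value=None, prediction_value=300)
```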

Parameters:
  • predictions_names – names of the prediction DataFrames, contained in self.predictions_dictionary, to compute the PR curves on. If set to None, PR curves are computed for all prediction DataFrames.

  • groups – Groups of image or annotation attributes used to partition evaluation results in order to compute multiple PR curves. Must be a group_list. Defaults to ("category_id", ).

  • ious – minimum IoU values above which detections are considered valid. The higher the value, the harder it is for a detection to be valid. Defaults to 0.

  • index_column – If set, will force the values of the given column to be in the same bins. This will decrease data granularity, but make it possible to use this column as index. If not set, each category will have its own values, set exactly where recall and precision change, making the curve more precise. The only possible arguments are the monotonic columns (either increasing or decreasing), i.e. recall, precision and confidence_threshold. Defaults to recall to match pycocotools and fiftyone evaluation workflows.

  • index_values – Iterable of bins, given as increasing float values from 0 to 1, used to reindex the dataframe. If set to None, will be 101 points evenly spaced from 0 to 1, to match pycocotools and fiftyone evaluation workflows. Defaults to None.

  • f_scores_betas – beta values to compute the corresponding \(F_\beta\) values in addition to precision and recall.
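For reference, the standard \(F_\beta\) definition combines precision \(P\) and recall \(R\) as \(F_\beta = (1 + \beta^2) P R / (\beta^2 P + R)\). A minimal sketch follows; the function name and the choice of returning 0 when both precision and recall are 0 are assumptions, not taken from the library.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score; beta > 1 weights recall more, beta < 1 precision more."""
    denominator = beta**2 * precision + recall
    if denominator == 0:
        # Convention chosen for this sketch: no precision and no recall gives 0
        return 0.0
    return (1 + beta**2) * precision * recall / denominator


f1 = f_beta(0.5, 0.5)          # harmonic mean of precision and recall
f2 = f_beta(1.0, 0.5, beta=2)  # recall-weighted score: 2.5 / 4.5
```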

Returns:

PR curve dataset and corresponding average precision. The PR curve dataframe will have at least precision, recall, confidence_threshold, and iou_threshold columns, plus the \(F_\beta\) score columns, plus all the columns from the given groups. The AP dataframe will have at least AP and iou_threshold columns, plus all the columns from the given groups.
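To make the relation between the two returned dataframes concrete, here is a sketch of how a raw PR curve can be resampled on fixed recall values and averaged into an AP, following the pycocotools convention (monotonically decreasing precision envelope, precision of 0 beyond the highest reached recall). Whether this method applies exactly these steps is an assumption, and the function names are hypothetical.

```python
from bisect import bisect_left


def resample_precision(recalls, precisions, index_values=None):
    """Resample a PR curve on fixed recall values.

    recalls must be sorted in increasing order, with precisions aligned.
    """
    if index_values is None:
        # 101 evenly spaced points from 0 to 1
        index_values = [i / 100 for i in range(101)]
    # Precision envelope: best precision reachable at an equal or higher recall
    envelope = list(precisions)
    for i in range(len(envelope) - 2, -1, -1):
        envelope[i] = max(envelope[i], envelope[i + 1])
    resampled = []
    for r in index_values:
        # First curve point whose recall is at least r
        i = bisect_left(recalls, r)
        # Recall values never reached by the model get a precision of 0
        resampled.append(envelope[i] if i < len(envelope) else 0.0)
    return index_values, resampled


def average_precision(recalls, precisions, index_values=None):
    """AP as the mean of the resampled precision values."""
    _, resampled = resample_precision(recalls, precisions, index_values)
    return sum(resampled) / len(resampled)


# Toy curve: precision 1.0 up to recall 0.5, then 0.5 up to recall 1.0
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
ap_full = average_precision([0.5, 1.0], [1.0, 0.5], grid)
# A model that never goes beyond recall 0.5 is penalized on the rest of the grid
ap_partial = average_precision([0.5], [1.0], grid)
```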

Example

>>> from lours.utils.doc_utils import dummy_dataset
>>> groundtruth = dummy_dataset(
...     10,
...     1000,
...     label_map={0: "person", 1: "car"},
...     n_attribute_columns_images={"attribute": 2},
... )
>>> predictions1 = dummy_dataset(
...     10,
...     10000,
...     label_map=groundtruth.label_map,
...     images=groundtruth.images,
...     add_confidence=True,
...     seed=0,
... )
>>> predictions2 = dummy_dataset(
...     10,
...     10000,
...     label_map=groundtruth.label_map,
...     images=groundtruth.images,
...     add_confidence=True,
...     seed=1,
... )
>>> evaluator = DetectionEvaluator(
...     groundtruth=groundtruth, A=predictions1, B=predictions2
... )

Get the Precision Recall curves and the Average Precision dataframe

>>> pr, ap = evaluator.compute_precision_recall(ious=[0, 0.5])
computing matches between groundtruth and A (category specific)
computing matches between groundtruth and B (category specific)
Processing PR curves for 2 IoU values and 2 prediction sets
Processing PR curve for model=A and IOU=0
Processing PR curve for model=B and IOU=0
Processing PR curve for model=A and IOU=0.5
Processing PR curve for model=B and IOU=0.5
>>> ap
   category_id  iou_threshold model        AP category_str
0            1            0.0     A  0.939509          car
1            0            0.0     A  0.961933       person
2            1            0.0     B  0.956845          car
3            0            0.0     B  0.946684       person
4            1            0.5     A  0.040764          car
5            0            0.5     A  0.026722       person
6            1            0.5     B  0.025094          car
7            0            0.5     B  0.025750       person
>>> pr
     category_id  recall  precision  ...  iou_threshold  model  category_str
0              1    0.00   1.000000  ...            0.0      A           car
1              1    0.01   1.000000  ...            0.0      A           car
2              1    0.02   1.000000  ...            0.0      A           car
3              1    0.03   0.985714  ...            0.0      A           car
4              1    0.04   0.985714  ...            0.0      A           car
..           ...     ...        ...  ...            ...    ...           ...
803            0    0.96   0.000000  ...            0.5      B        person
804            0    0.97   0.000000  ...            0.5      B        person
805            0    0.98   0.000000  ...            0.5      B        person
806            0    0.99   0.000000  ...            0.5      B        person
807            0    1.00   0.000000  ...            0.5      B        person

[808 rows x 8 columns]

For each class, IoU and model, get the confidence threshold with the best F1 score and print the corresponding F1 score, recall and precision

>>> best_f1 = pr.groupby(["model", "iou_threshold", "category_id"])[
...     "f1_score"
... ].idxmax()
>>> pr.loc[best_f1, ["f1_score", "recall", "precision"]].set_index(
...     best_f1.index
... )
                                 f1_score  recall  precision
model iou_threshold category_id
A     0.0           0            0.904181    0.89   0.920502
                    1            0.884258    0.88   0.888889
      0.5           0            0.131444    0.10   0.194030
                    1            0.158687    0.13   0.208202
B     0.0           0            0.883191    0.87   0.903158
                    1            0.898718    0.90   0.898039
      0.5           0            0.124986    0.10   0.168350
                    1            0.131136    0.10   0.203922

Use a grouper to get PR values with respect to the “attribute” image column and the box height column, the latter being binned with a continuous group.

>>> from lours.utils.grouper import ContinuousGroup
>>> height_group = ContinuousGroup("box_height", bins=2, qcut=True)
>>> pr, ap = evaluator.compute_precision_recall(
...     ious=0.1,
...     groups=["attribute", "category_id", height_group],
...     predictions_names="B",
... )
Processing PR curves for 1 IoU value and 1 prediction set
Processing PR curve for model=B and IOU=0.1
>>> ap.set_index(["box_height", "attribute", "category_str"])["AP"]
box_height          attribute  category_str
(209.059, 938.398]  return     car             0.687178
(0.0295, 209.059]   return     car             0.486098
                    to         person          0.394769
(209.059, 938.398]  to         car             0.670351
                    return     person          0.727590
(0.0295, 209.059]   to         car             0.517899
                    return     person          0.372749
(209.059, 938.398]  to         person          0.586228
Name: AP, dtype: float64