difftools

A set of tools to compute the differences between datasets or evaluators.

Functions

dataset_diff

Differentiate two datasets and construct the difference datasets, which contain only the elements present in one of the two datasets but not the other.

dataset_diff(left_dataset: Dataset, right_dataset: Dataset, exclude_image_columns: Iterable[str] = (), exclude_annotations_columns: Iterable[str] = ()) → tuple[Dataset, Dataset, Dataset]

Differentiate two datasets and construct the difference datasets, which contain only the elements present in one of the two datasets but not the other.

This function returns three datasets: two difference datasets, each built from the images and annotations specific to one of the input datasets, and a third dataset containing the images and annotations common to both.

As such, you should theoretically be able to reconstruct the left dataset with the first difference dataset and the common dataset, and reconstruct the right dataset with the second difference dataset and the common dataset.
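The three-way split and the reconstruction property can be illustrated on plain Python sets. This is a minimal sketch, not the library's implementation: `three_way_diff` and the tuple-based rows are hypothetical stand-ins for `dataset_diff` and for the image and annotation dataframes of a `Dataset`.

```python
def three_way_diff(left_rows, right_rows):
    """Split two collections of hashable rows into (left_only, right_only, common)."""
    left, right = set(left_rows), set(right_rows)
    return left - right, right - left, left & right


# Hypothetical rows: (image_id, relative_path)
left = {(0, "a.jpg"), (1, "b.jpg"), (2, "c.jpg")}
right = {(1, "b.jpg"), (2, "c.jpg"), (3, "d.jpg")}

left_only, right_only, common = three_way_diff(left, right)

# Each input is recovered as the union of its specific part and the common part.
assert left_only | common == left
assert right_only | common == right
```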

Note

If one dataset has a column in its dataframes that the other dataset does not have, and that column is not included in exclude_image_columns or exclude_annotations_columns, the dataframes, and thus the datasets, will be considered entirely different, and the common dataset will be empty.

Note

If exclude_image_columns or exclude_annotations_columns is not empty, the left and right datasets are not guaranteed to be reconstructible from the common dataset and the difference datasets. Only the datasets minus the excluded columns can be reconstructed.
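The effect of both notes can be sketched with rows stored as plain dictionaries. Everything here is an assumption made for illustration: `diff_rows`, the column names, and the choice to build the common rows from the left input are hypothetical and do not describe how `dataset_diff` handles dataframes internally.

```python
def diff_rows(left_rows, right_rows, exclude_columns=()):
    """Compare lists of dict rows, ignoring the columns in exclude_columns."""
    excluded = set(exclude_columns)

    def key(row):
        # Hashable view of a row, restricted to the compared columns.
        return tuple(sorted((k, v) for k, v in row.items() if k not in excluded))

    left_keys = {key(r) for r in left_rows}
    right_keys = {key(r) for r in right_rows}
    left_only = [r for r in left_rows if key(r) not in right_keys]
    right_only = [r for r in right_rows if key(r) not in left_keys]
    # Assumption of this sketch: common rows are taken from the left input.
    common = [r for r in left_rows if key(r) in right_keys]
    return left_only, right_only, common


left = [{"id": 0, "path": "a.jpg"}, {"id": 1, "path": "b.jpg"}]
right = [
    {"id": 0, "path": "a.jpg", "reviewer": "kim"},
    {"id": 2, "path": "c.jpg", "reviewer": "lee"},
]

# The extra "reviewer" column makes every row differ: nothing is common.
strict_left, strict_right, strict_common = diff_rows(left, right)

# Ignoring "reviewer" lets image 0 match, but its reviewer value is absent
# from the common rows, so `right` cannot be rebuilt exactly.
relaxed_left, relaxed_right, relaxed_common = diff_rows(
    left, right, exclude_columns=["reviewer"]
)
```

In the relaxed case, concatenating `relaxed_right` and `relaxed_common` yields the right rows only up to the excluded `reviewer` column.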

Parameters:
  • left_dataset – left dataset to compare

  • right_dataset – right dataset to compare

  • exclude_image_columns – list of names of columns to ignore in image dataframes for the comparison.

  • exclude_annotations_columns – list of names of columns to ignore in annotations dataframes for the comparison.

Returns:

tuple with 3 datasets
  • dataset with images and annotations that are specific to left_dataset

  • dataset with images and annotations that are specific to right_dataset

  • dataset with images and annotations that are common to both input datasets.