merge

Functions

merge_datasets

Merge two datasets and return a unique dataset object containing Samples from both.

merge_datasets(dataset1: Dataset, dataset2: Dataset, allow_overlapping_image_ids: bool = True, realign_label_map: bool = False, ignore_index: bool = False, mark_origin: bool = False, overwrite_origin: bool = False) -> Dataset

Merge two datasets and return a unique dataset object containing Samples from both. The result’s images_root will be the common path of both datasets, and the image relative paths will be updated accordingly. The result’s label map will be the superset of both label maps, provided one is included in the other.
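The path handling described above can be illustrated with the standard library alone. This is a minimal sketch, not the lours implementation: the helper name rebase_image_paths and the example paths are hypothetical, and POSIX-style paths are assumed.

```python
import posixpath


def rebase_image_paths(root1, rel_paths1, root2, rel_paths2):
    """Return the common root of two image roots, plus both lists of
    relative paths re-expressed relative to that common root.

    Hypothetical helper for illustration only.
    """
    # Deepest directory shared by both roots.
    common = posixpath.commonpath([root1, root2])

    def rebase(root, rel_paths):
        # Path from the common root down to the original root ("." if equal).
        prefix = posixpath.relpath(root, common)
        return [posixpath.normpath(posixpath.join(prefix, p)) for p in rel_paths]

    return common, rebase(root1, rel_paths1), rebase(root2, rel_paths2)


common, paths1, paths2 = rebase_image_paths(
    "/data/coco/train", ["0001.jpg"],
    "/data/coco/val", ["0001.jpg"],
)
```

Here both images keep distinct relative paths under the shared root even though their file names are identical. Note that posixpath.commonpath raises ValueError when one path is absolute and the other relative, so both roots need to be in the same form before computing the common path.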

Notes

  • If possible, booleanized columns for images and annotations will be broadcast together. See lours.utils.column_booleanizer.broadcast_booleanization()

  • If one of the datasets has an absolute path as images_root, the other dataset’s images_root will also be converted to an absolute path.

  • If both datasets have the same name, the output will have the same name as well.

  • If the datasets have different names, the output name will be the concatenation of both names, separated by a “+” sign. The merge output of “A” and “B” will thus be named “A+B”.

  • If one dataset has no name (dataset.name is None), the output will take the name of the other.

  • If mark_origin is set to True, it takes effect only if both datasets have actual names (not None) and these names are different.
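The naming rules in these notes can be summarized in a few lines of plain Python. This is a sketch of the documented behavior, not the library’s code: the function names merged_name and origin_is_marked are hypothetical, and returning None when both names are None is an assumption the notes do not state explicitly.

```python
from typing import Optional


def merged_name(name1: Optional[str], name2: Optional[str]) -> Optional[str]:
    """Name of the merged dataset, following the notes above."""
    if name1 is None:
        # Take the other name (assumed to stay None if both are None).
        return name2
    if name2 is None:
        return name1
    if name1 == name2:
        return name1
    return f"{name1}+{name2}"


def origin_is_marked(
    name1: Optional[str], name2: Optional[str], mark_origin: bool
) -> bool:
    """Whether mark_origin would have any effect for these two names."""
    return (
        mark_origin
        and name1 is not None
        and name2 is not None
        and name1 != name2
    )
```

For example, merged_name("A", "B") gives "A+B", while merged_name("A", None) gives "A".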

Parameters:
  • dataset1 – First dataset to merge.

  • dataset2 – Second dataset to merge with dataset1. This dataset must be compatible with the first one, i.e. one label map is included in the other, image and annotation ids are mutually exclusive between datasets (unless ignore_index is True), and booleanized columns are compatible with each other.

  • allow_overlapping_image_ids – If set to True, will try to join the images dataframes with overlapping ids. The whole rows (i.e. the values from columns present in both dataframes) must match, as well as the images_root. In that case, annotations with this image_id (from either dataset) will be assumed to come from the same image. Defaults to True.

  • realign_label_map – If set to True, will try to remap the classes of dataset2 to match the label map of dataset1, to avoid a potential error due to incompatible label maps. Defaults to False.

  • ignore_index – If set to True, will ignore overlapping ids for images and annotations and reset them. Will update the image_id column in the annotations accordingly. Note that this option makes allow_overlapping_image_ids useless. Defaults to False.

  • mark_origin – If set to True, and if the datasets have different names, will add two columns “origina_dataset_name” and “origin” to the images and annotations dataframes, indicating respectively the name of the origin dataset and the id in that dataset. Defaults to False.

  • overwrite_origin – If set to True, will overwrite already existing origin columns in the input datasets’ dataframes. Otherwise, will only mark the origin if it’s not already present. Defaults to False.
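To make the effect of ignore_index more concrete, the sketch below resets image and annotation ids across two inputs and updates each annotation’s image_id accordingly. It uses plain lists and tuples rather than lours dataframes, and the function name concat_with_fresh_ids is hypothetical; it only illustrates the idea, not the actual implementation.

```python
def concat_with_fresh_ids(datasets):
    """Concatenate datasets while resetting all ids.

    Each dataset is a pair (image_ids, annotations), where annotations
    is a list of (annotation_id, image_id) pairs.
    """
    new_images = []
    new_annotations = []
    for image_ids, annotations in datasets:
        # Map each old image id to a fresh id unique across all inputs.
        remap = {}
        for old_id in image_ids:
            remap[old_id] = len(new_images)
            new_images.append(remap[old_id])
        # Renumber annotations and point them at the new image ids.
        for _, old_image_id in annotations:
            new_annotations.append((len(new_annotations), remap[old_image_id]))
    return new_images, new_annotations


# Both inputs use image ids 0 and 1, which would otherwise clash.
images, annotations = concat_with_fresh_ids([
    ([0, 1], [(0, 0), (1, 1)]),
    ([0, 1], [(0, 1)]),
])
```

Because every id is reassigned, two images that shared an id before the merge are never treated as the same image, which is why joining overlapping image ids no longer applies in this mode.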

Raises:

ValueError – If the two datasets are incompatible (see above).
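One of the incompatibilities mentioned above is a pair of label maps where neither is included in the other. The following sketch shows what such a check could look like with label maps represented as plain dicts from class id to class name. The function name merge_label_maps and the error message are hypothetical, and the real library may perform the check differently.

```python
def merge_label_maps(label_map1, label_map2):
    """Return the superset label map, or raise if neither map
    is included in the other."""
    # The smaller map must be fully contained in the larger one.
    smaller, larger = sorted((label_map1, label_map2), key=len)
    for class_id, class_name in smaller.items():
        if larger.get(class_id) != class_name:
            raise ValueError(
                f"Incompatible label maps: class id {class_id} is "
                f"{class_name!r} in one map and {larger.get(class_id)!r} in the other"
            )
    return dict(larger)


merged = merge_label_maps({0: "cat"}, {0: "cat", 1: "dog"})
```

With {0: "cat"} and {0: "dog", 1: "cat"} the same call raises ValueError, since class id 0 does not refer to the same class in both maps. This is the kind of situation realign_label_map is meant to avoid.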

Returns:

Merged dataset.