match_index

Dataset.match_index(other_images: DataFrame | Dataset, on: str = 'relative_path', remove_unmatched: bool = False) -> Self

Reindex a dataset from another images DataFrame.

The given on column is used to retrieve the index values from the reference images dataframe.

Note

If some rows have a value in the on column that does not match any row in other_images, the DataFrame’s index will be reset to a range index, without sorting it.

Parameters:
  • other_images – images DataFrame taken from another dataset (the signature also accepts a Dataset). Must contain the column specified in on.

  • on – name of the column used to retrieve indexes. Must be present in the columns of both self.images and other_images. Defaults to “relative_path”.

  • remove_unmatched – if set to True, images of the dataset that don’t match any row in the other_images dataframe are removed, along with their annotations.

Returns:

Dataset with updated image indexes and correspondingly updated values in the image_id column of annotations.
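The image_id update mentioned above amounts to mapping old image ids to new ones. The following is a minimal pandas-only sketch of that idea, not the library's implementation: the old_to_new dictionary and the small annotations frame are hypothetical and built by hand from the ids that appear in this page's example.

```python
import pandas as pd

# Hypothetical old -> new image id mapping, written by hand for illustration.
old_to_new = {0: 0, 2: 1, 4: 2, 1: 3, 3: 4}

# Toy annotations frame holding only the columns needed here.
annotations = pd.DataFrame(
    {"image_id": [0, 4, 0, 2, 1], "category_id": [25, 7, 25, 7, 25]}
)

# Rewrite every annotation so that it points at the new id of its image.
annotations["image_id"] = annotations["image_id"].map(old_to_new)
print(annotations["image_id"].tolist())  # [0, 2, 0, 1, 3]
```

Only image_id changes; the annotation index and the other columns are left as they were.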

Example

>>> from lours.utils.doc_utils import dummy_dataset
>>> example = dummy_dataset(5, 5, seed=2)
>>> example
Dataset object containing 5 images and 5 objects
Name :
    argue_be_structure
Images root :
    what/way
Images :
    width  height            relative_path   type  split
id
0     368     401        police/enter.jpeg  .jpeg  train
1     472     640          also/policy.gif   .gif    val
2     832     831  cold/responsibility.png   .png  train
3     506     755        increase/pull.jpg   .jpg  train
4     182     993            Mr/trade.tiff  .tiff  train
Annotations :
    image_id category_str  category_id  ...   box_y_min   box_width  box_height
id                                      ...
0          0       simply           25  ...  273.908994  168.756932    4.288302
1          4        table            7  ...  106.456857   19.340529  282.426602
2          0       simply           25  ...   41.921967   38.506811   33.166314
3          2        table            7  ...  167.785089  242.139038  119.708224
4          1       simply           25  ...  327.082223  234.360304  238.965568

[5 rows x 8 columns]
Label map :
{3: 'relationship', 7: 'table', 25: 'simply'}
>>> images_modified = example.images.iloc[::2].reset_index(drop=True)
>>> images_modified
   width  height            relative_path   type  split
0    368     401        police/enter.jpeg  .jpeg  train
1    832     831  cold/responsibility.png   .png  train
2    182     993            Mr/trade.tiff  .tiff  train
>>> example.match_index(images_modified)
Dataset object containing 5 images and 5 objects
Name :
    argue_be_structure
Images root :
    what/way
Images :
    width  height            relative_path   type  split
id
0     368     401        police/enter.jpeg  .jpeg  train
1     832     831  cold/responsibility.png   .png  train
2     182     993            Mr/trade.tiff  .tiff  train
3     472     640          also/policy.gif   .gif    val
4     506     755        increase/pull.jpg   .jpg  train
Annotations :
    image_id category_str  category_id  ...   box_y_min   box_width  box_height
id                                      ...
0          0       simply           25  ...  273.908994  168.756932    4.288302
1          2        table            7  ...  106.456857   19.340529  282.426602
2          0       simply           25  ...   41.921967   38.506811   33.166314
3          1        table            7  ...  167.785089  242.139038  119.708224
4          3       simply           25  ...  327.082223  234.360304  238.965568

[5 rows x 8 columns]
Label map :
{3: 'relationship', 7: 'table', 25: 'simply'}
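
The reindexing of the images table shown above can be approximated with plain pandas. This is a rough sketch under stated assumptions, not the lours implementation: match_index_sketch is a hypothetical helper, values in the on column are assumed unique, and the placement of unmatched rows (appended after the matched ones, then a range index) is inferred from the example output only.

```python
import pandas as pd


def match_index_sketch(images, other_images, on="relative_path", remove_unmatched=False):
    """Pandas-only illustration; hypothetical helper, not part of lours."""
    # For each row of `images`, look up the index of the row in `other_images`
    # that has the same value in the `on` column (NaN when there is none).
    lookup = pd.Series(other_images.index, index=other_images[on])
    reference_ids = images[on].map(lookup)
    is_matched = reference_ids.notna()

    # Matched rows take the reference index and are ordered by it.
    matched = images[is_matched].copy()
    matched.index = pd.Index(
        reference_ids[is_matched].astype(int).to_numpy(), name=images.index.name
    )
    matched = matched.sort_index()
    if remove_unmatched or is_matched.all():
        return matched

    # Assumption: unmatched rows are appended after the matched ones and the
    # whole index is reset to a range index, as in the example output.
    result = pd.concat([matched, images[~is_matched]]).reset_index(drop=True)
    return result.rename_axis(images.index.name)


images = pd.DataFrame(
    {
        "relative_path": [
            "police/enter.jpeg",
            "also/policy.gif",
            "cold/responsibility.png",
            "increase/pull.jpg",
            "Mr/trade.tiff",
        ]
    },
    index=pd.RangeIndex(5, name="id"),
)
reference = images.iloc[::2].reset_index(drop=True)

result = match_index_sketch(images, reference)
kept = match_index_sketch(images, reference, remove_unmatched=True)
print(result)
print(kept)
```

With remove_unmatched=True the sketch keeps only the three images found in the reference frame. It does not touch annotations, which the real method also updates.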