simple_split

Dataset.simple_split(input_seed: int = 0, split_names: Sequence[str] = ('train', 'valid'), target_split_shares: Sequence[float] = (0.8, 0.2), inplace: bool = False) → Self

Simple splitting method that assigns images to splits at random.

Parameters:
  • input_seed – Random seed for splitting images. Defaults to 0.

  • split_names – Names of the splits. Must contain more than one element and have the same length as target_split_shares. Defaults to ("train", "valid").

  • target_split_shares – Target share of each split. Must have the same length as split_names, and the values must add up to 1. Defaults to (0.8, 0.2).

  • inplace – If True, perform the split in place instead of creating a new dataset. Defaults to False.
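
The constraints on split_names and target_split_shares listed above can be checked before calling the method. The sketch below is only an illustration: check_split_arguments is a hypothetical helper, not part of lours, and the floating-point tolerance is an assumption (the library's own validation and error messages may differ).

```python
import math
from typing import Sequence


def check_split_arguments(
    split_names: Sequence[str],
    target_split_shares: Sequence[float],
) -> None:
    """Hypothetical helper mirroring the documented constraints (not a lours API)."""
    if len(split_names) < 2:
        raise ValueError("split_names must contain more than one element")
    if len(split_names) != len(target_split_shares):
        raise ValueError(
            "split_names and target_split_shares must have the same length"
        )
    # Assumption: compare with a tolerance, since float shares
    # do not always sum to exactly 1.
    if not math.isclose(sum(target_split_shares), 1.0, abs_tol=1e-9):
        raise ValueError("target_split_shares must add up to 1")


# The documented defaults satisfy all three constraints.
check_split_arguments(("train", "valid"), (0.8, 0.2))
# A three-way split is also valid as long as the shares add up to 1.
check_split_arguments(("train", "valid", "test"), (0.7, 0.15, 0.15))
```

Arguments that pass this check have the shape the parameter descriptions require, for example split_names=("train", "valid", "test") with target_split_shares=(0.7, 0.15, 0.15).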

Returns:

Dataset with new splits applied to its images DataFrame.

Example

>>> from lours.utils.doc_utils import dummy_dataset
>>> example = dummy_dataset(200, 200, seed=1, split_names=None)
>>> example
Dataset object containing 200 images and 200 objects
Name :
    shake_effort_many
Images root :
    care/suggest
Images :
    width  height        relative_path   type
id
0      955     488  determine/story.jpg   .jpg
1      131     895       air/method.bmp   .bmp
2      229     880   political/lead.jpg   .jpg
3      840     384        like/safe.bmp   .bmp
4      953     668      suffer/set.jpeg  .jpeg
..     ...     ...                  ...    ...
195    122     437    state/almost.tiff  .tiff
196    752     300     weight/tend.jpeg  .jpeg
197    554     228  remember/summer.png   .png
198    688     605       yet/though.png   .png
199    243     227   describe/road.tiff  .tiff

[200 rows x 4 columns]
Annotations :
    image_id category_str  category_id  ...   box_y_min   box_width  box_height
id                                       ...
0          77     marriage           15  ...  425.688592   29.159255   39.517594
1         137     marriage           15  ...  383.838546  551.353799  285.211136
2         158     marriage           15  ...  174.889594  144.774339  183.531195
3         111        reach           22  ...  151.265769   97.611967  282.485307
4         121     marriage           15  ...   38.236459  522.170458   36.783181
..        ...          ...          ...  ...         ...         ...         ...
195       129        reach           22  ...  190.935508  104.385252    3.669239
196        33       listen           14  ...  322.704987  469.556266  193.375897
197       181       listen           14  ...  403.794364  349.250089   66.745395
198        55        reach           22  ...    2.534284  119.223978  110.346924
199        89        reach           22  ...  172.664334  658.570932  282.920285

[200 rows x 7 columns]
Label map :
{14: 'listen', 15: 'marriage', 22: 'reach'}
>>> splitted = example.simple_split()
>>> splitted
Dataset object containing 200 images and 200 objects
Name :
    shake_effort_many
Images root :
    care/suggest
Images :
    width  height        relative_path   type  split
id
0      955     488  determine/story.jpg   .jpg  train
1      131     895       air/method.bmp   .bmp  train
2      229     880   political/lead.jpg   .jpg  train
3      840     384        like/safe.bmp   .bmp  train
4      953     668      suffer/set.jpeg  .jpeg  valid
..     ...     ...                  ...    ...    ...
195    122     437    state/almost.tiff  .tiff  train
196    752     300     weight/tend.jpeg  .jpeg  train
197    554     228  remember/summer.png   .png  train
198    688     605       yet/though.png   .png  valid
199    243     227   describe/road.tiff  .tiff  train

[200 rows x 5 columns]
Annotations :
    image_id category_str  category_id  ...   box_y_min   box_width  box_height
id                                       ...
0          77     marriage           15  ...  425.688592   29.159255   39.517594
1         137     marriage           15  ...  383.838546  551.353799  285.211136
2         158     marriage           15  ...  174.889594  144.774339  183.531195
3         111        reach           22  ...  151.265769   97.611967  282.485307
4         121     marriage           15  ...   38.236459  522.170458   36.783181
..        ...          ...          ...  ...         ...         ...         ...
195       129        reach           22  ...  190.935508  104.385252    3.669239
196        33       listen           14  ...  322.704987  469.556266  193.375897
197       181       listen           14  ...  403.794364  349.250089   66.745395
198        55        reach           22  ...    2.534284  119.223978  110.346924
199        89        reach           22  ...  172.664334  658.570932  282.920285

[200 rows x 8 columns]
Label map :
{14: 'listen', 15: 'marriage', 22: 'reach'}
>>> splitted.images["split"].value_counts() / len(splitted)
split
train    0.725
valid    0.275
Name: count, dtype: float64
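
The realized shares above (0.725 and 0.275) are not exactly the requested (0.8, 0.2). The sketch below is not the lours implementation; it only illustrates, with the standard library, one way a random split can behave like this: if each item is drawn independently with the target shares as weights, the realized shares only approximate the targets. The function random_split_labels and its drawing strategy are assumptions made for illustration.

```python
import random
from collections import Counter
from typing import Sequence


def random_split_labels(
    n_items: int,
    split_names: Sequence[str] = ("train", "valid"),
    target_split_shares: Sequence[float] = (0.8, 0.2),
    seed: int = 0,
) -> list[str]:
    """Hypothetical sketch: draw one split name per item, independently."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    return rng.choices(split_names, weights=target_split_shares, k=n_items)


labels = random_split_labels(200)
counts = Counter(labels)
# Realized share of each split; close to, but usually not equal to, the targets.
shares = {name: counts[name] / len(labels) for name in ("train", "valid")}
print(shares)
```

Calling the function again with the same seed gives the same assignment, which is the role input_seed plays in simple_split.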