simple_split
- Dataset.simple_split(input_seed: int = 0, split_names: Sequence[str] = ('train', 'valid'), target_split_shares: Sequence[float] = (0.8, 0.2), inplace: bool = False) -> Self
Simple version of the splitting method, which assigns images to splits at random.
- Parameters:
input_seed – Random seed for splitting images. Defaults to 0.
split_names – Names of splits. Must contain more than one element and be the same size as target_split_shares. Defaults to ("train", "valid").
target_split_shares – Share values of each split. Must be the same size as split_names. Must add up to 1. Defaults to (0.8, 0.2).
inplace – If set to True, will perform the splitting in place without creating a new dataset. Defaults to False.
- Returns:
Dataset with new splits applied to its images DataFrame.
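The constraints on split_names and target_split_shares listed above can be sketched as a small standalone check. This is only an illustration of the documented rules: check_split_arguments is a hypothetical helper, not part of lours, and the tolerance used when comparing the sum to 1 is an assumption.

```python
import math
from collections.abc import Sequence


def check_split_arguments(
    split_names: Sequence[str],
    target_split_shares: Sequence[float],
) -> None:
    """Hypothetical helper mirroring the documented constraints (not part of lours)."""
    if len(split_names) <= 1:
        raise ValueError("split_names must contain more than one element")
    if len(split_names) != len(target_split_shares):
        raise ValueError("split_names and target_split_shares must have the same size")
    # Compare with a tolerance (an assumption): float shares such as
    # (0.7, 0.2, 0.1) do not always sum to exactly 1.0.
    if not math.isclose(sum(target_split_shares), 1.0):
        raise ValueError("target_split_shares must add up to 1")


check_split_arguments(("train", "valid"), (0.8, 0.2))  # documented defaults: accepted
check_split_arguments(("train", "valid", "test"), (0.7, 0.2, 0.1))  # three splits: accepted
```

A single split name, mismatched lengths, or shares such as (0.8, 0.3) would raise ValueError in this sketch.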
Example
>>> from lours.utils.doc_utils import dummy_dataset
>>> example = dummy_dataset(200, 200, seed=1, split_names=None)
>>> example
Dataset object containing 200 images and 200 objects
Name : shake_effort_many
Images root : care/suggest
Images :
     width  height        relative_path   type
id
0      955     488  determine/story.jpg   .jpg
1      131     895       air/method.bmp   .bmp
2      229     880   political/lead.jpg   .jpg
3      840     384        like/safe.bmp   .bmp
4      953     668      suffer/set.jpeg  .jpeg
..     ...     ...                  ...    ...
195    122     437    state/almost.tiff  .tiff
196    752     300     weight/tend.jpeg  .jpeg
197    554     228  remember/summer.png   .png
198    688     605       yet/though.png   .png
199    243     227   describe/road.tiff  .tiff
[200 rows x 4 columns]
Annotations :
     image_id category_str  category_id  ...   box_y_min   box_width  box_height
id                                       ...
0          77     marriage           15  ...  425.688592   29.159255   39.517594
1         137     marriage           15  ...  383.838546  551.353799  285.211136
2         158     marriage           15  ...  174.889594  144.774339  183.531195
3         111        reach           22  ...  151.265769   97.611967  282.485307
4         121     marriage           15  ...   38.236459  522.170458   36.783181
..        ...          ...          ...  ...         ...         ...         ...
195       129        reach           22  ...  190.935508  104.385252    3.669239
196        33       listen           14  ...  322.704987  469.556266  193.375897
197       181       listen           14  ...  403.794364  349.250089   66.745395
198        55        reach           22  ...    2.534284  119.223978  110.346924
199        89        reach           22  ...  172.664334  658.570932  282.920285
[200 rows x 7 columns]
Label map :
{14: 'listen', 15: 'marriage', 22: 'reach'}
>>> splitted = example.simple_split()
>>> splitted
Dataset object containing 200 images and 200 objects
Name : shake_effort_many
Images root : care/suggest
Images :
     width  height        relative_path   type  split
id
0      955     488  determine/story.jpg   .jpg  train
1      131     895       air/method.bmp   .bmp  train
2      229     880   political/lead.jpg   .jpg  train
3      840     384        like/safe.bmp   .bmp  train
4      953     668      suffer/set.jpeg  .jpeg  valid
..     ...     ...                  ...    ...    ...
195    122     437    state/almost.tiff  .tiff  train
196    752     300     weight/tend.jpeg  .jpeg  train
197    554     228  remember/summer.png   .png  train
198    688     605       yet/though.png   .png  valid
199    243     227   describe/road.tiff  .tiff  train
[200 rows x 5 columns]
Annotations :
     image_id category_str  category_id  ...   box_y_min   box_width  box_height
id                                       ...
0          77     marriage           15  ...  425.688592   29.159255   39.517594
1         137     marriage           15  ...  383.838546  551.353799  285.211136
2         158     marriage           15  ...  174.889594  144.774339  183.531195
3         111        reach           22  ...  151.265769   97.611967  282.485307
4         121     marriage           15  ...   38.236459  522.170458   36.783181
..        ...          ...          ...  ...         ...         ...         ...
195       129        reach           22  ...  190.935508  104.385252    3.669239
196        33       listen           14  ...  322.704987  469.556266  193.375897
197       181       listen           14  ...  403.794364  349.250089   66.745395
198        55        reach           22  ...    2.534284  119.223978  110.346924
199        89        reach           22  ...  172.664334  658.570932  282.920285
[200 rows x 8 columns]
Label map :
{14: 'listen', 15: 'marriage', 22: 'reach'}
>>> splitted.images["split"].value_counts() / len(splitted)
split
train    0.725
valid    0.275
Name: count, dtype: float64
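In the example above, the observed shares (0.725 and 0.275) differ from the default targets (0.8 and 0.2). This page does not describe the sampling procedure, so the following is only a sketch of one random scheme under which observed shares approximate the targets without matching them exactly. The function random_split_labels is hypothetical and uses only the Python standard library; it is not how lours is known to implement simple_split.

```python
import random
from collections import Counter
from collections.abc import Sequence


def random_split_labels(
    n_images: int,
    split_names: Sequence[str] = ("train", "valid"),
    target_split_shares: Sequence[float] = (0.8, 0.2),
    input_seed: int = 0,
) -> list[str]:
    """Hypothetical illustration: draw one split name per image, independently."""
    rng = random.Random(input_seed)  # local generator, so the same seed gives the same draw
    return rng.choices(split_names, weights=target_split_shares, k=n_images)


labels = random_split_labels(200)
counts = Counter(labels)
# Observed shares: typically close to, but not exactly equal to, the targets
observed = {name: counts[name] / len(labels) for name in ("train", "valid")}
print(observed)
```

Because each image is drawn independently in this sketch, the gap between observed and target shares tends to shrink as the number of images grows.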