Dataset

Bases: object

Base class for manipulating datasets.

The behaviour of the Dataset is inspired by numpy arrays and pandas dataframes.

See also

from_template()

Example

>>> Dataset()
Dataset object containing 0 image and 0 object
Name :
    None
Images root :
    .
Images :
Empty DataFrame
Columns: [width, height, relative_path, type]
Index: []
Annotations :
Empty DataFrame
Columns: [image_id, category_str, category_id, box_x_min, box_y_min, box_width, box_height]
Index: []
Label map :
{}

>>> images = pd.DataFrame(
...     data={
...         "width": [1920, 1280],
...         "height": [1080, 720],
...         "relative_path": [Path("0.jpg"), Path("1.jpg")],
...         "split": ["train", "valid"],
...     },
...     index=[0, 1],
... )
>>> annotations = pd.DataFrame(
...     data={
...         "image_id": [0, 1],
...         "category_id": [1, 0],
...         "box_x_min": [10, 20],
...         "box_y_min": [30, 40],
...         "box_width": [100, 200],
...         "box_height": [200, 300],
...     },
...     index=[2, 3],
... )
>>> label_map = {0: "this", 1: "that"}
>>> Dataset(
...     images=images,
...     annotations=annotations,
...     label_map=label_map,
...     dataset_name="my_dataset",
... )
Dataset object containing 2 images and 2 objects
Name :
    my_dataset
Images root :
    .
Images :
    width  height relative_path  type  split
id
0    1920    1080         0.jpg  .jpg  train
1    1280     720         1.jpg  .jpg  valid
Annotations :
    image_id category_str  category_id  ... box_y_min  box_width  box_height
id                                      ...
2          0         that            1  ...      30.0      100.0       200.0
3          1         this            0  ...      40.0      200.0       300.0

[2 rows x 8 columns]
Label map :
{0: 'this', 1: 'that'}
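
The output above shows two columns that were not part of the input: `type` in the images table and `category_str` in the annotations table. The pandas-only sketch below shows one way such columns could be derived. It assumes that `type` is the file suffix of `relative_path` and that `category_str` is the `label_map` entry for `category_id`. Both assumptions match the printed values, but they are not taken from the library's source.

```python
from pathlib import Path

import pandas as pd

images = pd.DataFrame(
    {
        "width": [1920, 1280],
        "height": [1080, 720],
        "relative_path": [Path("0.jpg"), Path("1.jpg")],
    },
    index=pd.Index([0, 1], name="id"),
)
annotations = pd.DataFrame(
    {"image_id": [0, 1], "category_id": [1, 0]},
    index=pd.Index([2, 3], name="id"),
)
label_map = {0: "this", 1: "that"}

# Assumption: "type" is the file extension of each image path.
images["type"] = images["relative_path"].map(lambda p: p.suffix)

# Assumption: "category_str" is the label_map entry for each category_id.
annotations["category_str"] = annotations["category_id"].map(label_map)

print(images["type"].tolist())              # ['.jpg', '.jpg']
print(annotations["category_str"].tolist())  # ['that', 'this']
```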

Attributes

booleanized_columns: dict[str, set[str]] = {'annotations': {}, 'images': {}}

dataset_name: str | None

images_root: Path

images: DataFrame

annotations: DataFrame

label_map: dict[int, str]
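
The `images`, `annotations` and `label_map` attributes reference each other: in the example above, `image_id` values correspond to the index of `images`, and `category_id` values correspond to keys of `label_map`. The sketch below illustrates that relationship with plain pandas. `find_inconsistencies` is a hypothetical helper written for this illustration, not a method of `Dataset`, and it does not reproduce what `check` actually verifies.

```python
import pandas as pd


def find_inconsistencies(images, annotations, label_map):
    """Return ids of annotations whose image_id or category_id has no match.

    Hypothetical helper, for illustration only.
    """
    # Annotations pointing to an image id absent from the images index.
    orphan_image = ~annotations["image_id"].isin(images.index)
    # Annotations using a category id absent from the label map.
    unknown_class = ~annotations["category_id"].isin(list(label_map))
    return {
        "missing_image": annotations[orphan_image].index.tolist(),
        "missing_label": annotations[unknown_class].index.tolist(),
    }


images = pd.DataFrame({"width": [1920], "height": [1080]}, index=[0])
annotations = pd.DataFrame(
    {"image_id": [0, 7], "category_id": [1, 5]},
    index=[2, 3],
)
label_map = {0: "this", 1: "that"}

report = find_inconsistencies(images, annotations, label_map)
print(report)  # {'missing_image': [3], 'missing_label': [3]}
```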

Methods

`__getitem__`(args)	`__getitem__` implementation for the Dataset object.
`__len__`()	Return number of images in dataset.
`add_detection_annotation`(image_id, ...[, ...])	Add one or multiple detection annotations to the current dataset.
`annotation_append`([format_string, ...])	Create a context manager to add detection tensors to the current dataset with the `AnnotationAppender.append()` method, as if the Dataset was a list.
`booleanize`([column_names, missing_ok])	Convert the given columns in `self.images` or `self.annotations` from lists to columns of booleans.
`cap_bounding_box_coordinates`()	Method to ensure the bounding box coordinates are inside the picture frame.
`check`([check_symlink, allow_keypoints, ...])	Run a full check of the dataset: ids, bounding boxes, label map and images.
`debooleanize`([dataframe])	Convert booleanized columns back to list form, for exporting purpose.
`empty_annotations`()	Create a dataset object with an empty annotation dataframe, but with the same columns, and the same images dataframe.
`filter_annotations`(index[, mode, ...])	Method equivalent of `loc_annot` and `iloc_annot`, except you can choose to remove emptied images as well.
`filter_images`(index[, mode])	Method equivalent of `Dataset.loc` and `Dataset.iloc`
`from_template`([reset_booleanized])	Create a new Dataset object from an existing Dataset.
`get_annotations_attributes`()	Get the name of columns related to annotations attributes.
`get_image_attributes`()	Get the name of columns related to image attributes.
`get_one_frame`(n)	Sample a single image from the dataset.
`get_split`(split)	Get a particular split from the dataset
`init_annotations`()	Initialize annotations by adding info and checking index
`init_images`()	Initialize images by checking required fields are present and converting fields to the right dtype.
`iter_images`()	Iterate through images, by yielding
`iter_splits`()	Iterate through the split values of the dataset, yielding for each split the split name and the corresponding sub-dataset.
`keep_classes`(to_keep[, remove_emptied_images])	Perform a simple remapping, where the given classes are kept and the others are removed.
`len_annot`()	Return number of annotations in total
`match_index`(other_images[, on, remove_unmatched])	Reindex a dataset from another images DataFrame.
`merge`(other[, allow_overlapping_image_ids, ...])	Merge two datasets and return a single dataset object containing samples from both.
`remap_classes`(class_mapping[, new_names, ...])	Remap class ids and names according to a dictionary.
`remap_from_csv`(csv[, remove_not_mapped, ...])	Same as class remap, but instead of taking a dictionary, you give the path to a csv file.
`remap_from_dataframe`(df[, ...])	Same as class remap, but instead of taking a dictionary, you give a dataframe.
`remap_from_other`(other[, remove_not_mapped, ...])	Try to remap classes of dataset to match the ones in another dataset by retrieving categories with the same name.
`remap_from_preset`(input_dataset_map, ...[, ...])	Same as class remap, but instead of taking a dictionary, you give the name of a preset.
`remove_classes`(to_remove[, ...])	Perform a simple remapping, where given classes are removed
`remove_empty_images`()	Remove images without annotations from dataset.
`remove_invalid_annotations`([...])	Remove Invalid annotations from dataset.
`remove_invalid_images`([load_images])	Remove invalid images from dataset.
`rename`(dataset_name)	Simple function to change the name of the dataset.
`reset_images_root`(new_path)	Replace the images_root with a new path.
`reset_index`([start_image_id, ...])	Reset the index of the `self.images` and `self.annotations` dataframes, while keeping the 'image_id' column in `self.annotations` pointing to the right rows in the `self.images` dataframe.
`reset_index_from_mapping`([images_index_map, ...])	Reset index of images and annotations dataframe with index maps (index -> new_index) where the value is new index to apply.
`simple_split`([input_seed, split_names, ...])	Simple version of splitting method, splitting images randomly.
`split`([input_seed, split_names, ...])	Perform the split operation on annotations and images.
`to_caipy`(output_path[, use_schema, ...])	Convert dataset to cAIpy format.
`to_caipy_generic`(output_images_folder, ...)	Convert dataset to cAIpy format, but with the possibility to specify images and annotations folders rather than a root folder with Images and Annotations sub-folders.
`to_coco`(output_path[, copy_images, to_jpg, ...])	Save dataset in coco format.
`to_darknet`(output_path[, copy_images, ...])	Save dataset in darknet format, readable by darknet.
`to_fiftyone`([dataset_name, ...])	Convert the dataset into a `fiftyone dataset`, that can then be inspected with Fiftyone's webapp.
`to_parquet`(output_dir[, overwrite])	Save dataset object to a folder containing parquet files for dataframes and a metadata.yaml file for other attributes.
`to_yolov5`(output_path[, copy_images, ...])	Save dataset in format readable by Yolov5.
`to_yolov7`(output_path[, copy_images, ...])	Save dataset in format readable by Yolov7.
`iloc`	Filter a dataset by indexing the images you want with their row number.
`iloc_annot`	Filter a dataset by indexing the annotations you want with their row number.
`loc`	Filter a dataset by indexing the images you want with their ids
`loc_annot`	Filter a dataset by indexing the annotations you want with their id.
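
The descriptions of `reset_index` and `reset_index_from_mapping` both rely on an index map (index -> new_index) that renumbers the images while keeping `image_id` in the annotations pointing to the same rows. The sketch below shows that idea with plain pandas. It is an illustration under the assumption that new ids are consecutive integers starting at 0. It does not call the library, and the variable names are chosen for this example only.

```python
import pandas as pd

images = pd.DataFrame(
    {"width": [1920, 1280]},
    index=pd.Index([10, 42], name="id"),
)
annotations = pd.DataFrame(
    {"image_id": [42, 10, 42]},
    index=pd.Index([7, 8, 9], name="id"),
)

# Index map (old id -> new id); assumption: new ids are 0, 1, 2, ...
image_id_map = {old: new for new, old in enumerate(images.index)}

# Renumber the images index with the map.
new_images = images.rename(index=image_id_map)

# Apply the same map to the image_id column, then renumber annotations.
new_annotations = annotations.assign(
    image_id=annotations["image_id"].map(image_id_map)
).reset_index(drop=True)
new_annotations.index.name = "id"

print(new_images.index.tolist())             # [0, 1]
print(new_annotations["image_id"].tolist())  # [1, 0, 1]
```

Because the same map is applied to both the images index and the `image_id` column, each annotation still refers to the image it referred to before the renumbering.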
