Dataset
- class Dataset(images_root: Path | None = None, images: DataFrame | None = None, annotations: DataFrame | None = None, label_map: dict[int, str] | None = None, dataset_name: str | None = None)
Bases: object
Dataset base class for manipulation
The behaviour of the Dataset is inspired by numpy arrays and pandas dataframes.
See also
related doc for a complete explanation of main principles.
Main Constructor
- Parameters:
images_root – root path that the relative_path values in images are relative to
images – DataFrame comprising image data. Annotations refer to this dataframe through their image_id column
annotations – DataFrame comprising annotation data. Must have at least an image_id column
label_map – Mapping from category_id to category_str, in case the annotations have a category_id. Useful for detection and classification
dataset_name – Optional name for the dataset. Used by functions that need a name when it cannot easily be deduced from images_root
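The relationship between the three main parameters can be sketched with plain pandas. The snippet below is only an illustration under stated assumptions (toy dataframes, and the variable names `dangling` and `category_str` are made up for the example); it does not show how the Dataset class itself performs these steps.

```python
import pandas as pd

# Toy dataframes shaped like the `images` and `annotations` parameters.
images = pd.DataFrame(
    {"width": [1920, 1280], "height": [1080, 720]},
    index=pd.Index([0, 1], name="id"),
)
annotations = pd.DataFrame(
    {"image_id": [0, 1, 1], "category_id": [1, 0, 1]},
    index=pd.Index([2, 3, 4], name="id"),
)
label_map = {0: "this", 1: "that"}

# Annotations refer to images through `image_id`: every value should be
# present in the index of the images dataframe.
dangling = annotations.loc[~annotations["image_id"].isin(images.index)]

# `label_map` translates each `category_id` into a `category_str`.
category_str = annotations["category_id"].map(label_map)

print(len(dangling))
print(category_str.tolist())
```

An empty `dangling` frame means every annotation points at a known image, which is the property the parameter descriptions above ask for.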
See also
Example
>>> Dataset()
Dataset object containing 0 image and 0 object
Name : None
Images root : .
Images :
Empty DataFrame
Columns: [width, height, relative_path, type]
Index: []
Annotations :
Empty DataFrame
Columns: [image_id, category_str, category_id, box_x_min, box_y_min, box_width, box_height]
Index: []
Label map : {}

>>> images = pd.DataFrame(
...     data={
...         "width": [1920, 1280],
...         "height": [1080, 720],
...         "relative_path": [Path("0.jpg"), Path("1.jpg")],
...         "split": ["train", "valid"],
...     },
...     index=[0, 1],
... )
>>> annotations = pd.DataFrame(
...     data={
...         "image_id": [0, 1],
...         "category_id": [1, 0],
...         "box_x_min": [10, 20],
...         "box_y_min": [30, 40],
...         "box_width": [100, 200],
...         "box_height": [200, 300],
...     },
...     index=[2, 3],
... )
>>> label_map = {0: "this", 1: "that"}
>>> Dataset(
...     images=images,
...     annotations=annotations,
...     label_map=label_map,
...     dataset_name="my_dataset",
... )
Dataset object containing 2 images and 2 objects
Name : my_dataset
Images root : .
Images :
    width  height relative_path  type  split
id
0    1920    1080         0.jpg  .jpg  train
1    1280     720         1.jpg  .jpg  valid
Annotations :
    image_id category_str  category_id  ...  box_y_min  box_width  box_height
id                                      ...
2          0         that            1  ...       30.0      100.0       200.0
3          1         this            0  ...       40.0      200.0       300.0

[2 rows x 8 columns]
Label map : {0: 'this', 1: 'that'}
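In the output of the example, the images dataframe shows a type column and the box columns are printed as floats, although the input held neither. The following is a minimal sketch of how such columns could be derived with plain pandas, assuming that type is the file suffix of relative_path and that the box columns are simply cast to float. Both are assumptions read off the printed output, not a description of the library's code.

```python
from pathlib import Path

import pandas as pd

images = pd.DataFrame(
    {
        "width": [1920, 1280],
        "height": [1080, 720],
        "relative_path": [Path("0.jpg"), Path("1.jpg")],
    },
    index=pd.Index([0, 1], name="id"),
)
annotations = pd.DataFrame(
    {
        "image_id": [0, 1],
        "box_x_min": [10, 20],
        "box_y_min": [30, 40],
        "box_width": [100, 200],
        "box_height": [200, 300],
    },
    index=pd.Index([2, 3], name="id"),
)

# Assumption: the `type` column is the file suffix of `relative_path`.
images["type"] = images["relative_path"].map(lambda path: path.suffix)

# Assumption: box coordinates are stored as floats.
box_columns = ["box_x_min", "box_y_min", "box_width", "box_height"]
annotations = annotations.astype({column: float for column in box_columns})

print(images["type"].tolist())
print(annotations.dtypes)
```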
Attributes
- images: DataFrame
- annotations: DataFrame
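Because the two attributes are linked through image_id, per-image statistics can be computed directly from them. The sketch below uses toy dataframes shaped like these attributes (the names `counts` and `empty_image_ids` are made up for the example) to count annotations per image and to spot images that have none.

```python
import pandas as pd

# Toy stand-ins for the `images` and `annotations` attributes.
images = pd.DataFrame(
    {"width": [1920, 1280, 640]},
    index=pd.Index([0, 1, 2], name="id"),
)
annotations = pd.DataFrame(
    {"image_id": [0, 0, 1]},
    index=pd.Index([10, 11, 12], name="id"),
)

# Number of annotations per image, including images that have none.
counts = (
    annotations["image_id"]
    .value_counts()
    .reindex(images.index, fill_value=0)
)

# Ids of images without any annotation.
empty_image_ids = counts[counts == 0].index.tolist()

print(counts.tolist())
print(empty_image_ids)
```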
Methods
- __getitem__(args) – __getitem__ implementation for the Dataset object.
- __len__() – Return number of images in dataset.
- add_detection_annotation(image_id, ...[, ...]) – Add one or multiple detection annotations to the current dataset.
- annotation_append([format_string, ...]) – Create a context manager to add detection tensors to the current dataset with the AnnotationAppender.append() method, as if the Dataset was a list.
- booleanize([column_names, missing_ok]) – Convert given column in self.images or self.annotations from lists to columns of booleans.
- Method to ensure the bounding box coordinates are inside the picture frame.
- check([check_symlink, allow_keypoints, ...]) – Make a full check of dataset, Ids, Bounding boxes, label maps and images
- debooleanize([dataframe]) – Convert booleanized columns back to list form, for exporting purpose.
- Create a dataset object with an empty annotation dataframe, but with the same columns, and the same images dataframe.
- filter_annotations(index[, mode, ...]) – Method equivalent of loc_annot and iloc_annot, except you can choose to remove emptied images as well.
- filter_images(index[, mode]) – Method equivalent of Dataset.loc and Dataset.iloc
- from_template([reset_booleanized]) – Create a new Dataset object from an existing Dataset.
- Get the name of columns related to annotations attributes.
- Get the name of columns related to image attributes.
- Sample a single image from the dataset.
- get_split(split) – Get a particular split from the dataset
- Initialize annotations by adding info and checking index
- Initialize images by checking required fields are present and converting fields to the right dtype.
- Iterate through images, by yielding
- Iterate through split values of the dataset, by yielding for each split the split name and the corresponding sub-dataset.
- keep_classes(to_keep[, remove_emptied_images]) – Perform a simple remapping, where the given classes are kept and the others are removed
- Return number of annotations in total
- match_index(other_images[, on, remove_unmatched]) – Reindex a dataset from another images DataFrame.
- merge(other[, allow_overlapping_image_ids, ...]) – Merge two datasets and return a unique dataset object containing samples from both.
- remap_classes(class_mapping[, new_names, ...]) – Remap classes ids and names according to a dictionary
- remap_from_csv(csv[, remove_not_mapped, ...]) – Same as class remap, but instead of taking a dictionary, you give the path to a csv file.
- remap_from_dataframe(df[, ...]) – Same as class remap, but instead of taking a dictionary, you give a dataframe.
- remap_from_other(other[, remove_not_mapped, ...]) – Try to remap classes of dataset to match the ones in another dataset by retrieving categories with the same name.
- remap_from_preset(input_dataset_map, ...[, ...]) – Same as class remap, but instead of taking a dictionary, you give the name of a preset.
- remove_classes(to_remove[, ...]) – Perform a simple remapping, where given classes are removed
- Remove images without annotations from dataset.
- remove_invalid_annotations([...]) – Remove invalid annotations from dataset.
- remove_invalid_images([load_images]) – Remove invalid images from dataset.
- rename(dataset_name) – Simple function to change the name of the dataset.
- reset_images_root(new_path) – Replace the images_root with a new path.
- reset_index([start_image_id, ...]) – Reset index of self.images dataframe, and reset index of self.annotations. However, keep the 'image_id' column in self.annotations pointing to the right rows in the self.images dataframe.
- reset_index_from_mapping([images_index_map, ...]) – Reset index of images and annotations dataframe with index maps (index -> new_index) where the value is new index to apply.
- simple_split([input_seed, split_names, ...]) – Simple version of splitting method, splitting images randomly.
- split([input_seed, split_names, ...]) – Perform the split operation on annotations and images.
- to_caipy(output_path[, use_schema, ...]) – Convert dataset to cAIpy format.
- to_caipy_generic(output_images_folder, ...) – Convert dataset to cAIpy format, but with the possibility to specify images and annotations folders rather than a root folder with Images and Annotations sub-folders.
- to_coco(output_path[, copy_images, to_jpg, ...]) – Save dataset in coco format.
- to_darknet(output_path[, copy_images, ...]) – Save dataset in darknet format, readable by darknet.
- to_fiftyone([dataset_name, ...]) – Convert the dataset into a fiftyone dataset, that can then be inspected with Fiftyone's webapp.
- to_parquet(output_dir[, overwrite]) – Save dataset object to a folder containing parquet files for dataframes and a metadata.yaml file for other attributes.
- to_yolov5(output_path[, copy_images, ...]) – Save dataset in format readable by Yolov5.
- to_yolov7(output_path[, copy_images, ...]) – Save dataset in format readable by Yolov7.
- Filter a dataset by indexing the images you want with their row number.
- Filter a dataset by indexing the annotations you want with their row number.
- Filter a dataset by indexing the images you want with their ids
- Filter a dataset by indexing the annotations you want with their id.
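The last entries of the method list distinguish filtering by id from filtering by row number, which mirrors the difference between label-based and position-based indexing in pandas. The sketch below illustrates that difference on toy dataframes, and shows one way annotations could be kept consistent with the selected images. The helper `keep_images` is hypothetical and written only for this example; it is not part of the Dataset API.

```python
import pandas as pd

# Image ids are deliberately different from row numbers.
images = pd.DataFrame(
    {"relative_path": ["a.jpg", "b.jpg", "c.jpg"]},
    index=pd.Index([10, 20, 30], name="id"),
)
annotations = pd.DataFrame(
    {"image_id": [10, 20, 20, 30], "category_id": [0, 1, 0, 1]},
    index=pd.Index([100, 101, 102, 103], name="id"),
)


def keep_images(annotations, kept_images):
    """Hypothetical helper: keep annotations whose image was selected."""
    mask = annotations["image_id"].isin(kept_images.index)
    return kept_images, annotations[mask]


# Selection by id (label-based), as with pandas `.loc`.
by_id, annot_by_id = keep_images(annotations, images.loc[[20, 30]])

# Selection by row number (position-based), as with pandas `.iloc`.
by_row, annot_by_row = keep_images(annotations, images.iloc[[0, 1]])

print(by_id.index.tolist(), annot_by_id.index.tolist())
print(by_row.index.tolist(), annot_by_row.index.tolist())
```

With ids that differ from row positions, the two selections return different images, which is why the distinction matters once a dataset has been filtered or merged and its index is no longer contiguous.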