Dataset#

The Dataset object#

Dataset

Base class for dataset manipulation.

Input Output#

Dataset.to_parquet

Save dataset object to a folder containing parquet files for dataframes and a metadata.yaml file for other attributes.

Dataset.to_caipy

Convert dataset to cAIpy format.

Dataset.to_darknet

Save dataset in darknet format, readable by darknet.

Dataset.to_coco

Save dataset in coco format.

Dataset.to_fiftyone

Convert the dataset into a FiftyOne dataset that can then be inspected with FiftyOne's web app.

from_parquet

Load a Dataset object from a folder with parquet files for its dataframes.

from_caipy

Load a dataset stored in the cAIpy format.

from_caipy_generic

Load a dataset stored in the cAIpy format, specifying the images and annotations folders directly rather than giving a single folder with Images and Annotations sub-folders.

from_coco

Load a coco json file into a dictionary.

from_folder

Load a folder of images into a dataset without annotations.

from_mot

Load a dataset stored in the MOT format.

from_crowd_human

Read a dataset in the format described for CrowdHuman.

from_darknet

Create a Dataset object from a darknet dataset.

from_darknet_yolov5

Create a Dataset object from a darknet dataset.

from_darknet_generic

Generic function to load a darknet-like dataset from folders, class names and, optionally, a file list, instead of a data file.

from_darknet_json

Same as from_darknet, except the data file is replaced with a JSON file that directly contains the annotation information.

from_pascalVOC_generic

Load a dataset in pascalVOC format.

from_pascalVOC_detection

Load a pascalVOC detection dataset that follows the official structure.

Remapping#

Dataset.remap_classes

Remap class ids and names according to a dictionary.

Dataset.remap_from_preset

Same as Dataset.remap_classes, but takes the name of a preset instead of a dictionary.

Dataset.remap_from_csv

Same as Dataset.remap_classes, but takes the path to a CSV file instead of a dictionary.

Dataset.remap_from_dataframe

Same as Dataset.remap_classes, but takes a dataframe instead of a dictionary.

Dataset.remap_from_other

Try to remap classes of dataset to match the ones in another dataset by retrieving categories with the same name.

Dataset.remove_classes

Perform a simple remapping in which the given classes are removed.
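To illustrate what a dictionary-based class remap involves, here is a minimal pandas sketch. It does not call the Dataset API: the `annotations` frame, its `category_id` column, and the `remap_category_ids` helper are assumptions made for this example only. Ids that are absent from the mapping are dropped, which is one way to picture class removal as a special case of remapping.

```python
import pandas as pd


def remap_category_ids(annotations: pd.DataFrame, id_map: dict) -> pd.DataFrame:
    """Return a copy with category ids remapped.

    Hypothetical helper: rows whose category id is not a key of id_map are dropped.
    """
    # Keep only the annotations whose class appears in the mapping.
    kept = annotations[annotations["category_id"].isin(list(id_map))].copy()
    # Replace each old id with its new id.
    kept["category_id"] = kept["category_id"].map(id_map)
    return kept


# Toy annotations table (column names are assumptions for this sketch).
annotations = pd.DataFrame(
    {"image_id": [0, 0, 1, 2], "category_id": [1, 2, 3, 1]},
    index=[10, 11, 12, 13],
)

# Merge classes 1 and 2 into class 0; class 3 is not listed, so it is removed.
id_map = {1: 0, 2: 0}
remapped = remap_category_ids(annotations, id_map)
```

Annotation indices are left untouched here, so the surviving rows can still be traced back to the original table.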

Merging#

Dataset.merge

Merge two datasets and return a single dataset object containing samples from both.

Dataset.__add__

Overloading of the "+" operator for Datasets.
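The main bookkeeping problem when merging two annotated image tables is avoiding id collisions. The sketch below shows one possible approach in plain pandas, shifting the second table's image ids before concatenating. It is not the library's merge logic: `concat_with_offset`, the `relative_path` column, and the `image_id` column are hypothetical names chosen for this example.

```python
import pandas as pd


def concat_with_offset(images_a, annotations_a, images_b, annotations_b):
    """Concatenate two (images, annotations) pairs without image id collisions.

    Hypothetical helper: assumes integer image ids stored in the images index
    and an 'image_id' column in each annotations frame.
    """
    # Shift the second dataset's ids past the largest id of the first one.
    offset = int(images_a.index.max()) + 1 if len(images_a) else 0

    shifted_images = images_b.copy()
    shifted_images.index = images_b.index + offset

    shifted_annotations = annotations_b.copy()
    shifted_annotations["image_id"] = annotations_b["image_id"] + offset

    images = pd.concat([images_a, shifted_images])
    annotations = pd.concat([annotations_a, shifted_annotations], ignore_index=True)
    return images, annotations


images_a = pd.DataFrame({"relative_path": ["a0.jpg", "a1.jpg"]}, index=[0, 1])
annotations_a = pd.DataFrame({"image_id": [0, 1], "category_id": [0, 0]})
images_b = pd.DataFrame({"relative_path": ["b0.jpg", "b1.jpg"]}, index=[0, 1])
annotations_b = pd.DataFrame({"image_id": [1], "category_id": [1]})

merged_images, merged_annotations = concat_with_offset(
    images_a, annotations_a, images_b, annotations_b
)
```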

Splitting#

Dataset.split

Perform the split operation on annotations and images.

Dataset.simple_split

Simple version of splitting method, splitting images randomly.
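A random image-level split can be sketched with pandas and NumPy as follows. This is only an illustration of the idea (shuffle image ids, cut them into groups, and let annotations follow their image), not the Dataset implementation; `random_image_split`, `train_share`, and the column names are assumptions for this example.

```python
import numpy as np
import pandas as pd


def random_image_split(images, annotations, train_share=0.8, seed=0):
    """Randomly assign images to 'train' and 'valid'; annotations follow their image.

    Hypothetical helper: expects image ids in the images index and an
    'image_id' column in the annotations frame.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(images.index.to_numpy())
    n_train = int(round(train_share * len(shuffled)))
    groups = {"train": shuffled[:n_train], "valid": shuffled[n_train:]}

    splits = {}
    for name, ids in groups.items():
        splits[name] = (
            images.loc[ids].sort_index(),
            annotations[annotations["image_id"].isin(ids)],
        )
    return splits


# Ten images with two annotations each (toy data).
images = pd.DataFrame({"relative_path": [f"img_{i}.jpg" for i in range(10)]})
annotations = pd.DataFrame(
    {
        "image_id": [i // 2 for i in range(20)],
        "category_id": [i % 3 for i in range(20)],
    }
)

splits = random_image_split(images, annotations, train_share=0.8, seed=0)
```

Splitting at the image level keeps all annotations of one image in the same subset, so no image is shared between subsets.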

Indexing#

Dataset.loc

Filter a dataset by indexing the images you want with their ids.

Dataset.iloc

Filter a dataset by indexing the images you want with their row number.

Dataset.loc_annot

Filter a dataset by indexing the annotations you want with their id.

Dataset.iloc_annot

Filter a dataset by indexing the annotations you want with their row number.

Dataset.filter_images

Method equivalent of Dataset.loc and Dataset.iloc.

Dataset.filter_annotations

Method equivalent of loc_annot and iloc_annot, except you can choose to remove emptied images as well.
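The difference between id-based and row-number-based selection, and what "removing emptied images" means, can be shown with plain pandas. The frames and column names below are assumptions for this sketch; the Dataset indexers themselves are not called.

```python
import pandas as pd

# Toy tables: image ids live in the images index (an assumption of this sketch).
images = pd.DataFrame(
    {"relative_path": ["a.jpg", "b.jpg", "c.jpg"]},
    index=[10, 20, 30],
)
annotations = pd.DataFrame(
    {"image_id": [10, 10, 30], "category_id": [0, 1, 1]},
    index=[0, 1, 2],
)

# loc selects by id (index label); iloc selects by row position.
by_id = images.loc[[10, 30]]
by_row = images.iloc[[0, 2]]

# Filter annotations, then drop the images left without any annotation.
kept_annotations = annotations[annotations["category_id"] == 0]
non_empty_images = images[images.index.isin(kept_annotations["image_id"])]
```

Here `by_id` and `by_row` select the same two images, because ids 10 and 30 sit at row positions 0 and 2.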

Re-Indexing#

Dataset.match_index

Reindex a dataset from another images DataFrame.

Dataset.reset_index

Reset the index of the self.images dataframe and the index of self.annotations. However, keep the 'image_id' column in self.annotations pointing to the right rows in the self.images dataframe.

Dataset.reset_index_from_mapping

Reset the index of the images and annotations dataframes using index maps (index -> new_index), where each value is the new index to apply.
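The key constraint when re-indexing is that each annotation must keep pointing at the same image. A minimal pandas sketch of that bookkeeping, under the assumption that image ids are stored in the images index and referenced by an `image_id` column, could look like this. `reset_indexes` is a hypothetical helper, not the library's method.

```python
import pandas as pd


def reset_indexes(images, annotations):
    """Renumber images and annotations from 0 while keeping image_id consistent.

    Hypothetical helper illustrating an (index -> new_index) map.
    """
    # Map each old image id to its new positional id.
    image_map = pd.Series(range(len(images)), index=images.index)

    new_images = images.reset_index(drop=True)
    new_annotations = annotations.reset_index(drop=True)
    # Translate the foreign key with the same map so it follows the images.
    new_annotations["image_id"] = annotations["image_id"].map(image_map).to_numpy()
    return new_images, new_annotations


images = pd.DataFrame(
    {"relative_path": ["a.jpg", "b.jpg", "c.jpg"]},
    index=[10, 20, 30],
)
annotations = pd.DataFrame(
    {"image_id": [30, 10, 10], "category_id": [0, 1, 1]},
    index=[5, 7, 9],
)

new_images, new_annotations = reset_indexes(images, annotations)
```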

Internal API#

io

remap_presets

Registry of known useful presets.

split

merge

indexing

Module dedicated to Dataset indexers, which make it possible to index a Dataset with pandas-style loc and iloc methods.