caipy

Functions

`dataset_to_caipy`	Save dataset to cAIpy format. Note that depending on the splits present in your dataset, the folder structure might change.
`dataset_to_caipy_generic`	Save dataset to cAIpy format. Note that depending on the splits present in your dataset, the folder structure might change.
`from_caipy`	Load a dataset stored in the cAIpy format
`from_caipy_generic`	Load a dataset stored in the cAIpy format, but you can specify images and annotations folders rather than giving a folder with Images and Annotations sub-folders.
`load_caipy_annot_folder`	Glob all JSON files in the folder path and construct the image and annotation dataframes
`load_caipy_split`	Load a particular caipy split folder and convert it to a lours Dataset
`split_to_caipy`	Save a particular split to cAIpy.

dataset_to_caipy(dataset: Dataset, output_path: Path | str, use_schema: bool = False, json_schema: str | Path | None = 'default', copy_images: bool = True, to_jpg: bool = True, overwrite_images: bool = True, overwrite_labels: bool = True, flatten_paths: bool = True) → None

Save dataset to cAIpy format. Note that depending on the splits present in your dataset, the folder structure might change.

Note

Unless specified otherwise, relative paths of images are flattened during the export. This modifies the dataset if the images and annotations were stored in subfolders, but puts all images and annotations of a particular split in their respective root folder.

Note

If no schema is given, the nested dictionary is deduced from the column names, using “.” as the separator.
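As an illustration of the note above, here is a minimal sketch of how dotted column names can be turned into a nested dictionary. The helper `unflatten` and the column names `attributes.occluded` and `attributes.pose.yaw` are hypothetical and only serve the example; this is not the library's own implementation.

```python
from typing import Any


def unflatten(record: dict[str, Any], sep: str = ".") -> dict[str, Any]:
    """Turn {"a.b": 1} into {"a": {"b": 1}} (hypothetical helper)."""
    nested: dict[str, Any] = {}
    for key, value in record.items():
        # Everything before the last separator names the parent dictionaries.
        *parents, leaf = key.split(sep)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# One flat row, as it could appear in an annotation dataframe (made-up columns).
row = {"category_id": 3, "attributes.occluded": True, "attributes.pose.yaw": 0.5}
nested_row = unflatten(row)
```

Columns without a separator stay at the top level, while each “.” adds one level of nesting.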

Parameters:

dataset – dataset to save
output_path – root folder where the dataset folder structure will be created.
use_schema – If set to True, and json_schema is not None, will use schema for validation and formatting (see option json_schema)
json_schema – Path to a schema that output json dicts will be tested against for compliance. It will also be used to remove columns for fields not included in the schema. Can be either a URL or a local path. If set to None, or if use_schema is set to False, no test or reformatting is performed. Defaults to the default schema.
copy_images – If set to False, will create a symbolic link instead of copying. Much faster, but the original images must be kept at the same relative path. Defaults to True.
to_jpg – if True, will convert images to jpg if needed. Defaults to True.
overwrite_images – if set to False, will skip images that are already copied. Defaults to True.
overwrite_labels – if set to False, will skip annotations that are already created. Defaults to True.
flatten_paths – if set to True, will put all files in the root Annotations and Images folders by replacing the folder separator (“/”) with “_” in the relative path. Defaults to True.
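The path flattening described for flatten_paths can be sketched as a plain string replacement. The helper name `flatten_relative_path` and the example paths are hypothetical; the sketch only mirrors the “/” to “_” rule stated above and makes no claim about how the library handles edge cases.

```python
def flatten_relative_path(relative_path: str) -> str:
    """Replace folder separators with underscores (hypothetical helper)."""
    return relative_path.replace("/", "_")


# An image stored in nested subfolders ends up directly in the root folder.
flat_name = flatten_relative_path("camera_1/2021/img_001.jpg")

# With this naive rule, two distinct paths can map to the same flat name.
collision = flatten_relative_path("a/b_c.jpg") == flatten_relative_path("a_b/c.jpg")
```

Under this simple rule, folder names that already contain “_” can produce identical flat names, which is worth checking before exporting with flattening enabled.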

dataset_to_caipy_generic(dataset: Dataset, output_images_folder: Path | str | None, output_annotations_folder: Path | str, use_schema: bool = False, json_schema: str | Path | None = 'default', copy_images: bool = True, to_jpg: bool = True, overwrite_images: bool = True, overwrite_labels: bool = True, flatten_paths: bool = True) → None

Save dataset to cAIpy format. Note that depending on the splits present in your dataset, the folder structure might change.

Notes

Unless specified otherwise, relative paths of images are flattened during the export. This modifies the dataset if the images and annotations were stored in subfolders, but puts all images and annotations of a particular split in their respective root folder.
If no schema is given, the nested dictionary is deduced from the column names, using “.” as the separator.

Parameters:

dataset – dataset to save
output_images_folder – root folder where the images will be saved. If None, will not save images. Useful when only saving predictions or a variation of the annotations.
output_annotations_folder – root folder where the json files will be saved.
use_schema – If set to True, and json_schema is not None, will use schema for validation and formatting (see option json_schema)
json_schema – Path to a schema that output json dicts will be tested against for compliance. It will also be used to remove columns for fields not included in the schema. Can be either a URL or a local path. If set to None, or if use_schema is set to False, no test is performed. Defaults to the example schema.
copy_images – If set to False, will create a symbolic link instead of copying. Much faster, but the original images must be kept at the same relative path. Defaults to True.
to_jpg – if True, will convert images to jpg if needed. Defaults to True.
overwrite_images – if set to False, will skip images that are already copied. Defaults to True.
overwrite_labels – if set to False, will skip annotations that are already created. Defaults to True.
flatten_paths – if set to True, will put all files in the root Annotations and Images folders by replacing the folder separator (“/”) with “_” in the relative path. Defaults to True.
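To make the copy_images trade-off concrete, the sketch below contrasts copying a file with creating a symbolic link to it, using only the standard library. The helper `place_image`, the folder layout, and the choice of a relative link target are all assumptions made for illustration; the library may create its links differently.

```python
import os
import shutil
import tempfile
from pathlib import Path


def place_image(source: Path, destination: Path, copy_images: bool = True) -> None:
    """Copy the file, or link to it when copy_images is False (hypothetical helper)."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    if copy_images:
        shutil.copy2(source, destination)
    else:
        # Relative target: the link breaks if the original image moves.
        target = os.path.relpath(source, start=destination.parent)
        destination.symlink_to(target)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "raw" / "img.jpg"
    original.parent.mkdir()
    original.write_bytes(b"fake image bytes")

    copied = root / "export" / "Images" / "copied.jpg"
    linked = root / "export" / "Images" / "linked.jpg"
    place_image(original, copied, copy_images=True)
    place_image(original, linked, copy_images=False)

    copied_is_link = copied.is_symlink()
    linked_is_link = linked.is_symlink()
    linked_content = linked.read_bytes()
```

The link is created without duplicating any image data, which is why it is faster, but it only resolves while the original file stays where it was.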

Load a dataset stored in the cAIpy format

See specifications

This will raise an error if:

two annotations have the same category_id but not the same category_str
two annotations have a different category_id but the same category_str
two images have the same file_name, but not the same id
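The three conditions above amount to requiring one-to-one mappings between category_id and category_str, and a single id per file_name. A minimal sketch of such a check follows; the helper `check_consistency` and the sample records are hypothetical, and the real loader works on dataframes rather than lists of dictionaries.

```python
def check_consistency(annotations: list[dict], images: list[dict]) -> None:
    """Raise ValueError on the three inconsistencies listed above (hypothetical helper)."""
    id_to_str: dict = {}
    str_to_id: dict = {}
    for annot in annotations:
        cat_id, cat_str = annot["category_id"], annot["category_str"]
        # setdefault stores the first pairing seen and returns it afterwards.
        if id_to_str.setdefault(cat_id, cat_str) != cat_str:
            raise ValueError(f"category_id {cat_id} maps to several category_str values")
        if str_to_id.setdefault(cat_str, cat_id) != cat_id:
            raise ValueError(f"category_str {cat_str!r} maps to several category_id values")
    name_to_id: dict = {}
    for image in images:
        if name_to_id.setdefault(image["file_name"], image["id"]) != image["id"]:
            raise ValueError(f"file_name {image['file_name']!r} maps to several image ids")


good_annotations = [
    {"category_id": 0, "category_str": "person"},
    {"category_id": 1, "category_str": "car"},
    {"category_id": 0, "category_str": "person"},
]
# Same category_id as "car", but a different category_str.
bad_annotations = good_annotations + [{"category_id": 1, "category_str": "truck"}]
images = [{"id": 10, "file_name": "a.jpg"}, {"id": 11, "file_name": "b.jpg"}]

check_consistency(good_annotations, images)  # consistent data passes silently

try:
    check_consistency(bad_annotations, images)
    error_message = None
except ValueError as error:
    error_message = str(error)
```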

Parameters:

dataset_path – folder root of dataset. Should contain the folders “Images” and “Annotations”.
dataset_name – If specified, will be the dataset name, used when showing the dataset or exporting in other formats such as fiftyone. If not specified, the dataset name will be the name of the root folder.
split – if data is at the root of Images and Annotations folder, the split will be set to this option. Defaults to None
splits_to_read – if given, will only read the specified splits. Useful for a faster loading.
use_schema – If set to True, and json_schema is not None, will use schema for validation and formatting (see option json_schema)
json_schema – schema dictionary or Path to a schema that json files will be tested against for compliance. If it is not a dictionary, it can be either a URL or a local path. If set to None, or if use_schema is set to False, no test is performed. Defaults to the default schema.
booleanize – If some attributes are arrays of enums with unique elements, they will be booleanized (see booleanize()). Note that this option is only used if json_schema is not None and use_schema is set to True. Defaults to True.
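To give an idea of what booleanizing an array-of-enum attribute means, the sketch below turns a list of selected values into one boolean per allowed value. The helper `booleanize_attribute`, the `weather` attribute, and the `name.value` key naming are assumptions made for illustration; refer to booleanize() for the actual behaviour and column names.

```python
def booleanize_attribute(name: str, values: list[str], allowed: list[str]) -> dict[str, bool]:
    """One boolean per allowed enum value (hypothetical helper and key naming)."""
    selected = set(values)
    return {f"{name}.{option}": option in selected for option in allowed}


# Made-up attribute: the schema would list the allowed enum values.
allowed_weather = ["rain", "fog", "snow"]
flags = booleanize_attribute("weather", ["rain", "fog"], allowed_weather)
```

The list of allowed values comes from the schema in this sketch, which is consistent with the option only applying when a schema is in use.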

Raises:

ValueError – Inconsistency between two annotations or images (see description above)

Returns:

Loaded dataset object