caipy

Functions

dataset_to_caipy

Save dataset to cAIpy format. Note that depending on the splits present in your dataset, the folder structure might change.

dataset_to_caipy_generic

Save dataset to cAIpy format. Note that depending on the splits present in your dataset, the folder structure might change.

from_caipy

Load a dataset stored in the cAIpy format

from_caipy_generic

Load a dataset stored in the cAIpy format, specifying the images and annotations folders directly instead of a root folder with Images and Annotations subfolders.

load_caipy_annot_folder

Glob all JSON files in a folder and construct the image and annotation DataFrames.

load_caipy_split

Load a particular cAIpy split folder and convert it to a lours Dataset.

split_to_caipy

Save a particular split to cAIpy.

dataset_to_caipy(dataset: Dataset, output_path: Path | str, use_schema: bool = False, json_schema: str | Path | None = 'default', copy_images: bool = True, to_jpg: bool = True, overwrite_images: bool = True, overwrite_labels: bool = True, flatten_paths: bool = True) → None

Save dataset to cAIpy format. Note that depending on the splits present in your dataset, the folder structure might change.

Note

Unless specified otherwise (see flatten_paths), relative image paths are flattened during the export. If the images and annotations were stored in subfolders, this changes their relative paths, but it puts all images and annotations of a given split in their respective root folders.

Note

If no schema is given, the nested dictionary structure is deduced from the column names, using “.” as the separator.
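A minimal sketch of the “.” convention described above: each dotted column name becomes a path in a nested dictionary. The helper `unflatten_record` is hypothetical and not part of lours; the actual export may handle name conflicts (for example both `a` and `a.b` as columns) or missing values differently.

```python
def unflatten_record(record: dict, sep: str = ".") -> dict:
    """Turn {"a.b": 1} into {"a": {"b": 1}} by splitting keys on `sep`."""
    nested: dict = {}
    for key, value in record.items():
        *parents, leaf = key.split(sep)
        node = nested
        for part in parents:
            # Create intermediate dictionaries on demand
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


row = {"id": 3, "attributes.color": "red", "attributes.size.width": 12}
print(unflatten_record(row))
# {'id': 3, 'attributes': {'color': 'red', 'size': {'width': 12}}}
```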

Parameters:
  • dataset – dataset to save

  • output_path – root folder where the dataset folder structure will be created.

  • use_schema – If set to True and json_schema is not None, the schema is used for validation and formatting (see json_schema).

  • json_schema – Path to a schema that the output JSON dicts are validated against. The schema is also used to remove columns for fields not included in it. Can be either a URL or a local path. If set to None, or if use_schema is False, no validation or reformatting is performed. Defaults to the default schema.

  • copy_images – If set to False, creates symbolic links instead of copying. This is much faster, but the original images must be kept at the same relative path. Defaults to True.

  • to_jpg – if True, will convert images to jpg if needed. Defaults to True.

  • overwrite_images – if set to False, will skip images that are already copied. Defaults to True.

  • overwrite_labels – if set to False, will skip annotations that are already created. Defaults to True.

  • flatten_paths – if set to True, puts all files in the root Annotations and Images folders by replacing folder separators (“/”) with “_” in the relative path. Defaults to True.
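The flatten_paths behavior can be illustrated with a hypothetical helper that replaces “/” with “_”, exactly as the parameter description states. This is a sketch, not the library's implementation.

```python
def flatten_relative_path(relative_path: str) -> str:
    """Replace folder separators so the file lands directly in the root folder."""
    return relative_path.replace("/", "_")


print(flatten_relative_path("camera_1/2023/frame_0001.jpg"))
# camera_1_2023_frame_0001.jpg
```

One consequence of this scheme is that distinct paths can collide: `a/b_c.jpg` and `a_b/c.jpg` both flatten to `a_b_c.jpg`, so file names containing “_” are worth checking before exporting with flatten_paths=True.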

dataset_to_caipy_generic(dataset: Dataset, output_images_folder: Path | str | None, output_annotations_folder: Path | str, use_schema: bool = False, json_schema: str | Path | None = 'default', copy_images: bool = True, to_jpg: bool = True, overwrite_images: bool = True, overwrite_labels: bool = True, flatten_paths: bool = True) → None

Save dataset to cAIpy format. Note that depending on the splits present in your dataset, the folder structure might change.

Notes

  • Unless specified otherwise (see flatten_paths), relative image paths are flattened during the export. If the images and annotations were stored in subfolders, this changes their relative paths, but it puts all images and annotations of a given split in their respective root folders.

  • If no schema is given, the nested dictionary structure is deduced from the column names, using “.” as the separator.

Parameters:
  • dataset – dataset to save

  • output_images_folder – root folder where the images will be saved. If None, images are not saved. Useful when only saving predictions or variations of annotations.

  • output_annotations_folder – root folder where the JSON files will be saved.

  • use_schema – If set to True and json_schema is not None, the schema is used for validation and formatting (see json_schema).

  • json_schema – Path to a schema that the output JSON dicts are validated against. The schema is also used to remove columns for fields not included in it. Can be either a URL or a local path. If set to None, or if use_schema is False, no validation is performed. Defaults to the example schema.

  • copy_images – If set to False, creates symbolic links instead of copying. This is much faster, but the original images must be kept at the same relative path. Defaults to True.

  • to_jpg – if True, will convert images to jpg if needed. Defaults to True.

  • overwrite_images – if set to False, will skip images that are already copied. Defaults to True.

  • overwrite_labels – if set to False, will skip annotations that are already created. Defaults to True.

  • flatten_paths – if set to True, puts all files in the root Annotations and Images folders by replacing folder separators (“/”) with “_” in the relative path. Defaults to True.
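A stdlib-only sketch of how copy_images and overwrite_images could interact for a single image file. `export_image` is hypothetical: it uses absolute symlink targets (an assumption; lours may create relative links) and leaves out the to_jpg conversion, which would need an imaging library.

```python
import shutil
from pathlib import Path


def export_image(src: Path, dst: Path, copy_images: bool = True,
                 overwrite_images: bool = True) -> bool:
    """Copy or symlink src to dst; return False if an existing file was kept."""
    if dst.exists() or dst.is_symlink():
        if not overwrite_images:
            return False  # keep the file that is already there
        dst.unlink()  # remove the old file or link before writing the new one
    dst.parent.mkdir(parents=True, exist_ok=True)
    if copy_images:
        shutil.copy2(src, dst)
    else:
        # Fast, but the link breaks if the original image is moved later
        dst.symlink_to(src.resolve())
    return True
```

Checking `is_symlink()` as well as `exists()` matters because `exists()` follows links and returns False for a dangling symlink left by a previous export.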

from_caipy(dataset_path: Path | str, dataset_name: str | None = None, split: str | None = None, splits_to_read: str | Iterable[str] | None = None, use_schema: bool = False, json_schema: dict | str | Path | None = 'default', booleanize: bool = True) → Dataset

Load a dataset stored in the cAIpy format

See specifications

This will raise an error if:

  • two annotations have the same category_id but not the same category_str

  • two annotations have a different category_id but the same category_str

  • two images have the same file_name, but not the same id
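The three consistency rules above can be sketched on plain records as follows. `check_consistency` is a hypothetical helper operating on lists of dicts, not the library's code, which works on its own DataFrames.

```python
def check_consistency(annotations: list, images: list) -> None:
    """Raise ValueError if category_id/category_str or file_name/id disagree."""
    id_to_str: dict = {}
    str_to_id: dict = {}
    for annot in annotations:
        cat_id, cat_str = annot["category_id"], annot["category_str"]
        # Same category_id must always come with the same category_str
        if id_to_str.setdefault(cat_id, cat_str) != cat_str:
            raise ValueError(f"category_id {cat_id} has several category_str values")
        # Same category_str must always come with the same category_id
        if str_to_id.setdefault(cat_str, cat_id) != cat_id:
            raise ValueError(f"category_str {cat_str!r} has several category_id values")
    name_to_id: dict = {}
    for image in images:
        # Same file_name must always come with the same image id
        if name_to_id.setdefault(image["file_name"], image["id"]) != image["id"]:
            raise ValueError(f"file_name {image['file_name']!r} has several ids")
```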

Parameters:
  • dataset_path – folder root of dataset. Should contain the folders “Images” and “Annotations”.

  • dataset_name – If specified, will be the dataset name, used when showing the dataset or exporting in other formats such as fiftyone. If not specified, the dataset name will be the name of the root folder.

  • split – If the data is at the root of the Images and Annotations folders, the split is set to this value. Defaults to None.

  • splits_to_read – If given, only the specified splits are read. Useful for faster loading.

  • use_schema – If set to True and json_schema is not None, the schema is used for validation and formatting (see json_schema).

  • json_schema – Schema dictionary, or path to a schema, that the JSON files are validated against. If it is not a dictionary, it can be either a URL or a local path. If set to None, or if use_schema is False, no validation is performed. Defaults to the default schema.

  • booleanize – If some attributes are arrays of enums with unique elements, they are booleanized (see booleanize()). This option is only used if json_schema is not None and use_schema is True. Defaults to True.
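The idea behind booleanization can be sketched for a single attribute: an array of unique enum values becomes one boolean per enum member. The helper `booleanize_attribute` and the `<attribute>.<member>` naming of the resulting columns are assumptions for illustration; see booleanize() for the library's actual behavior.

```python
def booleanize_attribute(name: str, values: list, enum: list) -> dict:
    """One boolean per enum member, True if the member appears in `values`."""
    present = set(values)  # elements are unique, so a set loses nothing
    return {f"{name}.{member}": member in present for member in enum}


print(booleanize_attribute("weather", ["rain", "fog"], ["sun", "rain", "fog"]))
# {'weather.sun': False, 'weather.rain': True, 'weather.fog': True}
```

This only works because the schema declares the array as an enum with unique elements: the enum fixes the set of columns, and uniqueness guarantees that no information about repeated values is lost.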

Raises:

ValueError – Inconsistency between two annotations or images (see description above)

Returns:

Loaded dataset object

from_caipy_generic(images_folder: Path | str | None, annotations_folder: Path | str, dataset_name: str | None = None, split: str | None = None, splits_to_read: str | Iterable[str] | None = None, use_schema: bool = False, json_schema: dict | str | Path | None = 'default', booleanize: bool = True) → Dataset

Load a dataset stored in the cAIpy format, specifying the images and annotations folders directly instead of a root folder with Images and Annotations subfolders. This gives much more flexibility, especially when working with predictions and annotation variations.

See specifications

This will raise an error if:

  • two annotations have the same category_id but not the same category_str

  • two annotations have a different category_id but the same category_str

  • two images have the same file_name, but not the same id

Parameters:
  • images_folder – folder root of images.

  • annotations_folder – folder root of annotations.

  • dataset_name – If specified, will be the dataset name, used when showing the dataset or exporting in other formats such as fiftyone.

  • split – If the data is at the root of the Images and Annotations folders, the split is set to this value. Defaults to None.

  • splits_to_read – If given, only the specified splits are read. Useful for faster loading.

  • use_schema – If set to True and json_schema is not None, the schema is used for validation and formatting (see json_schema).

  • json_schema – Schema dictionary, or path to a schema, that the JSON files are validated against. If it is not a dictionary, it can be either a URL or a local path. If set to None, or if use_schema is False, no validation is performed. Defaults to the default schema.

  • booleanize – If some attributes are arrays of enums with unique elements, they are booleanized (see booleanize()). This option is only used if json_schema is not None and use_schema is True. Defaults to True.
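Because splits_to_read accepts either a single string or an iterable of strings, the value has to be normalized before filtering. A sketch with a hypothetical helper `normalize_splits`; the key detail is that a bare string is itself iterable, so calling `set()` on it directly would yield its characters.

```python
from typing import Iterable, Optional, Union


def normalize_splits(splits_to_read: Optional[Union[str, Iterable[str]]]) -> Optional[set]:
    """Return the set of split names to read, or None to read every split."""
    if splits_to_read is None:
        return None
    if isinstance(splits_to_read, str):
        # A bare string is one split name; set("train") would give its characters
        return {splits_to_read}
    return set(splits_to_read)


available = ["train", "valid", "test"]
wanted = normalize_splits("train")
print([s for s in available if wanted is None or s in wanted])
# ['train']
```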

Raises:

ValueError – Inconsistency between two annotations or images (see description above)

Returns:

Loaded dataset object

load_caipy_annot_folder(folder_path: Path, schema: dict | None = None) → tuple[DataFrame | None, DataFrame | None]

Glob all JSON files in folder_path and construct the image and annotation DataFrames.

Parameters:
  • folder_path – folder in which to search for JSON files

  • schema – Optional JSON schema dict used to check the conformity of loaded JSON files.

Returns:

A pair of dataframes, representing image and annotations data, most likely used to construct the dataset object.
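The globbing step can be sketched with pathlib alone. `read_json_folder` is hypothetical; it assumes a recursive search (so that non-flattened exports with subfolders are covered) and stops at the parsed documents, leaving out the conversion into image and annotation DataFrames, whose record layout is defined by the cAIpy specification.

```python
import json
from pathlib import Path


def read_json_folder(folder_path: Path) -> dict:
    """Parse every .json file under folder_path, keyed by its relative POSIX path."""
    documents = {}
    # Sorting makes the loading order deterministic across file systems
    for json_path in sorted(folder_path.rglob("*.json")):
        with json_path.open(encoding="utf-8") as f:
            documents[json_path.relative_to(folder_path).as_posix()] = json.load(f)
    return documents
```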

load_caipy_split(images_folder: Path, annotations_folder: Path, dataset_name: str | None = None, split_name: str | None = None, schema: dict | None = None) → Dataset

Load a particular cAIpy split folder and convert it to a lours Dataset.

Parameters:
  • images_folder – folder where images are stored

  • annotations_folder – folder where annotations are stored as json files

  • dataset_name – If specified, will be the dataset name, used when showing the dataset or exporting in other formats such as fiftyone. If not specified, the dataset name will be the name of the root folder.

  • split_name – name of the split to store in the split column of the images DataFrame. Defaults to None.

  • schema – JSON schema dict used to check the conformity of loaded JSON files. If set to None, will not check the conformity. Defaults to None.

Raises:

ValueError – If image ids are not mutually exclusive

Returns:

Dataset containing only one split from cAIpy, expected to be merged with other cAIpy splits
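A sketch of the check behind the ValueError above, assuming "mutually exclusive" means that each image id appears at most once among the JSON files of the split. `check_unique_image_ids` is a hypothetical helper, not the library's code.

```python
from collections import Counter


def check_unique_image_ids(image_ids: list) -> None:
    """Raise ValueError if any image id appears more than once."""
    duplicates = sorted(i for i, count in Counter(image_ids).items() if count > 1)
    if duplicates:
        # Reporting every duplicate at once makes the faulty files easier to find
        raise ValueError(f"Image ids are not mutually exclusive: {duplicates}")
```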

split_to_caipy(dataset: Dataset, split_images_folder: Path | None, split_annotations_folder: Path, schema: dict | None = None, copy_images: bool = True, to_jpg: bool = True, overwrite_images: bool = True, overwrite_labels: bool = True, flatten_paths: bool = True) → None

Save a particular split to cAIpy. Both the images and annotations folders must be given, since they can be either the roots of “Images” and “Annotations” or subfolders named after the split, e.g. “Images/train”.
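The two layouts described above can be sketched as a small path helper. `split_folders` is hypothetical, and the rule it encodes (no split name means the root folders, a split name means one subfolder per split) is an assumption about how a caller such as dataset_to_caipy might pick the folders passed to split_to_caipy.

```python
from pathlib import Path
from typing import Optional, Tuple


def split_folders(output_path: Path, split: Optional[str]) -> Tuple[Path, Path]:
    """Return (images_folder, annotations_folder) for one split."""
    images_root = output_path / "Images"
    annotations_root = output_path / "Annotations"
    if split is None:
        # No split name: files go directly in the root folders
        return images_root, annotations_root
    # Named split: one subfolder per split, e.g. Images/train
    return images_root / split, annotations_root / split


images_folder, annotations_folder = split_folders(Path("my_dataset"), "train")
print(images_folder.as_posix(), annotations_folder.as_posix())
# my_dataset/Images/train my_dataset/Annotations/train
```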

Note

Unless specified otherwise (see flatten_paths), relative image paths are flattened during the export. If the images and annotations were stored in subfolders, this changes their relative paths, but it puts all images and annotations in their respective root folders.

Note

If no schema is given, the nested dictionary structure is deduced from the column names, using “.” as the separator.

Parameters:
  • dataset – dataset object to save. Normally, it should contain a single split.

  • split_images_folder – folder where images are saved, either as links or as files. If None, images are not saved. This is useful when you only want to save predictions or a variation of annotations.

  • split_annotations_folder – folder where the cAIpy JSON files are saved.

  • schema – JSON schema dict used to check the conformity of the output JSON files. It is also used to remove columns for fields not included in the schema. If set to None, conformity is not checked. Defaults to None.

  • copy_images – If set to False, creates symbolic links instead of copying. This is much faster, but the original images must be kept at the same relative path. Defaults to True.

  • to_jpg – if True, will convert images to jpg if needed. Defaults to True.

  • overwrite_images – if set to False, will skip images that are already copied. Defaults to True.

  • overwrite_labels – if set to False, will skip annotations that are already created. Defaults to True.

  • flatten_paths – if set to True, puts all files in the root Annotations and Images folders by replacing folder separators (“/”) with “_” in the relative path. Defaults to True.
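The schema parameter is also used to drop fields that the schema does not declare. A minimal sketch of that pruning step, using a hypothetical helper `prune_to_schema` that only follows `properties` and `items`; it ignores `$ref`, `allOf`, `additionalProperties` and other JSON Schema keywords, and performs no validation.

```python
def prune_to_schema(document: dict, schema: dict) -> dict:
    """Recursively drop keys that are not declared in the schema's "properties"."""
    properties = schema.get("properties")
    if properties is None:
        return document  # this level of the schema does not list its keys
    pruned = {}
    for key, value in document.items():
        if key not in properties:
            continue  # field not included in the schema: removed
        # Boolean sub-schemas (true/false) carry no nested properties
        sub_schema = properties[key] if isinstance(properties[key], dict) else {}
        if isinstance(value, dict):
            value = prune_to_schema(value, sub_schema)
        elif isinstance(value, list) and isinstance(sub_schema.get("items"), dict):
            item_schema = sub_schema["items"]
            value = [prune_to_schema(v, item_schema) if isinstance(v, dict) else v
                     for v in value]
        pruned[key] = value
    return pruned


toy_schema = {"properties": {"image": {"properties": {"id": {}, "file_name": {}}}}}
doc = {"image": {"id": 1, "file_name": "a.jpg", "tmp": 0}, "debug": True}
print(prune_to_schema(doc, toy_schema))
# {'image': {'id': 1, 'file_name': 'a.jpg'}}
```

The toy schema above is illustrative only and is not the cAIpy schema.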