caipy
Functions
dataset_to_caipy | Save dataset to cAIpy format. Note that depending on the splits present in your dataset, the folder structure might change.
dataset_to_caipy_generic | Save dataset to cAIpy format. Note that depending on the splits present in your dataset, the folder structure might change.
from_caipy | Load a dataset stored in the cAIpy format.
from_caipy_generic | Load a dataset stored in the cAIpy format, but you can specify images and annotations folders rather than giving a folder with Images and Annotations sub-folders.
load_caipy_annot_folder | Glob all JSON files in the folder path and construct the image and annotation dataframes.
load_caipy_split | Load a particular caipy split folder and convert it to a lours Dataset.
split_to_caipy | Save a particular split to cAIpy.
- dataset_to_caipy(dataset: Dataset, output_path: Path | str, use_schema: bool = False, json_schema: str | Path | None = 'default', copy_images: bool = True, to_jpg: bool = True, overwrite_images: bool = True, overwrite_labels: bool = True, flatten_paths: bool = True) -> None
Save dataset to cAIpy format. Note that depending on the splits present in your dataset, the folder structure might change.
Note
Unless specified otherwise, relative paths of images are flattened during the export. This changes the layout of the dataset if its images and annotations were stored in subfolders, but puts all images and annotations of a particular split in their respective root folder.
Note
If schema is not given, the nested dictionary will be deduced from column names with the separator “.”
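The deduction of a nested dictionary from dotted column names can be pictured with the minimal sketch below. It is not the library's implementation: the helper `nest_columns` and the `attributes.*` column names are hypothetical and only illustrate the "." separator rule described in the note.

```python
def nest_columns(flat_row: dict, separator: str = ".") -> dict:
    """Build a nested dict from a flat row whose keys use `separator` for nesting.

    Hypothetical helper, written only to illustrate the rule in the note above.
    """
    nested: dict = {}
    for column, value in flat_row.items():
        # "attributes.colour" -> parents ["attributes"], leaf "colour"
        *parents, leaf = column.split(separator)
        node = nested
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


# Hypothetical annotation row, as it could appear in a flat annotations table
row = {"category_str": "car", "attributes.occluded": False, "attributes.colour": "red"}
nested_row = nest_columns(row)
print(nested_row)
```

With a schema, the structure of the output would instead follow the schema, so this sketch only covers the schema-less case.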
- Parameters:
dataset – dataset to save
output_path – root folder where the dataset folder structure will be created.
use_schema – If set to True, and json_schema is not None, will use the schema for validation and formatting (see option json_schema).
json_schema – Path to a schema that output JSON dicts will be tested against for compliance. It will also be used to remove columns for fields not included in the schema. Can be either a URL or a local path. If set to None, or if use_schema is set to False, will not perform any test or reformatting. Defaults to default schema.
copy_images – If set to False, will create a symbolic link instead of copying. Much faster, but the original images must stay at the same relative path. Defaults to True.
to_jpg – if True, will convert images to jpg if needed. Defaults to True.
overwrite_images – if set to False, will skip images that are already copied. Defaults to True.
overwrite_labels – if set to False, will skip annotations that are already created. Defaults to True.
flatten_paths – if set to True, will put all files in the root Annotations and Images folders by replacing the folder separator (“/”) with “_” in the relative path. Defaults to True.
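The flatten_paths rule can be sketched in a few lines. The helper name `flatten_relative_path` and the example paths are hypothetical; the only thing taken from the description above is the replacement of "/" with "_".

```python
def flatten_relative_path(relative_path: str) -> str:
    """Replace every folder separator with an underscore (illustrative helper)."""
    return relative_path.replace("/", "_")


# Hypothetical relative paths of images inside a dataset
paths = ["train/cam0/0001.jpg", "0002.jpg"]
flat = [flatten_relative_path(p) for p in paths]
print(flat)
```

Under this rule alone, two distinct paths such as `a/b_c.jpg` and `a_b/c.jpg` would map to the same flattened name, which is worth checking before exporting with flattening enabled.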
- dataset_to_caipy_generic(dataset: Dataset, output_images_folder: Path | str | None, output_annotations_folder: Path | str, use_schema: bool = False, json_schema: str | Path | None = 'default', copy_images: bool = True, to_jpg: bool = True, overwrite_images: bool = True, overwrite_labels: bool = True, flatten_paths: bool = True) -> None
Save dataset to cAIpy format. Note that depending on the splits present in your dataset, the folder structure might change.
Notes
Unless specified otherwise, relative paths of images are flattened during the export. This changes the layout of the dataset if its images and annotations were stored in subfolders, but puts all images and annotations of a particular split in their respective root folder.
If schema is not given, the nested dictionary will be deduced from column names with the separator “.”
- Parameters:
dataset – dataset to save
output_images_folder – root folder where the images will be saved. If None, will not save images. Useful when only saving predictions or a variation of annotations.
output_annotations_folder – root folder where the JSON files will be saved.
use_schema – If set to True, and json_schema is not None, will use the schema for validation and formatting (see option json_schema).
json_schema – Path to a schema that output JSON dicts will be tested against for compliance. It will also be used to remove columns for fields not included in the schema. Can be either a URL or a local path. If set to None, or if use_schema is set to False, will not perform any test. Defaults to the example schema.
copy_images – If set to False, will create a symbolic link instead of copying. Much faster, but the original images must stay at the same relative path. Defaults to True.
to_jpg – if True, will convert images to jpg if needed. Defaults to True.
overwrite_images – if set to False, will skip images that are already copied. Defaults to True.
overwrite_labels – if set to False, will skip annotations that are already created. Defaults to True.
flatten_paths – if set to True, will put all files in the root Annotations and Images folders by replacing the folder separator (“/”) with “_” in the relative path. Defaults to True.
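The interaction between copy_images and overwrite_images can be illustrated with the standard library alone. This is a sketch under assumptions, not the library's code: `export_image` is a hypothetical helper, and linking to the resolved absolute source path is a choice made here for simplicity.

```python
import os
import shutil
import tempfile
from pathlib import Path


def export_image(source: Path, target: Path,
                 copy_images: bool = True, overwrite_images: bool = True) -> bool:
    """Copy or symlink `source` to `target`. Return False if the target was skipped."""
    if target.exists() or target.is_symlink():
        if not overwrite_images:
            return False  # already exported, leave it untouched
        target.unlink()
    target.parent.mkdir(parents=True, exist_ok=True)
    if copy_images:
        shutil.copy2(source, target)
    else:
        # A link is fast, but it breaks if the original image is moved or deleted
        os.symlink(source.resolve(), target)
    return True


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src.jpg"
    src.write_bytes(b"fake image bytes")
    copied = export_image(src, Path(tmp) / "Images" / "a.jpg")
    skipped = export_image(src, Path(tmp) / "Images" / "a.jpg", overwrite_images=False)
    linked = export_image(src, Path(tmp) / "Images" / "b.jpg", copy_images=False)
    link_is_symlink = (Path(tmp) / "Images" / "b.jpg").is_symlink()
```

The sketch leaves out the to_jpg conversion, which would require an image library.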
- from_caipy(dataset_path: Path | str, dataset_name: str | None = None, split: str | None = None, splits_to_read: str | Iterable[str] | None = None, use_schema: bool = False, json_schema: dict | str | Path | None = 'default', booleanize: bool = True) -> Dataset
Load a dataset stored in the cAIpy format
See specifications
This will raise an error if:
  - two annotations have the same category_id but not the same category_str
  - two annotations have a different category_id but the same category_str
  - two images have the same file_name, but not the same id
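The three consistency rules above amount to checking that one field always maps to a single value of another field. The sketch below shows that idea on plain lists of dicts; the helper names are hypothetical and the actual loader works on its own data structures.

```python
def check_single_valued(rows: list, key: str, value: str) -> None:
    """Raise ValueError if the same `key` appears with two different `value`s."""
    seen: dict = {}
    for row in rows:
        previous = seen.setdefault(row[key], row[value])
        if previous != row[value]:
            raise ValueError(
                f"{key}={row[key]!r} is associated with both "
                f"{previous!r} and {row[value]!r} for {value}"
            )


def check_consistency(images: list, annotations: list) -> None:
    """Apply the three rules listed above (illustrative only)."""
    check_single_valued(annotations, "category_id", "category_str")
    check_single_valued(annotations, "category_str", "category_id")
    check_single_valued(images, "file_name", "id")


images = [{"file_name": "a.jpg", "id": 0}]
good = [{"category_id": 1, "category_str": "car"},
        {"category_id": 2, "category_str": "bus"}]
bad = [{"category_id": 1, "category_str": "car"},
       {"category_id": 1, "category_str": "truck"}]

check_consistency(images, good)  # passes silently
try:
    check_consistency(images, bad)
    detected = False
except ValueError as error:
    detected = True
    print(error)
```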
- Parameters:
dataset_path – folder root of dataset. Should contain the folders “Images” and “Annotations”.
dataset_name – If specified, will be the dataset name, used when showing the dataset or exporting in other formats such as fiftyone. If not specified, the dataset name will be the name of the root folder.
split – if data is at the root of Images and Annotations folder, the split will be set to this option. Defaults to None.
splits_to_read – if given, will only read the specified splits. Useful for faster loading.
use_schema – If set to True, and json_schema is not None, will use the schema for validation and formatting (see option json_schema).
json_schema – schema dictionary or path to a schema that JSON files will be tested against for compliance. If it is not a dictionary, it can be either a URL or a local path. If set to None, or if use_schema is set to False, will not perform any test. Defaults to default schema.
booleanize – If some attributes are arrays of enum values with unique elements, they will be booleanized (see booleanize()). Note that this option is only used if json_schema is not None and use_schema is set to True. Defaults to True.
- Raises:
ValueError – Inconsistency between two annotations or images (see description above)
- Returns:
Loaded dataset object
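As a rough picture of what booleanizing an attribute could look like, the sketch below turns a column holding lists of enum values into one boolean column per possible value. It assumes that this is the intended meaning of "booleanized"; the helper `booleanize_column`, the column naming with a "." separator and the `attributes.weather` example are all hypothetical, and the library's own booleanize() may differ in behaviour and signature.

```python
def booleanize_column(rows: list, column: str, possible_values: list,
                      separator: str = ".") -> list:
    """Replace a list-valued column by one boolean column per possible value."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        present = set(row.get(column) or [])
        for value in possible_values:
            new_row[f"{column}{separator}{value}"] = value in present
        result.append(new_row)
    return result


# Hypothetical rows with an array-of-enum attribute
rows = [
    {"id": 0, "attributes.weather": ["rain", "fog"]},
    {"id": 1, "attributes.weather": []},
]
booleanized = booleanize_column(rows, "attributes.weather", ["rain", "fog", "snow"])
for row in booleanized:
    print(row)
```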
- from_caipy_generic(images_folder: Path | str | None, annotations_folder: Path | str, dataset_name: str | None = None, split: str | None = None, splits_to_read: str | Iterable[str] | None = None, use_schema: bool = False, json_schema: dict | str | Path | None = 'default', booleanize: bool = True) -> Dataset
Load a dataset stored in the cAIpy format, but you can specify images and annotations folders rather than giving a folder with Images and Annotations sub-folders. This gives much more flexibility, especially when working with predictions and annotation variations.
See specifications
This will raise an error if:
  - two annotations have the same category_id but not the same category_str
  - two annotations have a different category_id but the same category_str
  - two images have the same file_name, but not the same id
- Parameters:
images_folder – folder root of images.
annotations_folder – folder root of annotations.
dataset_name – If specified, will be the dataset name, used when showing the dataset or exporting in other formats such as fiftyone.
split – if data is at the root of Images and Annotations folder, the split will be set to this option. Defaults to None.
splits_to_read – if given, will only read the specified splits. Useful for faster loading.
use_schema – If set to True, and json_schema is not None, will use the schema for validation and formatting (see option json_schema).
json_schema – schema dictionary or path to a schema that JSON files will be tested against for compliance. If it is not a dictionary, it can be either a URL or a local path. If set to None, or if use_schema is set to False, will not perform any test. Defaults to default schema.
booleanize – If some attributes are arrays of enum values with unique elements, they will be booleanized (see booleanize()). Note that this option is only used if json_schema is not None and use_schema is set to True. Defaults to True.
- Raises:
ValueError – Inconsistency between two annotations or images (see description above)
- Returns:
Loaded dataset object
- load_caipy_annot_folder(folder_path: Path, schema: dict | None = None) -> tuple[DataFrame | None, DataFrame | None]
Glob all JSON files in the folder path and construct the image and annotation dataframes.
- Parameters:
folder_path – folder where we will search for json files
schema – Optional JSON schema dict used to check the conformity of loaded JSON files.
- Returns:
A pair of dataframes, representing image and annotations data, most likely used to construct the dataset object.
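The glob-and-collect step can be sketched with the standard library. Several things here are assumptions rather than facts about the format or the library: that each JSON file holds an `image` dict and an `annotations` list, that annotations are linked to their image through an `image_id` field, and that the search is not recursive. The helper `load_annotation_folder` is hypothetical and returns lists of rows instead of dataframes.

```python
import json
import tempfile
from pathlib import Path


def load_annotation_folder(folder_path: Path) -> tuple:
    """Collect image rows and annotation rows from every JSON file in a folder."""
    image_rows, annotation_rows = [], []
    for json_path in sorted(folder_path.glob("*.json")):
        with open(json_path) as f:
            content = json.load(f)
        image = dict(content["image"])  # assumed key, see lead-in
        image_rows.append(image)
        for annotation in content.get("annotations", []):
            annotation_rows.append({**annotation, "image_id": image["id"]})
    return image_rows, annotation_rows


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.json").write_text(json.dumps({
        "image": {"id": 0, "file_name": "a.jpg"},
        "annotations": [{"category_id": 1, "category_str": "car"}],
    }))
    (folder / "b.json").write_text(json.dumps({
        "image": {"id": 1, "file_name": "b.jpg"},
        "annotations": [{"category_id": 1, "category_str": "car"},
                        {"category_id": 2, "category_str": "bus"}],
    }))
    image_rows, annotation_rows = load_annotation_folder(folder)
```

Each list of rows could then be handed to a dataframe constructor to obtain the pair of tables described above.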
- load_caipy_split(images_folder: Path, annotations_folder: Path, dataset_name: str | None = None, split_name: str | None = None, schema: dict | None = None) -> Dataset
Load a particular caipy split folder and convert it to a lours Dataset
- Parameters:
images_folder – folder where images are stored
annotations_folder – folder where annotations are stored as json files
dataset_name – If specified, will be the dataset name, used when showing the dataset or exporting in other formats such as fiftyone. If not specified, the dataset name will be the name of the root folder.
split_name – name of the split to give to the split column of the images DataFrame. Defaults to None.
schema – JSON schema dict used to check the conformity of loaded JSON files. If set to None, will not check the conformity. Defaults to None.
- Raises:
ValueError – If image ids are not mutually exclusive
- Returns:
Dataset containing only one split from caipy, expected to be merged with other caipy splits
- split_to_caipy(dataset: Dataset, split_images_folder: Path | None, split_annotations_folder: Path, schema: dict | None = None, copy_images: bool = True, to_jpg: bool = True, overwrite_images: bool = True, overwrite_labels: bool = True, flatten_paths: bool = True) -> None
Save a particular split to cAIpy. The images and annotations folders must be given, as each can be either the root of “Images” and “Annotations”, or a subfolder based on the split name, e.g. “Images/train”.
Note
Unless specified otherwise, relative paths of images are flattened during the export. This changes the layout of the dataset if its images and annotations were stored in subfolders, but puts all images and annotations in their respective root folder.
Note
If schema is not given, the nested dictionary will be deduced from column names with the separator “.”
- Parameters:
dataset – dataset object to save. Normally, should be a unique split
split_images_folder – folder where to save images, either as links or files. If None, will not save images. This is useful when you just want to save predictions or a variation of annotations.
split_annotations_folder – folder where to save cAIpy JSON files.
schema – JSON schema dict used to check the conformity of output JSON files. It will also be used to remove columns for fields not included in the schema. If set to None, will not check the conformity. Defaults to None.
copy_images – If set to False, will create a symbolic link instead of copying. Much faster, but the original images must stay at the same relative path. Defaults to True.
to_jpg – if True, will convert images to jpg if needed. Defaults to True.
overwrite_images – if set to False, will skip images that are already copied. Defaults to True.
overwrite_labels – if set to False, will skip annotations that are already created. Defaults to True.
flatten_paths – if set to True, will put all files in the root Annotations and Images folders by replacing the folder separator (“/”) with “_” in the relative path. Defaults to True.
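How the per-split folders relate to the dataset root can be sketched as follows. The helper `split_folders` is hypothetical; it follows the “Images/train” example given above and assumes the Annotations folder mirrors the same layout.

```python
from pathlib import Path
from typing import Optional


def split_folders(root: Path, split: Optional[str]) -> tuple:
    """Return the (images, annotations) folders for a split under `root`.

    With no split name, data sits directly in "Images" and "Annotations".
    """
    images, annotations = root / "Images", root / "Annotations"
    if split is None:
        return images, annotations
    return images / split, annotations / split


# Hypothetical dataset root and split names
for split in ["train", "valid", None]:
    images_folder, annotations_folder = split_folders(Path("my_dataset"), split)
    print(split, images_folder, annotations_folder)
```

Each pair of folders is what a per-split export such as split_to_caipy expects to receive for split_images_folder and split_annotations_folder.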