from_caipy_generic

from_caipy_generic(images_folder: Path | str | None, annotations_folder: Path | str, dataset_name: str | None = None, split: str | None = None, splits_to_read: str | Iterable[str] | None = None, use_schema: bool = False, json_schema: dict | str | Path | None = 'default', booleanize: bool = True) -> Dataset

Load a dataset stored in the cAIpy format, specifying the images and annotations folders separately instead of giving a single folder with Images and Annotations sub-folders. This gives much more flexibility, especially when working with predictions or with several variants of the annotations.

See specifications

This will raise a ValueError if:

  • two annotations have the same category_id but not the same category_str

  • two annotations have a different category_id but the same category_str

  • two images have the same file_name, but not the same id
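The three conditions above can be sketched in plain Python. This is only an illustration of the checks, not the library's implementation: the function name `check_consistency` and the record layout (dicts with `category_id`, `category_str`, `id` and `file_name` keys) are assumptions made for the example.

```python
def check_consistency(annotations, images):
    """Raise ValueError on the three inconsistencies listed above.

    Hypothetical record layout: each annotation is a dict with
    ``category_id`` and ``category_str``; each image is a dict with
    ``id`` and ``file_name``.
    """
    id_to_str = {}
    str_to_id = {}
    for ann in annotations:
        cat_id, cat_str = ann["category_id"], ann["category_str"]
        # A given category_id must always map to the same category_str.
        if id_to_str.setdefault(cat_id, cat_str) != cat_str:
            raise ValueError(f"category_id {cat_id} has several names")
        # A given category_str must always map to the same category_id.
        if str_to_id.setdefault(cat_str, cat_id) != cat_id:
            raise ValueError(f"category_str {cat_str!r} has several ids")

    name_to_id = {}
    for img in images:
        # A given file_name must always map to the same image id.
        if name_to_id.setdefault(img["file_name"], img["id"]) != img["id"]:
            raise ValueError(f"file_name {img['file_name']!r} has several ids")
```

Running such a check on the annotation files before loading can help locate the offending records when the loader raises.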

Parameters:
  • images_folder – Root folder of the images.

  • annotations_folder – Root folder of the annotations.

  • dataset_name – If specified, will be the dataset name, used when showing the dataset or exporting in other formats such as fiftyone.

  • split – If the data is at the root of the images and annotations folders, the split will be set to this value. Defaults to None.

  • splits_to_read – If given, only the specified splits will be read. Useful for faster loading.

  • use_schema – If set to True and json_schema is not None, the schema will be used for validation and formatting (see option json_schema).

  • json_schema – Schema dictionary, or path to a schema, that JSON files will be tested against for compliance. If it is not a dictionary, it can be either a URL or a local path. If set to None, or if use_schema is set to False, no test is performed. Defaults to the default schema.

  • booleanize – If some attributes are arrays of enums with unique elements, they will be booleanized (see booleanize()). Note that this option is only used if json_schema is not None and use_schema is set to True. Defaults to True.
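The interplay between use_schema, json_schema and booleanize described above can be summarized with a small sketch. Both helpers are hypothetical and written only for illustration. In particular, the `name.value` key format used for booleanized attributes is an assumption, and the actual output of booleanize() may differ.

```python
def schema_is_applied(use_schema, json_schema):
    # Validation (and therefore booleanization) only happens when
    # use_schema is True and a schema is actually provided.
    return use_schema and json_schema is not None


def booleanize_attribute(name, values, allowed):
    # Turn an array-of-enum attribute into one boolean per allowed value.
    # The "name.value" key format is an assumption made for this sketch.
    return {f"{name}.{choice}": choice in values for choice in allowed}


# Hypothetical attribute "weather" with enum values sun, rain, fog.
flags = booleanize_attribute("weather", ["rain", "fog"], ["sun", "rain", "fog"])
```

With the default arguments (use_schema=False), no schema is applied, so booleanize has no effect even though it defaults to True.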

Raises:

ValueError – Inconsistency between two annotations or images (see description above)

Returns:

Loaded dataset object
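
As a usage sketch for the predictions and annotation variants case mentioned in the description, the helper below loads one dataset per annotations folder while sharing a single images folder. The helper name `load_variants` and the folder paths in the comment are hypothetical. The loader is passed as an argument because this page does not state the import path of from_caipy_generic.

```python
from pathlib import Path


def load_variants(loader, images_folder, annotation_folders, **options):
    """Load one dataset per annotations folder, sharing the same images.

    ``loader`` is expected to be ``from_caipy_generic``; it is passed in
    explicitly because the import path is not given on this page.
    ``annotation_folders`` maps a dataset name to an annotations folder.
    """
    images_folder = Path(images_folder)
    return {
        name: loader(
            images_folder=images_folder,
            annotations_folder=Path(folder),
            dataset_name=name,
            **options,
        )
        for name, folder in annotation_folders.items()
    }


# Example call (paths are placeholders):
# datasets = load_variants(
#     from_caipy_generic,
#     "data/Images",
#     {"ground_truth": "data/Annotations", "predictions": "runs/exp1/Annotations"},
#     splits_to_read=["valid"],
# )
```

Only keyword arguments documented in the signature above are forwarded, so any extra option (such as splits_to_read or use_schema) applies to every variant.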