from_caipy

from_caipy(dataset_path: Path | str, dataset_name: str | None = None, split: str | None = None, splits_to_read: str | Iterable[str] | None = None, use_schema: bool = False, json_schema: dict | str | Path | None = 'default', booleanize: bool = True) → Dataset

Load a dataset stored in the cAIpy format.

See specifications

This raises a ValueError if:

  • two annotations have the same category_id but not the same category_str

  • two annotations have a different category_id but the same category_str

  • two images have the same file_name, but not the same id
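The three conditions above can be illustrated with a minimal sketch. It is not the loader's actual implementation: the function name `check_consistency` and the plain-dict records are assumptions made for illustration, and only the field names `category_id`, `category_str`, `file_name` and `id` come from the description above.

```python
def check_consistency(annotations, images):
    """Raise ValueError on any of the three inconsistencies listed above.

    Hypothetical helper: `annotations` and `images` are assumed to be
    iterables of plain dicts, which is not necessarily how the loader
    stores them internally.
    """
    id_to_str = {}
    str_to_id = {}
    for ann in annotations:
        cat_id, cat_str = ann["category_id"], ann["category_str"]
        # setdefault stores the first pairing seen and returns it afterwards
        if id_to_str.setdefault(cat_id, cat_str) != cat_str:
            raise ValueError(
                f"category_id {cat_id!r} is used with several category_str values"
            )
        if str_to_id.setdefault(cat_str, cat_id) != cat_id:
            raise ValueError(
                f"category_str {cat_str!r} is used with several category_id values"
            )

    name_to_id = {}
    for img in images:
        if name_to_id.setdefault(img["file_name"], img["id"]) != img["id"]:
            raise ValueError(
                f"file_name {img['file_name']!r} is used with several image ids"
            )
```

Repeating the same (category_id, category_str) pair or the same (file_name, id) pair is accepted; only conflicting pairings raise.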

Parameters:
  • dataset_path – Root folder of the dataset. It should contain the folders “Images” and “Annotations”.

  • dataset_name – If specified, will be the dataset name, used when showing the dataset or exporting in other formats such as fiftyone. If not specified, the dataset name will be the name of the root folder.

  • split – If the data is at the root of the Images and Annotations folders, the split is set to this value. Defaults to None.

  • splits_to_read – If given, only the specified splits are read. Useful for faster loading.

  • use_schema – If set to True, and json_schema is not None, will use schema for validation and formatting (see option json_schema)

  • json_schema – Schema dictionary, or path to a schema, that JSON files will be tested against for compliance. If it is not a dictionary, it can be either a URL or a local path. If set to None, or if use_schema is set to False, no test is performed. Defaults to the default schema.

  • booleanize – If some attributes are arrays of enums with unique elements, they will be booleanized (see booleanize()). Note that this option is only used if json_schema is not None and use_schema is set to True. Defaults to True.
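To give an intuition for the booleanize option, here is a minimal sketch of the general idea: an attribute holding a list of unique enum values is expanded into one boolean per allowed value. This is an illustration only, not the library's booleanize() function; the helper name `booleanize_attribute`, the list-of-lists input and the dict output are all assumptions.

```python
def booleanize_attribute(rows, allowed_values):
    """Expand a list-valued attribute into one boolean flag per allowed value.

    Hypothetical helper: `rows` holds, for each annotation, the list of enum
    values that are set; `allowed_values` is the enum declared in the schema.
    """
    return [
        {value: value in set(row) for value in allowed_values}
        for row in rows
    ]


# Example attribute with the assumed enum ["day", "night", "rain"]
rows = [["day", "rain"], ["night"], []]
flags = booleanize_attribute(rows, ["day", "night", "rain"])
# flags[0] == {"day": True, "night": False, "rain": True}
```

The allowed values have to be known up front, which is consistent with the option only taking effect when a schema is in use.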

Raises:

ValueError – Inconsistency between two annotations or images (see description above)

Returns:

Loaded dataset object
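The folder layout expected by dataset_path can be checked before loading. The sketch below assumes that splits are stored as subfolders of “Annotations” (inferred from the description of the split parameter, not confirmed by the format specification); the helper name `list_splits` is hypothetical and not part of the library.

```python
from pathlib import Path


def list_splits(dataset_path):
    """Check for the Images and Annotations folders and list candidate splits.

    Hypothetical helper. Assumption: each subfolder of Annotations is a split.
    An empty result means the data sits at the root of the folders, in which
    case the `split` argument of from_caipy would apply.
    """
    root = Path(dataset_path)
    for folder in ("Images", "Annotations"):
        if not (root / folder).is_dir():
            raise FileNotFoundError(f"missing folder: {root / folder}")
    return sorted(p.name for p in (root / "Annotations").iterdir() if p.is_dir())
```

The names returned this way are the kind of values one would expect to pass to splits_to_read.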