parquet_saver

Functions

dict_from_parquet

Create a dictionary from a folder written by dict_to_parquet().

dict_to_parquet

Save a dictionary containing dataframes as a yaml file and parquet files.

dict_from_parquet(input_dir: Path, fields_to_path: Iterable[str] = ('relative_path', 'images_root')) → dict

Create a dictionary from a folder written by dict_to_parquet().

Parameters:
  • input_dir – folder containing yaml and parquet files

  • fields_to_path – iterable of strings naming the columns to convert back to Path objects.

Returns:

The loaded dictionary. It can be used to reconstruct objects that hold dataframes, such as Dataset or Evaluator.
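The effect of fields_to_path can be sketched as follows. This is a minimal illustration on plain nested dictionaries, so that it needs only the standard library; the helper name restore_paths is hypothetical and not part of Lours, and the actual implementation, which works on dataframe columns, may differ.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Iterable


def restore_paths(
    data: dict[str, Any],
    fields_to_path: Iterable[str] = ("relative_path", "images_root"),
) -> dict[str, Any]:
    """Return a copy of ``data`` where the named fields hold Path objects.

    Hypothetical helper, for illustration only: it walks nested dictionaries
    and converts strings (or lists of strings) stored under the given keys.
    """
    wanted = set(fields_to_path)
    restored: dict[str, Any] = {}
    for key, value in data.items():
        if isinstance(value, dict):
            # Nested dictionaries are processed recursively.
            restored[key] = restore_paths(value, wanted)
        elif key in wanted and isinstance(value, str):
            restored[key] = Path(value)
        elif key in wanted and isinstance(value, list):
            restored[key] = [Path(item) for item in value]
        else:
            # Everything else is kept untouched.
            restored[key] = value
    return restored


# Example data shaped like something read back from disk (made up for this sketch).
loaded = {
    "images_root": "/data/images",
    "annotations": {"relative_path": ["a.jpg", "b.jpg"], "category_id": [1, 2]},
}
result = restore_paths(loaded)
```

Only the fields listed in fields_to_path are converted; all other values, such as category_id here, are returned unchanged.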

dict_to_parquet(output_dict: dict, output_dir: Path, version: str = '1.0.1', fields: Iterable[str] | None = None, fields_to_str: Iterable[str] = ('images_root', 'relative_path', 'images_root'), overwrite: bool = False) → None

Save a dictionary containing dataframes as a yaml file and parquet files.

The dictionary can be nested.

Parameters:
  • output_dict – dictionary to save, containing yaml serializable objects and dataframes. Can be nested.

  • output_dir – path to the folder in which to save the yaml and parquet files

  • version – data version info for future compatibility. Defaults to current Lours version.

  • fields – fields to save. Will ignore other fields in the output dictionary. If set to None, will save all fields. Defaults to None.

  • fields_to_str – fields to convert to str. Useful for objects that are not yaml serializable, such as Path.

  • overwrite – if set to True, will remove the output_dir directory if it already exists. If set to False, will check that the directory either does not exist or is empty. Defaults to False.

Raises:

OSError – Raised when the output directory is not empty and overwrite is set to False
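The documented overwrite behaviour can be sketched with the standard library alone. The helper name prepare_output_dir is hypothetical and only mirrors the description above; it is not the Lours implementation, which may perform the check differently.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_output_dir(output_dir: Path, overwrite: bool = False) -> None:
    """Hypothetical helper mirroring the documented ``overwrite`` semantics."""
    if output_dir.exists():
        if overwrite:
            # overwrite=True: remove the whole directory before saving again.
            shutil.rmtree(output_dir)
        elif any(output_dir.iterdir()):
            # overwrite=False: an existing directory must be empty.
            raise OSError(f"{output_dir} is not empty and overwrite is False")
    output_dir.mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "saved_dict"
    prepare_output_dir(target)  # directory does not exist yet, so it is created
    (target / "stale.parquet").write_bytes(b"")  # simulate a previous save

    try:
        prepare_output_dir(target)  # non-empty and overwrite=False
        raised = False
    except OSError:
        raised = True

    prepare_output_dir(target, overwrite=True)  # directory is wiped and recreated
    remaining = list(target.iterdir())
```

With overwrite left at False, the second call fails because the folder already contains a file; passing overwrite=True clears the folder first.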