parquet_saver
Functions
dict_from_parquet | Create dictionary from folder created with the function dict_to_parquet(). |
dict_to_parquet | Save a dictionary containing dataframes as a yaml file and parquet files. |
- dict_from_parquet(input_dir: Path, fields_to_path: Iterable[str] = ('relative_path', 'images_root')) -> dict
Create a dictionary from a folder created with the function dict_to_parquet().
- Parameters:
input_dir – folder containing yaml and parquet files
fields_to_path – names of the columns to convert to Path objects. Defaults to ('relative_path', 'images_root').
- Returns:
The reconstructed dictionary. It can be used to rebuild objects that hold dataframes, such as Dataset or Evaluator.
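The effect of fields_to_path can be pictured with a minimal sketch. This is not the library's implementation: strings_to_paths is a hypothetical helper, and plain dict-of-list columns stand in for the dataframes so that the example runs with the standard library only.

```python
from pathlib import Path
from typing import Any, Iterable


def strings_to_paths(
    columns: dict[str, list[Any]],
    fields_to_path: Iterable[str] = ("relative_path", "images_root"),
) -> dict[str, list[Any]]:
    """Return a copy of ``columns`` with the named columns converted to Path.

    Hypothetical helper: columns not listed in ``fields_to_path`` are copied
    unchanged, and listed names that are absent are simply ignored.
    """
    wanted = set(fields_to_path)
    return {
        name: [Path(value) for value in values] if name in wanted else list(values)
        for name, values in columns.items()
    }


# Columns as they might look right after loading: paths are plain strings.
loaded = {"relative_path": ["a/img_0.jpg", "b/img_1.jpg"], "width": [640, 480]}
restored = strings_to_paths(loaded)
```

Only the columns named in fields_to_path change type; every other column is left as loaded.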
- dict_to_parquet(output_dict: dict, output_dir: Path, version: str = '1.2.0', fields: Iterable[str] | None = None, fields_to_str: Iterable[str] = ('images_root', 'relative_path', 'images_root'), overwrite: bool = False) -> None
Save a dictionary containing dataframes as a yaml file and parquet files.
The dictionary can be nested.
- Parameters:
output_dict – dictionary to save, containing yaml serializable objects and dataframes. Can be nested.
output_dir – path to the folder in which to save the yaml and parquet files
version – data version info for future compatibility. Defaults to current Lours version.
fields – fields to save. Will ignore other fields in the output dictionary. If set to None, will save all fields. Defaults to None.
fields_to_str – fields to convert to str. Useful for non-serializable objects like Path
overwrite – if set to True, will remove the output_dir directory if it already exists. If set to False, will check that the directory either does not exist or is empty. Defaults to False.
- Raises:
OSError – Raised when the output directory is not empty and overwrite is set to False.
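The overwrite rule described above can be sketched with the standard library. This is an illustration under stated assumptions, not the library's code: prepare_output_dir is a hypothetical helper, output_dir is assumed to be a directory when it exists, and creating the directory at the end is an assumption about what a saver would need.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_output_dir(output_dir: Path, overwrite: bool = False) -> None:
    """Apply the documented overwrite rule to ``output_dir`` (hypothetical helper)."""
    if output_dir.exists():
        if overwrite:
            # overwrite=True: remove the existing directory and its contents.
            shutil.rmtree(output_dir)
        elif any(output_dir.iterdir()):
            # overwrite=False: a non-empty directory is refused.
            raise OSError(f"{output_dir} is not empty and overwrite is False")
    # Assumption: the saver needs the directory to exist before writing files.
    output_dir.mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "saved"
    prepare_output_dir(target)  # does not exist yet: accepted and created
    (target / "old.parquet").write_bytes(b"")
    try:
        prepare_output_dir(target)  # non-empty with overwrite=False
        raised = False
    except OSError:
        raised = True
    prepare_output_dir(target, overwrite=True)  # removed, then recreated empty
    remaining = list(target.iterdir())
```

With overwrite left at False, an existing non-empty folder is never touched, which protects earlier exports from accidental deletion.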