to_parquet

Dataset.to_parquet(output_dir: Path | str, overwrite: bool = False) → None

Save the dataset object to a folder containing Parquet files for its dataframes and a metadata.yaml file for its other attributes.
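To make the on-disk layout concrete, here is a minimal sketch that inspects a folder written by to_parquet and separates the dataframe files from the metadata file. It assumes that the dataframes are stored as files with a `.parquet` extension somewhere under output_dir and that metadata.yaml sits at the top level; the exact file names used by Dataset are not specified here, and `describe_saved_dataset` is a hypothetical helper, not part of the library.

```python
from __future__ import annotations

from pathlib import Path


def describe_saved_dataset(output_dir: Path | str) -> dict[str, object]:
    """Summarize a to_parquet output folder (hypothetical helper).

    Assumes dataframes are saved as *.parquet files (at any depth) and
    the remaining attributes in a top-level metadata.yaml file.
    """
    path = Path(output_dir)
    # Collect every Parquet file, reported relative to the folder with "/" separators.
    parquet_files = sorted(p.relative_to(path).as_posix() for p in path.rglob("*.parquet"))
    return {
        "dataframes": parquet_files,
        "has_metadata": (path / "metadata.yaml").is_file(),
    }
```

Listing files rather than parsing them keeps the sketch free of Parquet or YAML dependencies; reading the contents back would need a Parquet engine and a YAML parser.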

Note

The dataframe column dtypes must be serializable to Parquet. This includes int, float, string and list columns, but not custom objects such as pathlib.Path.
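One way to satisfy this constraint is to convert unsupported values before saving. The sketch below turns pathlib paths into plain strings, recursing into list-like values, and leaves everything else unchanged. `make_parquet_friendly` is a hypothetical helper written for illustration; it handles only path objects and does not guarantee that every remaining value is Parquet-compatible.

```python
from pathlib import PurePath
from typing import Any


def make_parquet_friendly(value: Any) -> Any:
    """Replace pathlib paths with strings so a column can be written to Parquet.

    Hypothetical helper: only PurePath (and subclasses such as Path) is converted.
    """
    if isinstance(value, PurePath):
        # Path objects are not a Parquet type; store their string form instead.
        return str(value)
    if isinstance(value, (list, tuple)):
        # Recurse into list-like cells, returning a list.
        return [make_parquet_friendly(v) for v in value]
    # int, float, str, None and other values are returned unchanged.
    return value
```

With pandas, the helper could be applied column by column before calling to_parquet, for example `df[col] = df[col].map(make_parquet_friendly)`.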

Parameters:
  • output_dir – Path of the folder in which to save the object’s attributes. If overwrite is False, this folder must not already exist.

  • overwrite – If True, an existing output_dir directory is removed before the dataset is saved. Defaults to False.
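The interplay between output_dir and overwrite can be sketched with the standard library alone. This is an illustration of the documented behaviour, not the method's actual implementation: `prepare_output_dir` is a hypothetical helper, and raising FileExistsError when the folder exists and overwrite is False is an assumption about how such a conflict might be reported.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def prepare_output_dir(output_dir: Path | str, overwrite: bool = False) -> Path:
    """Create a fresh output folder, mirroring the documented overwrite rule.

    Hypothetical helper; the FileExistsError below is an assumed error type.
    """
    path = Path(output_dir)
    if path.exists():
        if not overwrite:
            # Without overwrite, the target folder must not already exist.
            raise FileExistsError(f"{path} already exists; pass overwrite=True to replace it")
        # With overwrite, delete whatever is there before writing new files.
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    path.mkdir(parents=True)
    return path
```

Removing the whole folder, rather than overwriting individual files, ensures that Parquet files left over from an earlier save with different dataframes cannot end up mixed with the new ones.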