common

Functions

construct_label_map

Construct label map from annotation DataFrame, with category_id and category_str columns.

convert_str

String converter used when reading a file: parses the string and automatically converts it to an integer or a float if possible.

get_image_info

Get image information, either from the image info DataFrame or from the image itself, in which case the image dimensions are obtained by reading the file header.

get_images_from_folder

Collect all images in a folder whose file extension is in a given list of image formats.

get_relative_image_path

Tool function to get relative path between dataset_path and image_path, which might be absolute.

parse_annotation_name

Deduce name of dataset and name of split by assuming it is in the form '<dataset_name>_<split_name>.<extension>'

to_dataset_object

Create the dataset object from aggregated lists of dictionaries

construct_label_map(annotations: DataFrame) -> dict[int, str]

Construct label map from annotation DataFrame, with category_id and category_str columns. Gets all category strings associated with each category id. Normally, there should be only one per id.

Parameters:

annotations – DataFrame containing category id and category name information. Should contain at least category_id and category_str columns.

Raises:

ValueError – Inconsistency in category ids and names. The id -> name mapping should be bijective.

Returns:

dictionary containing label map, with category id as key, and category name as value
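
The consistency check described above can be sketched without the library. This is a minimal illustration, not the actual implementation: `label_map_sketch` is a hypothetical name, and it takes plain `(category_id, category_str)` pairs instead of a DataFrame to stay dependency-free.

```python
from collections import defaultdict


def label_map_sketch(rows):
    """Build {category_id: category_str} from (category_id, category_str) pairs.

    Hypothetical stand-in for construct_label_map, working on plain tuples.
    """
    names_per_id = defaultdict(set)
    ids_per_name = defaultdict(set)
    for category_id, category_str in rows:
        names_per_id[category_id].add(category_str)
        ids_per_name[category_str].add(category_id)

    # A bijective mapping needs exactly one name per id and one id per name.
    bad_ids = sorted(i for i, names in names_per_id.items() if len(names) > 1)
    bad_names = sorted(n for n, ids in ids_per_name.items() if len(ids) > 1)
    if bad_ids or bad_names:
        raise ValueError(
            f"Inconsistent categories: ids {bad_ids}, names {bad_names}"
        )

    return {i: next(iter(names)) for i, names in sorted(names_per_id.items())}


label_map = label_map_sketch([(1, "dog"), (0, "cat"), (0, "cat")])
```

Repeated but consistent rows are harmless here; only an id with several names, or a name shared by several ids, raises.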

convert_str(string: str) -> str | int | float

String converter used when reading a file: parses the string and automatically converts it to an integer or a float if possible. It first tries to convert to int, then to float, and otherwise returns the string as is.

Parameters:

string – string containing information to be parsed

Returns:

converted string, in the most convenient format
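
The int, then float, then unchanged order described above could look like the following. This is a sketch under the assumption that a failed conversion surfaces as a `ValueError`; `convert_str_sketch` is a hypothetical name, not the library function.

```python
from typing import Union


def convert_str_sketch(string: str) -> Union[str, int, float]:
    """Try int first, then float, and fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(string)
        except ValueError:
            # This type does not fit, try the next one.
            continue
    return string


values = [convert_str_sketch(s) for s in ("42", "3.5", "1e3", "abc")]
```

Trying int before float matters: "42" stays an integer, while "1e3" is rejected by `int` and only becomes a number (1000.0) at the float step.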

get_image_info(image_number: int, relative_path: Path, absolute_path: Path | None, image_info: DataFrame | None = None) -> dict[str, Any]

Get image information, either from the image info DataFrame or from the image itself, in which case the image dimensions are obtained by reading the file header.

Parameters:
  • image_number – number of image in the file list. If image_info is not available, will be used for image id

  • relative_path – path that will be used to find the image in the image_info dataframe, if given

  • absolute_path – absolute path used to load the image data directly from the file. Can be None if image_info has an entry with the same relative_path value

  • image_info – DataFrame including image size and image id to match the ids of another dataset for example. Must have at least relative_path, width and height columns. Defaults to None.

Returns:

dictionary with width, height and id keys
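
To illustrate the lookup-then-fallback logic, here is a hedged sketch. Everything in it is an assumption made for illustration: `image_info_sketch` and `png_size` are hypothetical names, the `known` dictionary stands in for the image_info DataFrame, and the header reader only handles PNG files (the way the library actually reads headers is not documented on this page).

```python
from __future__ import annotations

import struct
from pathlib import Path
from typing import Any

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path) -> tuple[int, int]:
    """Read width and height from a PNG header without decoding the pixels."""
    with open(path, "rb") as stream:
        header = stream.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not start with a PNG header")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def image_info_sketch(
    image_number: int,
    relative_path: Path,
    absolute_path: Path | None,
    known: dict[Path, dict[str, Any]] | None = None,
) -> dict[str, Any]:
    """Prefer a known entry, otherwise read the file header (assumed logic)."""
    if known is not None and relative_path in known:
        return dict(known[relative_path])
    if absolute_path is None:
        raise ValueError("absolute_path is needed when no entry is known")
    width, height = png_size(absolute_path)
    # Without a known entry, the position in the file list becomes the id.
    return {"id": image_number, "width": width, "height": height}
```

Reading only the first bytes keeps the fallback cheap even for large images, which is the point of going through the header rather than loading the full image.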

get_images_from_folder(folder_path: Path, img_formats: Iterable[str] = ('bmp', 'dng', 'jpeg', 'jpg', 'mpo', 'png', 'tif', 'tiff', 'webp', 'pfm')) -> list[Path]

Collect all images in a folder whose file extension is in a given list of image formats.

Parameters:
  • folder_path – where to search images

  • img_formats – list of file extensions to consider during the globbing

Returns:

list of all paths leading to an image with the desired extensions.
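
A folder scan of this kind can be sketched with `pathlib`. Two behaviours below are assumptions, since this page does not specify them: the search is recursive, and extensions are matched case-insensitively. `list_images_sketch` is a hypothetical name.

```python
from pathlib import Path
from typing import Iterable


def list_images_sketch(
    folder_path: Path, img_formats: Iterable[str] = ("jpg", "png")
) -> list:
    """Return sorted paths of files whose extension is in img_formats."""
    wanted = {f".{ext.lower()}" for ext in img_formats}
    return sorted(
        path
        for path in folder_path.rglob("*")  # assumed: subfolders are included
        if path.is_file() and path.suffix.lower() in wanted
    )
```

Sorting the result makes the order deterministic, which matters if the position of an image in the list is later used as its id.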

get_relative_image_path(dataset_path: Path, image_path: Path | str) -> Path

Tool function to get the path of image_path relative to dataset_path; image_path might be absolute. Used to populate the relative_path column in the images DataFrame of the dataset or evaluator object, which should ensure that dataset.images_root / relative_path always leads to a valid image file.

Parameters:
  • dataset_path – root path of considered dataset

  • image_path – path of a particular image. May be absolute, in which case it needs to be converted to be relative to dataset_path

Raises:

ValueError – image_path is not included in dataset_path. This probably means that dataset_path is too specific and should be higher in the file hierarchy.

Returns:

Converted image path to be relative to dataset path
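
The conversion maps closely onto `Path.relative_to`, which itself raises `ValueError` when one path is not inside the other. The sketch below assumes that a path which is already relative is returned unchanged; `relative_image_path_sketch` is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path


def relative_image_path_sketch(dataset_path: Path, image_path: Path | str) -> Path:
    """Express image_path relative to dataset_path."""
    image_path = Path(image_path)
    if not image_path.is_absolute():
        # Assumed behaviour: a relative path is already in the desired form.
        return image_path
    try:
        return image_path.relative_to(dataset_path)
    except ValueError as error:
        raise ValueError(
            f"{image_path} is not inside {dataset_path}; "
            "the dataset path may be too specific"
        ) from error
```

For instance, an image stored under a `val` folder cannot be expressed relative to a dataset path pointing at a sibling `train` folder, so the dataset path has to move one level up.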

parse_annotation_name(annotations_file_path: str | Path, split_name_mapping: dict[str, list[str]] | None = None) -> tuple[str | None, str | None]

Deduce name of dataset and name of split by assuming it is in the form ‘<dataset_name>_<split_name>.<extension>’

For example, ‘coco_train.json’ will be parsed to return ‘coco’ and ‘train’

Parameters:
  • annotations_file_path – name of the annotation file without extension, or path to the annotation file whose name will be parsed.

  • split_name_mapping – dictionary whose keys are the split names you want to appear in the lours dataset and whose values are lists of words each key should replace. For example, it can remap split name abbreviations to their full names so that “val” becomes “validation”. If set to None, variations of ‘train’, ‘valid’ and ‘eval’ are mapped to those names: ‘training’ is replaced by ‘train’, ‘val’ and ‘validation’ by ‘valid’, and ‘evaluation’ and ‘test’ by ‘eval’. Defaults to None.

Returns:

dataset name and split name. They can be None when parsing was not successful.

Return type:

tuple containing two names

Example

>>> parse_annotation_name("my_dataset_test")
('my_dataset', 'eval')
>>> parse_annotation_name("my_dataset_hello", {"hey": ["hello", "hi"]})
('my_dataset', 'hey')
>>> parse_annotation_name("my_dataset")
('my', 'dataset')
>>> parse_annotation_name("mydataset")
('mydataset', None)
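
The examples above are consistent with splitting the file stem on its last underscore and then applying the mapping. The sketch below reproduces that behaviour under this assumption. `parse_name_sketch` and `DEFAULT_SPLIT_MAPPING` are hypothetical names, and the default dictionary is only a transcription of the replacements listed in the parameter description.

```python
from __future__ import annotations

from pathlib import Path

# Assumed default, transcribed from the parameter description.
DEFAULT_SPLIT_MAPPING = {
    "train": ["training"],
    "valid": ["val", "validation"],
    "eval": ["evaluation", "test"],
}


def parse_name_sketch(
    annotations_file_path: str | Path,
    split_name_mapping: dict[str, list[str]] | None = None,
) -> tuple[str | None, str | None]:
    """Split '<dataset_name>_<split_name>' on the last underscore."""
    mapping = DEFAULT_SPLIT_MAPPING if split_name_mapping is None else split_name_mapping
    stem = Path(annotations_file_path).stem  # drops folders and the extension
    if "_" not in stem:
        return stem, None
    dataset_name, split_name = stem.rsplit("_", 1)
    for canonical, aliases in mapping.items():
        if split_name in aliases:
            return dataset_name, canonical
    # No alias matched: keep the split name as written.
    return dataset_name, split_name
```

Splitting on the last underscore explains why "my_dataset" alone is read as dataset 'my' with split 'dataset': the function cannot tell a split name from the tail of a dataset name.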

to_dataset_object(images_root: Path, label_map: dict[int, str] | None, images: Sequence[dict], annotations: Sequence[dict], box_format: str = 'cxwcyh', ids_map: dict[int, dict[str, Any]] | None = None) -> Dataset

Create the dataset object from aggregated lists of dictionaries

Parameters:
  • images_root – path where the images are located and from where relative paths are given

  • label_map – dictionary of category id vs category name

  • images – list of image dictionaries. Each dictionary is one image

  • annotations – list of annotation dictionaries

  • box_format – expected type of box format. See lours.utils.bbox_converter. Defaults to “cxwcyh”

  • ids_map – dictionary to remap classes back to their original id values. This is a special case for darknet, where the ids are almost always changed because they need to be sequential

Returns:

created dataset object with the right category ids