common
Functions
construct_label_map | Construct label map from annotation DataFrame, with category_id and category_str columns.
convert_str | String converter tool to read a file, parse and automatically convert the string to integer or float if possible.
get_image_info | Get image information, either from image info dataframe or from image itself, getting the image dimension by reading its header
get_images_from_folder | Function to scrape all images in a folder, starting from a list of img formats
get_relative_image_path | Tool function to get relative path between dataset_path and image_path, which might be absolute.
parse_annotation_name | Deduce name of dataset and name of split by assuming it is in the form '<dataset_name>_<split_name>.<extension>'
to_dataset_object | Create the dataset object from aggregated lists of dictionaries
- construct_label_map(annotations: DataFrame) -> dict[int, str]
Construct label map from annotation DataFrame, with category_id and category_str columns. Get all category strings associated with each category id. Normally, there should be only one per id.
- Parameters:
annotations – DataFrame containing category id and category name information. Should contain at least category_id and category_str columns.
- Raises:
ValueError – Inconsistency in category ids and names. The id -> name mapping should be bijective.
- Returns:
dictionary containing label map, with category id as key, and category name as value
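The consistency check described above can be illustrated without pandas. The following is a minimal sketch, not the library's implementation: `label_map_sketch` is a hypothetical helper that takes plain `(category_id, category_str)` pairs, such as those obtained by zipping the two DataFrame columns. It checks both directions, which is the strict reading of "bijective"; the actual function may only check that each id has a single name.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def label_map_sketch(pairs: Iterable[tuple[int, str]]) -> dict[int, str]:
    """Build an id -> name map from (category_id, category_str) pairs.

    Hypothetical helper for illustration; pairs could come from
    zip(annotations["category_id"], annotations["category_str"]).
    """
    names_per_id: dict[int, set[str]] = defaultdict(set)
    ids_per_name: dict[str, set[int]] = defaultdict(set)
    for category_id, category_str in pairs:
        names_per_id[category_id].add(category_str)
        ids_per_name[category_str].add(category_id)

    # An id with several names, or a name shared by several ids, breaks bijectivity.
    clashes = {i: sorted(n) for i, n in names_per_id.items() if len(n) > 1}
    shared = {n: sorted(i) for n, i in ids_per_name.items() if len(i) > 1}
    if clashes or shared:
        raise ValueError(f"Inconsistent categories: {clashes or shared}")

    return {i: next(iter(n)) for i, n in sorted(names_per_id.items())}
```

Repeated identical pairs are harmless here, since each annotation row normally repeats the id and name of its category.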
- convert_str(string: str) -> str | int | float
String converter tool to read a file, parse and automatically convert the string to integer or float if possible. Will first try to convert to int, then float, then will return as is.
- Parameters:
string – string containing information to be parsed
- Returns:
converted string, in the most convenient format
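The conversion order described above (int first, then float, then the unchanged string) can be sketched as follows. `convert_str_sketch` is a hypothetical stand-in written for illustration and may differ from the library's own code.

```python
from __future__ import annotations


def convert_str_sketch(string: str) -> str | int | float:
    """Try int, then float, and fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(string)
        except ValueError:
            # This type does not fit, try the next one.
            continue
    return string
```

With this approach, `"42"` becomes an int, `"3.5"` and `"1e3"` become floats, and `"abc"` is returned unchanged.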
- get_image_info(image_number: int, relative_path: Path, absolute_path: Path | None, image_info: DataFrame | None = None) -> dict[str, Any]
Get image information, either from image info dataframe or from image itself, getting the image dimension by reading its header
- Parameters:
image_number – number of image in the file list. If image_info is not available, will be used for image id
relative_path – path that will be used to find the image in the image_info dataframe, if given
absolute_path – absolute path used to load the image data directly from the file. Can be None if image_info has an entry with the same relative_path value
image_info – DataFrame including image size and image id, for example to match the ids of another dataset. Must have at least relative_path, width and height columns. Defaults to None.
- Returns:
dictionary with width, height and id keys
- get_images_from_folder(folder_path: Path, img_formats: Iterable[str] = ('bmp', 'dng', 'jpeg', 'jpg', 'mpo', 'png', 'tif', 'tiff', 'webp', 'pfm')) -> list[Path]
Function to scrape all images in a folder, starting from a list of img formats
- Parameters:
folder_path – where to search images
img_formats – list of file extensions to consider during the globbing
- Returns:
list of all paths leading to an image with the desired extensions.
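A folder scan of this kind can be sketched with pathlib. This is an illustration only: `images_from_folder_sketch` is a hypothetical name, and the recursive search, the case-insensitive extension match, and the sorted output are assumptions rather than documented behavior of the library.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def images_from_folder_sketch(
    folder_path: Path,
    img_formats: Iterable[str] = ("jpg", "jpeg", "png"),
) -> list[Path]:
    """Collect files under folder_path whose extension is in img_formats."""
    # Normalize "png" and ".PNG" alike to ".png" (assumption: case-insensitive match).
    suffixes = {f".{fmt.lower().lstrip('.')}" for fmt in img_formats}
    return sorted(
        path
        for path in folder_path.rglob("*")  # assumption: subfolders are searched too
        if path.is_file() and path.suffix.lower() in suffixes
    )
```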
- get_relative_image_path(dataset_path: Path, image_path: Path | str) -> Path
Tool function to get relative path between dataset_path and image_path, which might be absolute. Used to populate the relative_path in the images dataframe of the dataset or evaluator object, which should ensure that dataset.images_root / relative_path always leads to a valid image file.
- Parameters:
dataset_path – root path of considered dataset
image_path – path of a particular image. May be absolute, in which case it needs to be converted to be relative to dataset_path
- Raises:
ValueError – image_path is not included in dataset_path. Probably means the dataset path is too specific and should be higher in file hierarchy
- Returns:
Converted image path to be relative to dataset path
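The conversion and its failure case can be sketched with `Path.relative_to`, which raises ValueError when one path is not located under the other. `relative_image_path_sketch` is a hypothetical helper for illustration; returning an already-relative path unchanged is an assumption, not documented behavior.

```python
from __future__ import annotations

from pathlib import Path


def relative_image_path_sketch(dataset_path: Path, image_path: Path | str) -> Path:
    """Express image_path relative to dataset_path."""
    image_path = Path(image_path)
    if not image_path.is_absolute():
        # Assumption: a relative path is already relative to the dataset root.
        return image_path
    try:
        return image_path.relative_to(dataset_path)
    except ValueError as err:
        raise ValueError(
            f"{image_path} is not inside {dataset_path}; "
            "the dataset path may need to be higher in the file hierarchy"
        ) from err
```

On a POSIX system, a dataset root of `/data/coco` and an image at `/data/coco/images/1.jpg` give `images/1.jpg`, while an image under `/other` raises ValueError.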
- parse_annotation_name(annotations_file_path: str | Path, split_name_mapping: dict[str, list[str]] | None = None) -> tuple[str | None, str | None]
Deduce name of dataset and name of split by assuming it is in the form ‘<dataset_name>_<split_name>.<extension>’
For example, ‘coco_train.json’ will be parsed to return ‘coco’ and ‘train’
- Parameters:
annotations_file_path – name of the annotation file without extension, or path to the annotation file whose name will be parsed.
split_name_mapping – Dictionary with split names you want to appear in the lours dataset as keys and a list of possible words you want this name to replace as values. For example, remap split names abbreviations to their full name so that “val” becomes “validation”. If set to None, will simply map variations of ‘train’, ‘valid’, ‘eval’ to them, i.e. ‘training’ gets replaced by ‘train’, ‘val’ and ‘validation’ get replaced by ‘valid’ and ‘evaluation’ and ‘test’ get replaced by ‘eval’. Defaults to None.
- Returns:
dataset name and split name. Either can be None if parsing was not successful.
- Return type:
tuple containing two names
Example
>>> parse_annotation_name("my_dataset_test")
('my_dataset', 'eval')
>>> parse_annotation_name("my_dataset_hello", {"hey": ["hello", "hi"]})
('my_dataset', 'hey')
>>> parse_annotation_name("my_dataset")
('my', 'dataset')
>>> parse_annotation_name("mydataset")
('mydataset', None)
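The parsing rule can be sketched as a split on the last underscore followed by an alias lookup. This is a minimal reconstruction that reproduces the documented examples, not the library's code: `parse_name_sketch` and `DEFAULT_SPLIT_NAMES` are hypothetical names, and having a custom mapping replace the default one entirely is an assumption.

```python
from __future__ import annotations

from pathlib import Path

# Default aliases as described in the parameter documentation above.
DEFAULT_SPLIT_NAMES = {
    "train": ["training"],
    "valid": ["val", "validation"],
    "eval": ["evaluation", "test"],
}


def parse_name_sketch(
    annotations_file_path: str | Path,
    split_name_mapping: dict[str, list[str]] | None = None,
) -> tuple[str | None, str | None]:
    """Split '<dataset_name>_<split_name>' and normalize the split name."""
    # Assumption: a user-provided mapping is used instead of the default one.
    mapping = DEFAULT_SPLIT_NAMES if split_name_mapping is None else split_name_mapping
    stem = Path(annotations_file_path).stem  # drops folders and the extension
    dataset_name, separator, split_name = stem.rpartition("_")
    if not separator:
        # No underscore: the whole stem is the dataset name, no split found.
        return stem, None
    for target, aliases in mapping.items():
        if split_name in aliases:
            split_name = target
            break
    return dataset_name, split_name
```

Splitting on the last underscore is what makes `"my_dataset"` parse as `('my', 'dataset')`: the sketch cannot tell a split name from the tail of a dataset name.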
- to_dataset_object(images_root: Path, label_map: dict[int, str] | None, images: Sequence[dict], annotations: Sequence[dict], box_format: str = 'cxwcyh', ids_map: dict[int, dict[str, Any]] | None = None) -> Dataset
Create the dataset object from aggregated lists of dictionaries
- Parameters:
images_root – path where the images are located and from where relative paths are given
label_map – dictionary of category id vs category name
images – list of image dictionaries. Each dictionary is one image
annotations – list of annotations dictionaries
box_format – expected type of box format. See lours.utils.bbox_converter. Defaults to “cxwcyh”
ids_map – dictionary to remap classes back to their original id values. This is a special case for darknet, where the ids are almost always changed because they need to be sequential
- Returns:
created dataset object with the right category ids