Changelog (archived)#

Note

This CHANGELOG refers to the time this project was maintained internally by XXII under the name “Libia”. Since the commit history has been removed for security reasons, the chaneglog is kept for informational purpose and should not be modified. The new CHANGELOG is here

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[2.1.1 - 2024-06-21]#

Pin numpy version to be <2

[2.1.0 - 2024-06-21]#

Added#

Add Dataset.remove_invalid_images and Dataset.remove_invalid_annotations methods.
Add mark_origin and overwrite_origin options to Dataset.merge method
Add from_pascalVOC_detection and from_pascalVOC_generic functions to load pascal datasets
Add dataset_regression fixture for pytest that will test that datasets are the same
Add more examples to documentation

Fixed#

Fix spelling errors

Changed#

Upgrade minimum version to 3.10, so long python 3.9!
Upgrade pre-commit template and run it
Change most dataset method return types to Self instead of simply "Dataset"
Change classmethod Dataset.from_template to be a simple method. Note that this change is not breaking, as Dataset.from_template(input_dataset, **kwargs) is equivalent to input_dataset.from_template(**kwargs)
from_coco and from_crowdhuman both try to parse intelligently the annotation file path to extract both the dataset name and the split name, thanks to a new function libia.dataset.io.common.parse_annotation_name
Dataset.merge now automatically convert images root of a dataset to absolute if the other is also absolute
to_fiftyone methods (for dataset and evaluator) now accept a existing option to handle existing dataset. You can now erase the existing dataset before uploading yours, or raise an error if it exists. Possibly breaking : default behaviour of to_fiftyone methods was “update” and is now “error”
Dataset.match_index now accepts a dataset as well as an image dataframe like before
Dataset.remap_from_other now accepts remove_not_mapped and remove_emptied_images options to remove classes that are not present in the other dataset.
Evaluator now accepts a prediction label map that is neither a subset nor a superset of ground truth label map, and will assume only false negative and false positive for the not mutual classes.
dummy_dataset now accepts options keypoints_share and add_confidence to make crowd datasets and predictions
Dataset.add_annotations and annotations_appender.append now accepts more flexible attributes shapes, and then broadcast them together.

[2.0.1 - 2024-05-29]#

Added#

Add the possibility to test dataset equality modulo columns that are all NaNs
Add warning message when label map is incomplete, and complete it with the simple id -> str(id) mapping for missing ids
Add check_exhaustive option to Dataset.check and assert_images_valid functions

Fixed#

Fix c2p CLI tool to effectively remove a detection when it is modified
Dataset.remove_empty_images now keeps the dataset name
add docs for darknet IO
Suppress some FutureWarning from pandas during tests
fix bug for caipy when split is pd.NA instead of None or np.NaN
fix bug when loading caipy with splits_to_read set to non existing splits
Code spelling

[2.0.0 - 2024-04-02]#

Added#

Add input format option for COCO loading, making it possible to load XY coordinates instead of just bounding boxes
Add from_coco_keypoints function for loading COCO data with points and only one class.
Add compatibility with caipyjson tags and attributes, and more generally any kind of nested dictionary
Add column boooleanizer (and debooleanizer) to go from a list objects to columns of boolean value for better queries
Add Crowd detection evaluator with Mean Average Error metric for count
Add reindex function
Add from_mot function for loading datasets in MOT format. See https://motchallenge.net/instructions/
Add a method to compute confusion matrix for DetectionEvaluator
Add reindex function
Add yolov7 compatibility with a Dataset.to_yolov7 method.
Add automatic compliance with schema when saving to caipy
Add compatibility with caipy splits independently indexed
Add iterator helper methods to Dataset like Dataset.iter_images and Dataset.iter_splits to make it easier to iterate by a specific attribute
When loaded with a schema, from_caipy automatically set missing arrays to the empty list and other fields to their default value specified in the schema when at least one sample in the caipy folder has the field set to a particular value in its caipyjson file, avoiding NaN values in the resulting dataframe.
Add to_parquet and from_parquet method to save and load dataset efficiently with pyarrow.
Add dataframe booleanized columns broadcasting functions, useful for merging datasets
Add better error messages when calling check functions from utils.testing
Add remap_from_other method to remap label map to match another dataset.
Add realign_label_map argument in Dataset.merge to avoid incompatible label maps error
Add assert_columns_properly_normalized for caipy json reading
Add Dataset.empty() method to create the same dataset object as before, but with an empty dataframe of annotations. This is useful when creating a prediction dataset.
Add AnnotationAppender.reset() and AnnotationAppender.finish() methods to be able to use the annotation appender outside a context window
Add category_ids_mapping optional argument to AnnotationAppender and related functions in order to remap the category ids from predictions
Add flatten_paths to cAIpy export function, which lets you save a dataset without subfolders.
Add c2f standalone script to quickly open a caipy dataset into fiftyone
Add from_files function, similar to ``from_folder` but when you already know what files or file patterns you want in the root folder.
Add difftools in libia.utils to compute difference between datasets. Useful when we want to update something related to it (like fiftyone)
Add libia.utils.doc_utils for examples in docstring, with a dummy dataset creator
Add Examples in all methods of Dataset object.
Add Dataset.reset_index_from_mapping method to remap index of images and annotationbs dataframes
BREAKING Remove Dataset.reindex method and rename it Dataset.match_index to avoid confusion with pandas.reindex
Add “See Also” admonitions in many methods to link methods together and to see the related tutorial each time
Add schemas tutorial

Changed#

Caipy save is much faster
Up-to-date dependencies
from_coco function now has label_map option in case the categories field is empty in the input json
from_coco assumes category_id to be 0 in case it is absent from annotations fields. It will error if it’s not absent from ALL annotations though.
BREAKING Evaluator.predictions renamed to Evaluator.predictions_dictionary for better clarity
BREAKING DetectionEvaluator.compute_matches and DetectionEvaluator.compute_precision_recall have changed their predictions option to predictions_names for better clarity.
Dataset.merge now tries to fuse dataframes with overlapping ids, as long as the common subset is the same
Dataset.reset_index now accepts a start_image_id.
BREAKING Dataset.dataset_path is deprecated in favor Dataset.images_root, similar to Evaluator.
Introduce the optional dataset_name attribute to be used when dataset name is not the folder name of images root but can be deduced from the loader function, e.g. in from_caipy
dataset merging now merge image indexes before concatenating the annotations. Useful when merging a dataset with annotations and the same dataset with pre-annotations.
refactor dataset merge logic in a dedicated module
dataset addition falls back to realing_label_map in merge when a IncompatibleLabelMapsError is raised.
add create_split_folder option in dataset_to_darknet function and related Dataset methods, allowing to save all images of a particular split in its dedicated folder.
Dataset.get_split now accept None value to get all images with a null split value if needed.
BREAKING Dataset.remap_from_DataFrame renamed to Dataset.remap_from_dataframe
Replace warning types from UserWarning to the right warning type (DeprecationWarning or RuntimeWarning)
Add pandas style Dataset.loc, Dataset.iloc, Dataset.loc_annot and Dataset.iloc_annot indexers, along with filter_images and filter_annotations method.
Add record_fo_ids options in Dataset.to_fiftyone and DetectionEvaluator.to_fiftyone methods to keep track of fiftyone’s UUID of each corresponding image and annotation.
Add markdownlint pre-commit hook (and make markdown documents compliant with it)
Add --watch argument in caipy_to_fiftyone script to perform live update of fiftyone datasets each time a file is modified in the caipy dataset. Useful when constructing a dataset progressively.
Add start_annotations_id option to Dataset.reset_index method.
Add supplementary checks and formatting to the Dataset basic constructor.
Add more explanation on crowd counting tutorial.

Fixed#

Get split does not rely on split being present in annotations anymore
crowdhuman head visibility is unknown
Class remapping is now compatible when label map is only a subset of remap dict
PNG to JPG conversion now works for RGBA images (note that the Alpha channel will be lost)
to_yolov5 now automatically convert split values like eval and valid to their yolov5 accepted equivalent (resp. test and val)
fix DetectionEvalutator.matches being tied to the class instead of the instance.
fix dependencies problem: sklearn is in core dependencies and matplotlib in optional “plot-utils” group
fix yolov7 problem, image path in txt files are also absolute. Please don’t use yolov7 export if you don’t need to, the dataset specs are terrible.
diverse pycharm warnings fixed
type hint of from_folder improved
from_folder method does not crash when folder is empty, but returns an empty dataset with a warning.
Warnings and pyright errors from last pandas version are suppressed
Use tight layout for confusion matrix plot result
Use json normalize when loading COCO so that it can be converted to fiftyone
Skip processing steps when converting an empty dataset to fiftyone or when appending empty annotations to the dataset with the annotation appender context manager
Prevent annotations index to be reset when using annotations appender
Prevent loss of dataset name when calling merge, reset_index, remap_classes

Removed#

libia.model subpackage (dead legacy code) got deleted

[1.4.0] - 2023-02-01#

Added#

Add CrowdHuman loading module See https://www.crowdhuman.org/
Add darknet_generic loading module
Add more test to improve coverage
introduce a BBOX_COLUMN_NAMES convention for bounding column names in dataset’s annotation dataframe

Fixed#

sum of datasets is now functional and tested (was not working before)

[1.3.1] - 2023-01-16#

Fixed#

Fix bug regarding confidence subsampling for PR curves
Proper extremal point for PR curves
Caipy split stays to None if no split is given when loading and data is in root
Caipy save keep added attributes during runtime when saving

[1.3.0] - 2023-01-10#

Added#

Add remove empty images method to dataset
Add remove emptied images option in remap classes
Add remove not mapped classes option in remap classes (not mapped were always removed before)
Add f_scores_betas to compute all wanted F-scores, F1, F0.5, F2, etc…

Changed#

PR curves are now indexed by recall with 101 evenly spaced values between 0 and 1 by default. The old behaviour can be retrieved by setting the option index_column to None.
Reworked evaluation demo
Improved documentation

[1.2.0] - 2023-01-06#

Added#

Add bounding box converter
Add image folder io, when input is simply a folder with images, but no annotation
load caipy generic does not have to specify an image folder anymore
conversion to fiftyone for datasets and evaluators
bugfix regarding annotation index when it’s duplicated
group continuous data with either interval labels (by default), mid-point, mean point or median point

Changed#

BREAKING evaluation predictions and matches are now dictionaries and can be used to evaluate multiple predictions sets at the same time
BREAKING group type alias is now either a column or a ContinuousGroup object (a dictionary that does the same thing but with better checking)

Fixed#

Fix several failing pyright tests because pandas stubs was updated

[1.1.0] - 2022-11-17#

Added#

Add caipy generic format
Add testing module in utils
More thorough tests for io
More complete notebook for demo_dataset

Fixed#

pre-commit’s flake8 repo url was moved from gitlab to gitHub

[1.0.0] - 2022-11-04#

Added#

dataset evaluation tool : see tutorials/demo_evaluation
dataset split tool : see tutorials/demo_split
new code checkers, including pyright and pandas stubs

[0.2.0] - 2022-07-18#

Added#

Features: Merge, Class remapping, etc.