Changelog (archived)
Note
This CHANGELOG refers to the time this project was maintained internally by XXII under the name “Libia”. Since the commit history has been removed for security reasons, the changelog is kept for informational purposes and should not be modified. The new CHANGELOG is here
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[2.1.1] - 2024-06-21
Pin numpy version to <2
[2.1.0] - 2024-06-21
Added
Add `Dataset.remove_invalid_images` and `Dataset.remove_invalid_annotations` methods.
Add `mark_origin` and `overwrite_origin` options to `Dataset.merge` method
Add `from_pascalVOC_detection` and `from_pascalVOC_generic` functions to load Pascal VOC datasets
Add `dataset_regression` fixture for pytest that tests that datasets are the same
Add more examples to documentation
Fixed
Fix spelling errors
Changed
Upgrade minimum Python version to 3.10, so long Python 3.9!
Upgrade pre-commit template and run it
Change most dataset method return types to `Self` instead of simply `"Dataset"`
Change classmethod `Dataset.from_template` to be a regular method. Note that this change is not breaking, as `Dataset.from_template(input_dataset, **kwargs)` is equivalent to `input_dataset.from_template(**kwargs)`
`from_coco` and `from_crowdhuman` both try to intelligently parse the annotation file path to extract both the dataset name and the split name, thanks to a new function `libia.dataset.io.common.parse_annotation_name`
`Dataset.merge` now automatically converts the images root of a dataset to absolute if the other one is also absolute
`to_fiftyone` methods (for dataset and evaluator) now accept an `existing` option to handle an existing dataset. You can now erase the existing dataset before uploading yours, or raise an error if it exists. Possibly breaking: the default behaviour of `to_fiftyone` methods was “update” and is now “error”
`Dataset.match_index` now accepts a dataset as well as an image dataframe like before
`Dataset.remap_from_other` now accepts `remove_not_mapped` and `remove_emptied_images` options to remove classes that are not present in the other dataset.
`Evaluator` now accepts a prediction label map that is neither a subset nor a superset of the ground truth label map, and will count only false negatives and false positives for classes that are not common to both.
`dummy_dataset` now accepts options `keypoints_share` and `add_confidence` to make crowd datasets and predictions
`Dataset.add_annotations` and `annotations_appender.append` now accept more flexible attribute shapes and broadcast them together.
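The non-breaking claim for `from_template` rests on plain Python semantics: calling a regular method through its class, with the instance as first argument, binds that instance to `self`, exactly like the old classmethod call did. A minimal sketch with a toy stand-in class (its attributes are assumptions, not libia's actual `Dataset`):

```python
class Dataset:
    """Toy stand-in for a dataset class (hypothetical, not libia's implementation)."""

    def __init__(self, name, images_root):
        self.name = name
        self.images_root = images_root

    def from_template(self, **kwargs):
        """Build a new dataset reusing this one's attributes, overridden by kwargs."""
        params = {"name": self.name, "images_root": self.images_root}
        params.update(kwargs)
        return type(self)(**params)


template = Dataset(name="train", images_root="/data/images")
# Old-style call through the class: `template` is bound to `self`.
a = Dataset.from_template(template, name="predictions")
# New-style call on the instance: same result.
b = template.from_template(name="predictions")
```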
[2.0.1] - 2024-05-29
Added
Add the possibility to test dataset equality while ignoring columns that contain only NaNs
Add a warning message when the label map is incomplete, and complete it with the simple `id -> str(id)` mapping for missing ids
Add `check_exhaustive` option to `Dataset.check` and `assert_images_valid` functions
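A minimal sketch of the label map completion described above, assuming the label map is a plain `dict` from category id to class name. `complete_label_map` is a hypothetical helper, not libia's API, and the warning category used here is an assumption:

```python
import warnings


def complete_label_map(label_map, category_ids):
    """Hypothetical helper: fill ids missing from label_map with str(id), warning once."""
    missing = sorted(set(category_ids) - set(label_map))
    if missing:
        # Assumption: a UserWarning is emitted; the actual warning type may differ.
        warnings.warn(f"Label map is incomplete, missing ids: {missing}", UserWarning)
    completed = dict(label_map)  # do not mutate the caller's dict
    completed.update({cat_id: str(cat_id) for cat_id in missing})
    return completed


print(complete_label_map({0: "person"}, [0, 1, 3]))  # {0: 'person', 1: '1', 3: '3'}
```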
Fixed
Fix c2p CLI tool to actually remove a detection when it is modified
`Dataset.remove_empty_images` now keeps the dataset name
Add docs for darknet IO
Suppress some pandas `FutureWarning`s during tests
Fix bug for caipy when split is `pd.NA` instead of `None` or `np.NaN`
Fix bug when loading caipy with `splits_to_read` set to non-existing splits
Code spelling
[2.0.0] - 2024-04-02
Added
Add input format option for COCO loading, making it possible to load XY coordinates instead of just bounding boxes
Add `from_coco_keypoints` function for loading COCO data with points and only one class.
Add compatibility with caipyjson tags and attributes, and more generally any kind of nested dictionary
Add column booleanizer (and debooleanizer) to go from a column of list objects to boolean columns for better queries
Add crowd detection evaluator with Mean Absolute Error metric for count
Add reindex function
Add `from_mot` function for loading datasets in MOT format. See https://motchallenge.net/instructions/
Add a method to compute a confusion matrix for `DetectionEvaluator`
Add yolov7 compatibility with a `Dataset.to_yolov7` method.
Add automatic compliance with schema when saving to caipy
Add compatibility with independently indexed caipy splits
Add iterator helper methods to `Dataset` like `Dataset.iter_images` and `Dataset.iter_splits` to make it easier to iterate by a specific attribute
When loaded with a schema, `from_caipy` automatically sets missing arrays to the empty list and other fields to their default value specified in the schema when at least one sample in the caipy folder has the field set to a particular value in its caipyjson file, avoiding NaN values in the resulting dataframe.
Add `to_parquet` and `from_parquet` methods to save and load datasets efficiently with pyarrow.
Add dataframe booleanized columns broadcasting functions, useful for merging datasets
Add better error messages when calling check functions from `utils.testing`
Add `remap_from_other` method to remap the label map to match another dataset.
Add `realign_label_map` argument in `Dataset.merge` to avoid incompatible label maps error
Add `assert_columns_properly_normalized` for caipy json reading
Add `Dataset.empty()` method to create the same dataset object as before, but with an empty dataframe of annotations. This is useful when creating a prediction dataset.
Add `AnnotationAppender.reset()` and `AnnotationAppender.finish()` methods to be able to use the annotation appender outside a context manager
Add `category_ids_mapping` optional argument to `AnnotationAppender` and related functions in order to remap the category ids from predictions
Add `flatten_paths` to cAIpy export function, which lets you save a dataset without subfolders.
Add `c2f` standalone script to quickly open a caipy dataset in fiftyone
Add `from_files` function, similar to `from_folder` but for when you already know what files or file patterns you want in the root folder.
Add `difftools` in `libia.utils` to compute differences between datasets. Useful when we want to update something related to a dataset (like fiftyone)
Add `libia.utils.doc_utils` for examples in docstrings, with a dummy dataset creator
Add examples in all methods of the `Dataset` object.
Add `Dataset.reset_index_from_mapping` method to remap the index of images and annotations dataframes
BREAKING Rename `Dataset.reindex` method to `Dataset.match_index` to avoid confusion with `pandas.reindex`
Add “See Also” admonitions in many methods to link methods together and to point to the related tutorial each time
Add schemas tutorial
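The column booleanizer and debooleanizer entry describes a round trip between a column of lists and one boolean column per value. A minimal pandas sketch of that idea follows; the function names, the `column.value` naming scheme and the assumption of a unique index are hypothetical, not libia's implementation:

```python
import pandas as pd


def booleanize(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Replace a column of lists with one boolean column per distinct value."""
    exploded = df[column].explode()  # one row per list element, empty lists become NaN
    dummies = pd.get_dummies(exploded, prefix=column, prefix_sep=".", dtype=bool)
    # Collapse back to one row per original index (assumes a unique index).
    flags = dummies.groupby(level=0).max().astype(bool)
    return df.drop(columns=column).join(flags)


def debooleanize(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Inverse operation: gather `column.<value>` boolean columns back into lists."""
    prefix = column + "."
    flag_columns = [c for c in df.columns if c.startswith(prefix)]
    values = [c[len(prefix):] for c in flag_columns]
    lists = [
        [value for value, flag in zip(values, row) if flag]
        for row in df[flag_columns].itertuples(index=False)
    ]
    result = df.drop(columns=flag_columns)
    result[column] = pd.Series(lists, index=df.index, dtype=object)
    return result


annotations = pd.DataFrame({"image_id": [0, 1, 2], "tags": [["person", "car"], [], ["car"]]})
booleanized = booleanize(annotations, "tags")  # columns: image_id, tags.car, tags.person
# Boolean columns make queries simple, e.g. all annotations tagged "car":
with_car = booleanized[booleanized["tags.car"]]
```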
Changed
Caipy save is much faster
Up-to-date dependencies
`from_coco` function now has a `label_map` option in case the categories field is empty in the input json
`from_coco` assumes `category_id` to be 0 in case it is absent from annotations fields. It will raise an error unless it is absent from ALL annotations, though.
BREAKING `Evaluator.predictions` renamed to `Evaluator.predictions_dictionary` for better clarity
BREAKING `DetectionEvaluator.compute_matches` and `DetectionEvaluator.compute_precision_recall` have changed their `predictions` option to `predictions_names` for better clarity.
`Dataset.merge` now tries to fuse dataframes with overlapping ids, as long as the common subset is the same
`Dataset.reset_index` now accepts a `start_image_id`.
BREAKING `Dataset.dataset_path` is deprecated in favor of `Dataset.images_root`, similar to `Evaluator`.
Introduce the optional `dataset_name` attribute to be used when the dataset name is not the folder name of the images root but can be deduced from the loader function, e.g. in `from_caipy`
Dataset merging now merges image indexes before concatenating the annotations. Useful when merging a dataset with annotations and the same dataset with pre-annotations.
Refactor dataset merge logic into a dedicated module
Dataset addition falls back to `realign_label_map` in merge when an `IncompatibleLabelMapsError` is raised.
Add `create_split_folder` option in `dataset_to_darknet` function and related `Dataset` methods, allowing all images of a particular split to be saved in their dedicated folder.
`Dataset.get_split` now accepts a `None` value to get all images with a null split value if needed.
BREAKING `Dataset.remap_from_DataFrame` renamed to `Dataset.remap_from_dataframe`
Replace `UserWarning` with the appropriate warning type (`DeprecationWarning` or `RuntimeWarning`)
Add pandas-style `Dataset.loc`, `Dataset.iloc`, `Dataset.loc_annot` and `Dataset.iloc_annot` indexers, along with `filter_images` and `filter_annotations` methods.
Add `record_fo_ids` option in `Dataset.to_fiftyone` and `DetectionEvaluator.to_fiftyone` methods to keep track of the fiftyone UUID of each corresponding image and annotation.
Add markdownlint pre-commit hook (and make markdown documents compliant with it)
Add `--watch` argument in `caipy_to_fiftyone` script to perform live updates of fiftyone datasets each time a file is modified in the caipy dataset. Useful when constructing a dataset progressively.
Add `start_annotations_id` option to `Dataset.reset_index` method.
Add supplementary checks and formatting to the `Dataset` basic constructor.
Add more explanation to the crowd counting tutorial.
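The `start_image_id` and `start_annotations_id` options of `Dataset.reset_index` suggest renumbering both dataframes contiguously from chosen starting values while keeping annotations pointing at the right images. A minimal pandas sketch, assuming images are indexed by image id and annotations carry an `image_id` column; the function below is hypothetical, not libia's implementation:

```python
import pandas as pd


def reset_index(images, annotations, start_image_id=0, start_annotations_id=0):
    """Renumber images and annotations contiguously, remapping annotations' image ids."""
    # Old image id -> new image id, following the current order of the images dataframe.
    mapping = pd.Series(
        range(start_image_id, start_image_id + len(images)), index=images.index
    )
    new_images = images.copy()
    new_images.index = mapping.to_numpy()
    # Series.map with a Series looks values up in its index: old id -> new id.
    new_annotations = annotations.assign(image_id=annotations["image_id"].map(mapping))
    new_annotations.index = range(start_annotations_id, start_annotations_id + len(annotations))
    return new_images, new_annotations


images = pd.DataFrame({"file_name": ["a.jpg", "b.jpg"]}, index=[10, 42])
annotations = pd.DataFrame({"image_id": [42, 10, 42], "category_id": [0, 1, 0]}, index=[7, 3, 9])
new_images, new_annotations = reset_index(images, annotations, start_image_id=1)
```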
Fixed
Get split no longer relies on the split being present in annotations
CrowdHuman head visibility is unknown
Class remapping now works when the label map is only a subset of the remap dict
PNG to JPG conversion now works for RGBA images (note that the alpha channel will be lost)
`to_yolov5` now automatically converts split values like `eval` and `valid` to their yolov5-accepted equivalents (`test` and `val` respectively)
Fix `DetectionEvaluator.matches` being tied to the class instead of the instance.
Fix dependency problem: sklearn is in core dependencies and matplotlib in the optional “plot-utils” group
Fix yolov7 problem: image paths in txt files are now also absolute. Please don’t use yolov7 export unless you need to, the dataset specs are terrible.
Various PyCharm warnings fixed
Type hint of `from_folder` improved
`from_folder` method does not crash when the folder is empty, but returns an empty dataset with a warning.
Warnings and pyright errors from the latest pandas version are suppressed
Use tight layout for confusion matrix plot result
Use json normalize when loading COCO so that it can be converted to fiftyone
Skip processing steps when converting an empty dataset to fiftyone or when appending empty annotations to the dataset with the annotation appender context manager
Prevent the annotations index from being reset when using the annotations appender
Prevent loss of dataset name when calling `merge`, `reset_index` or `remap_classes`
Removed
`libia.model` subpackage (dead legacy code) was deleted
[1.4.0] - 2023-02-01
Added
Add CrowdHuman loading module. See https://www.crowdhuman.org/
Add `darknet_generic` loading module
Add more tests to improve coverage
Introduce a `BBOX_COLUMN_NAMES` convention for bounding box column names in the dataset’s annotation dataframe
Fixed
Sum of datasets is now functional and tested (was not working before)
[1.3.1] - 2023-01-16
Fixed
Fix bug regarding confidence subsampling for PR curves
Proper extremal point for PR curves
Caipy split stays None if no split is given when loading and data is in root
Caipy save keeps attributes added during runtime
[1.3.0] - 2023-01-10
Added
Add remove empty images method to dataset
Add remove emptied images option in remap classes
Add remove not mapped classes option in remap classes (not mapped classes were always removed before)
Add `f_scores_betas` to compute all wanted F-scores: F1, F0.5, F2, etc.
Changed
PR curves are now indexed by recall with 101 evenly spaced values between 0 and 1 by default. The old behaviour can be retrieved by setting the option `index_column` to None.
Reworked evaluation demo
Improved documentation
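For reference, the F-scores listed above are instances of the standard F-beta score combining precision $P$ and recall $R$; presumably `f_scores_betas` evaluates it once per requested $\beta$. $\beta = 1$ gives F1, $\beta < 1$ weighs precision more heavily and $\beta > 1$ weighs recall more heavily:

```latex
F_\beta = (1 + \beta^2)\,\frac{P \cdot R}{\beta^2 P + R},
\qquad F_1 = \frac{2PR}{P + R}
```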
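Indexing a PR curve by 101 evenly spaced recall values means resampling precision on the grid 0, 0.01, ..., 1. The sketch below uses COCO-style interpolation (precision at recall r is the best precision reached at any recall ≥ r, and 0 if r is never reached); this interpolation rule and the function name are assumptions, not necessarily what libia does:

```python
import numpy as np


def index_pr_curve_by_recall(recall, precision, n_points=101):
    """Resample a PR curve on n_points evenly spaced recall values in [0, 1]."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    order = np.argsort(recall, kind="stable")
    recall, precision = recall[order], precision[order]
    # envelope[i] = max(precision[i:]): best precision at any recall >= recall[i].
    envelope = np.maximum.accumulate(precision[::-1])[::-1]
    recall_index = np.linspace(0.0, 1.0, n_points)
    # First sorted position whose recall reaches each target recall value.
    positions = np.searchsorted(recall, recall_index, side="left")
    resampled = np.zeros(n_points)
    reached = positions < len(recall)
    resampled[reached] = envelope[positions[reached]]
    return recall_index, resampled


recall_index, precision_at_recall = index_pr_curve_by_recall([0.5, 1.0], [1.0, 0.5])
```

A fixed recall grid makes curves from different prediction sets directly comparable row by row.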
[1.2.0] - 2023-01-06
Added
Add bounding box converter
Add image folder IO, when input is simply a folder with images, but no annotation
Loading caipy generic no longer requires specifying an image folder
Conversion to fiftyone for datasets and evaluators
Bugfix regarding annotation index when it’s duplicated
Group continuous data with either interval labels (by default), mid-point, mean point or median point
Changed
BREAKING evaluation predictions and matches are now dictionaries and can be used to evaluate multiple prediction sets at the same time
BREAKING group type alias is now either a column or a `ContinuousGroup` object (a dictionary that does the same thing but with better checking)
Fixed
Fix several failing pyright tests caused by an update of pandas stubs
[1.1.0] - 2022-11-17
Added
Add caipy generic format
Add testing module in utils
More thorough tests for IO
More complete notebook for demo_dataset
Fixed
Pre-commit’s flake8 repo URL moved from GitLab to GitHub
[1.0.0] - 2022-11-04
Added
Dataset evaluation tool: see tutorials/demo_evaluation
Dataset split tool: see tutorials/demo_split
New code checkers, including pyright and pandas stubs
[0.2.0] - 2022-07-18
Added
Features: merge, class remapping, etc.