booleanize

Dataset.booleanize(column_names: str | Iterable[str] | None = None, missing_ok: bool = False, **possible_values: set) → Self

Convert the given columns of self.images or self.annotations from columns of lists into boolean columns, one per possible value.

See util.column_booleanize.booleanize()

Note

If a column name is present in both self.images and self.annotations, the column in self.images takes precedence.

Parameters:

column_names – name or names of the columns to convert. After conversion, each original column is dropped from the corresponding DataFrame.
missing_ok – if True, no KeyError is raised when a column name is in neither self.images nor self.annotations.
**possible_values – keyword arguments mapping a column name to its set of possible values. If a column name in column_names is absent from these keyword arguments, its possible values are deduced from the values occurring in the dataset.

Raises:

KeyError – if missing_ok is False and a given column name is in neither the self.images columns nor the self.annotations columns.
TypeError – if the possible values of a column have to be deduced and that column contains a value that is not iterable or is a string.
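
The deduction of possible values and the associated TypeError can be pictured with plain pandas. This is only a sketch under stated assumptions, not the library's implementation: `deduce_possible_values` is a hypothetical helper, and the exact error message is invented for illustration.

```python
from collections.abc import Iterable

import pandas as pd


def deduce_possible_values(column: pd.Series) -> set:
    """Hypothetical helper: collect every distinct value of a list-valued column."""
    values = set()
    for cell in column:
        # A string is iterable, but iterating it would yield single characters,
        # so it is rejected in the same way as a non-iterable cell.
        if isinstance(cell, str) or not isinstance(cell, Iterable):
            raise TypeError(f"cannot deduce possible values from cell {cell!r}")
        values.update(cell)
    return values


beyond = pd.Series(
    [["enough"], ["present", "successful"], ["present", "successful", "enough"]]
)
print(sorted(deduce_possible_values(beyond)))
# ['enough', 'present', 'successful']
```

Passing the possible values explicitly through `**possible_values` would skip this kind of deduction for the corresponding column.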

Returns:

A new dataset in which each converted column is replaced by boolean columns named {column_name}.{value}.

related tutorial

Example

>>> from lours.utils.doc_utils import dummy_dataset
>>> example = dummy_dataset(
...     n_imgs=3,
...     n_annot=3,
...     n_list_columns_images=[2, 3],
...     n_list_columns_annotations=1,
... )
>>> example
Dataset object containing 3 images and 3 objects
Name :
    inside_else_memory
Images root :
    such/serious
Images :
    width  height  ...                         beyond                father
id                 ...
0     342     167  ...                       [enough]  [challenge, someone]
1     377     114  ...          [present, successful]           [challenge]
2     136     257  ...  [present, successful, enough]  [challenge, someone]

[3 rows x 7 columns]
Annotations :
    image_id category_str  ...  box_height                                   where
id                         ...
0          2          why  ...  138.451739  [no, season, play, choice, force, bit]
1          1          why  ...   63.576932                     [no, choice, force]
2          2         step  ...   99.999123           [no, season, play, week, bit]

[3 rows x 9 columns]
Label map :
{15: 'step', 19: 'why', 25: 'interview'}
>>> modified = example.booleanize(column_names=["beyond", "where"])
>>> modified
Dataset object containing 3 images and 3 objects
Name :
    inside_else_memory
Images root :
    such/serious
Images :
    width  height  ... beyond.present beyond.successful
id                 ...
0     342     167  ...          False             False
1     377     114  ...           True              True
2     136     257  ...           True              True

[3 rows x 9 columns]
Annotations :
    image_id category_str  category_id  ... where.play  where.season  where.week
id                                      ...
0          2          why           19  ...       True          True       False
1          1          why           19  ...      False         False       False
2          2         step           15  ...       True          True        True

[3 rows x 15 columns]
Label map :
{15: 'step', 19: 'why', 25: 'interview'}
>>> modified.annotations.dtypes
image_id          int64
category_str     object
category_id       int64
split            object
box_x_min       float64
box_y_min       float64
box_width       float64
box_height      float64
where.bit          bool
where.choice       bool
where.force        bool
where.no           bool
where.play         bool
where.season       bool
where.week         bool
dtype: object
>>> modified.booleanized_columns
{'images': {'beyond'}, 'annotations': {'where'}}

>>> example.booleanize(beyond={"enough", "successful"})
Dataset object containing 3 images and 3 objects
Name :
    inside_else_memory
Images root :
    such/serious
Images :
    width  height  ... beyond.enough beyond.successful
id                 ...
0     342     167  ...          True             False
1     377     114  ...         False              True
2     136     257  ...          True              True

[3 rows x 8 columns]
Annotations :
    image_id category_str  ...  box_height                                   where
id                         ...
0          2          why  ...  138.451739  [no, season, play, choice, force, bit]
1          1          why  ...   63.576932                     [no, choice, force]
2          2         step  ...   99.999123           [no, season, play, week, bit]

[3 rows x 9 columns]
Label map :
{15: 'step', 19: 'why', 25: 'interview'}