MaggotUBA backend adapter

Wrapper project to allow the Nyx tagger UI to call MaggotUBA.

This project heavily depends on the TaggingBackends package, which drives the development of automatic tagging backends.

Principle

MaggotUBA is an autoencoder trained on randomly sampled 20-time-step time segments drawn from the t5 and t15 databases, with a computational budget of 1000 training epochs. In its original "unsupervised" or self-supervised form, it reconstructs series of spines from a compressed latent representation.

For the automatic tagging, the encoder is extracted and a classifier is stacked atop the encoder. On the same dataset, the combined encoder and classifier are (re-)trained to predict discrete behaviors.

Prototypes and validated taggers

As a first prototype, the 20220418 tagger is based on a simple random forest classifier; only the classifier was trained, and the encoder was not retrained. See module maggotuba.models.randomforest.

It was trained on the entire t5+t15 database. No interpolation was performed, and the prototype therefore does not properly handle data recorded at different frame rates.

A second tagger called 20221005 involves a classifier with dense layers, and the encoder was fine-tuned while training the combined encoder+classifier. See modules maggotuba.models.trainers and maggotuba.models.modules.

This second tagger was dimensioned following a parametric exploration for the 6-behavior classification task: 2-second time segments, a 100-dimensional latent space and 3 dense layers.

It was trained on a subset of 5000 files from the t5 and t15 databases. Spines were linearly interpolated at 10 Hz within each time segment individually, and the same interpolation is applied at prediction time.

Usage

For installation, see TaggingBackends' README.

A MaggotUBA-based tagger is typically called using the poetry run tagging-backend command from the root directory of the backend's project.

All the command arguments supported by TaggingBackends are also supported by MaggotUBA-adapter.

Automatic tagging

Using the 20221005 branch, the 20221005 tagger can be called on a supported tracking data file with:

poetry run tagging-backend predict --model-instance 20221005 --skip-make-dataset

The --skip-make-dataset option is optional. It only makes tagging-backend slightly faster.

For the above command to work, the tracking data file must be placed (e.g. copied) into the data/raw/20221005 directory, which must first be created or emptied.

The resulting label file can be found as data/processed/20221005/predicted.label. Like all .label files, it should be stored as a sibling of the corresponding tracking data file (in the same directory).
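
For example, assuming the tracking data file is /path/to/experiment/trx.mat (a placeholder file name and path), the full round trip could look like:

# create or empty the input directory, and copy the tracking data file into it
mkdir -p data/raw/20221005
rm -rf data/raw/20221005/*
cp /path/to/experiment/trx.mat data/raw/20221005/

# run the tagger
poetry run tagging-backend predict --model-instance 20221005 --skip-make-dataset

# store the label file next to the original tracking data file
cp data/processed/20221005/predicted.label /path/to/experiment/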

Similarly, for an arbitrary tagger named, say, mytagger, all occurrences of 20221005 in the explanation above must be replaced by the tagger's name. For example, the input data file would go into data/raw/mytagger.

Retraining a tagger

A new model instance can be trained on a data repository, using the main or dev branch of MaggotUBA-adapter (the 20221005 branch is also suitable) with:

poetry run tagging-backend train --model-instance <tagger-name>

As with the predict command, for this one to work, the data repository must be made available in the data/raw/<tagger-name> directory (again, to be created or emptied beforehand).
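
For example, to train a tagger called mytagger from a local data repository (the source path below is a placeholder):

# create or empty the input directory, and copy the data repository into it
mkdir -p data/raw/mytagger
rm -rf data/raw/mytagger/*
cp -r /path/to/data/repository/. data/raw/mytagger/

poetry run tagging-backend train --model-instance mytagger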

The above command will first load a pretrained model (pretrained_models/default in MaggotUBA-adapter) to determine additional parameters, such as whether and at which frequency to interpolate the spines, or the window length.

The current default pretrained model involves linearly interpolating the spines at 10 Hz, and relies on a 20-time-step window (2 seconds). The dimensionality of the latent space is 100.

Alternative pretrained models can be specified using the --pretrained-model-instance option.
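
For example, assuming an alternative pretrained model is available as pretrained_models/custom (a hypothetical name; the option value is presumably the name of a sub-directory of pretrained_models):

poetry run tagging-backend train --model-instance mytagger --pretrained-model-instance custom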

The data files are discovered in the repository (more precisely, in data/raw/<tagger-name>) and behavior tags are counted. A subset of tags can be selected using the --labels option, followed by a list of comma-separated tags.
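
For example, to train on a subset of tags only (the tag names below are placeholders; use tags that actually occur in the data repository):

poetry run tagging-backend train --model-instance mytagger --labels run,cast,hunch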

A two-level balancing rule is followed to randomly select time segments, forming a training dataset stored as a larva_dataset HDF5 file. See also the make_dataset.py script.
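
If the HDF5 command-line tools are installed, the generated larva_dataset file can be quickly inspected (the file path below is a placeholder):

h5ls -r path/to/larva_dataset_file.hdf5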

Training proceeds in two steps: first, the dense-layer classifier is pretrained; second, the encoder and the classifier are fine-tuned together. See also the train_model.py script.

This generates a new sub-directory in the models directory of the MaggotUBA-adapter project, which makes the trained model discoverable for automatic tagging (predict command).
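
Once training has completed, the new tagger can be used for automatic tagging like any other model instance, e.g.:

poetry run tagging-backend predict --model-instance mytagger --skip-make-dataset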