# MaggotUBA backend adapter
Wrapper project to allow the Nyx tagger UI to call MaggotUBA.
This project heavily depends on the TaggingBackends package that drives the development of automatic tagging backends.
## Principle
MaggotUBA is an autoencoder trained on randomly sampled 20-time-step time segments drawn from the t5 and t15 databases, with a computational budget of 1000 training epochs. In its original "unsupervised" or self-supervised form, it reconstructs series of spines from a compressed latent representation.
For the automatic tagging, the encoder is extracted and a classifier is stacked atop the encoder. On the same dataset, the combined encoder and classifier are (re-)trained to predict discrete behaviors.
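The sketch below illustrates this encoder-plus-classifier stacking with PyTorch. Class names, shapes and layer sizes are made up for the example and do not correspond to the actual modules in `maggotuba.models.modules`.

```python
# Illustrative sketch only; not the actual MaggotUBA implementation.
import torch
import torch.nn as nn

class SpineEncoder(nn.Module):
    """Stand-in for the pretrained encoder: maps a window of spine
    coordinates to a compressed latent vector."""
    def __init__(self, n_input_features, latent_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_input_features, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class BehaviorTagger(nn.Module):
    """Encoder with a classification head stacked on top of the latent space."""
    def __init__(self, encoder, latent_dim=100, n_behaviors=6):
        super().__init__()
        self.encoder = encoder
        self.classifier = nn.Sequential(
            nn.Linear(latent_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_behaviors),
        )

    def forward(self, x):
        return self.classifier(self.encoder(x))

# dummy usage: 20-time-step windows of 10 spine coordinates
encoder = SpineEncoder(n_input_features=20 * 10)
tagger = BehaviorTagger(encoder)
scores = tagger(torch.randn(8, 20, 10))   # batch of 8 dummy segments
```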
## Prototypes and validated taggers
As a first prototype, the `20220418` trained model is based on a simple random forest classifier; only the classifier was trained, and the encoder was not retrained. See the `maggotuba.models.randomforest` module.

It was trained on the entire t5+t15 database. No interpolation was performed, and this prototype does not properly handle data with different frame rates.
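As an illustration of this first prototype's design, the following sketch fits a scikit-learn random forest on latent vectors produced by a frozen encoder; the random arrays stand in for actual encoder outputs and behavior tags, and nothing below is taken from `maggotuba.models.randomforest`.

```python
# Minimal sketch: a random forest classifier trained on latent features
# from a frozen (not retrained) encoder; data below are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
latents = rng.normal(size=(500, 100))   # pretend encoder outputs (100-d latent space)
labels = rng.integers(0, 6, size=500)   # pretend behavior tags (6 classes)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(latents, labels)                # only the classifier is fitted
print(clf.predict(latents[:5]))
```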
A second tagger, called `20221005`, involves a classifier with dense layers, and the encoder was fine-tuned while training the combined encoder+classifier. See the `maggotuba.models.trainers` and `maggotuba.models.modules` modules.
This second tagger was dimensioned following a parametric exploration for the 6-behavior classification task: 2-second time segments, 100-dimensional latent space, 3 dense layers.

It was trained on a subset of 5000 files from the t5 and t15 databases. Spines are linearly interpolated at 10 Hz within each time segment individually, both at training and at prediction time.
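The snippet below shows what per-segment linear resampling at 10 Hz can look like with NumPy; the array layout and function name are assumptions, not the backend's actual code.

```python
# Sketch of per-segment linear interpolation at 10 Hz (illustrative only).
import numpy as np

def resample_segment(times, spines, freq=10.0):
    """Linearly interpolate spine coordinates onto a regular 1/freq grid
    spanning this segment only (each segment is handled independently)."""
    new_times = np.arange(times[0], times[-1] + 1e-9, 1.0 / freq)
    # spines: (n_frames, n_coordinates); interpolate each coordinate column
    resampled = np.column_stack(
        [np.interp(new_times, times, spines[:, k]) for k in range(spines.shape[1])]
    )
    return new_times, resampled

# toy example: 7 frames at ~6.7 Hz resampled to 10 Hz
t = np.linspace(0.0, 0.9, 7)
spines = np.random.rand(7, 10)   # e.g. 5 spine points x 2 coordinates, flattened
new_t, new_spines = resample_segment(t, spines)
```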
## Usage
For installation, see TaggingBackends' README.
A MaggotUBA-based tagger is typically called using the `poetry run tagging-backend` command from the root directory of the backend's project.

All the command arguments supported by TaggingBackends are also supported by MaggotUBA-adapter.
### Automatic tagging
Using the `20221005` branch, the `20221005` tagger can be called on a supported tracking data file with:
poetry run tagging-backend predict --model-instance 20221005 --skip-make-dataset
The `--skip-make-dataset` option is optional; it only makes `tagging-backend` run slightly faster.
For the above command to work, the track data file must be placed (e.g. copied) in the `data/raw/20221005` directory, which must first be created or emptied.

The resulting label file can be found as `data/processed/20221005/predicted.label`. Like all *.label* files, it should be stored as a sibling of the corresponding track data file (in the same directory).
Similarly, for an arbitrary tagger named, say, `mytagger`, all occurrences of `20221005` in the explanation above must be replaced by the tagger's name. For example, the input data file would go into `data/raw/mytagger`.
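Putting these steps together, a minimal Python sketch of the predict workflow could look as follows; the input file path is hypothetical, and only standard-library calls and the documented command-line arguments are used.

```python
# Sketch of the predict workflow: stage the input file, run the tagger,
# then place the resulting .label file next to the track data file.
import shutil
import subprocess
from pathlib import Path

tagger = "20221005"                               # or any other tagger name
track_file = Path("/path/to/experiment/trx.mat")  # hypothetical track data file

backend_root = Path(".")                          # root of the MaggotUBA-adapter project
raw_dir = backend_root / "data" / "raw" / tagger
shutil.rmtree(raw_dir, ignore_errors=True)        # clear any previous input
raw_dir.mkdir(parents=True)
shutil.copy(track_file, raw_dir)

subprocess.run(
    ["poetry", "run", "tagging-backend", "predict",
     "--model-instance", tagger, "--skip-make-dataset"],
    cwd=backend_root, check=True)

label_file = backend_root / "data" / "processed" / tagger / "predicted.label"
shutil.copy(label_file, track_file.with_name("predicted.label"))
```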
### Retraining a tagger
A new model instance can be trained on a data repository, using the `main` or `dev` branch of MaggotUBA-adapter (the `20221005` branch is also suitable), with:
poetry run tagging-backend train --model-instance <tagger-name>
As with the `predict` command, the data repository must be made available in the `data/raw/<tagger-name>` directory (again, to be created or emptied first).
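For instance, staging a (hypothetical) data repository before training could be done along these lines:

```python
# Sketch only: mirror the data repository under data/raw/<tagger-name>.
import shutil
from pathlib import Path

tagger = "mytagger"                             # hypothetical tagger name
data_repository = Path("/path/to/repository")   # hypothetical data repository

raw_dir = Path("data") / "raw" / tagger
shutil.rmtree(raw_dir, ignore_errors=True)      # clear any previous content
shutil.copytree(data_repository, raw_dir)       # make the repository available
```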
The above command first loads a pretrained model (`pretrained_models/default` in MaggotUBA-adapter) to determine additional parameters, such as whether to interpolate the spines and at which frequency, or the window length.
The current default pretrained model involves linearly interpolating the spines at 10 Hz, and relies on a 20-time-step window (2 seconds). The dimensionality of the latent space is 100.
Alternative pretrained models can be specified using the `--pretrained-model-instance` option.
The data files are discovered in the repository (more precisely, in `data/raw/<tagger-name>`) and behavior tags are counted.
A subset of tags can be specified using the `--labels` option, followed by a comma-separated list of tags.
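As an example, a train invocation combining both optional arguments might look like the following; the tagger name and the behavior tag names are made up and must be adapted to the data at hand.

```python
# Example train invocation using the documented arguments; run from the
# root directory of the MaggotUBA-adapter project.
import subprocess

subprocess.run(
    ["poetry", "run", "tagging-backend", "train",
     "--model-instance", "mytagger",
     "--pretrained-model-instance", "default",
     "--labels", "run,cast,back,hunch,roll,stop"],   # illustrative tag names
    check=True)
```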
A two-level balancing rule is followed to randomly select time segments and thus form a training dataset, in the shape of a *larva_dataset* HDF5 file. See also the `make_dataset.py` script.
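The exact selection rule is implemented in `make_dataset.py`. Purely as an illustration of what a two-level balancing scheme can look like (first a budget per behavior tag, then an even split across source files within each tag), and not as a description of the actual algorithm:

```python
# Illustrative two-level balanced sampling; NOT the backend's actual rule.
import random
from collections import defaultdict

def balanced_selection(segments, per_tag_budget, seed=0):
    """segments: list of (tag, source_file, segment) tuples."""
    rng = random.Random(seed)
    by_tag = defaultdict(lambda: defaultdict(list))
    for tag, source, segment in segments:
        by_tag[tag][source].append(segment)
    selection = []
    for tag, by_file in by_tag.items():          # level 1: cap each behavior tag
        files = list(by_file)
        per_file_budget = max(1, per_tag_budget // len(files))
        for source in files:                     # level 2: spread across source files
            pool = by_file[source]
            k = min(per_file_budget, len(pool))
            selection.extend(rng.sample(pool, k))
    return selection
```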
Training operates in two steps: the dense-layer classifier is first pretrained, and then the encoder and the classifier are fine-tuned simultaneously. See also the `train_model.py` script.
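The following PyTorch schematic illustrates this two-step procedure; it assumes an encoder+classifier module like the sketch in the Principle section and is not `train_model.py` itself.

```python
# Schematic two-step training: pretrain the head, then fine-tune everything.
import torch

def train_two_steps(model, data_loader, n_pretrain_epochs, n_finetune_epochs):
    criterion = torch.nn.CrossEntropyLoss()

    # Step 1: pretrain the classifier head with the encoder frozen.
    for p in model.encoder.parameters():
        p.requires_grad = False
    optimizer = torch.optim.Adam(model.classifier.parameters())
    for _ in range(n_pretrain_epochs):
        for x, y in data_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

    # Step 2: unfreeze the encoder and fine-tune encoder and classifier together.
    for p in model.encoder.parameters():
        p.requires_grad = True
    optimizer = torch.optim.Adam(model.parameters())
    for _ in range(n_finetune_epochs):
        for x, y in data_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
```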
This generates a new sub-directory in the `models` directory of the MaggotUBA-adapter project, which makes the trained model discoverable for automatic tagging (the `predict` command).