See modules such as [`maggotuba.models.trainers`](https://gitlab.pasteur.fr/nyx/MaggotUBA-adapter).

This second tagger was dimensioned following a [parametric exploration for the 6-behavior classification task](https://gitlab.pasteur.fr/nyx/MaggotUBA-adapter/-/blob/design/notebooks/parametric_exploration_6-behavior_classification.ipynb): 2-second time segments, 100-dimension latent space, 3 dense layers.
It was trained on a subset of 5000 files from the t5 and t15 databases. Spines are linearly interpolated at 10 Hz in each time segment individually, both at training and at prediction time.

## Usage
A MaggotUBA-based tagger is typically called using the `poetry run scripts/tagging-backend` command from the `TaggingBackends` project.
All the [command arguments supported by `TaggingBackends`](https://gitlab.pasteur.fr/nyx/TaggingBackends/-/blob/dev/src/taggingbackends/main.py) are also supported by `MaggotUBA-adapter`.
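For a quick sanity check, the entry point can be invoked with `--help`; note that the availability and exact output of this flag are an assumption here, not documented behavior:

```
# Hypothetical: list the supported subcommands and options,
# assuming the TaggingBackends entry point honors --help.
poetry run scripts/tagging-backend --help
```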
### Automatic tagging
Using the [`20221005`](https://gitlab.pasteur.fr/nyx/MaggotUBA-adapter/-/tree/20221005) branch, the `20221005` tagger can be called on a supported tracking data file with:
```
poetry run scripts/tagging-backend predict --backend path/to/maggotuba-adapter --model-instance 20221005 path/to/datafile
```
### Retraining a tagger
A new model instance can be trained on a data repository using the `dev` branch of `MaggotUBA-adapter` (soon to become the `main` branch; the `20221005` branch is also suitable) with:
```
poetry run scripts/tagging-backend train --backend path/to/maggotuba-adapter --model-instance tagger-name path/to/repository
```
This first loads a pretrained model (`pretrained_models/default` in `MaggotUBA-adapter`) to determine additional parameters, such as whether to interpolate the spines, and if so at which frequency, as well as the window length.
The current default pretrained model linearly interpolates the spines at 10 Hz and relies on a 20-time-step window (2 seconds); its latent space has 100 dimensions.
Alternative pretrained models can be specified using the `--pretrained-model-instance` option.
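For instance, the retraining command above can name the pretrained instance it starts from explicitly. In this sketch, `my-tagger` and the paths are placeholders; `default` is the instance shipped under `pretrained_models`:

```
# Retrain from an explicitly named pretrained instance.
# The option is documented above; the argument layout is otherwise unchanged.
poetry run scripts/tagging-backend train --backend path/to/maggotuba-adapter --model-instance my-tagger --pretrained-model-instance default path/to/repository
```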
The data files are discovered in the repository and behavior tags are counted.
A subset of tags can be specified using the `--labels` option.
A two-level balancing rule is applied to randomly select time segments, which together form a training dataset stored as a *larva_dataset* HDF5 file.
See also the [`make_dataset.py`](https://gitlab.pasteur.fr/nyx/MaggotUBA-adapter/-/blob/20221005/src/maggotuba/data/make_dataset.py) script.
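As an illustration, the sketch below restricts training to a subset of tags and then inspects the resulting *larva_dataset* file. The tag names, the comma-separated value format of `--labels`, and the dataset file location are assumptions, not confirmed syntax:

```
# Sketch: train on three behavior tags only (tag names are hypothetical).
poetry run scripts/tagging-backend train --backend path/to/maggotuba-adapter --model-instance my-tagger --labels run,cast,back path/to/repository

# Optional: inspect the generated larva_dataset file (hypothetical path),
# e.g. with h5ls from the standard HDF5 command-line tools.
h5ls -r path/to/larva_dataset.hdf5
```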
Training proceeds in two steps: the dense-layer classifier is pretrained first, then the encoder and classifier are fine-tuned simultaneously.
See also the [`train_model.py`](https://gitlab.pasteur.fr/nyx/MaggotUBA-adapter/-/blob/20221005/src/maggotuba/models/train_model.py) script.
This generates a new sub-directory in the `models` directory of the `MaggotUBA-adapter` project, which makes the trained model available for automatic tagging (*predict* command).
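The newly trained instance can then be used for automatic tagging exactly like the stock `20221005` tagger, by passing its name to the *predict* command:

```
# Tag a data file with the retrained model; tagger-name must match
# the --model-instance value used at training time.
poetry run scripts/tagging-backend predict --backend path/to/maggotuba-adapter --model-instance tagger-name path/to/datafile
```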