In its original "unsupervised" or self-supervised form, it reconstructs series of spines.
For automatic tagging, the encoder is extracted and a classifier is stacked on top of it.
On the same dataset, the combined encoder and classifier are (re-)trained to predict discrete behaviors.
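The stacking and joint (re-)training described above can be sketched as follows. This is a minimal PyTorch sketch, not MaggotUBA-adapter's actual code: the module names, layer sizes, segment shape (20 frames × 10 spine coordinates), and 6-behavior label set are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the pretrained encoder: it maps a time segment
# of spine coordinates to a latent vector. Sizes are illustrative only.
class Encoder(nn.Module):
    def __init__(self, n_frames=20, n_coords=10, latent_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                               # (batch, 20, 10) -> (batch, 200)
            nn.Linear(n_frames * n_coords, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class Tagger(nn.Module):
    """Classifier stacked atop the encoder; optimizing the combined module
    (re-)trains the encoder weights together with the classification head."""
    def __init__(self, encoder, latent_dim=100, n_behaviors=6):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(latent_dim, n_behaviors)

    def forward(self, x):
        return self.head(self.encoder(x))

encoder = Encoder()      # in practice, loaded from the self-supervised pretraining
tagger = Tagger(encoder)
optimizer = torch.optim.Adam(tagger.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One supervised (re-)training step on a dummy batch of 8 labeled segments.
x = torch.randn(8, 20, 10)
y = torch.randint(0, 6, (8,))
loss = criterion(tagger(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the optimizer is given `tagger.parameters()`, the gradient step updates the encoder as well as the head; freezing the encoder instead would amount to training only the classifier, as in the first prototype below.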
## Prototypes and validated taggers
As a first prototype, the [`20220418`](https://gitlab.pasteur.fr/nyx/MaggotUBA-adapter/-/tree/20220418) model is based on a simple random forest classifier; only the classifier was trained, and the encoder was not retrained.
See module [`maggotuba.models.randomforest`](https://gitlab.pasteur.fr/nyx/MaggotUBA-adapter/-/blob/20220418/src/maggotuba/models/randomforest.py).
It was trained on the entire t5+t15 database. No interpolation was performed, so the prototype does not properly handle data recorded at different frame rates.
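Because the encoder stays frozen in this prototype, training reduces to fitting a random forest on precomputed latent vectors. A minimal scikit-learn sketch on synthetic data (the embedding dimension, sample count, and 6-behavior label set are assumptions for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stand-in for latent vectors produced by the frozen encoder:
# one 100-dimensional embedding per time segment.
embeddings = rng.normal(size=(300, 100))
labels = rng.integers(0, 6, size=300)  # 6 discrete behaviors (hypothetical)

# Only the classifier is trained; the encoder's weights are left untouched,
# so no gradient machinery is involved at this stage.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(embeddings, labels)
predicted = clf.predict(embeddings)
```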
A second tagger called [`20221005`](https://gitlab.pasteur.fr/nyx/MaggotUBA-adapter/-/tree/20221005) involves a classifier with dense layers, and the encoder was fine-tuned while training the combined encoder+classifier.
See modules [`maggotuba.models.trainers`](https://gitlab.pasteur.fr/nyx/MaggotUBA-adapter/-/blob/20221005/src/maggotuba/models/trainers.py) and [`maggotuba.models.modules`](https://gitlab.pasteur.fr/nyx/MaggotUBA-adapter/-/blob/20221005/src/maggotuba/models/modules.py).
This second tagger was dimensioned following a [parametric exploration for the 6-behavior classification task](https://gitlab.pasteur.fr/nyx/MaggotUBA-adapter/-/blob/design/notebooks/parametric_exploration_6-behavior_classification.ipynb): 2-second time segments, a 100-dimension latent space, and 3 dense layers.
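A head matching that dimensioning might look like the following sketch: a 100-dimension latent vector passed through 3 dense layers down to the behavior classes. The intermediate widths (50, 25) are illustrative guesses, not the tagger's actual sizes.

```python
import torch
import torch.nn as nn

# Hypothetical 3-dense-layer classification head on a 100-dim latent space.
head = nn.Sequential(
    nn.Linear(100, 50), nn.ReLU(),
    nn.Linear(50, 25), nn.ReLU(),
    nn.Linear(25, 6),   # 6 behaviors
)

scores = head(torch.randn(4, 100))  # one row of class scores per segment
```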
It was trained on a subset of 5000 files from the t5 and t15 databases. Spines are linearly interpolated at 10 Hz within each time segment individually.
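Per-segment interpolation can be sketched with `numpy.interp`, applied coordinate by coordinate. This is an illustrative sketch, not the tagger's code; the function name and array layout are assumptions.

```python
import numpy as np

def resample_segment(times, spines, fps=10.0):
    """Linearly interpolate spine coordinates onto a regular time grid.

    `times` holds the (possibly irregular) timestamps of one segment and
    `spines` is a (len(times), n_coords) array. Each segment is handled on
    its own, so recordings acquired at different frame rates all end up on
    the same 10 Hz grid.
    """
    grid = np.arange(times[0], times[-1] + 1e-9, 1.0 / fps)
    resampled = np.stack(
        [np.interp(grid, times, spines[:, i]) for i in range(spines.shape[1])],
        axis=1,
    )
    return grid, resampled

# A 2-second segment recorded at ~16 fps, resampled to 10 Hz (21 frames).
t = np.linspace(0.0, 2.0, 33)
coords = np.column_stack([np.sin(t), np.cos(t)])
grid, resampled = resample_segment(t, coords)
```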