Commit 103fc304 authored by François LAURENT

Merge branch 'dev' into main

parents 5e6705de da969257
@@ -6,7 +6,7 @@ This project heavily depends on the [`TaggingBackends`](https://gitlab.pasteur.fr/nyx/TaggingBackends)
## Principle
MaggotUBA is an autoencoder trained on randomly sampled 20-time-step time segments drawn from the t5 and t15 databases (from t15 only for the current default), with a computational budget of 1,000 training epochs (10,000 for the current default).
In its original "unsupervised" or self-supervised form, it reconstructs series of spines from a compressed latent representation.
For the automatic tagging, the encoder is combined with a classifier.
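For intuition, below is a minimal sketch of such an encoder-plus-classifier combination (module names, layer sizes and the number of classes are illustrative assumptions, not the actual MaggotUBA code; `n_features=10` and `len_traj=20` match the configuration shown further down):
```
import torch.nn as nn

# Hypothetical sketch: a 20-time-step spine series (10 features per step)
# is compressed into a latent vector, then classified by a dense head.
class SpineEncoder(nn.Module):
    def __init__(self, n_features=10, len_traj=20, latent_dim=25):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                             # (batch, 20, 10) -> (batch, 200)
            nn.Linear(n_features * len_traj, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),               # compressed representation
        )

    def forward(self, x):
        return self.net(x)

class MaggotTagger(nn.Module):
    def __init__(self, encoder, latent_dim=25, n_classes=6):  # n_classes is assumed
        super().__init__()
        self.encoder = encoder                        # pretrained, then fine-tuned
        self.head = nn.Linear(latent_dim, n_classes)  # single dense layer

    def forward(self, x):
        return self.head(self.encoder(x))
```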
@@ -56,6 +56,19 @@ As a stronger default tagger, the `small_motion` was reintroduced to lower the d…
The `20230111` tagger uses a 2-s time window, features 25 latent dimensions and a single dense layer as classifier.
It applies a post-prediction rule referred to as *ABC -> AAC* that corrects all single-step actions with the previous action.
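In other words, any action spanning a single time step is overwritten with the action immediately before it. A minimal sketch of such a filter, assuming per-step labels come as a plain list (the function name is hypothetical, not the actual implementation):
```
def abc_to_aac(labels):
    """Replace single-step actions with the previous action.

    E.g. ['run', 'bend', 'run'] -> ['run', 'run', 'run'].
    """
    labels = list(labels)
    for i in range(1, len(labels) - 1):
        if labels[i] != labels[i - 1] and labels[i] != labels[i + 1]:
            labels[i] = labels[i - 1]
    return labels
```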
#### `20230129`
The previous tagger, `20230111`, revealed a [temporal leakage issue](https://gitlab.pasteur.fr/nyx/larvatagger.jl/-/issues/88) that might have affected all previous taggers.
A similar tagger called `20230129` has been proposed to mitigate this issue.
This tagger shares most characteristics with `20230111` but differs in three important aspects:
* the number of training epochs was increased from 1,000 to 10,000, to let the original features be largely forgotten,
* the training stage involved more data: 1,200,235 time segments were used instead of 100,000; these data were unbalanced, and training was performed with the newly introduced balancing strategy `auto` (see https://gitlab.pasteur.fr/nyx/larvatagger.jl/-/issues/92 and the sketch below),
* pretraining and training data were drawn from t15 only (unlike previous taggers, which were pretrained and trained on data from both t15 and t5).
Note that the last difference was not meant to improve performance at all. The `20230129` tagger was trained this way to study its performance on t5 data, and was kept as is after it proved more accurate on t5 data than previous taggers that had been trained on t5 data in addition to t15 data.
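The exact semantics of the `auto` balancing strategy are defined in `TaggingBackends` (see the issue linked above) and are not reproduced here. As a rough, purely illustrative sketch, class imbalance can be compensated with sampling weights inversely proportional to class frequency:
```
from collections import Counter

def inverse_frequency_weights(labels):
    # Illustration only; not the actual `auto` strategy.
    # Rare classes receive proportionally larger sampling weights.
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

# e.g. ['run', 'run', 'run', 'bend'] -> [1/3, 1/3, 1/3, 1.0]
```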
## Usage
For installation, see [TaggingBackends' README](https://gitlab.pasteur.fr/nyx/TaggingBackends/-/tree/dev#recommended-installation).
@@ -66,20 +79,20 @@ All the [command arguments supported by `TaggingBackends`](https://gitlab.pasteur.fr/nyx/TaggingBackends)
### Automatic tagging
Using the [`20230129`](https://gitlab.pasteur.fr/nyx/MaggotUBA-adapter/-/tree/20230129) branch, the `20230129` tagger can be called on a supported tracking data file with:
```
poetry run tagging-backend predict --model-instance 20230129
```
Note: since `TaggingBackends==0.10`, `--skip-make-dataset` is the default behavior. Pass `--make-dataset` instead to enforce the former default.
For the above command to work, the track data file must be placed (*e.g.* copied) in the `data/raw/20230129` directory, which must first be created or cleared.
The resulting label file can be found as *data/processed/20230129/predicted.label*.
Like all *.label* files, this file should be stored as a sibling of the corresponding track data file (in the same directory).
Similarly, with an arbitrary tagger named, say, *mytagger*, all occurrences of `20230129` or *20230129* in the above explanation must be replaced by the tagger's name.
For example, the input data file would go into *data/raw/mytagger*.
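For example, with a tagger named *mytagger*, the expected layout is:
```
data/raw/mytagger/          # place the input tracking data file(s) here
data/processed/mytagger/    # predicted.label is written here
```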
#### On HPC clusters
@@ -103,7 +116,7 @@ Beware that the default pretrained model may depend on the branch you are on.
The default pretrained model in the *20221005* branch involves linearly interpolating the spines at 10 Hz, and relies on a 20-time-step window (2 seconds). The dimensionality of the latent space is 100.
The default pretrained models in the *20230111* and *20230129* branches similarly interpolate spines at 10 Hz and rely on a 20-time-step window (2 seconds), but feature 25 latent dimensions only.
Alternative pretrained models can be specified using the `--pretrained-model-instance` option.
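For example, when training a new tagger (a hypothetical invocation, assuming the `train` command mirrors `predict` above; instance names are placeholders):
```
poetry run tagging-backend train --model-instance mytagger --pretrained-model-instance <pretrained-model-name>
```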
......
{
    "project_dir": "",
    "seed": 100,
    "exp_name": "20230129 -- see https://gitlab.pasteur.fr/nyx/MaggotUBA-adapter/-/blob/design/scripts/maestro/make_20230129_datasets.py and related scripts",
    "data_dir": "/pasteur/appa/scratch/flaurent/MaggotUBA-adapter/data/20230129/pretrain/20/1/larva_dataset_2023_01_29_20_20_108440.hdf5",
    "raw_data_dir": "/pasteur/zeus/projets/p02/hecatonchire/screens/t15",
    "log_dir": "",
    "exp_folder": "",
    "config": "models/20230129/autoencoder_config.json",
    "num_workers": 4,
    "n_features": 10,
    "len_traj": 20,
@@ -87,7 +87,7 @@
"init": "kaiming", "init": "kaiming",
"n_clusters": 2, "n_clusters": 2,
"dim_reduc": "UMAP", "dim_reduc": "UMAP",
"optim_iter": 1000, "optim_iter": 10000,
"pseudo_epoch": 100, "pseudo_epoch": 100,
"batch_size": 128, "batch_size": 128,
"lr": 0.005, "lr": 0.005,
......
No preview for this file type
[tool.poetry]
name = "MaggotUBA-adapter"
version = "0.10"
description = "Interface between MaggotUBA and the Nyx tagging UI"
authors = ["François Laurent"]
license = "MIT"
......
@@ -26,8 +26,11 @@ def train_model(backend, layers=1, pretrained_model_instance="default",
    pretrained_model_instances = pretrained_model_instance
    config_files = import_pretrained_models(backend, pretrained_model_instances)
    model = make_trainer(config_files, labels, layers)
    # fine-tune the model
    model.train(dataset)
    # add post-prediction rule ABC -> AAC
    model.clf_config['post_filters'] = ['ABC->AAC']
    # save the model
    print(f"saving model \"{backend.model_instance}\"")
    model.save()
......
@@ -71,7 +71,7 @@ class MaggotTrainer:
    def pad(self, target_t, defined_t, data):
        if data.shape[0] == 1:
            # tile the single defined time step to the full target length
            return np.repeat(data, len(target_t), axis=0)
        else:
            head = searchsortedfirst(target_t, defined_t[0])
            tail = len(target_t) - (searchsortedlast(target_t, defined_t[-1]) + 1)
......
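The change in `pad` above fixes a shape bug: a segment with a single defined time step is now tiled to the full target length instead of being returned as a single row. A standalone illustration with NumPy:
```
import numpy as np

data = np.array([[0.1, 0.2, 0.3]])   # a single defined time step
target_t = [0.0, 0.5, 1.0, 1.5]      # four target time steps

padded = np.repeat(data, len(target_t), axis=0)
print(padded.shape)                  # (4, 3): one copy per target time step
```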