diff --git a/README.md b/README.md
index e61bf8059cf1172d37db5257cf168d29956c2415..652dd2fdbea466f8cfddafea12c5559bc2d23bf8 100644
--- a/README.md
+++ b/README.md
@@ -61,172 +61,18 @@ maggotuba model train --name experiment
 maggotuba model eval --name experiment
 ```

-7. Optionally, examine clusters in the latent space :
+7. ~Optionally, examine clusters in the latent space~:

 ```bash
 maggotuba model cluster --name experiment
 ```

-8. Compute the embeddings for the whole database.
+8. ~Compute the embeddings for the whole database~.

 ```bash
 maggotuba model embed --name experiment --n_workers n_workers
 ```

-9. Compute the MMD matrix for a particular tracker
+9. ~Compute the MMD matrix for a particular tracker~

 ```bash
 maggotuba model embed --name experiment --tracker t5 --n_workers n_workers
 ```

-## Quickstart - legacy
-
-1. Create a new project folder: `workspace/project`
-
-2. Your data should be stored in `workspace/larva_dataset` according to the following tree structure:
-
-   ```folder tree
-   workspace
-   └── larva_dataset
-       ├── t5_t15_point_dynamics
-       │   └── t5_t15_point_dynamics_data
-       │       ├── t_5_point_dynamics
-       │       │   └── point_dynamics_data
-       │       │       ├── LINE_1
-       │       │       │   ├── protocol_1
-       │       │       │   │   ├── date_time1
-       │       │       │   │   │   ├── Point_dynamics_t5_LINE_1_protocol_1_larva_id_date_time1_larva_number_xx.txt
-       │       │       │   │   │   └── ...
-       │       │       │   │   ├── date_time2
-       │       │       │   │   └── date_time3
-       │       │       │   ├── protocol_2
-       │       │       │   └── protocol_3
-       │       │       ├── LINE_2
-       │       │       │   └── ...
-       │       │       └── LINE_3
-       │       │           └── ...
-       │       └── t_15_point_dynamics_data
-       │           └── mutatis mutandis
-       └── long_trajectories
-           └── point_dynamics_data
-               └── t5
-                   └── ...
-   ```
-
-   The folder `long_trajectories` can be prepared from the `larva_dataset` using `prepare_long_trajs.py`.
-
-3. **Optional: if you train on a database that is not t5+t15**
-   Count the number of samples and store them in an appropriately named file.
-
-   If you want to use larvae that are rescaled by their length before the stimulus:
-
-   ```bash
-   python structured-temporal-convolution/src/data/precompute/exhaustive_sample_counting.py 15 --data_dir larva_dataset/t5_t15_point_dynamics/t5_t15_point_dynamics_data/ --output_file test_project/rescaled_counts.npy --rescale
-   ```
-
-   If you want to use larvae that are rescaled by their length on the sample window:
-
-   ```bash
-   python structured-temporal-convolution/src/data/precompute/exhaustive_sample_counting.py 15 --data_dir larva_dataset/t5_t15_point_dynamics/t5_t15_point_dynamics_data/ --output_file test_project/counts.npy
-   ```
-
-   Using 15 processes, this took well over 2 hours on my computer.
-
-   Then, copy the output directly into `exhaustive_dataset.py`, replacing `N_SAMPLES_SCALED` in the first case and `N_SAMPLES_NOT_SCALED` in the second.
-
-   I apologize for the poor choice of vocabulary, which will hopefully be fixed in the future.
-
-4. Create a new set of samples on which the model will be trained.
-
-   Assuming you want the samples to be rescaled by their length on the sample window:
-
-   ```bash
-   python $HOME/workspace/structured-temporal-convolution/src/data/precompute/build_sample_database.py 36 --data_dir $HOME/workspace/larva_dataset/t5_t15_point_dynamics/t5_t15_point_dynamics_data --sample_dir $HOME/workspace/test_project/t5_t15_balanced_samples --balance_data
-   ```
-
-   Conversely, if you want samples to be rescaled according to their length prior to activation:
-
-   ```bash
-   python $HOME/workspace/structured-temporal-convolution/src/data/precompute/build_sample_database.py 36 --data_dir $HOME/workspace/larva_dataset/t5_t15_point_dynamics/t5_t15_point_dynamics_data --sample_dir $HOME/workspace/test_project/t5_t15_balanced_samples_rescaled --balance_data --rescale
-   ```
-
-   Using 36 processes, this should take no more than 30 minutes.
-
-   The folder structure now looks like:
-
-   ```folder tree
-   workspace
-   ├── larva_dataset
-   │   └── ...
-   └── test_project
-       └── t5_t15_balanced_samples
-           └── traj_20_pred_20
-               └── data
-                   ├── t5
-                   │   ├── back
-                   │   │   ├── after
-                   │   │   │   ├── HASH1.txt
-                   │   │   │   ├── HASH2.txt
-                   │   │   │   └── ...
-                   │   │   ├── before
-                   │   │   ├── during
-                   │   │   └── setup
-                   │   ├── bend
-                   │   │   └── ...
-                   │   ├── hunch
-                   │   │   └── ...
-                   │   ├── roll
-                   │   │   └── ...
-                   │   └── run
-                   │       └── ...
-                   └── t15
-                       └── ...
-   ```
-
-5. Create a folder `workspace/test_project/training_log`.
-
-6. The following code will:
-
-   * create a batches folder
-   * populate it with batches from the samples folder
-   * launch a training for 1000 epochs
-
-   For rescaling of the larvae according to their length before activation:
-
-   ```bash
-   python structured-temporal-convolution/train_model.py --batch_dir test_project/t5_t15_balanced_batches_rescaled --optim_iter 1000 --log_dir test_project/training_log --data_dir test_project/t5_t15_balanced_samples_rescaled --batch_size 128 --no_rescale_at_runtime
-   ```
-
-   For rescaling of the larvae according to their mean length during the sample:
-
-   ```bash
-   python structured-temporal-convolution/train_model.py --batch_dir test_project/t5_t15_balanced_batches --optim_iter 1000 --log_dir test_project/training_log --data_dir test_project/t5_t15_balanced_samples --batch_size 128
-   ```
-
-   The performance of the model is evaluated and figures are created every 100 epochs.
-
-   Your folder structure should now look like this:
-
-   ```folder tree
-   workspace
-   ├── larva_dataset
-   │   └── ...
-   └── test_project
-       ├── t5_t15_balanced_samples
-       │   └── ...
-       └── training_log
-           └── DD-MM-YYYY_HH-MM-SS_100_
-               ├── params.pt
-               └── visu
-                   └── balanced_eval
-                       └── bunch of stuff
-   ```
-
-7. Finally, from a trained model, one can recreate all the figures plotted during the final evaluation phase using the following command:
-
-   ```bash
-   python structured-temporal-convolution/predict_model.py --eval_saved_model test_project/training_log/DATE_TIME_100_ --log_dir test_project/training_log --batch_dir test_project/t5_t15_balanced_batches_rescaled --data_dir larva_dataset/t5_t15_balanced_samples_rescaled --no_rescale_at_runtime
-   ```
\ No newline at end of file
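For reference, here is how the three `maggotuba` invocations touched by this hunk chain together in a concrete run. The project name `experiment` and the tracker `t5` come straight from the diff; the worker count `8` merely stands in for the `n_workers` placeholder and is an arbitrary illustrative value:

```bash
# Sketch of steps 7-9 with concrete values. "experiment" and "t5" are taken
# from the README; the worker count 8 replaces the n_workers placeholder
# and is an arbitrary example, not a recommendation.
maggotuba model cluster --name experiment                            # optional: inspect latent-space clusters
maggotuba model embed --name experiment --n_workers 8                # embed the whole database
maggotuba model embed --name experiment --tracker t5 --n_workers 8   # MMD matrix for tracker t5
```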