Commit 09f6894b authored by François LAURENT

finetuning documented

parent e7838f74
1 merge request: !1 Fine-tuning
Pipeline #106344 failed
@@ -19,8 +19,11 @@ A tagging backend, called *e.g.* `TaggingBackend`, is a Python project with the
│   │                    can be stored in this directory.
│   └── processed     <- Predicted labels from predict_model.py are
│                        expected in this directory.
├── models            <- Hyperparameters and weights of trained
│                        classifiers can be stored here.
├── pretrained_models <- Partially trained models the training procedure
│                        starts from; optional.
├── pyproject.toml    <- Project definition file for Poetry.
├── src
@@ -40,6 +43,10 @@ A tagging backend, called *e.g.* `TaggingBackend`, is a Python project with the
│       ├── train_model.py    <- Trains the behavior tagging algorithm and
│       │                        stores the trained model in models/;
│       │                        optional.
│       ├── finetune_model.py <- Further trains the behavior tagging algorithm
│       │                        and stores the retrained model as a new model
│       │                        instance in models/; optional.
│       │                        *Available since version 0.14*.
│       └── predict_model.py  <- Loads the trained model and features from
│                                data/interim, and moves the resulting
│                                labels in data/processed.
@@ -51,7 +58,8 @@ A tagging backend, called *e.g.* `TaggingBackend`, is a Python project with the
The above structure borrows elements from the [Cookiecutter Data Science](https://drivendata.github.io/cookiecutter-data-science/) project template, adapted for use with [Poetry](https://python-poetry.org/).
The `src/<package_name>/{data,features,models}` directories can accommodate Python modules
(in subpackages `<package_name>.{data,features,models}` respectively).
For example, the model can be implemented as a Python class in an additional file in
`src/<package_name>/models`, *e.g.* `mymodel.py`.
In this case, an empty `__init__.py` file should be created in the same directory.
@@ -59,8 +67,29 @@ In this case, an empty `__init__.py` file should be created in the same director
As the Python package is installed, this custom module will be loadable from anywhere
with `import <package_name>.models.mymodel`.
On the other hand, `make_dataset.py`, `build_features.py`, `predict_model.py`,
`train_model.py` and `finetune_model.py` are Python scripts with a main program.
These scripts are run using Poetry, from the project root.
More precisely, although the Nyx tagging UI does not expect the backend to be a Python
project, the backend should be set up as a Poetry-managed virtual environment with the
`taggingbackends` package installed as a dependency, so that the backend can be operated
by calling `poetry run tagging-backend [train|predict|finetune]`, which in turn
calls the above-mentioned Python scripts.
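As an illustration, the command above could be issued from Python along the following lines. This is only a sketch: the helper names `backend_command` and `run_backend` are hypothetical, and real invocations most likely take additional arguments that this section does not describe, so none are shown.

```python
import subprocess
from pathlib import Path

# The three actions named in the text.
ACTIONS = ("train", "predict", "finetune")


def backend_command(action: str) -> list[str]:
    """Build the `poetry run tagging-backend <action>` command line."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {ACTIONS}")
    return ["poetry", "run", "tagging-backend", action]


def run_backend(action: str, backend_root: Path) -> subprocess.CompletedProcess:
    """Run the command from the backend's project root (requires Poetry)."""
    return subprocess.run(backend_command(action), cwd=backend_root, check=True)
```

Running from the project root matters because Poetry resolves the virtual environment from the `pyproject.toml` file found there.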
*New in version 0.14*, fine-tuning: `finetune_model.py` differs from `train_model.py` in
that it takes an existing trained model and further trains it. In contrast, `train_model.py`
trains a model either from data only or starting from a so-called *pretrained model*.
For example, MaggotUBA-adapter trains a classifier on top of a pretrained encoder.
In this particular backend, `train_model.py` picks a pretrained encoder in the
`pretrained_models` directory and saves the resulting model (encoder+classifier) in the
`models` directory. `finetune_model.py` instead picks a model from the `models` directory
and saves the retrained model in `models` as well, under a different name (subdirectory).
Note that the `pretrained_models` directory is included here mainly for explanatory purposes.
It is neither expected nor checked for by the TaggingBackends logic, unlike all the other
directories and scripts mentioned above. The `pretrained_models` directory was introduced
by MaggotUBA-adapter.
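The difference between training and fine-tuning, in terms of where models are read from and written to, can be sketched as follows. This is not MaggotUBA-adapter code: the functions `train` and `finetune`, their parameters and the `provenance.txt` file are hypothetical, and the actual training is replaced by writing a text file.

```python
from pathlib import Path


def train(root: Path, pretrained: str, model_instance: str) -> Path:
    """Start from pretrained_models/<pretrained>, save to models/<model_instance>."""
    source = root / "pretrained_models" / pretrained
    if not source.is_dir():
        raise FileNotFoundError(f"no pretrained model at {source}")
    target = root / "models" / model_instance
    target.mkdir(parents=True)  # raises FileExistsError if the instance exists
    (target / "provenance.txt").write_text(
        f"trained from pretrained_models/{pretrained}\n"
    )
    return target


def finetune(root: Path, original_instance: str, new_instance: str) -> Path:
    """Start from models/<original_instance>, save to models/<new_instance>."""
    if new_instance == original_instance:
        raise ValueError("the retrained model must be saved under a different name")
    source = root / "models" / original_instance
    if not source.is_dir():
        raise FileNotFoundError(f"no trained model at {source}")
    target = root / "models" / new_instance
    target.mkdir()  # the original instance is left untouched
    (target / "provenance.txt").write_text(
        f"fine-tuned from models/{original_instance}\n"
    )
    return target
```

Saving the retrained model in a new subdirectory keeps the original model instance available alongside it.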
See example scripts in the `examplebackend` directory.
@@ -82,8 +111,6 @@ as these subdirectories in `models` are looked for by the Nyx tagger UI.
The `data` directory is automatically created by the `BackendExplorer` object, together with its `raw` and `processed` subdirectories, therefore there is no need to include these directories in the backend.
Although the Nyx tagger UI does not expect the project to include a Python package, a Poetry-managed virtual environment should be set up with the `taggingbackends` package installed, so that the command `poetry run tagging-backend` is available at the project root directory.
The `tests` directory is renamed `test` for compatibility with Julia projects.
Python/Poetry do not need additional configuration to properly handle the tests.