diff --git a/README.md b/README.md
index 4f08624aa46f221c4ee637f4398f0e8c3b646c1b..c82be44a144df7eb3db98fb9349bcc8e75a8a0e4 100644
--- a/README.md
+++ b/README.md
@@ -19,8 +19,11 @@ A tagging backend, called *e.g.* `TaggingBackend`, is a Python project with the
 │   │                     can be stored in this directory.
 │   └── processed      <- Predicted labels from predict_model.py are
 │                         expected in this directory.
+│
 ├── models             <- Hyperparameters and weights of trained
 │                         classifiers can be stored here.
+├── pretrained_models  <- Partially trained models the training procedure
+│                         starts from; optional.
 │
 ├── pyproject.toml     <- Project definition file for Poetry.
 ├── src
@@ -40,6 +43,10 @@ A tagging backend, called *e.g.* `TaggingBackend`, is a Python project with the
 │       ├── train_model.py     <- Trains the behavior tagging algorithm and
 │       │                         stores the trained model in models/;
 │       │                         optional.
+│       ├── finetune_model.py  <- Further trains the behavior tagging algorithm
+│       │                         and stores the retrained model as a new model
+│       │                         instance in models/; optional.
+│       │                         *Available since version 0.14*.
 │       └── predict_model.py   <- Loads the trained model and features from
 │                                 data/interim, and moves the resulting
 │                                 labels in data/processed.
@@ -51,7 +58,8 @@ A tagging backend, called *e.g.* `TaggingBackend`, is a Python project with the
 
 The above structure borrows elements from the [Cookiecutter Data Science](https://drivendata.github.io/cookiecutter-data-science/) project template, adapted for use with [Poetry](https://python-poetry.org/).
 
-The `src/<package_name>/{data,features,models}` directories can accommodate Python modules (in subpackages `<package_name>.{data,features,models}` respectively).
+The `src/<package_name>/{data,features,models}` directories can accommodate Python modules
+(in subpackages `<package_name>.{data,features,models}` respectively).
 For example, the model can be implemented as a Python class in an additional file in `src/<package_name>/models`, *e.g.* `mymodel.py`.
 In this case, an empty `__init__.py` file should be created in the same directory.
 
@@ -59,8 +67,29 @@ In this case, an empty `__init__.py` file should be created in the same director
 
 As the Python package is installed, this custom module will be loadable from anywhere with `import <package_name>.models.mymodel`.
 
-On the other hand, the `make_dataset.py`, `build_features.py`, `predict_model.py` and `train_model.py` are Python scripts, with a main program.
-These scripts will be run using Poetry, from the project root.
+On the other hand, the `make_dataset.py`, `build_features.py`, `predict_model.py`,
+`train_model.py` and `finetune_model.py` are Python scripts, with a main program.
+These scripts are run using Poetry, from the project root.
+More exactly, although the Nyx tagger UI does not expect the backend to be a Python
+project, the backend should be set up in a Poetry-managed virtual environment with the
+`taggingbackends` package installed as a dependency, so that the backend can be operated
+by calling `poetry run tagging-backend [train|predict|finetune]`, which in turn
+calls the above-mentioned Python scripts.
+
+*New in version 0.14*, fine-tuning: `finetune_model.py` differs from `train_model.py` as
+it takes an existing trained model and further trains it. In contrast, `train_model.py`
+trains a model from data only, or from a so-called *pretrained model*.
+
+For example, MaggotUBA-adapter trains a classifier on top of a pretrained encoder.
+In this particular backend, `train_model.py` picks a pretrained encoder in the
+`pretrained_models` directory and saves the resulting model (encoder+classifier) in the
+`models` directory. `finetune_model.py` instead picks a model from the `models` directory
+and saves the retrained model in `models` as well, under a different name (subdirectory).
+
+Note that the `pretrained_models` directory is included more for explanatory purposes.
+It is not expected or checked for by the TaggingBackends logic, unlike all the other
+directories and scripts mentioned above. The `pretrained_models` directory was introduced
+by MaggotUBA-adapter.
 
 See example scripts in the `examplebackend` directory.
 
@@ -82,8 +111,6 @@ as these subdirectories in `models` are looked for by the Nyx tagger UI.
 
 The `data` directory is automatically created by the `BackendExplorer` object, together with its `raw` and `processed` subdirectories, therefore there is no need to include these directories in the backend.
 
-Although the Nyx tagger UI does not expect the project to include a Python package, a Poetry-managed virtual environment should be set up with the `taggingbackends` package installed, so that the command `poetry run tagging-backend` is available at the project root directory.
-
 The `tests` directory is renamed `test` for compatibility with Julia projects.
 Python/Poetry do not need additional configuration to properly handle the tests.