TaggingBackends



    This library helps to implement automatic tagging backends for the Nyx larva tagger.

    Template project for tagging backends

    A tagging backend, called e.g. TaggingBackend, is a Python project with the following directory structure:

    ├── LICENSE                          <- Default is MIT.
    ├── README.md                        <- Project description.
    ├── data
    │   ├── raw                          <- The input data are accessible from this
    │   │                                   directory, with their original file structure.
    │   ├── interim                      <- Preprocessed data and extracted features
    │   │                                   can be stored in this directory.
    │   └── processed                    <- Predicted labels from predict_model.py are
    │                                       expected in this directory.
    ├── models                           <- Hyperparameters and weights of trained
    │                                       classifiers can be stored here.
    ├── pyproject.toml                   <- Project definition file for Poetry.
    ├── src
    │   └── taggingbackend               <- Python package name.
    │       │                               Same as project name, with lowercase letters,
    │       │                               hyphens converted into underscores.
    │       │                               For example, `My-Tagger` becomes `my_tagger`.
    │       ├── __init__.py              <- Defines variable `__version__`.
    │       ├── data
    │       │   └── make_dataset.py      <- Picks and converts raw data files; if files
    │       │                               are to be written, they go into data/interim;
    │       │                               optional.
    │       ├── features
    │       │   └── build_features.py    <- Extracts and saves features to file in
    │       │                               data/interim; optional.
    │       └── models
    │           ├── train_model.py       <- Trains the behavior tagging algorithm and
    │           │                           stores the trained model in models/;
    │           │                           optional.
    │           └── predict_model.py     <- Loads the trained model and features from
    │                                       data/interim, and saves the resulting
    │                                       labels in data/processed.
    └── test
        ├── __init__.py                  <- Empty file.
        └── test_taggingbackend.py       <- Automated tests; optional.
                                            Filename is `test_<package_name>.py`.

    The above structure borrows elements from the Cookiecutter Data Science project template, adapted for use with Poetry.
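
The package-naming rule noted in the tree above (lowercase letters, hyphens converted into underscores) can be sketched in a few lines. The helper `package_name_from_project` is a hypothetical name introduced for illustration; it is not part of taggingbackends.

```python
def package_name_from_project(project_name: str) -> str:
    """Derive a Python package name from a project name.

    Hypothetical helper: lowercases the name and converts hyphens
    into underscores, as described for the src/ directory.
    """
    return project_name.lower().replace("-", "_")


print(package_name_from_project("My-Tagger"))  # my_tagger
```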

    The src/<package_name>/{data,features,models} directories can accommodate Python modules (in subpackages <package_name>.{data,features,models} respectively). For example, the model can be implemented as a Python class in an additional file in src/<package_name>/models, e.g. mymodel.py. In this case, an empty __init__.py file should be created in the same directory.

    Once the Python package is installed, this custom module can be imported from anywhere with import <package_name>.models.mymodel.

    In contrast, make_dataset.py, build_features.py, predict_model.py and train_model.py are Python scripts, each with a main program. These scripts are run using Poetry, from the project root.

    See example scripts in the examplebackend directory.

    Only predict_model.py is required by the Nyx tagging UI.

    The simplest working directory structure for a tagging backend is:

    ├── models/
    │   └── trained_model/
    ├── pyproject.toml
    └── scripts/
        └── predict_model.py

    where trained_model is the name of the trained model. Backends that do not need to store trained models should still include an empty subdirectory there, because the Nyx tagger UI looks for these subdirectories in models.
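
As a rough illustration, the minimal layout above could be scaffolded with the standard library alone. The function name `create_minimal_backend` and the project name `MyBackend` are assumptions made for this sketch, and the files are created empty: a real backend needs an actual pyproject.toml and prediction script.

```python
import tempfile
from pathlib import Path


def create_minimal_backend(root: Path, model_name: str = "trained_model") -> None:
    """Create the minimal directory structure as empty placeholders (hypothetical helper)."""
    # Subdirectory named after the trained model; it may stay empty.
    (root / "models" / model_name).mkdir(parents=True, exist_ok=True)
    (root / "scripts").mkdir(parents=True, exist_ok=True)
    # Placeholder files only; their content is up to the backend author.
    (root / "pyproject.toml").touch()
    (root / "scripts" / "predict_model.py").touch()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "MyBackend"
    create_minimal_backend(root)
    created = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
    print(created)
```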

    The data directory is automatically created by the BackendExplorer object, together with its raw and processed subdirectories, so there is no need to include these directories in the backend.

    Although the Nyx tagger UI does not expect the project to include a Python package, a Poetry-managed virtual environment should be set up with the taggingbackends package installed, so that the command poetry run tagging-backend is available at the project root directory.

    The tests directory is renamed test for compatibility with Julia projects. Python/Poetry do not need additional configuration to properly handle the tests.

    Input and output data

    By default, the input data are copied into data/raw. Input data files can be in any format, and the backend is responsible for handling them. Training labels are provided as JSON files by default, and can be loaded using the taggingbackends.data.labels.Labels class.

    Predicted labels are expected in data/processed.

    A backend can make use of data/interim or ignore it. Similarly, the trained models can be stored in models or not. However, a subdirectory by the name of the model instance should be created for model discovery (see below).

    Full paths to data and model directories, and files, are made available in the training and prediction procedures with a taggingbackends.explorer.BackendExplorer object.

    Labels

    The internal representation is as follows:

    • dictionary of run identifiers (str, typically date_time) as keys and, as values:
      • dictionary of larva identifiers (int) as keys and, as values:
        • dictionary of timestamps (float) as keys and discrete behavioral states (str) as values.

    Labels are encapsulated in a dedicated datatype that also stores metadata and information about labels (names, colors).
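
The nested structure described above can be pictured as plain Python dictionaries. This is only a sketch of the in-memory layout: the run identifier, larva identifiers, timestamps and state names below are made-up values, and the Labels class itself is not used here.

```python
# run identifier (str) -> larva identifier (int) -> timestamp (float) -> state (str)
run_id = "20230101_120000"  # made-up run identifier in date_time form
labels = {
    run_id: {
        1: {0.0: "run", 0.1: "run", 0.2: "cast"},  # made-up state names
        2: {0.0: "stop", 0.1: "stop"},
    },
}

# Look up the state of larva 1 at t = 0.2 in that run.
state = labels[run_id][1][0.2]
print(state)  # cast

# Count the labeled time steps per larva.
steps_per_larva = {larva: len(track) for larva, track in labels[run_id].items()}
print(steps_per_larva)  # {1: 3, 2: 2}
```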

    See taggingbackends.data.labels.Labels, and an example json labels file.

    Model specification

    A backend can train and use multiple model instances. Each instance is assigned an identifier, and the related files, including data files, are stored in corresponding subdirectories of the data/raw, data/interim, data/processed and models directories.
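
To make the per-instance layout concrete, here is a small sketch that builds the four paths for a given identifier. In a real backend these locations would come from the BackendExplorer object; `instance_dirs` is a hypothetical helper that only mirrors the layout described above.

```python
from pathlib import Path


def instance_dirs(project_root, instance_id):
    """Return the per-instance directories as a dict of paths (hypothetical helper)."""
    root = Path(project_root)
    return {
        "raw": root / "data" / "raw" / instance_id,
        "interim": root / "data" / "interim" / instance_id,
        "processed": root / "data" / "processed" / instance_id,
        "model": root / "models" / instance_id,
    }


dirs = instance_dirs("MyBackend", "20230101_120000")
for name, path in dirs.items():
    print(name, path.as_posix())
```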

    By default, a new model is identified with a timestamp in the YYYYMMDD_HHMMSS format. The identifier determines where files are looked up: for example, the taggingbackends.explorer.BackendExplorer.list_input_files method looks for files in the data/raw/<instance_identifier> directory only.
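
An identifier in the YYYYMMDD_HHMMSS format can be produced with the standard datetime module, as sketched below. The function name `new_instance_identifier` is an assumption for illustration; how taggingbackends generates its default identifier internally is not shown here.

```python
from datetime import datetime


def new_instance_identifier(now=None):
    """Format a timestamp as YYYYMMDD_HHMMSS (hypothetical helper)."""
    now = datetime.now() if now is None else now
    return now.strftime("%Y%m%d_%H%M%S")


print(new_instance_identifier(datetime(2023, 1, 31, 9, 5, 7)))  # 20230131_090507
```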

    A backend may or may not store the model in the models/<instance_identifier> directory (call the taggingbackends.explorer.BackendExplorer.model_dir method to get the exact location). Either way, the directory should be created so that the trained model is discoverable: the Nyx tagging UI looks for subdirectories in the models directory to list the available trained models.
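
The discovery step can be approximated as listing the subdirectories of models. This is a sketch under that assumption only; the actual logic used by the Nyx tagging UI is not described in this document, and `list_trained_models` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def list_trained_models(project_root):
    """List the names of the subdirectories found in <project_root>/models."""
    models_dir = Path(project_root) / "models"
    if not models_dir.is_dir():
        return []
    # Only directories count as model instances; plain files are ignored.
    return sorted(p.name for p in models_dir.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "models" / "20230101_120000").mkdir(parents=True)
    (Path(tmp) / "models" / "20230215_083000").mkdir()
    (Path(tmp) / "models" / "notes.txt").touch()
    found = list_trained_models(tmp)
    print(found)
```

An empty subdirectory is enough for a model instance to be listed, which matches the advice to create the directory even when nothing is stored in it.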