Quickstart
==========


Data requirements
-----------------

The mandatory input of Track Analyzer is a data table (a csv or txt file) of tracks containing the position coordinates (in 2D or 3D) over time, along with the track identifiers. Optionally, the data can be plotted on the original image, provided as a 3D or 4D tiff stack (i.e. 2D+time or 3D+time). If your movie is in a different format (e.g. a list of images), please convert it to a tiff stack, using Fiji for instance.
The position file must contain columns with the x, y, (z) positions, a frame column, and a track id column. The position coordinates can be in pixels or in scaled units. The scaling information and other metadata, such as the time and length scales, are provided by the user through the graphical interface.
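
As a purely hypothetical illustration (not part of Track Analyzer itself), a minimal 2D position table with the default column names can be assembled and saved with pandas; the values below are made up:

```python
import pandas as pd

# Made-up example of a minimal 2D positions table; the column names
# follow the defaults mentioned above (x, y, frame, track).
positions = pd.DataFrame({
    "x":     [10.0, 12.5, 15.1, 40.0, 41.2],  # x position (pixels or scaled)
    "y":     [5.0, 5.4, 6.1, 22.0, 22.9],     # y position
    "frame": [0, 1, 2, 0, 1],                 # time point index
    "track": [0, 0, 0, 1, 1],                 # track identifier
})

# Sanity check: all mandatory columns are present (add "z" for 3D data)
assert {"x", "y", "frame", "track"}.issubset(positions.columns)

positions.to_csv("positions.csv", index=False)
```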
If Track Analyzer is run through the Jupyter notebook (see below), the position file format is flexible, and the user will be asked to interactively identify all the mandatory columns. You can also specify that the position file was generated by TrackMate, in which case the columns are identified automatically.
If Track Analyzer is run through Galaxy (see below), the position file format must strictly follow the default column names: x, y, (z), frame, track. However, if the position file was generated by TrackMate, the columns are identified automatically.
If Track Analyzer is run using command lines (see below), the data directory must contain:
- a comma-separated file named ``positions.csv`` whose column names are: x, y, (z), frame, track
- a text file named ``info.txt`` containing the metadata (see example)
- (optional) a tiff file named ``stack.tif``
- (optional) configuration files in a ``config`` directory
The default config files can be generated by running ``TA_config path_to_directory``. The config files are csv files that can be easily edited.
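
For reference, ``info.txt`` lists the metadata keys described in the Output section below ('lengthscale', 'timescale', etc.). The fragment below is purely illustrative — the values are made up, and the exact layout should be checked against the example file shipped with the repository:

```text
lengthscale:0.5
timescale:2.0
z_step:1.0
image_width:512
image_height:512
length_unit:um
time_unit:min
table_unit:px
separator:,
```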

Running the pipeline
--------------------

There are three ways of running Track Analyzer. Two user-friendly versions are available: an installation-free web-based tool run on Galaxy, and a full version run in a Jupyter notebook. In both versions, Track Analyzer can be used without any programming knowledge through its graphical interface. The full-version interface is launched from a Jupyter notebook whose widgets let the user load data and set parameters without writing any code. Track Analyzer can also be run directly from the command line (useful if you need to run it on a remote machine such as a cluster).

Using Galaxy (recommended for a first trial)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The installation-free online version is available here. It runs on the web-based platform Galaxy, which is easy to use (some documentation regarding Galaxy is available here). This online version is slightly limited compared to the full version run on a Jupyter notebook: the notebook offers 3D visualization and hand-drawn data selection using a napari viewer. Moreover, in the notebook, loaded data are processed step by step throughout the pipeline, which gives the user better interactivity with the data. Conversely, on Galaxy, the user needs to enter all numerical parameters before the analysis can be run.
Complete documentation about Galaxy is available here. Here's a quick overview of Galaxy's interface.
- Upload your data to Galaxy. If you want to keep track of your history of analysis, you can create a user account.
- Choose your input files that were previously uploaded.
- Enter the parameters necessary to your analysis.
- Hit the execution button to launch the execution on Galaxy's cluster.
- The history panel lists all the outputs of each analysis job. For each output element, you can have a quick look (6) or save it (7). Note that when you display output plots, it is not obvious how to get back to the main interface: the double-arrow 'Run this job again' button (8), displayed on every log file, is useful here. Pressing it displays the interface with the exact same set of parameters as the corresponding job.

Using a Jupyter notebook (recommended for advanced options)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The full version can be run using a Jupyter notebook. Documentation about Jupyter notebooks can be found here. Briefly, a notebook comprises a series of 'cells', which are blocks of Python code to be executed. Each cell can be run by pressing Shift+Enter, and each executes a piece of code generating the pipeline's graphical interface. The cells depend on each other, so they MUST be run in order. By default, the code of each cell is hidden, but it can be shown by pressing the button at the top of the notebook: 'Click here to toggle on/off the raw code'. With the code hidden, it is easy to miss a cell, which is a common cause of errors. If this happens, restart the pipeline a couple of cells above.
To launch a notebook:

- the notebook is at the root of the git repository, or you can download it here: :download:`run_TA.ipynb <../_static/run_TA.ipynb>`
- in a terminal, go to the project folder (or to where you downloaded the notebook): ``cd <path_to_the_project_folder>``
- activate the environment: ``conda activate <env_name>`` (see :ref:`installation<installation>` for more details)
- launch a Jupyter notebook: ``jupyter notebook``
- a web browser opens; click on ``run_TA.ipynb``
- to shut down the notebook, press CTRL+C in the terminal.

Using command lines (only if you need to run it on a remote machine)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you need to run Track Analyzer from a terminal without any graphical interface, it is possible, but you won't benefit from the interactive modules. Data filtering and analysis parameters will need to be passed through config files (see examples). Track Analyzer comes with two commands:
- ``traj_analysis``, which runs the trajectory analysis section (see below). It takes the path to the data directory as argument (optional: use the flag ``-r`` or ``--refresh`` to refresh the database).
- ``map_analysis``, which runs the map analysis section (see below). It takes the same arguments.

Analysis procedure
------------------

Track Analyzer contains a data filtering section and three main analysis sections. This section describes the procedure for the full version; the installation-free version on Galaxy is very similar, although a few options are not available.

Load data
~~~~~~~~~

Just follow the instructions on the graphical interface (on the Jupyter notebook or Galaxy), to load your data files.
If you run Track Analyzer for the first time, enter the metadata.
If Track Analyzer is run through the Jupyter notebook, you can additionally select custom columns of variables you might want to plot. You will then have to type in their names and units to be displayed on the plots.
You can also set some plotting parameters such as image file format, colors to be used, image resolution, etc.

Data filtering section
~~~~~~~~~~~~~~~~~~~~~~

Subsets of the data can be filtered using spatiotemporal criteria: x, y, (z) position, time window, and track duration. A drawing tool also offers the possibility to hand-draw regions of interest.
Additionally, specific trajectories can be selected by using their position in a region of interest at a specific time. This feature can be useful to inspect either their past (back-tracking) or their future (fate-mapping). Trajectories can also be selected just using their ids.
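
The selection logic just described can be sketched in a few lines of pandas (a hypothetical illustration, not Track Analyzer's own code): keep the tracks whose position falls inside a region of interest at a chosen frame, then pull out their full trajectories.

```python
import pandas as pd

# Toy position table: two tracks, three frames
df = pd.DataFrame({
    "x": [1, 2, 8, 9, 1, 9],
    "y": [1, 2, 8, 9, 2, 8],
    "frame": [0, 1, 0, 1, 2, 2],
    "track": [0, 0, 1, 1, 0, 1],
})

# Rectangular ROI and reference frame (made-up values)
frame, roi = 0, {"xmin": 0, "xmax": 5, "ymin": 0, "ymax": 5}

# Tracks present inside the ROI at the chosen frame...
at_t = df[df["frame"] == frame]
inside = at_t[
    at_t["x"].between(roi["xmin"], roi["xmax"])
    & at_t["y"].between(roi["ymin"], roi["ymax"])
]

# ...and their full trajectories (past and future positions included)
selected = df[df["track"].isin(inside["track"])]
```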
These subsets can then be analyzed separately. The analysis will be run independently on each of them. Alternatively, they can be analyzed together. Trajectories and computed quantities will then be plotted together using color-coding.

Trajectory analysis section
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Trajectories can be plotted over the original image, frame by frame, with custom color-coding (z color-coded, t color-coded, by subset, or random). All trajectories can also be plotted together, with the option to center their origins, which can be useful to detect patterns in the trajectories.
Several quantities can be computed and plotted: velocities and accelerations (spatial components and their moduli). The local cell density can be estimated by performing a Voronoi tessellation: the Voronoi diagram can be plotted, and the area of each Voronoi cell can be calculated and plotted. Currently, only 2D Voronoi tessellation is available (even if the data are 3D). If Track Analyzer is run through the Jupyter notebook, you can also plot the other variables you selected.
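
As a sketch of the density-estimation idea (an independent illustration using scipy, not Track Analyzer's internal code), the area of each bounded 2D Voronoi cell can be computed as follows:

```python
import numpy as np
from scipy.spatial import Voronoi, ConvexHull

# Random 2D point cloud standing in for cell positions at one frame
rng = np.random.default_rng(0)
points = rng.uniform(0, 10, size=(30, 2))

vor = Voronoi(points)

areas = {}
for i, region_index in enumerate(vor.point_region):
    region = vor.regions[region_index]
    if -1 in region or len(region) == 0:
        continue  # unbounded cell at the border: no finite area
    polygon = vor.vertices[region]
    # Voronoi cells are convex, so the hull "volume" in 2D is the area
    areas[i] = ConvexHull(polygon).volume

# Local density estimate: inverse of the cell area
density = {i: 1.0 / a for i, a in areas.items()}
```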
All these quantities can also be averaged over the whole trajectory and plotted.
Trajectories can also be quantified using the Mean Squared Displacement (MSD) analysis. The MSD can be plotted and fitted with some diffusion models to compute the diffusion coefficient.
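
The MSD computation and the simplest diffusion fit can be sketched as follows (a self-contained illustration on a simulated 2D random walk; Track Analyzer's own fitting options may differ). For pure 2D diffusion, MSD(tau) = 4*D*tau, so D can be recovered with a least-squares fit through the origin:

```python
import numpy as np

# Simulate a 2D Brownian trajectory with known diffusion coefficient
rng = np.random.default_rng(1)
D_true, dt, n = 0.5, 1.0, 2000
steps = rng.normal(0.0, np.sqrt(2 * D_true * dt), size=(n, 2))
traj = np.cumsum(steps, axis=0)

def msd(traj, max_lag):
    """Mean squared displacement for time lags 1..max_lag (time-averaged)."""
    lags = np.arange(1, max_lag + 1)
    values = np.array([
        np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))
        for lag in lags
    ])
    return lags, values

lags, values = msd(traj, max_lag=20)

# Least-squares fit of MSD = 4*D*tau through the origin:
# D = sum(MSD * tau) / (4 * sum(tau^2))
tau = lags * dt
D_est = np.sum(values * tau) / (4 * np.sum(tau ** 2))
```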

Map analysis section
~~~~~~~~~~~~~~~~~~~~

Data can be averaged on a regular grid to produce maps of the computed quantities. Two kinds of maps can be plotted: vector fields and scalar fields.

Vector fields
^^^^^^^^^^^^^

Velocity and acceleration vectors can be plotted on 2D maps. For 3D data, the z component can be color-coded. Such maps can be superimposed on a scalar field.

Scalar fields
^^^^^^^^^^^^^

The velocity and acceleration components and moduli can be plotted as color-coded maps. The vector average modulus can also be computed. The difference between the two: the velocity mean is the mean of the velocity moduli in the grid unit, whereas the vector average modulus is the modulus of the vector-averaged velocity in the grid unit. Divergence (contraction/expansion) maps and curl (rotation) maps can also be plotted.
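
The distinction can be made concrete with a toy grid unit containing two opposite velocity vectors: the mean of the moduli is 1, while the modulus of the vector average is 0.

```python
import numpy as np

# Two opposite velocities in the same grid unit
velocities = np.array([[1.0, 0.0], [-1.0, 0.0]])

mean_of_moduli = np.linalg.norm(velocities, axis=1).mean()  # -> 1.0
modulus_of_mean = np.linalg.norm(velocities.mean(axis=0))   # -> 0.0
```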

Comparator section
~~~~~~~~~~~~~~~~~~

Data previously generated by the trajectory analysis section can be compared by plotting parameters together on the same plots.

Output
------

Track Analyzer generates several files, plots, data points, and configuration files.

Database and configuration files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some files necessary for the processing are generated when the pipeline is executed:
- ``data_base.p`` is a binary collection of Python objects generated when the initial tracking file is loaded. It allows the initial loading to be skipped if the pipeline is run several times on the same tracking data. It can be refreshed if necessary.
- ``info.txt`` is a text file containing important metadata: 'lengthscale', 'timescale', 'z_step', 'image_width', 'image_height', 'length_unit', 'time_unit', 'table_unit', 'separator'. It can be generated interactively using the notebook.
- if the original image stack is 4D (3D+t), a ``stack_maxproj.tif`` file is generated by taking a maximum projection over the z dimension, so that a 2D image can be used for 2D-based plotting.
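
The maximum projection amounts to collapsing the z axis of the stack; a minimal numpy sketch, assuming a (t, z, y, x) axis order and a synthetic stack standing in for the real movie:

```python
import numpy as np

# Synthetic 4D stack, shape (t, z, y, x); in practice the stack would be
# read from stack.tif (e.g. with the tifffile package)
stack = np.zeros((3, 4, 8, 8), dtype=np.uint16)
stack[:, 2] = 7  # one bright z-plane

# Maximum projection over z: each (t, y, x) pixel keeps its brightest z value
maxproj = stack.max(axis=1)  # shape (3, 8, 8)
```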

Data output
~~~~~~~~~~~

The trajectory analysis and map analysis outputs are generated in a ``traj_analysis`` and a ``map_analysis`` directory, respectively. Each subset's analysis is saved in a new folder.
In each subset's directory:
- a ``config`` folder is generated with the configuration parameters used for this specific analysis
- ``all_data.csv`` stores the subset's table of positions
- ``track_prop.csv`` stores the quantities averaged along trajectories
- each plot is saved using an image format, size, and resolution that can be customized. Additionally, the default colors and color maps can be customized in the plotting parameters section.
- the data points of each plot are saved in a csv file with the same name as the image file, so you can replot the data using your favorite plotting software.
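
Because every plot ships with its underlying csv, a figure can be rebuilt with any plotting library. A hypothetical matplotlib sketch (the column names here are made up — check the header of the csv you want to replot):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Stand-in for pd.read_csv("<some_plot>.csv"); columns are hypothetical
df = pd.DataFrame({"t": [0, 1, 2, 3], "v": [0.10, 0.32, 0.25, 0.40]})

fig, ax = plt.subplots()
ax.plot(df["t"], df["v"], marker="o")
ax.set(xlabel="time", ylabel="velocity modulus")
fig.savefig("replot.png", dpi=150)
```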

Examples
--------


Real data
~~~~~~~~~

You can get familiar with Track Analyzer by running it on example data. For instance, you can analyze data of a developing *C. elegans* embryo provided by the Cell Tracking Challenge. Download the data directory containing trajectories and metadata (these positions were extracted following napari's tutorial):
:download:`cell tracking challenge <../_static/example/Fluo-N3DH-CE.tar.gz>`
Additionally, you can download the original timelapse for optimal visualization. Download the dataset, then run the following :download:`python script <../_static/example/load_tracking.py>` to extract the images and generate a single tiff file that you can use during the analysis. To run the script, open a terminal and run::

    pip install imagecodecs
    cd <path_to_script_folder>
    python load_tracking.py <path_to_dataset_folder>
You can also regenerate the positions by adding the flag ``-p``; this produces the ``positions.csv`` file present in the archive.
.. warning::
   If you open the generated tiff file with Fiji, you will see that the t and z dimensions are not separated. Run "Stack to Hyperstack" with z=35 and t=195 to view it correctly. This is only needed for viewing in Fiji; Track Analyzer does not require it.

Synthetic data
~~~~~~~~~~~~~~

You can also analyze synthetic data that were generated to check that the analysis performed by Track Analyzer is correct. You can download several datasets :download:`here <../_static/example/synthetic_data.tar.gz>`. Each dataset includes a ``param.csv`` file with the input values used for each trajectory.

Troubleshooting
---------------

The 3D visualization and the drawing selection tool depend on the napari package. Installing this package can lead to issues depending on your system. If you cannot resolve the installation, you will not have access to 3D rendering. However, you will still be able to use Track Analyzer without the drawing tool, by using the coordinate sliders in the graphical interface.