Commit 28dd8755 authored by Julien GUGLIELMINI

Readme update

parent 6edda578
@@ -10,6 +10,9 @@ where
* $`id(A_i,B_i)`$ is the identity score for each BBH pair between element $`A`$ and element $`B`$
* $`min(P_A,P_B)`$ is the number of proteins in the smaller of the two elements $`A`$ and $`B`$.
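For reference, the terms above combine into the following expression (a reconstruction from the definitions in this section, with $`n`$ the number of BBH pairs between $`A`$ and $`B`$):

```math
wGRR(A,B) = \frac{\sum_{i=1}^{n} id(A_i,B_i)}{\min(P_A,P_B)}
```

The note below explains how the numerator and denominator of this fraction vary between the three reported versions.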
**NOTE**
`wGRR` calculates three versions of the wGRR depending on which BBH pairs are considered (numerator) and which proteins are counted (denominator). See the `Output` section of this manual for more details.
### Dependencies
BBH are defined by all-versus-all protein comparisons using [MMseqs2][1].
@@ -33,25 +36,25 @@ chmod +x wGRR*
```bash
./wGRR -i $fasta [-p $mmseqs2_path -o $output_prefix -t $threads -a $comparisons -T -f]
```
**WARNING**
Memory consumption can be very high and running `wGRR` might exhaust your system's memory. It is advised to perform a test run (with the `-T` flag) first.
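For instance, a quick test run might look like this minimal sketch (the input file name is illustrative; any protein fasta file accepted by `-i` works):

```bash
# Test run on the input file, using the -T flag described above
./wGRR -i test_2.prt -T
```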
### On an interactive session on Maestro
`wGRR` can be used interactively on Maestro. First, you need to allocate resources. For example:
```bash
salloc -p hubbioit -c 10 --mem 20G
```
This will allocate 10 CPUs and 20 GB of memory on the hubbioit partition.
When the resources are available, `wGRR` can be executed as described above. Note that you do not need to load any modules; `wGRR` handles this automatically.
The number of threads passed via the `-t` option will be used for both the MMseqs step and the wGRR calculation.
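Putting the two steps together, a minimal interactive session could look like the following sketch (the input file name is illustrative):

```bash
# Allocate 10 CPUs and 20G of memory on the hubbioit partition
salloc -p hubbioit -c 10 --mem 20G

# Once the allocation is granted, run wGRR with a matching number of threads
./wGRR -i test_2.prt -t 10
```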
### Using `sbatch`
This is the recommended way of using `wGRR` for large datasets. Simply use the `sbatch` command with the proper partition specification. Note that there is no need to allocate more than one CPU (with the `sbatch` option `-c`). For example:
```bash
sbatch -p hubbioit ./wGRR -i test_2.prt -t 30
```
This will run `wGRR` on the file `test_2.prt` on the hubbioit partition. If you do not provide a partition, `wGRR` will use Maestro's default, *i.e.* the "common" partition with the "normal" Quality of Service (QoS).
The MMseqs job will be submitted to the cluster's scheduler with 30 CPUs (`-t 30`). Then, for the actual wGRR calculation, the required number of jobs (depending on the value passed with the `-a` option) will be submitted to the queue. If 100 jobs (1 CPU each) are necessary, a job array of 100 jobs will be submitted to the scheduler.
To avoid using 100% of your partition's CPUs, you can adjust the maximum number of jobs running simultaneously with the `-m` option.
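For instance, the sketch below caps the job array at 20 simultaneous jobs (the value is illustrative, and it assumes `-m` can be combined with the options shown in the synopsis above; pick a value suited to your partition's size):

```bash
# Same submission as above, but at most 20 array jobs run at the same time
sbatch -p hubbioit ./wGRR -i test_2.prt -t 30 -m 20
```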