* $`id(A_i,B_i)`$ is the identity score for each BBH pair between element $`A`$ and element $`B`$
* $`min(P_A,P_B)`$ is the number of proteins in the smaller of elements $`A`$ and $`B`$.
**NOTE**
`wGRR` calculates 3 versions of the wGRR depending on which BBH pairs are considered (numerator) and what proteins should be counted (denominator). See the `Output` section of this manual for more explanations.
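As a rough illustration of the definitions above, the following sketch computes a score by hand for a toy case. It assumes that the basic form of the wGRR is the sum of the BBH identities divided by $`min(P_A,P_B)`$, and that identities are expressed as fractions between 0 and 1. The identity values, protein counts and variable names are made up for the example; this is not how `wGRR` itself is implemented.

```bash
# Hypothetical toy values, not produced by wGRR itself.
identities="0.9 0.8 0.5"   # identity of each BBH pair (assumed to be fractions)
proteins_a=10              # number of proteins in element A
proteins_b=4               # number of proteins in element B

wgrr=$(echo "$identities" | awk -v pa="$proteins_a" -v pb="$proteins_b" '
  {
    pa += 0; pb += 0                         # force numeric comparison
    for (i = 1; i <= NF; i++) sum += $i      # sum of identities over all BBH pairs
    min = (pa < pb) ? pa : pb                # size of the smaller element
    printf "%.2f", sum / min
  }')

echo "$wgrr"   # prints 0.55, i.e. (0.9 + 0.8 + 0.5) / 4
```

Here three BBH pairs are found, and the smaller element has four proteins, so the protein without a BBH lowers the score.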
### Dependencies
BBH are defined by all versus all protein comparisons using [MMseqs2][1].
Memory consumption can be very high, and running `wGRR` might exhaust your system's memory. It is advisable to perform a test run (with the `-T` flag) first.
### On an interactive session on Maestro
`wGRR` can be used interactively on Maestro. First, you need to allocate resources. For example:
```bash
salloc -p hubbioit -c 10 --mem 20G
```
This will allocate 10 CPUs and 20 GB of memory on the `hubbioit` partition.
Once the resources are available, `wGRR` can be executed as described above. Note that you do not need to load any module: `wGRR` handles this automatically.
The number of threads passed via the `-t` option is used for both the MMseqs2 step and the wGRR calculation.
### Using `sbatch`
This is the recommended way of using wGRR for large datasets. Simply use the `sbatch` command with the proper partition specification. Note that there is no need to allocate more than one CPU (with the `sbatch` option `-c`). For example:
```bash
sbatch -p hubbioit ./wGRR -i test_2.prt -t 30
```
This will run `wGRR` on the file `test_2.prt` on the `hubbioit` partition. If you do not provide a partition, `wGRR` will use Maestro's default, *i.e.* the "common" partition with the "normal" Quality of Service (QoS).
The MMseqs2 job will be submitted to the cluster's scheduler with 30 CPUs (`-t 30`). Then, for the actual wGRR calculation, the required number of jobs (which depends on the value passed with the `-a` option) will be submitted to the queue. For instance, if 100 jobs (1 CPU each) are necessary, a job array of 100 jobs will be submitted to the scheduler.
To avoid using 100% of your partition's CPUs, you can adjust the maximum number of jobs running simultaneously with the `-m` option.
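For instance, the earlier submission could be throttled as sketched below. The value `20` is purely illustrative, and the example assumes that `-m` takes a plain integer; check the option list of this manual for the exact syntax.

```bash
# Same submission as above, with at most 20 jobs of the array running at once.
# Assumption: -m accepts an integer; 20 is an arbitrary example value.
sbatch -p hubbioit ./wGRR -i test_2.prt -t 30 -m 20
```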