diff --git a/README.md b/README.md
index 43567114b549f18e4b516582ea8313040108d8c9..eb189550d0c0fc2ce1cf0ae6dc6b529889fd3586 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,9 @@ where
 * $`id(A_i,B_i)`$ is the identity score for each BBH pair between element $`A`$ and element $`B`$
 * $`min(P_A,P_B)`$ is the number of proteins in the smallest of $`A`$ and $`B`$ elements.
 
+**NOTE**
+`wGRR` calculates three versions of the wGRR depending on which BBH pairs are considered (numerator) and which proteins are counted (denominator). See the `Output` section of this manual for details.
+
 ### Dependencies
 
 BBH are defined by all versus all protein comparisons using [MMseqs2][1].
@@ -33,25 +36,25 @@
 ```bash
 ./wGRR -i $fasta [-p $mmseqs2_path -o $output_prefix -t $threads -a $comparisons -T -f]
 ```
-**WARNING**
-Memory consumption can be really high and running `wGRR` might exhaust your system. It is advised to run a test run (with the `-T` flag) first.
+**WARNING**
+Memory consumption can be very high, and running `wGRR` may exhaust your system. It is advised to perform a test run (with the `-T` flag) first.
 
 ### On an interactive session on Maestro
 
-wGRR can be used interactively on Maestro. First, you need to allocate resources. For example
+`wGRR` can be used interactively on Maestro. First, you need to allocate resources. For example:
 ```bash
 salloc -p hubbioit -c 10 --mem 20G
 ```
 This will allocate 10 CPUs and 20G of memory on the hubbioit partition.
-When the resource is available, wGRR can be executed as described above. Note that you do not need to load any module - wGRR will automatically handle this.
-The number of threads $THREADS passed by the `-t` option will be used for both the MMseqs step and the wGRR calculation.
+When the resources are available, `wGRR` can be executed as described above. Note that you do not need to load any modules; `wGRR` handles this automatically.
+The number of threads passed via the `-t` option will be used for both the MMseqs step and the wGRR calculation.
 
 ### Using `sbatch`
 
 This is the recommended way of using wGRR for large datasets. Simply use the `sbatch` command with the proper partition specification. Note that there is no need to allocate more than one CPU (with the `sbatch` option `-c`). For example:
 ```bash
 sbatch -p hubbioit ./wGRR -i test_2.prt -t 30
 ```
-This will run wGRR on the file test_2.prt on the hubbioit partition. If you do not provide a partition, `wGRR` will use Maestro's default, *i.e.* the "common" partition with the "normal" Quality of Service (QoS).
+This will run `wGRR` on the file `test_2.prt` on the hubbioit partition. If you do not provide a partition, `wGRR` will use Maestro's default, *i.e.* the "common" partition with the "normal" Quality of Service (QoS).
 The MMseqs job will be submitted to the cluster's scheduler with 30 CPUs (`-t 30`).
 Then, for the actual wGRR calculation, the required number of jobs (depending on the value passed with the `-a` option) will be submitted to the queue. If 100 jobs (1 CPU each) are necessary, a job array of 100 jobs will be submitted to the scheduler. To avoid using 100% of your partition's CPUs, you can adjust the maximum number of jobs running simultaneously with the `-m` option.
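As a worked illustration of the formula documented in the first hunk (assuming the definition given just above it in the README, $`wGRR(A,B) = \sum_i id(A_i,B_i) / min(P_A,P_B)`$): if element $`A`$ encodes 5 proteins, element $`B`$ encodes 8, and MMseqs2 reports three BBH pairs with identities 0.9, 0.8 and 0.7, then

```math
wGRR(A,B) = \frac{0.9 + 0.8 + 0.7}{min(5,8)} = \frac{2.4}{5} = 0.48
```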
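A minimal sketch of the test run recommended in the **WARNING**, assuming `-T` is simply appended to a normal invocation (the input file name is reused from the `sbatch` example):

```bash
# Sketch: quick test run with the -T flag before committing to a full analysis.
# Assumes -T combines with -i/-t exactly as in the usage line above.
./wGRR -i test_2.prt -t 4 -T
```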
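The job-array behaviour described in the last hunk maps onto standard Slurm syntax. A sketch of what an equivalent manual submission could look like (`wgrr_chunk.sh` is a hypothetical per-chunk script, not a file shipped with `wGRR`; the `%` separator is plain Slurm and corresponds to what the `-m` option controls):

```bash
# Hypothetical: a 100-task job array (1 CPU each),
# capped at 20 tasks running simultaneously via the %20 limit.
sbatch -p hubbioit -c 1 --array=1-100%20 wgrr_chunk.sh
```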