From 6edda578baf1b5cfe757b6a1aad0c207980bb266 Mon Sep 17 00:00:00 2001
From: jgugliel <julien.guglielmini@pasteur.fr>
Date: Tue, 14 Jun 2022 11:30:56 +0200
Subject: [PATCH] Readme update

---
 README.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index ac4a33d..4356711 100644
--- a/README.md
+++ b/README.md
@@ -33,6 +33,8 @@ chmod +x wGRR*
 ```bash
 ./wGRR -i $fasta [-p $mmseqs2_path -o $output_prefix -t $threads -a $comparisons -T -f]
 ```
+**WARNING**
+Memory consumption can be very high, and running `wGRR` might exhaust your system's memory. It is advised to do a test run first (with the `-T` flag).
 
 ### On an interactive session on Maestro
 wGRR can be used interactively on Maestro. First, you need to allocate resources. For example
@@ -45,14 +47,15 @@ When the resource is available, wGRR can be executed as described above. Note th
 The number of threads $THREADS passed by the `-t` option will be used for both the MMseqs step and the wGRR calculation. 
 
 ### Using `sbatch`
-This is the recommended way of using wGRR. Simply use the `sbatch` command with the proper partition specification. Note that there is no need to allocate more than one CPU (with the `sbatch` option `-c`). For example:
+This is the recommended way of using wGRR for large datasets. Simply use the `sbatch` command with the proper partition specification. Note that there is no need to allocate more than one CPU (with the `sbatch` option `-c`). For example:
 ```bash
 sbatch -p hubbioit ./wGRR -i test_2.prt -t 30
 ```
-This will run wGRR on the file test_2.prt on the hubbioit partition. The MMseqs job will be submitted to the cluster's scheduler with 30 CPUs. Then for the actual wGRR calculation, the required amount of jobs (depending on the value passed with the `-a` option) will be submitted to the queue. If 100 jobs (1 CPU each) are necessary, a job array of 100 jobs will be submitted to the scheduler.
-You can adjust the number of maximum jobs running simultaneously (to avoid using 100% of your partition's CPUs) by using the `-m` option.
+This will run wGRR on the file test_2.prt on the hubbioit partition. If you do not provide a partition, `wGRR` will use Maestro's default, *i.e.* the "common" partition with the "normal" Quality of Service (QoS).  
+The MMseqs job will be submitted to the cluster's scheduler with 30 CPUs (`-t 30`). Then, for the actual wGRR calculation, the required number of jobs (depending on the value passed with the `-a` option) will be submitted to the queue. If 100 jobs (1 CPU each) are necessary, a job array of 100 jobs will be submitted to the scheduler.
+To avoid using 100% of your partition's CPUs, you can adjust the maximum number of jobs running simultaneously with the `-m` option.
 
-If you do not have access to a dedicated partition, or if there is not enough free CPUs on your partition, you can try to turn on the `-f` flag. By doing so, the wGRR workers will be submitted to the common and dedicated machines of Maestro, on the "fast" Quality of Service (QoS). Jobs running on the fast QoS have a higher priority (so the workers will start faster) but are limited to 2 hours. Also, using the `-m` parameter is less necessary because you will use a lot of different common resources. But you need to be sure that each worker will end in less than 2 hours - otherwise the run will fail.
+If you do not have access to a dedicated partition, or if there are not enough free CPUs on your partition, you can try turning on the `-f` flag. By doing so, the wGRR workers will be submitted to the common and dedicated machines of Maestro, on the "fast" QoS. Jobs running on the fast QoS have a higher priority (so the workers will start faster) but are limited to 2 hours. Also, using the `-m` parameter is not necessary in this case because you will use a lot of different common resources. But you need to be sure that each worker will finish in less than 2 hours; otherwise the run will fail.
 
 ### Mandatory parameter
 `$fasta` is a fasta file containing all the proteins of all the elements you want to compare. The protein names **must** be formatted as:
-- 
GitLab