Temporary save. The new Zenodo output link remains to be added.

Commit 41615fb0 authored 2 months ago by Gael MILLOT
--- a/README.md
+++ b/README.md
@@ -200,8 +200,9 @@ chmod 755 bin/*.*
 <br /><br />
 ## OUTPUT

-
-An example of results obtained with the dataset is present at this address: https://zenodo.org/records/14509864/files/repertoire_profiler_1734454432.zip
+By default, all the results are returned in a *result* folder located where the executed *main.nf* file is (the folder is created if it does not exist). This can be changed using the *out_path_ini* parameter of the *nextflow.config* file. By default, each execution produces a new folder named *repertoire_profiler_<ID>*, created inside the *result* folder and containing all the outputs of the execution. The name of the folder can be changed using the *result_folder_name* parameter of the *nextflow.config* file. The new name will be followed by an <ID> in all cases.
+<br /><br />
+An example of results obtained with the dataset is present at this address: https://zenodo.org/records/14509864/files/repertoire_profiler_1734454432.zip .
 <br /><br />
 Complete informations are in the Protocol 144-rev0 Ig clustering - Immcantation.docx (contact Gael Millot).
 <br /><br />
@@ -298,6 +299,14 @@ Special acknowledgement to [Kenneth Hoehn](https://medicine.yale.edu/profile/ken
 <br /><br />
 ## WHAT'S NEW IN

+#### v11.5
+
+- nextflow.config file improved for users
+- README file updated
+- Title of donut improved
+- Zenodo output link updated
+
+
 #### v11.4

 - Zenodo input link updated

--- a/bin/donut.R
+++ b/bin/donut.R
@@ -415,7 +415,7 @@ tempo.title <- paste0(
            ifelse(
                kind == "annotated", 
                paste0(
-                    "Donut plot of the all-passed sequences grouped by same V and J alleles, for which at least one annotation\nWarning: this is different from clonal groups since the latter must have also the same CDR3 length\n\n",
+                    "Donut plot of the all-passed sequences grouped by same V and J alleles, for which at least one name replacement is present\n(according to the meta_name_replacement parameter of the nextflow.config file)\nWarning: this is different from clonal groups since the latter must have also the same CDR3 length\n\n",
                    "Kind of sequences: ", 
                    "all the all-passed ones (see the corresponding all-passed_seq.tsv"
                ), 

--- a/nextflow.config
+++ b/nextflow.config
@@ -3,7 +3,7 @@
 ##                                                                     ##
 ##     repertoire_profiler.config                                      ##
 ##                                                                     ##
-##     gmillot A. Millot                                                  ##
+##     gmillot A. Millot                                               ##
 ##     Bioinformatics and Biostatistics Hub                            ##
 ##     Computational Biology Department                                ##
 ##     Institut Pasteur Paris                                          ##
@@ -19,16 +19,30 @@
 #########################################################################
 */

+
 /*
-##########################
-##                      ##
-##     Ig Clustering    ##
-##                      ##
-##########################
+########################
+##                    ##
+##     Data           ##
+##                    ##
+########################
 */

 env {
    sample_path = "https://zenodo.org/records/14509916/files/ig_clustering_test_1_VH.zip" // single character string of the path of the fasta files directory. The last / can be added or not, as it is removed by nextflow file(). Can also be a .zip file that contains only fasta files. Warning: the fasta names must not start by a digit, before alakazam::readChangeoDb() function is fixed by the maintainer. Example : sample_path="/mnt/c/Users/gmillot/Documents/Git_projects/repertoire_profiler/dataset/A1_IgG_H_fw" or sample_path="/pasteur/appa/homes/gmillot/dataset/20210707_AV07016_HAD-III-89_plate3_IgK_sanger_seq". Example with spaces in the path: sample_path="/mnt/x/ROCURONIUM PROJECT/01 Primary data/04.Repertoire analysis/SORT1/SORT1 Seq-original/xlsx_to_fasta_1669018924/All/VL". Example: sample_path = "/mnt/c/Users/gmillot/Documents/Git_projects/repertoire_profiler/dataset/ig_clustering_test_1_VH". Example: sample_path = "https://zenodo.org/records/14500292/files/ig_clustering_test_1_VH.zip"
+    meta_path = "https://zenodo.org/records/14500245/files/metadata.tsv" // single character string of a valid path of a metadata file for adding info to the results. Write "NULL" if no metadata to add. WARNING: the metadata .tsv table must include a first column named "Label" containing sequence names, i.e., the header of some of the fasta files from sample_path, without the ">" of the header. Additional columns (quanti or quali) can then be added after the first column to modify the leaves of the tree. For instance: "KD", or Antibody name. Example: meta_path = "/mnt/c/Users/gmillot/Documents/Git_projects/repertoire_profiler/dataset/metadata.tsv". Example: meta_path = "NULL". Example: meta_path = "https://zenodo.org/records/14500245/files/metadata.tsv"
+    meta_name_replacement = "Name" // single character string of the name of a character column of the metadata table indicated in the meta_path parameter. This column will be used to replace the sequence names/IDs (header of the fasta files) by more appropriate names in returned .tsv and .pdf files (but the initial sequence name/ID remains indicated in all .tsv files, in another column). This is convenient to easily identify some Ig of interest in a huge set of Ig. Write "NULL" if not required and if meta_path is not "NULL". Ignored if meta_path = "NULL". Example: meta_name_replacement = "KD". Of note, germ_tree_leaf_size parameter is ignored if meta_name_replacement is a numeric column, and germ_tree_leaf_shape is ignored if the column is of another mode
+}
+
+/*
+#########################################
+##                                     ##
+##     Ig annotation and clustering    ##
+##                                     ##
+#########################################
+*/
+
+env {
    igblast_organism = "mouse" // single character string indicating the organism analyzed. Either "mouse", "human", "rabbit", "rat" or "rhesus_monkey" (value of the --organism option of AssignGenes.py igblast). Example: igblast_organism="human". Example: igblast_organism="mouse"
    igblast_database_path = "germlines/imgt/mouse/vdj" // single character string of the path of the database provided by igblast indicating a folder of fasta files, WITHOUT the last /. Normally, only the organism name should be changed in the path, to be the same as in the igblast_organism parameter. Example: igblast_database_path="germlines/imgt/human/vdj". Example: igblast_database_path="germlines/imgt/mouse/vdj". Warnings (for developers only): (1) see \\wsl$\Ubuntu-20.04\home\gmillot\share for the different possibilities of paths and (2) change this code in the .nf file " MakeDb.py igblast -i \${FILE}_igblast.fmt7 -s ${fs} -r \${REPO_PATH}/imgt_human_IGHV.fasta \${REPO_PATH}/imgt_human_IGHD.fasta \${REPO_PATH}/imgt_human_IGHJ.fasta --extended" if the present path is modified
    igblast_loci = "ig" // single character string of the value of the --loci option of AssignGenes.py igblast. Example: igblast_loci="ig"
@@ -49,9 +63,7 @@ env {
 */

 env {
-    meta_path = "https://zenodo.org/records/14500245/files/metadata.tsv" // single character string of a valid path of a metadata file for adding info to the results. Write "NULL" if no metadata to add. WARNING: the metadata .tsv table must include a first column named "Label" containing sequence names, i.e., the header of some of the fasta files from sample_path, without the ">" of the header. Additionnal columns (quanti or quali) can then be added after the fisrt column to modify the leafs of the tree. For instance: "KD", or Antibody name. Example: meta_path = "/mnt/c/Users/gmillot/Documents/Git_projects/repertoire_profiler/dataset/metadata.tsv". Example: meta_path = "NULL". Example: meta_path = "https://zenodo.org/records/14500245/files/metadata.tsv"
-    meta_name_replacement = "Name" // single character string of the name of the columns of the table indicated in the meta_path parameter. This column will be used to replace the sequence names (header of the fasta files) by more appropriate names. Write "NULL" if not required and if meta_path is not "NULL". Ignored if meta_path = "NULL". Example: meta_name_replacement = "KD". Of note, germ_tree_leaf_size parameter is ignored if meta_name_replacement is a numeric column ,and germ_tree_leaf_shape is ignored if the column is another mode
-    meta_legend = "KD" // single character string of the name of the columns of the table indicated in the meta_path parameter. This column will be used to add a legend in trees (and only in trees), in order to visualize an additionnal parameter like KD, names, etc. Ignored if meta_path = "NULL". Example: meta_legend = "KD". Of note, germ_tree_leaf_size parameter is ignored if meta_legend is a numeric column ,and germ_tree_leaf_shape is ignored if the column is another mode
+    meta_legend = "KD" // single character string of the name of a column of the table indicated in the meta_path parameter. This column will be used to add a legend in trees (and only in trees), in order to visualize an additional parameter like KD, names, etc. If a numeric column is indicated, it will be used for leaf size. If a non-numeric column is indicated, it will be used for coloring the leaves according to the classes inside the column. Ignored if meta_path = "NULL". Example: meta_legend = "KD". Of note, germ_tree_leaf_size parameter is ignored if meta_legend is a numeric column, and germ_tree_leaf_shape is ignored if the column is of another mode
    germ_tree_kind = "rectangular" // single character string of the kind of tree. Can be "rectangular", "roundrect", "slanted", "ellipse", "circular", "fan", "equal_angle", "daylight". See https://yulab-smu.top/treedata-book/chapter4.html#tree-layouts
    germ_tree_duplicate_seq = "FALSE" // single character string indicating if identical sequences (with difference cell or sequence names) must be removed from trees or not. Either "TRUE" for keeping or "FALSE" for removing
    germ_tree_leaf_color = "NULL" // single character string of the color of leaf tip. Ignored if meta_legend parameter is a name of a non numeric column of the meta_path parameter
@@ -79,22 +91,23 @@ env {
    donut_legend_box_space = "2" // single character string of the space between the legend boxes in mm (numeric value)
    donut_legend_limit = "0.05" // single character string of the classes displayed in the legend for which the corresponding proportion is over the mentionned proportion threshold (positive proportion). Example: donut_legend_limit = 0.4 means that only the sectors over 40% of the donut will be in the legend. Write "NULL" for all the sectors in the legend (no limit required).
    phylo_tree_heavy = "true" // single character string indicating whether the analyzed sequences correspond to heavy chain or light chain (as it impacts alignment options). Either "true" or "false"
-    phylo_tree_model_path = "https://gitlab.pasteur.fr/gmillot/repertoire_profiler/-/tree/master/bin/AB_model" // single character string indicating the path of the evolutionary model/matrix file, dedicated to antibodies (10.1093/molbev/msu340). Example: phylo_tree_model_path = "/mnt/c/Users/gmillot/Documents/Git_projects/repertoire_profiler/bin/AB_model". Example: phylo_tree_model="$baseDir/bin/AB_model". Example: phylo_tree_model="https://gitlab.pasteur.fr/gmillot/repertoire_profiler/-/tree/master/bin/AB_model"
+    phylo_tree_model_path = "https://gitlab.pasteur.fr/gmillot/repertoire_profiler/-/tree/master/bin/AB_model" // single character string indicating the path of the evolutionary model/matrix file, dedicated to antibodies (10.1093/molbev/msu340). Change this parameter value only if you want another model. Example: phylo_tree_model_path = "/mnt/c/Users/gmillot/Documents/Git_projects/repertoire_profiler/bin/AB_model". Example: phylo_tree_model="$baseDir/bin/AB_model". Example: phylo_tree_model="https://gitlab.pasteur.fr/gmillot/repertoire_profiler/-/tree/master/bin/AB_model"
    phylo_tree_itolkey = "eOIzrxSbR2pyDVxMwEGY2g" // single character string indicating the iTOL user api key, tu upload the trees and download the images. Example: phylo_tree_itolkey = "eOIzrxSbR2pyDVxMwEGY2g"
 }


 /*
-############################
-##                        ##
-##     Local / Cluster    ##
-##                        ##
-############################
+######################################
+##                                  ##
+##     Local / Cluster execution    ##
+##                                  ##
+######################################
 */

+apptainer_path = "NULL" // single character string of the path of the apptainer folder (where all the apptainer images are pulled and stored for proper nextflow execution). You can indicate an empty folder. In that case, docker images will be pulled from dockerhub, converted into apptainer images and stored into this indicated folder for next executions. Warning: Writing "NULL" for default path is possible but will work only if you update cacheDir = '/pasteur/helix/projects/BioIT/gmillot/apptainer' and cacheDir = '/mnt/c/Users/gmillot/apptainer' in the apptainer{} section below. Example: apptainer_path='/pasteur/helix/projects/BioIT/gmillot/apptainer'. Example: apptainer_path='/mnt/c/Users/gmillot/apptainer'. Example: apptainer_path="$projectDir/apptainer" # do not forget double quotes
 // see https://confluence.pasteur.fr/pages/viewpage.action?pageId=69304504
 system_exec = 'local' // single character string of the system that runs the workflow. Either 'local' to run on our own computer or 'slurm' to run on the pasteur cluster. Example: system_exec = 'local'
-simult_jobs = 3000 // number of max simultaneous jobs. This is to avoid to saturated a cluster if millions of jobs in parallel. Write 0 for all the threads. Not used if system_exec is 'local'
+simult_jobs = 3000 // number of max simultaneous jobs. This avoids saturating a cluster if millions of jobs run in parallel. Write 0 for all the threads
 queue = 'common,dedicated' // single character string of the -p option of slurm. Example: queue = 'common,dedicated'. Example: queue = 'hubbioit'
 qos = '--qos=ultrafast' // single character string of the --qos option of slurm. Example: qos= '--qos=fast'. Example: qos = '--qos=ultrafast'. Example: qos = '--qos=hubbioit'
 add_options = ' ' // single character string of the additional option of slurm. Example: add_options = '--exclude=maestro-1101,maestro-1034' or add_options = ' ', add_options = '--time=70:00:00' (acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds"). See https://slurm.schedmd.com/sbatch.html#OPT_time for other options
@@ -111,10 +124,8 @@ env{
    cute_path = "https://gitlab.pasteur.fr/gmillot/cute_little_R_functions/-/raw/v12.8/cute_little_R_functions.R" // single character string indicating the file (and absolute pathway) of the required cute_little_R_functions toolbox. With ethernet connection available, this can also be used: "https://gitlab.pasteur.fr/gmillot/cute_little_R_functions/raw/v5.1.0/cute_little_R_functions.R" or local "C:\\Users\\gmillot\\Documents\\Git_projects\\cute_little_R_functions\\cute_little_R_functions.R"
    igphylm_exe_path = "/usr/local/share/igphyml/src/igphyml" // single character string indicating the path of the igphyml exec file. No need to change that path when using the containers defined below. Example: igphylm_exe_path = "/usr/local/share/igphyml/src/igphyml". Example: igphylm_exe_path = "\\\\wsl$\\Ubuntu-20.04\\home\\gmillot\\bin\\igphyml\\src\\igphyml"
 }
-
-apptainer_path = "NULL" // single character string of the path of the apptainer folder (where all the apptainer images are are pulled and stored for proper nextflow execution). Write "NULL" for default path (but will not work in most cases). Example: apptainer_path='/pasteur/helix/projects/BioIT/gmillot/apptainer'. Example: apptainer_path='/mnt/c/Users/gmillot/apptainer'. Example: apptainer_path="$projectDir/apptainer" # do not forget double quotes
 out_path_ini = "$projectDir/results" // single character string of where the output files will be saved. Example out_path_ini = '.' for where the main.nf run is executed or out_path_ini = "$projectDir/results" to put the results in a result folder (created if required), $projectDir indicating where the main.nf run is executed. Example: out_path_ini = '/mnt/c/Users/gmillot/Desktop'. Example : out_path_ini="/pasteur/helix/projects/BioIT/gmillot/08002_bourgeron/results". Warning: this does not work: out_path_ini = "/mnt/share/Users/gmillot/Desktop"
-result_folder_name="repertoire_profiler" // single character string.of the name of the folder where the results files are dorpped
+result_folder_name = "repertoire_profiler" // single character string of the name of the folder where the result files are dropped


 /*
@@ -205,8 +216,6 @@ apptainer {
    if(apptainer_path == "NULL"){
        if(system_exec == 'slurm'){
            cacheDir = '/pasteur/helix/projects/BioIT/gmillot/apptainer' // name of the directory where remote Singularity images are stored. When rerun, the exec directly uses these without redownloading them. When using a computing cluster it must be a shared folder accessible to all computing nodes
-        }else if(system_exec == 'slurm_local'){
-            cacheDir = 'apptainer' // "$projectDir/apptainer" can be used but do not forget double quotes.
        }else{
            cacheDir = '/mnt/c/Users/gmillot/apptainer' // "$projectDir/apptainer" can be used but do not forget double quotes.
        }