diff --git a/tools/README.md b/tools/README.md
index c1df281bdf8e9bc9159fca5c6384a1fd6195efbd..0dc761d94c1cdd56f7d5cea458053534dfafaadd 100644
--- a/tools/README.md
+++ b/tools/README.md
@@ -7,14 +7,14 @@ This directory contains all tool descriptions that can be imported and used with
 All tools are described in a `Snakefile` in a directory having its name.
 All Snakefiles try to respect some rules and best practices in their design:
 
-* Reference to options from a `config.yaml` file
+* Reference to options from a `config_example.yaml` file
 * Tool description
 
 ### Reference to options from `config.yaml` file
 
 First part correspond to all options that are set up from a `config.yaml` file. They all have the nomenclature `__TOOLNAME_variable`.
 
-Then in the YAML `config.yaml` file, you can set the variable as followed:
+Then in the YAML `config_example.yaml` file, you can set the variable as follows:
 
 ```yaml
 TOOLNAME:
@@ -31,17 +31,18 @@ config['TOOLNAME'].get('variable', 1)
 
 In order to ease the linking of tools in a workflow, every parts of the tool is described as followed:
 
-  * input with the nomenclature: `__TOOLNAME_input`
-  * output with the nomenclature: ` __TOOLNAME_output`
-  * params with different options that are described above
-    * There is usually a `exec_command` to give the possibility to change the way the tool is called (locally installed, singularity ...)
-    * There is usually a `options` to specify all other command line options. _You can still give a more detailed level of description for the options_
-  * the shell command
+* input with the nomenclature: `__TOOLNAME_input`
+* output with the nomenclature: `__TOOLNAME_output`
+* params with the different options described above
+  * There is usually an `exec_command` to change the way the tool is called (locally installed, Singularity, ...)
+  * There is usually an `options` parameter to specify all other command-line options. _You can still describe the options at a more detailed level._
+* the shell command
 
 input and output are then set up in the workflow `Snakefile` that refer to the rule.
 Therefore the rules cannot be used directly.
 
+> **Info**: You may have noticed the `modules` parameter. It is dedicated to our HPC cluster, where some tools are accessible via `module`.
+
 ## Example
 
-The directory for metaphlan2 rules give some example on the way to call the rules as well as setting
-the parameters in the `config.yaml` file.
+For examples, refer to the workflows built on these tool descriptions in the `/workflows/` directory.
diff --git a/tools/metaphlan2/metaphlan2/paired/Snakefile b/tools/metaphlan2/metaphlan2/paired/Snakefile
new file mode 100644
index 0000000000000000000000000000000000000000..0a06bd9882efaa9dcb2bac60fabb84e6b8b78db0
--- /dev/null
+++ b/tools/metaphlan2/metaphlan2/paired/Snakefile
@@ -0,0 +1,35 @@
+__metaphlan2_exec_command = config.get('metaphlan2', {}).get('exec_command', 'metaphlan2.py')
+__metaphlan2_modules = config.get('metaphlan2', {}).get('modules')
+__metaphlan2_input_type = config.get('metaphlan2', {}).get('input_type', 'fastq')
+__metaphlan2_options = config.get('metaphlan2', {}).get('options', "")
+__metaphlan2_threads = config.get('metaphlan2', {}).get('threads', 1)
+
+
+rule metaphlan2_paired:
+    """
+    MetaPhlAn 2 can also natively handle paired-end metagenomes (but does not use the paired-end information),
+    and, more generally, metagenomes stored in multiple files (but you need to specify the --bowtie2out parameter):
+
+    $ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5
+      --input_type fastq > profiled_metagenome.txt
+
+    """
+    input:
+        r1 = __metaphlan2_input_r1,
+        r2 = __metaphlan2_input_r2
+    output:
+        __metaphlan2_output
+    params:
+        exec_command = __metaphlan2_exec_command,
+        modules = __metaphlan2_modules,
+        input_type = __metaphlan2_input_type,
+        bowtie2out = "{output_dir}/{sample}.bowtie2.bz2".format(output_dir=__metaphlan2_output_dir, sample="{sample}"),
+        options = __metaphlan2_options
+    threads:
+        __metaphlan2_threads
+    run:
+        command = []
+        if params.modules:
+            command.append("module load {params.modules}")
+        command.append("{params.exec_command} --nproc {threads} --input_type {params.input_type} --bowtie2out {params.bowtie2out} {params.options} {input.r1},{input.r2} {output}")
+        shell(" && ".join(command))
diff --git a/tools/metaphlan2/metaphlan2/paired/config_example.yaml b/tools/metaphlan2/metaphlan2/paired/config_example.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..a66651100333ccdd6760b1002dcf3c5fca75ede0
--- /dev/null
+++ b/tools/metaphlan2/metaphlan2/paired/config_example.yaml
@@ -0,0 +1,8 @@
+input_dir: data
+
+metaphlan2:
+  threads: 1
+  input_type: fastq
+  options: ""
+  pair_suffix: ""
+  exec_command: metaphlan2.py
diff --git a/tools/metaphlan2/metaphlan2/Snakefile b/tools/metaphlan2/metaphlan2/single/Snakefile
similarity index 73%
rename from tools/metaphlan2/metaphlan2/Snakefile
rename to tools/metaphlan2/metaphlan2/single/Snakefile
index 4203f1303248d7359af40c2f4a0a43654de3df0a..74fa8fbfcbdd7dcd58b3e19ae3947a9aa2c6209d 100644
--- a/tools/metaphlan2/metaphlan2/Snakefile
+++ b/tools/metaphlan2/metaphlan2/single/Snakefile
@@ -1,8 +1,10 @@
 __metaphlan2_exec_command = config.get('metaphlan2', {}).get('exec_command', 'metaphlan2.py')
 __metaphlan2_modules = config.get('metaphlan2', {}).get('modules')
+__metaphlan2_input_type = config.get('metaphlan2', {}).get('input_type', 'fastq')
 __metaphlan2_options = config.get('metaphlan2', {}).get('options', "")
 __metaphlan2_threads = config.get('metaphlan2', {}).get('threads', 1)
+
 
 rule metaphlan2:
     input:
         __metaphlan2_input
@@ -12,6 +14,7 @@ rule metaphlan2:
         exec_command = __metaphlan2_exec_command,
         modules = __metaphlan2_modules,
         input_type = __metaphlan2_input_type,
+        bowtie2out = "{output_dir}/{sample}.bowtie2.bz2".format(output_dir=__metaphlan2_output_dir, sample="{sample}"),
         options = __metaphlan2_options
     threads:
         __metaphlan2_threads
@@ -19,5 +22,5 @@ rule metaphlan2:
         command = []
         if params.modules:
             command.append("module load {params.modules}")
-        command.append("{params.exec_command} --nproc {threads} --input_type {params.input_type} {params.options} {input} {output}")
+        command.append("{params.exec_command} --nproc {threads} --input_type {params.input_type} --bowtie2out {params.bowtie2out} {params.options} {input} {output}")
         shell(" && ".join(command))
diff --git a/tools/metaphlan2/metaphlan2/config.yaml b/tools/metaphlan2/metaphlan2/single/config_example.yaml
similarity index 100%
rename from tools/metaphlan2/metaphlan2/config.yaml
rename to tools/metaphlan2/metaphlan2/single/config_example.yaml
diff --git a/tools/metaphlan2/metaphlan2/usage_example.rules b/tools/metaphlan2/metaphlan2/usage_example.rules
deleted file mode 100644
index b4e995a389fb71d7334433f8d9233c14ae38c96a..0000000000000000000000000000000000000000
--- a/tools/metaphlan2/metaphlan2/usage_example.rules
+++ /dev/null
@@ -1,26 +0,0 @@
-"""
-This example would be used as followed:
-
-$ snakemake --snakefile metaphlan2.rules output/s01.txt
-
-It requires the presence of data/s01.fastq.gz to work based on config file
-"""
-
-configfile: "config.yaml"
-
-__input_dir = config['input_dir']
-__main_output_dir = config.get('output_dir', 'output')
-
-# ---- Metaphlan2
-__metaphlan2_output_dir = __main_output_dir + "/metaphlan2"
-__metaphlan2_input_type = config['metaphlan2'].get('input_type', 'fastq')
-__metaphlan2_input = "{dir}/{sample}.{ext}".format(dir=__input_dir,
-                                                   sample="{sample}",
-                                                   ext=__metaphlan2_input_type + ".gz")
-__metaphlan2_output = "{dir}/{sample}.txt".format(dir=__metaphlan2_output_dir,
-                                                  sample="{sample}")
-include: "Snakefile"
-
-rule all:
-    input:
-        __metaphlan2_output
diff --git a/workflows/README.md b/workflows/README.md
index e51a50f8c142c65b283372d31037c807dd204765..c2586a62e7568646bcf7d3239242073040f88165 100644
--- a/workflows/README.md
+++ b/workflows/README.md
@@ -13,6 +13,8 @@ Therefore, workflows have the following parts:
 4. Set up of specific variables for the tools
 5. `rule all` to specify what is/are the specific file(s) expected from the workflow.
 
+> **Note**: These workflows are not meant to be fully portable and reusable, but they should cover roughly 80% of the work. You might need to make minor modifications. The idea is to keep a copy of the workflow `Snakefile` with its dedicated `config.yaml` file for each experiment.
+
 ### Reference to options from `config.yaml` file
 
 First part correspond to all options that are set up from a `config.yaml` file. They all have the nomenclature `__TOOLNAME_variable`.
diff --git a/workflows/metaphlan2/README.md b/workflows/metaphlan2/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..9535edaf2fde3aff6cf142a43553df7579822ad2
--- /dev/null
+++ b/workflows/metaphlan2/README.md
@@ -0,0 +1,8 @@
+# Simple metaphlan2 workflows
+
+Workflows using metaphlan2, with simple visualizations of the results.
+
+All examples presented here were made for our TARS cluster system. This means you are likely to find some
+absolute paths in the `config.yaml` files that you might not have access to.
+
+For every workflow, an example is provided, based on its `config.yaml` file. Singularity images are required for these examples.
diff --git a/workflows/metaphlan2/paired_metaphlan2/Snakefile b/workflows/metaphlan2/paired_metaphlan2/Snakefile
new file mode 100644
index 0000000000000000000000000000000000000000..e8785568835c18fea5fc31f77b434f4a86e0866b
--- /dev/null
+++ b/workflows/metaphlan2/paired_metaphlan2/Snakefile
@@ -0,0 +1,58 @@
+configfile: "config.yaml"
+
+# ==== Snakefile paths ====
+__metaphlan2_rules = config.get("snakefiles", {}).get("metaphlan2", "../../../tools/metaphlan2/metaphlan2/paired/Snakefile")
+__metaphlan2_merge_rules = config.get("snakefiles", {}).get("metaphlan2_merge", "../../../tools/metaphlan2/metaphlan2_merge/Snakefile")
+__metaphlan2_heatmap_rules = config.get("snakefiles", {}).get("metaphlan2_heatmap", "../../../tools/metaphlan2/metaphlan2_heatmap/Snakefile")
+__graphlan_from_metaphlan2_rules = config.get("snakefiles", {}).get("graphlan_from_metaphlan2", "../../../subworkflows/graphlan_from_metaphlan2/Snakefile")
+
+__input_dir = config['input_dir']
+__main_output_dir = config.get('output_dir', 'output')
+
+# ---- Metaphlan2
+__metaphlan2_suffix = config['metaphlan2'].get('pair_suffix', '')
+__metaphlan2_output_dir = __main_output_dir + "/metaphlan2"
+__metaphlan2_input_type = config['metaphlan2'].get('input_type', 'fastq')
+__metaphlan2_input_r1 = "{dir}/{sample}_R1{suffix}.{ext}".format(dir=__input_dir,
+                                                                sample="{sample}",
+                                                                suffix=__metaphlan2_suffix,
+                                                                ext=__metaphlan2_input_type + ".gz")
+__metaphlan2_input_r2 = "{dir}/{sample}_R2{suffix}.{ext}".format(dir=__input_dir,
+                                                                sample="{sample}",
+                                                                suffix=__metaphlan2_suffix,
+                                                                ext=__metaphlan2_input_type + ".gz")
+__metaphlan2_output = "{dir}/{sample}.txt".format(dir=__metaphlan2_output_dir,
+                                                  sample="{sample}")
+include: __metaphlan2_rules
+
+# ---- Metaphlan2 merge
+__metaphlan2_merge_output_dir = __main_output_dir + "/metaphlan2_merge"
+__metaphlan2_merge_output_file_name = config['metaphlan2_merge'].get('output_file_name',"merged_taxonomic_profiles.txt")
+__metaphlan2_merge_input = expand("{dir}/{sample}.txt".format(dir=__metaphlan2_output_dir,
+                                                              sample="{sample}"),
+                                  sample=config['samples'])
+__metaphlan2_merge_output = "{dir}/{file_name}".format(dir=__metaphlan2_merge_output_dir,
+                                                       file_name=__metaphlan2_merge_output_file_name)
+include: __metaphlan2_merge_rules
+
+# ---- Metaphlan2 heatmap
+__metaphlan2_heatmap_output_dir = __main_output_dir + "/metaphlan2_heatmap"
+__metaphlan2_heatmap_output_file_name = config['metaphlan2_heatmap'].get('output_name',"heatmap.png")
+__metaphlan2_heatmap_input = __metaphlan2_merge_output
+__metaphlan2_heatmap_output = "{dir}/{file_name}".format(dir=__metaphlan2_heatmap_output_dir,
+                                                         file_name=__metaphlan2_heatmap_output_file_name)
+include: __metaphlan2_heatmap_rules
+
+# ---- Graphlan Dendrogram
+__graphlan_from_metaphlan2_output_dir = __main_output_dir + "/graphlan"
+__graphlan_from_metaphlan2_output_file_name = config.get("graphlan_from_metaphlan2", {}).get('output_name',"dendogram.png")
+__graphlan_from_metaphlan2_input = __metaphlan2_merge_output
+__graphlan_from_metaphlan2_output = "{dir}/{file_name}".format(dir=__graphlan_from_metaphlan2_output_dir,
+                                                               file_name=__graphlan_from_metaphlan2_output_file_name)
+include: __graphlan_from_metaphlan2_rules
+
+rule all:
+    input:
+        heatmap = __metaphlan2_heatmap_output,
+        dendogram = __graphlan_from_metaphlan2_output
+
diff --git a/workflows/metaphlan2/paired_metaphlan2/config.yaml b/workflows/metaphlan2/paired_metaphlan2/config.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..21b4cbde2f02258dc4727f8c7fb95bf64978f675
--- /dev/null
+++ b/workflows/metaphlan2/paired_metaphlan2/config.yaml
@@ -0,0 +1,44 @@
+snakefiles:
+  metaphlan2: /pasteur/projets/policy01/Atm/snakemake/tools/metaphlan2/metaphlan2/paired/Snakefile
+  metaphlan2_merge: /pasteur/projets/policy01/Atm/snakemake/tools/metaphlan2/metaphlan2_merge/Snakefile
+  metaphlan2_heatmap: /pasteur/projets/policy01/Atm/snakemake/tools/metaphlan2/metaphlan2_heatmap/Snakefile
+  graphlan_from_metaphlan2: /pasteur/projets/policy01/Atm/snakemake/subworkflows/graphlan_from_metaphlan2/Snakefile
+
+samples:
+  - sample_1
+  - sample_2
+  - sample_3
+
+input_dir: /a/path/to/input/data
+output_dir: metaphlan2_output
+
+metaphlan2:
+  modules: singularity
+  threads: 4
+  input_type: fastq
+  pair_suffix: "_001"
+  options: --bowtie2db /pasteur/gaia/projets/p01/Atm/DBs/bowtie2/metaphlan2/
+  exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/metaphlan2/from_docker/metaphlan2_2.7.7_s3.2.1.simg metaphlan2.py
+
+metaphlan2_merge:
+  modules: singularity
+  exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/metaphlan2/from_docker/metaphlan2_2.7.7_s3.2.1.simg merge_metaphlan_tables.py
+
+metaphlan2_heatmap:
+  modules: singularity
+  exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/metaphlan2/from_docker/metaphlan2_2.6.0_s3.2.1.simg metaphlan_hclust_heatmap.py
+  output_name: snakemake_heatmap.png
+
+export2graphlan:
+  modules: singularity
+  exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/graphlan/from_docker/graphlan_0.9.7_s3.2.1.simg export2graphlan.py
+  options: "--skip_rows 1,2 --most_abundant 100 --abundance_threshold 1 --least_biomarkers 10 --annotations 5,6 --external_annotations 7 --min_clade_size 1"
+
+graphlan_annotate:
+  modules: singularity
+  exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/graphlan/from_docker/graphlan_0.9.7_s3.2.1.simg graphlan_annotate.py
+
+graphlan:
+  modules: singularity
+  exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/graphlan/from_docker/graphlan_0.9.7_s3.2.1.simg graphlan.py
+  options: "--dpi 300 --external_legends"
diff --git a/workflows/simple_metaphlan2/Snakefile b/workflows/metaphlan2/single_metaphlan2/Snakefile
similarity index 90%
rename from workflows/simple_metaphlan2/Snakefile
rename to workflows/metaphlan2/single_metaphlan2/Snakefile
index ba2059a1a193662baeb75a87c38a1cea97ffa8ee..413024e674a1c41c2a710827e289c6ee554931fe 100644
--- a/workflows/simple_metaphlan2/Snakefile
+++ b/workflows/metaphlan2/single_metaphlan2/Snakefile
@@ -1,10 +1,10 @@
 configfile: "config.yaml"
 
 # ==== Snakefile paths ====
-__metaphlan2_rules = config.get("snakefiles", {}).get("metaphlan2", "../../tools/metaphlan2/metaphlan2/Snakefile")
-__metaphlan2_merge_rules = config.get("snakefiles", {}).get("metaphlan2_merge", "../../tools/metaphlan2/metaphlan2_merge/Snakefile")
-__metaphlan2_heatmap_rules = config.get("snakefiles", {}).get("metaphlan2_heatmap", "../../tools/metaphlan2/metaphlan2_heatmap/Snakefile")
-__graphlan_from_metaphlan2_rules = config.get("snakefiles", {}).get("graphlan_from_metaphlan2", "../subworkflows/graphlan_from_metaphlan2/Snakefile")
+__metaphlan2_rules = config.get("snakefiles", {}).get("metaphlan2")
+__metaphlan2_merge_rules = config.get("snakefiles", {}).get("metaphlan2_merge")
+__metaphlan2_heatmap_rules = config.get("snakefiles", {}).get("metaphlan2_heatmap")
+__graphlan_from_metaphlan2_rules = config.get("snakefiles", {}).get("graphlan_from_metaphlan2")
 
 __input_dir = config['input_dir']
 __main_output_dir = config.get('output_dir', 'output')
diff --git a/workflows/metaphlan2/single_metaphlan2/config.yaml b/workflows/metaphlan2/single_metaphlan2/config.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..6d7b5bf0d573cee970a0122ca45ad7b0ea941363
--- /dev/null
+++ b/workflows/metaphlan2/single_metaphlan2/config.yaml
@@ -0,0 +1,43 @@
+snakefiles:
+  metaphlan2: /pasteur/projets/policy01/Atm/snakemake/tools/metaphlan2/metaphlan2/single/Snakefile
+  metaphlan2_merge: /pasteur/projets/policy01/Atm/snakemake/tools/metaphlan2/metaphlan2_merge/Snakefile
+  metaphlan2_heatmap: /pasteur/projets/policy01/Atm/snakemake/tools/metaphlan2/metaphlan2_heatmap/Snakefile
+  graphlan_from_metaphlan2: /pasteur/projets/policy01/Atm/snakemake/subworkflows/graphlan_from_metaphlan2/Snakefile
+
+samples:
+  - sample_1
+  - sample_2
+  - sample_3
+
+input_dir: /a/path/to/input/data
+output_dir: metaphlan2_output
+
+metaphlan2:
+  modules: singularity
+  threads: 4
+  input_type: fastq
+  options: --bowtie2db /pasteur/gaia/projets/p01/Atm/DBs/bowtie2/metaphlan2/
+  exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/metaphlan2/from_docker/metaphlan2_2.7.7_s3.2.1.simg metaphlan2.py
+
+metaphlan2_merge:
+  modules: singularity
+  exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/metaphlan2/from_docker/metaphlan2_2.7.7_s3.2.1.simg merge_metaphlan_tables.py
+
+metaphlan2_heatmap:
+  modules: singularity
+  exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/metaphlan2/from_docker/metaphlan2_2.6.0_s3.2.1.simg metaphlan_hclust_heatmap.py
+  output_name: snakemake_heatmap.png
+
+export2graphlan:
+  modules: singularity
+  exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/graphlan/from_docker/graphlan_0.9.7_s3.2.1.simg export2graphlan.py
+  options: "--skip_rows 1,2 --most_abundant 100 --abundance_threshold 1 --least_biomarkers 10 --annotations 5,6 --external_annotations 7 --min_clade_size 1"
+
+graphlan_annotate:
+  modules: singularity
+  exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/graphlan/from_docker/graphlan_0.9.7_s3.2.1.simg graphlan_annotate.py
+
+graphlan:
+  modules: singularity
+  exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/graphlan/from_docker/graphlan_0.9.7_s3.2.1.simg graphlan.py
+  options: "--dpi 300 --external_legends"
diff --git a/workflows/simple_metaphlan2/README.md b/workflows/simple_metaphlan2/README.md
deleted file mode 100644
index 9014992dbf8ebef87732069302c816f3f37f152a..0000000000000000000000000000000000000000
--- a/workflows/simple_metaphlan2/README.md
+++ /dev/null
@@ -1,7 +0,0 @@
-# Simple metaphlan2 workflow
-
-This is a simple example of a workflow using metaphlan2.
-
-## Example
-
-An example is provided and is based on the `config.yaml` file. Singularity images are necessary for this example as well as the samples in the appropriate folder.
\ No newline at end of file
diff --git a/workflows/simple_metaphlan2/config.yaml b/workflows/simple_metaphlan2/config.yaml
deleted file mode 100644
index c8c0b9f99ca675a715e49aacbdc18dd8286bdfda..0000000000000000000000000000000000000000
--- a/workflows/simple_metaphlan2/config.yaml
+++ /dev/null
@@ -1,33 +0,0 @@
-samples:
-  - SRS014459-Stool
-  - SRS014464-Anterior_nares
-  - SRS014470-Tongue_dorsum
-  - SRS014472-Buccal_mucosa
-  - SRS014476-Supragingival_plaque
-  - SRS014494-Posterior_fornix
-
-input_dir: /pasteur/homes/kehillio/Atm/kenzo/repo/workflow-benchmarking/data
-output_dir: 20190509_simplecase
-
-metaphlan2:
-  threads: 4
-  input_type: fasta
-  exec_command: singularity run --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/metaphlan2/metaphlan2.simg
-
-metaphlan2_merge:
-  exec_command: singularity run --app merge --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/metaphlan2/metaphlan2.simg
-
-metaphlan2_heatmap:
-  exec_command: singularity run --app heatmap --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/metaphlan2/metaphlan2.simg
-  output_name: snakemake_heatmap.png
-
-export2graphlan:
-  exec_command: singularity run --app export2graphlan --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/graphlan/graphlan.simg
-  options: "--skip_rows 1,2 --most_abundant 100 --abundance_threshold 1 --least_biomarkers 10 --annotations 5,6 --external_annotations 7 --min_clade_size 1"
-
-graphlan_annotate:
-  exec_command: singularity run --app graphlan_annotate --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/graphlan/graphlan.simg
-
-graphlan:
-  exec_command: singularity run --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/graphlan/graphlan.simg
-  options: "--dpi 300 --external_legends"
\ No newline at end of file
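The paired rule relies on a few string conventions. Config values are looked up with defaults, `{sample}` is kept as a Snakemake wildcard through `str.format`, R1/R2 file names are built from `pair_suffix`, and the shell command is chained with `&&`. The plain-Python sketch below reproduces that logic outside Snakemake so it can be checked quickly. The helper names (`tool_option`, `paired_input_templates`, `build_command`) and the sample values are hypothetical and not part of the repository. It assumes the `{sample}_R1{suffix}.{input_type}.gz` naming used by the paired workflow, and, unlike the `run:` block, it skips an empty `options` value.

```python
"""Plain-Python sketch of the string handling in the paired metaphlan2 rules.

Hypothetical helpers for illustration only; they are not part of the
repository and do not import Snakemake.
"""


def tool_option(config, tool, variable, default=None):
    """Mimic config.get(tool, {}).get(variable, default) from the Snakefiles.

    A missing tool section falls back to the default instead of raising
    KeyError, which is what config[tool].get(...) would do.
    """
    return config.get(tool, {}).get(variable, default)


def paired_input_templates(input_dir, input_type="fastq", pair_suffix=""):
    """Build the R1/R2 path templates the way the paired workflow does.

    Passing sample="{sample}" to str.format() writes the placeholder back
    into the result, so Snakemake can still treat {sample} as a wildcard.
    """
    templates = []
    for read in ("R1", "R2"):
        template = "{dir}/{sample}_" + read + "{suffix}.{ext}"
        templates.append(template.format(dir=input_dir,
                                         sample="{sample}",
                                         suffix=pair_suffix,
                                         ext=input_type + ".gz"))
    return tuple(templates)


def build_command(exec_command, threads, input_type, bowtie2out, options,
                  r1, r2, output, modules=None):
    """Chain an optional `module load` with the metaphlan2 call using &&.

    Empty parts (e.g. options: "") are skipped so the command has no
    doubled spaces.
    """
    tool_call = " ".join(part for part in (
        exec_command,
        "--nproc {}".format(threads),
        "--input_type {}".format(input_type),
        "--bowtie2out {}".format(bowtie2out),
        options,
        "{},{}".format(r1, r2),  # paired files are passed comma-separated
        output,
    ) if part)
    steps = []
    if modules:
        steps.append("module load {}".format(modules))
    steps.append(tool_call)
    return " && ".join(steps)


if __name__ == "__main__":
    config = {"input_dir": "data",
              "metaphlan2": {"threads": 4, "pair_suffix": "_001"}}
    r1, r2 = paired_input_templates(
        config["input_dir"],
        tool_option(config, "metaphlan2", "input_type", "fastq"),
        tool_option(config, "metaphlan2", "pair_suffix", ""))
    print(r1)                       # data/{sample}_R1_001.fastq.gz
    print(r1.format(sample="s01"))  # data/s01_R1_001.fastq.gz
    print(build_command("metaphlan2.py",
                        tool_option(config, "metaphlan2", "threads", 1),
                        "fastq", "output/metaphlan2/s01.bowtie2.bz2", "",
                        r1.format(sample="s01"), r2.format(sample="s01"),
                        "output/metaphlan2/s01.txt", modules="singularity"))
```

Under these assumptions, the last line prints the command that the `run:` block would hand to `shell()` for sample `s01`: a `module load singularity` step followed by the metaphlan2 call on `data/s01_R1_001.fastq.gz,data/s01_R2_001.fastq.gz`.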