Commit f485ebed authored by Kenzo-Hugo Hillion

Merge branch '14-metaphlan2-paired' into 'master'

Metaphlan2 to use paired-end reads

Closes #14

See merge request !1
parents 727b51e0 b44971f1
......@@ -7,14 +7,14 @@ This directory contains all tool descriptions that can be imported and used with
All tools are described in a `Snakefile` in a directory having its name.
All Snakefiles try to respect some rules and best practices in their design:
* Reference to options from a `config.yaml` file
* Reference to options from a `config_example.yaml` file
* Tool description
### Reference to options from `config.yaml` file
The first part corresponds to all options that are set from a `config.yaml` file. They all follow the nomenclature `__TOOLNAME_variable`.
Then, in the YAML `config.yaml` file, you can set the variable as follows:
Then, in the YAML `config_example.yaml` file, you can set the variable as follows:
```yaml
TOOLNAME:
......@@ -31,17 +31,18 @@ config['TOOLNAME'].get('variable', 1)
To ease the linking of tools in a workflow, every part of the tool is described as follows:
* input with the nomenclature: `__TOOLNAME_input`
* output with the nomenclature: ` __TOOLNAME_output`
* params with different options that are described above
* There is usually an `exec_command` option to change the way the tool is called (locally installed, singularity ...)
* There is usually an `options` entry to specify all other command line options. _You can still give a more detailed level of description for the options_
* the shell command
* input with the nomenclature: `__TOOLNAME_input`
* output with the nomenclature: ` __TOOLNAME_output`
* params with different options that are described above
* There is usually an `exec_command` option to change the way the tool is called (locally installed, singularity ...)
* There is usually an `options` entry to specify all other command line options. _You can still give a more detailed level of description for the options_
* the shell command
`input` and `output` are then set in the workflow `Snakefile` that refers to the rule.
Therefore, the rules cannot be used directly on their own.
> **Info**: You may have noticed the option to specify `modules`. It is dedicated to our HPC, where some tools are made accessible via `module`.
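The option lookup described in this section can be tried in plain Python, outside Snakemake. The sketch below assumes that `config` is the dictionary Snakemake builds from the YAML file; `get_option` is a hypothetical helper that is not part of this repository.

```python
# Stand-in for the dict that Snakemake loads from the YAML config file.
config = {"metaphlan2": {"threads": 4}}


def get_option(config, tool, name, default=None):
    """Return config[tool][name], falling back to `default` when the
    tool section or the option is missing (hypothetical helper)."""
    return config.get(tool, {}).get(name, default)


# Same nomenclature as the tool Snakefiles: __TOOLNAME_variable
__metaphlan2_threads = get_option(config, "metaphlan2", "threads", 1)
__metaphlan2_options = get_option(config, "metaphlan2", "options", "")
# The 'graphlan' section is absent here, so the default is returned.
__graphlan_options = get_option(config, "graphlan", "options", "")
```

Note that the shorter form `config['TOOLNAME'].get('variable', 1)` raises a `KeyError` when the `TOOLNAME` section is missing from the config file, whereas chaining two `.get` calls falls back to the default.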
## Example
The directory for metaphlan2 rules gives some examples of how to call the rules and how to set
the parameters in the `config.yaml` file.
To find examples, refer to the workflows based on these tool descriptions in the `/workflows/` section.
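As a rough illustration of how a workflow can define `__TOOLNAME_input` and `__TOOLNAME_output` before including a tool `Snakefile`, the sketch below builds path patterns with `str.format` while keeping `{sample}` in place for Snakemake to use as a wildcard. The directory names and the `make_pattern` helper are assumptions made for this example.

```python
def make_pattern(directory, ext):
    """Build '<directory>/{sample}.<ext>', keeping '{sample}' literal so
    that Snakemake can later treat it as a wildcard (hypothetical helper)."""
    return "{dir}/{sample}.{ext}".format(dir=directory, sample="{sample}", ext=ext)


# Hypothetical directories; a real workflow would read them from the config.
input_pattern = make_pattern("data", "fastq.gz")
output_pattern = make_pattern("output/metaphlan2", "txt")
print(input_pattern)
print(output_pattern)
```

Passing `sample="{sample}"` to `format` is what keeps the placeholder intact: the directory and extension are filled in immediately, and the sample name is resolved later.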
__metaphlan2_exec_command = config.get('metaphlan2', {}).get('exec_command', 'metaphlan2.py')
__metaphlan2_modules = config.get('metaphlan2', {}).get('modules')
__metaphlan2_input_type = config['metaphlan2'].get('input_type', 'fastq')
__metaphlan2_options = config.get('metaphlan2', {}).get('options', "")
__metaphlan2_threads = config.get('metaphlan2', {}).get('threads', 1)
rule metaphlan2_paired:
"""
MetaPhlAn 2 can also natively handle paired-end metagenomes (but does not use the paired-end information),
and, more generally, metagenomes stored in multiple files (but you need to specify the --bowtie2out parameter):
$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5
--input_type fastq > profiled_metagenome.txt
"""
input:
r1 = __metaphlan2_input_r1,
r2 = __metaphlan2_input_r2
output:
__metaphlan2_output
params:
exec_command = __metaphlan2_exec_command,
modules = __metaphlan2_modules,
input_type = __metaphlan2_input_type,
bowtie2out = "{output_dir}/{sample}.bowtie2.bz2".format(output_dir=__metaphlan2_output_dir, sample="{sample}"),
options = __metaphlan2_options
threads:
__metaphlan2_threads
run:
command = []
if params.modules:
command.append("module load {params.modules}")
command.append("{params.exec_command} --nproc {threads} --input_type {params.input_type} --bowtie2out {params.bowtie2out} {params.options} {input.r1},{input.r2} {output}")
shell(" && ".join(command))
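The `run` block of `metaphlan2_paired` can be mimicked in plain Python to see which shell command it produces. This is only a sketch: `build_command` is a hypothetical helper, and the file names in the usage lines are made up.

```python
def build_command(exec_command, threads, input_type, bowtie2out,
                  r1, r2, output, options="", modules=None):
    """Assemble the shell command in the same way as the rule's run block:
    an optional 'module load' step, then the tool call, joined by ' && '."""
    steps = []
    if modules:
        steps.append("module load {}".format(modules))
    steps.append(
        "{exe} --nproc {threads} --input_type {itype} --bowtie2out {b2} "
        "{opts} {r1},{r2} {out}".format(
            exe=exec_command, threads=threads, itype=input_type,
            b2=bowtie2out, opts=options, r1=r1, r2=r2, out=output)
    )
    return " && ".join(steps)


# Hypothetical file names, for illustration only.
cmd = build_command("metaphlan2.py", 4, "fastq", "out/s01.bowtie2.bz2",
                    "data/s01_R1.fastq.gz", "data/s01_R2.fastq.gz",
                    "out/s01.txt", modules="singularity")
print(cmd)
```

Both read files are passed as a single comma-separated argument, which is why `--bowtie2out` has to be given explicitly.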
input_dir: data
metaphlan2:
threads: 1
input_type: fastq
options: ""
pair_suffix: ""
exec_command: metaphlan2.py
__metaphlan2_exec_command = config.get('metaphlan2', {}).get('exec_command', 'metaphlan2.py')
__metaphlan2_modules = config.get('metaphlan2', {}).get('modules')
__metaphlan2_input_type = config['metaphlan2'].get('input_type', 'fastq')
__metaphlan2_options = config.get('metaphlan2', {}).get('options', "")
__metaphlan2_threads = config.get('metaphlan2', {}).get('threads', 1)
rule metaphlan2:
input:
__metaphlan2_input
......@@ -12,6 +14,7 @@ rule metaphlan2:
exec_command = __metaphlan2_exec_command,
modules = __metaphlan2_modules,
input_type = __metaphlan2_input_type,
bowtie2out = "{output_dir}/{sample}.bowtie2.bz2".format(output_dir=__metaphlan2_output_dir, sample="{sample}"),
options = __metaphlan2_options
threads:
__metaphlan2_threads
......@@ -19,5 +22,5 @@ rule metaphlan2:
command = []
if params.modules:
command.append("module load {params.modules}")
command.append("{params.exec_command} --nproc {threads} --input_type {params.input_type} {params.options} {input} {output}")
command.append("{params.exec_command} --nproc {threads} --input_type {params.input_type} --bowtie2out {params.bowtie2out} {params.options} {input} {output}")
shell(" && ".join(command))
"""
This example would be used as follows:
$ snakemake --snakefile metaphlan2.rules output/metaphlan2/s01.txt
Based on the config file, it requires data/s01.fastq.gz to be present
"""
configfile: "config.yaml"
__input_dir = config['input_dir']
__main_output_dir = config.get('output_dir', 'output')
# ---- Metaphlan2
__metaphlan2_output_dir = __main_output_dir + "/metaphlan2"
__metaphlan2_input_type = config['metaphlan2'].get('input_type', 'fastq')
__metaphlan2_input = "{dir}/{sample}.{ext}".format(dir=__input_dir,
sample="{sample}",
ext=__metaphlan2_input_type + ".gz")
__metaphlan2_output = "{dir}/{sample}.txt".format(dir=__metaphlan2_output_dir,
sample="{sample}")
include: "Snakefile"
rule all:
input:
__metaphlan2_output
......@@ -13,6 +13,8 @@ Therefore, workflows have the following parts:
4. Set up of specific variables for the tools
5. `rule all` to specify what is/are the specific file(s) expected from the workflow.
> **Note**: These workflows are not meant to be fully portable and reusable, but they should cover around 80% of the work. You might need to make minor modifications. The idea is to keep a copy of the workflow `Snakefile` with its dedicated `config.yaml` file for each experiment.
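Setting up tool-specific variables often means turning a per-sample pattern into a list of concrete paths, for example when collecting all per-sample profiles before merging them. Snakemake provides `expand` for this; the plain-Python sketch below only approximates it for a single `{sample}` wildcard, and the sample names and the `expand_samples` helper are invented for illustration.

```python
def expand_samples(pattern, samples):
    """Fill the '{sample}' placeholder once per sample name.
    Plain-Python approximation of Snakemake's expand() for one wildcard."""
    return [pattern.format(sample=name) for name in samples]


# Hypothetical sample list; a real workflow would take it from config['samples'].
samples = ["sample_1", "sample_2"]
profiles = expand_samples("output/metaphlan2/{sample}.txt", samples)
print(profiles)
```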
### Reference to options from `config.yaml` file
The first part corresponds to all options that are set from a `config.yaml` file. They all follow the nomenclature `__TOOLNAME_variable`.
......
# Simple metaphlan2 workflows
Workflows using metaphlan2 and simple visualization of the results.
All examples presented were made for our TARS cluster system. This means you are likely to find some
absolute paths in the `config.yaml` that you might not have access to.
For every workflow, an example is provided and is based on the `config.yaml` file. Singularity images are necessary for these examples.
configfile: "config.yaml"
# ==== Snakefile paths ====
__metaphlan2_rules = config.get("snakefiles", {}).get("metaphlan2", "../../tools/metaphlan2/metaphlan2/Snakefile")
__metaphlan2_merge_rules = config.get("snakefiles", {}).get("metaphlan2_merge", "../../tools/metaphlan2/metaphlan2_merge/Snakefile")
__metaphlan2_heatmap_rules = config.get("snakefiles", {}).get("metaphlan2_heatmap", "../../tools/metaphlan2/metaphlan2_heatmap/Snakefile")
__graphlan_from_metaphlan2_rules = config.get("snakefiles", {}).get("graphlan_from_metaphlan2", "../subworkflows/graphlan_from_metaphlan2/Snakefile")
__input_dir = config['input_dir']
__main_output_dir = config.get('output_dir', 'output')
# ---- Metaphlan2
__metaphlan2_suffix = config['metaphlan2'].get('pair_suffix', '')
__metaphlan2_output_dir = __main_output_dir + "/metaphlan2"
__metaphlan2_input_type = config['metaphlan2'].get('input_type', 'fastq')
__metaphlan2_input_r1 = "{dir}/{sample}_R1{suffix}.{ext}".format(dir=__input_dir,
sample="{sample}",
suffix=__metaphlan2_suffix,
ext=__metaphlan2_input_type + ".gz")
__metaphlan2_input_r2 = "{dir}/{sample}_R2{suffix}.{ext}".format(dir=__input_dir,
sample="{sample}",
suffix=__metaphlan2_suffix,
ext=__metaphlan2_input_type + ".gz")
__metaphlan2_output = "{dir}/{sample}.txt".format(dir=__metaphlan2_output_dir,
sample="{sample}")
include: __metaphlan2_rules
# ---- Metaphlan2 merge
__metaphlan2_merge_output_dir = __main_output_dir + "/metaphlan2_merge"
__metaphlan2_merge_output_file_name = config['metaphlan2_merge'].get('output_file_name',"merged_taxonomic_profiles.txt")
__metaphlan2_merge_input = expand("{dir}/{sample}.txt".format(dir=__metaphlan2_output_dir,
sample="{sample}"),
sample=config['samples'])
__metaphlan2_merge_output = "{dir}/{file_name}".format(dir=__metaphlan2_merge_output_dir,
file_name=__metaphlan2_merge_output_file_name)
include: __metaphlan2_merge_rules
# ---- Metaphlan2 heatmap
__metaphlan2_heatmap_output_dir = __main_output_dir + "/metaphlan2_heatmap"
__metaphlan2_heatmap_output_file_name = config['metaphlan2_heatmap'].get('output_name',"heatmap.png")
__metaphlan2_heatmap_input = __metaphlan2_merge_output
__metaphlan2_heatmap_output = "{dir}/{file_name}".format(dir=__metaphlan2_heatmap_output_dir,
file_name=__metaphlan2_heatmap_output_file_name)
include: __metaphlan2_heatmap_rules
# ---- Graphlan Dendogram
__graphlan_from_metaphlan2_output_dir = __main_output_dir + "/graphlan"
__graphlan_from_metaphlan2_output_file_name = config.get("graphlan_from_metaphlan2", {}).get('output_name',"dendogram.png")
__graphlan_from_metaphlan2_input = __metaphlan2_merge_output
__graphlan_from_metaphlan2_output = "{dir}/{file_name}".format(dir=__graphlan_from_metaphlan2_output_dir,
file_name=__graphlan_from_metaphlan2_output_file_name)
include: __graphlan_from_metaphlan2_rules
rule all:
input:
heatmap = __metaphlan2_heatmap_output,
dendogram = __graphlan_from_metaphlan2_output
snakefiles:
metaphlan2: /pasteur/projets/policy01/Atm/snakemake/tools/metaphlan2/metaphlan2/paired/Snakefile
metaphlan2_merge: /pasteur/projets/policy01/Atm/snakemake/tools/metaphlan2/metaphlan2_merge/Snakefile
metaphlan2_heatmap: /pasteur/projets/policy01/Atm/snakemake/tools/metaphlan2/metaphlan2_heatmap/Snakefile
graphlan_from_metaphlan2: /pasteur/projets/policy01/Atm/snakemake/subworkflows/graphlan_from_metaphlan2/Snakefile
samples:
- sample_1
- sample_2
- sample_2
input_dir: /a/path/to/input/data
output_dir: metaphlan2_output
metaphlan2:
modules: singularity
threads: 4
input_type: fastq
pair_suffix: "_001"
options: --bowtie2db /pasteur/gaia/projets/p01/Atm/DBs/bowtie2/metaphlan2/
exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/metaphlan2/from_docker/metaphlan2_2.7.7_s3.2.1.simg metaphlan2.py
metaphlan2_merge:
modules: singularity
exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/metaphlan2/from_docker/metaphlan2_2.7.7_s3.2.1.simg merge_metaphlan_tables.py
metaphlan2_heatmap:
modules: singularity
exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/metaphlan2/from_docker/metaphlan2_2.6.0_s3.2.1.simg metaphlan_hclust_heatmap.py
output_name: snakemake_heatmap.png
export2graphlan:
modules: singularity
exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/graphlan/from_docker/graphlan_0.9.7_s3.2.1.simg export2graphlan.py
options: "--skip_rows 1,2 --most_abundant 100 --abundance_threshold 1 --least_biomarkers 10 --annotations 5,6 --external_annotations 7 --min_clade_size 1"
graphlan_annotate:
modules: singularity
exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/graphlan/from_docker/graphlan_0.9.7_s3.2.1.simg graphlan_annotate.py
graphlan:
modules: singularity
exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/graphlan/from_docker/graphlan_0.9.7_s3.2.1.simg graphlan.py
options: "--dpi 300 --external_legends"
configfile: "config.yaml"
# ==== Snakefile paths ====
__metaphlan2_rules = config.get("snakefiles", {}).get("metaphlan2", "../../tools/metaphlan2/metaphlan2/Snakefile")
__metaphlan2_merge_rules = config.get("snakefiles", {}).get("metaphlan2_merge", "../../tools/metaphlan2/metaphlan2_merge/Snakefile")
__metaphlan2_heatmap_rules = config.get("snakefiles", {}).get("metaphlan2_heatmap", "../../tools/metaphlan2/metaphlan2_heatmap/Snakefile")
__graphlan_from_metaphlan2_rules = config.get("snakefiles", {}).get("graphlan_from_metaphlan2", "../subworkflows/graphlan_from_metaphlan2/Snakefile")
__metaphlan2_rules = config.get("snakefiles", {}).get("metaphlan2")
__metaphlan2_merge_rules = config.get("snakefiles", {}).get("metaphlan2_merge")
__metaphlan2_heatmap_rules = config.get("snakefiles", {}).get("metaphlan2_heatmap")
__graphlan_from_metaphlan2_rules = config.get("snakefiles", {}).get("graphlan_from_metaphlan2")
__input_dir = config['input_dir']
__main_output_dir = config.get('output_dir', 'output')
......
snakefiles:
metaphlan2: /pasteur/projets/policy01/Atm/snakemake/tools/metaphlan2/metaphlan2/single/Snakefile
metaphlan2_merge: /pasteur/projets/policy01/Atm/snakemake/tools/metaphlan2/metaphlan2_merge/Snakefile
metaphlan2_heatmap: /pasteur/projets/policy01/Atm/snakemake/tools/metaphlan2/metaphlan2_heatmap/Snakefile
graphlan_from_metaphlan2: /pasteur/projets/policy01/Atm/snakemake/subworkflows/graphlan_from_metaphlan2/Snakefile
samples:
- sample_1
- sample_2
- sample_2
input_dir: /a/path/to/input/data
output_dir: metaphlan2_output
metaphlan2:
modules: singularity
threads: 4
input_type: fastq
options: --bowtie2db /pasteur/gaia/projets/p01/Atm/DBs/bowtie2/metaphlan2/
exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/metaphlan2/from_docker/metaphlan2_2.7.7_s3.2.1.simg metaphlan2.py
metaphlan2_merge:
modules: singularity
exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/metaphlan2/from_docker/metaphlan2_2.7.7_s3.2.1.simg merge_metaphlan_tables.py
metaphlan2_heatmap:
modules: singularity
exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/metaphlan2/from_docker/metaphlan2_2.6.0_s3.2.1.simg metaphlan_hclust_heatmap.py
output_name: snakemake_heatmap.png
export2graphlan:
modules: singularity
exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/graphlan/from_docker/graphlan_0.9.7_s3.2.1.simg export2graphlan.py
options: "--skip_rows 1,2 --most_abundant 100 --abundance_threshold 1 --least_biomarkers 10 --annotations 5,6 --external_annotations 7 --min_clade_size 1"
graphlan_annotate:
modules: singularity
exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/graphlan/from_docker/graphlan_0.9.7_s3.2.1.simg graphlan_annotate.py
graphlan:
modules: singularity
exec_command: singularity exec --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/tools/graphlan/from_docker/graphlan_0.9.7_s3.2.1.simg graphlan.py
options: "--dpi 300 --external_legends"
# Simple metaphlan2 workflow
This is a simple example of a workflow using metaphlan2.
## Example
An example is provided and is based on the `config.yaml` file. Singularity images are necessary for this example as well as the samples in the appropriate folder.
\ No newline at end of file
samples:
- SRS014459-Stool
- SRS014464-Anterior_nares
- SRS014470-Tongue_dorsum
- SRS014472-Buccal_mucosa
- SRS014476-Supragingival_plaque
- SRS014494-Posterior_fornix
input_dir: /pasteur/homes/kehillio/Atm/kenzo/repo/workflow-benchmarking/data
output_dir: 20190509_simplecase
metaphlan2:
threads: 4
input_type: fasta
exec_command: singularity run --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/metaphlan2/metaphlan2.simg
metaphlan2_merge:
exec_command: singularity run --app merge --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/metaphlan2/metaphlan2.simg
metaphlan2_heatmap:
exec_command: singularity run --app heatmap --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/metaphlan2/metaphlan2.simg
output_name: snakemake_heatmap.png
export2graphlan:
exec_command: singularity run --app export2graphlan --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/graphlan/graphlan.simg
options: "--skip_rows 1,2 --most_abundant 100 --abundance_threshold 1 --least_biomarkers 10 --annotations 5,6 --external_annotations 7 --min_clade_size 1"
graphlan_annotate:
exec_command: singularity run --app graphlan_annotate --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/graphlan/graphlan.simg
graphlan:
exec_command: singularity run --bind /pasteur/ /pasteur/gaia/projets/p01/Atm/singularity/graphlan/graphlan.simg
options: "--dpi 300 --external_legends"
\ No newline at end of file