Commit 3a4ec78c authored by Kenzo-Hugo Hillion

update documentation and best practices

parent 7af20a4c
-# Snakemake rules
+# Snakemake
-Repository for snakemake rules concerning tools and workflows
+This repository aims to gather all Snakemake descriptions. They all try to follow some rules in order to make them easily reusable from one workflow to another.
Thus, the repository is split into three sections:
* [__tools__](tools/): All tool descriptions. These are not meant to be used directly and need to be referred to in another Snakefile.
* [__subworkflows__](subworkflows/): These behave like tool descriptions, meaning that they need to be referred to in another Snakefile to be used.
* [__workflows__](workflows/): All workflows that are meant to be usable directly.
\ No newline at end of file
# Subworkflows
Similar to tools, subworkflows cannot be used directly but need to be called from a higher-level Snakefile
that sets up the required variables.
## Design and best practices
Each subworkflow is described in a `Snakefile` located in a directory named after it.
All Snakefiles try to respect some rules and best practices in their design:
* Snakefile paths
* Set up different parameters for tool rules
### Snakefile paths
All `Snakefile` paths can be configured in the `config.yaml` file, but they default to relative paths within this repository.
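The lookup described above can be sketched in plain Python. This is only an illustration: the `config` dictionary stands in for the one Snakemake builds from `config.yaml`, the helper `snakefile_path` is a hypothetical name, and the `/opt/rules/...` path is made up for the example.

```python
# Stand-in for the dictionary Snakemake fills from config.yaml (assumed content).
config = {"snakefiles": {"graphlan": "/opt/rules/graphlan/Snakefile"}}


def snakefile_path(config, name, default):
    """Return the configured Snakefile path for `name`, or `default` if unset.

    Hypothetical helper; the Snakefiles in this repository inline the same
    expression instead of calling a function.
    """
    return config.get("snakefiles", {}).get(name, default)


# Overridden in the config: the configured path is returned.
graphlan_rules = snakefile_path(
    config, "graphlan", "../../../tools/graphlan/graphlan/Snakefile"
)

# Not listed in the config: the repository-relative default is used.
export2graphlan_rules = snakefile_path(
    config, "export2graphlan", "../../../tools/graphlan/export2graphlan/Snakefile"
)
```

Falling back to `{}` when the `snakefiles` section is absent means a workflow still runs with an empty or minimal `config.yaml`.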
### Set up different parameters for tool rules
Then, the input and output of every tool have to be specified (_refer to the README of the tools directory for more information_).
As for the tool descriptions, input and output for subworkflows are abstracted using the nomenclature `__SUBWORKFLOW_NAME_input` and `__SUBWORKFLOW_NAME_output`.
For intermediate tools, you can then easily link output and input:
```python
__tool1_output = __subworkflow_output + ".tool1.out"
__tool2_input = __tool1_output
```
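For longer chains, the same linking idea can be expressed as a loop. The sketch below is an assumption for illustration only: `link_tools` is a hypothetical helper, and the file names and the `.<tool>.out` suffix simply mirror the pattern used above rather than a convention enforced by the repository.

```python
def link_tools(first_input, output_prefix, tool_names):
    """Chain tools so that each one reads the output of the previous one.

    Hypothetical helper: returns a dict mapping each tool name to its
    input and output paths.
    """
    links = {}
    current_input = first_input
    for name in tool_names:
        output = f"{output_prefix}.{name}.out"
        links[name] = {"input": current_input, "output": output}
        # The next tool in the chain consumes this tool's output.
        current_input = output
    return links


links = link_tools("data/sample.txt", "results/sample", ["tool1", "tool2"])
for name, paths in links.items():
    print(name, paths["input"], "->", paths["output"])
```

Here `links["tool2"]["input"]` is the same path as `links["tool1"]["output"]`, which is exactly the relationship set up by hand with the `__tool1_output` and `__tool2_input` variables.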
\ No newline at end of file
...@@ -5,12 +5,18 @@ This subworkflow need to be called to be used by specifying:
- __graphlan_from_metaphlan2_input
- __graphlan_from_metaphlan2_output
- __graphlan_from_metaphlan2_output_dir
The config file can contain all options for every tool as well as:
snakefiles:
    export2graphlan: PATH_TO_SNAKEFILE
    graphlan_annotate: PATH_TO_SNAKEFILE
    graphlan: PATH_TO_SNAKEFILE
"""
-# ==== Rule paths ====
+# ==== Snakefile paths ====
-__export2graphlan_rules = config.get("rules", {}).get("export2graphlan", "../../../tools/graphlan/export2graphlan/Snakefile")
+__export2graphlan_rules = config.get("snakefiles", {}).get("export2graphlan", "../../../tools/graphlan/export2graphlan/Snakefile")
-__graphlan_annotate_rules = config.get("rules", {}).get("graphlan_annotate", "../../../tools/graphlan/graphlan_annotate/Snakefile")
+__graphlan_annotate_rules = config.get("snakefiles", {}).get("graphlan_annotate", "../../../tools/graphlan/graphlan_annotate/Snakefile")
-__graphlan_rules = config.get("rules", {}).get("graphlan", "../../../tools/graphlan/graphlan/Snakefile")
+__graphlan_rules = config.get("snakefiles", {}).get("graphlan", "../../../tools/graphlan/graphlan/Snakefile")
# ---- export2graphlan
__export2graphlan_input = __graphlan_from_metaphlan2_input
......
# Tools
-This directory contains all tool description that can be imported and used within Snakemake workflows
+This directory contains all tool descriptions that can be imported and used within Snakemake workflows.
Every rule for a tool should come with a usage example in a workflow as well as an example `config.yaml`
file (usually containing the default parameters that are set up in the `.rules` file itself).
## Design and best practices
-All Snakemake tool descriptions try to respect some rules and best practices in their design:
+All tools are described in a `Snakefile` in a directory having its name.
All Snakefiles try to respect some rules and best practices in their design:
* Reference to options from a `config.yaml` file
* Tool description
### Reference to options from `config.yaml` file
The first part corresponds to all options that are set up from a `config.yaml` file. They all follow the nomenclature `__TOOLNAME_variable`.
Then, in the `config.yaml` file, you can set the variable as follows:
```yaml
TOOLNAME:
    variable: 4
```
We recommend using the `get` method to access this parameter and set a default value:
```python
config['TOOLNAME'].get('variable', 1)
```
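Note that `config['TOOLNAME']` raises a `KeyError` when the `TOOLNAME` section is missing from `config.yaml`, and that a section left empty in YAML is loaded as `None`. A slightly more defensive variant could look like the sketch below; `get_option` is a hypothetical helper name and the `config` dictionary is a stand-in for the one Snakemake provides.

```python
# Stand-in for the dictionary Snakemake fills from config.yaml (assumed content).
config = {"TOOLNAME": {"variable": 4}}


def get_option(config, tool, option, default):
    """Return config[tool][option], or `default` when the section or key is absent.

    Hypothetical helper. `or {}` also covers a section that exists but is
    empty in the YAML file, which is loaded as None.
    """
    section = config.get(tool) or {}
    return section.get(option, default)


# Value set in the config file.
toolname_variable = get_option(config, "TOOLNAME", "variable", 1)

# Missing section: the default is returned instead of raising KeyError.
other_variable = get_option(config, "OTHERTOOL", "variable", 1)
```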
### Tool description
In order to ease the linking of tools in a workflow, every part of the tool is described as follows:
* The first part corresponds to all options that are set up from a `config.yaml` file. They all follow the nomenclature `__TOOLNAME_variable`.
* Then the tool itself is described with:
* input with the nomenclature: `__TOOLNAME_input`
* output with the nomenclature: `__TOOLNAME_output`
* params with different options that are described above
* There is usually a `exec_command` to give the possibility to change the way the tool is called (locally installed, singularity ...)
-* There is usually a `options` to specify all other command line options
+* There is usually a `options` to specify all other command line options. _You can still give a more detailed level of description for the options_
* the shell command
-input and output are then set up in the Snakemake file that refer to the rule.
+input and output are then set up in the workflow `Snakefile` that refer to the rule.
Therefore the rules cannot be used directly.
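To illustrate how `exec_command` and `options` can combine with the input and output into a shell command, here is a minimal Python sketch. Everything in it is an assumption for illustration: `mytool`, its `--input`/`--output` flags, and the `mytool.sif` image are hypothetical, and `build_command` is not part of this repository.

```python
def build_command(exec_command, options, input_path, output_path):
    """Assemble a shell command string from the pieces a tool rule exposes.

    Hypothetical helper; the argument layout is illustrative only and
    empty pieces (such as no extra options) are skipped.
    """
    parts = [exec_command, options, "--input", input_path, "--output", output_path]
    return " ".join(part for part in parts if part)


# Locally installed tool with extra command line options.
local = build_command("mytool", "--threads 4", "in.txt", "out.txt")

# Same tool called through a Singularity image, with no extra options.
containerized = build_command(
    "singularity exec mytool.sif mytool", "", "in.txt", "out.txt"
)
```

Only `exec_command` changes between the two calls, which is the point of keeping it as a separate parameter: the rule body stays the same whether the tool is installed locally or run in a container.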
## Example
......
# Snakemake workflows
-This directory contains every workflows built from Snakemake tool found in the `tools` directory of the repository.
+This directory contains every workflows built from Snakemake tool and subworkflows found in the `tools` and `subworkflows` directories of the repository.
## Design and best practices
All the Snakemake workflows try to respect some rules and best practices in their design.
Therefore, workflows have the following parts:
-1. reference to `config.yaml` file
-2. reference to tool rules
-3. set up of common variable for the workflow
-4. set up of specific variables for the tools
+1. Reference to options from a `config.yaml` file.
+2. Snakefile paths
+3. Set up of common variable for the workflow
+4. Set up of specific variables for the tools
5. `rule all` to specify what is/are the specific file(s) expected from the workflow.
-#### Reference to tool rules
-The location of the tools included in the workflow can vary depending your organization.
-Therefore, we decide to put every references at the begining of the workflow to make it easy
-to customize and change if necessary.
-#### Set up common variable for the workflow
+### Reference to options from `config.yaml` file
+First part correspond to all options that are set up from a `config.yaml` file. They all have the nomenclature `__TOOLNAME_variable`.
+Then in the YAML `config.yaml` file, you can set the variable as followed:
```yaml
TOOLNAME:
    variable: 4
```
We recommend using the `get` method to access this parameter and set a default value:
```python
config['TOOLNAME'].get('variable', 1)
```
### Snakefile paths
All `Snakefile` paths can be configured in the `config.yaml` file, but they default to relative paths within this repository.
### Set up common variable for the workflow
Then every common variables are set up. This can be handy for reference to a common directory.
-#### Set up specific variables for the tools
+### Set up specific variables for the tools
Every specific variables for every tools are then specified. For more details about the way
every tool is describe, you can refer to the `README` of the `tools` directory of the
...@@ -62,4 +77,6 @@ Then, to run a `SnakeFile` on Tars we use a command similar to the following:
```bash
sbatch --qos=atm -p atm -c 1 snakemake -p -j 6 --cluster-config cluster.yml --cluster "sbatch --qos={cluster.queue} -p {cluster.queue} -c {threads}"
```
\ No newline at end of file
For cluster configuration, you need to have a `cluster.yml` file in the running directory.
\ No newline at end of file
configfile: "config.yaml"
-# ==== Rule paths ====
+# ==== Snakefile paths ====
-__metaphlan2_rules = config.get("rules", {}).get("metaphlan2", "../../tools/metaphlan2/metaphlan2/Snakefile")
+__metaphlan2_rules = config.get("snakefiles", {}).get("metaphlan2", "../../tools/metaphlan2/metaphlan2/Snakefile")
-__metaphlan2_merge_rules = config.get("rules", {}).get("metaphlan2_merge", "../../tools/metaphlan2/metaphlan2_merge/Snakefile")
+__metaphlan2_merge_rules = config.get("snakefiles", {}).get("metaphlan2_merge", "../../tools/metaphlan2/metaphlan2_merge/Snakefile")
-__metaphlan2_heatmap_rules = config.get("rules", {}).get("metaphlan2_heatmap", "../../tools/metaphlan2/metaphlan2_heatmap/Snakefile")
+__metaphlan2_heatmap_rules = config.get("snakefiles", {}).get("metaphlan2_heatmap", "../../tools/metaphlan2/metaphlan2_heatmap/Snakefile")
-__graphlan_from_metaphlan2_rules = config.get("rules", {}).get("graphlan_from_metaphlan2", "../subworkflows/graphlan_from_metaphlan2/Snakefile")
+__graphlan_from_metaphlan2_rules = config.get("snakefiles", {}).get("graphlan_from_metaphlan2", "../subworkflows/graphlan_from_metaphlan2/Snakefile")
__input_dir = config['input_dir']
__main_output_dir = config.get('output_dir', 'output')
......
# Snakemake subworkflows
Similar to tools, subworkflows cannot be used directly but need to be called from a higher-level Snakefile
that sets up the required variables.