diff --git a/README.md b/README.md
index 77a39e256c4bac80b6f3d2725e70264a2e3f232e..4e8ae5f81af66199e55d92e74e18d194b9d62dc0 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,9 @@
-# Snakemake rules
+# Snakemake
 
-Repository for snakemake rules concerning tools and workflows
+This repository aims to gather all Snakemake descriptions. They all try to follow some rules in order to make them easily reusable from one workflow to another.
+
+Thus, the repository is split into three sections:
+
+* [__tools__](tools/): All tool descriptions. These are not meant to be used directly and need to be referenced from another Snakefile
+* [__subworkflows__](subworkflows/): Subworkflows that behave like tool descriptions, meaning that they need to be referenced from another Snakefile to be used.
+* [__workflows__](workflows/): All workflows that are supposed to be usable directly
\ No newline at end of file
diff --git a/subworkflows/README.md b/subworkflows/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..ccb75bb5dbccf8225e26b73ce87cb0f9156dade7
--- /dev/null
+++ b/subworkflows/README.md
@@ -0,0 +1,29 @@
+# Subworkflows
+
+Similar to tools, subworkflows cannot be used directly but need to be called from a higher-level Snakefile
+by setting up the required variables.
+
+## Design and best practices
+
+Each subworkflow is described in a `Snakefile` in a directory named after the subworkflow.
+All Snakefiles try to respect some rules and best practices in their design:
+
+* Snakefile paths
+* Set up different parameters for tool rules
+
+### Snakefile paths
+
+All `Snakefile` paths can be configured in the `config.yaml` file, but they default to relative paths within this repository.
+
+### Set up different parameters for tool rules
+
+Every tool then needs its input and output to be specified (_see the README of the `tools` directory for more information_).
+
+As for the tool descriptions, input and output for subworkflows are abstracted using the nomenclature `__SUBWORKFLOW_NAME_input` and `__SUBWORKFLOW_NAME_output`.
+
+For intermediate tools, you can then easily link output and input:
+
+```python
+__tool1_output = __subworkflow_output + ".tool1.out"
+__tool2_input = __tool1_output
+```
\ No newline at end of file
diff --git a/workflows/subworkflows/graphlan_from_metaphlan2/Snakefile b/subworkflows/graphlan_from_metaphlan2/Snakefile
similarity index 60%
rename from workflows/subworkflows/graphlan_from_metaphlan2/Snakefile
rename to subworkflows/graphlan_from_metaphlan2/Snakefile
index ed2ab0401d21095c69a5990e2374470749a30bac..c45ffedbd9abfa5f5561bd573647844101ec6051 100644
--- a/workflows/subworkflows/graphlan_from_metaphlan2/Snakefile
+++ b/subworkflows/graphlan_from_metaphlan2/Snakefile
@@ -5,12 +5,18 @@ This subworkflow need to be called to be used by specifying:
 - __graphlan_from_metaphlan2_input
 - __graphlan_from_metaphlan2_output
 - __graphlan_from_metaphlan2_output_dir
+
+The config file can contain all options for every tool as well as:
+snakefiles:
+    export2graphlan: PATH_TO_SNAKEFILE
+    graphlan_annotate: PATH_TO_SNAKEFILE
+    graphlan: PATH_TO_SNAKEFILE
 """
 
-# ==== Rule paths ====
-__export2graphlan_rules = config.get("rules", {}).get("export2graphlan", "../../../tools/graphlan/export2graphlan/Snakefile")
-__graphlan_annotate_rules = config.get("rules", {}).get("graphlan_annotate", "../../../tools/graphlan/graphlan_annotate/Snakefile")
-__graphlan_rules = config.get("rules", {}).get("graphlan", "../../../tools/graphlan/graphlan/Snakefile")
+# ==== Snakefile paths ====
+__export2graphlan_rules = config.get("snakefiles", {}).get("export2graphlan", "../../../tools/graphlan/export2graphlan/Snakefile")
+__graphlan_annotate_rules = config.get("snakefiles", {}).get("graphlan_annotate", "../../../tools/graphlan/graphlan_annotate/Snakefile")
+__graphlan_rules = config.get("snakefiles", {}).get("graphlan", "../../../tools/graphlan/graphlan/Snakefile")
 
 # ---- export2graphlan
 __export2graphlan_input = __graphlan_from_metaphlan2_input
diff --git a/tools/README.md b/tools/README.md
index 31c15148dfd81ce91b454ec7f1025147505fd2be..c1df281bdf8e9bc9159fca5c6384a1fd6195efbd 100644
--- a/tools/README.md
+++ b/tools/README.md
@@ -1,24 +1,44 @@
 # Tools
 
-This directory contains all tool description that can be imported and used within Snakemake workflows
-
-Every rules for a tool should come with an example of use in a workflow as well as a `config.yaml`
-example file (usually containing default parameters that are set up on the `.rules` file itself).
+This directory contains all tool descriptions that can be imported and used within Snakemake workflows.
 
 ## Design and best practices
 
-All Snakemake tool descriptions try to respect some rules and best practices in their design:
+Each tool is described in a `Snakefile` in a directory named after the tool.
+All Snakefiles try to respect some rules and best practices in their design:
+
+* Reference to options from a `config.yaml` file
+* Tool description
+
+### Reference to options from `config.yaml` file
+
+The first part corresponds to all options that are set from a `config.yaml` file. They all follow the nomenclature `__TOOLNAME_variable`.
+
+Then in the YAML `config.yaml` file, you can set the variable as follows:
+
+```yaml
+TOOLNAME:
+    variable: 4
+```
+
+We recommend using the `get` method to access this parameter and set a default value:
+
+```python
+config['TOOLNAME'].get('variable', 1)
+```
+
+### Tool description
+
+In order to ease the linking of tools in a workflow, every part of the tool is described as follows:
 
-* First part correspond to all options that are set up from a `config.yaml` file. They all have the nomenclature `__TOOLNAME_variable`.
-* Then the tool itself is described with:
     * input with the nomenclature: `__TOOLNAME_input`
     * output with the nomenclature: ` __TOOLNAME_output`
     * params with different options that are described above
         * There is usually a `exec_command` to give the possibility to change the way the tool is called (locally installed, singularity ...)
-        * There is usually a `options` to specify all other command line options
+        * There is usually an `options` parameter to specify all other command-line options. _You can still describe the options in more detail._
     * the shell command
 
-input and output are then set up in the Snakemake file that refer to the rule.
+Input and output are then set in the workflow `Snakefile` that refers to the rule.
 Therefore the rules cannot be used directly.
 
 ## Example
diff --git a/workflows/README.md b/workflows/README.md
index 81c31dcdd085bf6214a1323256c5f531e8e0539b..e51a50f8c142c65b283372d31037c807dd204765 100644
--- a/workflows/README.md
+++ b/workflows/README.md
@@ -1,29 +1,44 @@
 # Snakemake workflows
 
-This directory contains every workflows built from Snakemake tool found in the `tools` directory of the repository.
+This directory contains all workflows built from the Snakemake tools and subworkflows found in the `tools` and `subworkflows` directories of the repository.
 
 ## Design and best practices
 
 All the Snakemake workflows try to respect some rules and best practices in their design.
 Therefore, workflows have the following parts:
 
-1. reference to `config.yaml` file
-2. reference to tool rules
-3. set up of common variable for the workflow
-4. set up of specific variables for the tools
+1. Reference to options from a `config.yaml` file
+2. Snakefile paths
+3. Set up of common variables for the workflow
+4. Set up of specific variables for the tools
 5. `rule all` to specify what is/are the specific file(s) expected from the workflow.
 
-#### Reference to tool rules
+### Reference to options from `config.yaml` file
 
-The location of the tools included in the workflow can vary depending your organization.
-Therefore, we decide to put every references at the begining of the workflow to make it easy
-to customize and change if necessary.
+The first part corresponds to all options that are set from a `config.yaml` file. They all follow the nomenclature `__TOOLNAME_variable`.
 
-#### Set up common variable for the workflow
+Then in the YAML `config.yaml` file, you can set the variable as follows:
+
+```yaml
+TOOLNAME:
+    variable: 4
+```
+
+We recommend using the `get` method to access this parameter and set a default value:
+
+```python
+config['TOOLNAME'].get('variable', 1)
+```
+
+### Snakefile paths
+
+All `Snakefile` paths can be configured in the `config.yaml` file, but they default to relative paths within this repository.
+
+### Set up common variables for the workflow
 
 Then every common variables are set up. This can be handy for reference to a common directory.
 
-#### Set up specific variables for the tools
+### Set up specific variables for the tools
 
 Every specific variables for every tools are then specified. For more details about the way
 every tool is describe, you can refer to the `README` of the `tools` directory of the
@@ -62,4 +77,6 @@ Then, to run a `SnakeFile` on Tars we use a command similar to the following:
 
 ```bash
 sbatch --qos=atm -p atm -c 1 snakemake -p -j 6 --cluster-config cluster.yml --cluster "sbatch --qos={cluster.queue} -p {cluster.queue} -c {threads}"
-```
\ No newline at end of file
+```
+
+For cluster configuration, you need a `cluster.yml` file in the directory from which the workflow is run.
\ No newline at end of file
diff --git a/workflows/simple_metaphlan2/Snakefile b/workflows/simple_metaphlan2/Snakefile
index a53b03787d541fbda5f06a0846deaa0e0c07af79..58d0e3afaaca3ba04e27f8b6b99ec77e8a4fc1bc 100644
--- a/workflows/simple_metaphlan2/Snakefile
+++ b/workflows/simple_metaphlan2/Snakefile
@@ -1,10 +1,10 @@
 configfile: "config.yaml"
 
-# ==== Rule paths ====
-__metaphlan2_rules = config.get("rules", {}).get("metaphlan2", "../../tools/metaphlan2/metaphlan2/Snakefile")
-__metaphlan2_merge_rules = config.get("rules", {}).get("metaphlan2_merge", "../../tools/metaphlan2/metaphlan2_merge/Snakefile")
-__metaphlan2_heatmap_rules = config.get("rules", {}).get("metaphlan2_heatmap", "../../tools/metaphlan2/metaphlan2_heatmap/Snakefile")
-__graphlan_from_metaphlan2_rules = config.get("rules", {}).get("graphlan_from_metaphlan2", "../subworkflows/graphlan_from_metaphlan2/Snakefile")
+# ==== Snakefile paths ====
+__metaphlan2_rules = config.get("snakefiles", {}).get("metaphlan2", "../../tools/metaphlan2/metaphlan2/Snakefile")
+__metaphlan2_merge_rules = config.get("snakefiles", {}).get("metaphlan2_merge", "../../tools/metaphlan2/metaphlan2_merge/Snakefile")
+__metaphlan2_heatmap_rules = config.get("snakefiles", {}).get("metaphlan2_heatmap", "../../tools/metaphlan2/metaphlan2_heatmap/Snakefile")
+__graphlan_from_metaphlan2_rules = config.get("snakefiles", {}).get("graphlan_from_metaphlan2", "../subworkflows/graphlan_from_metaphlan2/Snakefile")
 
 __input_dir = config['input_dir']
 __main_output_dir = config.get('output_dir', 'output')
diff --git a/workflows/subworkflows/README.md b/workflows/subworkflows/README.md
deleted file mode 100644
index b960b6cdf88f60e0ebdc232f77e860d956ca6fc7..0000000000000000000000000000000000000000
--- a/workflows/subworkflows/README.md
+++ /dev/null
@@ -1,4 +0,0 @@
-# Snakemake subworkflows
-
-Similar to tools, subworkflows cannot be used directly but need to be called in a higher Snakefile
-by setting up the required variables.
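
Note on the lookup pattern used throughout this change: the READMEs recommend `config['TOOLNAME'].get('variable', 1)`, while the Snakefiles read their paths with `config.get("snakefiles", {}).get(...)`. The second form also tolerates a missing top-level section. Below is a minimal sketch of both lookups in plain Python, assuming `config` is an ordinary dictionary as loaded from `config.yaml`. The helper names `lookup_option` and `snakefile_path` are hypothetical and do not exist in this repository or in Snakemake.

```python
# Sketch only: `lookup_option` and `snakefile_path` are hypothetical helper
# names, not part of this repository or of Snakemake.


def lookup_option(config, tool, variable, default):
    """Return config[tool][variable], or `default` if either key is missing."""
    return config.get(tool, {}).get(variable, default)


def snakefile_path(config, name, default):
    """Return the path configured under `snakefiles`, or the default path."""
    return config.get("snakefiles", {}).get(name, default)


# Stand-in for the dictionary that would be loaded from `config.yaml`.
config = {
    "TOOLNAME": {"variable": 4},
    "snakefiles": {"graphlan": "/custom/graphlan/Snakefile"},
}

# Option present in the config: the configured value is used.
variable = lookup_option(config, "TOOLNAME", "variable", 1)
# Whole tool section missing: the default is used instead of raising KeyError.
other = lookup_option(config, "OTHERTOOL", "variable", 1)
# Path overridden in the config.
graphlan = snakefile_path(config, "graphlan", "../../tools/graphlan/graphlan/Snakefile")
# Path not overridden: the repository-relative default is used.
metaphlan2 = snakefile_path(config, "metaphlan2", "../../tools/metaphlan2/metaphlan2/Snakefile")

print(variable, other, graphlan, metaphlan2)
```

With this shape, a `config.yaml` that omits a tool section or the `snakefiles` section still resolves to the defaults, which matches the behaviour the READMEs describe for Snakefile paths.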