Metagenomics / snakemake
Commit 8c7edff7
Authored Feb 11, 2020 by Kenzo-Hugo Hillion
finish Snakefile for splitting Fasta
Parent: 2c621522
Changes: 3 files
tools/utils/split_fasta/Snakefile
__split_fasta_number_sequences = config.get('split_fasta', {}).get('number_sequences', 1000000)
__split_fasta_prefix = config.get('split_fasta', {}).get('prefix', 'seq_chunk_')
EXPECTED_EXT = [f"{i:05d}" for i in range(0, int(9898412/__split_fasta_number_sequences) + 1)]
rule split_fasta:
    """
    Split a FASTA file with the desired number of sequences per chunk
...
...
@@ -17,4 +12,5 @@ rule split_fasta:
    shell:
        """
        cat {input} | awk '/^>/ {{if(N>0) printf("\\n"); printf("%s\\n",$0);++N;next;}} {{ printf("%s",$0);}} END {{printf("\\n");}}' | split -l {params.n_lines} -a 5 -d - {params.prefix}
        for i in `ls {params.prefix}*`; do mv $i ${{i}}.fa;done
        """
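For readers less familiar with the awk and split idiom, the shell block above can be sketched in Python. This is a minimal sketch, not the code in the commit: `read_records` and `split_fasta_py` are hypothetical names, and the sketch takes the number of sequences per chunk directly, whereas the rule passes a line count through `params.n_lines` (the `params` block is elided in the diff, so its exact definition is not shown here). Text before the first header is dropped in this sketch.

```python
def read_records(fasta_path):
    """Yield (header, sequence) pairs, joining wrapped sequence lines
    into one line per record, as the awk step does."""
    header, parts = None, []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line, []
            elif header is not None:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_fasta_py(fasta_path, sequences_per_chunk, prefix):
    """Write chunks named <prefix>00000.fa, <prefix>00001.fa, ...
    (5-digit numeric suffix, like `split -a 5 -d` followed by the rename loop)."""
    chunk_paths = []
    out = None
    for index, (header, sequence) in enumerate(read_records(fasta_path)):
        if index % sequences_per_chunk == 0:
            # Start a new chunk file every `sequences_per_chunk` records.
            if out is not None:
                out.close()
            chunk_paths.append(f"{prefix}{len(chunk_paths):05d}.fa")
            out = open(chunk_paths[-1], "w")
        out.write(f"{header}\n{sequence}\n")
    if out is not None:
        out.close()
    return chunk_paths
```

Each record occupies exactly two lines after linearization, which is why splitting on a fixed line count in the shell version keeps headers and sequences together.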
tools/utils/split_fasta/example_usage/Snakefile
0 → 100644
configfile: "config.yaml"
def count_sequences(fasta_file):
    with open(fasta_file, 'r') as file:
        seq = 0
        for line in file:
            if '>' in line:
                seq += 1
    return seq
# ==== Snakefile path ====
__split_fasta_rules = config.get("snakefiles", {}).get("split_fasta")
__main_output_dir = config.get('output_dir', 'output')
# ==== Split FASTA ====
__split_fasta_output_dir = __main_output_dir + "/split_fasta"
__split_fasta_input = config['input_fasta']
__split_fasta_number_sequences = config.get('split_fasta', {}).get('number_sequences', 1000000)
total_number_sequences = count_sequences(__split_fasta_input)
EXTENSIONS = [f"{i:05d}" for i in range(0, int(total_number_sequences/__split_fasta_number_sequences) + 1)]
__split_fasta_prefix = "/".join([__split_fasta_output_dir, config['split_fasta']['prefix']])
__split_fasta_output = expand(__split_fasta_prefix + "{ext}.fa", ext=EXTENSIONS)
include: __split_fasta_rules
rule all:
    input: __split_fasta_output
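One thing to watch in the `EXTENSIONS` expression above: `int(total / n) + 1` yields one suffix too many whenever the total number of sequences is an exact multiple of the chunk size. With 2,000,000 sequences at 1,000,000 per chunk it produces three suffixes, while `split` would write only two files, so `rule all` would request an output that is never created. A minimal sketch of one alternative using ceiling division follows; `expected_extensions` is a hypothetical helper, not part of the commit. The counting function here also matches only lines that start with `>`, in line with the `/^>/` pattern of the awk step.

```python
def count_sequences(fasta_file):
    """Count FASTA records by counting lines that start with '>'."""
    with open(fasta_file) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def expected_extensions(total_sequences, sequences_per_chunk):
    """Return the 5-digit suffixes of the chunks that splitting should produce."""
    # Ceiling division: a partial last chunk still counts as one chunk,
    # but an exact multiple does not add an empty extra chunk.
    n_chunks = -(-total_sequences // sequences_per_chunk)
    return [f"{i:05d}" for i in range(n_chunks)]
```

For the 9,898,412 sequences hard-coded in the earlier `EXPECTED_EXT` line, both expressions give the same ten suffixes; they differ only at exact multiples and for an empty input.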
tools/utils/split_fasta/example_usage/config.yaml
0 → 100644
snakefiles:
  split_fasta: /pasteur/projets/policy01/Atm/snakemake/tools/utils/split_fasta/Snakefile
input_fasta: /pasteur/projets/policy01/DBs/IGC/2014-9.9M/IGC.fa
output_dir: /pasteur/projets/policy01/sandbox/20200210_test_snakemake/output
split_fasta:
  prefix: IGC_
  number_sequences: 1000000
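The example Snakefile reads this configuration in two ways: `number_sequences` through chained `.get()` calls with a default, and `prefix` through direct indexing (`config['split_fasta']['prefix']`), which raises `KeyError` if the `split_fasta` section or the `prefix` key is missing. A minimal sketch of reading both with defaults, using a plain dictionary in place of Snakemake's `config` object; `read_split_fasta_settings` is a hypothetical helper, and the default values are the ones that appear in the rule's Snakefile in this diff.

```python
def read_split_fasta_settings(config):
    """Return split_fasta settings, falling back to defaults for missing keys."""
    section = config.get("split_fasta", {})
    return {
        "number_sequences": section.get("number_sequences", 1000000),
        "prefix": section.get("prefix", "seq_chunk_"),
    }


# Values from the config.yaml above override the defaults.
settings = read_split_fasta_settings(
    {"split_fasta": {"prefix": "IGC_", "number_sequences": 1000000}}
)
```

Whether a missing `prefix` should fail loudly or fall back silently is a design choice; the sketch only shows the fallback variant.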