Commit 2c621522 authored by Kenzo-Hugo Hillion

start Snakefile to split fasta

parent e3b2c07b
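# Chunk size (sequences per output file) and output file prefix, both overridable via config
__split_fasta_number_sequences = config.get('split_fasta', {}).get('number_sequences', 1000000)
__split_fasta_prefix = config.get('split_fasta', {}).get('prefix', 'seq_chunk_')
# Zero-padded suffixes as produced by `split -a 5 -d`. Ceiling division avoids listing an
# extra chunk when the hard-coded total (9898412) is an exact multiple of the chunk size.
EXPECTED_EXT = [f"{i:05d}" for i in range(0, int(-(-9898412 // __split_fasta_number_sequences)))]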
rule split_fasta:
    """
    Split a FASTA file with the desired number of sequences per chunk
    """
    input:
        __split_fasta_input
    output:
        __split_fasta_output
    params:
        n_lines = __split_fasta_number_sequences * 2,
        prefix = __split_fasta_prefix
    shell:
        """
        cat {input} | awk '/^>/ {{if(N>0) printf("\\n"); printf("%s\\n",$0);++N;next;}} {{ printf("%s",$0);}} END {{printf("\\n");}}' | split -l {params.n_lines} -a 5 -d - {params.prefix}
        """
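
The shell command in the rule does two things: awk joins wrapped sequence lines so that every record takes exactly two lines (header, then sequence), and split then cuts the stream every `n_lines = number_sequences * 2` lines into files named with a 5-digit numeric suffix. As a rough illustration only, the same logic could be sketched in plain Python as below. The function names `linearize_fasta` and `split_fasta` and the `out_dir` parameter are hypothetical and are not part of the commit; the sketch also skips any text before the first header, which the awk command would pass through.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def linearize_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines (the awk step)."""
    header = None
    parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line, []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_fasta(fasta_path, out_dir, number_sequences, prefix="seq_chunk_") -> List[Path]:
    """Write chunks of `number_sequences` records, named like `split -a 5 -d` output."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with open(fasta_path) as handle:
        records = linearize_fasta(handle)
        index = 0
        while True:
            # Take the next chunk of records; an empty chunk means the input is exhausted.
            chunk = list(islice(records, number_sequences))
            if not chunk:
                break
            chunk_path = out_dir / f"{prefix}{index:05d}"
            with open(chunk_path, "w") as out:
                for header, sequence in chunk:
                    out.write(f"{header}\n{sequence}\n")
            written.append(chunk_path)
            index += 1
    return written
```

Calling `split_fasta("input.fasta", "chunks", 1000000)` on a hypothetical `input.fasta` would write `chunks/seq_chunk_00000`, `chunks/seq_chunk_00001`, and so on. Unlike the streaming shell pipeline, this sketch holds one chunk of records in memory at a time, so the shell version is likely the better fit for very large inputs.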