Commit 136e1c33 authored by Amandine PERRIN

gene name starting with genome name

parent 072bbd29
Pipeline #65317 passed with stages
in 6 minutes and 30 seconds
@@ -326,15 +326,10 @@ def get_genome(header, all_genomes):
     header = header.split(">")[1].split()[0]
     for genome in all_genomes:
-        if genome in header:
-            # header should be genome<something>_num
-            # -> header.split(genome) should be empty for the first field
-            # If not empty, means that genome name is included into another genome name, so
-            # we must not return this genome.
-            # For example, genome "8-KG" is in header "98-KG_xxx", but the correct genome for this
-            # header is "98-KG"
-            if not header.split(genome)[0]:
-                return genome
+        if header.startswith(genome):
+            # header should start with the genome name. Nothing before it.
+            # Ex: >86KG_12345 is from genome 86KG. >6KG_12345 is from genome 6KG, not 86KG
+            return genome
     logger.error((f"Protein {header} does not correspond to any genome name "
                   f"given... {all_genomes}"))
     return None
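The behaviour this hunk aims for can be tried out with a small standalone sketch. It is not the project's actual module: the logger setup, the docstrings and the `get_genome_old` helper (a copy of the removed logic, kept only for comparison) are assumptions added for illustration, and the sample headers are hypothetical values modelled on the examples in the comments.

```python
import logging

# Assumption: the real module defines its own logger; this one is a stand-in.
logger = logging.getLogger(__name__)


def get_genome(header, all_genomes):
    """Return the genome whose name starts the protein header, or None."""
    # Keep only the identifier: drop the leading ">" and any description.
    header = header.split(">")[1].split()[0]
    for genome in all_genomes:
        if header.startswith(genome):
            return genome
    logger.error(f"Protein {header} does not correspond to any genome name "
                 f"given... {all_genomes}")
    return None


def get_genome_old(header, all_genomes):
    """Removed logic, reproduced here only to compare results."""
    header = header.split(">")[1].split()[0]
    for genome in all_genomes:
        # Substring match, then require that nothing precedes the genome name.
        if genome in header and not header.split(genome)[0]:
            return genome
    return None


if __name__ == "__main__":
    # Hypothetical headers based on the examples in the code comments.
    cases = [
        (">86KG_12345 some description", ["6KG", "86KG"]),
        (">6KG_12345", ["86KG", "6KG"]),
        (">98-KG_0001", ["8-KG", "98-KG"]),
    ]
    for header, genomes in cases:
        print(header, get_genome(header, genomes), get_genome_old(header, genomes))
```

On these inputs both versions return the same genome, which suggests the change mainly simplifies the check. Note that `startswith` still returns the first listed genome whose name is a prefix of the header, so a genome name that is itself a prefix of another genome name would match depending on the order of `all_genomes`.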