Commit 31756889 authored by Amandine PERRIN

Update conditions on genome name

parent 1ee65c56
Pipeline #96742 passed
@@ -633,18 +633,16 @@ protein files
 Each genome in your list_file corresponds to a protein file in ``dbdir``. This protein file is in multi-fasta format,
 and the headers must follow this format:
-``<genome-name_without_space_nor_dot>_<numeric_chars>``.
-For example ``my-genome-1_00056`` or ``my_genome_1_00056`` are valid protein headers.
-.. warning:: All proteins of a genome must have the same ``<genome-name_without_space_nor_dot>``. Otherwise, they won't be considered in the same genome, which will produce errors in your core or persistent genome!
-Ideally, you should follow the 'gembase_format', ``<name>.<date>.<strain_num>.<contig><place>_<num>``
-(as it is described in :ref:`LSTINFO folder format <lstf>`, field "name of the sequence annotated"),
-where the genome name, shared by all proteins of the genome.
-If your protein files were generated by ``PanACoTA annotate``, they are already in this format!
-Those fields will be used to sort genes inside pangenome families. They are sorted by species ``<genome-name_without_space_nor_dot>``
+``<genome-name>_<numeric_chars>``. The ``<genome_name>`` must fulfil the following conditions:
+- either follow the 'gembase_format', ``<name>.<date>.<strain_num>.<contig><place>_<num>`` (as it is described in :ref:`LSTINFO folder format <lstf>`, field "name of the sequence annotated"). If your protein files were generated by ``PanACoTA annotate``, they are already in this format!
+- either being a string ``without space nor dot``.
+For example ``my-genome-1_00056``, ``ESCO.0321.00001.001i_12345`` or ``my_genome_1_00056`` are valid protein headers. ``mygenome-v1.1.1_12345`` and ``mygenome v1 _12345`` are not.
+.. warning:: All proteins of a genome must have the same ``genome_name``. Otherwise, they won't be considered in the same genome, which will produce errors in your core or persistent genome!
+This information will be used to sort genes inside pangenome families. If your follow the gembase format, they are sorted by species
 (if you do a pangenome containing different species),
 strain number ``<strain_num>`` (inside a same species), and protein number ``<num>`` (inside a same strain). If you do not use gembase format,
 families will only be sorted by protein number (the ``<numeric_chars>`` part).
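
The header rules introduced by this commit can be sketched as a small check. This is an illustration only, not PanACoTA's own code: the function names (``split_header``, ``is_valid_header``, ``sort_key``) are hypothetical, and the gembase pattern (digits for ``<date>``, ``<strain_num>`` and ``<contig>``, one letter for ``<place>``) is an assumption inferred from the example ``ESCO.0321.00001.001i_12345``.

```python
import re

# ASSUMPTION: shape of the gembase genome part, inferred from the example
# "ESCO.0321.00001.001i": <name>.<date>.<strain_num>.<contig><place>
GEMBASE_RE = re.compile(
    r"(?P<name>[^\s.]+)\.(?P<date>\d+)\.(?P<strain>\d+)\.(?P<contig>\d+)(?P<place>[A-Za-z])"
)
# Free-form genome name: any non-empty string without whitespace or dot.
PLAIN_RE = re.compile(r"[^\s.]+")


def split_header(header):
    """Split a header at its last underscore into (genome_name, num).

    Returns None when there is no underscore, the genome part is empty,
    or the trailing part is not purely numeric.
    """
    genome, sep, num = header.rpartition("_")
    if not sep or not genome or not num.isdigit():
        return None
    return genome, num


def is_valid_header(header):
    """True if the genome part is gembase-like or has no space and no dot."""
    parts = split_header(header)
    if parts is None:
        return False
    genome, _ = parts
    return bool(GEMBASE_RE.fullmatch(genome) or PLAIN_RE.fullmatch(genome))


def sort_key(header):
    """Hypothetical sort key: (species, strain number, protein number).

    For headers that are not in gembase format, only the protein number
    carries information, so species and strain are set to neutral values.
    """
    if not is_valid_header(header):
        raise ValueError(f"invalid protein header: {header!r}")
    genome, num = split_header(header)
    match = GEMBASE_RE.fullmatch(genome)
    if match:
        return (match["name"], int(match["strain"]), int(num))
    return ("", 0, int(num))
```

Under these assumptions, the three valid examples from the text pass ``is_valid_header``, while ``mygenome-v1.1.1_12345`` (dots, but not four gembase fields) and ``mygenome v1 _12345`` (spaces) are rejected.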