Update conditions on genome name

31756889 · Amandine PERRIN · 1ee65c56 · 31756889
Commit 31756889 authored 2 years ago by Amandine PERRIN
--- a/doc/source/usage.rst
+++ b/doc/source/usage.rst
@@ -633,18 +633,16 @@ protein files
 Each genome in your list_file corresponds to a protein file in ``dbdir``. This protein file is in multi-fasta format,
 and the headers must follow this format:
-``<genome-name_without_space_nor_dot>_<numeric_chars>``.
+``<genome-name>_<numeric_chars>``. The ``<genome-name>`` must satisfy one of the following conditions:
-For example ``my-genome-1_00056`` or ``my_genome_1_00056`` are valid protein headers.
-.. warning:: All proteins of a genome must have the same ``<genome-name_without_space_nor_dot>``. Otherwise, they won't be considered in the same genome, which will produce errors in your core or persistent genome!
+    - either follow the 'gembase_format', ``<name>.<date>.<strain_num>.<contig><place>_<num>`` (as described in :ref:`LSTINFO folder format <lstf>`, field "name of the sequence annotated"). If your protein files were generated by ``PanACoTA annotate``, they are already in this format!
+    - or be a string without spaces or dots.
-Ideally, you should follow the 'gembase_format', ``<name>.<date>.<strain_num>.<contig><place>_<num>``
+For example ``my-genome-1_00056``, ``ESCO.0321.00001.001i_12345`` or ``my_genome_1_00056`` are valid protein headers. ``mygenome-v1.1.1_12345`` and ``mygenome v1 _12345`` are not.
-(as it is described in :ref:`LSTINFO folder format <lstf>`, field "name of the sequence annotated"),
-where the genome name, shared by all proteins of the genome.
-If your protein files were generated by ``PanACoTA annotate``, they are already in this format!
+.. warning:: All proteins of a genome must have the same ``<genome-name>``. Otherwise, they will not be considered part of the same genome, which will produce errors in your core or persistent genome!
-Those fields will be used to sort genes inside pangenome families. They are sorted by species ``<genome-name_without_space_nor_dot>``
+This information will be used to sort genes inside pangenome families. If you follow the gembase format, they are sorted by species
 (if you do a pangenome containing different species),
 strain number ``<strain_num>`` (inside a same species), and protein number ``<num>`` (inside a same strain). If you do not use gembase format,
 families will only be sorted by protein number (the ``<numeric_chars>`` part).
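
The header rules and sort order described in this hunk can be illustrated with a minimal Python sketch. The regular expressions, the function names ``classify_header`` and ``sort_key``, and the assumptions that ``<strain_num>``, ``<contig>`` and ``<num>`` are digits and that ``<place>`` is a single letter are all hypothetical: they are inferred from the examples in the text above, not taken from the PanACoTA source code.

```python
import re

# Hypothetical patterns derived from the documentation text, not from PanACoTA itself.
# gembase header: <name>.<date>.<strain_num>.<contig><place>_<num>
# Assumption: strain_num, contig and num are digits; place is a single letter.
GEMBASE_RE = re.compile(
    r"^(?P<name>[^\s.]+)\.(?P<date>[^\s.]+)\.(?P<strain_num>\d+)"
    r"\.(?P<contig>\d+)(?P<place>[A-Za-z])_(?P<num>\d+)$"
)

# Generic header: <genome-name>_<numeric_chars>, genome name without space nor dot.
GENERIC_RE = re.compile(r"^(?P<genome>[^\s.]+)_(?P<num>\d+)$")


def classify_header(header):
    """Return 'gembase', 'generic', or None if the header matches neither rule."""
    if GEMBASE_RE.match(header):
        return "gembase"
    if GENERIC_RE.match(header):
        return "generic"
    return None


def sort_key(header):
    """Build a sort key mirroring the order described in the documentation.

    gembase headers: (species name, strain number, protein number).
    generic headers: protein number only; the first two slots are filled
    with neutral values so that keys stay comparable (an illustrative choice).
    """
    match = GEMBASE_RE.match(header)
    if match:
        return (match["name"], int(match["strain_num"]), int(match["num"]))
    match = GENERIC_RE.match(header)
    if match:
        return ("", 0, int(match["num"]))
    raise ValueError(f"invalid protein header: {header!r}")
```

With these definitions, ``classify_header`` accepts the three valid examples from the text and rejects ``mygenome-v1.1.1_12345`` and ``mygenome v1 _12345``, and ``sorted(headers, key=sort_key)`` orders gembase headers by species, then strain number, then protein number.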