diff --git a/doc/source/usage.rst b/doc/source/usage.rst
index 36576d15c54660456f94cf599263c6c5c776c63b..91e3940198269d40a9512bc039005220d53ae5bb 100755
--- a/doc/source/usage.rst
+++ b/doc/source/usage.rst
@@ -633,18 +633,16 @@ protein files
 Each genome in your list_file corresponds to a protein file in ``dbdir``.
 This protein file is in multi-fasta format,
 and the headers must follow this format:
-``<genome-name_without_space_nor_dot>_<numeric_chars>``.
-For example ``my-genome-1_00056`` or ``my_genome_1_00056`` are valid protein headers.
+``<genome-name>_<numeric_chars>``. The ``<genome-name>`` must fulfil one of the following conditions:

-.. warning:: All proteins of a genome must have the same ``<genome-name_without_space_nor_dot>``. Otherwise, they won't be considered in the same genome, which will produce errors in your core or persistent genome!
+ - either follow the 'gembase_format', ``<name>.<date>.<strain_num>.<contig><place>_<num>`` (as described in :ref:`LSTINFO folder format <lstf>`, field "name of the sequence annotated"). If your protein files were generated by ``PanACoTA annotate``, they are already in this format!
+ - or be a string without spaces or dots.

-Ideally, you should follow the 'gembase_format', ``<name>.<date>.<strain_num>.<contig><place>_<num>``
-(as it is described in :ref:`LSTINFO folder format <lstf>`, field "name of the sequence annotated"),
-where the genome name, shared by all proteins of the genome.
+For example, ``my-genome-1_00056``, ``ESCO.0321.00001.001i_12345`` and ``my_genome_1_00056`` are valid protein headers; ``mygenome-v1.1.1_12345`` and ``mygenome v1 _12345`` are not.

-If your protein files were generated by ``PanACoTA annotate``, they are already in this format!
+.. warning:: All proteins of a genome must have the same ``<genome-name>``. Otherwise, they won't be considered in the same genome, which will produce errors in your core or persistent genome!

-Those fields will be used to sort genes inside pangenome families. They are sorted by species ``<genome-name_without_space_nor_dot>``
+This information will be used to sort genes inside pangenome families. If you follow the gembase format, they are sorted by species
 (if you do a pangenome containing different species), strain number ``<strain_num>`` (inside a same species),
 and protein number ``<num>`` (inside a same strain).
 If you do not use gembase format, families will only be sorted by protein number (the ``<numeric_chars>`` part).
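
The header rules and sort order described in this hunk can be illustrated with a small, standalone check. This is only a sketch and not PanACoTA's own code: the names ``GEMBASE``, ``GENERIC``, ``classify`` and ``sort_key`` are hypothetical, and the gembase pattern is inferred from the single example ``ESCO.0321.00001.001i_12345``, so field widths are deliberately not enforced.

```python
import re

# Assumed shape of a gembase header, inferred from the documented example
# ESCO.0321.00001.001i_12345 (field widths are not enforced here).
GEMBASE = re.compile(
    r"(?P<name>[^\s.]+)\.(?P<date>\d+)\.(?P<strain>\d+)"
    r"\.(?P<contig>\d+)(?P<place>[A-Za-z])_(?P<num>\d+)"
)

# Generic header: a genome name with no whitespace and no dot, then _<digits>.
GENERIC = re.compile(r"(?P<genome>[^\s.]+)_(?P<num>\d+)")


def classify(header):
    """Return 'gembase', 'generic', or None if the header is not accepted."""
    if GEMBASE.fullmatch(header):
        return "gembase"
    if GENERIC.fullmatch(header):
        return "generic"
    return None


def sort_key(header):
    """Illustrative sort key: (species, strain number, protein number).

    Headers that are not in gembase format only carry a protein number,
    so the first two fields are filled with neutral placeholders.
    """
    m = GEMBASE.fullmatch(header)
    if m:
        return (m["name"], int(m["strain"]), int(m["num"]))
    m = GENERIC.fullmatch(header)
    if m:
        return ("", 0, int(m["num"]))
    raise ValueError(f"invalid protein header: {header!r}")


if __name__ == "__main__":
    for h in ["my-genome-1_00056", "ESCO.0321.00001.001i_12345",
              "mygenome-v1.1.1_12345", "mygenome v1 _12345"]:
        print(h, "->", classify(h))
```

Running a check of this kind over the headers of a protein file before building a pangenome may help catch names that contain a space or a dot without following the gembase layout.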