    OGRI

    GPLv3 license, Bash. Repository files: COPYING, OGRI_B.sh, README.md (last commit 2ab67044 by Alexis CRISCUOLO).

    OGRI (Overall Genome Relatedness Indices; Chun & Rainey 2014) is a command-line program written in Bash that computes pairwise similarity measures between whole genome sequences. Every computed similarity is based on local sequence alignments:

      ▹   Average Nucleotide Identity (ANI; Goris et al. 2007),

      ▹   Percentage of Conserved DNA (cDNA; Goris et al. 2007),

      ▹   OrthoANI (oANI; Lee et al. 2016),

      ▹   Percentage Of Conserved Proteins (POCP; Qin et al. 2014),

      ▹   CDS-based ANI (cANI, Konstantinidis & Tiedje 2005a; gANI, Varghese et al. 2015),

      ▹   Alignment Fraction (AF; Varghese et al. 2015),

      ▹   (one-way) Average Amino-acid Identity (AAI; Konstantinidis & Tiedje 2005b),

      ▹   Proteome Coverage (ProCov; Kim et al. 2021),

      ▹   Reciprocal AAI (rAAI; Nicholson et al. 2020).

    The key aim of OGRI is to compute a wide range of genome relatedness metrics accurately, i.e. each metric is implemented following the specific description given in its associated article (see Methods). Consequently, OGRI is not designed to run very fast (e.g. OGRI_B requires up to one minute to compare two 5 Mbp genomes on 12 threads), although running times decrease with a larger number of threads.

    Every OGRI tool runs on UNIX, Linux and most OS X operating systems.

    Dependencies

    You will need to install the required programs listed in the following table, or verify that they are already installed with the required version.

    OGRI tool  program                               package  version   sources
    OGRI_B     gawk                                  -        > 4.0.0   ftp.gnu.org/gnu/gawk
    OGRI_B     prodigal                              -        ≥ 2.6.3   github.com/hyattpd/Prodigal
    OGRI_B     makeblastdb, blastn, blastp, tblastn  blast+   ≥ 2.12.0  ftp.ncbi.nlm.nih.gov/blast/executables/blast+

    Installation and execution

    Clone this repository with the following command line:

    git clone https://gitlab.pasteur.fr/GIPhy/OGRI.git

    Go to the OGRI/ directory and give execute permission to the script:

    cd OGRI/
    chmod +x OGRI_B.sh

    and run it with the following command line model:

    ./OGRI_B.sh [options]

    If at least one of the required programs (see Dependencies) is not available in your $PATH (or if a compiled binary has a non-default name), the OGRI tools exit with an error message. To use a required program that is not available in your $PATH, edit the file OGRI_B.sh and set the local path to the corresponding binary(ies) within the REQUIREMENTS code block.
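    Before a first run, it may help to list which required programs cannot be found. The sketch below is a hypothetical helper (missing_programs is not part of OGRI_B.sh); it only checks, via the Bash builtin command -v, that each name resolves in $PATH, and does not check versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OGRI_B.sh): prints, one per line, every
# program name given as argument that cannot be found in $PATH.
missing_programs() {
  local prog
  for prog in "$@"; do
    command -v "$prog" >/dev/null 2>&1 || printf '%s\n' "$prog"
  done
  return 0
}

# Example: list the OGRI_B requirements that are not available
# missing_programs gawk prodigal makeblastdb blastn blastp tblastn
```

    An empty output means that every name was found; versions still have to be checked manually (e.g. blastn -version).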

    Usage

    Run OGRI_B.sh without any option to display the following documentation:

     USAGE:  OGRI.sh  [OPTIONS]  <fasta1>  <fasta2>  [<fasta3> ...]
    
     OPTIONS:
      -x          only OGRIs based on genome fragments (ANI, oANI)
      -y          only OGRIs based on CDS (POCP, gANI, AF, AAI, ProCov, rAAI)
      -z          only OGRIs based on reciprocal searches (oANI, gANI, AF, ProCov, rAAI)
      -b <int>    number of bootstrap replicates for confidence intervals (default: 200)
      -r          tab-delimited raw output (default: detailed output)
      -t <int>    number of threads (default: 2)
      -h          prints this help and exits
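    Since OGRI_B compares the first input file against each of the other ones (see Notes), an all-versus-all comparison requires several runs. The sketch below is a hypothetical wrapper (not provided with OGRI): each file becomes the reference in turn and is compared only with the files that follow it, so that each pair is computed once. OGRI_CMD and THREADS are illustrative environment variables (defaults: ./OGRI_B.sh and 12).

```shell
#!/usr/bin/env bash
# Hypothetical all-vs-all wrapper around OGRI_B (not part of OGRI).
# Run i uses file i as reference (first argument) against files i+1..n.
all_vs_all() {
  local files=("$@") i
  for (( i = 0; i < ${#files[@]} - 1; i++ )); do
    # "${files[@]:i}" expands to files i, i+1, ..., n (reference first)
    "${OGRI_CMD:-./OGRI_B.sh}" -r -t "${THREADS:-12}" "${files[@]:i}"
  done
}

# Example (progress bars discarded, raw tab-delimited output kept):
# all_vs_all *.fasta 2>/dev/null > all_vs_all.tsv
```

    Each run writes its own header line (starting with #), so the concatenated output contains one header per reference genome.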

    Notes

    • Each input file should be in FASTA format and contain nucleotide sequences. At least two input files should be specified. If more than two files are specified, the pairwise similarities are computed between the genome in the first file and the genome in each of the other files.

    • By default, all OGRIs are computed (see Methods below). However, the number of computed OGRIs can be reduced using options -x, -y or -z.

    • The 95% confidence interval is estimated for most OGRIs using a bootstrap approach. The default number of bootstrap replicates (i.e. 200) can be modified with option -b.

    • Running times decrease as the number of threads increases (option -t). OGRI is expected to reach its optimal running times with about 50 threads; setting more is not needed.

    • By default, progress bars and detailed results are written to stderr and stdout, respectively. The progress bars can be suppressed by appending 2>/dev/null to the command line.

    • Raw tab-delimited results (stdout) can be obtained with option -r. Field names and numbers are summarized in the table below (see Methods for the meaning of each field).

    field name           field number
                         default  option -x  option -y  option -z
    GENO1                1        1          1          1
    GENO2                2        2          2          2
    lgt1                 3        3          3          3
    lgt2                 4        4          4          4
    nFRA1                5        5          -          -
    nFRA2                6        6          -          -
    nFRA12               7        7          -          -
    nFRA21               8        8          -          -
    cDNA12               9        9          -          -
    cDNA21               10       10         -          -
    ANI12 [CI_ANI12]     11       11         -          -
    ANI21 [CI_ANI21]     12       12         -          -
    ANI [CI_ANI]         13       13         -          -
    nfRBH                14       14         -          5
    oANI [CI_oANI]       15       15         -          6
    nCDS1                16       -          5          7
    nCDS2                17       -          6          8
    nCDS12               18       -          7          -
    nCDS21               19       -          8          -
    POCP                 20       -          9          -
    cCDS12               21       -          10         -
    cCDS21               22       -          11         -
    cANI12 [CI_cANI12]   23       -          12         -
    cANI21 [CI_cANI21]   24       -          13         -
    cANI [CI_cANI]       25       -          14         -
    ngRBH                26       -          15         9
    gANI12 [CI_gANI12]   27       -          16         10
    gANI21 [CI_gANI21]   28       -          17         11
    gANI [CI_gANI]       29       -          18         12
    AF12 [CI_AF12]       30       -          19         13
    AF21 [CI_AF21]       31       -          20         14
    AF [AF_CI]           32       -          21         15
    mCDS12               33       -          22         -
    mCDS21               34       -          23         -
    AAI12 [CI_AAI12]     35       -          24         -
    AAI21 [CI_AAI21]     36       -          25         -
    AAI [CI_AAI]         37       -          26         -
    naRBH                38       -          27         16
    ProCov               39       -          28         17
    rAAI [CI_rAAI]       40       -          29         18
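    The bootstrap confidence intervals mentioned in the notes can be illustrated with a generic percentile bootstrap of a mean. The sketch below is an assumption about the general principle only (OGRI's exact resampling unit and quantile rule may differ): it resamples a list of values (e.g. per-fragment identities) with replacement B times and reports the mean together with the 2.5% and 97.5% quantiles of the B bootstrap means. The function name bootstrap_ci is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical percentile-bootstrap sketch (not OGRI's code).
# Usage: bootstrap_ci [B] [SEED] < values   (one numeric value per line)
# Prints: mean [lower-upper], the layout used in OGRI's output.
bootstrap_ci() {
  local reps="${1:-200}" seed="${2:-1}"
  awk -v B="$reps" -v seed="$seed" '
    NF { x[++n] = $1 + 0; sum += $1 }
    END {
      if (n == 0) exit 1
      srand(seed)
      for (b = 1; b <= B; b++) {          # B resamples of size n, with replacement
        s = 0
        for (i = 1; i <= n; i++) s += x[int(rand() * n) + 1]
        m[b] = s / n
      }
      for (i = 2; i <= B; i++) {          # insertion sort of the B bootstrap means
        v = m[i]; j = i - 1
        while (j >= 1 && m[j] > v) { m[j + 1] = m[j]; j-- }
        m[j + 1] = v
      }
      lo = int(0.025 * B); if (lo < 1) lo = 1
      hi = int(0.975 * B); if (hi < 1) hi = 1
      printf "%.2f [%.2f-%.2f]\n", sum / n, m[lo], m[hi]
    }'
}
```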

    Methods

    The nucleotide sequence(s) of each input genome GENOi are decomposed into three sets:

    • FRAGi: a set of consecutive fragments, each of length (at most) 1020 bp (Goris et al. 2007, Lee et al. 2016); OGRI extracts fragments containing only the character states A, C, G and T (case insensitive), and discards all fragments shorter than 920 bp;

    • CDSNi: a set of coding sequences at the nucleotide (codon) level; OGRI uses Prodigal (Hyatt et al. 2010) to build this set; every codon sequence shorter than 33 codons is discarded, as well as any sequence containing a character state other than A, C, G and T; of note, all stop codons are kept;

    • CDSAi: a set of coding amino acid sequences; OGRI creates this set by translating every codon sequence in CDSNi; of note, every non-translatable codon is discarded.
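    The FRAGi decomposition can be sketched with awk as follows. This is an illustrative assumption, not OGRI's code: each FASTA record is cut independently into consecutive 1020 bp pieces, and a piece is kept only if it is at least 920 bp long and contains nothing but A, C, G and T (case insensitive). The function name fragment_genome and the fragment naming scheme (<record>_<k>) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the FRAGi decomposition (not OGRI's implementation).
# Usage: fragment_genome genome.fasta > fragments.fasta
fragment_genome() {
  awk -v L=1020 -v MIN=920 '
    # keep a fragment only if long enough and made of A/C/G/T only
    function emit(f) {
      if (length(f) >= MIN && toupper(f) !~ /[^ACGT]/)
        printf ">%s_%d\n%s\n", id, ++k, f
    }
    /^>/ {
      if (id != "") emit(buf)             # trailing piece of the previous record
      id = substr($1, 2); buf = ""; k = 0
      next
    }
    {
      gsub(/[ \t\r]/, "")
      buf = buf $0
      while (length(buf) >= L) {          # cut consecutive, non-overlapping pieces
        emit(substr(buf, 1, L))
        buf = substr(buf, L + 1)
      }
    }
    END { if (id != "") emit(buf) }
  ' "$@"
}
```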

    Given two genomes 1 and 2, different local alignments (best hit of each sequence from a set against the sequences from another set) are obtained using different flavors of BLAST (Altschul et al. 1990; Camacho et al. 2008):

    • FRAG1 against GENO2 (and reciprocally) using blastn (Altschul et al. 1990; Zhang et al. 2000) with tuned parameters, as described by Goris et al. (2007; see also Yoon et al. 2017);

    • FRAG1 against FRAG2 (and reciprocally) using blastn with tuned parameters, as described by Lee et al. (2016; see also Yoon et al. 2017);

    • CDSN1 against CDSN2 (and reciprocally) using blastn with tuned parameters (as described by Konstantinidis and Tiedje 2005a), as well as default parameters to approximate the NSimScan tool (Novichkov et al. 2016) used by Varghese et al. (2015);

    • CDSA1 against GENO2 (and reciprocally) using tblastn (Gertz et al. 2006) with default parameters, as described by Konstantinidis and Tiedje (2005b);

    • CDSA1 against CDSA2 (and reciprocally) using blastp (Altschul et al. 1997) with default parameters, as described by Qin et al. (2014); see also Nicholson et al. (2020), and Kim et al. (2021) for similar approaches.

    These various local alignments are next specifically filtered, and the resulting sets of local similarities are used to derive different pairwise similarity measures:

    • ANI: the local alignments of FRAG1 against GENO2 and FRAG2 against GENO1 are screened following the criteria of Goris et al. (2007), resulting in nFRA12 and nFRA21 remaining fragments and associated local alignments, respectively; these selected local alignments are used to derive the two percentages of conserved DNA cDNA12 and cDNA21, and the two pairwise similarity percentages ANI12 and ANI21 (as well as their average ANI), respectively, as described by Goris et al. (2007);

    • OrthoANI: the local alignments of FRAG1 against FRAG2 and FRAG2 against FRAG1 are screened following the criteria of Lee et al. (2016), and next processed to identify reciprocal best hits (RBH), resulting in nfRBH remaining fragment pairs and associated local alignments; these selected local alignments are used to derive the similarity percentage oANI, as described by Lee et al. (2016);

    • cANI: the local alignments of CDSN1 against CDSN2 and CDSN2 against CDSN1 are screened following the criteria of Konstantinidis and Tiedje (2005a), resulting in cCDS12 and cCDS21 remaining CDS and associated local alignments (at the nucleotide level), respectively; these selected local alignments are used to derive the two pairwise similarity percentages cANI12 and cANI21, respectively, as described by Konstantinidis and Tiedje (2005a), as well as their average cANI;

    • gANI, AF: the local alignments (blastn, default parameters) of CDSN1 against CDSN2 and CDSN2 against CDSN1 are screened following the criteria of Varghese et al. (2015), and next processed to identify RBH, resulting in ngRBH remaining CDS pairs and associated local alignments (at the nucleotide level); these selected local alignments are used to derive the two pairwise similarity percentages gANI12 and gANI21, and the two alignment fractions AF12 and AF21, respectively, as described by Varghese et al. (2015); the final values gANI and AF are the averages of gANI12 and gANI21, and of AF12 and AF21, respectively;

    • (one-way) AAI: the local alignments of CDSA1 against GENO2 and CDSA2 against GENO1 are screened following the criteria of Konstantinidis and Tiedje (2005b), resulting in mCDS12 and mCDS21 remaining CDS and associated local alignments (at the amino acid level), respectively; these selected local alignments are used to derive the two pairwise similarity percentages AAI12 and AAI21, respectively, as described by Konstantinidis and Tiedje (2005b), as well as their average AAI; note that this estimate corresponds to the "AAI based on one-way BLAST" (sensu Konstantinidis and Tiedje 2005b), as opposed to the "AAI based on two-way BLAST";

    • POCP: the local alignments of CDSA1 against CDSA2 and CDSA2 against CDSA1 are screened following the criteria of Qin et al. (2014; see also Nicholson et al. 2020, Kim et al. 2021), resulting in nCDS12 and nCDS21 remaining CDS and associated local alignments, respectively; these selected local alignments are used to derive the Percentage Of Conserved Proteins POCP, as described by Qin et al. (2014);

    • ProCov, rAAI: the local alignments selected for computing POCP are processed to identify RBH, resulting in naRBH remaining CDS pairs and associated local alignments; these selected local alignments are used to derive the similarity percentage rAAI, as described by Nicholson et al. (2020; see also Kim et al. 2021), as well as the Proteome Coverage ProCov, as described by Kim et al. (2021); note that rAAI is quite comparable to the AAI based on two-way BLAST (sensu Konstantinidis and Tiedje 2005b).
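    Several of the measures above (oANI, gANI, AF, ProCov, rAAI) rely on reciprocal best hits. A minimal sketch of RBH identification from two BLAST tabular outputs (-outfmt 6, whose 12th column is the bit score) is given below. It assumes that the best hit of a query is the one with the highest bit score (ties resolved by first occurrence) and that any article-specific filtering has already been applied; the function name reciprocal_best_hits is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical RBH sketch (not OGRI's implementation).
# Usage: reciprocal_best_hits hits_1vs2.tsv hits_2vs1.tsv
# Inputs: BLAST -outfmt 6 tables (col 1 = query, col 2 = subject, col 12 = bit score).
# Output: one "seq1<TAB>seq2" line per reciprocal best hit pair (unordered).
reciprocal_best_hits() {
  awk -F'\t' '
    {
      score = $12 + 0
      if (FILENAME == ARGV[1]) {          # set 1 queried against set 2
        if (!($1 in bits1) || score > bits1[$1]) { bits1[$1] = score; best1[$1] = $2 }
      } else {                            # set 2 queried against set 1
        if (!($1 in bits2) || score > bits2[$1]) { bits2[$1] = score; best2[$1] = $2 }
      }
    }
    END {
      for (q in best1) {                  # keep q -> s only if s -> q as well
        s = best1[q]
        if ((s in best2) && best2[s] == q) print q "\t" s
      }
    }
  ' "$1" "$2"
}
```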

    The different estimated OGRIs can be used to assess taxonomic rank delineation.

    • Species delineation. Different species delineation cutoffs based on different OGRIs have been proposed, e.g. cANI = 94% (Konstantinidis and Tiedje 2005a), (two-way) AAI = 95%-96% (Konstantinidis and Tiedje 2005b), ANI = 95% and cDNA12 = cDNA21 = 69% (Goris et al. 2007; see also Rodriguez-R and Konstantinidis 2014), ANI = 95%-96% (Richter and Rossello-Mora 2009), rAAI = 95% (Luo et al. 2014), AF = 0.6 and gANI = 96.5% (Varghese et al. 2015). Alternative implementations for estimating the Average Nucleotide Identity also led to comparable species delineation cutoffs, e.g. 95% using FastANI (Jain et al. 2018), a tool comparable to OrthoANI for closely related genomes (e.g. ANI > 93%; Palmer et al. 2020). Importantly, OGRI values based on non-RBH sequence similarity searches (e.g. ANI, cANI, AAI) are often biased downward compared with those based on RBH approaches (e.g. OrthoANI, gANI, AF, rAAI), because of repeat regions or expanded families of paralogous genes (e.g. Konstantinidis and Tiedje 2005b, Palmer et al. 2020). Nevertheless, as the proposed cutoffs for both AF and gANI were derived with a sequence similarity search tool (NSimScan) that differs from the one used by OGRI (blastn), the AF and gANI values estimated by OGRI cannot be directly compared with these cutoffs. It is therefore recommended to use oANI = 95% and/or rAAI = 95% as species delineation cutoffs.

    • Genus delineation. Different genus delineation cutoffs based on different OGRIs have been proposed, e.g. POCP = 50% (Qin et al. 2014), (two-way) AAI = 65% (Konstantinidis et al. 2017; see also Rodriguez-R and Konstantinidis 2014), rAAI = 60% (Luo et al. 2014). Consequently, two genomes with POCP < 50% and rAAI < 60% likely belong to (at least) distinct genera. However, several exceptions to this simplistic rule have been reported (especially for POCP, e.g. Suresh et al. 2019). Among the recommended approaches to determine a genus delineation cutoff, one can (i) look for a natural cutoff in the distribution of a large set of pairwise rAAI values (see e.g. Nicholson et al. 2020), or (ii) estimate a genus inflection point in a plot of rAAI vs. ProCov values (for example) estimated against a selected type species (for a similar approach, see Barco et al. 2020).

    • Family delineation. A family delineation cutoff of (two-way) AAI = 45% was suggested by Konstantinidis et al. (2017), but this cutoff was not assessed based on a large number of compared genomes.

    • Order delineation. Order delineation cutoffs of rAAI = 47%-50% were observed by Luo et al. (2014). However, as the distribution of pairwise rAAI values between members of distinct bacterial orders overlaps with those related to the genus and the phylum, assessing orders based on rAAI is not recommended.

    • Class delineation. No class delineation cutoff has ever been proposed for any OGRI.

    • Phylum delineation. A phylum delineation cutoff of rAAI = 40% was assessed by Luo et al. (2014). A comparable cutoff of rAAI = 40% could therefore be considered when using OGRI.

    • Kingdom delineation. No kingdom delineation cutoff has ever been proposed for any OGRI.

    • Domain delineation. A domain delineation cutoff of rAAI = 40% was observed by Luo et al. (2014). However, such a cutoff should be used with caution.
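    As an illustration only, the recommended species cutoff and the genus rule of thumb above can be applied to OGRI's raw output (option -r, default mode, where oANI, POCP and rAAI are fields 15, 20 and 40; these numbers differ with options -x, -y or -z). The function name delineate and its labels are hypothetical; as stressed above, the genus rule is a simplistic heuristic with known exceptions, so "distinct_species" only means that both species cutoffs are not reached.

```shell
#!/usr/bin/env bash
# Hypothetical helper applying the cutoffs discussed above (not part of OGRI).
# Usage: OGRI_B.sh -r genome1.fasta genome2.fasta ... 2>/dev/null | delineate
# Expects the default 40-field raw output; awk's numeric conversion keeps the
# leading value of fields such as "99.97 [99.96-99.98]".
delineate() {
  awk -F'\t' '
    /^#/ { next }                                    # skip the header line
    {
      oani = $15 + 0; pocp = $20 + 0; raai = $40 + 0
      if (oani >= 95 || raai >= 95)     call = "same_species"
      else if (pocp < 50 && raai < 60)  call = "distinct_genera"
      else                              call = "distinct_species"
      print $2 "\t" call
    }'
}
```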

    Example

    To illustrate OGRI usage, the following example estimates pairwise similarity measures between 13 Enterobacteriaceae chromosomes, following the comparisons published by Konstantinidis and Tiedje (2005a) and Goris et al. (2007).

    Downloading genome sequences

    Download the 13 chromosome sequence files using the following Bash command lines:

    URL="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA";
    wget -q -O - $URL/000/008/865/GCA_000008865.2_ASM886v2/GCA_000008865.2_ASM886v2_genomic.fna.gz                         | gunzip -c | awk '/^>/{if(NR>1)exit}{print}' > 01.Escherichia.coli.O157.H7.Sakai.fasta ;
    wget -q -O - $URL/000/006/665/GCA_000006665.1_ASM666v1/GCA_000006665.1_ASM666v1_genomic.fna.gz                         | gunzip -c | awk '/^>/{if(NR>1)exit}{print}' > 02.Escherichia.coli.O157.H7.EDL933.fasta ;
    wget -q -O - $URL/000/273/425/GCA_000273425.1_Esch_coli_MG12655_V1/GCA_000273425.1_Esch_coli_MG12655_V1_genomic.fna.gz | gunzip -c | awk '/^>/{if(NR>1)exit}{print}' > 03.Escherichia.coli.K-12.MG1655.fasta ;
    wget -q -O - $URL/000/007/445/GCA_000007445.1_ASM744v1/GCA_000007445.1_ASM744v1_genomic.fna.gz                         | gunzip -c | awk '/^>/{if(NR>1)exit}{print}' > 04.Escherichia.coli.CFT073.fasta ;
    wget -q -O - $URL/000/007/405/GCA_000007405.1_ASM740v1/GCA_000007405.1_ASM740v1_genomic.fna.gz                         | gunzip -c | awk '/^>/{if(NR>1)exit}{print}' > 05.Shigella.flexneri.2a.2457T.fasta ;
    wget -q -O - $URL/000/006/925/GCA_000006925.2_ASM692v2/GCA_000006925.2_ASM692v2_genomic.fna.gz                         | gunzip -c | awk '/^>/{if(NR>1)exit}{print}' > 06.Shigella.flexneri.2a.301.fasta ;
    wget -q -O - $URL/000/006/945/GCA_000006945.2_ASM694v2/GCA_000006945.2_ASM694v2_genomic.fna.gz                         | gunzip -c | awk '/^>/{if(NR>1)exit}{print}' > 07.Salmonella.enterica.Typhimurium.LT2.fasta ;
    wget -q -O - $URL/000/007/545/GCA_000007545.1_ASM754v1/GCA_000007545.1_ASM754v1_genomic.fna.gz                         | gunzip -c | awk '/^>/{if(NR>1)exit}{print}' > 08.Salmonella.enterica.Typhi.Ty2.fasta ;
    wget -q -O - $URL/001/302/605/GCA_001302605.1_ASM130260v1/GCA_001302605.1_ASM130260v1_genomic.fna.gz                   | gunzip -c | awk '/^>/{if(NR>1)exit}{print}' > 09.Salmonella.enterica.Typhi.PM016.13.fasta ;
    wget -q -O - $URL/000/970/105/GCA_000970105.1_ASM97010v1/GCA_000970105.1_ASM97010v1_genomic.fna.gz                     | gunzip -c | awk '/^>/{if(NR>1)exit}{print}' > 11.Yersinia.pestis.KIM5.fasta ;
    wget -q -O - $URL/000/009/065/GCA_000009065.1_ASM906v1/GCA_000009065.1_ASM906v1_genomic.fna.gz                         | gunzip -c | awk '/^>/{if(NR>1)exit}{print}' > 12.Yersinia.pestis.CO92.fasta ;
    wget -q -O - $URL/000/009/345/GCA_000009345.1_ASM934v1/GCA_000009345.1_ASM934v1_genomic.fna.gz                         | gunzip -c | awk '/^>/{if(NR>1)exit}{print}' > 14.Yersinia.enterocolitica.8081.fasta ;
    wget -q -O - $URL/000/294/535/GCA_000294535.1_ASM29453v1/GCA_000294535.1_ASM29453v1_genomic.fna.gz                     | gunzip -c | awk '/^>/{if(NR>1)exit}{print}' > 15.Erwinia.carotovora.PCC21.fasta ;

    Note that each file is numbered according to the Enterics section of Table S1 in Konstantinidis and Tiedje (2005a).
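    The awk filter in each command line above keeps only the first record of each downloaded assembly (expected to be the chromosome), so a quick check of the resulting files may be useful. The sketch below (hypothetical function check_fasta) prints, for each file, the number of FASTA records and the total number of sequence characters; each file is expected to contain exactly one record.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of OGRI): file name, number of records, and
# total sequence length (whitespace excluded) for each FASTA file given.
check_fasta() {
  local f
  for f in "$@"; do
    awk -v file="$f" '
      /^>/ { n++; next }
      { gsub(/[ \t\r]/, ""); len += length($0) }
      END { printf "%s\t%d\t%d\n", file, n, len }
    ' "$f"
  done
}

# Example: check_fasta *.fasta
```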

    Running OGRI to compare two genomes

    Of note, the following results were obtained using blast+ v2.14.1 and prodigal v2.6.3.

    Run the following command line to compare the first two Escherichia coli genomes (using 12 threads):

    OGRI_B.sh  -t 12  01.Escherichia.coli.O157.H7.Sakai.fasta  02.Escherichia.coli.O157.H7.EDL933.fasta

    After ~1 minute of calculations, OGRI displays the following results:

    [1/2]    [0%]----------+----------+----------+----------+----------[100%]
    [2/2]    [0%]----------+----------+----------+----------+----------[100%]
    
     Genome files
       GENO1             01.Escherichia.coli.O157.H7.Sakai.fasta
       GENO2             02.Escherichia.coli.O157.H7.EDL933.fasta
    
     Average Nucleotide Identity (Goris et al. 2007)
       nFRA1  (nFRA12)   5390 (5329)
       nFRA2  (nFRA21)   4766 (4749)
       cDNA12 (lgt1)     98.81 (5498578)
       cDNA21 (lgt2)     87.64 (5521804)
       ANI12  [95%CI]    99.90 [99.87-99.92]
       ANI21  [95%CI]    99.98 [99.97-99.99]
       ANI    [95%CI]    99.94 [99.92-99.95]
    
     OrthoANI (Lee et al. 2016)
       nfRBH             4446
       oANI   [95%CI]    99.97 [99.96-99.98]
    
     Percentage Of Conserved Proteins (Qin et al. 2014)
       nCDS1  (nCDS12)   5286 (4445)
       nCDS2  (nCDS21)   4246 (4231)
       POCP              91.01
    
     CDS-based ANI (Konstantinidis & Tiedje 2005)
       nCDS1  (cCDS12)   5286 (4319)
       nCDS2  (cCDS21)   4246 (4226)
       cANI12 [95%CI]    99.54 [99.46-99.63]
       cANI21 [95%CI]    99.97 [99.96-99.99]
       cANI   [95%CI]    99.75 [99.71-99.81]
    
     Whole-genome based ANI & Alignment Fraction (Varghese et al. 2015)
       ngRBH             4027
       gANI12 [95%CI]    99.62 [99.37-99.85]
       gANI21 [95%CI]    99.89 [99.83-99.95]
       gANI   [95%CI]    99.75 [99.60-99.90]
       AF12   [95%CI]    0.697 [0.684-0.711]
       AF21   [95%CI]    0.960 [0.942-0.979]
       AF     [95%CI]    0.828 [0.813-0.845]
    
     Average Amino-acid Identity (one-way; Konstantinidis & Tiedje 2005)
       nCDS1  (mCDS12)   5286 (5214)
       nCDS2  (mCDS21)   4246 (4230)
       AAI12  [95%CI]    99.74 [99.66-99.80]
       AAI21  [95%CI]    99.93 [99.88-99.97]
       AAI    [95%CI]    99.83 [99.77-99.88]
    
     Proteome Coverage (Kim et al. 2021) & rAAI (Nicholson et al. 2020)
       naRBH             4009
       ProCov            0.841
       rAAI   [95%CI]    99.97 [99.93-99.99]
    

    It can be observed that the ANI of E. coli O157:H7 Sakai (genome 1) against E. coli O157:H7 EDL933 (genome 2) is ANI12 = 99.90%, with a percentage of conserved DNA cDNA12 = 98.81%. These values can be compared to those reported by Goris et al. (2007; Table 2): 99.68% and 99.6%, respectively. One also observes ANI21 = 99.98% and cDNA21 = 87.64%, whereas Goris et al. (2007) reported 99.63% and 97.3%, respectively. Such differences may be explained by the way each genome sequence is decomposed into consecutive fragments.

    The CDS-based ANI of E. coli O157:H7 EDL933 (genome 2) against E. coli O157:H7 Sakai (genome 1) is cANI21 = 99.97%, with 4226/4246 = 99.52% conserved genes. These values can be compared to those reported by Konstantinidis and Tiedje (2005a; Table S1): 99.7% and 98.6%, respectively. Such differences may be explained by the different numbers of predicted CDS (here nCDS1 = 5286 and nCDS2 = 4246, whereas Konstantinidis and Tiedje (2005a; Table S1) reported 5361 and 5324 CDS, respectively).
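    The POCP value above can be recomputed from the reported CDS counts with the formula of Qin et al. (2014), POCP = 100 × (C1 + C2) / (T1 + T2), where C1 and C2 are the numbers of conserved proteins and T1 and T2 the total numbers of proteins of the two genomes. The sketch below assumes C1 = nCDS12, C2 = nCDS21, T1 = nCDS1 and T2 = nCDS2, and truncates (rather than rounds) the result to two decimals; under these assumptions it reproduces the POCP values printed in this example. The function name pocp is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical POCP calculator (not part of OGRI).
# Usage: pocp nCDS12 nCDS21 nCDS1 nCDS2
pocp() {
  awk -v c12="$1" -v c21="$2" -v t1="$3" -v t2="$4" 'BEGIN {
    # POCP = 100 * (C1 + C2) / (T1 + T2), truncated to two decimals
    printf "%.2f\n", int(10000 * (c12 + c21) / (t1 + t2)) / 100
  }'
}

pocp 4445 4231 5286 4246     # counts from the output above; prints 91.01
```

    With the counts of the comparison against genome 09 in the tab-delimited output below (3314, 3322, 5286, 4625), the same function gives 66.95, whereas rounding would give 66.96.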

    Running OGRI to compare one genome against several ones

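    To compare E. coli O157:H7 Sakai (genome 1) against all other downloaded genomes, OGRI can be run on the whole set of files, displaying all metrics in tab-delimited format (option -r). As the shell expands *.fasta in sorted order, the file 01.Escherichia.coli.O157.H7.Sakai.fasta comes first and is therefore used as the reference genome: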

    OGRI_B.sh  -t 48  -r  *.fasta  2>/dev/null

    This command line leads to the following output:

    #GENO1                                   GENO2                                        lgt1    lgt2     nFRA1 nFRA2  nFRA12 nFRA21  cDNA12 cDNA21  ANI12 [CI_ANI12]     ANI21 [CI_ANI21]     ANI [CI_ANI]         nfRBH  oANI [CI_oANI]       nCDS1 nCDS2  nCDS12 nCDS21  POCP   cCDS12 cCDS21  cANI12 [CI_cANI12]   cANI21 [CI_cANI21]   cANI [CI_cANI]       ngRBH  gANI12 [CI_gANI12]   gANI21 [CI_gANI21]   gANI [CI_gANI]       AF12 [CI_AF12]       AF21 [CI_AF21]       AF [AF_CI]           mCDS12 mCDS21  AAI12 [CI_AAI12]     AAI21 [CI_AAI21]     AAI [CI_AAI]         naRBH  ProCov  rAAI [CI_rAAI]
    01.Escherichia.coli.O157.H7.Sakai.fasta  02.Escherichia.coli.O157.H7.EDL933.fasta     5498578 5521804  5390  4766   5329   4749    98.81  87.64   99.90 [99.87-99.92]  99.98 [99.97-99.99]  99.94 [99.92-99.95]  4446   99.97 [99.96-99.98]  5286  4246   4445   4231    91.01  4319   4226    99.54 [99.46-99.63]  99.97 [99.96-99.99]  99.75 [99.71-99.81]  4027   99.62 [99.37-99.85]  99.89 [99.83-99.95]  99.75 [99.60-99.90]  0.697 [0.684-0.711]  0.960 [0.942-0.979]  0.828 [0.813-0.845]  5214   4230    99.74 [99.66-99.80]  99.93 [99.88-99.97]  99.83 [99.77-99.88]  4009   0.841   99.97 [99.93-99.99]
    01.Escherichia.coli.O157.H7.Sakai.fasta  03.Escherichia.coli.K-12.MG1655.fasta        5498578 4638970  5390  4548   4021   3988    74.55  88.16   97.81 [97.68-97.92]  98.02 [97.92-98.11]  97.91 [97.80-98.01]  4013   98.05 [97.95-98.13]  5286  4295   4019   3923    82.89  3864   3807    97.88 [97.74-98.00]  98.03 [97.91-98.12]  97.95 [97.82-98.06]  3765   96.94 [96.56-97.30]  97.10 [96.80-97.37]  97.02 [96.68-97.33]  0.759 [0.744-0.774]  0.898 [0.882-0.916]  0.828 [0.813-0.845]  4080   3968    96.19 [95.76-96.54]  97.25 [96.96-97.54]  96.72 [96.36-97.04]  3760   0.784   98.61 [98.42-98.72]
    01.Escherichia.coli.O157.H7.Sakai.fasta  04.Escherichia.coli.CFT073.fasta             5498578 5231148  5390  5070   3946   3839    72.85  74.42   96.53 [96.39-96.68]  96.65 [96.50-96.77]  96.59 [96.44-96.72]  3877   96.77 [96.65-96.89]  5286  4799   4113   3976    80.20  3867   3765    96.72 [96.61-96.85]  96.78 [96.65-96.92]  96.75 [96.63-96.88]  3621   95.57 [95.11-96.01]  96.00 [95.71-96.28]  95.78 [95.41-96.14]  0.720 [0.705-0.736]  0.774 [0.758-0.790]  0.747 [0.731-0.763]  4255   4053    94.68 [94.26-95.05]  94.75 [94.26-95.12]  94.71 [94.26-95.08]  3644   0.722   97.67 [97.40-97.91]
    01.Escherichia.coli.O157.H7.Sakai.fasta  05.Shigella.flexneri.2a.2457T.fasta          5498578 4599326  5390  4507   3753   3764    68.97  84.82   97.12 [96.93-97.28]  97.75 [97.64-97.85]  97.43 [97.28-97.56]  3710   97.85 [97.75-97.94]  5286  4702   3922   4185    81.16  3553   3973    97.33 [97.15-97.48]  97.60 [97.46-97.74]  97.46 [97.30-97.61]  3552   93.01 [92.26-93.73]  97.00 [96.74-97.21]  95.00 [94.50-95.47]  0.711 [0.695-0.722]  0.826 [0.809-0.841]  0.768 [0.752-0.781]  3964   4252    94.19 [93.79-94.67]  96.94 [96.65-97.20]  95.56 [95.22-95.93]  3486   0.698   98.40 [98.23-98.55]
    01.Escherichia.coli.O157.H7.Sakai.fasta  06.Shigella.flexneri.2a.301.fasta            5498578 4607196  5390  4516   3735   3757    68.77  84.71   97.14 [96.98-97.31]  97.71 [97.57-97.82]  97.42 [97.27-97.56]  3715   97.83 [97.74-97.91]  5286  4715   3907   4207    81.13  3523   4000    97.32 [97.15-97.45]  97.58 [97.43-97.70]  97.45 [97.29-97.57]  3549   92.89 [92.28-93.50]  96.95 [96.75-97.18]  94.92 [94.51-95.34]  0.710 [0.695-0.727]  0.822 [0.805-0.840]  0.766 [0.750-0.783]  3951   4277    94.15 [93.73-94.59]  97.01 [96.75-97.24]  95.58 [95.24-95.91]  3485   0.696   98.39 [98.23-98.51]
    01.Escherichia.coli.O157.H7.Sakai.fasta  07.Salmonella.enterica.Typhimurium.LT2.fasta 5498578 4857450  5390  4762   2961   2924    3.10   3.56    80.19 [79.96-80.42]  80.30 [80.08-80.55]  80.24 [80.02-80.48]  3102   80.68 [80.45-80.91]  5286  4504   3493   3361    70.01  3109   3030    80.81 [80.58-81.01]  80.98 [80.78-81.22]  80.89 [80.68-81.11]  2922   80.54 [80.29-80.81]  80.20 [79.93-80.51]  80.37 [80.11-80.66]  0.591 [0.579-0.605]  0.674 [0.661-0.691]  0.632 [0.620-0.648]  3653   3497    82.45 [81.87-82.98]  83.49 [82.98-84.05]  82.97 [82.42-83.51]  3162   0.645   87.08 [86.67-87.44]
    01.Escherichia.coli.O157.H7.Sakai.fasta  08.Salmonella.enterica.Typhi.Ty2.fasta       5498578 4791950  5390  4695   2841   2864    3.02   3.44    80.42 [80.21-80.68]  80.32 [80.10-80.57]  80.37 [80.15-80.62]  3002   80.85 [80.62-81.10]  5286  4614   3326   3333    67.26  2941   3008    81.13 [80.91-81.37]  81.06 [80.86-81.28]  81.09 [80.88-81.32]  2872   79.75 [79.41-80.11]  80.29 [80.01-80.51]  80.02 [79.71-80.31]  0.581 [0.570-0.596]  0.669 [0.655-0.686]  0.625 [0.612-0.641]  3488   3462    82.79 [82.22-83.38]  83.73 [83.27-84.25]  83.26 [82.74-83.81]  3079   0.622   87.28 [86.84-87.72]
    01.Escherichia.coli.O157.H7.Sakai.fasta  09.Salmonella.enterica.Typhi.PM016.13.fasta  5498578 4793553  5390  4699   2834   2816    3.03   3.71    80.45 [80.19-80.63]  80.40 [80.17-80.64]  80.42 [80.18-80.63]  3008   80.84 [80.60-81.04]  5286  4625   3314   3322    66.95  2932   2999    81.16 [80.97-81.34]  81.10 [80.87-81.31]  81.13 [80.92-81.32]  2867   79.79 [79.48-80.11]  80.33 [80.04-80.59]  80.06 [79.76-80.35]  0.581 [0.565-0.592]  0.667 [0.651-0.681]  0.624 [0.608-0.636]  3478   3452    82.79 [82.15-83.41]  83.74 [83.13-84.20]  83.26 [82.64-83.80]  3070   0.619   87.33 [86.89-87.73]
    01.Escherichia.coli.O157.H7.Sakai.fasta  11.Yersinia.pestis.KIM5.fasta                5498578 4605437  5390  4515   1523   1507    0.59   0.75    71.92 [71.59-72.25]  71.91 [71.52-72.25]  71.91 [71.55-72.25]  2003   72.33 [72.07-72.60]  5286  4040   2542   2505    54.11  1721   1694    72.97 [72.65-73.24]  73.00 [72.70-73.24]  72.98 [72.67-73.24]  1328   71.78 [71.20-72.29]  71.85 [71.34-72.31]  71.81 [71.27-72.30]  0.284 [0.275-0.294]  0.355 [0.343-0.367]  0.319 [0.309-0.330]  2790   2630    68.72 [68.11-69.31]  70.01 [69.34-70.62]  69.36 [68.72-69.96]  2309   0.495   73.68 [73.17-74.12]
    01.Escherichia.coli.O157.H7.Sakai.fasta  12.Yersinia.pestis.CO92.fasta                5498578 4653728  5390  4562   1528   1507    0.59   0.59    71.92 [71.58-72.25]  72.04 [71.66-72.38]  71.98 [71.62-72.31]  1979   72.25 [71.99-72.52]  5286  4090   2555   2519    54.11  1726   1697    72.95 [72.70-73.23]  73.00 [72.76-73.24]  72.97 [72.73-73.23]  1333   71.68 [71.20-72.15]  71.83 [71.31-72.26]  71.75 [71.25-72.20]  0.285 [0.276-0.294]  0.352 [0.341-0.363]  0.318 [0.308-0.328]  2802   2645    68.73 [67.98-69.32]  69.93 [69.31-70.50]  69.33 [68.64-69.91]  2318   0.494   73.64 [73.09-74.15]
    01.Escherichia.coli.O157.H7.Sakai.fasta  14.Yersinia.enterocolitica.8081.fasta        5498578 4615899  5390  4525   1688   1654    0.60   0.73    71.80 [71.46-72.13]  71.79 [71.47-72.07]  71.79 [71.46-72.10]  2136   72.05 [71.74-72.30]  5286  4159   2738   2648    57.02  1870   1803    72.96 [72.72-73.19]  73.11 [72.84-73.36]  73.03 [72.78-73.27]  1425   72.21 [71.88-72.61]  71.87 [71.49-72.31]  72.04 [71.68-72.46]  0.306 [0.297-0.315]  0.377 [0.367-0.389]  0.341 [0.332-0.352]  2992   2811    68.86 [68.22-69.38]  70.10 [69.58-70.64]  69.48 [68.90-70.01]  2495   0.528   73.53 [73.04-74.02]
    01.Escherichia.coli.O157.H7.Sakai.fasta  15.Erwinia.carotovora.PCC21.fasta            5498578 4842771  5390  4747   1652   1641    0.73   0.91    73.16 [72.81-73.51]  72.93 [72.55-73.30]  73.04 [72.68-73.40]  2068   73.31 [73.02-73.57]  5286  4258   2559   2511    53.12  1808   1751    73.91 [73.66-74.14]  74.11 [73.84-74.39]  74.01 [73.75-74.26]  1462   72.99 [72.48-73.39]  72.80 [72.26-73.17]  72.89 [72.37-73.28]  0.315 [0.305-0.324]  0.363 [0.352-0.374]  0.339 [0.328-0.349]  2795   2707    69.04 [68.42-69.53]  69.31 [68.63-69.84]  69.17 [68.52-69.68]  2326   0.487   73.70 [73.16-74.16]

    Restricting fields in tab-delimited output

    The tab-delimited format makes it easy to restrict the output to specific fields, e.g. the genome name (field 2) and the cANI values (fields 23-25 in default mode):

    OGRI_B.sh  -t 48  -r  *.fasta  2>/dev/null  |  cut -f2,23-25

    The above command line leads to the following simplified output:

    GENO2                                        cANI12 [CI_cANI12]   cANI21 [CI_cANI21]   cANI [CI_cANI]
    02.Escherichia.coli.O157.H7.EDL933.fasta     99.54 [99.46-99.63]  99.97 [99.96-99.99]  99.75 [99.71-99.81]
    03.Escherichia.coli.K-12.MG1655.fasta        97.88 [97.74-98.00]  98.03 [97.91-98.12]  97.95 [97.82-98.06]
    04.Escherichia.coli.CFT073.fasta             96.72 [96.61-96.85]  96.78 [96.65-96.92]  96.75 [96.63-96.88]
    05.Shigella.flexneri.2a.2457T.fasta          97.33 [97.15-97.48]  97.60 [97.46-97.74]  97.46 [97.30-97.61]
    06.Shigella.flexneri.2a.301.fasta            97.32 [97.15-97.45]  97.58 [97.43-97.70]  97.45 [97.29-97.57]
    07.Salmonella.enterica.Typhimurium.LT2.fasta 80.81 [80.58-81.01]  80.98 [80.78-81.22]  80.89 [80.68-81.11]
    08.Salmonella.enterica.Typhi.Ty2.fasta       81.13 [80.91-81.37]  81.06 [80.86-81.28]  81.09 [80.88-81.32]
    09.Salmonella.enterica.Typhi.PM016.13.fasta  81.16 [80.97-81.34]  81.10 [80.87-81.31]  81.13 [80.92-81.32]
    11.Yersinia.pestis.KIM5.fasta                72.97 [72.65-73.24]  73.00 [72.70-73.24]  72.98 [72.67-73.24]
    12.Yersinia.pestis.CO92.fasta                72.95 [72.70-73.23]  73.00 [72.76-73.24]  72.97 [72.73-73.23]
    14.Yersinia.enterocolitica.8081.fasta        72.96 [72.72-73.19]  73.11 [72.84-73.36]  73.03 [72.78-73.27]
    15.Erwinia.carotovora.PCC21.fasta            73.91 [73.66-74.14]  74.11 [73.84-74.39]  74.01 [73.75-74.26]

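    Each row can be compared to the values reported by Konstantinidis and Tiedje (2005a; Table S1, Enterics section): 99.7%, 97.2%, 95.9%, 96.5%, 96.4%, 79.9%, 80.2%, 80.2%, 71.5%, 71.5%, 82.1%, 72.1%, respectively. All but one of these values lie within 2 percentage points of the corresponding OGRI estimates. The exception is the penultimate value (82.1%, Y. enterocolitica 8081), which is about 9 points above the OGRI estimates (about 73%) and well above the values reported for the two Y. pestis genomes (71.5%). It is therefore likely that this value is a typo in Table S1.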
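    Such a comparison can be scripted. Given, for each genome, an OGRI estimate and a published value, the hypothetical function flag_deviations below prints their absolute difference and marks differences larger than a chosen threshold (5 percentage points by default, an arbitrary choice). With the cANI12 values above, only the Y. enterocolitica row exceeds this threshold.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OGRI).
# stdin: label <TAB> estimated value <TAB> published value
# $1   : threshold in percentage points (default: 5)
flag_deviations() {
  awk -F'\t' -v max="${1:-5}" '{
    d = $2 - $3
    if (d < 0) d = -d
    printf "%s\t%.2f%s\n", $1, d, (d > max ? "\tCHECK" : "")
  }'
}

# Genomes 02, 14 and 15 (cANI12 vs. Table S1); only row 14 is flagged CHECK
printf '02\t99.54\t99.7\n14\t72.96\t82.1\n15\t73.91\t72.1\n' | flag_deviations
```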

    Restricting computations

    Restricting the computations to the fragment-based pairwise measures (option -x) can significantly reduce the overall running time, e.g.

    OGRI_B.sh  -t 6  -r  -x  0[1-5].*.fasta  2>/dev/null  |  cut -f2,11-13

    The above command line leads to the following simplified output (i.e. restricted to ANI values):

    GENO2                                        ANI12 [CI_ANI12]     ANI21 [CI_ANI21]     ANI [CI_ANI]
    02.Escherichia.coli.O157.H7.EDL933.fasta     99.90 [99.87-99.92]  99.98 [99.97-99.99]  99.94 [99.92-99.95]
    03.Escherichia.coli.K-12.MG1655.fasta        97.81 [97.68-97.92]  98.02 [97.92-98.11]  97.91 [97.80-98.01]
    04.Escherichia.coli.CFT073.fasta             96.53 [96.39-96.68]  96.65 [96.50-96.77]  96.59 [96.44-96.72]
    05.Shigella.flexneri.2a.2457T.fasta          97.12 [96.93-97.28]  97.75 [97.64-97.85]  97.43 [97.28-97.56]
    06.Shigella.flexneri.2a.301.fasta            97.14 [96.98-97.31]  97.71 [97.57-97.82]  97.42 [97.27-97.56]
    07.Salmonella.enterica.Typhimurium.LT2.fasta 80.19 [79.96-80.42]  80.30 [80.08-80.55]  80.24 [80.02-80.48]
    08.Salmonella.enterica.Typhi.Ty2.fasta       80.42 [80.21-80.68]  80.32 [80.10-80.57]  80.37 [80.15-80.62]
    09.Salmonella.enterica.Typhi.PM016.13.fasta  80.45 [80.19-80.63]  80.40 [80.17-80.64]  80.42 [80.18-80.63]
    11.Yersinia.pestis.KIM5.fasta                71.92 [71.59-72.25]  71.91 [71.52-72.25]  71.91 [71.55-72.25]
    12.Yersinia.pestis.CO92.fasta                71.92 [71.58-72.25]  72.04 [71.66-72.38]  71.98 [71.62-72.31]
    14.Yersinia.enterocolitica.8081.fasta        71.80 [71.46-72.13]  71.79 [71.47-72.07]  71.79 [71.46-72.10]
    15.Erwinia.carotovora.PCC21.fasta            73.16 [72.81-73.51]  72.93 [72.55-73.30]  73.04 [72.68-73.40]

    The first four rows (genomes 2-5) can be compared to the values reported by Goris et al. (2007; Table 2, Escherichia/Shigella hybridization group): ANI12 = 99.68%, 97.53%, 96.00% and 97.36%, and ANI21 = 99.63%, 97.25%, 95.85% and 96.54%, respectively.
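To make this comparison explicit, the per-genome differences (OGRI_B minus Goris et al. 2007) can be computed with a one-line awk program. This is a sketch for illustration: the function name `ani_diff` is hypothetical, and the input values are copied from the output and text above.

```shell
#!/usr/bin/env bash
# Per-genome differences (OGRI_B minus Goris et al. 2007) between the
# one-way ANI values. Input lines (stdin):
#   "<genome> <OGRI_B ANI12> <Goris ANI12> <OGRI_B ANI21> <Goris ANI21>"
ani_diff() {
  awk '{ printf "%s\t%+.2f\t%+.2f\n", $1, $2 - $3, $4 - $5 }'
}

# values copied from the -x output above and from Goris et al. (2007; Table 2)
ani_diff <<'EOF'
02 99.90 99.68 99.98 99.63
03 97.81 97.53 98.02 97.25
04 96.53 96.00 96.65 95.85
05 97.12 97.36 97.75 96.54
EOF
```

For these four genomes, the OGRI_B values are higher than the published ones in seven of the eight comparisons, the exception being ANI12 for genome 5 (-0.24).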

    Practical usage

    As the recommended OGRIs are oANI and rAAI (see Methods), the following command line can be useful in many cases:

    OGRI_B.sh  -t 48  -r  -z  *.fasta  2>/dev/null   |  cut -f2,5-8,16-18

    The above command line leads to the following output:

    GENO2                                        nfRBH  oANI [CI_oANI]       nCDS1 nCDS2  naRBH  ProCov  rAAI [CI_rAAI]
    02.Escherichia.coli.O157.H7.EDL933.fasta     4446   99.97 [99.96-99.98]  5286  4246   4009   0.841   99.97 [99.93-99.99]
    03.Escherichia.coli.K-12.MG1655.fasta        4013   98.05 [97.95-98.13]  5286  4295   3760   0.784   98.61 [98.42-98.72]
    04.Escherichia.coli.CFT073.fasta             3877   96.77 [96.65-96.89]  5286  4799   3644   0.722   97.67 [97.40-97.91]
    05.Shigella.flexneri.2a.2457T.fasta          3710   97.85 [97.75-97.94]  5286  4702   3486   0.698   98.40 [98.23-98.55]
    06.Shigella.flexneri.2a.301.fasta            3715   97.83 [97.74-97.91]  5286  4715   3485   0.696   98.39 [98.23-98.51]
    07.Salmonella.enterica.Typhimurium.LT2.fasta 3102   80.68 [80.45-80.91]  5286  4504   3162   0.645   87.08 [86.67-87.44]
    08.Salmonella.enterica.Typhi.Ty2.fasta       3002   80.85 [80.62-81.10]  5286  4614   3079   0.622   87.28 [86.84-87.72]
    09.Salmonella.enterica.Typhi.PM016.13.fasta  3008   80.84 [80.60-81.04]  5286  4625   3070   0.619   87.33 [86.89-87.73]
    11.Yersinia.pestis.KIM5.fasta                2003   72.33 [72.07-72.60]  5286  4040   2309   0.495   73.68 [73.17-74.12]
    12.Yersinia.pestis.CO92.fasta                1979   72.25 [71.99-72.52]  5286  4090   2318   0.494   73.64 [73.09-74.15]
    14.Yersinia.enterocolitica.8081.fasta        2136   72.05 [71.74-72.30]  5286  4159   2495   0.528   73.53 [73.04-74.02]
    15.Erwinia.carotovora.PCC21.fasta            2068   73.31 [73.02-73.57]  5286  4258   2326   0.487   73.70 [73.16-74.16]

    These results suggest that genomes 2-6 belong to the same species as E. coli O157:H7 Sakai (genome 1), i.e. oANI ≥ 95% and rAAI ≥ 95%, unlike genomes 7-15 (oANI < 95% and rAAI < 95%). The rAAI values for the two Yersinia pestis genomes (genomes 11-12; 73.68% and 73.64%) can be compared to the two-way AAI of 72% observed by Konstantinidis and Tiedje (2005b) between pairs of E. coli and Y. pestis genomes.
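The species assignment above can be automated with a small awk filter. This is a sketch only, not an OGRI feature: the function name `same_species` is hypothetical, the columns are located by assuming that the header fields start with `GENO2`, `oANI` and `rAAI` (as in the output shown above), and the 95% cutoff is applied to both indices, a genome falling on different sides of the cutoff for the two indices being labelled `discordant`.

```shell
#!/usr/bin/env bash
# Label each genome of an OGRI_B tab-separated output according to the 95%
# cutoff applied to both oANI and rAAI. Columns are located by header name,
# assuming header fields that start with "GENO2", "oANI" and "rAAI".
same_species() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        split($i, w, " ")            # "oANI [CI_oANI]" -> w[1] = "oANI"
        if (w[1] == "GENO2") g = i
        if (w[1] == "oANI")  o = i
        if (w[1] == "rAAI")  r = i
      }
      if (!g || !o || !r) {
        print "GENO2, oANI or rAAI column not found" > "/dev/stderr"
        exit 1
      }
      next
    }
    {
      oani = $o + 0                  # "97.85 [97.75-97.94]" -> 97.85
      raai = $r + 0
      if (oani >= 95 && raai >= 95)     v = "same"
      else if (oani < 95 && raai < 95)  v = "different"
      else                              v = "discordant"
      printf "%s\t%.2f\t%.2f\t%s\n", $g, oani, raai, v
    }'
}
```

Since the columns are found by name, the filter can be applied to the whole output, e.g. `OGRI_B.sh  -t 48  -r  -z  *.fasta  2>/dev/null  |  same_species`. Applied to the values shown above, it labels genomes 2-6 as `same` and genomes 7-15 as `different`.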

    References

    Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology, 215(3):403-410. doi:10.1016/S0022-2836(05)80360-2

    Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17):3389-3402. doi:10.1093/nar/25.17.3389

    Barco RA, Garrity GM, Scott JJ, Amend JP, Nealson KH, Emerson D (2020) A Genus Definition for Bacteria and Archaea Based on a Standard Genome Relatedness Index. mBio, 11:e02475-19. doi:10.1128/mBio.02475-19

    Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics, 10:421. doi:10.1186/1471-2105-10-421

    Chun J, Rainey FA (2014) Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea. International Journal of Systematic and Evolutionary Microbiology, 64(Pt 2):316-324. doi:10.1099/ijs.0.054171-0

    Gertz EM, Yu Y-K, Agarwala R, Schäffer AA, Altschul SF (2006) Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST. BMC Biology, 4:41. doi:10.1186/1741-7007-4-41

    Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM (2007) DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. International Journal of Systematic and Evolutionary Microbiology, 57(1):81-91. doi:10.1099/ijs.0.64483-0

    Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, Hauser LJ (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11:119. doi:10.1186/1471-2105-11-119

    Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S (2018) High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications, 9:5114. doi:10.1038/s41467-018-07641-9

    Kim D, Park S, Chun J (2021) Introducing EzAAI: a pipeline for high throughput calculations of prokaryotic average amino acid identity. Journal of Microbiology, 59(5):476-480. doi:10.1007/s12275-021-1154-0

    Konstantinidis KT, Tiedje JM (2005a) Genomic insights that advance the species definition for prokaryotes. Proceedings of the National Academy of Sciences of the United States of America, 102(7):2567-2572. doi:10.1073/pnas.0409727102

    Konstantinidis KT, Tiedje JM (2005b) Towards a Genome-Based Taxonomy for Prokaryotes. Journal of Bacteriology, 187(18):6258-6264. doi:10.1128/JB.187.18.6258-6264.2005

    Konstantinidis KT, Rossello-Mora R, Amann R (2017) Uncultivated microbes in need of their own taxonomy. The ISME Journal, 11:2399-2406. doi:10.1038/ismej.2017.113

    Lee I, Kim YO, Park S-C, Chun J (2016) OrthoANI: An improved algorithm and software for calculating average nucleotide identity. International Journal of Systematic and Evolutionary Microbiology, 66(2):1100-1103. doi:10.1099/ijsem.0.000760

    Luo C, Rodriguez-R LM, Konstantinidis KT (2014) MyTaxa: an advanced taxonomic classifier for genomic and metagenomic sequences. Nucleic Acids Research, 42(8):e73. doi:10.1093/nar/gku169

    Nicholson AC, Gulvik CA, Whitney AM, Humrighouse BW, Bell ME, Holmes B, Steigerwalt AG, Villarma A, Sheth M, Batra D, Rowe LA, Burroughs M, Pryor JC, Bernardet J-F, Hugo C, Kämpfer P, Newman JD, McQuiston JR (2020) Division of the genus Chryseobacterium: Observation of discontinuities in amino acid identity values, a possible consequence of major extinction events, guides transfer of nine species to the genus Epilithonimonas, eleven species to the genus Kaistella, and three species to the genus Halpernia gen. nov., with description of Kaistella daneshvariae sp. nov. and Epilithonimonas vandammei sp. nov. derived from clinical specimens. International Journal of Systematic and Evolutionary Microbiology, 70:4432-4450. doi:10.1099/ijsem.0.003935

    Novichkov V, Kaznadzey A, Alexandrova N, Kaznadzey D (2016) NSimScan: DNA comparison tool with increased speed, sensitivity and accuracy. Bioinformatics, 32(15):2380-2381. doi:10.1093/bioinformatics/btw126

    Palmer M, Steenkamp ET, Blom J, Hedlund BP, Venter SN (2020) All ANIs are not created equal: implications for prokaryotic species boundaries and integration of ANIs into polyphasic taxonomy. International Journal of Systematic and Evolutionary Microbiology, 70(4):2937-2948. doi:10.1099/ijsem.0.004124

    Qin Q-L, Xie B-B, Zhang X-Y, Chen X-L, Zhou B-C, Zhou J, Oren A, Zhang Y-Z (2014) A Proposed Genus Boundary for the Prokaryotes Based on Genomic Insights. Journal of Bacteriology, 196(12):2210-2215. doi:10.1128/JB.01688-14

    Richter M, Rossello-Mora R (2009) Shifting the genomic gold standard for the prokaryotic species definition. Proceedings of the National Academy of Sciences of the United States of America, 106(45):19126-19131. doi:10.1073/pnas.0906412106

    Rodriguez-R LM, Konstantinidis KT (2014) Bypassing cultivation to identify bacterial species. Microbe, 9(3):111-118. pdf(https://www.researchgate.net/profile/Luis-Rodriguez-R/publication/304587401_Bypassing_Cultivation_To_Identify_Bacterial_Species_Culture-independent_genomic_approaches_identify_credibly_distinct_clusters_avoid_cultivation_bias_and_provide_true_insights_into_microbial_species/links/58c18324aca272e36dcc8314/Bypassing-Cultivation-To-Identify-Bacterial-Species-Culture-independent-genomic-approaches-identify-credibly-distinct-clusters-avoid-cultivation-bias-and-provide-true-insights-into-microbial-species.pdf)

    Suresh G, Lodha TD, Indu B, Sasikala C, Ramana CV (2019) Taxogenomics Resolves Conflict in the Genus Rhodobacter: A Two and Half Decades Pending Thought to Reclassify the Genus Rhodobacter. Frontiers in Microbiology, 10:2480. doi:10.3389/fmicb.2019.02480

    Varghese NJ, Mukherjee S, Ivanova N, Konstantinidis KT, Mavrommatis K, Kyrpides NC, Pati A (2015) Microbial species delineation using whole genome sequences. Nucleic Acids Research, 43(14):6761-6771. doi:10.1093/nar/gkv657

    Yoon S-H, Ha S-M, Lim J, Kwon S, Chun J (2017) A large-scale evaluation of algorithms to calculate average nucleotide identity. Antonie van Leeuwenhoek, 110(10):1281-1286. doi:10.1007/s10482-017-0844-4

    Zhang Z, Schwartz S, Wagner L, Miller W (2000) A greedy algorithm for aligning DNA sequences. Journal of Computational Biology, 7(1-2):203-214. doi:10.1089/10665270050081478