GPLv3 license Bash

ASSU

ASSU (ASSembling SSU) is a command line tool written in Bash to carry out the reference-guided assembly of small subunit (SSU) 16S ribosomal ribonucleic acid (rRNA) using short high-throughput sequencing (HTS) reads derived from whole-genome sequencing of bacterial or archaeal strains.

This tool was developed to compensate for the failure of several de novo assembly programs to assemble at least one non-fragmented SSU segment when the sequenced genome contains different 16S rRNA copies with sequence variation, especially when using short HTS reads.

Dependencies

You will need to install the required programs and tools listed in the following tables, or to verify that they are already installed with the required version.

Mandatory programs
Optional programs
program                             package    version  sources
bzip2                               -          > 1.0.0  sourceware.org/bzip2/downloads.html
DSRC                                -          ≥ 2.0    github.com/refresh-bio/DSRC
pigz                                -          ≥ 2.4    github.com/madler/pigz

Standard GNU packages and utilities

program                             package    version  sources
echo, head, fold, paste, tail, tr   coreutils  > 8.0    ftp.gnu.org/gnu/coreutils
gunzip, zgrep                       gzip       > 1.0    ftp.gnu.org/gnu/gzip
bc                                  -          > 1.0    ftp.gnu.org/gnu/bc
gawk                                -          > 4.0.0  ftp.gnu.org/gnu/gawk
grep                                -          > 2.0    ftp.gnu.org/gnu/grep
sed                                 -          > 4.2    ftp.gnu.org/gnu/sed

Installation and execution

A. Clone this repository with the following command line:

git clone https://gitlab.pasteur.fr/GIPhy/ASSU.git

B. Go to the created directory and give the execute permission to the file ASSU.sh:

cd ASSU/ 
chmod +x ASSU.sh

C. Check the dependencies (and their version) using the following command line:

./ASSU.sh  -c

D. If at least one of the required programs (see Dependencies) is not available on your $PATH (or if a compiled binary has a different default name), its location must be specified manually. To do so, edit the file ASSU.sh and indicate the local path to the corresponding binary(ies) within the code block REQUIREMENTS (approximately lines 60-110). For each required program, the table below reports the corresponding variable assignment instruction to edit (if needed) within the code block REQUIREMENTS.

program    variable assignment      program    variable assignment
bwa-mem2   BWAMEM2_BIN=bwa-mem2;    gunzip     GUNZIP_BIN=gunzip;
bzip2      BZIP2_BIN=bzip2;         pigz       PIGZ_BIN=pigz;
DSRC       DSRC_BIN=dsrc;           samtools   SAMTOOLS_BIN=samtools;
gawk       GAWK_BIN=gawk;           zgrep      ZGREP_BIN=zgrep;
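
Before editing the REQUIREMENTS block, it can help to see which of the default binary names are already reachable. The following is a minimal sketch and not part of ASSU: the helper name check_bin is hypothetical, and the list only contains the default names from the table above. Option -c of ASSU.sh remains the reference check.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of ASSU: report whether a program is on $PATH.
check_bin() {
  local prog="$1" location
  if location=$(command -v "$prog" 2>/dev/null); then
    printf '%-10s %s\n' "$prog" "$location"
  else
    printf '%-10s NOT FOUND\n' "$prog"
    return 1
  fi
}

# Default binary names taken from the table above; adapt them to your system.
for prog in bwa-mem2 samtools gawk gunzip zgrep bzip2 dsrc pigz; do
  check_bin "$prog" || true
done
```

Any program reported as NOT FOUND is a candidate for an explicit path in the corresponding variable assignment.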

E. Execute ASSU with the following command line model:

./ASSU.sh  [options]  <infile>  [<infile> ...]

F. ASSU also requires a databank of reference SSU sequences. By default, a version of this databank is provided inside the directory db/ as a file named SSUdb.gz (see details in SSUdb.version.txt). However, a more recent version can be quickly built using the provided script makeSSUdb.sh with the following command line:

./makeSSUdb.sh

After a few seconds, a new SSU databank file named SSUdb.gz will be automatically created from the NCBI RefSeq Targeted Loci Project. Note that the previous command line will overwrite the provided version of SSUdb.gz when run in the same directory.
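
Since the rebuilt file replaces SSUdb.gz in place, one may want to keep a dated copy of the provided version first. A minimal sketch, assuming the databank sits at db/SSUdb.gz relative to the current directory; the helper name backup_file is hypothetical and not part of ASSU.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of ASSU: copy a file next to itself with a
# date suffix, so that a rebuilt databank does not replace the only copy.
backup_file() {
  local src="$1"
  [ -f "$src" ] || { echo "nothing to back up: $src" >&2; return 1; }
  local dest="${src%.gz}.$(date +%Y%m%d).gz"
  cp -p -- "$src" "$dest" && echo "$dest"
}

# Assumed location of the provided databank; adjust to your installation.
if [ -f db/SSUdb.gz ]; then backup_file db/SSUdb.gz; fi
```

The copy can later be passed to ASSU with option -d.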

Usage

Run ASSU without any option to read the following documentation:

 USAGE:  ASSU  [options]  <infile> [<infile> ...]

 OPTIONS:
  -d <file>    SSU databank file (default: db/SSUdb.gz in the same directory as ASSU)
  -p <string>  restricts the  SSU databank  to the  specified  (extended regex)  pattern
               (default: none)
  -o <string>  output FASTA-formatted SSU sequence file name (default: ssu.fasta)
  -O <string>  writes the selected  reads into the  specified FASTQ-formatted  file name
               (default: none)
  -l <int>     minimum sequence length (default: 1000)
  -L <int>     minimum read length (default: AUTO)
  -Q <int>     minimum base Phred quality value (default: 20)
  -M <int>     minimum mapping Phred quality value (default: 20)
  -D <int>     minimum coverage depth (default: 50)
  -F <float>   minimum proportion of the majority base to infer that base (default: 0.8)
  -A <float>   minimum ratio of the alternative  base(s) to the majority one to add that
               base(s) to the consensus (default: 0.2)
  -N           set N when multiple bases at a consensus position (default: not set)
  -w <dir>     path to the tmp directory (default: $TMPDIR, otherwise /tmp)
  -t <int>     thread numbers (default: 2)
  -v           verbose mode
  -s           prints the content of the SSU databank and exit
  -c           checks dependencies and exit
  -h           prints this help and exit

 EXAMPLES:
  ASSU  -t 24  -o 16s.fasta  fwd.fastq.gz  rev.fastq.gz  sgl.fastq.gz
  ASSU  -d SSUdb.gz  -O 16s.fastq  -p "Devosia limi"  -L 75  -v  *.fastq
  ASSU  -p "Citrobacter|Escherichia|Shigella"  -N  -v  hts.fastq.bz2
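
Option -p takes an extended regular expression, so the effect of a pattern can be previewed with grep -E on a few description lines. The sketch below is only illustrative: the line layout and the placeholder accessions (ACC_...) are assumptions, and how ASSU applies the pattern internally is not shown here.

```bash
#!/usr/bin/env bash
# Illustrative description lines; the layout and the placeholder accessions
# (ACC_...) are assumptions, only NR_074902.1 appears in this README.
entries='NR_074902.1 Escherichia fergusonii
ACC_0001 Escherichia coli
ACC_0002 Shigella flexneri
ACC_0003 Citrobacter freundii
ACC_0004 Devosia limi'

# Hypothetical helper: keep the lines matching an extended regex.
select_entries() {
  printf '%s\n' "$entries" | grep -E -- "$1"
}

select_entries "Citrobacter|Escherichia|Shigella"   # four lines
select_entries "Devosia limi"                       # one line
```

The quotation marks matter: without them the shell would treat | as a pipe and would split a pattern containing a space into two arguments.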

Notes

  • In brief, ASSU first quickly aligns the specified HTS reads against all the reference sequences available in the SSU databank using bwa-mem2. This first step makes it possible to determine the most suitable reference sequence (called model), as well as the subset of HTS reads that arise from SSU genome regions. Next, every HTS read from the subset is accurately aligned against the model sequence, and the resulting alignments are processed by samtools to build a final (consensus) sequence.

  • ASSU requires at least one HTS read file. Input file(s) should be in FASTQ format and can be compressed using gzip, bzip2 or DSRC (Roguski and Deorowicz 2014). Note that input files compressed using bzip2 or DSRC require the associated decompression tool to be read (see Dependencies).

  • ASSU does not work with long HTS reads, as bwa-mem2 is not designed to align HTS reads against significantly shorter reference sequences. The source code of ASSU can be easily modified (on request) to deal with such a case, but long HTS reads generally lead to complete SSU segments via de novo assembly.

  • By default, ASSU expects that the SSU databank file SSUdb.gz is located in the directory db/. However, an alternative SSU databank file (e.g. different version, different file name) can be specified using option -d. The content of the specified SSU databank can be summarized using option -s.

  • The running time of ASSU depends strongly on the size of the input files, but it can be reduced by using multiple threads (option -t) and/or a temporary directory located on a fast drive (option -w).

  • The assembled sequence is written in FASTA format into an output file (option -o; default name: ssu.fasta). Optionally, the selected HTS reads can be written in FASTQ format into a specified output file (option -O).

  • The selection of the model sequence can be oriented/forced by using the option -p to set a(n extended-regex) pattern (e.g. accessions, genus, species). It is recommended to specify the pattern between quotation marks.

  • As the assembled SSU sequence is often the consensus of several copies with sequence variation within the sequenced genome (e.g. Větrovský and Baldrian 2013), it may contain ambiguous positions resulting from the consensus of different sequenced bases at those positions. In such cases, degenerate nucleotides are used to represent the consensus of different character states (see e.g. Table 1 in Johnson 2010), or lowercase characters when a deletion (i.e. gap) is involved in the consensus. Note that every degenerate nucleotide can be replaced by the character state N using option -N.

  • The (number of) ambiguous positions can be slightly modified by considering shorter HTS reads (option -L), putative sequencing errors (option -Q), weak alignments (option -M), low coverage depth (option -D) or an alternative model sequence (option -p). The consensus definition can be modified by tuning the two options -F and -A, corresponding to the options --call-fract and --het-fract of samtools consensus (mode simple), respectively.

  • No output file is written in several situations:
      • insufficient coverage depth (default: at least 50×; option -D),
      • too short assembled SSU sequence (default: at least 1,000 bps; option -l),
      • too many ambiguous positions (i.e. more than 5%).
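
The three situations listed above can be summarized as a small check. This is a minimal sketch and not ASSU's actual code: the function name passes_criteria is hypothetical, the values are assumed to be integers, and the thresholds mirror the defaults of options -D and -l and the 5% limit stated above.

```bash
#!/usr/bin/env bash
# Hypothetical check, not ASSU's code: would a result pass the three criteria?
# Arguments: coverage depth, sequence length (bps), number of ambiguous bases,
# then optional thresholds mirroring options -D and -l.
passes_criteria() {
  local depth="$1" length="$2" ambiguous="$3"
  local min_depth="${4:-50}" min_length="${5:-1000}"
  [ "$depth" -ge "$min_depth" ]   || { echo "insufficient coverage depth"; return 1; }
  [ "$length" -ge "$min_length" ] || { echo "sequence too short"; return 1; }
  # Reject when more than 5% of the positions are ambiguous (integer arithmetic).
  [ $(( ambiguous * 100 )) -le $(( length * 5 )) ] || { echo "too many ambiguous positions"; return 1; }
  echo "ok"
}

passes_criteria 586 1543 14   # depth, length and ambiguous bases of the Example section
```

With the values of the Example section (586×, 1,543 bps, 14 ambiguous bases), 14 positions represent less than 1% of the sequence, so all three criteria are met.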

Example

In order to illustrate the usefulness of ASSU, the following example describes its usage for assembling the 16S rRNA (consensus) segment of Escherichia coli O113:H21 strain FWSEC0011. Its genome assembly (GCF_005171095.1) consists of one chromosome (NZ_CP031892.1) and one plasmid (NZ_CP031893.1), built from short and long HTS reads (SRS3815841).

Downloading input files

Paired-end sequencing of this genome was performed using Illumina MiSeq, and the resulting pair of (compressed) FASTQ files (225 Mb and 249 Mb, respectively) can be downloaded using the following command lines:

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR789/009/SRR7896249/SRR7896249_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR789/009/SRR7896249/SRR7896249_2.fastq.gz
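
As an interrupted download leaves a truncated archive, the integrity of the two files can be tested before running ASSU. A minimal sketch using gzip -t; the helper name check_gz is hypothetical and not part of ASSU.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of ASSU: test the integrity of gzip files.
check_gz() {
  local f status=0
  for f in "$@"; do
    if [ -f "$f" ] && gzip -t -- "$f" 2>/dev/null; then
      echo "ok      $f"
    else
      echo "FAILED  $f"
      status=1
    fi
  done
  return "$status"
}

# File names from the wget commands above; a file that is absent is reported as FAILED.
check_gz SRR7896249_1.fastq.gz SRR7896249_2.fastq.gz || true
```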

Running ASSU

Use the following command line to run ASSU on these two FASTQ files using 12 threads:

./ASSU.sh  -t 12  -o FWSEC0011.ssu.fasta  -v  SRR7896249_*.fastq.gz

Note that the SSU databank used for this assembly is the version 2024-02-18 (20,404 sequences). As the verbose mode was set (option -v), this command line leads to the following output:

# ASSU v1.1
# Copyright (C) 2024 Institut Pasteur
+ https://gitlab.pasteur.fr/GIPhy/ASSU
> Syst: x86_64-redhat-linux-gnu
> Bash: 4.4.20(1)-release
> SSUdb: /local/bin/ASSU/db/SSUdb.gz
> SSUdb v2024-02-18 (20404 sequences)
[00:00] checking input files ... [ok]
+ SRR7896249_1.fastq.gz
+ SRR7896249_2.fastq.gz
[00:00] creating tmp directory .... [ok]
> TMP_DIR=/tmp/ASSU.uYf5cUoa6R
[00:01] examining SSU databank ...... [ok]
> model:  Bacteria  |  Escherichia fergusonii  |  NR_074902.1  |  1542 bps
[00:27] building SSU sequence .... [ok]
> 3016 selected reads (903953 bases; lgt > 269)
> coverage depth: 586x
> 1543 bps (ambiguous bases: 14)
              10        20        30        40        50        60        70        80        90       100
               |         |         |         |         |         |         |         |         |         |
    1 AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGRAARCAGCTTGCTGYTTYGCTGACG
                                                                                 *  *          *  *       
  101 AGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAG
                                                                                                          
  201 GGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTWGTWGGTGGGGTAACGGCTCACCWAGGCGACGATCCCTAGCTGGTCTGAGA
                                                       *  *                   *                           
  301 GGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGC
                                                                                                          
  401 CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAG
                                                                                                          
  501 CACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCA
                                                                                                          
  601 GATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCG
                                                                                                          
  701 TAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGG
                                                                                                          
  801 TAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCA
                                                                                                          
  901 AGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCA
                                                                                                          
 1001 CRGAASTTTYCAGAGATGaGAWTgGTGCCTTCGGGAACYGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCG
       *   *   *        *  * *              *                                                             
 1101 CAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGT
                                                                                                          
 1201 CATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTA
                                                                                                          
 1301 GTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACAC
                                                                                                          
 1401 CGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTA
                                                                                                          
 1501 ACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTA
                                                 
[00:30] writing output file ... [ok]
+ FASTA: FWSEC0011.ssu.fasta
[00:30] exit
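
When the verbose output is captured, the key figures can be pulled out with standard tools. The sketch below works on four lines copied from the output above; the helper name extract is hypothetical, and the line layout is assumed to stay as shown for ASSU v1.1.

```bash
#!/usr/bin/env bash
# Four lines copied from the verbose output above.
log='> model:  Bacteria  |  Escherichia fergusonii  |  NR_074902.1  |  1542 bps
> 3016 selected reads (903953 bases; lgt > 269)
> coverage depth: 586x
> 1543 bps (ambiguous bases: 14)'

# Hypothetical helper: print the first capture group of a sed (basic regex) pattern.
extract() {
  printf '%s\n' "$log" | sed -n "s/$1/\\1/p"
}

# The model line uses "  |  " as field separator; the accession is the third field.
model=$(printf '%s\n' "$log" | awk -F'  [|]  ' '/^> model:/ { print $3 }')
depth=$(extract '^> coverage depth: \([0-9][0-9]*\)x$')
length=$(extract '^> \([0-9][0-9]*\) bps (ambiguous bases: [0-9][0-9]*)$')
ambiguous=$(extract '^> [0-9][0-9]* bps (ambiguous bases: \([0-9][0-9]*\))$')

printf 'model=%s depth=%s length=%s ambiguous=%s\n' "$model" "$depth" "$length" "$ambiguous"
```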

The SSU sequence NR_074902.1 was selected as a model to carry out the reference-guided assembly using 3,016 HTS reads, leading to an assembled (consensus) sequence of length 1,543 bps (coverage depth: 586×) written into the FASTA file FWSEC0011.ssu.fasta. The overall running time was < 30 seconds.

The assembled SSU sequence contains 14 ambiguous bases, highlighted with a * in the above output. This suggests that the genome of E. coli O113:H21 strain FWSEC0011 contains different 16S rRNA copies with sequence variations.
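
The ambiguous bases can also be counted directly from a sequence, as every character other than an uppercase A, C, G or T is either a degenerate nucleotide or a lowercase character. A minimal sketch, where the helper name count_ambiguous is hypothetical and not part of ASSU:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of ASSU: count the characters of a FASTA
# sequence (read from standard input) that are not an uppercase A, C, G or T.
count_ambiguous() {
  local n
  n=$(grep -v '^>' | tr -d 'ACGT\n' | wc -c)
  echo "$(( n ))"
}

# Positions 1001-1050 of the assembled sequence shown above.
printf '>excerpt\nCRGAASTTTYCAGAGATGaGAWTgGTGCCTTCGGGAACYGTGAGACAGGT\n' | count_ambiguous
```

This excerpt contains 7 such characters (R, S, Y, a, W, g, Y). Over the whole sequence displayed above, the three rows carrying a * contribute 4 + 3 + 7 = 14, in agreement with the number reported by ASSU.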

In fact, its chromosome (NZ_CP031892.1) contains seven 16S rRNA segments labeled with the following locus tags:
  • C8202_RS02200
  • C8202_RS06240
  • C8202_RS19645
  • C8202_RS23325
  • C8202_RS23525
  • C8202_RS24135
  • C8202_RS24625

Below is a multiple sequence alignment (MSA) of these seven 16S rRNA segments together with the assembled SSU sequence, showing that the 14 ambiguous bases (*) reflect, as expected, the variability between the different copies.

                10        20        30        40        50        60        70        80        90       100
                 |         |         |         |         |         |         |         |         |         |
SSU     AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGRAARCAGCTTGCTGYTTYGCTGACG
        |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||*||*||||||||||*||*|||||||
RS02200 AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAACAGCTTGCTGTTTCGCTGACG
RS06240 AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAACAGCTTGCTGTTTCGCTGACG
RS19645 AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGCAGCTTGCTGCTTCGCTGACG
RS23325 AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGAAAGCAGCTTGCTGCTTTGCTGACG
RS23525 AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGAAAGCAGCTTGCTGCTTTGCTGACG
RS24135 AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGCAGCTTGCTGCTTCGCTGACG
RS24625 AAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGGAAGCAGCTTGCTGCTTTGCTGACG

               110       120       130       140       150       160       170       180       190       200
                 |         |         |         |         |         |         |         |         |         |
SSU     AGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAG
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RS02200 AGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAG
RS06240 AGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAG
RS19645 AGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAG
RS23325 AGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAG
RS23525 AGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAG
RS24135 AGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAG
RS24625 AGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAG

               210       220       230       240       250       260       270       280       290       300
                 |         |         |         |         |         |         |         |         |         |
SSU     GGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTWGTWGGTGGGGTAACGGCTCACCWAGGCGACGATCCCTAGCTGGTCTGAGA
        |||||||||||||||||||||||||||||||||||||||||||||||||*||*|||||||||||||||||||*|||||||||||||||||||||||||||
RS02200 GGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGA
RS06240 GGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGA
RS19645 GGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGA
RS23325 GGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTTGTTGGTGGGGTAACGGCTCACCAAGGCGACGATCCCTAGCTGGTCTGAGA
RS23525 GGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTTGTTGGTGGGGTAACGGCTCACCAAGGCGACGATCCCTAGCTGGTCTGAGA
RS24135 GGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTTGTTGGTGGGGTAACGGCTCACCAAGGCGACGATCCCTAGCTGGTCTGAGA
RS24625 GGGGACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTTGTTGGTGGGGTAACGGCTCACCAAGGCGACGATCCCTAGCTGGTCTGAGA

               310       320       330       340       350       360       370       380       390       400
                 |         |         |         |         |         |         |         |         |         |
SSU     GGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGC
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RS02200 GGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGC
RS06240 GGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGC
RS19645 GGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGC
RS23325 GGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGC
RS23525 GGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGC
RS24135 GGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGC
RS24625 GGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGC

               410       420       430       440       450       460       470       480       490       500
                 |         |         |         |         |         |         |         |         |         |
SSU     CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAG
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RS02200 CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAG
RS06240 CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAG
RS19645 CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAG
RS23325 CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAG
RS23525 CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAG
RS24135 CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAG
RS24625 CGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTACTCATTGACGTTACCCGCAGAAGAAG

               510       520       530       540       550       560       570       580       590       600
                 |         |         |         |         |         |         |         |         |         |
SSU     CACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCA
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RS02200 CACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCA
RS06240 CACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCA
RS19645 CACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCA
RS23325 CACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCA
RS23525 CACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCA
RS24135 CACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCA
RS24625 CACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCA

               610       620       630       640       650       660       670       680       690       700
                 |         |         |         |         |         |         |         |         |         |
SSU     GATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCG
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RS02200 GATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCG
RS06240 GATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCG
RS19645 GATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCG
RS23325 GATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCG
RS23525 GATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCG
RS24135 GATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCG
RS24625 GATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCG

               710       720       730       740       750       760       770       780       790       800
                 |         |         |         |         |         |         |         |         |         |
SSU     TAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGG
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RS02200 TAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGG
RS06240 TAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGG
RS19645 TAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGG
RS23325 TAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGG
RS23525 TAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGG
RS24135 TAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGG
RS24625 TAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGG

               810       820       830       840       850       860       870       880       890       900
                 |         |         |         |         |         |         |         |         |         |
SSU     TAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCA
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RS02200 TAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCA
RS06240 TAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCA
RS19645 TAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCA
RS23325 TAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCA
RS23525 TAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCA
RS24135 TAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCA
RS24625 TAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCA

               910       920       930       940       950       960       970       980       990      1000
                 |         |         |         |         |         |         |         |         |         |
SSU     AGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCA
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RS02200 AGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCA
RS06240 AGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCA
RS19645 AGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCA
RS23325 AGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCA
RS23525 AGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCA
RS24135 AGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCA
RS24625 AGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCA

              1010      1020      1030      1040      1050      1060      1070      1080      1090      1100
                 |         |         |         |         |         |         |         |         |         |
SSU     CRGAASTTTYCAGAGATGaGAWTgGTGCCTTCGGGAACYGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCG
        |*|||*|||*||||||||*||*|*||||||||||||||*|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RS02200 CAGAACTTTCCAGAGATG-GATTGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCG
RS06240 CAGAACTTTCCAGAGATG-GATTGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCG
RS19645 CGGAAGTTTTCAGAGATGAGAAT-GTGCCTTCGGGAACCGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCG
RS23325 CGGAAGTTTTCAGAGATGAGAAT-GTGCCTTCGGGAACCGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCG
RS23525 CGGAAGTTTTCAGAGATGAGAAT-GTGCCTTCGGGAACCGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCG
RS24135 CGGAAGTTTTCAGAGATGAGAAT-GTGCCTTCGGGAACCGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCG
RS24625 CGGAAGTTTTCAGAGATGAGAAT-GTGCCTTCGGGAACCGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCG

              1110      1120      1130      1140      1150      1160      1170      1180      1190      1200
                 |         |         |         |         |         |         |         |         |         |
SSU     CAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGT
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RS02200 CAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGT
RS06240 CAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGT
RS19645 CAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGT
RS23325 CAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGT
RS23525 CAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGT
RS24135 CAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGT
RS24625 CAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGT

              1210      1220      1230      1240      1250      1260      1270      1280      1290      1300
                 |         |         |         |         |         |         |         |         |         |
SSU     CATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTA
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RS02200 CATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTA
RS06240 CATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTA
RS19645 CATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTA
RS23325 CATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTA
RS23525 CATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTA
RS24135 CATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTA
RS24625 CATCATGGCCCTTACGACCAGGGCTACACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTA

              1310      1320      1330      1340      1350      1360      1370      1380      1390      1400
                 |         |         |         |         |         |         |         |         |         |
SSU     GTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACAC
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RS02200 GTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACAC
RS06240 GTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACAC
RS19645 GTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACAC
RS23325 GTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACAC
RS23525 GTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACAC
RS24135 GTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACAC
RS24625 GTCCGGATTGGAGTCTGCAACTCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACAC

              1410      1420      1430      1440      1450      1460      1470      1480      1490      1500
                 |         |         |         |         |         |         |         |         |         |
SSU     CGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTA
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
RS02200 CGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTA
RS06240 CGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTA
RS19645 CGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTA
RS23325 CGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTA
RS23525 CGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTA
RS24135 CGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTA
RS24625 CGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAGTCGTA

              1510      1520      1530      1540   
                 |         |         |         |   
SSU     ACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTA
        |||||||||||||||||||||||||||||||||||||||||||
RS02200 ACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTA
RS06240 ACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTA
RS19645 ACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTA
RS23325 ACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTA
RS23525 ACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTA
RS24135 ACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTA
RS24625 ACAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTA

As the HTS reads arise from an E. coli genome, ASSU can also be run with option -p to specify this species as the model (remember to enclose a multi-word pattern in quotation marks when using option -p):

./ASSU.sh  -t 12  -p "Escherichia coli"  -v  SRR7896249_*.fastq.gz

This command line leads to the following output:

# ASSU v1.1
# Copyright (C) 2024 Institut Pasteur
+ https://gitlab.pasteur.fr/GIPhy/ASSU
> Syst: x86_64-redhat-linux-gnu
> Bash: 4.4.20(1)-release
> SSUdb: /local/bin/ASSU/db/SSUdb.gz
> SSUdb v2024-02-18 (20404 sequences)
[00:00] checking input files ... [ok]
+ SRR7896249_1.fastq.gz
+ SRR7896249_2.fastq.gz
[00:00] creating tmp directory .... [ok]
> TMP_DIR=/tmp/ASSU.TxJar4O6ZP
[00:00] examining SSU databank ...... [ok]
> selection pattern: Escherichia coli
> model:  Bacteria  |  Escherichia coli  |  NR_114042.1  |  1467 bps
[00:25] building SSU sequence .... [ok]
> 3016 selected reads (903953 bases; lgt > 269)
> coverage depth: 616x
> 1468 bps (ambiguous bases: 14)
              10        20        30        40        50        60        70        80        90       100
               |         |         |         |         |         |         |         |         |         |
    1 ATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACAGRAARCAGCTTGCTGYTTYGCTGACGAGTGGCGGACGGGTGAGTAATGTCTGG
                                                      *  *          *  *                                  
  101 GAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCATCGG
                                                                                                          
  201 ATGTGCCCAGATGGGATTAGCTWGTWGGTGGGGTAACGGCTCACCWAGGCGACGATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAGA
                            *  *                   *                                                      
  301 CACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTT
                                                                                                          
  401 GTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCG
                                                                                                          
  501 CGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGG
                                                                                                          
  601 GAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCG
                                                                                                          
  701 AAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACT
                                                                                                          
  801 TGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGG
                                                                                                          
  901 GGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACRGAASTTTYCAGAGATGaGAWTgGTG
                                                                                *   *   *        *  * *   
 1001 CCTTCGGGAACYGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTT
                 *                                                                                        
 1101 GCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGACCAGGGCTAC
                                                                                                          
 1201 ACACGTGCTACAATGGCGCATACAAAGAGAAGCGACCTCGCGAGAGCAAGCGGACCTCATAAAGTGCGTCGTAGTCCGGATTGGAGTCTGCAACTCGACT
                                                                                                          
 1301 CCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTG
                                                                                                          
 1401 CAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGATTCATGACTGGGGTGAAG
                                                                          
[00:29] writing output file ... [ok]
+ FASTA: ssu.fasta
[00:29] exit

As expected, ASSU assembles a similar 16S rRNA sequence when using E. coli as the model (e.g. same ambiguous positions). However, because the E. coli model sequence from the SSU databank (NR_114042.1; 1,467 bps) is shorter than the E. fergusonii one (NR_074902.1; 1,542 bps), this second assembled SSU sequence (1,468 bps) is also shorter than the previously assembled one (1,543 bps).
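
The lengths and ambiguous positions discussed above can be checked directly on an output file. The following is a minimal sketch, not part of ASSU: the helper names `fasta_length` and `fasta_ambiguous` are hypothetical, and the sketch assumes that the FASTA file (e.g. ssu.fasta) contains a single sequence written as displayed in the log. Every character other than uppercase A, C, G or T is counted, so lowercase characters are counted too; that this matches the "ambiguous bases" figure reported by ASSU is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: summarize a single-record FASTA file such as ssu.fasta.
# Assumption: header lines start with '>' and the file holds one sequence.

# Print the number of sequence characters (header lines and line breaks excluded).
fasta_length() {
  grep -v '^>' "$1" | tr -d '\n\r' | wc -c | tr -d ' '
}

# Print the number of characters other than uppercase A, C, G, T,
# i.e. degenerate codes (R, Y, W, S, ...) and any lowercase character.
fasta_ambiguous() {
  grep -v '^>' "$1" | tr -d '\n\rACGT' | wc -c | tr -d ' '
}
```

After sourcing these functions, `fasta_length ssu.fasta` and `fasta_ambiguous ssu.fasta` each print one integer, which can be compared with the values reported in the verbose output (sequence length in bps and number of ambiguous bases).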

References

Johnson AD (2010) An extended IUPAC nomenclature code for polymorphic nucleic acids. Bioinformatics, 26(10):1386-1389. doi:10.1093/bioinformatics/btq098.

Roguski L, Deorowicz S (2014) DSRC 2: Industry-oriented compression of FASTQ files. Bioinformatics, 30(15):2213-2215. doi:10.1093/bioinformatics/btu208.

Větrovský T, Baldrian P (2013) The Variability of the 16S rRNA Gene in Bacterial Genomes and Its Consequences for Bacterial Community Analyses. PLoS One, 8(2):e57923. doi:10.1371/journal.pone.0057923.

Citations

Kämpfer P, Glaeser SP, McInroy JA, Busse H-J, Clermont D, Criscuolo A (2024) Description of Cohnella rhizoplanae sp. nov., isolated from the root surface of soybean (Glycine max). Antonie van Leeuwenhoek, 118:41. doi:10.1007/s10482-024-02051-y.