GenoLayout
GenoLayout is a command line program written in Bash to create SVG figures that represent linear maps (showing conserved orthologous fragments) between genomes.
GenoLayout runs on UNIX, Linux and most OS X operating systems.
Dependencies
You will need to install the required programs listed in the following table, or to verify that they are already installed with the required version.
program | package | version | sources |
---|---|---|---|
gawk | - | > 4.0.0 | ftp.gnu.org/gnu/gawk |
makeblastdb, blastn | blast+ | ≥ 2.12.0 | ftp.ncbi.nlm.nih.gov/blast/executables/blast+ |
Installation and execution
Clone this repository with the following command line:
git clone https://gitlab.pasteur.fr/GIPhy/GenoLayout.git
Go to the directory GenoLayout/ and give the execute permission to the file GenoLayout.sh:
cd GenoLayout/
chmod +x GenoLayout.sh
and run it with the following command line model:
./GenoLayout.sh [options]
If at least one of the indicated programs (see Dependencies) is not available on your $PATH variable (or if one compiled binary has a different default name), GenoLayout will exit with an error message.
To set a required program that is not available on your $PATH variable, edit the file GenoLayout.sh and indicate the local path to the corresponding binary(ies) within the code block REQUIREMENTS.
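Before running GenoLayout, the availability of the required programs can be verified by hand. The following is a minimal sketch, not the check performed by GenoLayout itself: the function name check_programs is hypothetical, and program versions are not tested.

```bash
# Hypothetical helper (not part of GenoLayout): report every program
# from the argument list that cannot be found on $PATH.
# Returns 0 when all programs are available, 1 otherwise.
check_programs() {
  missing=0
  for prog in "$@"; do
    if ! command -v "$prog" > /dev/null 2>&1; then
      echo "missing program: $prog" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Programs listed in the Dependencies table
if check_programs gawk makeblastdb blastn; then
  echo "all required programs were found"
fi
```

Any program reported as missing should either be installed or declared within the code block REQUIREMENTS, as described above.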
Usage
Run GenoLayout without any option to read the following documentation:
USAGE: GenoLayout [OPTIONS] <fasta1> <fasta2> [<fasta3> ...]
OPTIONS:
-o <file> SVG outfile name (mandatory)
-w <int> window size (bp; default: 1000)
-k <int> blastn word size (bp; default: 25)
-j <int> draw every j lines (default: 1)
-a <string> font color (default: ghostwhite)
-b <string> box color (default: midnightblue)
-c <string> line color (default: tomato)
-d <int> contig delimiter width (px; default: 1)
-s <real> span (width) factor (default: 1.0)
-x <int> box height (px; default: auto)
-y <int> gap height (px; default: auto)
-z <int> font size (px; default: auto)
-t <int> number of threads (default: 2)
-h prints this help and exits
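The usage above takes FASTA files as positional arguments; at least two uncompressed files are expected. A pre-flight check along these lines may help catch a compressed or missing file before BLAST is launched. This is only a sketch under stated assumptions: the function name check_inputs is hypothetical, it is not part of GenoLayout, and it only tests that each file is readable and starts with a '>' header character.

```bash
# Hypothetical helper (not part of GenoLayout): return 0 if at least two
# files are given and each one looks like an uncompressed FASTA file.
check_inputs() {
  if [ "$#" -lt 2 ]; then
    echo "at least two FASTA files are required" >&2
    return 1
  fi
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "cannot read file: $f" >&2
      return 1
    fi
    # An uncompressed FASTA file starts with a '>' header line;
    # a gzip-compressed file starts with a different byte.
    first=$(head -c 1 "$f")
    if [ "$first" != ">" ]; then
      echo "not an uncompressed FASTA file: $f" >&2
      return 1
    fi
  done
  return 0
}
```

It can be chained before the actual run, e.g. check_inputs *.fasta && ./GenoLayout.sh -o out.svg *.fasta (the output file name here is arbitrary).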
Notes
- In brief, for each pair of genomes, GenoLayout decomposes each of them into overlapping fragments. Fragment length is set by option -w (default: 1000). Each set of fragments is searched against the other using blastn (Altschul et al. 1990; Camacho et al. 2009) with tuned parameters (as suggested by Goris et al. 2007). Orthologous fragments are assessed by reciprocal BLAST hits showing ≥ 30% overall fragment identity on an alignable region ≥ 35% of the fragment length (as suggested by Lee et al. 2016). Each genome is graphically represented by a box, and each pair of orthologous fragments is represented by a line.
- Each input file should be in FASTA format, not compressed, and should contain nucleotide sequences. At least two input files should be specified. If more than two files are specified, the file order is followed to draw the pairwise linear maps.
- Faster running times can be obtained in three ways: (i) by using larger BLAST k-mer lengths (option -k; default: 25), but at the cost of reduced accuracy; (ii) by using larger fragment lengths (option -w; default: 1000), but at the cost of a reduced number of orthologous fragment pairs; (iii) by using a larger number of threads (option -t; default: 2; recommended: 12).
- For distantly related genomes (e.g. expected average nucleotide identity ≤ 80%), it is recommended to use short k-mers for BLAST searches (e.g. -k 11).
- Figure width is automatically determined from both the maximum genome length g and the fragment length w (option -w). However, a span factor s (option -s; default: 1.0) can be used to increase (s > 1.0) or decrease (s < 1.0) the figure width. The overall figure width (in px) is: 100 + 2 g × s ∕ w. Of note, a final scale factor of 0.1 is applied to the overall figure dimensions.
- Figure height can be controlled using options -x (genome box height, in px) and -y (gap height between boxes where the lines are drawn, in px); the font size is set by option -z (in px). Default values (in px) are x = 0.1 g ∕ w, y = 5 x and z = 0.75 x, respectively. When inputting n files, the overall figure height (in px) is: 100 + n x + (n − 1) y. Of note, a final scale factor of 0.1 is applied to the overall figure dimensions.
- Font (i.e. file names), box (i.e. genome) and line (i.e. orthologous fragments) colors can be modified using options -a, -b and -c, respectively. Color names should be those defined in the SVG specification.
- To reduce the size of the output file (and sometimes obtain a more readable figure), it is possible to draw a periodic subset of lines between genome boxes using option -j. For example, setting -j 2 draws only one line out of two and divides the file size by about two, without significantly altering the final rendering. Of note, as some conversion tools (e.g. Inkscape, rsvg-convert) can render thicker lines, setting option -j to a somewhat large value (e.g. -j 9) can help obtain clearer figures in e.g. PDF or PNG format.
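To anticipate the figure dimensions before a run, the width and height formulas given in the notes above can be evaluated by hand. The sketch below assumes default values for -x, -y and -z; the function name figure_size and the genome length of 5,000,000 bp are hypothetical and serve only as an illustration. Values are printed before the final scale factor of 0.1, and any rounding performed by GenoLayout is not reproduced here.

```bash
# Hypothetical helper (not part of GenoLayout): print the nominal figure
# dimensions in px, before the final scale factor of 0.1.
#   $1: maximum genome length g (bp)   $2: window size w (option -w)
#   $3: span factor s (option -s)      $4: number of input files n
figure_size() {
  awk -v g="$1" -v w="$2" -v s="$3" -v n="$4" 'BEGIN {
    x = 0.1 * g / w;                      # default box height (option -x)
    y = 5 * x;                            # default gap height (option -y)
    z = 0.75 * x;                         # default font size (option -z)
    width  = 100 + 2 * g * s / w;         # overall figure width
    height = 100 + n * x + (n - 1) * y;   # overall figure height
    printf "width=%.1f height=%.1f x=%.1f y=%.1f z=%.1f\n", width, height, x, y, z;
  }'
}

# Two genomes of at most 5,000,000 bp (made-up length), default -w and -s
figure_size 5000000 1000 1.0 2
# prints: width=10100.0 height=3600.0 x=500.0 y=2500.0 z=375.0
```

Such an estimate can guide the choice of explicit values for options -x, -y and -z when the default proportions are not suitable.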
Examples
The directory example/ contains different SVG files created by GenoLayout from Klebsiella genomes.
The five genome files can be downloaded with the following command lines:
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&rettype=fasta&id=";
t="K.pneumoniae"; s="MGH78578"; a="CP000647"; wget -q -O $t.$s.fasta $EUTILS$a ;
t="K.pneumoniae"; s="NTUH-K2044"; a="AP006725"; wget -q -O $t.$s.fasta $EUTILS$a ;
t="K.quasivariicola"; s="KPN1705"; a="CP022823"; wget -q -O $t.$s.fasta $EUTILS$a ;
t="K.variicola"; s="342"; a="CP000964"; wget -q -O $t.$s.fasta $EUTILS$a ;
t="K.variicola"; s="At-22"; a="CP001891"; wget -q -O $t.$s.fasta $EUTILS$a ;
• Default parameters on the two K. pneumoniae genomes, using 12 threads:
GenoLayout.sh -t 12 -o genolayout.svg K.pneumoniae*.fasta
• Same as above, with three times fewer lines:
GenoLayout.sh -t 12 -j 3 -o genolayout-j3.svg K.pneumoniae*.fasta
• Same as above, but 1.5 times wider:
GenoLayout.sh -t 12 -j 3 -s 1.5 -o genolayout-j3-s1.5.svg K.pneumoniae*.fasta
• Same as above on the five genomes, with other colors:
GenoLayout.sh -t 48 -j 3 -s 1.5 -a gold -b black -c steelblue -o genolayout-j3-s1.5-abc.svg K.*.fasta
• Alternative color scheme, and more details using windows of size 500 bp:
GenoLayout.sh -t 48 -w 500 -j 2 -z 300 -a black -b snow -c darkmagenta -o genolayout-w500-j2-z300-abc.svg K.*.fasta
• Alternative representation without names:
GenoLayout.sh -t 48 -w 500 -j 2 -x 50 -y 4000 -z 0 -b black -c darkorange -o genolayout-w500-j2-x50-y4000-z0-bc.svg K.*.fasta
• Converting SVG files
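Inkscape can be used to convert SVG files into e.g. PDF or PNG files (the options below are those of Inkscape 0.9x; Inkscape 1.0 and later use --export-filename instead):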
inkscape --export-png=genolayout.png genolayout.svg
inkscape --export-pdf=genolayout.pdf genolayout.svg
Alternatively, rsvg-convert (from librsvg) can be used:
rsvg-convert -f png -o genolayout.png genolayout.svg
rsvg-convert -f pdf -o genolayout.pdf genolayout.svg
References
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology, 215(3):403-410. doi:10.1016/S0022-2836(05)80360-2
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinformatics, 10:421. doi:10.1186/1471-2105-10-421
Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM (2007) DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. International Journal of Systematic and Evolutionary Microbiology, 57(1):81-91. doi:10.1099/ijs.0.64483-0
Lee I, Kim YO, Park S-C, Chun J (2016) OrthoANI: An improved algorithm and software for calculating average nucleotide identity. International Journal of Systematic and Evolutionary Microbiology, 66(2):1100-1103. doi:10.1099/ijsem.0.000760