@@ -59,6 +59,7 @@ Run _SimiPlot_ without option to read the following documentation:
-X <int> x-axis end (default: reference length)
-y <int> y-axis start (default: 0)
-Y <int> y-axis end (default: 100)
-d <int> dot size factor (default: 1.0)
-a <real> aspect ratio (default: 3.0)
-t <int> number of threads (default: 2)
-h prints this help and exits
...
...
@@ -66,17 +67,17 @@ Run _SimiPlot_ without option to read the following documentation:
## Notes
* For each input file, _SimiPlot_ decomposes the nucleotide sequence(s) into overlapping fragments (step = half the fragment length). Fragment length is set by option `-w` (default: reference sequence length divided by 1,000). Each fragment is searched against the reference sequence (option `-r`) using blastn (Altschul et al. 1990; Camacho et al. 2008) with tuned parameters (as suggested by Goris et al. 2007). For each fragment, only the best BLAST hit is considered (E-value threshold = 0.5). All BLAST hits are graphically represented as a scatter plot, where _x_ is the BLAST hit position within the reference, _y_ is the percentage of similarity, and the dot radius is proportional to the aligned part of the fragment.
* For each non-reference input file, _SimiPlot_ decomposes the nucleotide sequence(s) into overlapping equal-length fragments (step = half the fragment length). Each fragment is searched against the reference sequence (option `-r`) using blastn (Altschul et al. 1990; Camacho et al. 2008) with tuned parameters (as suggested by Goris et al. 2007). For each fragment, only the best BLAST hit is considered (E-value threshold = 0.5). All BLAST hits are graphically represented as a scatter plot, where _x_ is the BLAST hit position within the reference, _y_ is the percentage of similarity, and the dot radius is proportional to the aligned part of the fragment.
* Each input file should be in FASTA format, uncompressed, and may contain one or more nucleotide sequences. At least one input file should be specified.
* Faster running times can be obtained by using a large number of threads (option `-t`; default: 2; recommended: ≥ 10).
* Fragment length can be modified using option `-w`. By default, the fragment length is the reference sequence length divided by 1,000.
* The smoothing option `-s` can sometimes be useful to reduce variability between neighboring dots, leading to clearer similarity representations.
* Faster running times can be obtained by using a large number of threads (option `-t`; default: 2; recommended: ≥ 10).
* Specific regions can be represented by specifying start and end positions within the reference sequence using options `-x` and `-X`, respectively. By default, the whole reference sequence is represented. The y-axis range can also be modified using options `-y` and `-Y` (default: 0% and 100% similarity, respectively).
* To obtain convenient figures, the aspect ratio (i.e. width/height) of the scatter plot can be modified using option `-a` (default: 3.0). Dot size can be controlled using option `-d`. Fragment length can also be modified using option `-w`, but at the risk of obtaining a less legible figure.
* To obtain convenient and more readable figures with clearer similarity representation, the smoothing option `-s` can often be useful to reduce variability between neighboring dots. Another way is to increase the aspect ratio (i.e. width/height) of the scatter plot using option `-a` (default: 3.0). Dot size can also be controlled using option `-d`.
* A different dot color is used for each input file. The first colors are: (1) red, (2) blue, (3) orange, (4) green, (5) gray, (6) brown, (7) dark green, (8) pink, (9) light blue. To associate a given input file to a specific color, change the input file order.
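
The fragment decomposition described in the first note can be illustrated with a minimal Python sketch. This is not _SimiPlot_'s actual code: the function names `default_fragment_length` and `overlapping_fragments` are hypothetical, the use of integer division for the default length is an assumption, and so is the handling of a trailing remainder shorter than one fragment (dropped here to keep fragments equal-length).

```python
def default_fragment_length(reference_length, divisor=1000):
    """Default fragment length: reference length divided by 1,000.

    Integer division and the floor of 1 are assumptions of this sketch.
    """
    return max(1, reference_length // divisor)


def overlapping_fragments(sequence, length):
    """Return (start, fragment) pairs with step = half the fragment length.

    Assumption: a trailing remainder shorter than `length` is dropped.
    A sequence shorter than `length` yields a single, shorter fragment.
    """
    step = max(1, length // 2)
    last_start = max(1, len(sequence) - length + 1)
    return [(start, sequence[start:start + length])
            for start in range(0, last_start, step)]


if __name__ == "__main__":
    # Toy example: a 10-nt sequence cut into 4-nt fragments (step = 2).
    for start, fragment in overlapping_fragments("ACGTACGTAC", 4):
        print(start, fragment)
```

With step = half the fragment length, every position of the sequence (except near the ends) is covered by two fragments, which is why neighboring dots in the scatter plot are correlated.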