diff --git a/README.md b/README.md
index 39f54ca289c3e6d16a3b86e6812de4f5ed87f77f..2679b0fb59bf2978805880b14d8f47949943aa07 100644
--- a/README.md
+++ b/README.md
@@ -59,6 +59,7 @@ Run _SimiPlot_ without option to read the following documentation:
     -X <int>    x-axis end (default: reference length)
     -y <int>    y-axis start (default: 0)
     -Y <int>    y-axis end (default: 100)
+    -d <int>    dot size factor (default: 1.0)
     -a <real>   aspect ratio (detault: 3.0)
     -t <int>    number of threads (default: 2)
     -h          prints this help and exits
@@ -66,17 +67,17 @@ Run _SimiPlot_ without option to read the following documentation:

 ## Notes

-* For each input file, _SimiPlot_ decomposes the nucleotide sequence(s) into overlapping fragments (step = half the fragment length). Fragment length is set by option `-w` (default: reference sequence length divided by 1,000). Each fragment is searched against the reference sequence (option `-r`) using blastn (Altschul et al. 1990; Camacho et al. 2008) with tuned parameters (as suggested by Goris et al. 2007). For each fragment, only the best BLAST hit is considered (E-value threshold = 0.5). All BLAST hits are graphically represented as a scatter plot, where _x_ is the hit BLAST position within the reference, _y_ is the percentage of similarity, and the dot radius is proportional to the aligned part of the fragment.
+* For each non-reference input file, _SimiPlot_ decomposes the nucleotide sequence(s) into overlapping equal-length fragments (step = half the fragment length). Each fragment is searched against the reference sequence (option `-r`) using blastn (Altschul et al. 1990; Camacho et al. 2008) with tuned parameters (as suggested by Goris et al. 2007). For each fragment, only the best BLAST hit is considered (E-value threshold = 0.5). All BLAST hits are graphically represented as a scatter plot, where _x_ is the hit BLAST position within the reference, _y_ is the percentage of similarity, and the dot radius is proportional to the aligned part of the fragment.

 * Each input file should be in FASTA format, not compressed, and may contain nucleotide sequences. At least one input files should be specified.

-* Faster running times can be obtained by using a large number of threads (option `-t`; default: 2; recommended: ≥ 10).
+* Fragment length can be modified using option `-w`. By default, the fragment length is the reference sequence length divided by 1,000.

-* The smoothing option `-s` can sometimes be useful to reduce variability between neighbor dots, leading to clearer similarity representations.
+* Faster running times can be obtained by using a large number of threads (option `-t`; default: 2; recommended: ≥ 10).

 * Specific regions can be represented by specifying start and end positions within the reference sequence using options `-x` and `-X`, respectively. By default, the whole reference sequence is represented. Y-axis range can be also modified using options `-y` and `Y` (default: 0% and 100% similarity, respectively).

-* To obtain convenient figures, the aspect ratio (i.e. width/heigth) of the scatter plot can be modified using option `-a` (default: 3.0). Dot size can be controlled using option `-d`. Fragment length can be also modified using option `-w`, but at the risk of obtaining a less legible figure.
+* To obtain convenient and more readable figures with clearer similarity representation, the smoothing option `-s` can often be useful to reduce variability between neighbor dots. Another way is to increase the aspect ratio (i.e. width/heigth) of the scatter plot using option `-a` (default: 3.0). Dot size can be also controlled using option `-d`.

 * A different dot color is used for each input file. The first colors are: (1) red, (2) blue, (3) orange, (4) green, (5) gray, (6) brown, (7) dark green, (8) pink, (9) light blue. To associate a given input file to a specific color, change the input file order.
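
The fragment decomposition described in the Notes (equal-length fragments, step = half the fragment length, default length = reference length divided by 1,000) can be illustrated with a minimal Python sketch. This is not _SimiPlot_'s actual code: the function names are hypothetical, and the integer rounding and the dropping of a trailing shorter fragment are assumptions made here for illustration.

```python
def default_fragment_length(reference_length: int) -> int:
    """Default fragment length: reference length divided by 1,000.

    Assumption: integer division, with a minimum of 1.
    """
    return max(1, reference_length // 1000)


def fragments(sequence: str, length: int):
    """Yield (start, fragment) pairs of equal-length overlapping fragments.

    The step is half the fragment length. Assumption: a trailing
    fragment shorter than `length` is dropped.
    """
    step = max(1, length // 2)
    for start in range(0, len(sequence) - length + 1, step):
        yield start, sequence[start:start + length]


if __name__ == "__main__":
    # Toy 10-nt sequence with 4-nt fragments and a 2-nt step
    for start, frag in fragments("ACGTACGTAC", 4):
        print(start, frag)
```

With a 5 Mb reference, this default gives 5,000-nt fragments and a 2,500-nt step.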