Commit 47420d78 authored by Alexis CRISCUOLO

1.0

parent f7dace6d
# YACO
_YACO_ (_Yet Another Contig Ordering_) is a command line program written in [Bash](https://www.gnu.org/software/bash/) to determine both the orientation and order of a set of contigs (generally, a draft genome assembly) according to a complete reference genome. Based on reciprocal BLAST searches, _YACO_ is a simple and practical tool, that generally achieves accurate results with acceptable running times (e.g. less than 10 seconds to process a 5 Mbp draft genome using 12 threads).
_YACO_ (_Yet Another Contig Ordering_) is a command line program written in [Bash](https://www.gnu.org/software/bash/) for orienting and ordering contigs (generally, a draft genome assembly) according to a closed reference genome. Based on reciprocal BLAST searches, _YACO_ is a simple and practical tool, which generally achieves accurate results with acceptable running times (e.g. less than 10 seconds to process a 5 Mbp draft genome using 12 threads).
_YACO_ runs on UNIX, Linux and most OS X operating systems.
@@ -66,7 +66,7 @@ Run _YACO_ without option to read the following documentation:
## Notes
* First, given a fixed fragment length _w_ (option `-w`; default: 400 bps), _YACO_ partitions each contig (option `-i`) into consecutive fragments _f<sub>i</sub>_, and decomposes the reference sequence(s) (option `-r`) into overlapping fragments _f<sub>r</sub>_ (step _w_&nbsp;&#8725;&nbsp;2). Next, each set of fragments is searched against the other using _blastn_ (Altschul et al. 1990; Camacho et al. 2008) with tuned parameters (as suggested by Goris et al. 2007). Orthologous fragments are assessed by reciprocal best BLAST hits showing &geq;&nbsp;30 % overall fragment identity on an alignable region &geq;&nbsp;35% fragment length (as suggested by Lee et al. 2016). Every fragment _f<sub>i</sub>_ associated with an orthologous one _f<sub>r</sub>_ is ranked by the position of _f<sub>r</sub>_ within the reference. A contig is localized when a sufficient proportion of its fragments _f<sub>i</sub>_ are ranked (as set by option `-p`, default: &geq;50%). Every localized contig is replaced by its reverse-complement when most of its ranked fragments _f<sub>i</sub>_ are in opposite strand against their orthologous fragments _f<sub>r</sub>_. Finally, the localized contigs are sorted according to the median rank of its fragments _f<sub>i</sub>_. Oriented and ordered contigs are finally written into the specified output file.
* First, given a fixed fragment length _w_ (option `-w`; default: 400 bps), _YACO_ partitions each contig (option `-i`) into consecutive fragments _f<sub>i</sub>_, and decomposes the reference sequence(s) (option `-r`) into overlapping fragments _f<sub>r</sub>_ (step _w_&nbsp;&#8725;&nbsp;2). Next, each set of fragments is searched against the other using _blastn_ (Altschul et al. 1990; Camacho et al. 2008) with tuned parameters (as suggested by Goris et al. 2007). Orthologous fragments are assessed by reciprocal best BLAST hits showing &geq;&nbsp;30 % overall fragment identity on an alignable region &geq;&nbsp;35% fragment length (as suggested by Lee et al. 2016). Every fragment _f<sub>i</sub>_ associated with an orthologous one _f<sub>r</sub>_ is ranked by the position of _f<sub>r</sub>_ within the reference. A contig is localized when a sufficient proportion of its fragments _f<sub>i</sub>_ is ranked (as set by option `-p`, default: &geq;50%). Every localized contig is replaced by its reverse-complement when most of its ranked fragments _f<sub>i</sub>_ are in opposite strand against their orthologous fragments _f<sub>r</sub>_. Finally, the localized contigs are sorted according to the median rank of its fragments _f<sub>i</sub>_. Oriented and ordered contigs are finally written into the specified output file.
* When the reference file (option `-r`) contains more than one sequence, (reciprocal) BLAST searches are performed against all of them, but the ordering/orienting procedure (see above) is carried out according to only the first one. This approach can be useful to make a better distinction between a reference chromosome and e.g. several reference plasmids.
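
The fragmentation and ranking steps described in the notes above can be illustrated with a minimal Bash sketch. This is not the code of _YACO_: the function names `fragments` and `median_rank` are hypothetical, the toy window is 4 instead of the default 400, and both the handling of a trailing short fragment and the median of an even number of ranks are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration only (not part of YACO).

# Print the fragments of sequence $1 using window length $2 and step $3,
# one per line as "start<TAB>fragment" (0-based start).
# Assumption: a trailing fragment shorter than the window is kept.
fragments() {
  local seq=$1 w=$2 step=$3 start
  for (( start = 0; start < ${#seq}; start += step )); do
    printf '%d\t%s\n' "$start" "${seq:start:w}"
    # stop once the current window has reached the end of the sequence
    (( start + w >= ${#seq} )) && break
  done
  return 0
}

# Read one rank per line on stdin and print the median.
# Assumption: for an even count, the two middle ranks are averaged.
median_rank() {
  sort -n | awk '{ r[NR] = $1 }
    END { if (NR) print (NR % 2) ? r[(NR + 1) / 2] : (r[NR / 2] + r[NR / 2 + 1]) / 2 }'
}

contig=ACGTACGTAC
fragments "$contig" 4 4             # consecutive fragments (contig side, step w)
fragments "$contig" 4 2             # overlapping fragments (reference side, step w/2)
printf '%s\n' 12 10 11 | median_rank  # median rank of a contig's ranked fragments
```

Localized contigs would then be sorted by the value returned by `median_rank`, each contig contributing the ranks of its fragments that have a reciprocal best hit in the reference.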
@@ -112,7 +112,7 @@ YACO.sh -t 12 -i Kp.SB1139.fa -r Kp.SB612.chr.fa -o Kp.SB1139
The output file _Kp.SB1139.into.txt_ (reproduced below) shows that _YACO_ is able to localize 36 contigs:
<pre style="font-size: 0.5em">#ordered/oriented sequences
<pre style="font-size: 0.3em">#ordered/oriented sequences
CAAHFT010000037.1 Klebsiella pneumoniae isolate SB1139 genome assembly, contig: SB1139_Kp1_42, whole genome shotgun sequence - 2/2
CAAHFT010000017.1 Klebsiella pneumoniae isolate SB1139 genome assembly, contig: SB1139_Kp1_24, whole genome shotgun sequence - 664/667
CAAHFT010000016.1 Klebsiella pneumoniae isolate SB1139 genome assembly, contig: SB1139_Kp1_23, whole genome shotgun sequence - 171/174
......