_contig_info_ is a command line program written in [Bash](https://www.gnu.org/software/bash/) that allows estimating several standard descriptive statistics from a FASTA-formatted contig file inferred by a _de novo_ genome assembly method. Estimated statistics are sequence number, residue counts, sequence length distribution, N50 (Lander et al. 2001), NG50 (Earl et al. 2011), and its related N75, NG75, N90, and NG90.
_contig_info_ is a command line program written in [Bash](https://www.gnu.org/software/bash/) that allows several standard descriptive statistics to be quickly estimated from FASTA-formatted contig files inferred by _de novo_ genome assembly methods.
The estimated statistics are the number of sequences, residue counts, AT- and GC-content, sequence lengths, N50 (Lander et al. 2001), NG50 (Earl et al. 2011), and the related N75, NG75, N90, NG90, L50, LG50, L75, LG75, L90, and LG90.
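To make the N50-like statistics concrete, here is a minimal sketch of how Nx and Lx can be derived from a list of contig lengths. It is not the code used by _contig_info_: the function name `nx_lx` and its arguments are hypothetical, and it assumes the common convention that Nx is the length of the contig at which the cumulative sum of lengths, taken from longest to shortest, first reaches x% of the reference size, with Lx the number of contigs needed to get there. The reference size is the total assembly length, or the expected genome size when one is supplied (giving NGx and LGx).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of contig_info.
# Reads one contig length per line on stdin and prints "Nx<TAB>Lx".
#   $1: percentage x (default: 50)
#   $2: expected genome size (optional); when set, NGx and LGx are printed
nx_lx() {
  local pct="${1:-50}" gsize="${2:-0}"
  # sort lengths from longest to shortest, then accumulate them
  sort -rn | awk -v pct="$pct" -v g="$gsize" '
    { len[NR] = $1; total += $1 }
    END {
      ref = (g > 0) ? g : total        # reference size: genome or assembly
      target = ref * pct / 100
      for (i = 1; i <= NR; i++) {
        cum += len[i]
        if (cum >= target) { printf "%d\t%d\n", len[i], i; exit }
      }
      print "NA\tNA"                   # threshold never reached (or no input)
    }'
}
```

For the six lengths 80, 70, 50, 40, 30 and 20 (total 290), `printf '%s\n' 80 70 50 40 30 20 | nx_lx 50` prints `70` and `2`: the two longest contigs sum to 150, which is at least half of 290. With an assumed genome size of 400, `nx_lx 50 400` prints `50` and `3`.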
## Installation and execution
...
...
@@ -18,20 +19,118 @@ and launch it with the following command line model:
Launch _contig_info_ without any option to read the following documentation:
```
USAGE: contig_info.sh [options] <contig_file>
USAGE: contig_info.sh [options] <contig_files>
where 'options' are:
-m <int> minimum contig length; every contig sequence of length
shorter than this cutoff will be discarded (default: 0)
-g <int> expected genome size for computing NG50, NG75 and NG90
values instead of N50, N75 and N90 ones, respectively
-d print contig sequence length distribution
-l print length of each contig sequence
-r print residue counts
shorter than this cutoff will be discarded (default: 1)
-g <int> expected genome size for computing {N,L}G{50,75,90}
values instead of {N,L}{50,75,90} ones, respectively
-t tab-delimited output
```
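The effect of the `-m` cutoff can be illustrated with a short sketch that extracts contig lengths from a FASTA stream and discards the sequences shorter than a minimum length. This is an illustration only, not the implementation found in `contig_info.sh`: the function name `fasta_lengths` is hypothetical, and the sketch assumes plain FASTA input in which every header line starts with `>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of contig_info.
# Reads FASTA on stdin and prints the length of each sequence, one per line,
# skipping every sequence shorter than the cutoff given as $1 (default: 1).
fasta_lengths() {
  local min="${1:-1}"
  awk -v min="$min" '
    # print the length of the sequence just read if it passes the cutoff
    function emit() { if (seen && n >= min) print n; n = 0 }
    /^>/ { emit(); seen = 1; next }                 # header: close previous record
    { gsub(/[ \t\r]/, ""); n += length($0) }        # sequence line: count residues
    END { emit() }                                  # close the last record
  '
}
```

A call such as `fasta_lengths 500 < contigs.fasta` (the file name is a placeholder) lists the lengths of all sequences of at least 500 residues.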
## Examples
The following [Bash](https://www.gnu.org/software/bash/) command lines download the genome sequences of the five _Mucor circinelloides_ strains 1006PhL, CBS 277.49, WJ11, B8987 and JCM 22480 from the [NCBI genome repository](https://www.ncbi.nlm.nih.gov/genome):
Earl D, Bradnam K, St John J, Darling A, Lin D, Fass J, Yu HO, Buffalo V, Zerbino DR, Diekhans M, Nguyen N, Ariyaratne PN, Sung WK, Ning Z, Haimel M, Simpson JT, Fonseca NA, Birol İ, Docking TR, Ho IY, Rokhsar DS, Chikhi R, Lavenier D, Chapuis G, Naquin D, Maillet N, Schatz MC, Kelley DR, Phillippy AM, Koren S, Yang SP, Wu W, Chou WC, Srivastava A, Shaw TI, Ruby JG, Skewes-Cox P, Betegon M, Dimon MT, Solovyev V, Seledtsov I, Kosarev P, Vorobyev D, Ramirez-Gonzalez R, Leggett R, MacLean D, Xia F, Luo R, Li Z, Xie Y, Liu B, Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Yin S, Sharpe T, Hall G, Kersey PJ, Durbin R, Jackman SD, Chapman JA, Huang X, DeRisi JL, Caccamo M, Li Y, Jaffe DB, Green RE, Haussler D, Korf I, Paten B (2011) Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Research, 21(12):2224-2241. [doi:10.1101/gr.126599.111](https://genome.cshlp.org/content/21/12/2224).