diff --git a/README.md b/README.md
index 143e1036b25c5b3978b806dfccbfc2ca46e81547..d20509b2820f0ffa23417d34151330a392bbe5c3 100644
--- a/README.md
+++ b/README.md
@@ -1,49 +1,55 @@
 # contig_info
 
-_contig_info_ is a command line program written in [Bash](https://www.gnu.org/software/bash/) that allows several standard descriptive statistics to be quickly estimated from FASTA-formatted contig files inferred by _de novo_ genome assembly methods.
-Estimated statistics are sequence number, residue counts, AT- and GC-content, sequence lengths, N50 (Lander et al. 2001), NG50 (Earl et al. 2011), and the related N75, NG75, N90, NG90, L50, LG50, L75, LG75, L90, LG90.
+_contig_info_ is a command line program written in [Bash](https://www.gnu.org/software/bash/) for quickly estimating several standard descriptive statistics from FASTA-formatted contig files inferred by _de novo_ genome assembly methods.
+Estimated statistics are sequence number, residue counts, AT- and GC-content, sequence lengths, [auN](https://lh3.github.io/2020/04/08/a-new-metric-on-assembly-contiguity) (also called E-size, Salzberg et al. 2012), [N50](https://en.wikipedia.org/wiki/N50,_L50,_and_related_statistics) (Lander et al. 2001), [NG50](https://en.wikipedia.org/wiki/N50,_L50,_and_related_statistics) (Earl et al. 2011), and the related N75, NG75, N90, NG90, L50, LG50, L75, LG75, L90, LG90.
 
 ## Installation and execution
 
 Give the execute permission to the file `contig_info.sh` by typing:
+
 ```bash
 chmod +x contig_info.sh
 ```
-and launch it with the following command line model:
+and run it with the following command line model:
+
 ```bash
 ./contig_info.sh [options]
 ```
 
 ## Usage
 
-Launch _contig_info_ without option to read the following documentation:
+Run _contig_info_ without any option to read the following documentation:
 
 ```
- USAGE:  contig_info.sh  [options]  <contig_files>
+ USAGE:  contig_info.sh  [options]  <contig_files> 
 
   where 'options' are:
 
-   -m <int>    minimum contig length;  every contig sequence of length
-               shorter than this cutoff will be discarded (default: 1)
-   -g <int>    expected  genome  size  for  computing {N,L}G{50,75,90}
-               values instead of {N,L}{50,75,90} ones, respectively
+   -m <int>    minimum contig length; every contig sequence of length shorter
+               than this cutoff will be discarded (default: 1)
+   -g <int>    expected genome size  for computing auNG  and {N,L}G{50,75,90}
+               values instead of auN and {N,L}{50,75,90} ones, respectively
    -t          tab-delimited output
 ```
 
 ## Examples
 
 The following [Bash](https://www.gnu.org/software/bash/) command lines allow the genome sequences of the 5 _Mucor circinelloides_ strains 1006PhL, CBS 277.49, WJ11, B8987 and JCM 22480 to be downloaded from the [NCBI genome repository](https://www.ncbi.nlm.nih.gov/genome):
+
 ```bash
 NCBIFTP="wget -q -O- https://ftp.ncbi.nlm.nih.gov/sra/wgs_aux/"; Z=".1.fsa_nt.gz";
 echo -e "1006PhL\tAOCY01\nCBS277.49\tAMYB01\nWJ11\tLGTF01\nB8987\tJNDM01\nJCM22480\tBCHG01" |
   while read -r s a; do echo -n "$s ... ";$NCBIFTP${a:0:2}/${a:2:2}/$a/$a$Z|zcat>Mucor.$s.fasta;echo "[ok]";done
 ```
 
-The following command line allows the script `contig_info.sh` to be launched to analyze the first downloaded file _Mucor.1006PhL.fasta_:
+The following command line runs `contig_info.sh` to analyze the first downloaded file _Mucor.1006PhL.fasta_:
+
 ```bash
 ./contig_info.sh  Mucor.1006PhL.fasta
 ```
+
 leading to the following standard output:
+
 ```
 File                           Mucor.1006PhL.fasta
 
@@ -69,6 +75,7 @@ Sequence lengths:
   Average                      23395.89
 
 Contiguity statistics:
+  auN                          65329
   N50                          58982
   N75                          36291
   N90                          18584
@@ -77,63 +84,74 @@ Contiguity statistics:
   L90                          562
 ```
 
-The same results could be outputted in tab-delimited format with the following command line:
+The same results can be output in tab-delimited format using option `-t`:
+
 ```bash
 ./contig_info.sh  -t  Mucor.1006PhL.fasta
 ```
 
 ```
-#File               Nseq   Nres     A        C       G       T        N    %A     %C     %G     %T     %N   %AT    %GC     Min   Q25   Med   Q75   Max    Avg       N50   N75   N90   L50 L75 L90
-Mucor.1006PhL.fasta 1459   34134616 10320010 6747611 6731530 10335465 0    30.23% 19.76% 19.72% 30.27% 0%   60.52% 39.48%  410   1660  6176  37608 213712 23395.89  58982 36291 18584 194 376 562
+#File               Nseq   Nres     A        C       G       T        N    %A     %C     %G     %T     %N   %AT    %GC     Min   Q25   Med   Q75   Max    Avg       auN    N50   N75   N90   L50 L75 L90
+Mucor.1006PhL.fasta 1459   34134616 10320010 6747611 6731530 10335465 0    30.23% 19.76% 19.72% 30.27% 0%   60.52% 39.48%  410   1660  6176  37608 213712 23395.89  65329  58982 36291 18584 194 376 562
 ```
 
-Of note, the five downloaded FASTA files could be analyzed with a single command line:
+Of note, the five downloaded FASTA files can be analyzed with a single command line:
+
 ```bash
 ./contig_info.sh  -t  Mucor.*.fasta
 ```
 
 ```
-#File                 Nseq   Nres      A        C       G       T        N       %A     %C     %G     %T     %N    %AT    %GC     Min   Q25   Med    Q75     Max     Avg         N50      N75     N90      L50 L75 L90
-Mucor.1006PhL.fasta   1459   34134616  10320010 6747611 6731530 10335465 0       30.23% 19.76% 19.72% 30.27% 0%    60.52% 39.48%  410   1660  6176   37608   213712  23395.89    58982    36291   18584    194 376 562
-Mucor.B8987.fasta     2210   36700617  11096810 7247117 7233795 11122895 0       30.23% 19.74% 19.71% 30.30% 0%    60.55% 39.45%  206   839   2482   20727   258792  16606.61    58460    30025   13274    193 416 674
-Mucor.CBS277.49.fasta 21     36567582  10571030 7715901 7705901 10574750 0       28.90% 21.10% 21.07% 28.91% 0%    57.83% 42.17%  4155  41542 934259 3187354 6050249 1741313.42  4318338  3096690 1074709  4   7   9
-Mucor.JCM22480.fasta  401    36616466  10586281 6882218 6899109 10581984 1659222 28.91% 18.79% 18.84% 28.89% 4.53% 60.57% 39.43%  1038  4814  50332  135940  659822  91312.88    197059   109360  63107    61  121 183
-Mucor.WJ11.fasta      2519   33065171  9974064  6559358 6556539 9975210  0       30.16% 19.83% 19.82% 30.16% 0%    60.34% 39.66%  430   3275  7692   18010   118704  13126.30    24148    12884   5672     429 898 1455
+#File                 Nseq   Nres      A        C       G       T        N       %A     %C     %G     %T     %N    %AT    %GC     Min   Q25   Med    Q75     Max     Avg         auN     N50      N75     N90      L50 L75 L90
+Mucor.1006PhL.fasta   1459   34134616  10320010 6747611 6731530 10335465 0       30.23% 19.76% 19.72% 30.27% 0%    60.52% 39.48%  410   1660  6176   37608   213712  23395.89    65329   58982    36291   18584    194 376 562
+Mucor.B8987.fasta     2210   36700617  11096810 7247117 7233795 11122895 0       30.23% 19.74% 19.71% 30.30% 0%    60.55% 39.45%  206   839   2482   20727   258792  16606.61    69144   58460    30025   13274    193 416 674
+Mucor.CBS277.49.fasta 21     36567582  10571030 7715901 7705901 10574750 0       28.90% 21.10% 21.07% 28.91% 0%    57.83% 42.17%  4155  41542 934259 3187354 6050249 1741313.42  3912950 4318338  3096690 1074709  4   7   9
+Mucor.JCM22480.fasta  401    36616466  10586281 6882218 6899109 10581984 1659222 28.91% 18.79% 18.84% 28.89% 4.53% 60.57% 39.43%  1038  4814  50332  135940  659822  91312.88    229712  197059   109360  63107    61  121 183
+Mucor.WJ11.fasta      2519   33065171  9974064  6559358 6556539 9975210  0       30.16% 19.83% 19.82% 30.16% 0%    60.34% 39.66%  430   3275  7692   18010   118704  13126.30    28368   24148    12884   5672     429 898 1455
 ```
 
-The tab-delimited output format could be useful for focusing on specific fields like, e.g. the six contiguity statistics:
+The tab-delimited output format can be useful for focusing on specific fields, e.g. the seven contiguity statistics:
+
 ```bash
 ./contig_info.sh  -t  Mucor.*.fasta  |  cut -f1,22-
 ```
 
 ```
-#File                 N50      N75     N90      L50  L75  L90
-Mucor.1006PhL.fasta   58982    36291   18584    194  376  562
-Mucor.B8987.fasta     58460    30025   13274    193  416  674
-Mucor.CBS277.49.fasta 4318338  3096690 1074709  4    7    9
-Mucor.JCM22480.fasta  197059   109360  63107    61   121  183
-Mucor.WJ11.fasta      24148    12884   5672     429  898  1455
+#File                 auN     N50      N75     N90      L50  L75  L90
+Mucor.1006PhL.fasta   65329   58982    36291   18584    194  376  562
+Mucor.B8987.fasta     69144   58460    30025   13274    193  416  674
+Mucor.CBS277.49.fasta 3912950 4318338  3096690 1074709  4    7    9
+Mucor.JCM22480.fasta  229712  197059   109360  63107    61   121  183
+Mucor.WJ11.fasta      28368   24148    12884   5672     429  898  1455
 ```
 
 
-Finally, the option -g could be used to set an expected genome size for obtaining {N,L}G{50,75,90} statistics instead of {N,L}{50,75,90} ones:
+Finally, the option `-g` can be used to set an expected genome size for obtaining auNG and {N,L}G{50,75,90} statistics instead of auN and {N,L}{50,75,90} ones:
+
 ```bash
 ./contig_info.sh  -t  -g 36000000  Mucor.*.fasta | cut -f1,22-
 ```
 
 ```
-#File                 N50      N75     N90      L50  L75  L90  ExpSize
-Mucor.1006PhL.fasta   57499    32472   7652     210  417  692  36000000
-Mucor.B8987.fasta     59771    30857   15730    187  399  631  36000000
-Mucor.CBS277.49.fasta 4318338  3096690 1074709  4    7    9    36000000
-Mucor.JCM22480.fasta  197663   113006  69531    59   117  175  36000000
-Mucor.WJ11.fasta      21799    9865    2445     493  1092 2146 36000000
+#File                 auN     N50      N75     N90      L50  L75  L90  ExpSize
+Mucor.1006PhL.fasta   61944   57499    32472   7652     210  417  692  36000000
+Mucor.B8987.fasta     70490   59771    30857   15730    187  399  631  36000000
+Mucor.CBS277.49.fasta 3974642 4318338  3096690 1074709  4    7    9    36000000
+Mucor.JCM22480.fasta  233645  197663   113006  69531    59   117  175  36000000
+Mucor.WJ11.fasta      26055   21799    9865    2445     493  1092 2146 36000000
 ```
 
 
 ## References
 
-Earl D, Bradnam K, St John J, Darling A, Lin D, Fass J, Yu HO, Buffalo V, Zerbino DR, Diekhans M, Nguyen N, Ariyaratne PN, Sung WK, Ning Z, Haimel M, Simpson JT, Fonseca NA, Birol İ, Docking TR, Ho IY, Rokhsar DS, Chikhi R, Lavenier D, Chapuis G, Naquin D, Maillet N, Schatz MC, Kelley DR, Phillippy AM, Koren S, Yang SP, Wu W, Chou WC, Srivastava A, Shaw TI, Ruby JG, Skewes-Cox P, Betegon M, Dimon MT, Solovyev V, Seledtsov I, Kosarev P, Vorobyev D, Ramirez-Gonzalez R, Leggett R, MacLean D, Xia F, Luo R, Li Z, Xie Y, Liu B, Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Yin S, Sharpe T, Hall G, Kersey PJ, Durbin R, Jackman SD, Chapman JA, Huang X, DeRisi JL, Caccamo M, Li Y, Jaffe DB, Green RE, Haussler D, Korf I, Paten B (2011) Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Research, 21(12):2224-2241. [doi:10.1101/gr.126599.111](https://genome.cshlp.org/content/21/12/2224).
+<sub>
+Earl D, Bradnam K, St John J, Darling A, Lin D, Fass J, Yu HO, Buffalo V, Zerbino DR, Diekhans M, Nguyen N, Ariyaratne PN, Sung WK, Ning Z, Haimel M, Simpson JT, Fonseca NA, Birol İ, Docking TR, Ho IY, Rokhsar DS, Chikhi R, Lavenier D, Chapuis G, Naquin D, Maillet N, Schatz MC, Kelley DR, Phillippy AM, Koren S, Yang SP, Wu W, Chou WC, Srivastava A, Shaw TI, Ruby JG, Skewes-Cox P, Betegon M, Dimon MT, Solovyev V, Seledtsov I, Kosarev P, Vorobyev D, Ramirez-Gonzalez R, Leggett R, MacLean D, Xia F, Luo R, Li Z, Xie Y, Liu B, Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Yin S, Sharpe T, Hall G, Kersey PJ, Durbin R, Jackman SD, Chapman JA, Huang X, DeRisi JL, Caccamo M, Li Y, Jaffe DB, Green RE, Haussler D, Korf I, Paten B (2011) _Assemblathon 1: a competitive assessment of de novo short read assembly methods_. **Genome Research**, 21(12):2224-2241. [doi:10.1101/gr.126599.111](https://genome.cshlp.org/content/21/12/2224).
+</sub>
 
-Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann Y, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowki J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ, Szustakowki J; International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature, 409(6822):860-921. [doi:10.1038/35057062](https://www.nature.com/articles/35057062).
+<sub>
+Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann Y, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowki J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ, Szustakowki J; International Human Genome Sequencing Consortium (2001) _Initial sequencing and analysis of the human genome_. **Nature**, 409(6822):860-921. [doi:10.1038/35057062](https://www.nature.com/articles/35057062).
+</sub>
 
+<sub>
+Salzberg SL, Phillippy AM, Zimin A, Puiu D, Magoc T, Koren S, Treangen TJ, Schatz MC, Delcher AL, Roberts M, Marçais G, Pop M, Yorke JA (2012) _GAGE: A critical evaluation of genome assemblies and assembly algorithms_. **Genome Research**, 22(3):557-567. [doi:10.1101/gr.131383.111](https://genome.cshlp.org/content/22/3/557.long).
+</sub>
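
Note (not part of the patch): the README changes above add auN next to N50 and L50. As a rough illustration of what these statistics measure, here is a minimal sketch that computes them from a plain list of contig lengths. The function name `contiguity` and the toy lengths are hypothetical, and rounding auN to the nearest integer is an assumption; none of this is taken from `contig_info.sh`, whose actual implementation may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT taken from contig_info.sh:
# prints auN, N50 and L50 for the contig lengths given as arguments.
contiguity() {
  printf '%s\n' "$@" | sort -rn | awk '
    { len[NR] = $1; total += $1; sumsq += $1 * $1 }
    END {
      if (total == 0) exit 1
      # N50/L50: walk the lengths in decreasing order until the running
      # sum reaches half of the total assembly length
      for (i = 1; i <= NR; i++) {
        cum += len[i]
        if (2 * cum >= total) { n50 = len[i]; l50 = i; break }
      }
      # auN: sum of squared lengths divided by the total length
      # (rounded to the nearest integer; the rounding is an assumption)
      printf "auN=%d N50=%d L50=%d\n", int(sumsq / total + 0.5), n50, l50
    }'
}

# Toy example: six contigs, 290 residues in total
contiguity 80 70 50 40 30 20   # prints: auN=58 N50=70 L50=2
```

In this toy case the two longest contigs (80 + 70 = 150) already cover at least half of the 290 residues, hence N50 = 70 and L50 = 2, while auN = 16700 / 290 ≈ 57.6. The genome-size variants selected by option `-g` (auNG, NG50, LG50) are usually defined the same way with the expected genome size in place of the total assembly length.
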
diff --git a/contig_info.sh b/contig_info.sh
index a9a081a68a8da7e26fe0f8c9451838d9ca74423b..c22e1c1a2adf1b09586cd6157e1ad4241ce2b215 100755
--- a/contig_info.sh
+++ b/contig_info.sh
@@ -1,114 +1,115 @@
 #!/bin/bash
 
-########################################################################################
-#                                                                                      #
-#  contig_info: a BASH script to estimate standard statistics from FASTA contig files  #
-#                                                                                      #
-#  Copyright (C) 2015,2018,2019  Alexis Criscuolo                                      #
-#                                                                                      #
-#  This program is free software:  you can redistribute it and/or modify it under the  #
-#  terms  of  the GNU  General  Public  License as  published by  the  Free  Software  #
-#  Foundation, either version 3 of the License, or (at your option) any later version  #
-#                                                                                      #
-#  This program is distributed  in the hope that  it will be useful,  but WITHOUT ANY  #
-#  WARRANTY;  without even the  implied warranty of  MERCHANTABILITY or FITNESS FOR A  #
-#  PARTICULAR PURPOSE. See the GNU General Public License for more details.            #
-#                                                                                      #
-#  You should have received a copy of the  GNU General Public License along with this  #
-#  program. If not, see <http://www.gnu.org/licenses/>.                                #
-#                                                                                      #
-#  Contact:                                                                            #
-#  Institut Pasteur                                                                    #
-#  Bioinformatics and Biostatistics Hub                                                #
-#  C3BI, USR 3756 IP CNRS                                                              #
-#  Paris, FRANCE                                                                       #
-#                                                                                      #
-#  alexis.criscuolo@pasteur.fr                                                         #
-#                                                                                      #
-########################################################################################
-
-########################################################################################
-#                                                                                      #
-# ============                                                                         #
-# = VERSIONS =                                                                         #
-# ============                                                                         #
-#                                                                                      #
-  VERSION=1.0.190426ac                                                                 #
-# + options -l and -d (i.e. printing sequence lengths and length distribution, resp.)  #
-#   are no longer supported                                                            #
-# + residue count always computed (option -r discarded)                                #
-# + ultrafast residue count (based on tr + wc)                                         #
-# + estimating %AT, %GC, L50, L75, L90                                                 #
-# + faster estimation of the sequence length statistics (100% awk)                     #
-# + ability to read multiple input files                                               #
-#                                                                                      #
-# VERSION=0.3.180515ac                                                                 #
-#                                                                                      #
-########################################################################################
+##############################################################################################################
+#                                                                                                            #
+#  contig_info: a BASH script to estimate standard statistics from FASTA contig files                        #
+#                                                                                                            #
+#  Copyright (C) 2018-2021  Institut Pasteur                                                                 #
+#                                                                                                            #
+#  This program  is free software:  you can  redistribute it  and/or modify it  under the terms  of the GNU  #
+#  General Public License as published by the Free Software Foundation, either version 3 of the License, or  #
+#  (at your option) any later version.                                                                       #
+#                                                                                                            #
+#  This program is distributed in the hope that it will be useful,  but WITHOUT ANY WARRANTY;  without even  #
+#  the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public  #
+#  License for more details.                                                                                 #
+#                                                                                                            #
+#  You should have received a copy of the  GNU General Public License along with this program.  If not, see  #
+#  <http://www.gnu.org/licenses/>.                                                                           #
+#                                                                                                            #
+#  Contact:                                                                                                  #
+#   Alexis Criscuolo                                                            alexis.criscuolo@pasteur.fr  #
+#   Genome Informatics & Phylogenetics (GIPhy)                                             giphy.pasteur.fr  #
+#   Bioinformatics and Biostatistics Hub                              research.pasteur.fr/en/team/hub-giphy  #
+#   USR 3756 IP CNRS                          research.pasteur.fr/team/bioinformatics-and-biostatistics-hub  #
+#   Dpt. Biologie Computationnelle                     research.pasteur.fr/department/computational-biology  #
+#   Institut Pasteur, Paris, FRANCE                                                     research.pasteur.fr  #
+#                                                                                                            #
+##############################################################################################################
+
+##############################################################################################################
+#                                                                                                            #
+# ============                                                                                               #
+# = VERSIONS =                                                                                               #
+# ============                                                                                               #
+#                                                                                                            #
+  VERSION=1.1.201007ac                                                                                       #
+# + estimating auN (also called E-size)                                                                      #
+#                                                                                                            #
+# VERSION=1.0.190426ac                                                                                       #
+# + options -l and -d (i.e. printing sequence lengths and length distribution) are no longer supported       #
+# + residue count always computed (option -r discarded)                                                      #
+# + ultrafast residue count (based on tr + wc)                                                               #
+# + estimating %AT, %GC, L50, L75, L90                                                                       #
+# + faster estimation of the sequence length statistics (100% awk)                                           #
+# + ability to read multiple input files                                                                     #
+#                                                                                                            #
+# VERSION=0.3.180515ac                                                                                       #
+#                                                                                                            #
+##############################################################################################################
   
-########################################################################################
-#                                                                                      #
-#  ================                                                                    #
-#  = INSTALLATION =                                                                    #
-#  ================                                                                    #
-#                                                                                      #
-#  Just give the execute permission to the script contig_info.sh with the following    #
-#  command line:                                                                       #
-#                                                                                      #
-#   chmod +x contig_info.sh                                                            #
-#                                                                                      #
-########################################################################################
-
-########################################################################################
-#                                                                                      #
-#  ================                                                                    #
-#  = MANUAL       =                                                                    #
-#  ================                                                                    #
-#                                                                                      #
-#                                                                                      #
+##############################################################################################################
+#                                                                                                            #
+#  ================                                                                                          #
+#  = INSTALLATION =                                                                                          #
+#  ================                                                                                          #
+#                                                                                                            #
+#  Just give the execute permission to the script contig_info.sh with the following command line:            #
+#                                                                                                            #
+#   chmod +x contig_info.sh                                                                                  #
+#                                                                                                            #
+##############################################################################################################
+
+#############################################################################################################
+#                                                                                                           #
+#  ================                                                                                         #
+#  = MANUAL       =                                                                                         #
+#  ================                                                                                         #
+#                                                                                                           #
+#                                                                                                           #
 if [ "$1" = "-?" ] || [ $# -lt 1 ]
 then
   cat <<EOF
 
- contig_info v.$VERSION
+ contig_info v.$VERSION         Copyright (C) 2018-2021  Institut Pasteur
 
  USAGE:  contig_info.sh  [options]  <contig_files> 
 
   where 'options' are:
 
-   -m <int>    minimum contig length;  every contig sequence of length
-               shorter than this cutoff will be discarded (default: 1)
-   -g <int>    expected  genome  size  for  computing {N,L}G{50,75,90}
-               values instead of {N,L}{50,75,90} ones, respectively
+   -m <int>    minimum contig length; every contig sequence of length shorter
+               than this cutoff will be discarded (default: 1)
+   -g <int>    expected genome size  for computing auNG  and {N,L}G{50,75,90}
+               values instead of auN and {N,L}{50,75,90} ones, respectively
    -t          tab-delimited output
 
 EOF
-  exit
-fi
-#                                                                                      #
-########################################################################################
-
-########################################################################################
-#                                                                                      #
-#  ================                                                                    #
-#  = FUNCTIONS    =                                                                    #
-#  ================                                                                    #
-#                                                                                      #
-# = randomfile() ====================================================================  #
-#   returns a random file name within /tmp/                                            #
-#                                                                                      #
+  exit                                                                                                      #
+fi                                                                                                          #
+#                                                                                                           #
+#############################################################################################################
+
+#############################################################################################################
+#                                                                                                           #
+#  ================                                                                                         #
+#  = FUNCTIONS    =                                                                                         #
+#  ================                                                                                         #
+#                                                                                                           #
+# = randomfile() =========================================================================================  #
+#   returns a random file name within /tmp/                                                                 #
+#                                                                                                           #
 randomfile() {
   rdmf=/tmp/$RANDOM; while [ -e $rdmf ]; do rdmf=/tmp/$RANDOM ; done
   echo $rdmf ;
 }
-#                                                                                      #
-########################################################################################
-
-########################################################################################
-####                                                                                ####
-#### INITIALIZING PARAMETERS AND READING OPTIONS                                    ####
-####                                                                                ####
-########################################################################################
+#                                                                                                           #
+#############################################################################################################
+
+#############################################################################################################
+####                                                                                                     ####
+#### INITIALIZING PARAMETERS AND READING OPTIONS                                                         ####
+####                                                                                                     ####
+#############################################################################################################
 MIN_CONTIG_LGT=1;
 GENOME_SIZE=0;
 TSVOUT=false;
@@ -125,14 +126,14 @@ done
 if [ $MIN_CONTIG_LGT -lt 1 ]; then echo "   the min contig length threshold must be a positive integer (option -m)" ; exit 1 ; fi
 if [ $GENOME_SIZE -lt 0 ];    then echo "   the expected genome size must be a positive integer (option -g)" ;        exit 1 ; fi
 
-########################################################################################
-####                                                                                ####
-#### CONTIG INFO                                                                    ####
-####                                                                                ####
-########################################################################################
+#############################################################################################################
+####                                                                                                     ####
+#### CONTIG INFO                                                                                         ####
+####                                                                                                     ####
+#############################################################################################################
 if $TSVOUT
 then
-  CSVCAPT="#File\tNseq\tNres\tA\tC\tG\tT\tN\t%A\t%C\t%G\t%T\t%N\t%AT\t%GC\tMin\tQ25\tMed\tQ75\tMax\tAvg\tN50\tN75\tN90\tL50\tL75\tL90";
+  CSVCAPT="#File\tNseq\tNres\tA\tC\tG\tT\tN\t%A\t%C\t%G\t%T\t%N\t%AT\t%GC\tMin\tQ25\tMed\tQ75\tMax\tAvg\tauN\tN50\tN75\tN90\tL50\tL75\tL90";
   [ $GENOME_SIZE -ne 0 ]&&CSVCAPT="$CSVCAPT\tExpSize";
   echo -e "$CSVCAPT" ;
 fi
@@ -150,12 +151,11 @@ do
   N=$(tr -cd N < $SEQS | wc -c); fN=$(bc -l <<<"scale=2;100*$N/$R" | sed 's/^\./0./');
   fGC=$(bc -l <<<"scale=2;100*($C+$G)/($A+$C+$G+$T)" | sed 's/^\./0./'); fAT=$(bc -l <<<"scale=2;100-$fGC" | sed 's/^\./0./');
   ER=$R; [ $GENOME_SIZE != 0 ] && ER=$GENOME_SIZE;
-  STATS=$(awk '{print length}' $SEQS | sort -rn | awk -v g=$ER '{l[++n]=$0}END{g50=g/2;g75=3*g/4;g90=9*g/10;i=s=n50=n75=n90=0;while(++i<=n&&n90==0){s+=l[i];n50==0&&s>=g50&&n50=l[i]+(l50=i);n75==0&&s>=g75&&n75=l[i]+(l75=i);n90==0&&s>=g90&&n90=l[i]+(l90=i)}print (n50-l50)"\t"(n75-l75)"\t"(n90-l90)"\t"l50"\t"l75"\t"l90"\t"l[1]"\t"l[int(n/4+1)]"\t"l[int(n/2+1)]"\t"l[int(3*n/4+1)]"\t"l[n]}');
-  N50=$(cut -f1 <<<"$STATS"); N75=$(cut -f2 <<<"$STATS"); N90=$(cut -f3 <<<"$STATS"); 
-  L50=$(cut -f4 <<<"$STATS"); L75=$(cut -f5 <<<"$STATS"); L90=$(cut -f6 <<<"$STATS"); 
-  MAX=$(cut -f7 <<<"$STATS"); 
-  Q75=$(cut -f8 <<<"$STATS"); Q50=$(cut -f9 <<<"$STATS"); Q25=$(cut -f10 <<<"$STATS"); 
-  MIN=$(cut -f11 <<<"$STATS");
+  STATS=$(awk '{print length}' $SEQS | sort -rn | awk -v g=$ER '{l[++n]=$0;aun+=$0*$0}END{g50=g/2;g75=3*g/4;g90=9*g/10;i=s=n50=n75=n90=0;while(++i<=n&&n90==0){s+=l[i];n50==0&&s>=g50&&n50=l[i]+(l50=i);n75==0&&s>=g75&&n75=l[i]+(l75=i);n90==0&&s>=g90&&n90=l[i]+(l90=i)}print (n50-l50)"\t"(n75-l75)"\t"(n90-l90)"\t"l50"\t"l75"\t"l90"\t"l[1]"\t"l[int(n/4+1)]"\t"l[int(n/2+1)]"\t"l[int(3*n/4+1)]"\t"l[n]"\t"int(0.5+aun/g)}');
+  N50=$(cut -f1 <<<"$STATS");  N75=$(cut -f2 <<<"$STATS");  N90=$(cut -f3 <<<"$STATS"); 
+  L50=$(cut -f4 <<<"$STATS");  L75=$(cut -f5 <<<"$STATS");  L90=$(cut -f6 <<<"$STATS"); 
+  Q75=$(cut -f8 <<<"$STATS");  Q50=$(cut -f9 <<<"$STATS");  Q25=$(cut -f10 <<<"$STATS"); 
+  MIN=$(cut -f11 <<<"$STATS"); MAX=$(cut -f7 <<<"$STATS");  AUN=$(cut -f12 <<<"$STATS");
 
   if ! $TSVOUT
   then
@@ -184,6 +184,7 @@ do
     echo "  Average                      $AVG" ;
     echo ;
     echo "Contiguity statistics:" ;
+    echo "  auN                          $AUN" ;
     echo "  N50                          $N50" ;
     echo "  N75                          $N75" ;
     echo "  N90                          $N90" ;
@@ -193,7 +194,7 @@ do
     if [ $GENOME_SIZE -ne 0 ]; then echo "  Expected genome size         $GENOME_SIZE"; fi
     echo ;
   else
-    CSVLINE="$(basename $INFILE)\t$S\t$R\t$A\t$C\t$G\t$T\t$N\t$fA%\t$fC%\t$fG%\t$fT%\t$fN%\t$fAT%\t$fGC%\t$MIN\t$Q25\t$Q50\t$Q75\t$MAX\t$AVG\t$N50\t$N75\t$N90\t$L50\t$L75\t$L90";
+    CSVLINE="$(basename $INFILE)\t$S\t$R\t$A\t$C\t$G\t$T\t$N\t$fA%\t$fC%\t$fG%\t$fT%\t$fN%\t$fAT%\t$fGC%\t$MIN\t$Q25\t$Q50\t$Q75\t$MAX\t$AVG\t$AUN\t$N50\t$N75\t$N90\t$L50\t$L75\t$L90";
     [ $GENOME_SIZE -ne 0 ]&&CSVLINE="$CSVLINE\t$ER";
     echo -e "$CSVLINE" ;
   fi
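
The dense awk one-liner above folds the new auN term into the existing N50/L50 pass. As a reading aid, here is a minimal sketch of the same arithmetic, written out on a plain list of contig lengths. The function name `contiguity_sketch` and its argument layout are hypothetical and not part of `contig_info.sh`. The sketch assumes that auN is the sum of squared contig lengths divided by the total assembly length (or by the expected genome size when one is given, i.e. auNG), rounded to the nearest integer as in the patched `STATS` line.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration only (not part of contig_info.sh).
# usage: contiguity_sketch <expected_genome_size_or_0> <length> [<length> ...]
# prints: auN (or auNG) <TAB> N50 (or NG50) <TAB> L50 (or LG50)
contiguity_sketch() {
  local gsize=$1; shift
  # one length per line, sorted from longest to shortest
  printf '%s\n' "$@" | sort -rn | awk -v g="$gsize" '
    { len[++n] = $1; total += $1; sumsq += $1 * $1 }
    END {
      if (g == 0) g = total      # no expected size: use the assembly size
      half = g / 2
      for (i = 1; i <= n; i++) {
        cum += len[i]
        # first contig at which the cumulative length reaches half of g
        if (cum >= half) { n50 = len[i]; l50 = i; break }
      }
      # sum of squared lengths over g, rounded to the nearest integer
      printf "%d\t%d\t%d\n", int(0.5 + sumsq / g), n50, l50
    }'
}

# Five contigs, no expected genome size: auN, N50, L50
contiguity_sketch 0 100 80 60 40 20      # prints 73, 80, 2 (tab-separated)

# Same contigs with an assumed expected genome size of 400: auNG, NG50, LG50
contiguity_sketch 400 100 80 60 40 20    # prints 55, 60, 3 (tab-separated)
```

In the first call the squared lengths sum to 22000 and the total length is 300, so auN is 22000/300, about 73.3, which rounds to 73. With the assumed genome size of 400 the denominator changes and the value drops to 55. The sketch prints 0 for N50 and L50 when the cumulative length never reaches half of the expected size, which may differ from how the script itself reports that case.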