25.03

Commit f4eee4b5 authored 4 months ago by Alexis CRISCUOLO
--- a/README.md
+++ b/README.md
@@ -44,7 +44,7 @@ You will need to install the required programs listed in the following table, or
 | _fqconvert_ <br> _fqduplicate_ <br> _fqextract_ <br> _fqstats_                                                          | [fqtools](http://ftp.pasteur.fr/pub/gensoft/projects/fqtools/)   | &ge; 1.1a  | [ftp.pasteur.fr/pub/gensoft/projects/fqtools](http://ftp.pasteur.fr/pub/gensoft/projects/fqtools/) |
 | [_Musket_](http://musket.sourceforge.net/homepage.htm)<sup>&nbsp;&#x2726;</sup>                                         | -                                                                | &ge; 1.1   | [sourceforge.net/projects/musket](https://sourceforge.net/projects/musket/)                        |
 | [_ntCard_](https://www.bcgsc.ca/resources/software/ntcard)                                                              | -                                                                | > 1.2      | [github.com/bcgsc/ntCard](https://github.com/bcgsc/ntCard)                                         |
-| [_ROCK_](https://research.pasteur.fr/en/software/rock)                                                                  | -                                                                | &ge; 1.9.3 | [gitlab.pasteur.fr/vlegrand/ROCK](https://gitlab.pasteur.fr/vlegrand/ROCK)                         |
+| [_ROCK_](https://research.pasteur.fr/en/software/rock)                                                                  | -                                                                | &ge; 2.1   | [gitlab.pasteur.fr/vlegrand/ROCK](https://gitlab.pasteur.fr/vlegrand/ROCK)                         |

 </div>

@@ -125,7 +125,7 @@ Run _fqCleanER_ without option to read the following documentation:
  -b <string>   base name for output files (mandatory option)
  -a <infile>   to set a file containing every alien oligonucleotide sequence (one per line) to
                be clipped during step 'T' (see below)
-  -a <string>   one or several key words  (separated with commas),  each corresponding to a set 
+  -a <string>   one or several key words  (separated with commas),  each corresponding to a set
                of alien oligonucleotide sequences to be clipped during step 'T' (see below):
                   POLY                nucleotide homopolymers
                   NEXTERA             Illumina Nextera index Kits
@@ -138,22 +138,22 @@ Run _fqCleanER_ without option to read the following documentation:
                   TRUSEQ_SMALLRNA     Illumina TruSeq Small RNA Kits
                Note that  these sets  of alien  sequences are  not  exhaustive  and will never
                replace the exact oligos used for library preparation  (default: "POLY")
-  -a AUTO       to perform  de novo  inference of  3' alien  oligonucleotide sequence(s)  of at 
-                least 20 nucleotide length;  selected sequences  are completed  with those from 
-                "POLY" (see above)                
-  -A <infile>   to set sequence or k-mer  model file(s)  to carry out  contaminant read removal 
-                during step 'C';  several comma-separated file names can be specified;  allowed 
+  -a AUTO       to perform  de novo  inference of  3' alien  oligonucleotide sequence(s)  of at
+                least 20 nucleotide length;  selected sequences  are completed  with those from
+                "POLY" (see above)
+  -A <infile>   to set sequence or k-mer  model file(s)  to carry out  contaminant read removal
+                during step 'C';  several comma-separated file names can be specified;  allowed
                file extensions: .fa, .fasta, .fna, .kmr or .kmz (default: phiX174 genome)
  -d <string>   displays the alien oligonucleotide sequences corresponding to the specified key
                word(s); see option -a for the list of available key words
-  -q <int>      quality score threshold;  all bases with Phred  score below  this threshold are 
+  -q <int>      quality score threshold;  all bases with Phred  score below  this threshold are
                considered as non-confident (default: 15)
  -l <int>      minimum required length for a read (default: half the average read length)
-  -p <int>      maximum allowed percentage  of non-confident bases  (as ruled by option -q) per 
+  -p <int>      maximum allowed percentage  of non-confident bases  (as ruled by option -q) per
                read (default: 50) 
  -c <int>      minimum allowed coverage depth for step 'L' or 'N' (default: 4)
  -C <int>      maximum allowed coverage depth for step 'R' or 'N' (default: 90)
-  -s <string>   a sequence of tasks  to be iteratively performed,  each being defined by one of 
+  -s <string>   a sequence of tasks  to be iteratively performed,  each being defined by one of
                the following uppercase characters:
                   C   discarding [C]ontaminating reads (as ruled by option -A)
                   E   correcting sequencing [E]rrors
@@ -199,7 +199,7 @@ Run _fqCleanER_ without option to read the following documentation:
  <span style="color:navy; font-size:1.1em;">**[T]**</span> &nbsp; Trimming and clipping (`-s T`) are performed using [_AlienTrimmer_](https://research.pasteur.fr/en/software/alientrimmer/) (Criscuolo and Brisse 2013). Clipping is carried out based on the specified alien oligonucleotides (option `-a`), where alien oligonucleotide sequences can be (i) set using precomputed standard library names, (ii) specified via user-defined FASTA-formatted file, or (iii) directly estimated from the input files using [_AlienDiscover_](https://gitlab.pasteur.fr/GIPhy/AlienDiscover) (option `-a AUTO`). When step T is run without setting option `-a`, clipping is carried out with the four homopolymers (`POLY`) as alien oligonucleotides. Trimming is carried out by deleting 5' and 3' regions containing many non-confident bases, where a base is considered as non-confident when its Phred score is lower than a Phred score threshold (set using option `-q`; default: 15). After trimming/clipping an HTS read, it can be discarded when the number of remaining bases is lower than a specified length threshold (option `-l`; default: half the average read length) or when the percentage of remaining non-confident bases is higher than another specified threshold (option `-p`; default: 50%). Note that when HTS read discarding breaks PE, singletons are written into dedicated output files ( _.S.fastq_ file extension).


-* Each predefined set of alien oligonucleotide sequences can be displayed using option `-d`. Some sets of alien oligonucleotide sequences are derived from _'Illumina Adapter Sequences'_  [Document # 1000000002694 v16](https://emea.support.illumina.com/downloads/illumina-adapter-sequences-document-1000000002694.html), i.e. options `-a NEXTERA` (_Nextera DNA Indexes_), `-a  IUDI` (_IDT for Illumina UD Indexes_), `-a AMPLISEQ` (_AmpliSeq for Illumina Panels_), `-a TRUSIGHT_PANCANCER` (_TruSight RNA Pan-Cancer Panel_), `-a TRUSEQ_UD` (_IDT for Illumina-TruSeq DNA and RNA UD Indexes_), `-a TRUSEQ_CD` (_TruSeq DNA and RNA CD Indexes_), `-a TRUSEQ_SINGLE` (_TruSeq Single Indexes_), and `-a TRUSEQ_SMALLRNA` (_TruSeq Small RNA_). <br> <sup><sub>**[Oligonucleotide sequences © 2021 Illumina, Inc. All rights reserved. Derivative works created by Illumina customers are authorized for use with Illumina instruments and products only. All other uses are strictly prohibited.]**</sub></sup>
+* Each predefined set of alien oligonucleotide sequences can be displayed using option `-d`. Some sets of alien oligonucleotide sequences are derived from _'Illumina Adapter Sequences'_  [Document # 1000000002694 v20](https://support-docs.illumina.com/SHARE/AdapterSequences/Content/SHARE/FrontPages/AdapterSeq.htm), i.e. options `-a NEXTERA` (_Nextera DNA Indexes_), `-a  IUDI` (_IDT for Illumina UD Indexes_), `-a AMPLISEQ` (_AmpliSeq for Illumina Panels_), `-a TRUSIGHT_PANCANCER` (_TruSight RNA Pan-Cancer Panel_), `-a TRUSEQ_UD` (_IDT for Illumina-TruSeq DNA and RNA UD Indexes_), `-a TRUSEQ_CD` (_TruSeq DNA and RNA CD Indexes_), `-a TRUSEQ_SINGLE` (_TruSeq Single Indexes_), and `-a TRUSEQ_SMALLRNA` (_TruSeq Small RNA_). <br> <sup><sub>**[Oligonucleotide sequences © 2021-2025 Illumina, Inc. All rights reserved. Derivative works created by Illumina customers are authorized for use with Illumina instruments and products only. All other uses are strictly prohibited.]**</sub></sup>

 ## References


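Reviewer note on the step T context line in the README hunk above: the read-discarding rule it describes (after trimming/clipping) can be illustrated with a small sketch. This is only a reading of the README text, not the _AlienTrimmer_ implementation. The function names `non_confident_percentage` and `should_discard` are hypothetical, and `min_length` is passed explicitly because the documented default (half the average read length) depends on the whole input. The trimming itself is not modelled.

```python
from typing import Sequence


def non_confident_percentage(phred: Sequence[int], q: int = 15) -> float:
    """Percentage of bases whose Phred score is below the threshold q (option -q)."""
    if not phred:
        # Assumption: an empty read is treated as entirely non-confident.
        return 100.0
    low = sum(1 for score in phred if score < q)
    return 100.0 * low / len(phred)


def should_discard(phred: Sequence[int], min_length: int,
                   q: int = 15, p: float = 50.0) -> bool:
    """Decide whether an already trimmed/clipped read would be discarded.

    A read is discarded when it is shorter than min_length (option -l) or when
    its percentage of non-confident bases is higher than p (option -p).
    """
    if len(phred) < min_length:
        return True
    return non_confident_percentage(phred, q) > p


if __name__ == "__main__":
    # 40 remaining bases, 30 of them below Phred 15: 75% > 50%, so discarded.
    print(should_discard([10] * 30 + [30] * 10, min_length=20))
```

Under this reading, a read with exactly 50% non-confident bases is kept with the default `-p 50`, since the README says "higher than" the threshold; the actual tool may treat the boundary differently.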
--- a/fqCleanER.sh
+++ b/fqCleanER.sh