_wgetENAHTS_ is a command line program written in [Bash](https://www.gnu.org/software/bash/) to download gzipped FASTQ files from the [European Nucleotide Archive](https://www.ebi.ac.uk/ena/browser/home) (ENA) [FTP repository](ftp://ftp.sra.ebi.ac.uk/vol1/fastq/).
Every download is performed by the standard tool [_wget_](https://www.gnu.org/software/wget/).
Every download is performed using the standard tool [_wget_](https://www.gnu.org/software/wget/).
## Installation and execution
...
...
@@ -27,8 +27,8 @@ Execute _wgetENAHTS_ with the following command line model:
Run _wgetENAHTS_ without option to read the following documentation:
Downloads FASTQ files corresponding to the specified DRR/ERR/SRR accession(s)
Files are downloaded from the ENA ftp repository ftp.sra.ebi.ac.uk/vol1/fastq
...
...
@@ -37,11 +37,9 @@ Run _wgetENAHTS_ without option to read the following documentation:
-o <dir> output directory (default: .)
-f <file> to read accession(s) from the specified file (default: all the last
arguments)
-t <int> maximum number of concurrent download(s) (default: 2)
-t <int> number of thread(s) (default: 2)
-r <int> maximum download rate per file (in kb per seconds; default: entire
available bandwidth)
-w <int> waiting time between each successive download (in seconds; default:
same as the specified value for option -t)
-n no file download, only check (default: not set)
-h prints this help and exits
...
...
@@ -66,17 +64,18 @@ Run _wgetENAHTS_ without option to read the following documentation:
+ same as above with 9 parallel downloads and 500kb/sec download rate per file:
wgetENAHTS.sh -t 9 -r 500 -f accn.txt
```
## Notes
* The HTS read accessions should start with DRR, ERR or SRR (specified as arguments, or via a text file using option `-f`). The output file names are identical to those available in the repository corresponding to each specified accession identifier. Every downloaded file has file extension `.fastq.gz`.
* The HTS read accessions should start with DRR, ERR or SRR (specified as final arguments, or in a text file using option `-f`). The output file names are identical to those available in the repository corresponding to each specified accession identifier. Every downloaded file has file extension `.fastq.gz`.
* After checking the existence of a repository for each specified accession, a first step of (parallel) downloading is performed. Each downloaded file that seems incomplete (or missing) is downloaded a second time.
* For a given DRR/ERR/SRR accession, the existence of a repository within the ENA can be easily assessed using option `-n`.
* No download is performed when the output directory already contains files named with the specified accessions.
* After a first step of (parallel) downloading, the integrity of each gathered file is assessed. Each downloaded file that seems incomplete (or missing) is downloaded a second time.
* For a given DRR/ERR/SRR accession, the existence of a repository within the ENA can be easily assessed using option `-n` (i.e. no file download).
* Fast running times are expected when running _wgetENAHTS_ on multiple threads (option `-t`). Depending on the bandwidth, the maximum download rate per file can be restricted using option `-r`.
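
The integrity assessment mentioned in the notes above can be illustrated with a minimal sketch. It assumes that "seems incomplete" means the file is empty or fails `gzip -t`; the helper names `is_complete` and `list_incomplete` are hypothetical and are not part of _wgetENAHTS_, whose actual implementation may differ.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not from wgetENAHTS): succeeds only if the file
# is non-empty and passes the gzip integrity test.
is_complete() {
  local f="$1"
  [ -s "$f" ] && gzip -t "$f" 2>/dev/null
}

# Hypothetical helper: prints every file in directory $1 whose name starts
# with accession $2 and that is empty or truncated, i.e. a candidate for
# a second download.
list_incomplete() {
  local dir="$1" accn="$2" f
  for f in "$dir"/"$accn"*.fastq.gz; do
    [ -e "$f" ] || continue   # the glob matched no file
    is_complete "$f" || echo "$f"
  done
}
```

For example, `list_incomplete . SRR000001` (the accession is only a placeholder) would print the names of the `.fastq.gz` files for that accession in the current directory that fail the check, and print nothing when all of them are intact.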
trap "rm -f $TMPF;echo;for a in $(echo $ACCNLIST);do if ls $OUTDIR/\$a*.fastq.gz&>/dev/null;then for f in $OUTDIR/\$a*.fastq.gz;do if ! $GZIP -t \$f&>/dev/null;then echo removing \$f;rm -f \$f;fi;done;fi;done;exit 1;" SIGINT ;
trap "echo;echo interrupting;for a in \$(echo $ACCNLIST);do rm -f $OUTDIR/\$a.weh;done;exit 1;" SIGINT ;