AlienRemover

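# Clone the repository
git clone https://gitlab.pasteur.fr/GIPhy/AlienRemover.git
# Compile the two source files (run from the directory that contains them)
javac LHBF.java AlienRemover.java
# Build an executable JAR whose manifest names AlienRemover as the main class
echo Main-Class: AlienRemover > MANIFEST.MF
jar -cmvf MANIFEST.MF AlienRemover.jar AlienRemover.class LHBF.class
# Remove the intermediate build files
rm MANIFEST.MF AlienRemover.class LHBF.class
# Run the tool
java -jar AlienRemover.jar [options]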
AlienRemover

 Fast removal of alien reads (contaminant, host, ...) from FASTQ file(s)

 USAGE:
   AlienRemover -a <alienfile> [-b <modelfile>]       [-o <basename>] [-k <int>]
   AlienRemover -a <alienfile>  -i <FASTQ>            [-o <basename>] [-k <int>] [-c <float>] [-p <float>] [...]
   AlienRemover -a <alienfile>  -1 <FASTQ> -2 <FASTQ> [-o <basename>] [-k <int>] [-c <float>] [-p <float>] [...]

 OPTIONS:
    -a <infile>   FASTA file containing alien sequence(s); filename should end with .gz when gzipped
    -a <infile>   input file  containing alien  k-mers generated  by AlienRemover  from  FASTA-formatted  alien
                  sequence(s); filename should end with .kmr or .kmz
    -i <infile>   [SE] FASTQ-formatted input file; filename should end with .gz when gzipped
    -1 <infile>   [PE] FASTQ-formatted R1 input file; filename should end with .gz when gzipped
    -2 <infile>   [PE] FASTQ-formatted R2 input file; filename should end with .gz when gzipped
    -o <name>     outfile basename; output files have the following extensions:
                   + alien k-mers: <name>.km<r|z>
                   + SE reads:     <name>.fastq[.gz]                        (.gz is added when using option -z)
                   + PE reads:     <name>.1.fastq[.gz] <name>.2.fastq[.gz]  (.gz is added when using option -z)
    -k [10-31]    k-mer length for alien sequence occurrence searching; must lie between 10 and 31 (default: 25)
    -p <float>    Bloom filter false positive probability cutoff (default: 0.05)
    -n <integer>  expected number of canonical k-mers (default: estimated from the alien file size)
    -l            use fewer bits and more hashing functions, whenever possible (default: not set)
    -c <float>    criterion to remove a read (default: 0.15)
    -s            compute Bloom filter statistics (default: not set)
    -w            write Bloom filter into output file (default: not set)
    -r            write removed reads into output file(s) (default: not set)
    -z            gzipped output files (default: not set)
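
The -p and -n options match the two inputs of the textbook Bloom filter sizing formulas: for n expected elements and a target false positive probability p, a classic Bloom filter uses about m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions. The sketch below only illustrates these standard formulas. It is not taken from the AlienRemover source, and the LHBF class may size its filter differently (in particular with option -l). The class name BloomSizing and its methods are hypothetical.

```java
// Hypothetical helper: classic Bloom filter sizing formulas.
// This is an illustration, not AlienRemover's actual implementation.
final class BloomSizing {
    private BloomSizing() {}

    /** Bits needed for n expected elements at false positive probability p. */
    static long optimalBits(long n, double p) {
        if (n <= 0 || p <= 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
        }
        double ln2 = Math.log(2.0);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Hash function count that minimizes false positives for m bits and n elements. */
    static int optimalHashes(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2.0)));
    }

    /** Expected false positive probability: (1 - e^(-k n / m))^k. */
    static double falsePositiveRate(long m, long n, int k) {
        return Math.pow(1.0 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long n = 1_000_000L;   // assumed number of canonical k-mers (cf. option -n)
        double p = 0.05;       // documented default of option -p
        long m = optimalBits(n, p);
        int k = optimalHashes(m, n);
        System.out.printf("n=%d p=%.2f -> m=%d bits, k=%d hashes, fpp=%.4f%n",
                n, p, m, k, falsePositiveRate(m, n, k));
    }
}
```

With the default cutoff of 0.05, these formulas give a little over 6 bits per expected k-mer, which is why a smaller -p or a larger -n increases memory use.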

 EXAMPLES:
   AlienRemover  -a alien.fasta                   -o alien      -k 30
   AlienRemover  -a alien.kmr   -i reads.fastq    -o flt_reads  --p64   -z
   AlienRemover  -a alien.kmr   -1 r1.fq -2 r2.fq               -c 0.3  -r
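
The help text refers to canonical k-mers and to a removal criterion (-c) without defining either. A canonical k-mer is conventionally the lexicographically smaller of a k-mer and its reverse complement, so a read matches regardless of the strand it was sequenced from. The sketch below makes two assumptions that the help text does not confirm: that -c is compared against the fraction of a read's k-mers found among the alien k-mers, and that an exact HashSet can stand in for the Bloom filter. The class AlienSketch and all of its methods are hypothetical, and input is assumed to be uppercase.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not AlienRemover's code: an exact HashSet replaces the
// Bloom filter, and the -c criterion is ASSUMED to be a k-mer fraction.
final class AlienSketch {
    private AlienSketch() {}

    /** Reverse complement of an uppercase DNA string; non-ACGT characters become N. */
    static String reverseComplement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            switch (s.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                case 'T': sb.append('A'); break;
                default:  sb.append('N');
            }
        }
        return sb.toString();
    }

    /** Lexicographically smaller of a k-mer and its reverse complement. */
    static String canonical(String kmer) {
        String rc = reverseComplement(kmer);
        return kmer.compareTo(rc) <= 0 ? kmer : rc;
    }

    /** Collects the canonical k-mers of an alien sequence, skipping k-mers that contain N. */
    static Set<String> index(String alien, int k) {
        Set<String> kmers = new HashSet<>();
        for (int i = 0; i + k <= alien.length(); i++) {
            String kmer = alien.substring(i, i + k);
            if (kmer.indexOf('N') < 0) kmers.add(canonical(kmer));
        }
        return kmers;
    }

    /** Fraction of the read's k-mers whose canonical form is in the alien set. */
    static double alienFraction(String read, Set<String> alienKmers, int k) {
        int total = read.length() - k + 1;
        if (total <= 0) return 0.0;
        int hits = 0;
        for (int i = 0; i < total; i++) {
            if (alienKmers.contains(canonical(read.substring(i, i + k)))) hits++;
        }
        return (double) hits / total;
    }

    public static void main(String[] args) {
        int k = 10;
        double c = 0.15;  // documented default of option -c
        Set<String> alien = index("ACGTACGTTGCAGGCTTAAC", k);
        String read = reverseComplement("ACGTACGTTGCAGGCTTAAC");
        double f = alienFraction(read, alien, k);
        System.out.println("alien fraction = " + f + ", removed = " + (f > c));
    }
}
```

Because both strands collapse to the same canonical form, the read in main, which is the reverse complement of the alien sequence, is still recognized as alien.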