This project contains the source code and the OpenCL/GPU benchmark for the phageterm project. I leave it to the authors (Marc and Julian) to describe phageterm itself and only cover the OpenCL benchmark part here.
1. Mapping of reads
In the original version of phageterm, mapping the reads and then randomly choosing a mapping position for each read took approximately 80% of the execution time. The mapping was done with the regexp python package. I considered several ways to reduce the execution time:
* optimize regexp or use it differently
I first looked into this, but I didn't find anything relevant except compiling the regexp pattern, which is of no use here since the pattern changes with each read.
* use another python package with special text searching algorithms.
I thought of the Knuth-Morris-Pratt algorithm, which I found implemented in the tryalgo package. There is also an algorithms package that implements it, but it doesn't seem to be maintained, so I didn't try it (we want medium/long-term solutions).
* use the string.find() method (not faster than regexp according to forums). Not sure it is worth a try.
* use GPU technology with OpenCL/PyOpenCL.
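To make the CPU-side options concrete, here is a minimal sketch of three ways to list every position of a seed in a sequence: a regular expression, `str.find`, and a hand-written Knuth-Morris-Pratt search. It assumes the standard library `re` module for the regexp case. The KMP function is a textbook version written for illustration, not the tryalgo implementation, and the function names are hypothetical.

```python
import re


def find_all_regex(seed, sequence):
    """All start positions of seed in sequence, using the standard re module."""
    # The lookahead lets overlapping occurrences be reported; re.escape keeps
    # the seed literal. The pattern has to be rebuilt for every seed.
    pattern = re.compile("(?=" + re.escape(seed) + ")")
    return [match.start() for match in pattern.finditer(sequence)]


def find_all_str_find(seed, sequence):
    """All start positions of seed in sequence, using repeated str.find calls."""
    positions = []
    start = sequence.find(seed)
    while start != -1:
        positions.append(start)
        start = sequence.find(seed, start + 1)
    return positions


def find_all_kmp(seed, sequence):
    """All start positions of seed in sequence, using Knuth-Morris-Pratt."""
    if not seed:
        return []
    # failure[i]: length of the longest proper prefix of seed[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(seed)
    k = 0
    for i in range(1, len(seed)):
        while k > 0 and seed[i] != seed[k]:
            k = failure[k - 1]
        if seed[i] == seed[k]:
            k += 1
        failure[i] = k
    positions = []
    k = 0  # number of seed characters currently matched
    for i, char in enumerate(sequence):
        while k > 0 and char != seed[k]:
            k = failure[k - 1]
        if char == seed[k]:
            k += 1
        if k == len(seed):
            positions.append(i - k + 1)
            k = failure[k - 1]
    return positions
```

All three functions return the same list of positions for a given seed and sequence, so any of them can be timed with the same inputs.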
Here are the results of the tests with the different implementations.
For the tests, I used the files in the data-virome directory.
My aim was to benchmark execution time and to see how OpenCL or another python package could improve things compared to regexp, so I didn't bother with paired-end reads. I searched for the first 20 characters (the default seed length in phageterm) of each read in SRR4295172_1_div6.fastq in all the sequences in Contigs_30min.fasta.
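The test procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual benchmark code. The reads and contigs are tiny in-memory stand-ins for the FASTQ and FASTA files, `str.find` stands in for whichever search method is being timed, and all function and variable names are hypothetical.

```python
import random
import time

SEED_LENGTH = 20  # default seed length mentioned above


def map_seed(seed, contigs):
    """Return every (contig_name, position) at which seed occurs."""
    hits = []
    for name, sequence in contigs.items():
        position = sequence.find(seed)
        while position != -1:
            hits.append((name, position))
            position = sequence.find(seed, position + 1)
    return hits


def bench(reads, contigs, rng):
    """Map the seed of each read, pick one hit at random, and time the loop."""
    start = time.perf_counter()
    chosen = []
    for read in reads:
        hits = map_seed(read[:SEED_LENGTH], contigs)
        # Reads whose seed maps nowhere are recorded as None.
        chosen.append(rng.choice(hits) if hits else None)
    elapsed = time.perf_counter() - start
    return chosen, elapsed


# Toy data (assumption): real runs would parse the FASTA and FASTQ files instead.
contigs = {
    "contig_a": "ACGTTGCAAGGCTTAACCGGTTAACGTAGCTAGGATCCAT",
    "contig_b": "GGATCCTTAGGCATGCATGCCGTATTAGGCCAATTGGCCA",
}
reads = [
    contigs["contig_a"][5:35],  # seed taken from contig_a at position 5
    "T" * 30,                   # seed that maps nowhere
]

chosen, elapsed = bench(reads, contigs, random.Random(0))
print(chosen, f"{elapsed:.6f} s")
```

Swapping the body of `map_seed` for another search method leaves the timing loop unchanged, which keeps the comparison between implementations fair.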