Update home authored by Véronique LEGRAND
This project contains the source code and the OpenCL/GPU benchmark for the phageterm project. I will leave it to the authors (Marc and Julian) to describe phageterm and will only discuss the OpenCL benchmark part.
1. Mapping of reads
In the original version of phageterm, mapping the reads and then randomly choosing a mapping position for each read took approximately 80% of the execution time. It was done with the regexp Python package. I considered several possibilities to reduce execution time. Here they are.
* optimize regexp or use it differently
I first thought of that, but I didn't find anything relevant in the Python literature or on Google.
* use another Python package with special text searching algorithms.
I thought of the Knuth-Morris-Pratt algorithm, which I found implemented in the tryalgo package. There is also an algorithms package implementing it, but it doesn't seem to be maintained, so I didn't try it (we want medium/long-term solutions).
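To make the Knuth-Morris-Pratt idea concrete, here is a minimal plain-Python sketch of the search. It is not the tryalgo implementation and it is not code from this project; the function names `kmp_failure` and `kmp_find_all` are hypothetical and chosen only for illustration.

```python
def kmp_failure(pattern):
    """Build the KMP failure table (hypothetical helper).

    fail[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it.
    """
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to a shorter prefix
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(text, pattern):
    """Return the start positions of all (possibly overlapping)
    occurrences of pattern in text."""
    if not pattern:
        return []
    fail = kmp_failure(pattern)
    positions = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            positions.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping hits
    return positions
```

The search never moves backwards in the text, so it runs in time linear in the length of the text plus the length of the pattern. In pure Python, however, the per-character loop is interpreted, which may be part of why such an implementation can be slower in practice than the C-backed regexp engine.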
...@@ -15,9 +15,9 @@ I thought of the Knuth Morris Pratt algorithm which I found implemented in the t
Here are the results of the tests with the different implementations.
For the tests, I used the files in the data-virome directory.
My aim was to benchmark execution time and to see how OpenCL or another Python package could improve things compared to regexp, so I didn't bother with paired-end reads. I searched for the first 20 characters (default value for seed in phageterm) of each read in SRR4295172_1_div6.fastq in all the sequences in Contigs_30min.fasta.
Tests ran on myriad-n403 or on my machine (OpenCL part).
| original regexp | tryalgo | openCL |
| --------------- |:---------:| ------:|
| 17 min 10 s | > 16 hours | 64 s |
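
For reference, the regexp side of the benchmark described above (search for the first 20 characters of a read in every contig, then pick one mapping position at random) could look roughly like the sketch below. It uses Python's standard `re` module. The function name `map_seed`, its parameters, and the dictionary layout of the contigs are assumptions made for illustration, not phageterm's actual code.

```python
import random
import re


def map_seed(read, contigs, seed_len=20, rng=random):
    """Map the seed of one read onto a set of contigs (illustrative sketch).

    read     : read sequence as a string
    contigs  : dict mapping contig name -> sequence (assumed layout)
    seed_len : number of leading characters used as the seed
    rng      : object providing choice(), e.g. the random module

    Returns one randomly chosen (contig_name, position) hit,
    or None when the seed is found nowhere.
    """
    seed = read[:seed_len]
    # The pattern changes with every read, so compiling it once
    # per read cannot be shared across reads.
    pattern = re.compile(re.escape(seed))
    hits = [
        (name, match.start())
        for name, seq in contigs.items()
        # finditer reports non-overlapping matches only
        for match in pattern.finditer(seq)
    ]
    return rng.choice(hits) if hits else None


# Tiny usage example with made-up sequences
contigs = {"c1": "TTTACGTAGG", "c2": "CCCC"}
hit = map_seed("ACGTAGGTT", contigs, seed_len=5)
```

With real data, the reads and contigs would first have to be parsed from the FASTQ and FASTA files; that step is left out here.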