Véronique LEGRAND · 79f69e2b

This project contains the source code and the OpenCL/GPU benchmark for the phageterm project. I will leave it to the authors (Marc and Julian) to describe the project and will only discuss the OpenCL benchmark part.

1. Mapping of reads

In the original version of phageterm, mapping reads, then randomly choosing a mapping position for each read, took approximately 80% of the execution time. It was done with the regexp Python package. I considered several possibilities to reduce execution time. Here they are:
* Optimize regexp or use it differently.
  I first thought of that, but I didn't find anything relevant except compiling the regular expression, which is of no use since the pattern changes with each read.
* Use another Python package implementing dedicated text-search algorithms.
  I thought of the Knuth-Morris-Pratt algorithm, which I found implemented in the tryalgo package. There is also an `algorithms` package implementing it, but it doesn't seem to be maintained, so I didn't try it (we want medium/long-term solutions).
* Use the `string.find()` method (not faster than regexp, according to forums). I am not sure it is worth a try.
* Use the GPU with OpenCL/PyOpenCL.
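
The CPU-side options above can be compared on a small, self-contained Python sketch. It assumes that mapping a read means finding every exact, possibly overlapping, occurrence of its seed in each contig and then picking one hit uniformly at random. The function names (`find_all_regex`, `find_all_kmp`, `find_all_find`, `map_seed`) are hypothetical and this is not phageterm's code; the KMP function is a plain textbook implementation written here, not a call into tryalgo, whose API is not reproduced.

```python
import random
import re


def find_all_regex(seed, contig):
    """All (possibly overlapping) start positions of seed in contig, using re.

    A zero-width lookahead makes overlapping hits visible. The pattern is
    rebuilt for every seed, so compiling it once ahead of time brings
    nothing when each read has its own seed.
    """
    pattern = "(?=" + re.escape(seed) + ")"
    return [m.start() for m in re.finditer(pattern, contig)]


def kmp_failure(seed):
    """fail[i] = length of the longest proper border of seed[:i + 1]."""
    fail = [0] * len(seed)
    k = 0
    for i in range(1, len(seed)):
        while k > 0 and seed[i] != seed[k]:
            k = fail[k - 1]
        if seed[i] == seed[k]:
            k += 1
        fail[i] = k
    return fail


def find_all_kmp(seed, contig):
    """All start positions of a non-empty seed in contig (Knuth-Morris-Pratt)."""
    fail = kmp_failure(seed)
    hits = []
    k = 0  # number of seed characters matched so far
    for i, c in enumerate(contig):
        while k > 0 and c != seed[k]:
            k = fail[k - 1]
        if c == seed[k]:
            k += 1
        if k == len(seed):
            hits.append(i - k + 1)
            k = fail[k - 1]  # fall back so overlapping hits are still found
    return hits


def find_all_find(seed, contig):
    """All start positions of seed in contig by repeated str.find()."""
    hits = []
    pos = contig.find(seed)
    while pos != -1:
        hits.append(pos)
        pos = contig.find(seed, pos + 1)  # restart one base later: overlaps allowed
    return hits


def map_seed(seed, contigs, finder=find_all_kmp, rng=random):
    """Collect every (contig index, position) hit, then pick one at random."""
    hits = [(ci, p) for ci, contig in enumerate(contigs) for p in finder(seed, contig)]
    return rng.choice(hits) if hits else None
```

All three finders return the same positions. In CPython, the pure-Python KMP loop executes interpreter steps for every character, whereas `str.find` and `re` scan in C, so KMP's linear-time guarantee alone does not predict which is faster here.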
Here are the results of the tests with the different implementations.
For the tests, I used the files in the data-virome directory.
My aim was to benchmark execution time and to see how much OpenCL or another Python package could improve on regexp, so I didn't bother with paired-end reads. I searched for the first 20 characters (the default seed length in phageterm) of each read of SRR4295172_1_div6.fastq in all the sequences of Contigs_30min.fasta.

|                | Original regexp | tryalgo    | OpenCL |
| -------------- |:---------------:|:----------:| ------:|
| Execution time |                 | > 24 hours | 120 s  |
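
The benchmark set-up described above can be sketched as a small timing harness. It assumes uncompressed FASTA and 4-line FASTQ files, an exact search of the first 20 bases of each read in every contig, and the helper names `read_fasta`, `read_fastq_seqs`, `regex_finder` and `bench`, which are all hypothetical. It is not the script that produced the timings in the table.

```python
import re
import time


def read_fasta(lines):
    """Parse FASTA lines into a list of sequences (headers are dropped)."""
    seqs, current = [], []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
            current = []
        elif line:
            current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


def read_fastq_seqs(lines):
    """Return the sequence (second) line of each 4-line FASTQ record."""
    lines = [line.rstrip("\n") for line in lines]
    return [lines[i + 1] for i in range(0, len(lines) - 3, 4)]


def regex_finder(seed, contig):
    """Start positions of every (overlapping) exact occurrence of seed in contig."""
    return [m.start() for m in re.finditer("(?=" + re.escape(seed) + ")", contig)]


def bench(reads, contigs, finder=regex_finder, seed_len=20):
    """Search the first seed_len bases of each read in all contigs and time it.

    Returns (number of hits per read, elapsed wall-clock seconds).
    """
    start = time.perf_counter()
    hits_per_read = []
    for read in reads:
        seed = read[:seed_len]  # 20 = phageterm's default seed length
        hits_per_read.append(sum(len(finder(seed, contig)) for contig in contigs))
    return hits_per_read, time.perf_counter() - start
```

Passing open file handles for Contigs_30min.fasta and SRR4295172_1_div6.fastq (assumed to sit in the data-virome directory) to the two readers, then calling `bench(reads, contigs)`, mirrors the experiment's shape; swapping the `finder` argument is how different CPU search functions would be compared under the same loop.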